Bash – Quoting in ssh $host $FOO and ssh $host “sudo su user -c $FOO” type constructs


I often end up issuing complex commands over ssh; these commands involve piping to awk or perl one-liners, and as a result contain single quotes and $'s. I have neither been able to figure out a hard and fast rule to do the quoting properly, nor found a good reference for it. For instance, consider the following:

# what I'd run locally:
CMD='pgrep -fl java | grep -i datanode | awk '{print $1}'
# this works with ssh $host "$CMD":
CMD='pgrep -fl java | grep -i datanode | awk '"'"'{print $1}'"'"

(Note the extra quotes in the awk statement.)

But how do I get this to work with, e.g., ssh $host "sudo su user -c '$CMD'"? Is there a general recipe for managing quotes in such scenarios?

Best Answer

Dealing with multiple levels of quoting (really, multiple levels of parsing/interpretation) can get complicated. It helps to keep a few things in mind:

  • Each “level of quoting” can potentially involve a different language.
  • Quoting rules vary by language.
  • When dealing with more than one or two nested levels, it is usually easiest to work “from the bottom, up” (i.e. innermost to outermost).

Levels of Quoting

Let us look at your example commands.

pgrep -fl java | grep -i datanode | awk '{print $1}'

Your first example command (above) uses four languages: your shell, the regex in pgrep, the regex in grep (which might be different from the regex language in pgrep), and awk. There are two levels of interpretation involved: the shell and one level after the shell for each of the involved commands. There is only one explicit level of quoting (shell quoting into awk).

ssh host …

Next you added a level of ssh on top. This is effectively another shell level: ssh does not interpret the command itself, it hands it to a shell on the remote end (via (e.g.) sh -c …) and that shell interprets the string.

ssh host "sudo su user -c …"

Then you asked about adding another shell level in the middle by using su (via sudo, which does not interpret its command arguments, so we can ignore it). At this point, you have three levels of nesting going on (awk → shell, shell → shell (ssh), shell → shell (su user -c)), so I advise using the “bottom, up” approach. I will assume that your shells are Bourne compatible (e.g. sh, ash, dash, ksh, bash, zsh, etc.). Some other kind of shell (fish, rc, etc.) might require different syntax, but the method still applies.

Bottom, Up

  1. Formulate the string you want to represent at the innermost level.
  2. Select a quoting mechanism from the quoting repertoire of the next-highest language.
  3. Quote the desired string according to your selected quoting mechanism.
    • There are often many variations in how to apply a given quoting mechanism. Doing it by hand is usually a matter of practice and experience. When doing it programmatically, it is usually best to pick the variation that is easiest to get right (usually the “most literal” one, with the fewest escapes).
  4. Optionally, use the resulting quoted string with additional code.
  5. If you have not yet reached your desired level of quoting/interpretation, take the resulting quoted string (plus any added code) and use it as the starting string in step 2.
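The steps above can be sketched in a Bourne-compatible shell, using sh -c as a local stand-in for each extra shell level. The helper name quote_level and the variable names are hypothetical, chosen only for this illustration.

```shell
# Hypothetical helper: single-quote a string for one more level of
# Bourne-shell parsing (each ' becomes '\'' and the result is wrapped in '...').
quote_level() {
    printf '%s\n' "$1" | sed -e "s/'/'\\\\''/g" -e "1s/^/'/" -e "\$s/\$/'/"
}

inner='hello   $HOME "world"'                    # step 1: the innermost string
level1="printf '%s\n' $(quote_level "$inner")"   # steps 2-4: quote it, add code
level2="sh -c $(quote_level "$level1")"          # step 5: repeat for the next level

# After two levels of shell parsing, the innermost string arrives unchanged.
result=$(sh -c "$level2")
printf '%s\n' "$result"
```

Each pass through quote_level adds exactly one layer of protection, and each shell that parses the string removes exactly one.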

Quoting Semantics Vary

The thing to keep in mind here is that each language (quoting level) may give slightly different semantics (or even drastically different semantics) to the same quoting character.

Most languages have a “literal” quoting mechanism, but they vary in exactly how literal they are. The single quote of Bourne-like shells is actually literal (which means you cannot use it to quote a single quote character itself). Other languages (Perl, Ruby) are less literal in that they interpret some backslash sequences inside single quoted regions non-literally (specifically, \\ and \' result in \ and ', but other backslash sequences are actually literal).
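As a small illustration of the shell side of this (the variable names a and b are arbitrary), backslashes inside single quotes are kept as typed, and a single quote has to be added from outside the quoted region:

```shell
# Inside single quotes the shell keeps every character, including backslashes.
a='\\ and \$ stay as typed'

# A single quote cannot appear inside single quotes: close the quoted region,
# add a backslash-escaped quote, and reopen the region.
b='it'\''s'

printf '%s\n' "$a" "$b"
```

The second line of output is it's, while the first line still contains both backslashes before "and" and the backslash before the $.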

You will have to read the documentation for each of your languages to understand its quoting rules and the overall syntax.

Your Example

The innermost level of your example is an awk program.

{print $1}

You are going to embed this in a shell command line:

pgrep -fl java | grep -i datanode | awk …

We need to protect (at a minimum) the space and the $ in the awk program. The obvious choice is to use single quote in the shell around the whole program.

  • '{print $1}'

There are other choices though:

  • {print\ \$1} directly escape the space and $
  • {print' $'1} single quote only the space and $
  • "{print \$1}" double quote the whole and escape the $
  • {print" $"1} double quote only the space and $
    This may be bending the rules a bit (unescaped $ at the end of a double quoted string is literal), but it seems to work in most shells.
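A quick way to convince yourself that these spellings are equivalent is to assign each one to a variable and compare the results. This is only a sketch; the names p1 to p5 are made up here, and the last form relies on the same unescaped-$ behavior noted above, so it may not hold in every shell.

```shell
# Each assignment uses a different quoting style for the same awk program.
p1='{print $1}'
p2={print\ \$1}
p3={print' $'1}
p4="{print \$1}"
p5={print" $"1}

# All five lines of output should be identical.
for p in "$p1" "$p2" "$p3" "$p4" "$p5"; do
    printf '%s\n' "$p"
done
```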

If the program used a comma between the open and close curly braces we would also need to quote or escape either the comma or the curly braces to avoid “brace expansion” in some shells.

We pick '{print $1}' and embed it in the rest of the shell “code”:

pgrep -fl java | grep -i datanode | awk '{print $1}'

Next, you wanted to run this via su and sudo.

sudo su user -c …

su user -c … is just like some-shell -c … (except running under some other UID), so su just adds another shell level. sudo does not interpret its arguments, so it does not add any quoting levels.

We need another shell level for our command string. We can pick single quoting again, but we have to give special handling to the existing single quotes. The usual way looks like this:

'pgrep -fl java | grep -i datanode | awk '\''{print $1}'\'

There are four strings here that the shell will interpret and concatenate: the first single quoted string (pgrep … awk), an escaped single quote, the single-quoted awk program, another escaped single quote.
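To see the concatenation at work, the same text can be assigned to a variable (s is just an illustrative name) and printed:

```shell
# Four pieces: 'pgrep ... awk '   \'   '{print $1}'   \'
s='pgrep -fl java | grep -i datanode | awk '\''{print $1}'\'
printf '%s\n' "$s"
```

The output is the original command line, pgrep -fl java | grep -i datanode | awk '{print $1}', which is exactly the string the next shell level needs to receive.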

There are, of course, many alternatives:

  • pgrep\ -fl\ java\ \|\ grep\ -i\ datanode\ \|\ awk\ \'{print\ \$1}\' escape everything important
  • pgrep\ -fl\ java\|grep\ -i\ datanode\|awk\ \'{print\$1}\' the same, but without superfluous whitespace (even in the awk program!)
  • "pgrep -fl java | grep -i datanode | awk '{print \$1}'" double quote the whole thing, escape the $
  • 'pgrep -fl java | grep -i datanode | awk '"'"'{print $1}'"'" your variation; a bit longer than the usual way due to using double quotes (two characters) instead of escapes (one character)

Using different quoting in the first level allows for other variations at this level:

  • 'pgrep -fl java | grep -i datanode | awk "{print \$1}"'
  • 'pgrep -fl java | grep -i datanode | awk {print\ \$1}'
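These variations can be checked locally without awk or pgrep. In the sketch below, show_arg is a hypothetical stand-in for awk that prints the one argument it receives, and sh -c plays the role of the extra shell level.

```shell
# Three ways of writing the command string for the next shell level.
v1='show_arg '\''{print $1}'\'
v2='show_arg "{print \$1}"'
v3='show_arg {print\ \$1}'

# Let a child shell parse each string; show_arg reports what "awk" would get.
out=$(for v in "$v1" "$v2" "$v3"; do
    sh -c 'show_arg() { printf "%s\n" "$1"; }; '"$v"
done)
printf '%s\n' "$out"
```

All three lines of output should show the same program text, even though the three command strings differ.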

Embedding the usual single-quoted form in the sudo/su command line gives this:

sudo su user -c 'pgrep -fl java | grep -i datanode | awk '\''{print $1}'\'

You could use the same string in any other single shell level contexts (e.g. ssh host …).

Next, you added a level of ssh on top. This is effectively another shell level: ssh does not interpret the command itself, but it hands it to a shell on the remote end (via (e.g.) sh -c …) and that shell interprets the string.

ssh host …

The process is the same: take the string, pick a quoting method, use it, embed it.

Using single quotes again:

'sudo su user -c '\''pgrep -fl java | grep -i datanode | awk '\'\\\'\''{print $1}'\'\\\'

Now there are eleven strings that are interpreted and concatenated: 'sudo su user -c ', escaped single quote, 'pgrep … awk ', escaped single quote, escaped backslash, two escaped single quotes, the single quoted awk program, an escaped single quote, an escaped backslash, and a final escaped single quote.
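The pieces can be checked by letting the local shell do the concatenation and then parsing the result once more, as the remote shell would. This is a sketch only; s3 and inner are illustrative names, and eval with set -- is used here purely to split the string into words without running sudo.

```shell
# The eleven-piece string from above, assigned to a variable.
s3='sudo su user -c '\''pgrep -fl java | grep -i datanode | awk '\'\\\'\''{print $1}'\'\\\'
printf '%s\n' "$s3"

# Parse one more level, as the shell on the remote end would.
eval "set -- $s3"
inner=$5             # the single argument that su -c would receive
printf '%s\n' "$inner"
```

The first line printed is the sudo su user -c '…' command line with its own '\'' escapes intact; the second is the plain pgrep pipeline with the awk program in single quotes.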

The final form looks like this:

ssh host 'sudo su user -c '\''pgrep -fl java | grep -i datanode | awk '\'\\\'\''{print $1}'\'\\\'

This is a bit unwieldy to type by hand, but the literal nature of the shell’s single quoting makes it easy to automate a slight variation:


sq() { # single quote for Bourne shell evaluation
    # Change ' to '\'' and wrap in single quotes.
    # If original starts/ends with a single quote, creates useless
    # (but harmless) '' at beginning/end of result.
    printf '%s\n' "$*" | sed -e "s/'/'\\\\''/g" -e 1s/^/\'/ -e \$s/\$/\'/
}

# Some shells (ksh, bash, zsh) can do something similar with %q, but
# the result may not be compatible with other shells (ksh uses $'...',
# but dash does not recognize it).
# sq() { printf %q "$*"; }

ap='{print $1}'
s1="pgrep -fl java | grep -i datanode | awk $(sq "$ap")"
s2="sudo su user -c $(sq "$s1")"

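# The local double quotes already supply the outermost (ssh) quoting level,
# so pass $s2 as-is. The output of sq "$s2" is what you would type by hand
# after "ssh host"; passing it here would add one level too many.
ssh host "$s2"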
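One way to check such a helper without touching a remote host is a local round trip: quote a string, let the shell parse the result once with eval, and compare it with the input. The sketch below redefines sq so that it is self-contained; the names decoded and cmdline are illustrative only.

```shell
sq() { # single quote for Bourne shell evaluation: ' becomes '\''
    printf '%s\n' "$*" | sed -e "s/'/'\\\\''/g" -e "1s/^/'/" -e "\$s/\$/'/"
}

ap='{print $1}'
s1="pgrep -fl java | grep -i datanode | awk $(sq "$ap")"
s2="sudo su user -c $(sq "$s1")"

# One round of shell parsing must undo one round of sq.
eval "decoded=$(sq "$s2")"
[ "$decoded" = "$s2" ] && echo "round trip ok"

# Print (do not run) the command line one would otherwise type by hand.
cmdline="ssh host $(sq "$s2")"
printf '%s\n' "$cmdline"
```

The printed command line differs slightly from the hand-written form because sq always emits '\'' for every single quote, but a shell parses both to the same string.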