Why does subprocess.call work with any string only if the parameters are passed separately in an array?

If I prepare a command line like this:

cmd = "ffmpeg -i '" + param1 + "' -o '" + param2 + "'"

And then pass it to subprocess.call or os.system, it may or may not work, depending on which characters param1 or param2 contain. When it fails, the exception raised is similar to this: UnicodeDecodeError: 'ascii' codec can't decode byte 0xXX. But if I do this:

subprocess.call(["ffmpeg", "-i", param1, "-o", param2], ..., ...)

It always works.

I'm now using the second method because it always works, but I want to understand why the first method doesn't.

Things I tried:

  • Using decode() and decode("utf8") on param1 and param2.
  • Using cmd.encode("ascii") when passing cmd to os.system or subprocess.call.
  • Running unset LANG in the terminal before running the script (somebody solved a similar problem that way, so I tried it too).

More info:

This all started when I needed to query a MySQL database for the path to a certain directory. The table uses UTF-8. At first I did not specify the encoding when establishing the connection to the database, which resulted in paths not being found when they contained non-ASCII characters. For example, the word "Vídeos" was decoded as "V\XXXeos", where \XXX is the code of the "í" character. Once I started specifying the encoding when connecting to the database, the non-English words looked correct when printed, but the UnicodeDecodeError started to be raised when using those strings in calls to os.system or subprocess.call if the command line is built by concatenating strings.

Python 2.7, Mythbuntu, 32-bit

Interpreting a command string (into argc and argv, i.e. a dynamically sized array of strings, as expected by C programs) is the job of the shell, and different shells have different rules about how to interpret a particular string. subprocess.call does not invoke subprocesses in a new subshell by default; instead, it creates a new process directly, and therefore does not know how to interpret your command, so you must pass an array directly.
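As a rough illustration of the tokenizing step a shell performs, the sketch below uses shlex.split from the standard library, which only approximates POSIX shell rules (it is not what any particular shell actually runs). The file names are hypothetical and ffmpeg is never executed; the point is only to compare the argv recovered from the concatenated string with the array passed directly.

```python
import shlex

# Hypothetical file names, chosen only for illustration.
param1 = "my video.mp4"
param2 = "out.mkv"

# The concatenated command line from the question.
cmd = "ffmpeg -i '" + param1 + "' -o '" + param2 + "'"

# shlex.split approximates how a POSIX shell would split the string into argv.
argv_from_string = shlex.split(cmd)
argv_direct = ["ffmpeg", "-i", param1, "-o", param2]
print(argv_from_string == argv_direct)

# A single quote inside a parameter unbalances the hand-written quoting,
# so the string can no longer be tokenized as intended.
bad_param = "it's.mp4"
broken_cmd = "ffmpeg -i '" + bad_param + "' -o '" + param2 + "'"
tokenize_failed = False
try:
    shlex.split(broken_cmd)
except ValueError as exc:
    tokenize_failed = True
    print("cannot tokenize:", exc)
```

With the array form there is no tokenizing step at all, so each parameter reaches the program unchanged regardless of the spaces or quotes it contains.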

You can make subprocess.call execute in a subshell by passing shell=True:

subprocess.call(cmd, shell=True)
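If a command string is used with shell=True, the parameters still have to be quoted for the shell. A minimal sketch, assuming a POSIX shell: shlex.quote (available as pipes.quote on Python 2.7) escapes a single argument so the shell passes it through as one word. The file name is hypothetical, and echo stands in for the real program.

```python
import shlex
import subprocess

try:
    from shlex import quote   # Python 3.3+
except ImportError:
    from pipes import quote   # Python 2.7

# Hypothetical parameter containing a space and a single quote.
name = "it's a file.txt"

# Quote the parameter before concatenating it into the command string.
cmd = "echo " + quote(name)

# The shell now sees exactly two words: "echo" and the original name.
status = subprocess.call(cmd, shell=True)
```

Passing the array without shell=True remains the simpler option, since it needs no quoting at all.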

Regarding encodings, yes, it is the responsibility of the program being called to interpret the string encoding of the arguments. If you've set UTF-8 everywhere, it should just work:

>>> import subprocess
>>> subprocess.call("echo 字", shell=True)
字
0
>>> subprocess.call(u"echo 字", shell=True)
字
0
>>> subprocess.call(["echo", "字"])
字
0
>>> subprocess.call([u"echo", u"字"])
字
0
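The error message in the question is what Python produces when UTF-8 bytes are decoded with the ASCII codec. In Python 2 this decoding happens implicitly when a byte string is concatenated with a unicode string, which is one plausible source of the failure when the command line is built by concatenation (an assumption; the question does not show the types involved). The sketch below reproduces the same error with an explicit decode so that it also runs on Python 3.

```python
# UTF-8 bytes for the word "Vídeos", as a database driver might return them.
raw = u"V\xeddeos".encode("utf-8")

# Decoding with the ASCII codec fails on the first non-ASCII byte.
ascii_failed = False
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_failed = True
    print(exc)

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode("utf-8")
```

Keeping every piece of the command in the same type, all unicode or all UTF-8 encoded bytes, avoids the implicit ASCII step.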