Python Tutorial: subprocess module – 2020 – BogoToBogo

subprocess module


A running program is called a process. Each process has its own system state, which includes memory, lists of open files, a program counter that keeps track of the instruction being executed, and a call stack used to contain the local variables of the functions.

Typically, a process executes instructions one after another in a single control flow sequence, which is sometimes referred to as the main thread of the process. At any given time, the program is only doing one thing.

A program can create new processes using library functions such as those found in the os or subprocess modules, for example os.fork() or subprocess.Popen(). These processes, known as subprocesses, run as completely separate entities, each with its own private system state and main thread of execution.

Because a subprocess is independent, it runs concurrently with the original process. That is, the process that created the subprocess can go on to work on other things while the subprocess carries out its own work behind the scenes.
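The concurrency described above can be sketched as follows. This is a minimal illustration that assumes Python 3; the sleeping child and the `ticks` counter are made up for the example, and `sys.executable` is used only so that the sketch does not depend on a particular `python` being on the PATH.

```python
import subprocess
import sys
import time

# Start a child Python process that sleeps briefly and then exits.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.5)"])

# Popen() returns immediately, so the parent is free to do other work.
ticks = 0
while child.poll() is None:   # poll() returns None while the child is still running
    ticks += 1                # stand-in for useful work in the parent
    time.sleep(0.05)

exit_code = child.wait()      # the child has finished; collect its return code
print("parent did %d units of work; child exited with %d" % (ticks, exit_code))
```

The parent never blocks on the child until it chooses to call wait(), which is the behavior the paragraph above describes.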

The subprocess module allows us to:

  1. spawn new processes
  2. connect to their input/output/error pipes
  3. obtain their return codes

It offers a higher-level interface to some of the other modules available, and is intended to replace the following functions:

  1. os.system()
  2. os.spawn*()
  3. os.popen*()
  4. popen2.*()
  5. commands.*()

We cannot use UNIX commands in a Python script as if they were Python code. For example, echo name causes a syntax error because echo is not a built-in Python statement or function. So, in a Python script, we use print name instead.

To run UNIX commands we need to create a subprocess that runs the command. The recommended approach to invoking subprocesses is to use the convenience functions for all the use cases they can handle. Otherwise, the underlying Popen interface can be used directly.

The easiest way to run a UNIX command is to use os.system().

>>> import os
>>> os.system('echo $HOME')
/user/khong
0
>>> # or we can use
>>> os.system('echo %s' % '$HOME')
/user/khong
0

As expected, we got the value of $HOME on stdout (a terminal). In addition, we got a return value of 0, which is the result of running the command and means that there was no error in the execution.
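To make the meaning of the return value concrete, here is a small sketch. It assumes Python 3 on a Unix-like system, where os.system() returns the status in the format used by wait(), so the actual exit code has to be extracted with os.WEXITSTATUS(). The commands `true` and `exit 3` are chosen purely for illustration.

```python
import os

# A command that succeeds returns 0.
ok_status = os.system("true")

# A command that fails returns a non-zero value. On Unix this is the
# wait()-style status, so the real exit code is extracted with os.WEXITSTATUS().
fail_status = os.system("exit 3")
fail_code = os.WEXITSTATUS(fail_status)

print(ok_status, fail_status, fail_code)
```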

os.system('command with args') passes the command and arguments to the shell of our system. Using this, we can run multiple commands at once and set up pipes and input/output redirections:

os.system('command_1 < input_file | command_2 > output_file')

If we run the code above, os.system('echo $HOME'), in Python IDLE, we only see the 0, because the command's stdout goes to a terminal, which IDLE does not show. To see the output of the command we must redirect it to a file and read from it:

>>> import os
>>> os.system('echo $HOME > outfile')
0
>>> f = open('outfile','r')
>>> f.read()
'/user/khong\n'

os.popen() opens a pipe to or from a command. The return value is an open file object connected to the pipe, which can be read or written depending on whether the mode is 'r' (default) or 'w'. The bufsize argument has the same meaning as the corresponding argument to the built-in open() function. The exit status of the command (encoded in the format specified for wait()) is available as the return value of the close() method of the file object, except that when the exit status is zero (termination without errors), None is returned.

>>> import os
>>> stream = os.popen('echo $HOME')
>>> stream.read()
'/user/khong\n'

os.popen() does the same thing as os.system, except that it gives us a file-like stream object that we can use to access the standard input/output for that process. There are 3 other popen variants that handle I/O slightly differently.
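The close() behavior mentioned above can be checked with a short sketch. It assumes Python 3, where os.popen() is still available; the echoed text is arbitrary.

```python
import os

# os.popen() returns a file-like object connected to the command's stdout.
stream = os.popen("echo hello from the shell")
text = stream.read()

# close() returns None when the command exited with status 0.
close_result = stream.close()

print(repr(text), close_result)
```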

If we pass everything as a string, then our command is passed to the shell; if we pass it as a list, then we don't have to worry about escaping anything.

However, it has been deprecated since version 2.6: "This feature is deprecated. Use the subprocess module." (docs.python.org)

subprocess.call() is basically like the Popen class and takes all the same arguments, but it simply waits until the command completes and gives us the return code.

subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False)

Run the command described by args. Wait for the command to complete, and then return the returncode attribute.

>>> import os
>>> os.chdir('/')
>>> import subprocess
>>> subprocess.call(['ls','-l'])
total 181
drwxr-xr-x 2 root root 4096 Mar 3 2012 bin
drwxr-xr-x 4 root root 1024 Oct 26 2012 boot
...

Command-line arguments are passed as a list of strings, which avoids the need to escape quotation marks or other special characters that might be interpreted by the shell.

>>> import subprocess
>>> subprocess.call('echo $HOME')
Traceback (most recent call last):
...
OSError: [Errno 2] No such file or directory
>>>
>>> subprocess.call('echo $HOME', shell=True)
/user/khong
0

Setting the shell argument to a true value causes subprocess to spawn an intermediate shell process and tell it to run the command. In other words, using an intermediate shell means that variables, glob patterns, and other special shell features in the command string are processed before the command is run. Here, in the example, $HOME was processed before the echo command. This is the case for a command that needs shell expansion, whereas the command ls -l is considered a simple command.
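As a small sketch of why list arguments need no escaping (assuming Python 3; the inline child script and the `tricky` string are invented for this illustration), the example below passes an argument full of shell metacharacters straight through to a child process, with no shell involved:

```python
import subprocess
import sys

# An argument full of characters a shell would treat specially.
tricky = 'two words; $HOME * "quoted"'

# The child simply writes back its first command-line argument.
child_code = "import sys; sys.stdout.write(sys.argv[1])"

# Each list element reaches the child as exactly one argument, untouched.
echoed = subprocess.check_output(
    [sys.executable, "-c", child_code, tricky],
    universal_newlines=True,   # decode the output to str
)

print(echoed == tricky)
```

With shell=True, the same text would have been split on the space and the semicolon, and $HOME and * would have been expanded before the child ever saw them.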

Here is some sample code (PyGoogle/FFMpeg/iframe_extract.py). It downloads a YouTube video and then extracts its I-frames into a subfolder:

'''
iframe_extract.py - download video and extract i-frames via ffmpeg
Usage:
(ex) python iframe_extract.py -u https://www.youtube.com/watch?v=dP15zlyra3c
This code does two things:
1. Download using youtube-dl
   cmd = ['youtube-dl', '-f', videoSize, '-k', '-o', video_out, download_url]
2. Extract i-frames via ffmpeg
   cmd = [ffmpeg, '-i', inFile, '-f', 'image2', '-vf',
          "select='eq(pict_type,PICT_TYPE_I)'", '-vsync', 'vfr', imgFilenames]
'''

from __future__ import print_function, unicode_literals
import youtube_dl
import sys
import os
import subprocess
import argparse
import glob

# sys.platform is 'win32' on Windows, so test the prefix
if sys.platform.startswith("win"):
    FFMPEG_BIN = "ffmpeg.exe"
    MOVE = "move"
    MKDIR = "mkdir"
else:
    FFMPEG_BIN = "ffmpeg"
    MOVE = "mv"
    MKDIR = "mkdir"

def iframe_extract(inFile):
    # ffmpeg -i inFile -f image2 -vf \
    #   "select='eq(pict_type,PICT_TYPE_I)'" -vsync vfr oString%03d.png
    # inFile : video filename
    #          (ex) 'FoxSnowDive-Yellowstone-BBCTwo.mp4'
    imgPrefix = inFile.split('.')[0]
    # imgPrefix : image file prefix

    # start extracting i-frames
    home = os.path.expanduser("~")
    ffmpeg = home + '/bin/ffmpeg'
    imgFilenames = imgPrefix + '%03d.png'
    cmd = [ffmpeg, '-i', inFile, '-f', 'image2', '-vf',
           "select='eq(pict_type,PICT_TYPE_I)'", '-vsync', 'vfr', imgFilenames]

    # create iframes
    print("creating iframes ....")
    subprocess.call(cmd)

    # Move the extracted iframes to a subfolder
    # imgPrefix is used as the name of the subfolder that stores the iframe images
    cmd = 'mkdir -p ' + imgPrefix
    os.system(cmd)
    print("make subdirectory", cmd)
    mvcmd = 'mv ' + imgPrefix + '*.png ' + imgPrefix
    print("moving images to subdirectory", mvcmd)
    os.system(mvcmd)

def get_info_and_download(download_url):
    # Get meta info of the video and then download using youtube-dl
    ydl_opts = {}

    # get meta information of the video
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        meta = ydl.extract_info(download_url, download=False)

    # rename the file
    # remove special characters from the file name
    print('meta=%s' % meta['title'])
    out = ''.join(c for c in meta['title'] if c.isalnum() or c == '-' or c == '_')
    print('out=%s' % out)
    extension = meta['ext']
    video_out = out + '.' + extension
    print('video_out=%s' % video_out)
    videoSize = 'bestvideo[height<=540]+bestaudio/best[height<=540]'
    cmd = ['youtube-dl', '-f', videoSize, '-k', '-o', video_out, download_url]
    print('cmd=%s' % cmd)

    # download the video
    subprocess.call(cmd)

    # Sometimes the output file has a format code in its name, such as 'out.f248.webm'
    # So, in this case, we want to rename it 'out.webm'
    found = False
    extension_list = ['mkv', 'mp4', 'webm']
    for e in extension_list:
        glob_str = '*.' + e
        for f in glob.glob(glob_str):
            if out in f:
                if os.path.isfile(f):
                    video_out = f
                    found = True
                    break
        if found:
            break

    # call iframe-extraction : ffmpeg
    print('before iframe_extract() video_out=%s' % video_out)
    iframe_extract(video_out)
    return meta

def check_arg(args=None):
    # Command line options
    # Currently, only the url option is used.
    parser = argparse.ArgumentParser(description='download video')
    parser.add_argument('-u', '--url', help='download url', required=True)
    parser.add_argument('-i', '--infile', help='input to iframe extract')
    parser.add_argument('-o', '--outfile', help='output name for iframe image')
    results = parser.parse_args(args)
    return (results.url, results.infile, results.outfile)

# Usage sample:
# Syntax: python iframe_extract.py -u url
# (ex) python iframe_extract.py -u https://www.youtube.com/watch?v=dP15zlyra3c
if __name__ == '__main__':
    u, i, o = check_arg(sys.argv[1:])
    meta = get_info_and_download(u)

subprocess.check_call(args, *, stdin=None, stdout=None, stderr=None, shell=False)

The check_call() function works like call() except that the exit code is checked, and if it indicates that an error occurred, a CalledProcessError exception is raised.

>>> import subprocess
>>> subprocess.check_call(['false'])
Traceback (most recent call last):
...
subprocess.CalledProcessError: Command '['false']' returned non-zero exit status 1

subprocess.check_output(args, *, stdin=None, stderr=None, shell=False, universal_newlines=False)

The standard input and output channels for the process started by call() are bound to the parent's input and output. That means the calling program cannot capture the output of the command. To capture the output for further processing, we can use check_output().

>>> import subprocess
>>> output = subprocess.check_output(['ls','-l'])
>>> print output
total 181
drwxr-xr-x 2 root root 4096 Mar 3 2012 bin
drwxr-xr-x 4 root root 1024 Oct 26 2012 boot
...
>>> output = subprocess.check_output('echo $HOME', shell=True)
>>> print output
/user/khong

This feature was added in Python 2.7.
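The two convenience functions can be sketched side by side. This is an illustrative example assuming Python 3; the inline child scripts (`failing_cmd`, `noisy_code`) are invented for the demonstration and stand in for real commands.

```python
import subprocess
import sys

# A child that deliberately exits with status 3.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

# check_call() turns a non-zero exit code into an exception.
try:
    subprocess.check_call(failing_cmd)
    caught = None
except subprocess.CalledProcessError as err:
    caught = err.returncode

# A child that writes one line to stdout and one to stderr.
noisy_code = (
    "import sys;"
    "sys.stdout.write('to stdout\\n'); sys.stdout.flush();"
    "sys.stderr.write('to stderr\\n')"
)

# check_output() returns what the child wrote; stderr=subprocess.STDOUT
# folds the error stream into the same captured output.
captured = subprocess.check_output(
    [sys.executable, "-c", noisy_code],
    stderr=subprocess.STDOUT,
    universal_newlines=True,   # return str rather than bytes
)
lines = captured.splitlines()

print(caught, lines)
```

Note that in Python 3 check_output() returns bytes unless universal_newlines=True is given, which is why the flag is set here.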

Process creation and management in this module is handled by the Popen class. It offers a lot of flexibility so that developers can handle the less common cases not covered by the convenience functions.

subprocess.Popen() executes a child program in a new process. On Unix, the class uses os.execvp()-like behavior to execute the child program. On Windows, the class uses the Windows CreateProcess() function.

class subprocess.Popen(args, bufsize=0, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0)

  1. args: should be a sequence of program arguments or else a single string. By default, the program to execute is the first item in args if args is a sequence. If args is a string, the interpretation is platform-dependent. It is recommended to pass args as a sequence.
  2. shell: the shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence. On Unix with shell=True, the shell defaults to /bin/sh.
    1. If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash-escaping file names with spaces in them.
    2. If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself. That is to say, Popen does the equivalent of: Popen(['/bin/sh', '-c', args[0], args[1], ...])
  3. bufsize: if given, has the same meaning as the corresponding argument to the built-in open() function:
    1. 0 means unbuffered
    2. 1 means line buffered
    3. any other positive value means use a buffer of (approximately) that size
    4. A negative bufsize means to use the system default, which usually means fully buffered
    5. The default value for bufsize is 0 (unbuffered)
  4. executable: specifies a replacement program to execute. It is very seldom needed.
  5. stdin, stdout, and stderr: specify the executed program's standard input, standard output, and standard error file handles, respectively.
    1. Valid values are PIPE, an existing file descriptor (a positive integer), an existing file object, and None.
    2. PIPE indicates that a new pipe to the child should be created.
    3. With the default setting of None, no redirection will occur; the child's file handles will be inherited from the parent.
    4. Additionally, stderr can be STDOUT, which indicates that the stderr data from the child process should be captured into the same file handle as for stdout.
  6. preexec_fn: if set to a callable object, this object will be called in the child process just before the child is executed. (Unix only)
  7. close_fds: if true, all file descriptors except 0, 1, and 2 will be closed before the child process is executed. (Unix only). Or, on Windows, if close_fds is true then no handles will be inherited by the child process. Note that on Windows, we cannot set close_fds to true and also redirect the standard handles by setting stdin, stdout, or stderr.
  8. cwd: if not None, the child's current directory will be changed to cwd before it is executed. Note that this directory is not considered when searching for the executable, so we cannot specify the program's path relative to cwd.
  9. env: if not None, it must be a mapping that defines the environment variables for the new process; these are used instead of inheriting the current process's environment, which is the default behavior.
  10. universal_newlines: if True, the file objects stdout and stderr are opened as text files in universal newlines mode. Lines may be terminated by any of '\n', the Unix end-of-line convention, '\r', the old Macintosh convention, or '\r\n', the Windows convention. All of these external representations are seen as '\n' by the Python program.
  11. startupinfo: if given, will be a STARTUPINFO object, which is passed to the underlying CreateProcess function. (Windows only)
  12. creationflags: if given, can be CREATE_NEW_CONSOLE or CREATE_NEW_PROCESS_GROUP. (Windows only)
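A few of these parameters can be seen working together in the following sketch. It assumes Python 3; the GREETING variable and the inline child script are hypothetical names used only for illustration. The environment is built as a copy of the parent's environment with one extra entry, since replacing it entirely can leave the child without variables it needs.

```python
import os
import subprocess
import sys
import tempfile

# Start from a copy of the parent's environment and add one variable.
# GREETING is a made-up name used only for this illustration.
child_env = dict(os.environ, GREETING="hello")

# The child reports the variable and its working directory.
child_code = "import os; print(os.environ['GREETING']); print(os.getcwd())"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,        # capture the child's stdout
    cwd=tempfile.gettempdir(),     # the child's working directory
    env=child_env,                 # the child's environment
    universal_newlines=True,       # text mode with normalized newlines
)
out, _ = proc.communicate()
greeting, child_cwd = out.splitlines()

print(greeting, child_cwd, proc.returncode)
```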

This is intended as a replacement for os.popen, but it is more complicated. For example, we use

subprocess.Popen("echo Hello World", stdout=subprocess.PIPE, shell=True).stdout.read()

instead of

os.popen("echo Hello World").read()

But it is comprehensive and has all of the options in one unified class instead of several different os.popen functions.

>>> import subprocess
>>> proc = subprocess.Popen(['echo', '"Hello world!"'],
...                         stdout=subprocess.PIPE)
>>> stddata = proc.communicate()
>>> stddata
('"Hello world!"\n', None)

Note that the communicate() method returns a tuple (stdoutdata, stderrdata): ('"Hello world!"\n', None). If we do not include stdout=subprocess.PIPE or stderr=subprocess.PIPE in the Popen call, we just get None back for that stream.

Popen.communicate(input=None)

Popen.communicate() interacts with the process: it sends data to stdin, reads data from stdout and stderr until end-of-file is reached, and waits for the process to terminate. The optional input argument should be a string to be sent to the child process, or None if no data should be sent to the child.

So, actually, we could have done the following:

>>> import subprocess
>>> proc = subprocess.Popen(['echo', '"Hello world!"'],
...                         stdout=subprocess.PIPE)
>>> (stdoutdata, stderrdata) = proc.communicate()
>>> stdoutdata
'"Hello world!"\n'

or we can explicitly specify which one we want from proc.communicate():

>>> import subprocess
>>> proc = subprocess.Popen(['echo', '"Hello world!"'],
...                         stdout=subprocess.PIPE)
>>> stdoutdata = proc.communicate()[0]
>>> stdoutdata
'"Hello world!"\n'

The simplest code for the previous example might be to send the stream directly to the console:

>>> import subprocess
>>> proc = subprocess.Popen(['echo', '"Hello world!"'],
...                         stdout=subprocess.PIPE)
>>> proc.communicate()[0]
'"Hello world!"\n'

The following code tests the stdout and stderr behavior:

# std_test.py
import sys
sys.stdout.write('Test message to stdout\n')
sys.stderr.write('Test message to stderr\n')

If we execute it:

>>> proc = subprocess.Popen(['python', 'std_test.py'],
...                         stdout=subprocess.PIPE)
>>> Test message to stderr
>>> proc.communicate()
('Test message to stdout\n', None)
>>>

Note that the message to stderr is displayed as it is generated, but the message to stdout is read through the pipe. This is because we only set up a pipe for stdout.

So, let's have both stdout and stderr accessed from Python:

>>> proc = subprocess.Popen(['python', 'std_test.py'],
...                         stdout=subprocess.PIPE,
...                         stderr=subprocess.PIPE)
>>> proc.communicate()
('Test message to stdout\n', 'Test message to stderr\n')

The communicate() method reads data from stdout and stderr until end-of-file is reached. So, after all the messages have been read, if we call communicate() again we get an error:

>>> proc.communicate()
Traceback (most recent call last):
...
ValueError: I/O operation on closed file

If we want messages to stderr to be piped to stdout, we use stderr=subprocess.STDOUT.

>>> proc = subprocess.Popen(['python', 'std_test.py'],
...                         stdout=subprocess.PIPE,
...                         stderr=subprocess.STDOUT)
>>> proc.communicate()
('Test message to stdout\r\nTest message to stderr\r\n', None)

As we see from the output, we do not have stderr data because it has been redirected to stdout.

Writing to a process can be done in a very similar way. If we want to send data to the stdin of the process, we need to create the Popen object with stdin=subprocess.PIPE.

To test it, we will write another program (write_to_stdin.py) that simply prints Received: followed by the message we sent it:

# write_to_stdin.py
import sys
input = sys.stdin.read()
sys.stdout.write('Received: %s'%input)

To send a message to stdin, we pass the string we want to send as the input argument to communicate():

>>> proc = subprocess.Popen(['python', 'write_to_stdin.py'], stdin=subprocess.PIPE)
>>> proc.communicate('Hello?')
Received: Hello?
(None, None)

Note that the message produced by the write_to_stdin.py process was printed to stdout, and then the return value (None, None) was printed. That is because no pipes were set up for stdout or stderr.
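For readers on Python 3, a comparable sketch is shown below. It is not the tutorial's original code: it inlines an equivalent of write_to_stdin.py as a `-c` script and sets universal_newlines=True, because in Python 3 the pipes carry bytes by default and communicate() would otherwise require a bytes argument. Here stdout is piped as well, so the reply comes back in the returned tuple instead of being printed.

```python
import subprocess
import sys

# Inline equivalent of write_to_stdin.py: echo stdin back with a prefix.
child_code = "import sys; sys.stdout.write('Received: %s' % sys.stdin.read())"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,   # lets us pass and receive str instead of bytes
)

# communicate() writes the input, closes stdin, reads the pipe, and waits.
stdoutdata, stderrdata = proc.communicate("Hello?")

print(repr(stdoutdata), stderrdata, proc.returncode)
```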

Here is the output after we specify stdout=subprocess.PIPE and stderr=subprocess.PIPE, as before, to set up the pipes.

>>> proc = subprocess.Popen(['python', 'write_to_stdin.py'],
...                         stdin=subprocess.PIPE,
...                         stdout=subprocess.PIPE,
...                         stderr=subprocess.PIPE)
>>> proc.communicate('Hello?')
('Received: Hello?', '')

Popen objects can also be chained to build a shell-style pipeline, here the equivalent of df -h | grep sda1:

>>> p1 = subprocess.Popen(['df','-h'], stdout=subprocess.PIPE)
>>> p2 = subprocess.Popen(['grep', 'sda1'], stdin=p1.stdout, stdout=subprocess.PIPE)
>>> p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits
>>> output = p2.communicate()[0]
>>> output
'/dev/sda1 19G 9.2G 8.3G 53% /\n'

The p1.stdout.close() call after starting p2 is important for p1 to receive a SIGPIPE if p2 exits before p1.
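The same two-stage pattern can be tried without df or grep. The sketch below assumes Python 3 and replaces both stages with small inline Python scripts (`producer` and `consumer` are invented names), so it should run wherever a Python interpreter is available.

```python
import subprocess
import sys

# Stage 1 prints three lines; stage 2 keeps only the lines containing "b".
producer = [sys.executable, "-c", "print('alpha'); print('beta'); print('gamma')"]
consumer = [
    sys.executable, "-c",
    "import sys; sys.stdout.writelines(l for l in sys.stdin if 'b' in l)",
]

p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE,
                      universal_newlines=True)
p1.stdout.close()   # drop the parent's copy so p1 can get SIGPIPE if p2 exits early
output = p2.communicate()[0]
p1.wait()           # reap the first process as well

print(repr(output))
```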

Here’s another example for a pipelined command. The code gets the window ID for the currently active window. The command looks like this:

xprop -root | awk '/_NET_ACTIVE_WINDOW\(WINDOW\)/{print $NF}'

Python code:

# This code runs the following pipeline to get the window handle of the currently active X11 window
# xprop -root | awk '/_NET_ACTIVE_WINDOW\(WINDOW\)/{print $NF}'

import subprocess

def py_xwininfo():
    winId = getCurrentWinId()
    print('winId = %s' % winId)

def getCurrentWinId():
    cmd_1 = ['xprop', '-root']
    cmd_2 = ['awk', r'/_NET_ACTIVE_WINDOW\(WINDOW\)/{print $NF}']
    # pipe the output of xprop into awk
    p1 = subprocess.Popen(cmd_1, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(cmd_2, stdin=p1.stdout, stdout=subprocess.PIPE)
    id = p2.communicate()[0]
    return id

if __name__ == '__main__':
    py_xwininfo()

Output:

winId = 0x3c02035

Avoid shell=True by all means.

shell=True means executing the code through the shell. In other words, executing programs through the shell means that all user input passed to the program is interpreted according to the syntax and semantic rules of the invoked shell. At best, this only causes inconvenience to the user, because the user has to obey these rules. For example, paths that contain special shell characters, such as quotation marks or blanks, must be escaped. At worst, it causes security leaks, because the user can execute arbitrary programs.

shell=True is sometimes convenient for making use of shell-specific features such as word splitting or parameter expansion. However, if such a feature is required, make use of other modules instead (for example, os.path.expandvars() for parameter expansion or shlex for word splitting). This means more work, but avoids other problems. - from "Actual meaning of 'shell=True' in subprocess".
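A minimal sketch of that advice, assuming Python 3: the command line, the DEMO_DIR variable, and the `show` program name are all hypothetical, and a small inline Python script stands in for the program being launched. Word splitting is done with shlex.split() and parameter expansion with os.path.expandvars(), so no shell is ever involved.

```python
import os
import shlex
import subprocess
import sys

# What a user might type at a shell prompt (DEMO_DIR is a made-up variable).
os.environ["DEMO_DIR"] = "/tmp/demo dir"
command_line = 'show "$DEMO_DIR/report.txt" --verbose'

# Word splitting without a shell: shlex honours the quotes.
words = shlex.split(command_line)

# Parameter expansion without a shell: expand variables in each word.
args = [os.path.expandvars(w) for w in words]

# Hand the finished argument list to a child (shell=False is the default).
child_code = "import sys; print('|'.join(sys.argv[1:]))"
shown = subprocess.check_output(
    [sys.executable, "-c", child_code] + args[1:], universal_newlines=True
)

print(args)
print(shown)
```

Expanding after splitting keeps a value that contains a space, such as the directory name here, together as a single argument.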

If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash-escaping file names with spaces in them:

>>> proc = subprocess.Popen('echo $HOME', shell=True)
>>> /user/khong

The string is in exactly the format that would be typed at the shell prompt:

$ echo $HOME

Therefore, the following would not work:

>>> proc = subprocess.Popen('echo $HOME', shell=False)
Traceback (most recent call last):
...
OSError: [Errno 2] No such file or directory

The following would also not work:

>>> subprocess.Popen('echo "Hello world!"', shell=False)
Traceback (most recent call last):
...
OSError: [Errno 2] No such file or directory

That is because we are still passing it as a string, so Python assumes the entire string is the name of the program to run. There is no program called echo "Hello world!", so it fails. Instead, we have to pass each argument separately.
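The difference between the two forms can be sketched as follows, assuming Python 3 on a POSIX system (on Windows a plain string is handed to CreateProcess and may behave differently). The interpreter's own `--version` flag is used here only as a convenient, harmless command.

```python
import os
import subprocess
import sys

# As a single string, the whole text is taken to be the program's name.
try:
    subprocess.call(sys.executable + " --version")
    string_form_failed = False
except OSError:                 # FileNotFoundError in Python 3
    string_form_failed = True

# As a list, the program and its argument are passed separately.
list_form_rc = subprocess.call([sys.executable, "--version"])

print(os.name, string_form_failed, list_form_rc)
```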

psutil.Popen(*args, **kwargs) is a more convenient interface to the stdlib subprocess.Popen().

"Starts a subprocess and deals with it exactly as when using the subprocess.Popen class, but also provides all the properties and methods of the psutil.Process class in a single interface". - see http://code.google.com/p/psutil/wiki/Documentation

The following code runs python -c "print 'hi, psutil'" in a subprocess:

>>> import psutil
>>> import subprocess
>>> proc = psutil.Popen(["/usr/bin/python", "-c", "print 'hi, psutil'"], stdout=subprocess.PIPE)
>>> proc
<psutil.Popen(pid=4304, name='python') at 140431306151888>
>>> proc.uids
user(real=1000, effective=1000, saved=1000)
>>> proc.username
'khong'
>>> proc.communicate()
('hi, psutil\n', None)

  1. docs.python.org
  2. subprocess – Work with additional processes
  3. subprocess – working with Python subprocess – Shells, Processes, Streams, Pipes, Redirects and More