Factor Language Blog

Improved process launcher

Monday, November 12, 2007

The io.launcher vocabulary now supports passing environment variables to the child process.

Launching processes is a complex and inherently OS-specific task. For example, until version 1.5, Java had a bunch of methods in the Runtime class for this, each one taking a different set of arguments:

Process exec(String command)
Process exec(String[] cmdarray)
Process exec(String[] cmdarray, String[] envp)
Process exec(String[] cmdarray, String[] envp, File dir)
Process exec(String command, String[] envp)
Process exec(String command, String[] envp, File dir)

The fundamental limitation here was that it would always launch processes with stdin/stdout rebound to a new pipe; so there was no easy way to launch the process to read/write the JVM’s stdin/stdout file descriptors, you had to spawn two threads to copy streams. In 1.5, they added a new ProcessBuilder class, but it doesn’t seem to offer any new functionality except for merging stdout with stderr. It still always opens pipes.

The approach I decided to take in Factor is similar to the ProcessBuilder approach, but more lightweight. The run-process word takes one of the following types:

  • A string – in which case it is passed to the system shell as a single command line
  • A sequence of strings – which becomes the command line itself
  • An association – a “process descriptor”, containing keys from the below set.

The latter is the most general. The allowed keys are all symbols in the io.launcher vocabulary:

  • +command+ – the command to spawn, a string, or f
  • +arguments+ – the command to spawn, a sequence of strings, or f. Only one of +command+ and +arguments+ can be specified.
  • +detached+ – t or f. If t, run-process won’t wait for completion.
  • +environment+ - an assoc of additional environment variables to pass along.
  • +environment-mode+ - one of prepend-environment, append-environment, or replace-environment. This value controls the function used to combine the current environment with the value of +environment+ before passing it to the child process.

While run-process spawns a process which inherits the current one’s stdin/stdout, <process-stream> spawns a process reading and writing on a pipe.

Idiomatic usage of run-process uses make-assoc to build the assoc:

[
    { "ls" "/etc" } +arguments+ set
    H{ { "PATH" "/bin:/sbin" } } +environment+ set
] H{ } make-assoc run-process

I’m happy with the new interface. With two words it achieves more than Java’s process launcher API, which consists of a number of methods together with a class.

Not only is the interface simple but so is the implementation.

The run-process first converts strings and arrays into full-fledged descriptors, then calls run-process* which is an OS-specific hook.

In the implementation of this hook, the fact that any assoc can be treated as a dynamically scoped namespace really into play. The io.launcher implementation has a pair of words:

: default-descriptor
    H{
        { +command+ f }
        { +arguments+ f }
        { +detached+ f }
        { +environment+ H{ } }
        { +environment-mode+ append-environment }
    } ;

: with-descriptor ( desc quot -- )
    default-descriptor [ >r clone r> bind ] bind ; inline

Passing a descriptor and a quotation to with-descriptor calls it in a scope where the various +foo+ keys can be read, assuming their default values if they’re not explicitly set in the descriptor.

So, if we look at the Unix implementation for example,

M: unix-io run-process* ( desc -- )
    [
        +detached+ get [
            spawn-detached
        ] [
            spawn-process wait-for-process
        ] if
    ] with-descriptor ;

It calls various words, all of which simply use get to read a dynamically scoped variable. These are resolved in the namespace set up by with-descriptor. For example, there is a word

: get-environment ( -- env )
    +environment+ get
    +environment-mode+ get {
        { prepend-environment [ os-envs union ] }
        { append-environment [ os-envs swap union ] }
        { replace-environment [ ] }
    } case ;

It retrieves the environment set by the user, together with the environment of the current process, and composes them according to the function in +environment-mode+. There is no need to pass anything around on the stack; any word can call get-environment from inside a with-descriptor.

Notice that constructing a namespace with make-assoc, and then passing it to a word which binds to this namespace and builds some kind of object, is similar to the “builder design pattern” in OOP. However, it is simpler. Instead of defining a new data type with methods, we essentially just construct a namespace then run code in this namespace.