Tool creation
CWL command line tools can be created easily using s4n. The simplest approach is to prefix the command with s4n create.
A command line tool consists of a baseCommand, usually an executable that accepts inputs and writes outputs. Inputs and outputs can have many value types, but Files are the most commonly used. The baseCommand can be a single term like echo or a list like [python, script.py]. All of this is handled by SciWIn Client.
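The two baseCommand shapes can be illustrated with a small sketch. It is not s4n's implementation: the split_command helper and the INTERPRETERS set are hypothetical names, and the rule "interpreter plus script forms a list" is a simplifying assumption.

```python
# Hypothetical helper, not part of s4n: it only illustrates the two
# baseCommand shapes (single term vs. list) described in the text.
INTERPRETERS = {"python", "python3", "Rscript"}  # assumed, illustrative list


def split_command(tokens):
    """Split a tokenized command into (baseCommand, remaining arguments)."""
    if not tokens:
        raise ValueError("empty command")
    if len(tokens) >= 2 and tokens[0] in INTERPRETERS:
        # Interpreter plus script: both tokens form the baseCommand list.
        return tokens[:2], tokens[2:]
    # Plain executable: the baseCommand is a single term.
    return tokens[0], tokens[1:]


print(split_command(["echo", "Hello World"]))
print(split_command(["python", "script.py", "--flag"]))
```

The first call yields a single-term baseCommand, the second a two-element list, mirroring echo and [python, script.py].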
!!! note
The following examples assume that they are being executed in a git repository with clean status. If there is no repository yet, use s4n init to create an environment.
Wrapping echo
A common example is to wrap the echo command because of its simplicity. To create the tool, echo "Hello World" is prefixed with s4n create.
    s4n create echo "Hello World"

    #!/usr/bin/env cwl-runner

    cwlVersion: v1.2
    class: CommandLineTool

    inputs:
    - id: hello_world
      type: string
      default: Hello World
      inputBinding:
        position: 0

    outputs: []
    baseCommand: echo

The baseCommand was correctly determined as echo, and an input slot was created whose name is the slug of the input value, hello_world. This could be renamed by editing the file, but for now we leave it as is. Currently the tool does not produce any outputs. A common use case is to create a file with the echo command, which can be done with the redirection operator >. To prevent the shell from redirecting the output of s4n itself, the operator needs to be escaped with a backslash: \>.
Assume the following YAML file needs to be created:

    message: "Hello World"

The usual command would be echo 'message: "Hello World"' > hello.yaml. To create the command line tool, the command becomes:
    s4n create --name echo2 echo 'message: "Hello World"' \> hello.yaml

    #!/usr/bin/env cwl-runner

    cwlVersion: v1.2
    class: CommandLineTool

    inputs:
    - id: message_hello_world
      type: string
      default: message: Hello World
      inputBinding:
        position: 0

    outputs:
    - id: hello
      type: File
      outputBinding:
        glob: hello.yaml
    stdout: hello.yaml

    baseCommand: echo

The stdout field tells the tool to redirect output to the file hello.yaml. Note the --name option: it specifies the file name of the tool to be created.
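The input ids in the two tools of this section (hello_world and message_hello_world) look like slugs of the input values. A minimal sketch of such a slug function is shown here. It is an assumption made for illustration: the slugify name is hypothetical and s4n's actual rules may differ.

```python
import re


def slugify(value: str) -> str:
    """Lowercase the value and collapse each run of non-alphanumerics into '_'."""
    # Assumed rule for illustration only, not taken from the s4n source.
    return re.sub(r"[^a-z0-9]+", "_", value.lower()).strip("_")


print(slugify("Hello World"))             # hello_world
print(slugify('message: "Hello World"'))  # message_hello_world
```

Under this rule, punctuation such as the colon and the quotes disappears and only lowercase words joined by underscores remain.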
Wrapping a python script
A common use case is to wrap a script written in an interpreted language like Python or R. Wrapping a Python script follows the same principles as the previous example, where the echo command was wrapped.
    s4n create --name echo_python python echo.py --message "SciWIn rocks!" --output-file out.txt

    import argparse

    parser = argparse.ArgumentParser(description='Echo your input')
    parser.add_argument('--message', help='Message to echo', required=True)
    parser.add_argument('--output-file', help='File to save the message', required=True)

    args = parser.parse_args()

    with open(args.output_file, 'w') as f:
        f.write(args.message)
        print(args.message)

    #!/usr/bin/env cwl-runner

    cwlVersion: v1.2
    class: CommandLineTool

    requirements:
    - class: InitialWorkDirRequirement
      listing:
      - entryname: echo.py
        entry:
          $include: '../../echo.py'
    - class: InlineJavascriptRequirement

    inputs:
    - id: message
      type: string
      default: SciWIn rocks!
      inputBinding:
        prefix: '--message'
    - id: outputfile
      type: string
      default: out.txt
      inputBinding:
        prefix: '--output-file'

    outputs:
    - id: out
      type: File
      outputBinding:
        glob: $(inputs.outputfile)

    baseCommand:
    - python
    - echo.py

As shown in echo_python.cwl, the outputBinding for the output file is set to $(inputs.outputfile) and therefore automatically takes the name given in the input. s4n also detects the use of python automatically and adds the script as an InitialWorkDirRequirement, which makes the script available to the execution engine.
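How a glob such as $(inputs.outputfile) follows the input value can be sketched in a few lines. This is a deliberately simplified model: the resolve_glob helper is hypothetical, it only handles plain $(inputs.<id>) references, and real CWL runners support a richer parameter reference and expression syntax.

```python
import re


def resolve_glob(glob: str, inputs: dict) -> str:
    """Replace each $(inputs.<id>) reference with the matching input value."""
    def lookup(match):
        # match.group(1) is the input id, e.g. "outputfile".
        return str(inputs[match.group(1)])

    return re.sub(r"\$\(inputs\.([A-Za-z_][A-Za-z0-9_]*)\)", lookup, glob)


# Defaults taken from echo_python.cwl.
inputs = {"message": "SciWIn rocks!", "outputfile": "out.txt"}
print(resolve_glob("$(inputs.outputfile)", inputs))  # out.txt

# Overriding the input changes the file name the output binding looks for.
inputs["outputfile"] = "result.txt"
print(resolve_glob("$(inputs.outputfile)", inputs))  # result.txt
```

Because the glob is resolved against the inputs at run time, the output slot keeps working when a different output file name is passed.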
Wrapping a long running script
Sometimes it is necessary to run a highly complicated script in a remote environment because it would take too long on a simple machine. But how can the CWL file be obtained then? In the example Python file, the script sleeps for 1 minute and then writes a file. One could use the s4n create command as shown above and just wait 60 seconds. But what if the calculation takes a week? This can happen, for example, in quantum chemical calculations like DFT.
The --no-run flag tells s4n not to run the script. However, no output is created in that case, so s4n cannot detect any output files.
    s4n create --no-run python sleep.py

    from time import sleep

    sleep(60)

    with open('sleep.txt', 'w') as f:
        f.write('I slept for 60 seconds')

    #!/usr/bin/env cwl-runner

    cwlVersion: v1.2
    class: CommandLineTool

    requirements:
    - class: InitialWorkDirRequirement
      listing:
      - entryname: sleep.py
        entry:
          $include: '../../sleep.py'

    inputs: []
    outputs: []
    baseCommand:
    - python
    - sleep.py

For these cases, outputs can be specified on the command line using the -o or --outputs argument, which tells the parser to add an output slot.
    s4n create --name sleep2 --no-run -o sleep.txt python sleep.py

    #!/usr/bin/env cwl-runner

    cwlVersion: v1.2
    class: CommandLineTool

    requirements:
    - class: InitialWorkDirRequirement
      listing:
      - entryname: sleep.py
        entry:
          $include: '../../sleep.py'

    inputs: []
    outputs:
    - id: sleep
      type: File
      outputBinding:
        glob: sleep.txt

    baseCommand:
    - python
    - sleep.py

This CWL file can then be executed remotely using any runner, e.g. cwltool, and will write the sleep.txt file after 60 seconds.
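Why skipping the run leaves the outputs empty can be illustrated with a before/after comparison of the working directory. This is only a sketch under the assumption that outputs are found by comparing the directory state around the run. The mechanism s4n really uses is not described here, and the snapshot and detect_outputs helpers are hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def snapshot(directory):
    """Return the set of file paths (relative to directory) that currently exist."""
    found = set()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), directory))
    return found


def detect_outputs(command, directory, run=True):
    """Report files that appeared after running command; run=False skips the run."""
    before = snapshot(directory)
    if run:
        subprocess.run(command, cwd=directory, check=True)
    return sorted(snapshot(directory) - before)


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for sleep.py without the 60 second wait.
    script = "open('sleep.txt', 'w').write('I slept')"
    cmd = [sys.executable, "-c", script]
    print(detect_outputs(cmd, workdir, run=False))  # []
    print(detect_outputs(cmd, workdir, run=True))   # ['sleep.txt']
```

Without a run there is nothing to compare, which is why the output has to be named explicitly with -o in that case.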
Implicit inputs - hardcoded files
As the example above shows for outputs, inputs can also be specified explicitly. This is needed, for example, if the script loads a hardcoded file as in the following example.
    s4n create -i file.txt -o out.txt python load.py

    with open('file.txt', 'r') as file:
        data = file.read()

    with open('out.txt', 'w') as out:
        out.write(data)

    #!/usr/bin/env cwl-runner

    cwlVersion: v1.2
    class: CommandLineTool

    requirements:
    - class: InitialWorkDirRequirement
      listing:
      - entryname: file.txt
        entry:
          $include: '../../file.txt'
      - entryname: load.py
        entry:
          $include: '../../load.py'

    inputs: []
    outputs:
    - id: out
      type: File
      outputBinding:
        glob: out.txt

    baseCommand:
    - python
    - load.py

Piping
Using the pipe operator | is a common use case on the command line. Let’s assume the first 5 lines of a file are needed, e.g. cat speakers.csv | head -n 5 > speakers_5.csv
    s4n create cat speakers.csv \| head -n 5 \> speakers_5.csv

    #!/usr/bin/env cwl-runner

    cwlVersion: v1.2
    class: CommandLineTool

    requirements:
    - class: ShellCommandRequirement

    inputs:
    - id: speakers_csv
      type: File
      default:
        class: File
        location: '../../speakers.csv'
      inputBinding:
        position: 0

    outputs:
    - id: speakers_5
      type: File
      outputBinding:
        glob: speakers_5.csv

    baseCommand: cat
    arguments:
    - position: 1
      valueFrom: '|'
      shellQuote: false
    - position: 1
      valueFrom: head
    - position: 2
      valueFrom: '-n'
    - position: 3
      valueFrom: '5'
    - position: 4
      valueFrom: '>'
    - position: 5
      valueFrom: speakers_5.csv

Pulling containers
For full reproducibility, it is recommended to use containers, e.g. Docker, as a requirement inside the CWL files. Adding an existing container image is easy: call s4n create with the -c or --container-image argument. As a test, a Python script using pandas is run with the pandas/pandas container.
    s4n create -c pandas/pandas:pip-all python calculation.py --population population.csv --speakers speakers_revised.csv

    import pandas as pd
    import argparse

    parser = argparse.ArgumentParser(prog="python calculation.py", description="Calculates the percentage of speakers for each language")
    parser.add_argument("-p", "--population", required=True, help="Path to the population.csv File")
    parser.add_argument("-s", "--speakers", required=True, help="Path to the speakers.csv File")

    args = parser.parse_args()

    df = pd.read_csv(args.population)
    sum = df["population"].sum()

    print(f"Total population: {sum}")

    df = pd.read_csv(args.speakers)
    df["percentage"] = df["speakers"] / sum * 100

    df.to_csv("results.csv")
    print(df.head(10))

    #!/usr/bin/env cwl-runner

    cwlVersion: v1.2
    class: CommandLineTool

    requirements:
    - class: InitialWorkDirRequirement
      listing:
      - entryname: calculation.py
        entry:
          $include: '../../calculation.py'
    - class: DockerRequirement
      dockerPull: pandas/pandas:pip-all

    inputs:
    - id: population
      type: File
      default:
        class: File
        location: '../../population.csv'
      inputBinding:
        prefix: '--population'
    - id: speakers
      type: File
      default:
        class: File
        location: '../../speakers_revised.csv'
      inputBinding:
        prefix: '--speakers'

    outputs:
    - id: results
      type: File
      outputBinding:
        glob: results.csv

    baseCommand:
    - python
    - calculation.py

When the tool is executed by a runner that supports containerization, e.g. cwltool, it uses the pandas/pandas:pip-all container to run the script in a reproducible environment.
Building custom containers
In complex research environments, a custom container may be needed. The same example as above will be executed in a container built from a Dockerfile.
This can be achieved by passing the path to a Dockerfile to the -c argument. A tag can be specified using -t.
    s4n create -c Dockerfile -t my-docker python calculation.py --population population.csv --speakers speakers_revised.csv

    #!/usr/bin/env cwl-runner

    cwlVersion: v1.2
    class: CommandLineTool

    requirements:
    - class: InitialWorkDirRequirement
      listing:
      - entryname: calculation.py
        entry:
          $include: '../../calculation.py'
    - class: DockerRequirement
      dockerFile:
        $include: '../../Dockerfile'
      dockerImageId: my-docker

    inputs:
    - id: population
      type: File
      default:
        class: File
        location: '../../population.csv'
      inputBinding:
        prefix: '--population'
    - id: speakers
      type: File
      default:
        class: File
        location: '../../speakers_revised.csv'
      inputBinding:
        prefix: '--speakers'

    outputs:
    - id: results
      type: File
      outputBinding:
        glob: results.csv

    baseCommand:
    - python
    - calculation.py

    FROM python
    RUN pip install pandas

A runner will check whether a container with the specified image exists and otherwise build it using the Dockerfile.
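The decision a runner makes for a DockerRequirement can be modelled roughly as follows. This is a simplified sketch and not the logic of cwltool or any other runner: the plan_container helper and the local_images set are hypothetical, and real runners apply additional checks, such as looking for an already pulled image.

```python
def plan_container(requirement, local_images):
    """Decide what to do for one DockerRequirement mapping (simplified model)."""
    if "dockerPull" in requirement:
        # Image from a registry, as in the pandas/pandas:pip-all example.
        return ("pull", requirement["dockerPull"])
    image_id = requirement.get("dockerImageId")
    if image_id in local_images:
        # An image with this id already exists locally: reuse it.
        return ("reuse", image_id)
    if "dockerFile" in requirement:
        # No such image yet: build it from the embedded Dockerfile.
        return ("build", image_id)
    raise ValueError("DockerRequirement needs dockerPull or dockerFile")


custom = {
    "dockerFile": "FROM python\nRUN pip install pandas",
    "dockerImageId": "my-docker",
}
print(plan_container({"dockerPull": "pandas/pandas:pip-all"}, set()))
print(plan_container(custom, set()))          # first run: build
print(plan_container(custom, {"my-docker"}))  # later runs: reuse
```

In this model the Dockerfile is only built on the first run. Later runs find the my-docker image and reuse it.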