
Advanced Example

An advanced workflow using SciWIn Client (s4n) can be created with the commands below. This guide assumes a Unix-based operating system; Windows should work as well. If it does not, please open an issue.

For this advanced example, the demo repository needs to be cloned:

git clone https://github.com/fairagro/m4.4_sciwin_client_demo

Installation

GitHub Release

The latest version of s4n can be installed using the following command:

curl --proto '=https' --tlsv1.2 -LsSf https://fairagro.github.io/m4.4_sciwin_client/get_s4n.sh | sh 

Specific versions can be installed with the following command by replacing the version tag (here v0.6.0) with the version of choice:

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/fairagro/m4.4_sciwin_client/releases/download/v0.6.0/s4n-installer.sh | sh

The installation can be verified using s4n -V. SciWIn Client comes with a number of commands, as its help output below shows. This demo showcases the init, tool, workflow and execute commands.

 _____        _  _    _  _____         _____  _  _               _   
/  ___|      (_)| |  | ||_   _|       /  __ \| |(_)             | |   
\ `--.   ___  _ | |  | |  | |  _ __   | /  \/| | _   ___  _ __  | |_  
 `--. \ / __|| || |/\| |  | | | '_ \  | |    | || | / _ \| '_ \ | __|
/\__/ /| (__ | |\  /\  / _| |_| | | | | \__/\| || ||  __/| | | || |_  
\____/  \___||_| \/  \/  \___/|_| |_|  \____/|_||_| \___||_| |_| \__|

Client tool for Scientific Workflow Infrastructure (SciWIn)
Documentation: https://fairagro.github.io/m4.4_sciwin_client/

Version: 0.6.0

Usage: s4n <COMMAND>

Commands:
  init         Initializes project folder structure and repository
  tool         Provides commands to create and work with CWL CommandLineTools
  workflow     Provides commands to create and work with CWL Workflows
  annotate     Annotate CWL files
  execute      Execution of CWL Files locally or on remote servers [aliases: ex]
  sync         
  completions  Generate shell completions
  help         Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help
  -V, --version  Print version

Demo Repository

The demo repository mainly contains two folders, data and code. The resulting workflow downloads election data, plots the election results as a bar plot, converts the district shapefile into a GeoJSON file, and maps the election data onto the GeoJSON geometries, resulting in a choropleth map. See the images for the final outputs.

[Figures: final outputs, a bar plot of the election results and a choropleth map]

Creating the CommandLineTools

First of all, we start by creating a new s4n project:

s4n init

CWL mainly describes processes as CommandLineTools, which can later be connected into Workflows. CommandLineTools are essentially wrappers for commands that would usually be executed on the command line. CWL uses a special YAML structure to describe those processes, as in the following tool, which wraps the ogr2ogr command used later in this guide:

#!/usr/bin/env cwl-runner

cwlVersion: v1.2
class: CommandLineTool

requirements:
- class: DockerRequirement
  dockerPull: osgeo/gdal:ubuntu-full-3.6.3
- class: InlineJavascriptRequirement

inputs:
- id: districts_geojson
  type: string
  default: districts.geojson
  inputBinding:
    position: 0
- id: data_braunschweig
  type: Directory
  default:
    class: Directory
    location: ../../data/braunschweig
  inputBinding:
    position: 1
- id: lco
  type: string
  default: RFC7946=YES
  inputBinding:
    prefix: -lco

outputs:
- id: districts
  type: File
  outputBinding:
    glob: $(inputs.districts_geojson)

baseCommand: ogr2ogr
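The following Python sketch shows how a runner turns the inputs above into a command line. It is simplified: it assumes flat inputs (no arrays or records) and sorts bindings by position, defaulting to 0, with ties broken by input name, as described in the CWL v1.2 specification. The `Binding` class and `build_command` function are hypothetical helpers. The values are the defaults transcribed by hand from the tool above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Binding:
    """One CWL input with an inputBinding, flattened to a single string value."""
    input_id: str
    value: str
    position: int = 0  # CWL uses 0 when no position is given
    prefix: Optional[str] = None


def build_command(base_command: str, bindings: list[Binding]) -> list[str]:
    """Sort bindings by (position, input id) and emit prefix/value tokens."""
    argv = [base_command]
    for binding in sorted(bindings, key=lambda b: (b.position, b.input_id)):
        if binding.prefix is not None:
            argv.append(binding.prefix)  # separate: true (the default) gives the prefix its own token
        argv.append(binding.value)
    return argv


# Default values of the ogr2ogr tool above; a real runner would pass the
# staged path of the Directory instead of its location string.
OGR2OGR_INPUTS = [
    Binding("districts_geojson", "districts.geojson", position=0),
    Binding("data_braunschweig", "../../data/braunschweig", position=1),
    Binding("lco", "RFC7946=YES", prefix="-lco"),
]

if __name__ == "__main__":
    print(" ".join(build_command("ogr2ogr", OGR2OGR_INPUTS)))
```

Under this simplified rule, the output is ogr2ogr districts.geojson -lco RFC7946=YES ../../data/braunschweig. The lco input has no explicit position, so it sorts at position 0 together with districts_geojson and comes after it by name. The directory at position 1 comes last.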

However, writing those files by hand can be tedious. That is where s4n comes to the rescue: a command that would normally be run on the command line just needs to be prefixed with s4n tool create. Examples can be found in the documentation.

To create tools based on the Python scripts in the code directory, a virtual environment needs to be created first:

python3 -m venv .venv
source .venv/bin/activate
pip install plotly pandas kaleido matplotlib
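To confirm that the packages ended up in the virtual environment, a small script can be run with the environment's python. This minimal sketch uses importlib.util.find_spec and only checks that the modules can be located, not their versions. The import names, including "kaleido", are assumed to match the package names.

```python
import importlib.util
import sys

# Import names of the packages installed above.
REQUIRED = ["plotly", "pandas", "kaleido", "matplotlib"]


def missing_packages(names: list[str]) -> list[str]:
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
        sys.exit(1)
    # With the venv activated, sys.prefix points into .venv
    print("All packages available in", sys.prefix)
```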

The next step is to download the election data using a series of API calls, for which a script luckily already exists. The script downloads the data from votemanager.kdo.de and writes the CSV to stdout. A tool can easily be created by prefixing the python call. However, the > also needs to be escaped with a backslash. Otherwise, the shell itself would redirect the output of s4n into data.csv instead of passing > data.csv on to s4n as part of the command:

s4n tool create python code/download_election_data.py --ags 03101000 --election "Bundestagswahl 2025" \> data.csv
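What the backslash changes can be seen with a plain POSIX shell. In this sketch, printf stands in for s4n and prints each argument it receives on its own line. It assumes an sh executable on the PATH. With the backslash, > and data.csv arrive as ordinary arguments. Without it, the shell performs the redirection and the program never sees them.

```python
import subprocess
import tempfile
from pathlib import Path


def run_in_shell(cmdline: str, workdir: Path) -> list[str]:
    """Run cmdline with sh in workdir and return the lines it printed to stdout."""
    result = subprocess.run(["sh", "-c", cmdline], cwd=workdir,
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # Escaped: '>' and 'data.csv' are passed to printf as arguments.
        print("escaped:", run_in_shell(r"printf '%s\n' python script.py \> data.csv", workdir))
        # Unescaped: the shell redirects printf's output into data.csv.
        print("unescaped:", run_in_shell(r"printf '%s\n' python script.py > data.csv", workdir))
        print("data.csv created:", (workdir / "data.csv").exists())
```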

The written CSV file lacks header information about which party's results correspond to which column. Therefore, we use the get_feature_info script and create a tool from it as follows:

s4n tool create python code/get_feature_info.py --data data.csv

With this information, the election results can be plotted. The plot_election script does the job; it accepts the JSON file from get_feature_info and the aforementioned CSV.

s4n tool create -c Dockerfile --container-tag pyplot --enable-network python code/plot_election.py --data data.csv --features features.json

Combining the Tools into a workflow

The three CommandLineTools will now be combined into an automated pipeline. A bare-bones workflow can be generated using the create command:

s4n workflow create demo

The workflow being built corresponds to the graph shown in the following image:

[Figure: the resulting workflow]

First of all, connections from the download script to get_feature_info and to plot_election are created:

s4n workflow connect demo --from download_election_data/data --to get_feature_info/data
s4n workflow connect demo --from download_election_data/data --to plot_election/data

To get the correct values for --from and --to, the command s4n tool ls -a can be used.

The plot tool also needs the feature information, so the next step is to connect both tools:

s4n workflow connect demo --from get_feature_info/features --to plot_election/features
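The connections made so far can be read as a dependency graph: a step can only run once every step it receives data from has finished. The following sketch derives that order from the --from/--to pairs using Python's graphlib (Python 3.9+). It is an illustration, not how s4n stores or executes workflows. It skips @inputs and @outputs references, because those mark workflow boundaries rather than steps.

```python
from graphlib import TopologicalSorter


def step_name(ref: str) -> str:
    """Return the step part of a 'step/port' reference."""
    return ref.partition("/")[0]


def execution_order(connections: list[tuple[str, str]]) -> list[str]:
    """Order steps so that each one comes after the steps it receives data from."""
    graph = TopologicalSorter()
    for source, target in connections:
        src, dst = step_name(source), step_name(target)
        if src.startswith("@") or dst.startswith("@"):
            continue  # workflow inputs/outputs are not steps
        graph.add(dst, src)  # dst depends on src
    return list(graph.static_order())


# The three connections created above.
CONNECTIONS = [
    ("download_election_data/data", "get_feature_info/data"),
    ("download_election_data/data", "plot_election/data"),
    ("get_feature_info/features", "plot_election/features"),
]

if __name__ == "__main__":
    print(" -> ".join(execution_order(CONNECTIONS)))
```

For these connections only one order is possible: download_election_data, then get_feature_info, then plot_election.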

To be usable, the workflow needs inputs and outputs. The tools in this demo have many inputs, but some of them have default values, so only the necessary connections have to be made. To create a workflow input, the --from value needs to start with @inputs.

s4n workflow connect demo --from @inputs/election --to download_election_data/election
s4n workflow connect demo --from @inputs/ags --to download_election_data/ags

Adding outputs follows the same logic, except that @outputs is used in --to:

s4n workflow connect demo --from plot_election/election --to @outputs/bar
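In CWL terms, each connection ends up in one of two places. A step's input gets a source entry, or a workflow output gets an outputSource entry. Workflow inputs are referenced by their bare name. The sketch below translates the connections made so far into a CWL-like dictionary. It is a hypothetical illustration: it omits types and the run entries of the steps, and the demo.cwl file that s4n writes will differ in detail.

```python
import json


def split_ref(ref: str) -> tuple[str, str]:
    """Split 'step/port' (or '@inputs/name', '@outputs/name') into its two parts."""
    step, port = ref.split("/", 1)
    return step, port


def workflow_skeleton(connections: list[tuple[str, str]]) -> dict:
    """Translate (from, to) pairs into a CWL-like Workflow dict (types and run omitted)."""
    wf: dict = {"class": "Workflow", "inputs": {}, "outputs": {}, "steps": {}}
    for source, target in connections:
        src_step, src_port = split_ref(source)
        dst_step, dst_port = split_ref(target)
        if src_step == "@inputs":
            wf["inputs"].setdefault(src_port, {})
            source_ref = src_port  # workflow inputs are referenced by name
        else:
            producer = wf["steps"].setdefault(src_step, {"in": {}, "out": []})
            if src_port not in producer["out"]:
                producer["out"].append(src_port)
            source_ref = f"{src_step}/{src_port}"
        if dst_step == "@outputs":
            wf["outputs"][dst_port] = {"outputSource": source_ref}
        else:
            consumer = wf["steps"].setdefault(dst_step, {"in": {}, "out": []})
            consumer["in"][dst_port] = source_ref
    return wf


# All connections created so far, in the order of this guide.
DEMO_CONNECTIONS = [
    ("download_election_data/data", "get_feature_info/data"),
    ("download_election_data/data", "plot_election/data"),
    ("get_feature_info/features", "plot_election/features"),
    ("@inputs/election", "download_election_data/election"),
    ("@inputs/ags", "download_election_data/ags"),
    ("plot_election/election", "@outputs/bar"),
]

if __name__ == "__main__":
    print(json.dumps(workflow_skeleton(DEMO_CONNECTIONS), indent=2))
```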

Saving the workflow is necessary to keep the git history clean before creating further CommandLineTools.

s4n workflow save demo

During creation, s4n workflow status demo can be used at any time to view the connection status.

Adding additional steps

The next tool uses GDAL to convert the shapefile in data/braunschweig to a GeoJSON file. The command one would typically use, and its s4n counterpart, are:

ogr2ogr districts.geojson data/braunschweig -lco RFC7946=YES
# s4n command
s4n tool create ogr2ogr districts.geojson data/braunschweig -lco RFC7946=YES

However, GDAL might not be installed on our machine, so we tell s4n not to run the command using --no-run. Since nothing is executed, s4n needs to be told which file the command will write, using -o, and a Docker image for later execution is specified using -c. The --name option names the tool shp2geojson.

s4n tool create --name shp2geojson --no-run -o districts.geojson -c osgeo/gdal:ubuntu-full-3.6.3 ogr2ogr districts.geojson data/braunschweig -lco RFC7946=YES

The correct creation of the tool can be tested using:

s4n execute local workflows/shp2geojson/shp2geojson.cwl
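After the test run, a quick sanity check of districts.geojson can be useful. The -lco RFC7946=YES option requests RFC 7946 GeoJSON, which uses WGS 84 longitude/latitude coordinates. The stdlib-only sketch below checks that the file is a FeatureCollection, that every feature has a geometry, and that all positions fall within longitude/latitude range. It assumes the test run left districts.geojson in the current directory; the check_geojson helper is hypothetical.

```python
import json
from pathlib import Path


def iter_positions(coords):
    """Yield every [lon, lat, ...] position from nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for item in coords:
            yield from iter_positions(item)


def check_geojson(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if doc.get("type") != "FeatureCollection":
        problems.append("top-level type is not FeatureCollection")
    for i, feature in enumerate(doc.get("features", [])):
        geometry = feature.get("geometry")
        if not geometry:
            problems.append(f"feature {i} has no geometry")
            continue
        for lon, lat, *_ in iter_positions(geometry.get("coordinates", [])):
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(f"feature {i} has out-of-range position {lon}, {lat}")
                break  # one report per feature is enough
    return problems


if __name__ == "__main__":
    path = Path("districts.geojson")  # assumed output location
    issues = check_geojson(json.loads(path.read_text(encoding="utf-8")))
    print("OK" if not issues else "\n".join(issues))
```

Projected coordinates, such as values in the hundreds of thousands, would suggest that the file was not written with RFC7946=YES.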

The output file now needs to be committed before moving on:

git add . && git commit -m "Execution of shp2geojson"

In the last step, the map plotting tool needs to be created. It uses plotly to create a choropleth map based on the outputs of the preceding steps. The packages installed in the virtual environment are needed here, and a Dockerfile for the container is already included in the repository.

s4n tool create -c Dockerfile --container-tag pyplot --enable-network python code/plot_map.py --geojson districts.geojson --csv data.csv --feature F3 --on gebiet-nr:BEZNUM --output_name plot
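The --on gebiet-nr:BEZNUM argument presumably pairs the CSV column gebiet-nr with the GeoJSON feature property BEZNUM, and --feature F3 selects the column used for coloring. This section does not show how plot_map.py performs the join. The stdlib-only sketch below illustrates such a join under that assumption. The join_values helper is hypothetical, and the sketch assumes a comma-separated CSV with numeric values in the feature column.

```python
import csv
import io


def join_values(csv_text: str, geojson: dict, on: str, feature: str) -> dict[str, float]:
    """Map each GeoJSON key to the CSV value of `feature`.

    `on` has the form "csv_column:geojson_property", e.g. "gebiet-nr:BEZNUM".
    """
    csv_key, geo_key = on.split(":", 1)
    values = {row[csv_key].strip(): float(row[feature])
              for row in csv.DictReader(io.StringIO(csv_text))}
    joined = {}
    for feat in geojson["features"]:
        key = str(feat["properties"][geo_key]).strip()  # properties may hold ints
        if key in values:  # districts without data stay uncolored
            joined[key] = values[key]
    return joined


if __name__ == "__main__":
    sample_csv = "gebiet-nr,F3\n1,12.5\n2,30.0\n"
    sample_geo = {"type": "FeatureCollection",
                  "features": [{"properties": {"BEZNUM": 1}}, {"properties": {"BEZNUM": 3}}]}
    print(join_values(sample_csv, sample_geo, "gebiet-nr:BEZNUM", "F3"))
```

Comparing keys as strings keeps the sketch simple. If one side uses zero-padded numbers, such as "01" versus 1, both sides would need the same normalization. In plotly, the same pairing is expressed through the locations and featureidkey (for example "properties.BEZNUM") parameters of plotly.express.choropleth.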

Adding the new tools to Workflow

The two new tools will now be added to the workflow.

[Figure: the final resulting workflow]

The map plotting tool needs the GeoJSON file, so a connection from the shp2geojson output to the corresponding plot_map input is created:

s4n workflow connect demo --from shp2geojson/districts --to plot_map/geojson

As the plot step also needs the election data, another connection can be created.

s4n workflow connect demo --from download_election_data/data --to plot_map/csv

Now the remaining inputs need to be wired up. The workflow inputs ags and election already exist; the input connections for feature and shapes are created as follows:

s4n workflow connect demo --from @inputs/feature --to plot_map/feature
s4n workflow connect demo --from @inputs/shapes --to shp2geojson/data_braunschweig
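The reasoning behind connecting only some inputs can be sketched in a few lines. An input takes its value from a connection if one exists, otherwise from its default value; only inputs with neither are a problem. The sketch uses the inputs and defaults of the ogr2ogr tool shown at the beginning, which matches the shp2geojson tool. It is a hypothetical illustration, not how s4n workflow status is implemented.

```python
def connection_status(step_inputs: dict[str, dict[str, bool]],
                      connections: list[tuple[str, str]]) -> dict[str, str]:
    """Describe where each step input gets its value from.

    step_inputs maps step -> {input name: has a default value}.
    """
    sources = {target: source for source, target in connections}
    status = {}
    for step, inputs in step_inputs.items():
        for name, has_default in inputs.items():
            port = f"{step}/{name}"
            if port in sources:
                status[port] = f"from {sources[port]}"
            elif has_default:
                status[port] = "default"
            else:
                status[port] = "missing"
    return status


# All three inputs of the ogr2ogr/shp2geojson tool have defaults in its CWL file.
STEP_INPUTS = {"shp2geojson": {"districts_geojson": True, "data_braunschweig": True, "lco": True}}
CONNECTIONS = [("@inputs/shapes", "shp2geojson/data_braunschweig")]

if __name__ == "__main__":
    for port, state in connection_status(STEP_INPUTS, CONNECTIONS).items():
        print(f"{port:35} {state}")
```

Only data_braunschweig is exposed as a workflow input here. The output file name and the layer creation option keep their defaults.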

The last step is to add the output to the workflow. Only the PNG file is desired as an output:

s4n workflow connect demo --from plot_map/plot --to @outputs/map

The final workflow needs to be saved.

s4n workflow save demo

Workflow Execution

Before executing the workflow, we clean the workspace by deleting the outputs that were produced while creating the CommandLineTools. For the execution, a parameter file is created using the s4n execute make-template command:

s4n execute make-template workflows/demo/demo.cwl > inputs.yml

The generated file then needs to be filled in with the correct input values:

ags: "03101000"
election: Bundestagswahl 2025
shapes:
  class: Directory
  location: data/braunschweig
feature: F3
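The quotes around the ags value matter. Unquoted, a YAML parser reads 03101000 as a number, so the leading zero of the municipality key is lost; YAML 1.1 parsers such as PyYAML even interpret it as octal. One way to avoid such surprises is to generate the parameter file from Python. JSON is, in practice, a subset of YAML 1.2, so the stdlib json module can write a file that YAML-based runners accept. This is a sketch of that alternative, not an s4n feature.

```python
import json
from pathlib import Path

# Input values from above; ags stays a string to keep its leading zero.
INPUTS = {
    "ags": "03101000",
    "election": "Bundestagswahl 2025",
    "shapes": {"class": "Directory", "location": "data/braunschweig"},
    "feature": "F3",
}


def render_inputs(values: dict) -> str:
    """Serialize values as JSON, which YAML 1.2 parsers also accept."""
    return json.dumps(values, indent=2, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    Path("inputs.yml").write_text(render_inputs(INPUTS), encoding="utf-8")
```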