Corn Prediction Example
Start by cloning the demo repo:
git clone https://github.com/fairagro/m4.4_demo_corn_prediction
SciWIn Client Demo: Crop Yield Prediction Pipeline
⚠️ Important Note: This workflow is not a scientifically meaningful pipeline. It is a test demonstration created to showcase the capabilities of the SciWIn Client (s4n).
It uses a sequence of steps (e.g., merging soil and weather data, training a model, and predicting yields) to illustrate how s4n can be used to:
- Create CommandLineTools from Python scripts
- Connect tools into a workflow
- Visualize the pipeline
- Execute workflows locally and remotely
It is not intended for real-world agricultural analysis or decision-making. The data, scripts, and logic are simplified for demonstration purposes only.
🔧 Installation
Install the latest version of s4n:
curl --proto '=https' --tlsv1.2 -LsSf https://fairagro.github.io/m4.4_sciwin_client/get_s4n.sh | sh
Verify installation:
s4n -V
🛠️ Step 1: Initialize Project
s4n init
🧪 Step 2: Create CommandLineTools
For each script in the code/ directory, we create a CWL CommandLineTool using s4n create. These tools wrap the Python scripts and define how they are executed with inputs, outputs, and dependencies.
🔹 1. Get Soil Data
📌 Note: Most of the soil data has already been downloaded, because downloading it takes time.
The first step is to get soil data from SoilGrids for the coordinates of the Iowa counties.
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/get_soil.py --geojson data/iowa_counties.geojson --soil_cache data/soil_data.csv
This creates a new directory workflows/get_soil with a CWL CommandLineTool file get_soil.cwl:
#!/usr/bin/env cwl-runner

cwlVersion: v1.2
class: CommandLineTool

requirements:
- class: InitialWorkDirRequirement
  listing:
  - entryname: code/get_soil.py
    entry:
      $include: ../../code/get_soil.py
- class: DockerRequirement
  dockerFile:
    $include: ../../Dockerfile
  dockerImageId: pyplot
- class: NetworkAccess
  networkAccess: true

inputs:
- id: geojson
  type: File
  default:
    class: File
    location: ../../data/iowa_counties.geojson
  inputBinding:
    prefix: --geojson
- id: soil_cache
  type: File
  default:
    class: File
    location: ../../data/soil_data.csv
  inputBinding:
    prefix: --soil_cache

outputs:
- id: soil
  type: File
  outputBinding:
    glob: soil.csv

baseCommand:
- python
- code/get_soil.py
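To make the role of baseCommand and inputBinding more concrete, here is a minimal Python sketch of how a runner could assemble the command line from this tool description. It is a deliberate simplification and not how s4n or a CWL runner is actually implemented: TOOL is a hand-written mirror of the relevant fields of get_soil.cwl, build_command is a hypothetical helper, and file staging, Docker, and CWL's argument ordering rules are ignored.

```python
# Hand-written mirror of the relevant fields of workflows/get_soil/get_soil.cwl.
TOOL = {
    "baseCommand": ["python", "code/get_soil.py"],
    "inputs": [
        {
            "id": "geojson",
            "default": {"class": "File", "location": "../../data/iowa_counties.geojson"},
            "inputBinding": {"prefix": "--geojson"},
        },
        {
            "id": "soil_cache",
            "default": {"class": "File", "location": "../../data/soil_data.csv"},
            "inputBinding": {"prefix": "--soil_cache"},
        },
    ],
}


def build_command(tool, job=None):
    """Hypothetical helper: baseCommand followed by '<prefix> <location>' per input.

    Values in `job` override the defaults recorded in the tool description.
    """
    job = job or {}
    argv = list(tool["baseCommand"])
    for inp in tool["inputs"]:
        value = job.get(inp["id"], inp.get("default"))
        if value is None:
            raise ValueError(f"no value for input '{inp['id']}'")
        argv += [inp["inputBinding"]["prefix"], value["location"]]
    return argv


if __name__ == "__main__":
    print(" ".join(build_command(TOOL)))
```

With no job values supplied, the defaults stored in the tool description are used, which is why the tool can run without an explicit inputs file.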
🔹 2. Get Weather Data
Next, we fetch weather data for each county, for the year that was used for prediction.
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/get_weather.py --geojson data/iowa_counties.geojson
🔹 3. Merge Features
Now we combine soil and weather data into a single feature set.
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/merge_features.py --geojson data/iowa_counties.geojson \
  --weather weather.csv \
  --soil soil.csv
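The contents of code/merge_features.py are not shown in this walkthrough. As a rough illustration of what "combining soil and weather data" can mean, the following Python sketch joins two tables on a shared key. Everything specific in it is an assumption made for the example: the county column, the clay_pct and precip_mm columns, the toy values, and the merge_on_key helper are all hypothetical and are not taken from the demo repository.

```python
import csv
import io


def merge_on_key(soil_rows, weather_rows, key="county"):
    """Inner-join two lists of dict rows on `key` (hypothetical helper)."""
    weather_by_key = {row[key]: row for row in weather_rows}
    merged = []
    for soil in soil_rows:
        weather = weather_by_key.get(soil[key])
        if weather is None:
            continue  # no weather record for this key: drop the row
        merged.append({**soil, **weather})
    return merged


# Made-up toy tables; column names and values are illustrative only.
SOIL_CSV = "county,clay_pct\nAdair,30\nAdams,28\n"
WEATHER_CSV = "county,precip_mm\nAdair,900\n"

if __name__ == "__main__":
    soil = list(csv.DictReader(io.StringIO(SOIL_CSV)))
    weather = list(csv.DictReader(io.StringIO(WEATHER_CSV)))
    for row in merge_on_key(soil, weather):
        print(row)
```

An inner join is used here so that every merged row has both soil and weather values; whether the demo script handles missing counties the same way is not stated.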
🔹 4. Train Yield Prediction Model
We train a simple model using historical yield data.
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/train_model.py --features county_features.csv \
  --yield data/iowa_yield.csv
🔹 5. Predict Yields
Now we use the trained model to predict yields for each county.
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/predict_yields.py --features county_features.csv \
  --model model.pkl \
  --scaler scaler.pkl
🔹 6. Plot Predictions
Finally, we visualize the predictions on a map.
s4n create -c Dockerfile --container-tag pyplot --enable-network \
  python code/plot_yields.py --predictions county_predictions.csv \
  --geojson data/iowa_counties.geojson
⚙ Step 3: Build the Workflow
Now we’ll build the workflow in three clear phases: create, connect, and save.
✅ Phase 1: Create the Workflow
Start by creating a new workflow named demo:
s4n create -n demo
This generates a new directory workflows/demo/ with a demo.cwl file.
🔗 Phase 2: Connect Inputs and Outputs
Now connect the tools in the correct order. Use s4n list -a to inspect available inputs and outputs.
🔹 Connect Inputs
s4n connect demo --from @inputs/geojson --to get_soil/geojson
s4n connect demo --from @inputs/soil --to get_soil/soil_cache
s4n connect demo --from @inputs/geojson --to get_weather/geojson
s4n connect demo --from @inputs/geojson --to merge_features/geojson
s4n connect demo --from @inputs/yield --to train_model/yield
🔹 Connect Intermediate Steps
s4n connect demo --from get_soil/soil --to merge_features/soil
s4n connect demo --from get_weather/weather --to merge_features/weather
s4n connect demo --from merge_features/county_features --to train_model/features
s4n connect demo --from train_model/model --to predict_yields/model
s4n connect demo --from train_model/scaler --to predict_yields/scaler
s4n connect demo --from merge_features/county_features --to predict_yields/features
s4n connect demo --from predict_yields/county_predictions --to plot_yields/predictions
🔹 Connect Final Output
s4n connect demo --from @inputs/geojson --to plot_yields/geojson
s4n connect demo --from plot_yields/iowa_county_yields --to @outputs/iowa_county_yields
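The connections above form a dependency graph between the steps. As a way to double-check the wiring, the following Python sketch lists the same --from/--to pairs and derives a valid execution order from them. It is only an illustration and does not reflect how s4n or a CWL runner schedules steps: CONNECTIONS is transcribed by hand from the commands above, step_of and execution_order are hypothetical helpers, and graphlib requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (--from, --to) pairs transcribed from the `s4n connect demo` commands.
CONNECTIONS = [
    ("@inputs/geojson", "get_soil/geojson"),
    ("@inputs/soil", "get_soil/soil_cache"),
    ("@inputs/geojson", "get_weather/geojson"),
    ("@inputs/geojson", "merge_features/geojson"),
    ("@inputs/yield", "train_model/yield"),
    ("get_soil/soil", "merge_features/soil"),
    ("get_weather/weather", "merge_features/weather"),
    ("merge_features/county_features", "train_model/features"),
    ("train_model/model", "predict_yields/model"),
    ("train_model/scaler", "predict_yields/scaler"),
    ("merge_features/county_features", "predict_yields/features"),
    ("predict_yields/county_predictions", "plot_yields/predictions"),
    ("@inputs/geojson", "plot_yields/geojson"),
    ("plot_yields/iowa_county_yields", "@outputs/iowa_county_yields"),
]


def step_of(port):
    """Return the part before the slash, e.g. 'get_soil' or '@inputs'."""
    return port.split("/", 1)[0]


def execution_order(connections):
    """Order the steps so that every step comes after the steps it reads from."""
    deps = {}
    for src, dst in connections:
        src_step, dst_step = step_of(src), step_of(dst)
        # '@inputs' and '@outputs' are workflow ports, not steps.
        for step in (src_step, dst_step):
            if not step.startswith("@"):
                deps.setdefault(step, set())
        if not src_step.startswith("@") and not dst_step.startswith("@"):
            deps[dst_step].add(src_step)
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(CONNECTIONS)))
```

get_soil and get_weather do not depend on each other, so either may come first; every other step has a fixed position in the order.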
✅ Phase 3: Save the Workflow
Once all connections are made, save the workflow to ensure it’s committed to version control:
s4n save demo
📊 Step 4: Visualize the Workflow
Generate a visual representation of the pipeline:
s4n visualize --renderer dot workflows/demo/demo.cwl > workflow.dot
dot -Tsvg workflow.dot -o workflow.svg
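The dot executable used for rendering is part of Graphviz and is installed separately from s4n. A small Python sketch such as the following can confirm that both executables are on the PATH before running the commands; missing_tools is a hypothetical helper name, and the check only looks for the executables without verifying their versions.

```python
import shutil


def missing_tools(names=("s4n", "dot")):
    """Return the names from `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("s4n and dot are both available.")
```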
🚀 Step 5: Execute the Workflow
🔹 Generate Input Template
s4n execute make-template workflows/demo/demo.cwl > inputs.yml
Edit inputs.yml with real data paths:
geojson:
  class: File
  location: data/iowa_counties.geojson
soil:
  class: File
  location: data/soil_data.csv
yield:
  class: File
  location: data/iowa_yield.csv
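Before executing, it can help to confirm that every location in inputs.yml points to an existing file. The Python sketch below does this for the three entries listed above. To stay within the standard library it does not parse the YAML file: JOB is a hand-written mirror of inputs.yml, and missing_inputs is a hypothetical helper, not part of s4n.

```python
from pathlib import Path

# Hand-written mirror of the inputs.yml entries shown above.
JOB = {
    "geojson": {"class": "File", "location": "data/iowa_counties.geojson"},
    "soil": {"class": "File", "location": "data/soil_data.csv"},
    "yield": {"class": "File", "location": "data/iowa_yield.csv"},
}


def missing_inputs(job, base_dir="."):
    """Return the input names whose File location does not exist under base_dir."""
    base = Path(base_dir)
    return [
        name
        for name, entry in job.items()
        if entry.get("class") == "File" and not (base / entry["location"]).is_file()
    ]


if __name__ == "__main__":
    # Run from the repository root so the relative paths resolve.
    missing = missing_inputs(JOB)
    print("Missing inputs:", ", ".join(missing) if missing else "none")
```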
🔹 Run the Full Pipeline
s4n execute local workflows/demo/demo.cwl inputs.yml
🔹 Run Remotely (Optional)
s4n execute remote start workflows/demo/demo.cwl inputs.yml