Writing action commands in Python

In row, actions execute arbitrary shell commands. When your action is Python code, you must wrap it with command line parsing that takes directories as arguments. There are many ways you can achieve this goal.

This guide will show you how to structure all of your actions in a single file: actions.py. This layout is inspired by row's predecessor signac-flow and its project.py.

tip

If you are familiar with signac-flow, see migrating from signac-flow for many helpful tips.

To demonstrate the structure of a project, let's build a workflow that computes the sum of squares. The focus of this guide is on structure and best practices. You need to think about how your simulation, analysis, data processing, or other code will fit into this structure.

Create the project

First, create the row project:

row init sum_squares --signac
cd sum_squares

Then, create a file populate_workspace.py in the same directory as workflow.toml with the contents:

"""Populate the workspace."""

import signac

N = 10

project = signac.get_project()

for x in range(N):
    job = project.open_job({'x': x}).init()

Execute:

signac init
python populate_workspace.py

to initialize the signac workspace and populate it with directories.

note

If you are not familiar with signac, then go read the basic tutorial. Come back to the row documentation when you get to the section on workflows. For extra credit, reimplement the signac tutorial workflow in row after you finish reading this guide.

Write actions.py

Now, create a file actions.py with the contents:

"""Implement actions."""

import argparse
import os

import signac


def square(*jobs):
    """Implement the square action.

    Squares the value `x` in each job's statepoint and writes the output to
    `square.out` when complete.
    """
    for job in jobs:
        # If the product already exists, there is no work to do.
        if job.isfile('square.out'):
            continue

        # Open a temporary file so that the action is not completed early or on error.
        with open(job.fn('square.out.in_progress'), 'w') as file:
            x = job.cached_statepoint['x']
            file.write(f'{x**2}')

        # Done! Rename the temporary file to the product file.
        os.rename(job.fn('square.out.in_progress'), job.fn('square.out'))


def compute_sum(*jobs):
    """Implement the compute_sum action.

    Prints the sum of `square.out` from each job directory.
    """
    total = 0
    for job in jobs:
        with open(job.fn('square.out')) as file:
            total += int(file.read())

    print(total)


if __name__ == '__main__':
    # Parse the command line arguments: python actions.py --action <ACTION> [DIRECTORIES]
    parser = argparse.ArgumentParser()
    parser.add_argument('--action', required=True)
    parser.add_argument('directories', nargs='+')
    args = parser.parse_args()

    # Open the signac jobs
    project = signac.get_project()
    jobs = [project.open_job(id=directory) for directory in args.directories]

    # Call the action
    globals()[args.action](*jobs)

This file defines each action as a function with the same name. Each function accepts any number of jobs as positional arguments: def square(*jobs) and def compute_sum(*jobs). The if __name__ == '__main__': block parses the command line arguments, forms a list of signac jobs, and calls the requested action function.
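The lookup globals()[args.action] will call any module-level name that matches the argument. One possible variation, shown here as a minimal sketch and not as part of the tutorial's actions.py, is an explicit table of actions combined with argparse's choices option, so that a mistyped action name fails with a clear error. The ACTIONS table, build_parser, main, and the stand-in function bodies are hypothetical; the stand-ins operate on plain strings instead of signac jobs so the example runs without a workspace.

```python
import argparse


def square(*jobs):
    """Stand-in for the square action (operates on plain strings here)."""
    return [f'square:{job}' for job in jobs]


def compute_sum(*jobs):
    """Stand-in for the compute_sum action."""
    return f'compute_sum:{len(jobs)}'


# Hypothetical explicit table: only these names can be requested.
ACTIONS = {'square': square, 'compute_sum': compute_sum}


def build_parser():
    """Build a parser that rejects unknown action names."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--action', required=True, choices=sorted(ACTIONS))
    parser.add_argument('directories', nargs='+')
    return parser


def main(argv=None):
    """Parse the arguments and call the requested action."""
    args = build_parser().parse_args(argv)
    return ACTIONS[args.action](*args.directories)
```

Calling main(['--action', 'square', 'a', 'b']) dispatches to square, while an unknown name makes argparse exit with a usage message. The trade-off is that each new action must also be added to the table.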

note

This example demonstrates looping over directories in serial. However, this structure also gives you the ability to choose serial or parallel execution. Grouping many directories into a single cluster job submission will increase your workflow's throughput.
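As one illustration of the parallel option, the sketch below processes directories with a thread pool from Python's standard library. It is an assumption-laden example, not the tutorial's implementation: square_directory and square_parallel are hypothetical names, the functions take filesystem paths to directories instead of signac jobs, and the state point is read directly from signac_statepoint.json (the file named in workflow.toml). It keeps the same in-progress file and rename pattern used by square.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def square_directory(directory):
    """Square `x` from one directory's state point and write `square.out`."""
    product = os.path.join(directory, 'square.out')

    # If the product already exists, there is no work to do.
    if os.path.isfile(product):
        return

    # Assumption: the state point is stored as JSON with a key 'x'.
    with open(os.path.join(directory, 'signac_statepoint.json')) as file:
        x = json.load(file)['x']

    # Write to a temporary file, then rename it to the product when done.
    in_progress = product + '.in_progress'
    with open(in_progress, 'w') as file:
        file.write(f'{x**2}')
    os.rename(in_progress, product)


def square_parallel(directories, max_workers=4):
    """Process the given directory paths concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() consumes the results so that worker exceptions are re-raised.
        list(executor.map(square_directory, directories))
```

Threads suit file-bound work like this example. For CPU-bound actions, a process pool or MPI is likely a better fit.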

Write workflow.toml

Next, replace the contents of workflow.toml with the corresponding workflow:

[workspace]
value_file = "signac_statepoint.json"

[default.action]
command = "python actions.py --action $ACTION_NAME {directories}"

[[action]]
name = "square"
products = ["square.out"]
resources.walltime.per_directory = "00:00:01"

[[action]]
name = "compute_sum"
previous_actions = ["square"]
resources.walltime.per_directory = "00:00:01"
group.submit_whole = true

Both actions have the same command, set once by the default action:

[default.action]
command = "python actions.py --action $ACTION_NAME {directories}"

python actions.py executes the actions.py file above. The argument --action $ACTION_NAME selects the Python function to call; $ACTION_NAME is an environment variable that row sets in job scripts. The last arguments are given by {directories}. Unlike {directory} shown in previous tutorials, {directories} expands to ALL directories in the submitted group. actions.py is executed once and is free to process the list of directories in any way it chooses (e.g. in serial, with multiprocessing parallelism, multiple threads, using MPI parallelism, ...).
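To make the expansion concrete, the following sketch imitates what the command looks like once {directories} and $ACTION_NAME have been filled in. It only mimics the substitution for illustration and is not how row itself is implemented; expand_command and the directory names dir_a, dir_b, and dir_c are hypothetical.

```python
import shlex
from string import Template


def expand_command(template, action_name, directories):
    """Imitate the substitutions and return the resulting argument list."""
    # Stand-in for row replacing {directories} with every directory in the group.
    command = template.replace('{directories}', ' '.join(directories))
    # Stand-in for the shell expanding the $ACTION_NAME environment variable.
    command = Template(command).substitute(ACTION_NAME=action_name)
    # Split the command the way a POSIX shell would.
    return shlex.split(command)


argv = expand_command(
    'python actions.py --action $ACTION_NAME {directories}',
    'square',
    ['dir_a', 'dir_b', 'dir_c'],
)
print(argv)
# ['python', 'actions.py', '--action', 'square', 'dir_a', 'dir_b', 'dir_c']
```

Everything after --action square arrives as positional arguments, which is why the parser declares directories with nargs='+'.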

Execute the workflow

Now, submit the square action:

row submit --action square

and you should see:

Submitting 1 job that may cost up to 0 CPU-hours.
Proceed? [Y/n]: y
[1/1] Submitting action 'square' on directory 04bb77c1bbbb40e55ab9eb22d4c88447 and 9 more.

Next, submit the compute_sum action:

row submit --action compute_sum

and you should see:

Submitting 1 job that may cost up to 0 CPU-hours.
Proceed? [Y/n]: y
[1/1] Submitting action 'compute_sum' on directory 04bb77c1bbbb40e55ab9eb22d4c88447 and 9 more.
285

It worked! compute_sum printed the result 285.
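As a quick sanity check, the expected total can be reproduced without the workflow. This sketch recomputes the sum of squares for x = 0 through 9, matching N = 10 in populate_workspace.py, and compares it with the closed form (N-1)N(2N-1)/6.

```python
N = 10

# The values that each job's square.out should contain.
squares = [x**2 for x in range(N)]
total = sum(squares)

# Closed form for the sum of squares from 0 to N-1.
closed_form = (N - 1) * N * (2 * N - 1) // 6

print(squares)      # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(total)        # 285
print(closed_form)  # 285
```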

note

If you are on a cluster, use --cluster=none or wait for jobs to complete after submitting.

Applying this structure to your workflows

With this structure in place, you can add new actions to your workflow following these steps:

  1. Write a function def action(*jobs) in actions.py.
  2. Add:
    [[action]]
    name = "action"
    # And other relevant options
    
    to your workflow.toml file.

note

You may write functions that take only one job, def action(job), without modifying the given implementation of __main__. However, you will need to set action.group.maximum_size = 1 or use {directory} to ensure that actions.py is given a single directory.
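The reason for this restriction follows from how the __main__ block calls the function: action(*jobs) passes one positional argument per job. The sketch below demonstrates this with stand-in strings instead of signac jobs; report and call_action are hypothetical names used only for illustration.

```python
def report(job):
    """Hypothetical single-job action; `job` is a stand-in string here."""
    return f'processed {job}'


def call_action(action, jobs):
    """Call the action the same way the __main__ block does."""
    return action(*jobs)


# One job: the unpacked list matches the single parameter.
print(call_action(report, ['job_a']))

# Two jobs: unpacking passes two positional arguments, so Python raises TypeError.
try:
    call_action(report, ['job_a', 'job_b'])
except TypeError as error:
    print(f'TypeError: {error}')
```

Limiting each submitted group to one directory keeps a single-job function on the first path.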

Next steps

In this guide, you learned how to write workflow action commands in Python. Now, you should know everything you need to build complex workflows with row and deploy them on HPC resources.


Development of row is led by the Glotzer Group at the University of Michigan.

Copyright © 2024-2025 The Regents of the University of Michigan.