cluster

An element in [[cluster]] is a table that defines the configuration of a single cluster.

For example:

[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"
[[cluster.partition]]
name = "shared"
maximum_cpus_per_job = 127
maximum_gpus_per_job = 0
[[cluster.partition]]
name = "gpu-shared"
minimum_gpus_per_job = 1
[[cluster.partition]]
name = "compute"
require_cpus_multiple_of = 128
maximum_gpus_per_job = 0
[[cluster.partition]]
name = "debug"
maximum_gpus_per_job = 0
prevent_auto_select = true

name

cluster.name: string - The name of the cluster.

identify

cluster.identify: table - Set a condition to identify when row is executing on this cluster. The table must have one of the following keys:

by_environment: array of two strings - Identify the cluster when the environment variable by_environment[0] is set and equal to by_environment[1].
always: bool - Set to true to always identify this cluster. When false, this cluster may only be chosen by an explicit --cluster option.

caution

The first cluster in the list that sets identify.always = true will prevent any later cluster from being identified (except by explicit --cluster=name).
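The ordering rule above can be sketched with a two-cluster configuration. This is a minimal illustration, not a recommended setup: the cluster name "workstation" and the partition name "none" are hypothetical, and the single-partition entries are included only to keep each cluster definition complete.

```toml
# Checked first: identified only when CLUSTER_NAME is set to "cluster1".
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"
[[cluster.partition]]
name = "shared"

# Listed last so that it acts as a fallback. Any cluster defined after
# this entry could only be selected with an explicit --cluster option.
[[cluster]]
name = "workstation"
identify.always = true
scheduler = "bash"
[[cluster.partition]]
name = "none"
```

Placing the identify.always = true entry last keeps the environment-based checks on earlier clusters reachable.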

scheduler

cluster.scheduler: string - Set the job scheduler to use on this cluster. Must be one of:

"slurm"
"bash"

slurm_gpus_per_task

cluster.slurm_gpus_per_task: string - Set the sbatch command line option that selects the number of GPUs per task (used only by the "slurm" scheduler). When omitted, slurm_gpus_per_task defaults to --gpus-per-task=.

submit_options

cluster.submit_options: array of strings - Scheduler submission options that are passed to every job on this cluster.
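As a brief sketch, a cluster that adds one option to every submission might look like the following. The value --mail-type=FAIL is a standard sbatch option chosen purely for illustration. Note that submit_options sits in the cluster table, before any [[cluster.partition]] header.

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"
# Illustrative value: each string is one option passed to the scheduler
# for every job submitted on this cluster.
submit_options = ["--mail-type=FAIL"]
[[cluster.partition]]
name = "shared"
```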

partition

cluster.partition: array of tables - Define the scheduler partitions that row may select from when submitting jobs. Row will check the partitions in the order provided and choose the first partition where the job matches all the provided conditions. All conditions are optional.
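To make the first-match rule concrete, the sketch below annotates the first three partitions from the example at the top of this page. It shows which hypothetical jobs each partition would accept, applying only the conditions documented in the sections that follow. The job sizes in the comments are assumptions chosen for illustration.

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

# Checked first. Accepts a job with 64 CPUs and 0 GPUs:
# 64 <= 127 and 0 <= 0.
[[cluster.partition]]
name = "shared"
maximum_cpus_per_job = 127
maximum_gpus_per_job = 0

# Checked second. Accepts a job with 1 GPU: "shared" rejects it
# (1 > 0), and 1 >= 1 holds here.
[[cluster.partition]]
name = "gpu-shared"
minimum_gpus_per_job = 1

# Checked third. Accepts a job with 256 CPUs and 0 GPUs: "shared"
# rejects it (256 > 127), "gpu-shared" rejects it (0 < 1), and
# 256 % 128 == 0 holds here.
[[cluster.partition]]
name = "compute"
require_cpus_multiple_of = 128
maximum_gpus_per_job = 0
```

Because the first match wins, partitions with narrower conditions generally belong before more permissive ones.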

name

cluster.partition.name: string - The name of the partition as it should be passed to the cluster batch submission command.

maximum_cpus_per_job

cluster.partition.maximum_cpus_per_job: integer - The maximum number of CPUs that can be used by a single job on this partition:

total_cpus <= maximum_cpus_per_job

require_cpus_multiple_of

cluster.partition.require_cpus_multiple_of: integer - All jobs submitted to this partition must use an integer multiple of the given number of CPUs:

total_cpus % require_cpus_multiple_of == 0

warn_cpus_not_multiple_of

cluster.partition.warn_cpus_not_multiple_of: integer - All jobs submitted to this partition should use an integer multiple of the given number of CPUs:

if total_cpus % warn_cpus_not_multiple_of != 0:
  warn! ...

This is a nonblocking variant of require_cpus_multiple_of that allows for submission of jobs that underutilize resources.
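A minimal sketch of the warning variant, assuming a hypothetical partition whose nodes each have 128 CPUs:

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

[[cluster.partition]]
name = "compute"
# A 192-CPU job still matches this partition, but triggers a warning
# because 192 % 128 != 0. A 256-CPU job matches without a warning.
warn_cpus_not_multiple_of = 128
maximum_gpus_per_job = 0
```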

memory_per_cpu_mb

cluster.partition.memory_per_cpu_mb: integer - CPU jobs submitted to this partition will pass this option to the scheduler. For example, Slurm schedulers will set --mem-per-cpu=<memory_per_cpu_mb>M.
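For illustration, the following sketch assumes the key is spelled memory_per_cpu_mb, as in the heading above, and uses a hypothetical value of 4000 MB per CPU:

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

[[cluster.partition]]
name = "shared"
maximum_gpus_per_job = 0
# With the Slurm scheduler, CPU jobs on this partition would be
# submitted with --mem-per-cpu=4000M.
memory_per_cpu_mb = 4000
```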

cpus_per_node

cluster.partition.cpus_per_node: integer - Number of CPUs per node.

When cpus_per_node is not set, row will request n_processes tasks. In this case, some schedulers are free to spread tasks among any number of nodes (for example, shared partitions on Slurm schedulers).

When cpus_per_node is set, row will also request the minimal number of nodes needed to satisfy n_nodes * cpus_per_node >= total_cpus. This may result in longer queue times, but will lead to more stable performance for users.

tip

Set cpus_per_node only when all nodes in the partition have the same number of CPUs.
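The node-count rule above can be worked through with a small sketch. It assumes that cpus_per_node takes an integer value and that every node in the hypothetical "compute" partition has 128 CPUs.

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

[[cluster.partition]]
name = "compute"
maximum_gpus_per_job = 0
# Smallest n_nodes satisfying n_nodes * 128 >= total_cpus:
#   total_cpus = 128  ->  1 node  (1 * 128 = 128 >= 128)
#   total_cpus = 200  ->  2 nodes (2 * 128 = 256 >= 200)
#   total_cpus = 512  ->  4 nodes (4 * 128 = 512 >= 512)
cpus_per_node = 128
```

In the 200-CPU case the job occupies two nodes while using only 200 of their 256 CPUs. Pairing cpus_per_node with warn_cpus_not_multiple_of or require_cpus_multiple_of is one way to flag or prevent that kind of underutilization.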

minimum_gpus_per_job

cluster.partition.minimum_gpus_per_job: integer - The minimum number of GPUs that must be used by a single job on this partition:

total_gpus >= minimum_gpus_per_job

maximum_gpus_per_job

cluster.partition.maximum_gpus_per_job: integer - The maximum number of GPUs that can be used by a single job on this partition:

total_gpus <= maximum_gpus_per_job

require_gpus_multiple_of

cluster.partition.require_gpus_multiple_of: integer - All jobs submitted to this partition must use an integer multiple of the given number of GPUs:

total_gpus % require_gpus_multiple_of == 0

warn_gpus_not_multiple_of

cluster.partition.warn_gpus_not_multiple_of: integer - All jobs submitted to this partition should use an integer multiple of the given number of GPUs:

if total_gpus % warn_gpus_not_multiple_of != 0:
  warn! ...

This is a nonblocking variant of require_gpus_multiple_of that allows for submission of jobs that underutilize resources.
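The GPU conditions can be combined in one partition. The sketch below assumes a hypothetical partition named "gpu" whose nodes each have 4 GPUs. The job sizes in the comments are illustrative.

```toml
[[cluster]]
name = "cluster1"
identify.by_environment = ["CLUSTER_NAME", "cluster1"]
scheduler = "slurm"

[[cluster.partition]]
name = "gpu"
# A job with 0 GPUs does not match (0 < 1).
minimum_gpus_per_job = 1
# A job with 6 GPUs does not match (6 % 4 != 0);
# a job with 8 GPUs matches (8 >= 1 and 8 % 4 == 0).
require_gpus_multiple_of = 4
```

Replacing require_gpus_multiple_of with warn_gpus_not_multiple_of would let the 6-GPU job through with a warning instead of rejecting it.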
