17. Running Horace in Parallel
17.1. Controlling MPI
Certain operations in Horace have parallelised variants which can be used to speed up large calculations.
Warning
Due to the nature of parallelism, some parallel operations require extra memory which may be up to twice the size of the sqw objects you're operating on.
Warning
Be aware that for small jobs or some combinations of parameters, parallel calculation may in fact be slower than serial execution, due to startup times and message-passing overheads. In future we hope to bring these overheads down and efficiencies up.
17.1.1. Enabling Locally
To run a particular function in parallel, Horace provides the function parallel_call, which runs a single task in parallel. Currently, parallel_call is available for:
fit
cut
func_eval
sqw_eval
The syntax for using parallel_call is as follows:
result = parallel_call(@func, {arg1, arg2, ...})
% Equivalent to
result = func(arg1, arg2, ...)
% E.g.
proj = line_proj([1, 0, 0], [0, 1, 0]);
w_out = parallel_call(@cut, {w_in, proj, [], [1 0.2 4], [0 1], [10 20]});
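The same calling pattern applies to the other supported functions. As a hedged sketch, a parallel sqw_eval might look like the following, where my_dispersion and pars are placeholders for your own model function and its parameters:
% Placeholder model: my_dispersion(qh, qk, ql, en, pars) is assumed to exist on
% the Matlab path; the equivalent serial call is sqw_eval(w_in, @my_dispersion, pars)
w_out = parallel_call(@sqw_eval, {w_in, @my_dispersion, pars});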
In order to use parallel_call you must ensure MPI has been configured correctly (see MPI Schemes below).
17.1.2. Enabling Globally
Parallel operations can be enabled globally using:
hpc('on')
Alternatively, different parallel components can be enabled/disabled separately through hpc_config (see: changing Horace settings):
hpc_config with properties:

    build_sqw_in_parallel: 0
    parallel_workers_number: 2
    combine_sqw_using: 'matlab'
    combine_sqw_options: {'matlab' 'mex_code' 'mpi_code'}
    mex_combine_thread_mode: 0
    mex_combine_buffer_size: 131072
    parallel_multifit: 0
    parallel_cluster: 'herbert'
    parallel_configuration: [1x1 parallel_config]
    hpc_options: {1x6 cell}
    class_name: 'hpc_config'
    saveable: 1
    returns_defaults: 0
    config_folder: '/home/jacob/.matlab/mprogs_config'
In particular, the parallel enabling options are (see the example below):
parallel_multifit : Enable parallel fitting for multifit and tobyfit
build_sqw_in_parallel : Enable building sqw objects in parallel, i.e. gen_sqw and combine_sqw
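As a minimal sketch, assuming the usual Horace pattern of assigning directly to configuration properties, these flags could be enabled programmatically as follows (the chosen values are illustrative only):
hc = hpc_config();
hc.parallel_multifit = true;        % parallel multifit / tobyfit
hc.build_sqw_in_parallel = true;    % parallel gen_sqw / combine_sqw
hc.parallel_workers_number = 4;     % number of workers to launch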
The parallel_config object contains most of the information needed to manage parallelism, though some is stored in hpc_config (described above):
parallel_config with properties:

    worker: 'worker_v4'
    is_compiled: 0
    parallel_cluster: 'herbert'
    cluster_config: 'local'
    known_clusters: {'herbert' 'parpool' 'mpiexec_mpi' 'slurm_mpi' 'dummy'}
    known_clust_configs: {'local'}
    shared_folder_on_local: ''
    shared_folder_on_remote: ''
    working_directory: '/tmp/'
    wkdir_is_default: 1
    external_mpiexec: ''
    class_name: 'parallel_config'
    saveable: 1
    returns_defaults: 0
    config_folder: '/home/jacob/.matlab/mprogs_config'
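As a brief sketch, the available schemes and cluster configurations can be inspected from a parallel_config instance before choosing one:
pc = parallel_config();
disp(pc.known_clusters)       % the parallel schemes Horace knows about
disp(pc.known_clust_configs)  % cluster configurations valid for the current scheme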
17.2. MPI Schemes
Horace can be run in parallel with a number of different schemes, all controlled through the parallel_config.
The five currently implemented parallel schemes are:
1. herbert (Poor-man's MPI) - Data messages are sent through files written to the hard drive and read by each process. This is the slowest MPI scheme, but also the one with the fewest requirements.
2. parpool (Matlab Parallel Toolbox MPI) - Messages are sent using Matlab's Parallel Computing Toolbox, which is therefore required.
3. mpiexec_mpi (C++ MPI) - Data messages are sent using C++ code wrapping OpenMPI. This requires the MEX files to be built.
4. slurm_mpi (Slurm MPI) - Data messages are sent using C++ code wrapping OpenMPI, but the job itself is submitted to a running Slurm instance by Horace when the job starts. This requires the MEX files to be built.
5. dummy (Dummy MPI) - Not real MPI, but a dummy system for debugging and testing MPI algorithms on one process in serial.
17.3. Managing parallel jobs
Running jobs in parallel is then simply a matter of selecting the appropriate MPI scheme, setting a suitable parallel_workers_number, and enabling the relevant flags through hpc_config and parallel_config.
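Putting this together, a minimal sketch of setting up and running a parallel job might look like the following; the scheme, worker count and cut arguments are illustrative only:
% Select an MPI scheme and the number of workers
pc = parallel_config();
pc.parallel_cluster = 'herbert';    % file-based scheme, fewest requirements

hc = hpc_config();
hc.parallel_workers_number = 4;
hc.parallel_multifit = true;        % enable the components you need

% Run an operation in parallel, e.g. a cut via parallel_call
proj = line_proj([1, 0, 0], [0, 1, 0]);
w_out = parallel_call(@cut, {w_in, proj, [], [1 0.2 4], [0 1], [10 20]});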
17.4. Slurm Jobs
When running on Slurm-managed clusters, it is possible to automatically submit jobs to the Slurm queue to be run in parallel across the cluster. Horace will attempt to request the number of nodes required to run the selected number of parallel workers and associated threads. However, if you are using a cluster which requires non-standard options, such as specifying a billing account or a non-default queue, extra options can be passed through the slurm_commands variable accessible via the parallel_config object. This is a containers.Map object, and it only stores the most recently set commands.
new_commands = containers.Map({'-A' '-p'}, {'account' 'partition'});
pc = parallel_config();
pc.slurm_commands = []; % Delete existing Slurm commands
pc.slurm_commands = new_commands; % Set new map
pc.slurm_commands = '-A account -p=partition' % Set as char
pc.slurm_commands = {'-A' 'account' '-p' 'partition'} % Set as cellstr of commands (must be in pairs)
pc.slurm_commands = {{'-A' 'account'} {'-p' 'partition'}} % Set as cell array of pairs of commands
pc.update_slurm_commands('-A account -p=partition', false) % Using update_slurm_commands setting append to false
pc.update_slurm_commands(new_commands) % Using update_slurm_commands omitting append
Note
Setting slurm_commands by any of the above methods will remove all existing slurm_commands and set the new ones.
pc.slurm_commands('-A') = 'account'; pc.slurm_commands('-p') = 'partition' % Set through Map interface
pc.update_slurm_commands('-A account -p=partition', true); % Set through update_slurm_commands
pc.update_slurm_commands(containers.Map({'-A' '-p'}, {'account', 'partition'}), true)
Note
Setting slurm_commands by any of the above methods will update the existing slurm_commands, simply overwriting any entries with the same key.
It is possible to set the slurm_commands variable by loading the appropriate commands from a file, if that is what your cluster team provides. This is done using the following command:
pc = parallel_config();
pc = pc.load_slurm_commands_from_file(<filename>, <append>);
Where filename is the path of the file to load the commands from, and append specifies whether the commands are meant to be added to the existing commands or replace them entirely.
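For instance, a hedged example of replacing the current commands with those stored in a file supplied by your cluster administrators (the filename here is purely illustrative):
pc = parallel_config();
% append = false: replace any existing commands with those read from the file
pc = pc.load_slurm_commands_from_file('my_cluster_slurm_options.txt', false);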