Automation
When performing numerical simulations or applying a data analysis pipeline, it’s crucial to keep track of all the steps that were necessary to obtain the results. The ideal is scientific reproducibility, which in practice boils down to instructing peers (and your future self) how to redo your computations. Time-consuming and error-prone manual steps should be avoided and replaced with an automated workflow. Automation removes tedious, repetitive tasks from your everyday research, and it is required anyway if you want to redo similar computations with varying input parameters, for instance as batch jobs on an HPC cluster.
Even though you may start by documenting all required steps in a text document, you should quickly move on and automate as much as possible. This can be achieved with Python or shell scripts. An alternative is to use one of the available task runners. Another important ingredient is running external commands from inside Python code. Lastly, we will show examples of using concurrency to execute code in parallel.
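To give a first impression of how these ingredients fit together, here is a minimal sketch of a parameter sweep that runs an external command from Python and executes several runs concurrently. The function names `run_simulation` and `run_sweep`, the parameter `temperature`, and the one-line "simulation" (which just squares its argument) are hypothetical stand-ins chosen for illustration. A real workflow would call your own program instead.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_simulation(temperature):
    """Run one (stand-in) simulation as an external command and return its result."""
    # Hypothetical stand-in for a real simulation program: a Python one-liner
    # that prints the square of its command-line argument.
    command = [
        sys.executable,
        "-c",
        "import sys; print(float(sys.argv[1]) ** 2)",
        str(temperature),
    ]
    # check=True raises an exception if the command fails, so errors are not
    # silently ignored; capture_output=True collects what the command prints.
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return float(completed.stdout)


def run_sweep(temperatures, max_workers=4):
    """Run the simulation for every parameter value, several at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the inputs.
        return dict(zip(temperatures, pool.map(run_simulation, temperatures)))


if __name__ == "__main__":
    results = run_sweep([0.5, 1.0, 2.0])
    for temperature, value in results.items():
        print(f"T = {temperature}: result = {value}")
```

Threads are sufficient in this sketch because the actual work happens in the external processes, while the Python threads mostly wait for them to finish. For CPU-bound work written in Python itself, a process pool may be the better choice.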