Background

The repository local_pipeline_management in the LEEF-UZH organisation on GitHub contains the bash scripts to manage the pipeline remotely. These commands run in the Linux terminal as well as in the macOS terminal (Windows has not been tested yet).

To use these commands, you can either download the repository as a zip file and unzip it somewhere, or clone the repository using git. Cloning is slightly more complicated, but makes it easier to update the local commands from the GitHub repo.

To clone the commands do the following:

git clone git@github.com:LEEF-UZH/local_pipeline_management.git

which will create a directory called local_pipeline_management. When downloading the zip file, you have to extract it, which will create a directory called local_pipeline_management-main. For the purposes of this tutorial, the contents of these two directories are identical.
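
If you prefer the zip download, it can also be fetched and extracted from the command line (a sketch; it assumes curl and unzip are available and that the default branch is main):

## download the zip of the main branch and extract it
curl -L -o local_pipeline_management.zip https://github.com/LEEF-UZH/local_pipeline_management/archive/refs/heads/main.zip
unzip local_pipeline_management.zip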

Inside this directory is a directory called bin which contains the scripts to manage the pipeline remotely. The commands are:

  • server
  • upload
  • prepare
  • start
  • status
  • download
  • download_logs
  • download_RRD
  • clean

To execute these commands, you either have to be in the directory where the commands are located, or the directory has to be in the path. If they are not in the path, you have to prepend ./ to the command, e.g. ./upload -h instead of upload -h when they are in the path. For this tutorial, I will put them in the path.

All commands contain a basic usage help, which can be called by using the -h or --help argument as in e.g. ./upload -h.

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
upload -h
#> 
#> Usage: upload [options] source_directory target_directory
#>        upload [options] source_file target_directory
#>        
#> Upload data from 'source' to the target_directory on the pipeline server in the 'Incoming' 
#> directory.
#> 
#> Depending on if a directory or a file is specified as source or target, the behaviour 
#> differs slightly:
#> 
#> source_directory target_directory: Copies the source_directory into the target_directory
#> source_file target_directory     : Copies the source_file into the target_directory
#> source_file target_file          : Copies the source_file over the target_file
#> 
#> The transfer is done by using 'rsync'.
#> 
#> Options:
#>   -h, --help            Print short help message and exit
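
The export line above only puts the commands on the path of the current terminal session. To make this permanent, the same line can be appended to the shell start-up file (a sketch; assumes bash with ~/.bashrc and the clone location used in this tutorial):

echo 'export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH' >> ~/.bashrc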

We will now go through the commands available and explain what they do and how they can be used. Finally, we will show a basic workflow on how to upload data, start the pipeline, download results, and prepare the pipeline server for the next run.

The commands

server

Help

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
server -h
#> 
#> Usage: server [options]
#>        
#> Returns the pipeline server IP. All scripts use this function to obtain the server IP, 
#> so a new server IP only needs to be specified here.
#> 
#> Options:
#>   -h, --help            Print short help message and exit

Description

The command server returns the address of the pipeline server. When the address of the pipeline server changes, you can open the script in a text editor and simply replace the address in the last line with the new one.

#> #!/bin/bash
#> 
#> usage="
#> Usage: `basename $0` [options]
#>        
#> Returns the pipeline server IP. All scripts use this function to obtain the server IP, 
#> so a new server IP only needs to be specified here.
#> 
#> Options:
#>   -h, --help            Print short help message and exit
#> "
#> 
#> if [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
#>   echo "${usage}"
#>   exit 0
#> fi
#> 
#> ############
#> ############
#> 
#> echo 172.23.57.181
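
If you prefer not to open an editor, the address can also be replaced from the command line (a sketch; NEW.IP.HERE is a placeholder, and -i works like this with GNU sed on Linux; on macOS use sed -i '' instead):

## replace the old server IP with the new one in the 'server' script
sed -i 's/172.23.57.181/NEW.IP.HERE/' ~/Documents_Local/git/LEEF/local_pipeline_management/bin/server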

Example of typical usage

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
server
#> 172.23.57.181

upload

Help

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
upload -h
#> 
#> Usage: upload [options] source_directory target_directory
#>        upload [options] source_file target_directory
#>        
#> Upload data from 'source' to the target_directory on the pipeline server in the 'Incoming' 
#> directory.
#> 
#> Depending on if a directory or a file is specified as source or target, the behaviour 
#> differs slightly:
#> 
#> source_directory target_directory: Copies the source_directory into the target_directory
#> source_file target_directory     : Copies the source_file into the target_directory
#> source_file target_file          : Copies the source_file over the target_file
#> 
#> The transfer is done by using 'rsync'.
#> 
#> Options:
#>   -h, --help            Print short help message and exit

Description

This command uploads data to the pipeline server. The most common usage is to upload the data for a pipeline run. This is done by specifying the directory in which the 00.general.parameter and 0.raw.data directories reside locally.

The copying could also be done by mounting the leef_data as a samba share, but it would be slower.

Example of typical usage

upload ./20210101

would upload the folder ./20210101 into the folder Incoming on the pipeline server.
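
Conceptually, the upload corresponds to an rsync call along the following lines (a sketch only, not the actual command in the script; USER and DATA_ROOT are hypothetical placeholders for the account and the data directory on the pipeline server):

## hypothetical placeholders - adjust to the actual account and data directory
USER=leef
DATA_ROOT=/data
## copy the local folder into the Incoming directory on the server returned by 'server'
rsync -av ./20210101 ${USER}@$(server):${DATA_ROOT}/Incoming/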

prepare

Help

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
prepare -h
#> 
#> Usage: prepare [options] from
#> 
#> Prepare the pipeline by populating the LEEF folder
#> with the data in the folder 'Incoming/from'.
#> 
#> The command will run the *clean* command first, which deletes the following folders in the LEEF folder:
#> 
#>  - 00.general.data
#>  - 0.raw.data
#>  - 1.pre-processed.data
#>  - 2.extracted.data
#>  
#> Options:
#>   -h, --help            Print short help message and exit

Description

This command copies the data from the folder from into the LEEF folder, where it can be processed by the pipeline. Before copying the data, leftovers from earlier pipeline runs are deleted by running the clean script.

Example of typical usage

prepare 20210101

start

Help

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
start -h
#> 
#> Usage: start [options] pipeline
#> 
#> Start the pipeline on the pipeline server.
#> It needs one parameter, specifying the pipeline to start.
#> 
#> Options:
#>   -h, --help            Print short help message and exit
#> 
#> Allowed values for 'pipeline' are:
#>    fast  : start the pipeline for flowcam, flowcytometer,
#>            o2meter and manual count.
#>    bemovi: start the pipeline for bemovi
#>    all   : start all pipelines, equivalent to first running 'fast'
#>            and then 'bemovi'.

Description

The pipeline consists of three actual pipelines:

  • bemovi.mag.16 - bemovi magnification 16
  • bemovi.mag.25 - bemovi magnification 25
  • fast - remaining measurements

The typical usage is to run all pipelines (first fast, and afterwards bemovi) by providing the argument all.

During the pipeline runs, log files are created in the pipeline folder for each pipeline run, named as above. They have the following extensions:

  • .txt - the general log file, which should be checked to make sure that there are no errors.
  • error.txt - errors, if any, should be logged in this file.
  • done.txt - contains the timing information and is created at the end of the pipeline run.

Example of typical usage

start all

status

Help

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
status -h
#> 
#> Usage: status [options]
#> 
#> Displays status information about the pipeline.
#> The script checks, if a tmux session named 'pipeline_running'
#> is active.
#> 
#> Options:
#>   -h, --help            Print short help message and exit

Description

The status is only reported correctly when the pipeline has been started using start. When the pipeline is started manually on the pipeline server (or via ssh), the status will not be reported correctly.

Example of typical usage

status
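
The same check can also be done by hand over ssh (a sketch; USER is a hypothetical account name on the pipeline server):

## ask tmux on the server whether the 'pipeline_running' session exists
USER=leef
ssh ${USER}@$(server) "tmux has-session -t pipeline_running" \
  && echo "pipeline is running" \
  || echo "pipeline is not running"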

download

Help

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
download -h
#> 
#> Usage: download [options] source target_directory
#>        
#> Download the 'source' directory or file from within the 'LEEF' directory on the 
#> pipeline server into the target_directory.
#>  
#> The target_directory will be created if it does not exist.
#> 
#> The transfer is done by using 'rsync'.
#> 
#> Options:
#>   -h, --help            Print short help message and exit

Description

Download files or folders from the LEEF directory on the pipeline server. If you want to download files from other folders, use .. to move one directory up. For example, ../Incoming would download the whole Incoming directory.

Example of typical usage

download 9.backend
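
To download from outside the LEEF directory into a specific local folder, the two-argument form from the help text can be used, e.g. (the name of the local target directory is arbitrary):

download ../Incoming ./Incoming_backup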

download_logs

Help

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
download_logs -h
#> 
#> Usage: download_logs [options]
#> 
#> This is a wrapper around a call to the script 'download' to download the logfiles from 
#>   the pipeline into the folder 'pipeline_logs' in the current directory. 
#>   If the folder exists, the script aborts.
#> Options:
#>   -h, --help            Print short help message and exit

Description

This is a specialised version of the download command. It downloads the log files into the directory ./pipeline_logs. If that directory already exists, the script aborts.

Example of typical usage

download_logs

download_RRD

Help

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
download_RRD -h
#> 
#> Usage: download_RRD [options] [all]
#> 
#> This is a wrapper around a call to the script download to download the RRD (Research Ready Data) sqlite database into the current directory
#> 
#> If 'all' is specified, the complete set of RRD databases is downloaded.
#> Options:
#>   -h, --help            Print short help message and exit

Description

This is a specialised version of the download command. It downloads the RRD (Research Ready Data), either only the main database, or the complete set. Downloading all RRD can take a long time!

Example of typical usage

download_RRD

clean

Help

export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
clean -h
#> 
#> Usage: clean [options]
#> 
#> Clean the pipeline to prepare it for new input data.
#> This will *delete* the following folders in the LEEF folder:
#> 
#>  - 00.general.data
#>  - 0.raw.data
#>  - 1.pre-processed.data
#>  - 2.extracted.data
#>  
#> Options:
#>   -h, --help            Print short help message and exit

Description

Delete all raw data and results folders from the pipeline. The folders containing the archived data as well as the backend (containing the Research Ready Data databases) are not deleted!

This script is run automatically when the script prepare is executed.

The script asks for confirmation before deleting anything!

Example of typical usage

clean

Workflow example

A typical workflow for the pipeline consists of the steps outlined below. It assumes that the pipeline folder is complete as described WHERE?????

Let’s assume that one sampling day is complete and all data has been collected in the folder ./20210401. The local preparations are covered in the document LINK.

Preparation

upload ./20210401
prepare 20210401

This will upload the data folder ./20210401 and prepare the pipeline to process that data.

Run the pipeline

start all
status

This will start the pipeline processing, check whether it is running, and print a message accordingly.

Check the progress of the pipeline

download_logs

This will download the log files which can be viewed to assess the progress and possible errors.

After pipeline has finished

download_logs

The logs should be checked, and if everything is fine, the RRD can be downloaded by using

download_RRD

or, for the complete set of RRD,

download_RRD all

Finally, the pipeline should be cleaned again by executing

clean

It is important to note the following points:

  1. When the run is completed, check the folders for error messages. They should be in the 0.raw.data, 1.pre-processed.data or the 2.extracted.data folder. You will recognise them when they are there.
  2. The folders 3.archived.data and 9.backend must not be deleted, as data is added to them during each run and they are managed by the pipeline (TODO).
  3. The log files give an indication of whether the run was successful. In the case of bemovi, a run in which individual movies could not be handled would still be considered successful! A simple way to scan the downloaded logs is shown below.
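
A quick way to scan the downloaded log files for problems is a recursive, case-insensitive grep (a sketch; it assumes download_logs has been run so that the logs are in ./pipeline_logs):

## list every downloaded log file that mentions an error
grep -ril "error" pipeline_logs/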