The repository local_pipeline_management in the LEEF-UZH organisation on GitHub contains the bash scripts to manage the pipeline remotely. These commands run in the Linux terminal as well as in the macOS terminal (Windows has not been tested yet).
To use these commands, you can either download the repository as a zip file and extract it somewhere, or clone the repository using git. Cloning is slightly more complicated, but makes it easier to update the local commands from the GitHub repository.
To clone the commands, run
git clone git@github.com:LEEF-UZH/local_pipeline_management.git
which will create a directory called local_pipeline_management. When downloading the zip file, you have to extract it, which will create a directory called local_pipeline_management-main. For the further discussion here, the contents of these two directories are identical.
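If you cloned the repository, the local commands can later be updated with standard git usage, for example:
## update a cloned copy of the commands from the GitHub repo (not possible for the zip download)
cd local_pipeline_management
git pull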
Inside this directory is a directory called bin, which contains the scripts to manage the pipeline remotely. The commands are:
server
upload
prepare
start
status
download
download_logs
download_RRD
clean
To execute these commands, you either have to be in the directory where the commands are located, or the directory has to be in the PATH. If they are not in the PATH, you have to prepend ./ to the command, e.g. ./upload -h instead of upload -h. For this tutorial, I will put them in the PATH.
All commands contain a basic usage help, which can be called by using the -h or --help argument, e.g. ./upload -h.
export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
upload -h
#>
#> Usage: upload [options] source_directory target_directory
#> upload [options] source_file target_directory
#>
#> Upload data from 'source' to the target_directory on the pipeline server in the 'Incoming'
#> directory.
#>
#> Depending on if a directory or a file is specified as source or target, the behaviour
#> differs slightly:
#>
#> source_directory target_directory: Copies the source_directory into the target_directory
#> source_file target_directory : Copies the source_file into the target_directory
#> source_file target_file : Copies the source_file over the target_file
#>
#> The transfer is done by using 'rsync'.
#>
#> Options:
#> -h, --help Print short help message and exit
We will now go through the available commands and explain what they do and how they can be used. Finally, we will show a basic workflow: how to upload data, start the pipeline, download results, and prepare the pipeline server for the next run.
server
export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
server -h
#>
#> Usage: server [options]
#>
#> Returns the pipeline server IP. All scripts ude this function to obtain te server IP,
#> wherefore only here a new server IP needs to be specified
#>
#> Options:
#> -h, --help Print short help message and exit
The command server returns the address of the pipeline server. When the address of the pipeline server changes, you can open the script in a text editor and simply replace the address in the last line with the new address.
#> #!/bin/bash
#>
#> usage="
#> Usage: `basename $0` [options]
#>
#> Returns the pipeline server IP. All scripts ude this function to obtain te server IP,
#> wherefore only here a new server IP needs to be specified
#>
#> Options:
#> -h, --help Print short help message and exit
#> "
#>
#> if [[ "$1" == "-h" ]] || [[ "$1" == "--help" ]]; then
#> echo "${usage}"
#> exit 0
#> fi
#>
#> ############
#> ############
#>
#> echo 172.23.57.181
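As a sketch, the last line can also be replaced from the command line instead of a text editor; the IP address below is only a placeholder:
## replace the IP echoed in the last line of the 'server' script (placeholder IP)
sed -i '$s/^echo .*/echo 192.0.2.42/' bin/server
## on macOS, use: sed -i '' '$s/^echo .*/echo 192.0.2.42/' bin/server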
upload
export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
upload -h
#>
#> Usage: upload [options] source_directory target_directory
#> upload [options] source_file target_directory
#>
#> Upload data from 'source' to the target_directory on the pipeline server in the 'Incoming'
#> directory.
#>
#> Depending on if a directory or a file is specified as source or target, the behaviour
#> differs slightly:
#>
#> source_directory target_directory: Copies the source_directory into the target_directory
#> source_file target_directory : Copies the source_file into the target_directory
#> source_file target_file : Copies the source_file over the target_file
#>
#> The transfer is done by using 'rsync'.
#>
#> Options:
#> -h, --help Print short help message and exit
This command uploads data to the pipeline server. The most common usage is to upload the data for a pipeline run; this is done by specifying the local directory in which the 00.general.parameter and 0.raw.data directories reside.
The copying could also be done by mounting the leef_data share as a samba share, but that would be slower.
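For example (the folder name is purely illustrative and mirrors the workflow at the end of this document):
## upload the local sampling-day folder, which contains 00.general.parameter and 0.raw.data,
## into the 'Incoming' directory on the pipeline server
upload ./20210401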
prepare
export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
prepare -h
#>
#> Usage: prepare [options] from
#>
#> Prepare the pipeline by populating the LEEF folder
#> with the data in the folder 'Income/from'.
#>
#> The command 'clean' will be run the *clean* command first and deleted the following folders in the LEEF folder:
#>
#> - 00.general.data
#> - 0.raw.data
#> - 1.pre-processed.data
#> - 2.extracted.data
#>
#> Options:
#> -h, --help Print short help message and exit
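For example, assuming the folder 20210401 has already been uploaded into Incoming (the name is purely illustrative):
## populate the LEEF folder on the server from 'Incoming/20210401'; the 'clean' command is run first
prepare 20210401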
start
export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
start -h
#>
#> Usage: start [options] pipeline
#>
#> Start the pipeline on the pipeline server.
#> It needs one parameter, specifying the pipeline to start.
#>
#> Options:
#> -h, --help Print short help message and exit
#>
#> Allowed values for 'pipeline' are:
#> fast : start the pipeline for flowcam, flowcytometer,
#> o2meter and manual count.
#> bemovi: start the pipeline for bemovi
#> all : start all pipelines, equivalent to first running 'fast''
#> and than 'bemovi'.
The pipeline consists of three actual pipelines:
- bemovi.mag.16 : bemovi magnification 16
- bemovi.mag.25 : bemovi magnification 25
- fast : remaining measurements
The typical usage is to run both pipelines (first fast, and afterwards bemovi) by providing the argument all.
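The typical call is therefore:
## start all pipelines ('fast' first, then 'bemovi')
start all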
During the pipeline runs, logfiles with the extension .txt are created in the pipeline folder for each pipeline run, named as above:
- the general log file, which should be looked at to make sure that there are no errors; errors should be logged in the error.txt file
- done.txt, which contains the timing info and is created at the end of the pipeline
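A sketch of how these logs can be checked locally after downloading them (the grep pattern is only an illustration; download_logs is described below):
## fetch the logfiles into ./pipeline_logs and scan them for errors
download_logs
grep -ri "error" pipeline_logs/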
status
export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
status -h
#>
#> Usage: status [options]
#>
#> Displays status information about the pipeline.
#> The script checks, if a tmux session named 'pipeline_running'
#> is active.
#>
#> Options:
#> -h, --help Print short help message and exit
download
export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
download -h
#>
#> Usage: download [options] source target_directory
#>
#> Download the 'source' directory or file from within the 'LEEF' directory on the
#> pipeline server target_directory.
#>
#> The target_directory will be created if it does not exist.
#>
#> The transfer is done by using 'rsync'.
#>
#> Options:
#> -h, --help Print short help message and exit
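As an illustrative sketch (the source folder and the local target directory are placeholders):
## download the '3.archived.data' folder from the LEEF directory on the server
## into the local folder './archived.local' (created if it does not exist)
download 3.archived.data ./archived.local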
download_logs
export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
download_logs -h
#>
#> Usage: download_logs [options]
#>
#> This is a wrapper around a call to the script 'download' to download the logfiles from
#> the pipeline into the folder 'pipeline_logs' in the current directory.
#> If the folder exists, the script aborts.
#> Options:
#> -h, --help Print short help message and exit
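For example; since the script aborts when ./pipeline_logs already exists, one way to handle an old copy is to remove (or rename) it first:
## remove a previous download, then fetch the current logfiles into ./pipeline_logs
rm -rf pipeline_logs
download_logs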
download_RRD
export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
download_RRD -h
#>
#> Usage: download_RRD [options] [all]
#>
#> This is a wrapper around a call to the script download to download the RRD (Research Ready Data) sqlite database into the current directory
#>
#> If 'all' is specified
#> Options:
#> -h, --help Print short help message and exit
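For example:
## download the RRD sqlite database into the current directory
download_RRD
## download the complete set of RRDs
download_RRD all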
clean
export PATH=~/Documents_Local/git/LEEF/local_pipeline_management/bin/:$PATH
##
clean -h
#>
#> Usage: clean [options]
#>
#> Clean the pipeline to prepare it for new input data.
#> This will *delete* the following folders in the LEEF folder:
#>
#> - 00.general.data
#> - 0.raw.data
#> - 1.pre-processed.data
#> - 2.extracted.data
#>
#> Options:
#> -h, --help Print short help message and exit
Delete all raw data and results folders from the pipeline. The folders containing the archived data as well as the backend (containing the Research Ready Data databases) are not deleted!
This script is run automatically when the script prepare is executed.
The script asks for confirmation before deleting anything!
A typical workflow for the pipeline consists of the steps outlined below. It assumes that the pipeline folder is complete as described WHERE?????
Let's assume that one sampling day is complete and all data has been collected in the folder ./20210401. The local preparations are covered in the document LINK.
upload ./20210401
prepare 20210401
This will upload the data folder ./20210401 and prepare the pipeline to process that data.
start all
status
This will start the pipeline processing; status checks whether it is running and prints a message accordingly.
download_logs
This will download the log files which can be viewed to assess the progress and possible errors.
The logs should be checked, and if everything is fine, the RRD can be downloaded by using
download_RRD
or, for the complete set of RRD,
download_RRD all
Finally, the pipeline should be cleaned again by executing
clean
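Putting the steps together, a sketch of the complete workflow for one sampling day (folder name purely illustrative) looks like this:
## upload the sampling-day data and prepare the pipeline
upload ./20210401
prepare 20210401
## start all pipelines and check that they are running
start all
status
## download the logfiles to check progress and look for errors
download_logs
## once finished and the logs look fine, fetch the results
download_RRD all
## finally, prepare the server for the next run
clean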
It is important to note the following points:
- 0.raw.data, 1.pre-processed.data or the 2.extracted.data folder. You will recognise them when they are there.
- 3.archived.data and 9.backend must not be deleted, as data is added to them during each run and they are managed by the pipeline (TODO).