The pipeline server is located in the S3IT Science Cloud and is accessible from within the UZH network.
To access it from outside the UZH network, it is necessary to use the UZH VPN!
This applies to all activities on the server, e.g. uploading, downloading, managing and mounting the samba share!
To manage the pipeline server instance itself, you have to connect to the Dashboard of the S3IT Science Cloud at https://cloud.s3it.uzh.ch/auth/login/?next=/. Normally, no interaction with the dashboard is necessary. See the Admin Guide for details.
Before you can use the bash scripts for managing the pipeline, you need a terminal running a bash shell. One is included in Mac and Linux, but needs to be installed on Windows. Probably the easiest approach is to use the Windows Subsystem for Linux (WSL), which can be installed relatively easily as described here. Please see this How-To Geek article for details on how to run bash scripts once the WSL is installed. In this Linux bash shell on Windows you can execute the provided bash scripts.
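As a rough sketch, on recent Windows versions (Windows 10 2004 and later, or Windows 11) the WSL with a default Ubuntu distribution can typically be installed with a single command; the linked instructions remain the authoritative reference:

```
# Run in an administrator PowerShell; installs WSL with an Ubuntu distribution by default
wsl --install
```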
Other options for Windows exist; details will be added after input from Uriah.
To be able to log in to the pipeline server remotely, an ssh client is needed. On Mac, Linux and the WSL this is built in. A widely used ssh client for Windows is PuTTY.
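To check whether an ssh client is already available in your shell, you can, for example, print its version:

```bash
# Prints the installed OpenSSH client version; an error means no ssh client is installed
ssh -V
```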
For any interaction with the pipeline server you have to authenticate. The pipeline server is set up to accept only passwordless logins (except for mounting the SAMBA share), which authenticate using a so-called ssh key that is unique to your computer. This is safer than password authentication and much easier to use once set up.
Before you generate a new key, you should check whether you already have one (which is quite possible) by executing
cat ~/.ssh/id_rsa.pub
from the bash shell. If it tells you `No such file or directory`, you have to generate one as described in the S3IT training handout for Mac, Windows or Linux.
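A typical way to generate a new key pair looks like the following; the S3IT handout may recommend different options, so treat this as a sketch:

```bash
# Generate a 4096-bit RSA key pair; accept the default location (~/.ssh/id_rsa)
ssh-keygen -t rsa -b 4096

# Show the public key, which is the part that needs to be added to the server
cat ~/.ssh/id_rsa.pub
```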
After you have an ssh key for your computer, you have to contact somebody who already has access to the pipeline server to have it added to the instance. You can then interact with the pipeline server either via the local pipeline management scripts (which use ssh) or by logging in directly using ssh (username `ubuntu`) and executing commands on the pipeline server.
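Once your key has been added, logging in directly looks, for example, like this (the server address below is only a placeholder; use the actual address of the pipeline server instance):

```bash
# Log in to the pipeline server as the user "ubuntu" using your ssh key
ssh ubuntu@pipeline.server.address
```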
The data submitted to the pipeline consists of two folders: a folder `0.raw.data` containing the measured data and measurement-specific metadata, and a folder `00.general.parameter` containing the metadata and some data used by all measurements. The folder structure has to be as follows:
0.raw.data
├── bemovi.mag.16
│ ├── ........_00097.cxd
│ ├── ...
│ ├── ........_00192.cxd
│ ├── bemovi_extract.mag.16.yml
│ ├── video.description.txt
│ ├── video_classifiers_18c_16x.rds
│ └── video_classifiers_increasing_16x_best_available.rds
├── bemovi.mag.25
│ ├── ........_00001.cxd
│ ├── ...
│ ├── ........_00096.cxd
│ ├── bemovi_extract.mag.25.cropped.yml
│ ├── bemovi_extract.mag.25.yml
│ ├── video.description.txt
│ ├── video_classifiers_18c_25x.rds
│ └── video_classifiers_increasing_25x_best_available.rds
├── flowcam
│ ├── 11
│ ├── 12
│ ├── 13
│ ├── 14
│ ├── 15
│ ├── 16
│ ├── 17
│ ├── 21
│ ├── 22
│ ├── 23
│ ├── 24
│ ├── 25
│ ├── 26
│ ├── 27
│ ├── 34
│ └── 37
├── flowcytometer
│ ├── ........
│ └── .........ciplus
├── manualcount
│ └── .........xlsx
└── o2meter
└── .........csv
00.general.parameter
├── compositions.csv
├── experimental_design.csv
└── sample_metadata.yml
See the document on Teams for the detailed steps necessary to assemble the data and the necessary metadata.
These two folders need to be uploaded to the pipeline server, after which the pipeline needs to be started.
There are two approaches to uploading the data to the pipeline server and starting the pipeline afterwards: using the local bash scripts from a local computer, or executing the commands directly on the pipeline server.
The recommended approach is to use the local bash scripts, as this minimises the likelihood of errors or accidental data loss. Nevertheless, for some actions it might be necessary to work directly on the pipeline server, usually by executing commands in an ssh session.
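The local management scripts take care of the upload. If you ever have to upload the two folders manually, a transfer along the following lines would work; note that the target directory `~/LEEF` is an assumption based on the folder layout shown below, and the server address is a placeholder:

```bash
# Copy the raw data and the general parameters to the pipeline server,
# preserving the folder structure (-a) and showing progress
rsync -av --progress 0.raw.data 00.general.parameter ubuntu@pipeline.server.address:~/LEEF/
```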
After the pipeline has completed, the folder `LEEF` on the pipeline server will look as follows:
./LEEF
├── 0.raw.data
│ ├── bemovi.mag.16
│ ├── bemovi.mag.25
│ ├── flowcam
│ ├── flowcytometer
│ ├── manualcount
│ └── o2meter
├── 00.general.parameter
│ ├── compositions.csv
│ ├── experimental_design.csv
│ └── sample_metadata.yml
├── 1.pre-processed.data
│ ├── bemovi.mag.16
│ ├── bemovi.mag.25
│ ├── flowcam
│ ├── flowcytometer
│ ├── manualcount
│ └── o2meter
├── 2.extracted.data
│ ├── bemovi.mag.16
│ ├── bemovi.mag.25
│ ├── flowcam
│ ├── flowcytometer
│ ├── manualcount
│ └── o2meter
├── 3.archived.data
│ ├── extracted
│ ├── pre_processed
│ └── raw
├── 9.backend
│ ├── LEEF.RRD.sqlite
│ ├── LEEF.RRD_bemovi_master.sqlite
│ ├── LEEF.RRD_bemovi_master_cropped.sqlite
│ ├── LEEF.RRD_flowcam_algae_metadata.sqlite
│ └── LEEF.RRD_flowcam_algae_traits.sqlite
├── log.2021-03-03--15-06-32.fast.done.txt
├── log.2021-03-03--15-06-32.fast.txt
├── log.2021-03-03--15-14-20.bemovi.mag.16.done.txt
├── log.2021-03-03--15-14-20.bemovi.mag.16.error.txt
├── log.2021-03-03--15-14-20.bemovi.mag.16.txt
├── log.2021-03-03--15-14-20.bemovi.mag.25.done.txt
├── log.2021-03-03--15-14-20.bemovi.mag.25.error.txt
└── log.2021-03-03--15-14-20.bemovi.mag.25.txt
The folder `1.pre-processed.data` contains the pre-processed data. Pre-processed means that the raw data (`0.raw.data`) has been converted into open formats where this can be done losslessly, and compressed (in the case of the bemovi videos). All further processing is done with the pre-processed data.
The folder `2.extracted.data` contains the data which will be used for further analysis outside the pipeline. It contains the intermediate extracted data as well as the data which will finally be added to the backend (`9.backend`). The final extracted data for the backend is in `csv` format.
Data is archived as raw data, pre-processed data, and extracted data in the folder `3.archived.data`. In each of the respective subfolders, a folder named with the timestamp specified in the `sample_metadata.yml` is created which contains the actual data. The raw data as well as the pre-processed bemovi data is simply copied over, while the other data is stored as `.tar.gz` archives. For all files, sha256 hashes are calculated to guarantee the correctness of the data.
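To verify the archived data against the stored hashes, something like the following can be used; the name and location of the hash file are assumptions, so adjust them to the actual file in the archive folder:

```bash
# Check all files listed in the hash file against their sha256 checksums
sha256sum -c sha256.txt
```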
The backend consists of the following SQLite databases:
LEEF.RRD.sqlite: contains the Research Ready Data (RRD) of all measurements. This is the database used for the further analysis.
LEEF.RRD_bemovi_master_cropped.sqlite: contains the master file as returned from the bemovi analysis of the cropped data from the 25× magnification images.
LEEF.RRD_bemovi_master.sqlite: contains the master file as returned from the bemovi analysis of the uncropped data from the 16× and 25× magnification images. Will be used after new classification.
LEEF.RRD_flowcam_algae_metadata.sqlite: metadata about the flowcam analysis (I think this one is not needed).
LEEF.RRD_flowcam_algae_traits.sqlite: raw data as returned from the flowcam analysis. Will be used after new classification.
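As a quick way to inspect the backend, the sqlite3 command-line tool can be used on the pipeline server; the table name in the second command is only a placeholder, as the actual table names depend on the pipeline version:

```bash
# List all tables in the Research Ready Data database
sqlite3 LEEF.RRD.sqlite ".tables"

# Export the first rows of a table (placeholder name) as csv for a quick look
sqlite3 -header -csv LEEF.RRD.sqlite "SELECT * FROM some_table LIMIT 10;"
```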