ViVar

General

Abstract

Structural genomic variations play an important role in human disease and phenotypic diversity. With the rise of high-throughput sequencing tools, mate-pair/paired-end sequencing has become an important technique for the detection and exploration of structural variation. ViVar is a comprehensive analysis platform for the processing, analysis and visualization of structural variation based on sequencing data or genomic microarrays, enabling the rapid identification of disease loci or genes. ViVar lets you scale your analysis with your workload over multiple (cloud) servers, has user access control to keep your data safe yet easy to share, and is easily expandable as analysis techniques advance.

Data

To help you explore ViVar, we loaded some demo experiments using real (non-simulated) human data. When you are logged in, you can click the links in the table below to zoom to interesting genomic regions of the sample experiments.

Experiment	Description	Links
Patient 12	Mate-pair sequencing data of duplication and deletion on chromosome 1	View chr1 region
Multiple Patients	Clustering analysis detecting a balanced aberration, an inversion. This would not be detectable by coverage or genomic microarray analysis.	View chr4 region
Patient 28	Mate-pair sequencing data and genomic microarray data of a complex duplication on the long arm of the X chromosome	View chrX region
Patient 29	Mate-pair sequencing data and genomic microarray data of a complex trisomy 21	View chr21 region

Technical Documentation

1. Requirements:

- General

ViVar is a PHP5 application powered by the Slim framework and the Twig templating engine. It is available as a stand-alone web application or can be served through a Docker container. The ViVar Docker image is an easily distributable installation of the ViVar platform, based on the official Nginx image available at Docker Hub, which is in turn based on Debian:jessie. All internal processes are monitored by Supervisord, and the whole setup depends on a separately run Mongo database.

- Hard- and Software

The ViVar webapp requires three separate systems to run.
The web front end runs as a PHP5 application and thus requires the following packages to be installed. They are available through apt-get on Ubuntu, Debian, etc.

php5-fpm
php5-mongo
php5-gd
php5-curl
curl
samtools
mongodb-clients

All of these requirements are already installed inside the Docker image.

The analysis back-end requires a compute cluster with Torque. This cluster should have access to a folder shared with the webserver where the job submission scripts can be written. A shared folder for raw data to be analyzed should also be available. Software-wise, the cluster must have the following executables installed and available in the PATH environment variable:

bowtie2
bedtools
samtools
R (>=3.2.2) with the libraries Bioconductor-QDNAseq and Bioconductor-CGHcall
Perl with Mojolicious
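
The PATH requirement above can be verified on the cluster before the first job is submitted. The following is a minimal sketch, not part of ViVar: the function name check_tools is made up for illustration, and the command names "R" and "perl" are assumed to be the usual ones for the two interpreters. It does not check the R libraries or the Mojolicious module.

```shell
# Print the names of the given executables that cannot be found in PATH.
check_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}

# Executable names taken from the list above; "R" and "perl" are assumed
# to be the command names of the R and Perl interpreters.
missing_tools=$(check_tools bowtie2 bedtools samtools R perl)
if [ -n "$missing_tools" ]; then
    echo "Missing from PATH: $missing_tools"
else
    echo "All required executables found in PATH."
fi
```

Running this on the submit node and on a compute node helps catch the case where PATH differs between the two.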

To accept job submissions from an external source, a daemon app must run on the submit node of the cluster. The daemon script can be generated with the "submitdaemon.sh" script found in the "daemon" folder of the ViVar source code. This script can be run as follows.

bash submitdaemon.sh "queueName" "queueServer" "queueUser"

This script will also install the Mojolicious Perl libraries. To use the daemon, start a screen session and execute:

morbo vivar_submitdaemon.pl &

The whole web application uses a Mongo database as its backbone for data storage.

2. Installation:

- Docker Image (recommended)

Inside the container, Nginx listens on port 80, which can be mapped to a port of your choice. A graphical user interface for supervisord is available on port 9002. A Mongo database can be run locally or as a separate Docker container. To link a local or existing Mongo database, map the appropriate port to 27017 in the ViVar container.

In a Docker container, data is not persistent. To avoid loss of data, you can mount /vivar and /vivar-data to your local filesystem. The /vivar folder contains all source and configuration files for the web application. /vivar-data can be used to store and import data files.

CAVEAT: This Docker image is a work in progress. While efforts have been made to automate as much of the configuration as possible, things can still go wrong. If problems occur, we advise checking the configuration files at /vivar/config.ini and /vivar/jobs/general.json for errors.

1. Make sure a Mongo database is available. It can run either locally or in a Docker container.
2. Run the ViVar webapp.
example: ViVar with MongoDB run locally and mounted data volumes

docker run --name=vivar -d -p 8080:80 -p 9002:9002 -e VIVAR_API="172.17.0.1:8080" -e MONGO_HOST="your_mongo_host" -v /foo/bar/vivar:/vivar/ -v /foo/bar/vivar-data:/vivar-data/ vivar

Environment variables available for editing

Variable	Default	Synopsis
VIVAR_API	172.17.0.1:8080	Address and port where ViVar listens for API calls. Normally this is the IP address of the docker0 interface, as shown by the command `ip a`.
MONGO_HOST	127.0.0.1	Hostname of the machine where the mongo database listens for connections. When using a linked container, this value is set automatically.
MONGO_PORT	27017	Port where the mongo database listens for connections.
PBS_HOST	127.0.0.1	FQDN of the host where the job submission daemon listens.
PBS_PORT	3000	Port at which the job submission daemon listens.
PBS_SERVER	pbsqueue	Hostname of the PBS server.
PBS_QUEUE	batch	Name of the queue where the jobs can be submitted.
HOST_DATADIR	/mnt/vivar-data	Directory on the docker host where the `/vivar-data` volume is mounted.

3. OPTIONAL: Run a graphical MongoDB user interface of your choice. Here we use a pre-made Docker image for mongo-express.
A list of available options can be found in the MongoDB documentation.
example:

docker run --name express -d -p 8081:8081 -e ME_CONFIG_MONGODB_SERVER="your_mongo_host" knickers/mongo-express
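
When several of the environment variables listed under step 2 need to change, repeating -e on the command line becomes error-prone. One option is Docker's --env-file flag, which reads one VARIABLE=value pair per line. The sketch below prints a file holding the defaults from the table; the function name vivar_env_defaults and the file name vivar.env are illustrative choices, not part of ViVar.

```shell
# Print an env file holding the defaults from the environment variable table.
# Docker reads one VARIABLE=value pair per line; values are not quoted.
vivar_env_defaults() {
    cat <<'EOF'
VIVAR_API=172.17.0.1:8080
MONGO_HOST=127.0.0.1
MONGO_PORT=27017
PBS_HOST=127.0.0.1
PBS_PORT=3000
PBS_SERVER=pbsqueue
PBS_QUEUE=batch
HOST_DATADIR=/mnt/vivar-data
EOF
}

vivar_env_defaults

# Possible use (the file name "vivar.env" is an arbitrary choice):
#   vivar_env_defaults > vivar.env    # then edit the values for your site
#   docker run --name=vivar -d -p 8080:80 -p 9002:9002 \
#       --env-file vivar.env \
#       -v /foo/bar/vivar:/vivar/ -v /foo/bar/vivar-data:/vivar-data/ vivar
```

Keeping the settings in a file also makes it easier to recreate the container with the same configuration later.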

Database entries

Reference Organisms

It is possible to add new reference organisms manually. To add a new organism or reference build, edit or add the appropriate field in the database. New organism documents should be formatted as follows:

example: Organism document for Homo sapiens, with 2 reference builds included

{
"_id": 9606,
"name": "Homo sapiens",
"build": ["GRCh37","GRCh38"]
}
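
A document of this shape can be generated from the shell rather than typed by hand. The following is a sketch only: the function name organism_doc is made up for illustration, and the database and collection names in the commented mongoimport call are placeholders, since the names ViVar actually uses are not given here.

```shell
# Print an organism document: taxonomy ID, name, then one or more builds.
organism_doc() {
    taxid=$1
    name=$2
    shift 2
    builds=""
    for build in "$@"; do
        # Append the quoted build name, comma-separated after the first.
        builds="$builds${builds:+,}\"$build\""
    done
    printf '{"_id": %s, "name": "%s", "build": [%s]}\n' "$taxid" "$name" "$builds"
}

organism_doc 9606 "Homo sapiens" GRCh37 GRCh38

# To load it, the output could be piped to mongoimport. YOUR_DB and
# YOUR_COLLECTION are placeholders, not ViVar's actual names:
#   organism_doc 9606 "Homo sapiens" GRCh37 GRCh38 |
#       mongoimport --host "$MONGO_HOST" --port "$MONGO_PORT" \
#           --db YOUR_DB --collection YOUR_COLLECTION
```

The function does no escaping, so it is only suitable for names without quotes or backslashes.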

Each organism should have the corresponding reference files stored in the appropriate file structure, with the top folder named after the corresponding taxonomy ID.

example: Document tree for Homo sapiens, with 2 reference builds included

./9606
├── GRCh37
│   ├── bioconductor
│   │   ├── GRCh37.fa -> ../seq/GRCh37.fa
│   │   └── QDNAseq.GRCh37.100kbp.SR100.rds
│   ├── bowtie2
│   │   ├── GRCh37.1.bt2
│   │   ├── GRCh37.2.bt2
│   │   ├── GRCh37.3.bt2
│   │   ├── GRCh37.4.bt2
│   │   ├── GRCh37.fa -> ../seq/GRCh37.fa
│   │   ├── GRCh37.rev.1.bt2
│   │   └── GRCh37.rev.2.bt2
│   └── seq
│       └── GRCh37.fa
└── GRCh38
    ├── bioconductor
    ├── bowtie2
    ├── seq
    └── wisecondor
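
The folder skeleton for a new build can be prepared with a few shell commands. This is a sketch under assumptions: the function name make_reference_tree and all paths are placeholders, and the location of the reference root on your system is not specified here. It creates the three folders shown for GRCh37 and the relative symlinks to the FASTA file; the QDNAseq .rds file and the wisecondor folder are not covered.

```shell
# Create the folder skeleton for one reference build.
# Arguments: reference root folder, taxonomy ID, build name.
make_reference_tree() {
    build_dir="$1/$2/$3"
    mkdir -p "$build_dir/bioconductor" "$build_dir/bowtie2" "$build_dir/seq"
    # Relative symlinks to the FASTA file in seq/, as in the tree above.
    ln -sf "../seq/$3.fa" "$build_dir/bioconductor/$3.fa"
    ln -sf "../seq/$3.fa" "$build_dir/bowtie2/$3.fa"
}

# Possible use (all paths are placeholders):
#   make_reference_tree /path/to/references 9606 GRCh37
#   cp /path/to/GRCh37.fa /path/to/references/9606/GRCh37/seq/
#   cd /path/to/references/9606/GRCh37/bowtie2
#   bowtie2-build GRCh37.fa GRCh37   # writes GRCh37.1.bt2 ... GRCh37.rev.2.bt2
```

The symlinks stay dangling until the FASTA file has been copied into the seq folder.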

Loading reference features

To be continued ...