General

Abstract

Structural genomic variations play an important role in human disease and phenotypic diversity. With the rise of high-throughput sequencing tools, mate-pair/paired-end sequencing has become an important technique for the detection and exploration of structural variation. ViVar is a comprehensive analysis platform for the processing, analysis and visualization of structural variation based on sequencing data or genomic microarrays, enabling the rapid identification of disease loci or genes. ViVar lets you scale your analysis with your workload across multiple (cloud) servers, has user access control to keep your data safe yet easy to share, and is easily expandable as analysis techniques advance.

Data

To help you explore ViVar we loaded some demo experiments, using real (non-simulated) human data. When you're logged in, you can click the links in the table below to zoom to interesting genomic regions of the sample experiments.

Experiment | Description | Links
Patient 12 | Mate-pair sequencing data of a duplication and a deletion on chromosome 1 | View chr1 region
Multiple Patients | Clustering analysis detecting a balanced aberration, an inversion. This would not be detectable by coverage or genomic microarray analysis. | View chr4 region
Patient 28 | Mate-pair sequencing data and genomic microarray data of a complex duplication on the long arm of the X chromosome | View chrX region
Patient 29 | Mate-pair sequencing data and genomic microarray data of a complex trisomy 21 | View chr21 region

Technical Documentation

1. Requirements:

- General

ViVar is a PHP 5 application powered by the Slim framework and the Twig templating engine. It is available as a standalone web application or can be served through a Docker container. The ViVar Docker image is an easily distributable installation of the ViVar platform, based on the official Nginx image available at Docker Hub, which is in turn based on Debian:jessie. All internal processes are monitored by Supervisord, and the whole setup depends on a separately run Mongo database.

- Hard- and Software

The ViVar webapp requires three separate systems to run.
The website front end runs as a PHP 5 application and thus requires the following packages to be installed, available through apt-get on Ubuntu, Debian, etc.

All of these requirements are already installed inside the Docker image.

The analysis back end requires a compute cluster with Torque. This cluster should have access to a folder shared with the web server, where the job submission scripts can be written. A shared folder for the raw data to be analyzed should also be available. Software-wise, the cluster must have the following executables installed and available in the PATH environment variable:
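
Whichever executables your installation needs, you can check that they are visible on the PATH of the cluster nodes before submitting jobs. The snippet below is a minimal sketch of such a check; the function name check_tools is hypothetical, and the tool names in the usage comment are only illustrative guesses (qsub for Torque, bowtie2 and Rscript based on the reference folders described further down), not the official requirement list.

```shell
# Report every named executable that cannot be found on the PATH.
# Returns 0 when all tools are present, 1 when at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (tool names are assumptions, adapt them to your setup):
#   check_tools qsub bowtie2 Rscript || echo "fix the PATH before submitting jobs"
```

Running the check on a compute node rather than on the submit node is the safer choice, since the PATH can differ between the two.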

To accept job submissions from an external source, a daemon app must run on the submit node of the cluster. The daemon script can be generated with the "submitdaemon.sh" script found in the "daemon" folder of the ViVar source code. This script can be run as follows.
bash submitdaemon.sh "queueName" "queueServer" "queueUser" 
This script will also install the Mojolicious Perl libraries. To use the daemon, start a screen session and execute:
morbo vivar_submitdaemon.pl & 

The whole web application uses a Mongo database as its backbone for data storage.

2. Installation:

- Docker Image (recommended)

Inside the container, Nginx listens on port 80, which can be mapped to a host port of your choice. A graphical user interface for Supervisord is available on port 9002. A Mongo database can be run locally or as a separate Docker container. To link a local or existing Mongo database, map the appropriate port to 27017 in the ViVar container.

In a Docker container, data is not persistent. To avoid loss of data, you can mount /vivar and /vivar-data to your local filesystem. The /vivar folder contains all source and configuration files for the web application. /vivar-data can be used to store and import data files.

CAVEAT: This Docker image is a work in progress. While efforts have been made to automate as much of the configuration as possible, things can still go wrong. If any problems occur, we advise checking the configuration files at /vivar/config.ini and /vivar/jobs/general.json for errors.

1. Make sure a Mongo database is available. It can run either locally or in a Docker container.
2. Run the ViVar webapp.
example: ViVar with MongoDB run locally and mounted data volumes
docker run --name=vivar -d -p 8080:80 -p 9002:9002 -e VIVAR_API="172.17.0.1:8080" -e MONGO_HOST="your_mongo_host" -v /foo/bar/vivar:/vivar/ -v /foo/bar/vivar-data:/vivar-data/ vivar
Available environment variables
Variable | Default | Synopsis
VIVAR_API | 172.17.0.1:8080 | Address and port where ViVar listens for API calls. Normally this is the IP address of the docker0 interface, as reported by the command ip a.
MONGO_HOST | 127.0.0.1 | Hostname of the machine where the Mongo database listens for connections. When using a linked container, this value is set automatically.
MONGO_PORT | 27017 | Port where the Mongo database listens for connections.
PBS_HOST | 127.0.0.1 | FQDN of the host where the job submission daemon listens.
PBS_PORT | 3000 | Port at which the job submission daemon listens.
PBS_SERVER | pbsqueue | Hostname of the PBS server.
PBS_QUEUE | batch | Name of the queue where the jobs can be submitted.
HOST_DATADIR | /mnt/vivar-data | Directory on the Docker host where the /vivar-data volume is mounted.
3. OPTIONAL Run a graphical MongoDB user interface of your choosing. Here we use a pre-made Docker image for mongo-express.
A list of available options can be found in the MongoDB documentation.

example:
docker run --name express -d -p 8081:8081 -e ME_CONFIG_MONGODB_SERVER="your_mongo_host" knickers/mongo-express
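
The docker run call from step 2 can be wrapped in a small helper so that only the values that differ from the defaults need to be set. The sketch below is one possible way to do this, not part of ViVar itself: the function name vivar_run_cmd is hypothetical, it only prints the command instead of executing it, and it assumes that HOST_DATADIR is both passed to the container and used as the host side of the /vivar-data mount.

```shell
# Print (do not execute) a docker run command for ViVar.
# Variables that are unset fall back to the defaults from the table above.
vivar_run_cmd() {
    api="${VIVAR_API:-172.17.0.1:8080}"
    mongo_host="${MONGO_HOST:-127.0.0.1}"
    mongo_port="${MONGO_PORT:-27017}"
    datadir="${HOST_DATADIR:-/mnt/vivar-data}"
    printf 'docker run --name=vivar -d -p 8080:80 -p 9002:9002'
    printf ' -e VIVAR_API=%s -e MONGO_HOST=%s -e MONGO_PORT=%s' "$api" "$mongo_host" "$mongo_port"
    # Assumption: the host data directory is mounted at /vivar-data.
    printf ' -e HOST_DATADIR=%s -v %s:/vivar-data/ vivar\n' "$datadir" "$datadir"
}

# Usage: inspect the command first, then run it once it looks right.
#   MONGO_HOST=your_mongo_host vivar_run_cmd
#   MONGO_HOST=your_mongo_host vivar_run_cmd | sh
```

Printing the command before running it makes it easy to spot a wrong host or port without starting a misconfigured container.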

Database entries

Reference Organisms

It is possible to manually add new reference organisms. To add a new organism or reference build, edit or add the appropriate field in the database. New organism documents should be formatted as follows:

example: Organism document for Homo sapiens, with 2 reference builds included
{
"_id": 9606,
"name": "Homo sapiens",
"build": ["GRCh37","GRCh38"]
}
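
Such a document can also be generated from the command line, which helps to avoid typos when adding several organisms. The helper below is a minimal sketch: the function name organism_doc is hypothetical, it does not escape quotes inside the organism name, and the database and collection names in the import comment are placeholders because they are not specified here.

```shell
# Print an organism document in the format shown above.
# Arguments: taxonomy ID, organism name, then one or more reference builds.
organism_doc() {
    taxid="$1"; name="$2"; shift 2
    builds=""
    for b in "$@"; do
        # Append each build as a quoted, comma-separated JSON string.
        builds="${builds:+$builds,}\"$b\""
    done
    printf '{ "_id": %s, "name": "%s", "build": [%s] }\n' "$taxid" "$name" "$builds"
}

# Usage:
#   organism_doc 9606 "Homo sapiens" GRCh37 GRCh38 > organism.json
# The file could then be loaded with mongoimport (placeholders, not real names):
#   mongoimport --db <database> --collection <collection> --file organism.json
```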

Each organism should have the corresponding reference files stored in the appropriate file structure, with the top folder named after the corresponding taxonomy ID.

example: Directory tree for Homo sapiens, with 2 reference builds included
./9606
├── GRCh37
│   ├── bioconductor
│   │   ├── GRCh37.fa -> ../seq/GRCh37.fa
│   │   └── QDNAseq.GRCh37.100kbp.SR100.rds
│   ├── bowtie2
│   │   ├── GRCh37.1.bt2
│   │   ├── GRCh37.2.bt2
│   │   ├── GRCh37.3.bt2
│   │   ├── GRCh37.4.bt2
│   │   ├── GRCh37.fa -> ../seq/GRCh37.fa
│   │   ├── GRCh37.rev.1.bt2
│   │   └── GRCh37.rev.2.bt2
│   └── seq
│       └── GRCh37.fa
└── GRCh38
    ├── bioconductor
    ├── bowtie2
    ├── seq
    └── wisecondor
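
The folder layout above can be prepared with a few shell commands. The sketch below creates the skeleton for one build and the relative symlinks to the FASTA file; the function name make_build_skeleton and the root folder in the usage comment are hypothetical, and the bowtie2-build call is an assumption about how the index files were produced.

```shell
# Create the folder skeleton for one reference build under <root>/<taxid>/<build>.
# The FASTA file itself still has to be copied to seq/<build>.fa afterwards.
make_build_skeleton() {
    root="$1"; taxid="$2"; build="$3"
    base="$root/$taxid/$build"
    mkdir -p "$base/bioconductor" "$base/bowtie2" "$base/seq"
    # Relative symlinks, as in the tree above, so the folder can be moved as a whole.
    ln -sf "../seq/$build.fa" "$base/bioconductor/$build.fa"
    ln -sf "../seq/$build.fa" "$base/bowtie2/$build.fa"
}

# Usage (hypothetical root folder):
#   make_build_skeleton /data/reference 9606 GRCh37
#   cp GRCh37.fa /data/reference/9606/GRCh37/seq/
# Assumption: the index is built with bowtie2-build, which writes
# GRCh37.1.bt2 ... GRCh37.rev.2.bt2 for the given base name.
#   (cd /data/reference/9606/GRCh37 && bowtie2-build seq/GRCh37.fa bowtie2/GRCh37)
```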

Loading reference features

To be continued ...