basicQC

Snakemake and knitr based pipeline for quality control of sequencing reads.

View the Project on GitHub vlpb3/basicQC

This is a small pipeline built on Snakemake and knitr. It generates a quality control report for sequencing reads in HTML format. Results are presented in the context of the whole experiment rather than a single sample. The pipeline requires reads in fastq or sff format and a design file.

Usage is described below. For the installation procedure, see the README in the repository.

Design file

The design file is meant to describe the experimental setup.

sampleid barcode runid color day tissue
tomato12 BC01 RID0020 red friday leaf
... ... ... ... ... ...

The design table is a tab-delimited file with a header. The first three columns must be sampleid, barcode and runid. Any number of additional columns may follow. Data in the columns should be categorical, not numerical. Every row is a single sample. If the same sample was sequenced in separate runs (same sampleid, different runid), the reads will be pooled together.
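The rules above can be checked before the pipeline is started. The following is a minimal sketch and not part of basicQC: the function names read_design and runs_per_sample are hypothetical, and the second tomato12 row (RID0021) is made-up example data added only to show how runs of the same sample are grouped.

```python
import csv
import io
from collections import defaultdict

# The first three columns required by the design file format.
REQUIRED = ["sampleid", "barcode", "runid"]


def read_design(handle):
    """Parse a tab-delimited design table and return (header, rows)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[:3] != REQUIRED:
        raise ValueError(f"first three columns must be {REQUIRED}, got {header[:3]}")
    rows = []
    for lineno, fields in enumerate(reader, start=2):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"line {lineno}: expected {len(header)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(header, fields)))
    return header, rows


def runs_per_sample(rows):
    """Group run ids by sample id; a sample with several runs has its reads pooled."""
    runs = defaultdict(list)
    for row in rows:
        runs[row["sampleid"]].append(row["runid"])
    return dict(runs)


# Example table; the RID0021 row is hypothetical.
example = (
    "sampleid\tbarcode\trunid\tcolor\tday\ttissue\n"
    "tomato12\tBC01\tRID0020\tred\tfriday\tleaf\n"
    "tomato12\tBC01\tRID0021\tred\tfriday\tleaf\n"
)
header, rows = read_design(io.StringIO(example))
print(runs_per_sample(rows))  # {'tomato12': ['RID0020', 'RID0021']}
```

For a real design file, pass an open file handle instead of the in-memory example, e.g. open("seqdesign.txt", newline="").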

Organisation of the pipeline

Every analysis is self-contained: it runs in its own directory, which also holds all the necessary code. The contents of a basicQC analysis directory are described below:

# basicQC analysis directory
# with all necessary Packrat related files and directories
.
├── basicQC.htm               # html file with QC report
├── basicQC.md                # markdown file generated by knitr
├── basicQC.Rmd               # Rmarkdown defining the QC report
├── figure                    # generated plots
│   ├── A.design.plot-1.png
│   ├── C3plot-1.png
│   ├── D1plot-1.png
│   └── D2plot-1.png
├── get_experiment_fq.py      # script that imports read data from sff files into fq files
├── packrat                   # dir with R environment related files
│   ├── init.R                # script initializing Packrat environment
│   ├── lib                   #
│   ├── lib-ext               # R library
│   ├── lib-R                 #
│   ├── packrat.lock          # log of all the package versions used by this environment
│   ├── packrat.opts          # packrat options
│   └── src
├── raw                       # fastq files generated from sff files
│   ├── N406T.fastq
│   ├── N540T.fastq
│   ├── N583T.fastq
│   └── sknas.fastq
├── README.md                 # project readme file
├── seqdesign.txt             # experiment design file
└── Snakefile                 # pipeline defined for Snakemake

Modify the SFF_STORE constant in the Snakefile to match the location of the .sff files generated by the Ion Proton. To start the pipeline from .fastq files instead, put them in the raw directory.
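When starting from .fastq files, it can help to confirm that every sample has a file in raw before running the pipeline. The sketch below is not part of basicQC. It assumes that files are named <sampleid>.fastq, which is inferred from the directory listing above and not stated explicitly; the helper name missing_fastq is hypothetical.

```python
import tempfile
from pathlib import Path


def missing_fastq(sample_ids, raw_dir="raw"):
    """Return the sample ids that have no <sampleid>.fastq file in raw_dir.

    The <sampleid>.fastq naming scheme is an assumption.
    """
    raw = Path(raw_dir)
    return [s for s in sample_ids if not (raw / f"{s}.fastq").is_file()]


# Demonstration in a temporary directory standing in for raw/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "N406T.fastq").touch()
    print(missing_fastq(["N406T", "N540T"], raw_dir=tmp))  # ['N540T']
```

In an analysis directory, the sample ids would come from the sampleid column of seqdesign.txt and raw_dir would keep its default value.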

The packrat/src directory contains the tarballs needed to build all necessary R packages; if a tarball is not available, Packrat will try to fetch it from online resources. The packrat/lib* directories contain the installed R packages, which are built from the sources in the src directory.

Running the pipeline

To run the pipeline, run snakemake from the command line within the analysis directory. If you use a Python virtual environment, activate it before running the pipeline.

snakemake

The result is the basicQC.htm file. This single file contains the whole report and can be viewed in any modern web browser.