
vcf-validator

Validator for the Variant Call Format (VCF) implemented using C++11.

It includes all the checks from the vcftools suite, plus additional checks based on lexical, syntactic and semantic analysis of the VCF input. Any inconsistencies found are classified into one of the following categories:

  • Errors: Violations of the VCF specification
  • Warnings: An indication that something unusual happened (e.g. commas were used instead of semicolons to separate IDs) or that a recommendation is not followed (e.g. missing metadata)

Please read the wiki for more details about checks already implemented.

Download

We recommend using the latest release for the most stable experience. Along with the release notes, you will find the executables vcf_validator and vcf_debugulator, which validate and fix VCF files, respectively.

Run

Validator

vcf-validator only needs an uncompressed VCF file as input, although compressed files can be validated by piping them in (see below). It accepts input in the following ways:

  • File path as argument: vcf_validator -i /path/to/file.vcf
  • Standard input: vcf_validator < /path/to/file.vcf
  • Standard input from pipe: zcat /path/to/file.vcf.gz | vcf_validator
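The zcat pipe can also be reproduced from a script. Below is a minimal Python sketch, assuming vcf_validator is installed and on the PATH: it decompresses the file with the standard gzip module and streams it to the validator's standard input in chunks. The function name stream_gzip_to_validator is hypothetical, not part of vcf-validator.

```python
import gzip
import shutil
import subprocess
from typing import Sequence


def stream_gzip_to_validator(gz_path: str,
                             cmd: Sequence[str] = ("vcf_validator",)) -> int:
    """Pipe the decompressed contents of gz_path into cmd's standard input.

    Equivalent to `zcat gz_path | vcf_validator`. Returns the exit status of cmd.
    Assumption: vcf_validator is on the PATH. If cmd exits before reading all
    of its input, BrokenPipeError is raised.
    """
    with gzip.open(gz_path, "rb") as compressed:
        with subprocess.Popen(list(cmd), stdin=subprocess.PIPE) as proc:
            # Copy in chunks so the decompressed VCF is never held in memory.
            shutil.copyfileobj(compressed, proc.stdin)
            proc.stdin.close()  # signal end of input to the validator
            return proc.wait()
```

For example, stream_gzip_to_validator("/path/to/file.vcf.gz") behaves like the zcat pipeline above, and its output still goes to the terminal because stdout is not captured.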

The validation level can be configured using -l / --level. This parameter is optional and accepts 3 values:

  • error: Display only syntax errors
  • warning: Display syntax and semantic errors, as well as warnings (default)
  • stop: Stop after the first syntax error is found

Different types of validation reports can be written with the -r / --report option. Several types may be specified in the same execution, separated by commas without spaces (e.g. -r summary,database,text).

  • summary: Write a human-readable summary report to a file. It includes one line for each type of error with its number of occurrences, along with the first line where that type of error appears (default)
  • text: Write a human-readable report to a file, with one description line for each VCF line that has an error.
  • database: Write a structured report to a database file. The database engine is SQLite3, so the results can be inspected manually, but they are intended to be consumed by other applications.
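When vcf_validator is driven from a script, its command line can be assembled from the options described in this section. This Python sketch only uses the documented flags -i, -l, -r and -o; the function name and the check on the level value are illustrative assumptions, not part of vcf-validator.

```python
from typing import Iterable, List, Optional

LEVELS = {"error", "warning", "stop"}  # the three documented values of -l


def build_validator_args(input_path: Optional[str] = None,
                         level: Optional[str] = None,
                         reports: Iterable[str] = (),
                         outdir: Optional[str] = None) -> List[str]:
    """Return an argv list for vcf_validator built from the documented options."""
    args = ["vcf_validator"]
    if input_path is not None:
        args += ["-i", input_path]  # without -i the VCF is read from stdin
    if level is not None:
        if level not in LEVELS:
            raise ValueError(f"unknown validation level: {level!r}")
        args += ["-l", level]
    reports = list(reports)
    if reports:
        # Several report types are joined with commas and no spaces.
        args += ["-r", ",".join(reports)]
    if outdir is not None:
        args += ["-o", outdir]
    return args
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting problems with paths that contain spaces.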

Each report is written to its own file, named after the input file and followed by a timestamp. The default output directory is that of the input file if it is provided with -i, or the current directory when reading from standard input; it can be changed with the -o / --outdir option.
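To pick up a report from a script, the naming and directory rules above can be sketched in Python. This assumes, as described, that each report name begins with the input file's name; the .db suffix for database reports is taken from the debugulator example below. Since the report schema is not documented here, list_tables only queries SQLite's own sqlite_master catalogue. All function names are hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import List, Optional


def default_report_dir(input_path: Optional[str]) -> Path:
    """Folder used when -o is not given: the input file's folder, or the cwd for stdin."""
    if input_path is None:
        return Path.cwd()
    return Path(input_path).resolve().parent


def latest_report(outdir: Path, input_name: str, suffix: str = ".db") -> Optional[Path]:
    """Newest file in outdir whose name starts with input_name and ends with suffix.

    Assumption: report names begin with the input file name, as described above.
    """
    candidates = [p for p in Path(outdir).glob(f"{input_name}*{suffix}") if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def list_tables(db_path: Path) -> List[str]:
    """Table names in a SQLite report, read without assuming any schema."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```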

Debugulator

Some simple errors can be fixed automatically, the most common being duplicate variants. The debugulator needs the original VCF and the report generated by a previous run of vcf_validator with the option -r database.

The fixed VCF is written to standard output, which you can redirect to a file; alternatively, use the -o / --output option to specify the desired file name.

The debugulator's logs are written to standard error. They may be redirected to a log file (2>debugulator_log.txt) or discarded completely (2>/dev/null).
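The redirections described above map directly onto Python's subprocess module. This is a minimal sketch assuming vcf_debugulator is on the PATH; the flags -i, -e and -o are the ones used in the debugulator example below, and the helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def debugulator_args(vcf_path: str, errors_db: str,
                     output: Optional[str] = None) -> List[str]:
    """argv for vcf_debugulator, using the flags from the README example."""
    args = ["vcf_debugulator", "-i", vcf_path, "-e", errors_db]
    if output is not None:
        args += ["-o", output]  # without -o the fixed VCF goes to stdout
    return args


def run_debugulator(vcf_path: str, errors_db: str, fixed_path: str,
                    log_path: str) -> int:
    """Write the fixed VCF to fixed_path and the logs to log_path.

    Equivalent to appending `2>log_path` in a shell. Returns the exit status.
    """
    with open(log_path, "w") as log:
        result = subprocess.run(debugulator_args(vcf_path, errors_db, fixed_path),
                                stderr=log)
    return result.returncode
```

Passing stderr=subprocess.DEVNULL instead of the log file mirrors the 2>/dev/null variant.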

Examples

Simple example: vcf_validator -i /path/to/file.vcf

Full example: vcf_validator -i /path/to/file.vcf -l stop -r database,stdout -o /path/to/output/folder/

Debugulator example:

vcf_validator -i /path/to/file.vcf -r database -o /path/to/write/report/
vcf_debugulator -i /path/to/file.vcf -e /path/to/write/report/vcf.errors.timestamp.db -o /path/to/fixed.vcf 2>debugulator_log.txt

Static build (Docker-based)

The easiest way to build vcf-validator is with the Docker image provided with the source code. This creates an executable that can be run on any Linux machine.

  1. Install and configure Docker following their tutorial.
  2. Create the Docker image:
    1. Clone this Git repository: git clone https://github.com/EBIvariation/vcf-validator.git
    2. Move to the folder the code was downloaded to: cd vcf-validator
    3. Build the image: docker build -t ebivariation/vcf-validator docker/. Please replace ebivariation with your user account if you plan to push this image to Docker Hub.
  3. Build the executable running docker run -v ${PWD}:/tmp ebivariation/vcf-validator. Again, replace ebivariation with your user name if necessary.
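The two Docker commands can also be built from a script, for example to substitute your own account name. This Python sketch assumes it is run from the cloned vcf-validator folder, because the build context docker/ is a relative path; ${PWD} is replaced by the absolute path of that folder. The function name is hypothetical.

```python
import os
from typing import List, Tuple


def docker_build_commands(account: str = "ebivariation",
                          source_dir: str = ".") -> Tuple[List[str], List[str]]:
    """Return the `docker build` and `docker run` argv lists from the steps above."""
    image = f"{account}/vcf-validator"
    # The build context docker/ is relative, so run this from the repository root.
    build = ["docker", "build", "-t", image, "docker/"]
    # ${PWD} in the shell version is the repository root, mounted at /tmp.
    run = ["docker", "run", "-v", f"{os.path.abspath(source_dir)}:/tmp", image]
    return build, run
```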

The following executables will be created in the build/bin subfolder:

  • vcf_validator: validation tool
  • vcf_debugulator: automatic fixing tool
  • test_validator and derivatives: testing correct behaviour of the tools listed above

Dynamic build

Note: Please ignore this section if you only want to use the application.

The end-user (Docker-based) build is also perfectly valid during development and produces a static binary. Please follow the instructions below if you would like to generate a dynamically linked binary.

Dependencies

Boost

The dependencies are the Boost core library and the following modules: Boost.filesystem, Boost.program_options, Boost.regex, Boost.log and Boost.system. If you are using Ubuntu, the required packages are libboost-dev, libboost-filesystem-dev, libboost-program-options-dev, libboost-regex-dev and libboost-log-dev.

ODB

You will need to download the ODB compiler, the ODB common runtime library, and the SQLite database runtime library from this page.

ODB requires SQLite3 to be installed. If you are using Ubuntu, the required packages are libsqlite3-0 and libsqlite3-dev.

The easiest way to install the ODB compiler is to download the .deb or .rpm package and install it with your package manager (e.g. dpkg for .deb packages). Both the ODB common runtime and the SQLite database runtime libraries can be installed manually by running ./configure && make && sudo make install, which installs the libraries in /usr/local/lib.

If you don't have root permissions, please run ./configure --prefix=/path/to/odb/libraries/folder to specify which folder to install ODB in, then make && make install, without sudo.
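The manual installation of the ODB runtime libraries is a fixed sequence of commands, which this Python sketch reproduces. It assumes it is run inside each library's unpacked source folder; passing a prefix mirrors the non-root variant above and drops sudo. Function names are hypothetical.

```python
import subprocess
from typing import List, Optional


def odb_install_steps(prefix: Optional[str] = None) -> List[List[str]]:
    """configure, make and make install, with or without a custom prefix."""
    configure = ["./configure"]
    if prefix is not None:
        configure.append(f"--prefix={prefix}")
    # A custom prefix is assumed to be writable, so sudo is not needed.
    install = ["make", "install"] if prefix is not None else ["sudo", "make", "install"]
    return [configure, ["make"], install]


def run_steps(steps: List[List[str]], source_dir: str) -> None:
    """Run each step in source_dir, stopping at the first failure (like &&)."""
    for step in steps:
        subprocess.run(step, cwd=source_dir, check=True)
```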

Compile

The build has been tested on the following compilers:

  • Clang 3.5 to 3.7
  • GCC 4.8 to 5.0

In order to create the build scripts, please run cmake with your preferred generator. For instance, cmake -G "Unix Makefiles" will create Makefiles, and to build the binaries, you will need to run make. If the ODB libraries were not found during the build, please run sudo updatedb && sudo ldconfig.

For those users who need static linkage, the option -DBUILD_STATIC=1 must be provided to the cmake command. Also, if ODB has been installed in a non-default location, the option -DODB_PATH=/path/to/odb/libraries/folder must also be provided to the cmake command.
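The cmake options above can be combined as in this Python sketch. It assumes an in-source build (source folder "."), which is consistent with running bin/test_validator from the project root as described below, but is not stated explicitly; the function name is hypothetical.

```python
from typing import List, Optional


def cmake_args(source_dir: str = ".", generator: str = "Unix Makefiles",
               static: bool = False, odb_path: Optional[str] = None) -> List[str]:
    """argv for the cmake configuration step with the documented options."""
    args = ["cmake", "-G", generator]
    if static:
        args.append("-DBUILD_STATIC=1")
    if odb_path is not None:
        args.append(f"-DODB_PATH={odb_path}")  # only for non-default ODB installs
    args.append(source_dir)
    return args
```

After configuring with the Makefile generator, running make builds the binaries.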

In any case, the following binaries will be created in the bin subfolder:

  • vcf_validator: validation tool
  • vcf_debugulator: automatic fixing tool
  • test_validator and derivatives: testing correct behaviour of the tools listed above

Tests

Unit tests can be run using the binary bin/test_validator or, if the generator supports it, a command like make test. The first option may provide more detailed output in case of test failure.

Note: Tests that require input files will only work when executed with make test or running the binary from the project root folder (not the bin subfolder).
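Because tests that need input files must run from the project root, a script should set the working directory explicitly. A minimal Python sketch, assuming the binary lives in bin/ under the project root as described above; the function name is hypothetical.

```python
import subprocess
from pathlib import Path


def run_unit_tests(project_root: str) -> int:
    """Run bin/test_validator with the project root as working directory.

    Running it from inside bin/ would make tests that read input files fail.
    Returns the exit status of the test binary.
    """
    root = Path(project_root).resolve()
    binary = root / "bin" / "test_validator"
    return subprocess.run([str(binary)], cwd=root).returncode
```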

Generate code from descriptors

Code generated from descriptors should always be up to date in the GitHub repository. If the source descriptors need to be changed, please regenerate the Ragel machines' C code from the .ragel files using:

ragel -G2 src/vcf/vcf_v41.ragel -o inc/vcf/validator_detail_v41.hpp
ragel -G2 src/vcf/vcf_v42.ragel -o inc/vcf/validator_detail_v42.hpp
ragel -G2 src/vcf/vcf_v43.ragel -o inc/vcf/validator_detail_v43.hpp

And the full ODB-based code from the class definitions using:

odb --include-prefix vcf --std c++11 -d sqlite --generate-query --generate-schema --hxx-suffix .hpp --ixx-suffix .ipp --cxx-suffix .cpp --output-dir inc/vcf/ inc/vcf/error.hpp
mv inc/vcf/error-odb.cpp src/vcf/error-odb.cpp
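When several descriptors change, the commands above can be scripted. This Python sketch runs exactly the commands listed, from the project root, and stops at the first failure; it assumes ragel and odb are on the PATH, and the function names are hypothetical.

```python
import os
import shutil
import subprocess
from typing import List

VERSIONS = ("41", "42", "43")  # VCF 4.1, 4.2 and 4.3 grammars


def ragel_commands() -> List[List[str]]:
    """One `ragel -G2` invocation per VCF version, as listed above."""
    return [
        ["ragel", "-G2", f"src/vcf/vcf_v{v}.ragel", "-o", f"inc/vcf/validator_detail_v{v}.hpp"]
        for v in VERSIONS
    ]


ODB_COMMAND = [
    "odb", "--include-prefix", "vcf", "--std", "c++11", "-d", "sqlite",
    "--generate-query", "--generate-schema",
    "--hxx-suffix", ".hpp", "--ixx-suffix", ".ipp", "--cxx-suffix", ".cpp",
    "--output-dir", "inc/vcf/", "inc/vcf/error.hpp",
]


def regenerate(project_root: str = ".") -> None:
    """Run all generators from project_root, then move the ODB source into src/."""
    for cmd in ragel_commands() + [ODB_COMMAND]:
        subprocess.run(cmd, cwd=project_root, check=True)
    shutil.move(os.path.join(project_root, "inc/vcf/error-odb.cpp"),
                os.path.join(project_root, "src/vcf/error-odb.cpp"))
```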
