
maf-lib's Introduction


MAF-LIB: the Mutation Annotation Format Library

Python API library and command line tools for GDC MAF files.

Contributing

Read how to contribute here.

Building

To clone the repository: git clone git@github.com:NCI-GDC/maf-lib.git.

To install locally: python setup.py install.

API

The MAF API can be found in maflib/. Documentation can be generated using your favorite python documentation tool, such as pydoc.

The library includes but is not limited to the following modules:

Module                 Description
maflib.reader          a module for reading MAF files
maflib.writer          a module for writing MAF files
maflib.header          a module for data stored in a MAF header, including but not limited to the version, annotation specification, sort order, and column names
maflib.record          a module to store a single line of the MAF file, or more specifically, the annotation values for a single mutation
maflib.column          a module to store a possibly typed column value for a MAF record
maflib.column_types    a module containing custom types for columns in a MAF file
maflib.column_values   a module containing custom enumeration values for columns in a MAF file
maflib.schemes         a module containing the schemes for a MAF file, determining the number of columns, their names, and their expected values
maflib.sort_order      a module containing the available sort orders to order records in a MAF file
maflib.sorter          a module containing an implementation of a disk-backed sorting system
maflib.validation      a module for the underlying validation of values stored in MAF files
maflib.overlap_iter    a module containing an implementation of an iterator over MAF records that overlap across multiple MAF files
maflib.locatable       a module containing interfaces for "locatable" MAF records, namely those that have a genomic span

The GDC has specific "schemes" that determine the number of columns, their names, and their expected values. Pre-defined schemes can be found in the src/maflib/resources directory. The following schemes are natively supported:

Scheme Name          Type       Inherits From
gdc.1.0.0            Basic      None
gdc.1.0.0-protected  Protected  gdc.1.0.0
gdc.1.0.0-public     Public     gdc.1.0.0-protected
gdc.1.0.1-protected  Protected  gdc.1.0.0
gdc.1.0.1-public     Public     gdc.1.0.1-protected
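
The "Inherits From" column can be read as a parent pointer for each scheme. The sketch below is not maflib code; it copies the table into a plain dictionary and walks the chain of parents. The function name inheritance_chain is hypothetical.

```python
# Inheritance table copied from the scheme list above (child -> parent).
INHERITS_FROM = {
    "gdc.1.0.0": None,
    "gdc.1.0.0-protected": "gdc.1.0.0",
    "gdc.1.0.0-public": "gdc.1.0.0-protected",
    "gdc.1.0.1-protected": "gdc.1.0.0",
    "gdc.1.0.1-public": "gdc.1.0.1-protected",
}


def inheritance_chain(scheme_name):
    """Return the scheme followed by each of its ancestors, nearest first."""
    chain = []
    current = scheme_name
    while current is not None:
        if current not in INHERITS_FROM:
            raise KeyError(f"unknown scheme: {current}")
        chain.append(current)
        current = INHERITS_FROM[current]
    return chain


print(inheritance_chain("gdc.1.0.1-public"))
# ['gdc.1.0.1-public', 'gdc.1.0.1-protected', 'gdc.1.0.0']
```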

maf-lib's People

Contributors

czyszctds, kmhernan, nh13, tzuni

maf-lib's Issues

Reader with any location sorting doesn't allow for passing of contigs

If I read a BarcodeAndCoordinate-sorted MAF whose chromosome order came from a fai file, an exception is thrown because there is no way to pass the fai file to the reader. I think two things should be done:

  • All coordinate sort orders should be able to place a comma-separated list of chromosomes on the sort_order pragma in the header when a fai file was passed; otherwise the default Python-style sorting is used
  • Potentially, add the ability for the reader to accept a fai file and/or to be passed assume_sorted=True
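
As a rough illustration of the first suggestion, the sketch below reads contig names from a fai index and joins them with commas. It relies only on the fai format itself (tab-separated, contig name in the first column). The function names are hypothetical, the index contents are made-up illustrative values, and the exact layout maflib would use for the sort_order pragma is not specified here.

```python
import io


def contigs_from_fai(handle):
    """Return contig names in file order from an open .fai handle."""
    names = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # The first tab-separated field of a .fai line is the contig name.
        names.append(line.split("\t")[0])
    return names


def contig_list_value(contigs):
    """Hypothetical pragma payload: the contigs as a comma-separated list."""
    return ",".join(contigs)


# Illustrative index contents; the numeric fields are placeholders.
example_fai = io.StringIO("chr9\t1000\t6\t60\t61\nchr10\t2000\t1030\t60\t61\n")
contigs = contigs_from_fai(example_fai)
print(contig_list_value(contigs))
```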

๐Ÿ› Error in sorted MAFs

I am getting a ValueError: Records out of order exception on BarcodeAndCoordinate-sorted MAFs at the junction shown below. The comparison seems to return a diff of -1 because it handles chr10 vs chr9 incorrectly.

chr9	133106620	133106621
chr10	13229462	13229473
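
One way to see how these two records can be flagged as out of order: plain string comparison ranks "chr10" before "chr9". The sketch below is not maflib's comparator; it ignores the barcode part of the sort key and assumes the contig order (chr9 before chr10) comes from the fai file. The helper names are hypothetical.

```python
# The two records from the report: (chromosome, start, end).
records = [
    ("chr9", 133106620, 133106621),
    ("chr10", 13229462, 13229473),
]

# Assumed contig order, e.g. as listed in the fai file.
contig_order = ["chr9", "chr10"]
rank = {name: index for index, name in enumerate(contig_order)}


def coordinate_key(record):
    """Sort key that uses the contig's position in contig_order, not its name."""
    chrom, start, end = record
    return (rank[chrom], start, end)


def is_sorted(items, key):
    """Return True when every adjacent pair of keys is in non-decreasing order."""
    keys = [key(item) for item in items]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# String comparison: "chr9" > "chr10" because "9" > "1" at the fourth character.
print(is_sorted(records, key=lambda record: record))  # False
# Contig-rank comparison: chr9 (rank 0) comes before chr10 (rank 1).
print(is_sorted(records, key=coordinate_key))  # True
```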

Sorting error

I was using the sorting function and got this error:

$ cat TCGA.MESO.mutect.182992a2-feb3-45ff-ba4e-45512d5855b9.DR-7.0.protected.sort.logs
2017-10-03 15:04:41,653 - maflib.Sort - INFO - Sorted 10000 records
2017-10-03 15:05:07,162 - maflib.Sort - INFO - Sorted 20000 records
2017-10-03 15:05:32,719 - maflib.Sort - INFO - Sorted 30000 records
2017-10-03 15:05:58,667 - maflib.Sort - INFO - Sorted 40000 records
2017-10-03 15:06:24,762 - maflib.Sort - INFO - Sorted 50000 records
2017-10-03 15:06:51,710 - maflib.Sort - INFO - Sorted 60000 records
2017-10-03 15:07:17,683 - maflib.Sort - INFO - Sorted 70000 records
2017-10-03 15:07:43,562 - maflib.Sort - INFO - Sorted 80000 records
2017-10-03 15:08:09,488 - maflib.Sort - INFO - Sorted 90000 records
2017-10-03 15:08:41,845 - maflib.Sort - INFO - Sorted 100000 records
2017-10-03 15:09:07,368 - maflib.Sort - INFO - Sorted 110000 records
2017-10-03 15:09:32,743 - maflib.Sort - INFO - Sorted 120000 records
2017-10-03 15:09:57,873 - maflib.Sort - INFO - Sorted 130000 records
2017-10-03 15:10:23,483 - maflib.Sort - INFO - Sorted 140000 records
2017-10-03 15:10:49,792 - maflib.Sort - INFO - Sorted 150000 records
2017-10-03 15:11:15,333 - maflib.Sort - INFO - Sorted 160000 records
2017-10-03 15:11:41,610 - maflib.Sort - INFO - Sorted 170000 records
2017-10-03 15:12:06,942 - maflib.Sort - INFO - Sorted 180000 records
2017-10-03 15:12:31,848 - maflib.Sort - INFO - Sorted 190000 records
2017-10-03 15:13:03,320 - maflib.Sort - INFO - Sorted 200000 records
2017-10-03 15:13:28,799 - maflib.Sort - INFO - Sorted 210000 records
2017-10-03 15:13:54,099 - maflib.Sort - INFO - Sorted 220000 records
2017-10-03 15:14:20,160 - maflib.Sort - INFO - Sorted 230000 records
2017-10-03 15:14:47,038 - maflib.Sort - INFO - Sorted 240000 records
2017-10-03 15:15:12,598 - maflib.Sort - INFO - Sorted 250000 records
2017-10-03 15:15:37,455 - maflib.Sort - INFO - Sorted 260000 records
2017-10-03 15:16:02,750 - maflib.Sort - INFO - Sorted 270000 records
2017-10-03 15:16:27,636 - maflib.Sort - INFO - Sorted 280000 records
2017-10-03 15:16:53,133 - maflib.Sort - INFO - Sorted 290000 records
2017-10-03 15:17:24,255 - maflib.Sort - INFO - Sorted 300000 records
2017-10-03 15:17:50,282 - maflib.Sort - INFO - Sorted 310000 records
2017-10-03 15:18:15,539 - maflib.Sort - INFO - Sorted 320000 records
2017-10-03 15:18:41,324 - maflib.Sort - INFO - Sorted 330000 records
2017-10-03 15:19:06,918 - maflib.Sort - INFO - Sorted 340000 records
2017-10-03 15:19:32,443 - maflib.Sort - INFO - Sorted 350000 records
2017-10-03 15:19:57,847 - maflib.Sort - INFO - Sorted 360000 records
2017-10-03 15:20:23,287 - maflib.Sort - INFO - Sorted 370000 records
2017-10-03 15:20:48,656 - maflib.Sort - INFO - Sorted 380000 records
2017-10-03 15:21:13,965 - maflib.Sort - INFO - Sorted 390000 records
2017-10-03 15:21:46,274 - maflib.Sort - INFO - Sorted 400000 records
2017-10-03 15:22:12,174 - maflib.Sort - INFO - Sorted 410000 records
2017-10-03 15:22:19,887 - maflib.Sort - INFO - Sorted 413011 records
Traceback (most recent call last):
  File "/home/ubuntu/.virtualenvs/maflib-p3-prod/bin/maftools", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/.virtualenvs/maflib-p3-prod/lib/python3.5/site-packages/maftools/__main__.py", line 46, in main
    options.func(options)
  File "/home/ubuntu/.virtualenvs/maflib-p3-prod/lib/python3.5/site-packages/maftools/subcommand.py", line 35, in main
    return cls.__main__(options)
  File "/home/ubuntu/.virtualenvs/maflib-p3-prod/lib/python3.5/site-packages/maftools/sort.py", line 78, in __main__
    for record in sorter:
  File "/home/ubuntu/.virtualenvs/maflib-p3-prod/lib/python3.5/site-packages/maflib/sorter.py", line 70, in __iter__
    key_func=self._key_func)
  File "/home/ubuntu/.virtualenvs/maflib-p3-prod/lib/python3.5/site-packages/maflib/sorter.py", line 288, in __init__
    s_iter = _SortedIterator(path=path, codec=codec, key_func=key_func)
  File "/home/ubuntu/.virtualenvs/maflib-p3-prod/lib/python3.5/site-packages/maflib/sorter.py", line 220, in __init__
    self._handle = gzip.open(path, mode="rb", compresslevel=5)
  File "/usr/lib/python3.5/gzip.py", line 53, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/usr/lib/python3.5/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/tmp/tmpypjo842m.gz'
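
The traceback ends in gzip.open on a temporary .gz path that no longer exists when the merge step starts. The report does not establish why the file was missing. For context only, here is a minimal sketch of a disk-backed sort, not maflib's sorter: sorted chunks are spilled to gzip temp files and merged afterwards, so every spilled file has to survive until the merge has finished. All names and the tiny chunk size are hypothetical.

```python
import gzip
import heapq
import os
import tempfile


def _spill(sorted_chunk, tmp_dir):
    """Write one sorted chunk to a gzip temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".gz", dir=tmp_dir)
    os.close(fd)
    with gzip.open(path, mode="wt", compresslevel=5) as handle:
        for line in sorted_chunk:
            handle.write(line + "\n")
    return path


def _read(path):
    """Yield the lines of one spilled chunk."""
    with gzip.open(path, mode="rt") as handle:
        for line in handle:
            yield line.rstrip("\n")


def external_sort(lines, key, max_in_memory=2, tmp_dir=None):
    """Yield lines in sorted order, spilling chunks of max_in_memory lines to disk."""
    paths = []
    chunk = []
    try:
        for line in lines:
            chunk.append(line)
            if len(chunk) >= max_in_memory:
                paths.append(_spill(sorted(chunk, key=key), tmp_dir))
                chunk = []
        if chunk:
            paths.append(_spill(sorted(chunk, key=key), tmp_dir))
        # The merge reads every spilled file lazily, so none may be deleted yet.
        yield from heapq.merge(*(_read(path) for path in paths), key=key)
    finally:
        # Clean up only after the merge is done (or the caller stops early).
        for path in paths:
            if os.path.exists(path):
                os.remove(path)


print(list(external_sort(["10", "9", "2"], key=int)))  # ['2', '9', '10']
```

If anything removes a spilled file between the spill and the merge (for example, cleanup of the temp directory by another process), the merge step fails with the same kind of FileNotFoundError.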

tumor depth column type

The tumor depth column is sometimes 0 from MuTect2 so it should be allowed. In both 1.0.0 and 1.0.1 annotation specs, this:

[ "t_depth", "OneBasedIntegerColumn" ]

should be ZeroBasedIntegerColumn.
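
The practical difference can be sketched as a change in the minimum accepted value. This is not maflib's validation code; it assumes, based only on the type names, that a one-based integer column rejects values below 1 and a zero-based one accepts 0. The function names are hypothetical stand-ins.

```python
def validate_integer_column(name, value, minimum):
    """Return an error message, or None when the value is a valid integer >= minimum."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        return f"{name}: {value!r} is not an integer"
    if number < minimum:
        return f"{name}: {number} is below the minimum of {minimum}"
    return None


def validate_one_based(name, value):
    """Stand-in for a one-based integer column (assumed minimum of 1)."""
    return validate_integer_column(name, value, minimum=1)


def validate_zero_based(name, value):
    """Stand-in for a zero-based integer column (assumed minimum of 0)."""
    return validate_integer_column(name, value, minimum=0)


# A t_depth of 0 fails the one-based check but passes the zero-based one.
print(validate_one_based("t_depth", "0"))
print(validate_zero_based("t_depth", "0"))
```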

setup.py missing

There is no setup.py in the repo. Could you please add one?
