isa-tools / biopy-isatab
Python parser for ISAtab, a biological file format for experimental metadata
License: MIT License
ISA-TAB places no restrictions on category names, so the parser fails whenever a category name is not a valid field name for the namedtuple.
rec = isatab.parse("/Users/agbeltran/workspace/datasets/harvard-CD133")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bcbio/isatab/parser.py", line 57, in parse
rec = s_parser.parse(rec)
File "bcbio/isatab/parser.py", line 192, in parse
["Raw Data File", "Derived Data File", "Image File"])
File "bcbio/isatab/parser.py", line 228, in _parse_study
node.metadata)
File "bcbio/isatab/parser.py", line 246, in _line_keyvals
self._collapse_attributes)
File "bcbio/isatab/parser.py", line 260, in _line_by_type
val = collapse_quals_fn(line, header, hgroups[index])
File "bcbio/isatab/parser.py", line 275, in _collapse_attributes
Attrs = collections.namedtuple('Attrs', names)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 253, in namedtuple
ValueError: Type names and field names cannot start with a number: '57B_antigen_expression_by_IHC'
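One possible workaround, sketched here under assumptions rather than taken from the parser itself, is to normalise category names before they reach `collections.namedtuple`. The helper name `safe_field_names` and the `f_` prefix are hypothetical choices for illustration; the sketch targets Python 3.

```python
import collections
import keyword
import re


def safe_field_names(names):
    """Map arbitrary category names to valid namedtuple field names.

    Hypothetical helper: not part of bcbio.isatab.
    """
    safe = []
    for name in names:
        # Replace every character that is not an ASCII letter, digit, or underscore.
        cleaned = re.sub(r"[^0-9A-Za-z_]", "_", name.strip())
        # namedtuple rejects empty names, names starting with a digit or
        # an underscore, and Python keywords, so prefix those cases.
        if not cleaned or not cleaned[0].isalpha() or keyword.iskeyword(cleaned):
            cleaned = "f_" + cleaned
        safe.append(cleaned)
    return safe


names = ["57B antigen expression by IHC", "organism part"]
Attrs = collections.namedtuple("Attrs", safe_field_names(names))
print(Attrs._fields)
```

This sketch does not deal with repeated names. The standard library also offers `collections.namedtuple(..., rename=True)`, which replaces invalid field names with positional ones such as `_0`, at the cost of losing the original label.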
Please specify the license of this code. Check license compatibility in case parts of the codebase
were taken from other code (ignore the last bit if it was written from scratch). Thanks, Steffen
When trying to install biopy-isatab via pip I run into an AttributeError when it tries to create the install egg. We would like to try using this package in a deployed virtual environment via pip. Can someone please take a look at this?
Thx!
C:\Dropbox\Code\isa-tab>pip install biopy-isatab
Downloading/unpacking biopy-isatab
Downloading biopy-isatab-0.1.tar.gz
Running setup.py egg_info for package biopy-isatab
Traceback (most recent call last):
File "<string>", line 14, in <module>
File "C:\Dropbox\Code\isa-tab\build\biopy-isatab\setup.py", line 16, in <module>
install_requires = [
File "C:\Python27\lib\distutils\core.py", line 152, in setup
dist.run_commands()
File "C:\Python27\lib\distutils\dist.py", line 953, in run_commands
self.run_command(cmd)
File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
cmd_obj.run()
File "<string>", line 12, in replacement_run
File "C:\Python27\lib\site-packages\setuptools\command\egg_info.py", line 254, in find_sources
mm.run()
File "C:\Python27\lib\site-packages\setuptools\command\egg_info.py", line 310, in run
self.read_template()
File "C:\Python27\lib\site-packages\setuptools\command\sdist.py", line 209, in read_template
sys.exc_info()[2].tb_next.tb_frame.f_locals['template'].close()
File "C:\Python27\lib\distutils\text_file.py", line 128, in close
self.file.close ()
AttributeError: 'NoneType' object has no attribute 'close'
Complete output from command python setup.py egg_info:
running egg_info
creating pip-egg-info\biopy_isatab.egg-info
writing pip-egg-info\biopy_isatab.egg-info\PKG-INFO
writing namespace_packages to pip-egg-info\biopy_isatab.egg-info\namespace_packages.txt
writing top-level names to pip-egg-info\biopy_isatab.egg-info\top_level.txt
writing dependency_links to pip-egg-info\biopy_isatab.egg-info\dependency_links.txt
writing manifest file 'pip-egg-info\biopy_isatab.egg-info\SOURCES.txt'
warning: manifest_maker: standard file '-c' not found
reading manifest file 'pip-egg-info\biopy_isatab.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
Traceback (most recent call last):
File "<string>", line 14, in <module>
File "C:\Dropbox\Code\isa-tab\build\biopy-isatab\setup.py", line 16, in <module>
install_requires = [
File "C:\Python27\lib\distutils\core.py", line 152, in setup
dist.run_commands()
File "C:\Python27\lib\distutils\dist.py", line 953, in run_commands
self.run_command(cmd)
File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
cmd_obj.run()
File "<string>", line 12, in replacement_run
File "C:\Python27\lib\site-packages\setuptools\command\egg_info.py", line 254, in find_sources
mm.run()
File "C:\Python27\lib\site-packages\setuptools\command\egg_info.py", line 310, in run
self.read_template()
File "C:\Python27\lib\site-packages\setuptools\command\sdist.py", line 209, in read_template
sys.exc_info()[2].tb_next.tb_frame.f_locals['template'].close()
File "C:\Python27\lib\distutils\text_file.py", line 128, in close
self.file.close ()
AttributeError: 'NoneType' object has no attribute 'close'
----------------------------------------
Command python setup.py egg_info failed with error code 1
Storing complete log in C:\Users\felciano\AppData\Roaming\pip\pip.log
C:\Dropbox\Code\isa-tab>pip install biopy-isatab
Error when parsing Study or Assay files with repeated header names (such as Term Source REF)
The code builds a named tuple to store the attributes spread across multiple columns, and named tuples don't allow duplicate field names.
Some output for a few datasets below.
Error with Yox1 data
Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
from bcbio import isatab
rec = isatab.parse("/Users/agbeltran/workspace/datasets/Yox1")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bcbio/isatab/parser.py", line 57, in parse
rec = s_parser.parse(rec)
File "bcbio/isatab/parser.py", line 192, in parse
["Raw Data File"])
File "bcbio/isatab/parser.py", line 228, in _parse_study
node.metadata)
File "bcbio/isatab/parser.py", line 248, in _line_keyvals
self._collapse_attributes)
File "bcbio/isatab/parser.py", line 260, in _line_by_type
val = collapse_quals_fn(line, header, hgroups[index])
File "bcbio/isatab/parser.py", line 275, in _collapse_attributes
Attrs = collections.namedtuple('Attrs', names)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 259, in namedtuple
ValueError: Encountered duplicate field name: 'Term_Source_REF'
Error with BII-S-6
rec = isatab.parse("/Users/agbeltran/workspace/datasets/BII-S-6")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bcbio/isatab/parser.py", line 57, in parse
rec = s_parser.parse(rec)
File "bcbio/isatab/parser.py", line 185, in parse
["Sample Name", "Comment[ENA_SAMPLE]"])
File "bcbio/isatab/parser.py", line 228, in _parse_study
node.metadata)
File "bcbio/isatab/parser.py", line 248, in _line_keyvals
self._collapse_attributes)
File "bcbio/isatab/parser.py", line 260, in _line_by_type
val = collapse_quals_fn(line, header, hgroups[index])
File "bcbio/isatab/parser.py", line 275, in _collapse_attributes
Attrs = collections.namedtuple('Attrs', names)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 259, in namedtuple
ValueError: Encountered duplicate field name: 'Term_Source_REF'
Error with mtbls2
rec = isatab.parse("/Users/agbeltran/workspace/datasets/mtbls2")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "bcbio/isatab/parser.py", line 57, in parse
rec = s_parser.parse(rec)
File "bcbio/isatab/parser.py", line 192, in parse
["Raw Data File"])
File "bcbio/isatab/parser.py", line 228, in _parse_study
node.metadata)
File "bcbio/isatab/parser.py", line 248, in _line_keyvals
self._collapse_attributes)
File "bcbio/isatab/parser.py", line 260, in _line_by_type
val = collapse_quals_fn(line, header, hgroups[index])
File "bcbio/isatab/parser.py", line 275, in _collapse_attributes
Attrs = collections.namedtuple('Attrs', names)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py", line 259, in namedtuple
ValueError: Encountered duplicate field name: 'Term_Source_REF'
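A minimal sketch of one way around the duplicate-name error, assuming it is acceptable to distinguish repeated headers with a numeric suffix. The helper `dedupe_field_names` is hypothetical and not part of the parser; whether a suffix or some other structure (for example, grouping each Term Source REF with the column it qualifies) is the right design is left open.

```python
import collections


def dedupe_field_names(names):
    """Make repeated names unique by appending _2, _3, ... to later copies.

    Hypothetical helper: not part of bcbio.isatab.
    """
    counts = collections.Counter()
    used = set()
    unique = []
    for name in names:
        counts[name] += 1
        candidate = name if counts[name] == 1 else "%s_%d" % (name, counts[name])
        # Keep counting if the suffixed name collides with one already emitted.
        while candidate in used:
            counts[name] += 1
            candidate = "%s_%d" % (name, counts[name])
        used.add(candidate)
        unique.append(candidate)
    return unique


headers = ["Term_Source_REF", "Term_Accession_Number", "Term_Source_REF"]
Attrs = collections.namedtuple("Attrs", dedupe_field_names(headers))
row = Attrs("OBI", "0000070", "EFO")
print(row)
```

With this approach the first occurrence keeps its original name, so existing attribute access such as `row.Term_Source_REF` would continue to refer to the first column.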
The parser doesn't adequately handle UTF-8 characters in input strings.
There appears to be a fix in a forked repo in this commit: timeu@a2cd7b9, but it still needs testing to confirm it works as expected.
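For reference, here is a minimal sketch of reading a tab-delimited file as UTF-8 in Python 3. It is not the content of the linked commit, which has not been checked here; `read_tab_rows` is a hypothetical name, and the sample file is made up for the demonstration.

```python
import csv
import io
import os
import tempfile


def read_tab_rows(path):
    """Yield the rows of a tab-delimited file, decoding it as UTF-8."""
    # newline="" lets the csv module handle line endings itself.
    with io.open(path, encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            yield row


# Demonstration with a made-up two-line file containing a non-ASCII character.
sample = u"Sample Name\tCharacteristics[dose]\nsample-1\t5 \u00b5g\n"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
try:
    rows = list(read_tab_rows(tmp.name))
finally:
    os.remove(tmp.name)
print(rows)
```

Opening the file with an explicit encoding avoids depending on the platform default, which differs between systems.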
Currently, as stated in the comments, the code is biased towards microarray and next-gen sequencing data. The nodes considered are those containing a "Raw Data File". This needs to be extended to support other assay types.
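One way to approach this, sketched under assumptions, is to make the list of node-defining column headers configurable per assay type instead of fixed. The function `find_node_columns` and the two constant names below are hypothetical, and the mass spectrometry headers are only illustrative examples of what another assay type might use.

```python
def find_node_columns(header, node_types):
    """Return (index, name) pairs for the header columns that define nodes.

    Hypothetical helper: not part of bcbio.isatab.
    """
    wanted = set(node_types)
    return [(i, name) for i, name in enumerate(header) if name in wanted]


# Illustrative configurations; the actual set per assay type is an assumption.
SEQUENCING_NODE_TYPES = ("Raw Data File",)
MASS_SPEC_NODE_TYPES = ("Raw Spectral Data File", "Derived Spectral Data File")

header = ["Sample Name", "Protocol REF", "MS Assay Name", "Raw Spectral Data File"]
print(find_node_columns(header, SEQUENCING_NODE_TYPES))
print(find_node_columns(header, MASS_SPEC_NODE_TYPES))
```

In this example, the header contains no "Raw Data File" column, so the first call finds no nodes, while the second call finds the spectral data column.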