cgatoxford / cgat

Do not use - please refer to our newest code: https://github.com/cgat-developers/cgat-apps

License: BSD 3-Clause "New" or "Revised" License

Python 71.57% C++ 23.20% C 0.43% R 2.57% Perl 0.61% Shell 0.43% Makefile 0.03% Jupyter Notebook 1.14% Dockerfile 0.01%

cgat's Issues

Error compiling Cython files is causing Travis to fail

Error when compiling Cython files is causing Travis to fail
(see https://s3.amazonaws.com/archive.travis-ci.org/jobs/30447970/log.txt)

The error below appears:

cythoning scripts/_bam2peakshape.pyx to scripts/_bam2peakshape.c

Error compiling Cython file:

...
self.shift = shift

#################################################
# bigwig versions
def coverageInInterval( self,
                         Samfile samfile,
                        ^

Documentation - chain2psl

Andreas has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs reformat and usage examples
Expand all areas

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation - fastq2table

Ian has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
just need reformat
Example input and output, describe options

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Pep8 violations - bam2feature.py

Documentation - diff_chains

David has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs example usage and reformat

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Expression.py replicate detection

Currently Expression.py detects whether a data set has replicates by looking at the size of the smallest group. This means that in the situation of "partial replication", data is detected as unreplicated. However, the DESeq manual suggests treating partially replicated data as if it were replicated (pooled variance estimation shares information across conditions anyway, so the unreplicated condition benefits from the replicates in the other condition).

I suggest the following line in Expression.py:

 min_per_group = R('''min(table(groups)) ''')[0]

 has_replicates = min_per_group >= 2

become:

 max_per_group = R('''max(table(groups)) ''')[0]

 has_replicates = max_per_group >= 2
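The difference between the two checks can be illustrated in plain Python, without rpy2. This is only a sketch: has_replicates and its require_all flag are hypothetical names and are not part of Expression.py.

```python
from collections import Counter


def has_replicates(groups, require_all=False):
    """Return True if the design counts as replicated.

    groups: one group label per sample.
    require_all=True mimics the current min() check,
    require_all=False mimics the proposed max() check.
    """
    sizes = Counter(groups).values()
    per_group = min(sizes) if require_all else max(sizes)
    return per_group >= 2


# Partially replicated design: two samples in "control", one in "treated".
partial = ["control", "control", "treated"]
print(has_replicates(partial, require_all=True))   # min-based check
print(has_replicates(partial, require_all=False))  # max-based check
```

With the min-based check the partially replicated design is treated as unreplicated; with the max-based check it is treated as replicated.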

Another alternative would be to return one of three values (unreplicated, partially replicated or replicated) and deal with each case separately (e.g. by enforcing pooled variance estimation), but this is probably unnecessary and the risks inherent in partially replicated data can be left to the user.

Also, when the dispersion method is 'blind', Expression.py fails because it tries to plot dispersion estimates assuming that there are multiple dispersion estimates: it only checks that the method isn't "pooled".

The guilty lines are:

    if dispersion_method == "pooled":
        R.png('''%sdispersion_estimates_pooled.png''' %
              (outfile_prefix))
        R.plotDispEsts(cds)
        R['dev.off']()
    else:
        dispersions = R('''ls(cds@fitInfo)''')
        for dispersion in dispersions:
            R.png('%sdispersion_estimates_%s.png' %
                  (outfile_prefix, dispersion))
        R.plotDispEsts(cds, name=dispersion)
        R['dev.off']()

This fails because, when dispersion_method == "blind", dispersions is empty.
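One possible guard is sketched below in plain Python. It only decides which plots to draw and does not call R. The function name dispersion_plot_files is hypothetical, and falling back to a single unnamed plot when no fit names are available is an assumption; skipping the plot entirely would be another option.

```python
def dispersion_plot_files(outfile_prefix, dispersion_method, fit_names):
    """Return (filename, fit_name) pairs for the dispersion plots to draw.

    fit_names stands in for the result of ls(cds@fitInfo). A fit_name of
    None means "draw a single plot without a name argument".
    """
    if dispersion_method == "pooled" or not fit_names:
        # One plot only: either pooled estimation, or nothing to iterate over
        # (the situation reported for the "blind" method).
        filename = "%sdispersion_estimates_%s.png" % (
            outfile_prefix, dispersion_method)
        return [(filename, None)]
    return [("%sdispersion_estimates_%s.png" % (outfile_prefix, name), name)
            for name in fit_names]


print(dispersion_plot_files("out_", "blind", []))
print(dispersion_plot_files("out_", "per-condition", ["A", "B"]))
```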

I have a pull request ready for the first issue if the change is agreed.

Documentation - gff2psl

Tom has been assigned to look at the documentation for bed2stats, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs minimal level of documentation
Expand options documentation and usage

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation - bed2stats

You've been assigned to look at the documentation for bed2stats, which hasn't been improved before.

Our current notes are:
more detail for options
Example input and output, describe options

Please pass back to charlie and katie for checking when finished.

index_genome verification using a compressed index

index_fasta --verify works fine when both the reference database and comparator are uncompressed. However, if the reference database is compressed the process fails with the traceback:

python ../../scripts/index_fasta.py test3 --verify=test1
# output generated by ../../scripts/index_fasta.py test3 --verify=test1
# job started at Wed Apr  2 18:11:36 2014 on fgu070.anat.ox.ac.uk -- d508d4c7-9139-4a8f-9786-751488bc61ca
# pid: 5334, system: Linux 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 11:13:47 CDT 2013 x86_64
# allow_duplicates                        : False
# benchmark                               : False
# benchmark_fragment_size                 : 1000
# benchmark_num_iterations                : 1000000
# clean_sequence                          : False
# compress_index                          : False
# compression                             : None
# extract                                 : None
# file_format                             : auto
# force                                   : False
# input_format                            : zero-both-open
# loglevel                                : 1
# random_access_points                    : 0
# random_seed                             : None
# regex_identifier                        : None
# stderr                                  : <open file \'<stderr>\', mode \'w\' at 0x7ff52ff4e270>
# stdin                                   : <open file \'<stdin>\', mode \'r\' at 0x7ff52ff4e150>
# stdlog                                  : <open file \'<stdout>\', mode \'w\' at 0x7ff52ff4e1e0>
# stdout                                  : <open file \'<stdout>\', mode \'w\' at 0x7ff52ff4e1e0>
# synonyms                                : None
# timeit_file                             : None
# timeit_header                           : None
# timeit_name                             : all
# translator                              : None
# verify                                  : test1
# verify_fragment_size                    : 100
# verify_num_iterations                   : 100000
verifying test3.gz and test1.fasta using 100000 random segments of length 100
Traceback (most recent call last):
  File "../../scripts/index_fasta.py", line 251, in <module>
    sys.exit(main())
  File "../../scripts/index_fasta.py", line 208, in main
    stdout=options.stdout)
  File "/ifs/devel/Ian/cgat/CGAT/IndexedFasta.py", line 1162, in verify
    contig, strand, start, end = fasta1.getRandomCoordinates(fragment_size)
  File "/ifs/devel/Ian/cgat/CGAT/IndexedFasta.py", line 895, in getRandomCoordinates
    pos_id, pos_seq, lcontig = struct.unpack("QQi", data)

If the reference is uncompressed but the comparator is compressed, then the test starts without error but runs for a very long time (it may never finish).

Ian

pipeline_mapping function buildTranscriptLevelReadCounts

Hello,

When I run this function, it uses the cluster even if I pass --local. I think the issue comes from the statement being a list (in other functions, e.g. buildExonValidation, it is just a string). P.run() uses the cluster when it is given a list.

Is this the normal behavior for this function?

Many Thanks

Cynthia

Documentation - bed2table

Ian has been assigned to look at the documentation for bed2stats, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs reformat and usage examples
Document options and usage example

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation - bam_vs_bam

Antonio has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs reformat and usage examples
Expand usage and example

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

bam_vs_bed

Why is bam_vs_bed called bam_vs_bed when the same job, when performed for gtf files is done using gtf2table -c counts?

Also what is bam_vs_bed doing that coverageBed doesn't do?

set-gene-to-transcript in gtf2gtf.py fails

with error message

Traceback (most recent call last):
File "/ifs/devel/Ian/cgat/scripts/gtf2gtf.py", line 1291, in
sys.exit(main(sys.argv))
File "/ifs/devel/Ian/cgat/scripts/gtf2gtf.py", line 638, in main
gff.gene_id = gff.transcript_id
AttributeError: 'pysam.TabProxies.GTFProxy' object has no attribute 'gene_id'

AssertionError: lca2table scripts/modules - ImportError

Script lca2table can not be imported

Documentation diff_bed

Jethro has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs reformat and usage examples
Expand usage and example

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

AssertionError: pipeline_idr scripts/modules - Exception

pipeline_idr can not be imported

Documentation - fastqs2fastqs

Ian has been assigned to look at the documentation for bed2stats, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
options need documenting

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation - bam2stats

Jethro has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs examples and more option description
Example input and output, describe options

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

PipelineIDR.py - pep8 fails

also:

pipeline_idr.py
pipeline_idr/trackers/IDR.py

pairs_mapped counting in bam2stats.py

bam2stats is counting read_pairs incorrectly.

The value provided is higher than the result of the command:

samtools view -f 64 -F 12 <bam_file> | wc -l

(this requires the alignment to be first in pair, with neither the read nor its mate unmapped)

SAM files may contain multiple alignments per read, so the samtools command counts alignments rather than reads and should therefore provide an upper bound.

For example:

for the file /ifs/projects/proj014/illumina1_4c_run2/Mapping.dir/bwa.dir/91-X-1.bwa.bam,

The samtools method gives 1,758,640 mapped pairs
The bam2stats method gives 1,953,653 mapped pairs
Picard reports 1,955,422 READS_ALIGNED_IN_PAIRS

Given that all three of these methods disagree, at least two of them must be wrong, or they are counting something different.
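To make the comparison concrete, the samtools filter can be expressed directly on SAM flag values. The sketch below works on plain (read name, flag) tuples rather than on a BAM file; passes_filter and count_pairs are hypothetical helper names, and the example flags are made up for illustration.

```python
FIRST_IN_PAIR = 0x40   # samtools -f 64
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8    # 0x4 | 0x8 == 12, as in samtools -F 12


def passes_filter(flag):
    """True if the alignment would pass `samtools view -f 64 -F 12`."""
    return bool(flag & FIRST_IN_PAIR) and not flag & (UNMAPPED | MATE_UNMAPPED)


def count_pairs(alignments):
    """Count passing alignments and the distinct read names among them.

    alignments: iterable of (read_name, flag) tuples.
    """
    names = [name for name, flag in alignments if passes_filter(flag)]
    return len(names), len(set(names))


example = [
    ("read1", 99),    # first in pair, both mates mapped
    ("read1", 355),   # secondary alignment of the same read (99 + 256)
    ("read1", 147),   # second in pair, not counted
    ("read2", 73),    # first in pair, mate unmapped
]
print(count_pairs(example))
```

The first number counts alignments (what `wc -l` sees), the second counts distinct reads; a tool that reports fewer pairs than the first number may be counting reads, while one that reports more must be counting something else.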

--matrix-format in bam2geneprofile

In the last commit to bam2geneprofile, the --matrix-format option seems to have disappeared. Is this deliberate? Is it documented? If it's deliberate, then it needs to be removed from the tests, as they are now failing with the error 'no option --matrix-format'.

Documentation - fastq2fastq

Mike has been assigned to look at the documentation for bed2stats, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
Needs options
Document options and usage example

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation - bed2fasta

Tom has been assigned to look at the documentation for bed2stats, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs example from input and output
Example input and output, describe options, give correct script name (is it bed2stat or bed2table?)

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation gff2gff

Tom has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
maybe expand usage example

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation - fasta2bed

Antonio has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
Looks OK, needs options and example documented
Document options

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

bam2fastq - use output pattern

Currently bam2fastq must be run with three positional arguments - an input bam and an output bam. Should we allow the use of an --output-pattern option, as for fastqs2fastqs?

gtf2table swapping spliced and unspliced reads

Running gtf2table with
gtf2table.py -b bamfile.bam -c read-counts -L /dev/null --library-type firststrand

is producing output where the number reported as counted_splice is the number of unspliced reads and the number reported as counted_unspliced is the number of spliced reads.

Unique read filtering for bowtie

Someone has added a section to the preprocessing step for the bowtie mapper in PipelineMapping that references unique_cmd without setting it first.

To get unique reads out of bowtie it is better to set up bowtie to only report unique matches with -m1 rather than to post-filter.

Can we please remove the unique filtering from the Bowtie post processing?

Documentation - beds2beds

Mike has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs reformat and usage examples
Expand all areas

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation - gtfs2tsv

Mike has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs examples and more option description
Example input and output, describe options

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation - chain2stats

Jethro has been assigned to look at the documentation for bed2stats, which hasn't been improved before.

Our current notes are:
needs examples and more option description
Example input and output, describe options

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

PipelineMetagenomeAssembly.py - pep8 fails

also:

pipeline_metagenomecommunities.py
/pipeline_metagenomeassembly.py
pipeline_docs/pipeline_metagenomecommunities/trackers/Kegg.py
pipeline_docs/pipeline_metagenomecommunities/trackers/Lca.py
pipeline_docs/pipeline_metagenomecommunities/trackers/Metaphlan.py

issue with option without_cluster

Hello,

I tried to change the option without_cluster (without_cluster=0) in the pipeline.ini file (in my working directory) to run pipeline_annotations.py. The head of my pipeline.ini is:

# Test specific parameters for test_annotations
# Use pre-indexed fasta file with only hg19.chr19
[general]
# location of indexed genome
genome=hg19
# location of indexed genome
genome_dir=./
[ensembl]
without_cluster=0

However, pipeline_annotations.py still tries to use the grid:

(cgat-venv)-bash-4.1$ python /net/isi-scratch/cynthia/CGAT-DEPS/cgat/CGATPipelines/pipeline_annotations.py make full

# output generated by /net/isi-scratch/cynthia/CGAT-DEPS/cgat/CGATPipelines/pipeline_annotations.py make full
# job started at Mon Sep 29 11:13:42 2014 on fgu217.anat.ox.ac.uk -- ad189190-ad51-4288-8f1a-5871769fb935
# pid: 75794, system: Linux 2.6.32-431.11.2.el6.x86_64 #1 SMP Mon Mar 3 13:32:45 EST 2014 x86_64
# cluster_num_jobs : None
# cluster_options : None
# cluster_parallel_environment : None
# cluster_priority : None
# cluster_queue : None
# debug : False
# dry_run : False
# exceptions_terminate_immediately : False
# force : False
# log_exceptions : False
# logfile : pipeline.log
# loglevel : 1
# multiprocess : 2
# pipeline_action : None
# pipeline_format : svg
# pipeline_targets : []
# random_seed : None
# stderr : <open file '', mode 'w' at 0x7f89481491e0>
# stdin : <open file '', mode 'r' at 0x7f89481490c0>
# stdlog : <open file '', mode 'w' at 0x7f8948149150>
# stdout : <open file '', mode 'w' at 0x7f8948149150>
# terminate : None
# timeit_file : None
# timeit_header : None
# timeit_name : all
# variables_to_set : []
# without_cluster : False
2014-09-29 11:13:42,098 INFO # output generated by /net/isi-scratch/cynthia/CGAT-DEPS/cgat/CGATPipelines/pipeline_annotations.py make full
# job started at Mon Sep 29 11:13:42 2014 on fgu217.anat.ox.ac.uk -- ad189190-ad51-4288-8f1a-5871769fb935
# pid: 75794, system: Linux 2.6.32-431.11.2.el6.x86_64 #1 SMP Mon Mar 3 13:32:45 EST 2014 x86_64
2014-09-29 11:13:42,099 INFO code location: /net/isi-scratch/cynthia/CGAT-DEPS/cgat/scripts

I was wondering how to force pipeline_annotations.py (I have the same issue with other pipelines) not to use the grid and, more generally, whether my way of modifying the options was correct.

I use the latest version of pipeline_annotations (updated yesterday). However, I cannot say exactly which version I use, because the --version option gives me:
pipeline_annotations.py version: $Id$

Many Thanks for your help

Cynthia

Inconsistency between the quality encoding options in fastq2fastq.py and Fastq.py

fastq2fastq.py accepts sanger, solexa, phred64 and integer,
Fastq.py accepts sanger, solexa, phred64 and illumina-1.8

Not sure how these relate to one another, but it's not currently possible to use either --change-format, or --guess-format with the option illumina-1.8.
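For orientation, the four named encodings differ mainly in their ASCII offset: Sanger and Illumina 1.8+ use Phred+33, while solexa and phred64 (Illumina 1.3 to 1.7) use an offset of 64, with Solexa scores additionally being on a different scale. The mapping below is a sketch; QUALITY_OFFSETS and decode_qualities are hypothetical names and are not taken from Fastq.py or fastq2fastq.py.

```python
# ASCII offsets of the encodings named above. "integer" is left out because
# its meaning in fastq2fastq.py is not described in this issue.
QUALITY_OFFSETS = {
    "sanger": 33,
    "illumina-1.8": 33,
    "solexa": 64,
    "phred64": 64,
}


def decode_qualities(quality_string, encoding):
    """Convert a FASTQ quality string into a list of integer scores."""
    offset = QUALITY_OFFSETS[encoding]
    return [ord(char) - offset for char in quality_string]


print(decode_qualities("IIII", "illumina-1.8"))
print(decode_qualities("hhhh", "phred64"))
```

Under this view, illumina-1.8 could be accepted wherever sanger is accepted, since both share the same offset.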

Documentation - gtf2tsv

Andreas has been assigned to look at the documentation for bed2stats, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs minimal level of documentation
Example input and output, describe options

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

runGO.py

I was trying to run runGO.py so that it would return an empirical p-value. However, the computeFDRs function in GO.py attempts to call getSamples() with gene2go as an argument. gene2go is not passed as an argument to computeFDRs, however, and this causes runGO.py to fail with the following error:

Traceback (most recent call last):
File "/ifs/devel/nicki/cgat/GO.py", line 2229, in
sys.exit( main() )
File "/ifs/devel/nicki/cgat/GO.py", line 2058, in main
fdrs, samples, method = computeFDRs( go_results, options, test_ontology )
File "/ifs/devel/nicki/cgat/GO.py", line 1342, in computeFDRs
samples, simulation_min_pvalues = getSamples( gene2go, genes, background, options, test_ontology )
NameError: global name 'gene2go' is not defined

The script works fine if the empirical p is not required by passing e.g.
runGO.py --filename-input=gene2go.map
--genes=foreground.tsv
--background=background.tsv
-q BH
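The failure pattern in the traceback can be reproduced in isolation. The sketch below uses simplified stand-ins (get_samples, compute_fdrs_broken and compute_fdrs_fixed are hypothetical and do not reproduce the real signatures in GO.py); it only shows that passing the mapping in explicitly would avoid the NameError.

```python
def get_samples(gene2go, genes):
    """Stand-in for getSamples(): count the genes that have an annotation."""
    return sum(1 for gene in genes if gene in gene2go)


def compute_fdrs_broken(genes):
    # gene2go is neither a parameter nor a module-level name here,
    # so this lookup raises NameError when the function is called.
    return get_samples(gene2go, genes)


def compute_fdrs_fixed(genes, gene2go):
    # Passing the mapping in explicitly avoids the NameError.
    return get_samples(gene2go, genes)


annotations = {"geneA": ["GO:0000001"]}
print(compute_fdrs_fixed(["geneA", "geneB"], annotations))
```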

Best wishes

Nick

Documentation - bed2gff

David has been assigned to look at the documentation for bed2stats, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs reformat and usage examples
Example input and output, usage recommendation

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation - gff2histogram

Ian has been assigned to look at the documentation for the above script, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs usage examples and reformat

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

Documentation - gff2bed

David has been assigned to look at the documentation for bed2stats, which hasn't been improved before. This doesn't mean that there is anything wrong, just that it hasn't been checked.

Our current notes are:
needs minimal level of documentation
Expand options documentation and usage

Please see the documentation style guide.

Please pass back to charlie and katie for checking when finished.

duplicate entry names in index_fasta

I'm going through and writing tests. Then I'm going to do my assigned documentation. Any bugs I come across, I'm going to submit here. When I finish, I will go back through these and try to fix them as necessary if no one else has.

index_fasta has an option --allow-duplicates, which should allow the same contig name:

From the help:

--allow-duplicates    allow duplicate identifiers. Further occurances of an
                        identifier are suffixed by an '_%i' [default=False].

however if I run

python ../../scripts/index_fasta.py test5 chrI.fa chr1.fa --allow-duplicates

in the index_fasta tests directory, where chrI.fa and chr1.fa both contain chromosomes called chrI, the resulting indexed genome only contains a single entry for chrI.
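The behaviour described in the help text could look roughly like the sketch below. The function name make_unique is hypothetical, and starting the suffix at 1 is an assumption; the help text only states that further occurrences are suffixed by an '_%i'.

```python
from collections import defaultdict


def make_unique(identifiers):
    """Suffix repeated identifiers with '_%i' so that every name is distinct."""
    seen = defaultdict(int)
    result = []
    for identifier in identifiers:
        count = seen[identifier]
        seen[identifier] += 1
        if count == 0:
            result.append(identifier)
        else:
            # Second and later occurrences get a numeric suffix.
            result.append("%s_%i" % (identifier, count))
    return result


print(make_unique(["chrI", "chrI", "chrII"]))
```

Note that this simple version does not guard against a collision with an input that is already called, for example, chrI_1.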

Ian

PipelineIDR.py - changes to sample naming requirements using PipelineTracks

PipelineIDR.callerIDRPeaks makes use of PipelineTracks when generating input file ids for peakcalling.

In commit 774232e the run method in callerIDRPeaks switched from using PipelineTracks.Sample3 to PipelineTracks.AutoSample. This changed attribute names of the resulting sample object from 'tissue' 'condition' 'replicate' to 'attribute%i'.

However, elsewhere in the code - in method getControlFile() - sample attributes were still being edited using original nomenclature, see below:

def getControlfile( self, track ):
    """
    Return appropriate input file for a track.
    For pooled tracks (R0), will always return a pooled input file.
    If options is set to pooled, will return a pooled input for all tracks.
    If options is set to single, will return first input replicate (R1).
    If options is set to matched, will return input with matching replicate.
    Otherwise will return ValueError
    """
    n = track.clone()
    n.condition = "input" # is hardcoded into regex for ruffus tasks

The result of this bug is that since 28/04/2014, pipeline_idr.py has been calling peaks against the wrong input files (when using default parameters... this will be a pooled sample file rather than the pooled input file).

This bug is easy to fix, but it will have affected all pipeline runs since April.

To check whether your run is affected, look in each of the peakfiles_* directories. If the resulting narrowpeak files are named "tissue-condition-replicate_VS_tissue-input-R0", then everything is okay. If they are named "--VS--R0", then you will need to re-run the pipeline once I've pushed the bug fix.

bam2geneprofile issue

Hi,
I have installed cgat on my Ubuntu machine, and when I run bam2geneprofile, it says:

Traceback (most recent call last):
File "/usr/local/bin/cgat", line 9, in
load_entry_point('CGAT==0.2.1', 'console_scripts', 'cgat')()
File "/usr/local/lib/python2.7/site-packages/CGATScripts/cgat.py", line 126, in main
module = imp.load_module(command, file, pathname, description)
File "/usr/local/lib/python2.7/site-packages/CGATScripts/bam2geneprofile.py", line 314, in
import CGAT._bam2geneprofile as _bam2geneprofile
File "_bam2geneprofile.pyx", line 10, in init CGAT._bam2geneprofile (scripts/_bam2geneprofile.c:26858)
File "/usr/local/lib/python2.7/site-packages/CGAT/Stats.py", line 43, in
from rpy2.robjects import r as R
File "/usr/local/lib/python2.7/site-packages/rpy2/robjects/__init__.py", line 20, in
from rpy2.robjects.functions import Function, SignatureTranslatedFunction
File "/usr/local/lib/python2.7/site-packages/rpy2/robjects/functions.py", line 5, in
from rpy2.robjects import help
File "/usr/local/lib/python2.7/site-packages/rpy2/robjects/help.py", line 12, in
from rpy2.robjects.packages_utils import get_packagepath, _libpaths, _packages
File "/usr/local/lib/python2.7/site-packages/rpy2/robjects/packages_utils.py", line 9, in
_find_package = rinterface.baseenv['find.package']
LookupError: 'find.package' not found

Does anyone know how to solve this? Thanks a lot!

Expression.py multiple groups in edgeR

DESeq tests multiple groups by doing all pairwise comparisons. EdgeR does it via an ANOVA-like test. This works fine. However, EdgeR then tries to do an MA plot and passes a list of conditions to plotSmear:

    # output differences between pairs
    R.png('''%(outfile_prefix)smaplot.png''' % locals())
    R('''plotSmear( countsTable, pair=c('%s') )''' % "','".join(groups))
    R('''abline( h = c(-2,2), col = 'dodgerblue') ''')
    R['dev.off']()

This fails if there are more than two groups because (understandably) plotSmear requires two groups and will fail with more than two.
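One way around this would be to draw one MA plot per pair of groups. The sketch below only builds the R command strings and does not call rpy2; plot_smear_commands is a hypothetical helper, and each command would still need its own png device and output file.

```python
from itertools import combinations


def plot_smear_commands(groups):
    """Build one plotSmear call per pair of groups, as R code strings."""
    # Drop repeated labels while keeping their original order.
    unique_groups = list(dict.fromkeys(groups))
    commands = []
    for first, second in combinations(unique_groups, 2):
        commands.append(
            "plotSmear( countsTable, pair=c('%s','%s') )" % (first, second))
    return commands


for command in plot_smear_commands(["control", "treatedA", "treatedB"]):
    print(command)
```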

bam2bed improvements

bam2bed has a number of shortcomings:

It won't operate on sam files, only bam files when there is no real reason for this.
--region and --merge-pairs cannot both be specified. You can use samtools and pipe into bam2bed, but then you get fragments where read one is in the region, irrespective of where the mate is. Thus fragments that overlap the region only at one end may or may not be output, depending on which end it is that overlaps.
With --merge-pairs, fragment ends are estimated as the start of the second-in-pair read + the length of the first-in-pair read. This can lead to serious inaccuracy.
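The size of the error in the last point can be shown with simple arithmetic. This is a sketch with made-up coordinates: both function names are hypothetical, and it assumes read 2 is the downstream mate on the forward strand.

```python
def estimated_fragment_end(read2_start, read1_length):
    """Fragment end as described above: start of read 2 plus length of read 1."""
    return read2_start + read1_length


def aligned_fragment_end(read2_start, read2_reference_length):
    """Fragment end taken from the span read 2 actually covers on the reference."""
    return read2_start + read2_reference_length


# Hypothetical pair: both reads are 100 bases long, but read 2 is spliced
# across a 500 bp intron, so it spans 600 reference bases.
print(estimated_fragment_end(10000, 100))  # 10100
print(aligned_fragment_end(10000, 600))    # 10600
```

The same kind of discrepancy arises, on a smaller scale, from deletions, soft clipping or mates of different lengths after trimming.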

bedtools has bamtobed, which does most of what bam2bed will do, but without the above shortcomings.

Should we improve bam2bed or deprecate it in favor of bamtobed?

bam2stats.py - error compiling cython file and outfile_details keyword error

Running bam2stats.py results in the following error:
cat rnaseq_test.bam | python /ifs/devel/katherinef/cgat/scripts/bam2stats.py -

Error compiling Cython file:

...
cdef int x
cdef int lflags = len(FLAGS)
cdef int f

# detailed counting
cdef FastqProxy fq
    ^

/ifs/devel/katherinef/cgat/scripts/_bam2stats.pyx:77:9: 'FastqProxy' is not a type identifier

Error compiling Cython file:

...
# Alternatives to dictionary
#1. POSIX hash tables (hsearch,...) or trees: very slow
#2. custom hash implementation: worth the effort?
#3. Sorted list and binary search: too slow for many lookups
reads = {}
fastqfile = Fastqfile( filename_fastq )

^

/ifs/devel/katherinef/cgat/scripts/_bam2stats.pyx:93:29: undeclared name not builtin: Fastqfile

gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE= -I/ifs/apps/apps/python-2.7.1/lib/python2.7/site-packages/numpy/core/include -I/ifs/apps/apps/python-2.7.1/lib/python2.7/site-packages/pysam-0.7.5-py2.7-linux-x86_64.egg/pysam -I/ifs/apps/apps/python-2.7.1/lib/python2.7/site-packages/pysam-0.7.5-py2.7-linux-x86_64.egg/pysam/include/samtools -I/ifs/apps/apps/python-2.7.1/lib/python2.7/site-packages/pysam-0.7.5-py2.7-linux-x86_64.egg/pysam/include/tabix -I/ifs/apps/apps/python-2.7.1/include/python2.7 -c /ifs/devel/katherinef/cgat/scripts/_bam2stats.c -o /ifs/home/katherinef/.pyxbld/temp.linux-x86_64-2.7/ifs/devel/katherinef/cgat/scripts/_bam2stats.o

/ifs/devel/katherinef/cgat/scripts/_bam2stats.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.

# output generated by /ifs/devel/katherinef/cgat/scripts/bam2stats.py -
# job started at Mon Sep 30 13:35:37 2013 on cgat150.anat.ox.ac.uk -- f4a95c1f-fa93-4019-8346-723a2d4e4b89
# pid: 30412, system: Linux 2.6.32-358.11.1.el6.x86_64 #1 SMP Wed May 15 10:48:38 EDT 2013 x86_64
# filename_fastq : None
# filename_rna : None
# force_output : False
# input_reads : 0
# loglevel : 1
# output_details : False
# output_filename_pattern : %s
# output_force : False
# remove_rna : False
# stderr : <open file '', mode 'w' at 0x2b7b357f1270>
# stdin : <open file '', mode 'r' at 0x2b7b357f1150>
# stdlog : <open file '', mode 'w' at 0x2b7b357f11e0>
# stdout : <open file '', mode 'w' at 0x2b7b357f11e0>
# timeit_file : None
# timeit_header : None
# timeit_name : all

Traceback (most recent call last):
File "/ifs/devel/katherinef/cgat/scripts/bam2stats.py", line 480, in
sys.exit( main( sys.argv) )
File "/ifs/devel/katherinef/cgat/scripts/bam2stats.py", line 319, in main
outfile_details = outfile_details )
File "_bam2stats.pyx", line 21, in CGAT._bam2stats.count (scripts/_bam2stats.c:1324)
cdef struct CountsType:
TypeError: count() got an unexpected keyword argument 'outfile_details'

test bam is in /ifs/projects/katherinef/
