bigsi's Issues

Cannot Use the all-microbial FTP Index for Searching

Hi,
I have downloaded the all-microbial-bigsi-v03* files from the FTP site at ftp://ftp.ebi.ac.uk/pub/software/bigsi/nat_biotech_2018/all-microbial-index-v03/ and concatenated them into a single combined-index file, which I then referenced using the template config from the same FTP site:

h: 3
m: 25000000
nproc: 4
k: 31
storage-engine: berkeleydb
storage-config:
  filename: /media/disk2/combined-index # cat * > combined-index
  flag: "c" ## Change to 'r' for read-only access

However, I get the following error when I run a search with this config:

(base) joe@fractal:~/BIGSI$ bigsi search -c config.yaml CGGCGAGGAAGCGTTAAATCTCTTTCTGACG
Traceback (most recent call last):
  File "/home/joe/anaconda3/bin/bigsi", line 33, in <module>
    sys.exit(load_entry_point('bigsi==0.3.8', 'console_scripts', 'bigsi')())
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/__main__.py", line 402, in main
  File "/home/joe/anaconda3/lib/python3.8/site-packages/hug/api.py", line 441, in __call__
    result = self.commands.get(command)()
  File "/home/joe/anaconda3/lib/python3.8/site-packages/hug/interface.py", line 650, in __call__
    raise exception
  File "/home/joe/anaconda3/lib/python3.8/site-packages/hug/interface.py", line 646, in __call__
    result = self.output(self.interface(**pass_to_function), context)
  File "/home/joe/anaconda3/lib/python3.8/site-packages/hug/interface.py", line 129, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/__main__.py", line 283, in search
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/__main__.py", line 66, in search_bigsi
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/bigsi.py", line 181, in search
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/bigsi.py", line 196, in exact_filter
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/bigsi.py", line 208, in get_sample_list
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/metadata.py", line 70, in colours_to_samples
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/metadata.py", line 71, in <dictcomp>
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/metadata.py", line 59, in colour_to_sample
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/graph/metadata.py", line 96, in _get_string
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/storage/base.py", line 84, in get_string
  File "/home/joe/anaconda3/lib/python3.8/site-packages/bigsi-0.3.8-py3.8.egg/bigsi/storage/base.py", line 21, in __getitem__
KeyError: b'metadata:447931:string'

Are you able to provide any guidance on what I might be doing wrong?
Thanks for your work on a really fascinating research paper.
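
One thing that can be ruled out cheaply is a truncated download or an incomplete concatenation. The sketch below does not diagnose the KeyError above; it only joins the parts and checks that the combined file has the expected size. It assumes the parts are meant to be joined in sorted filename order (roughly what the shell glob in `cat *` produces), and the function names and file names are hypothetical.

```python
import os
import shutil
import tempfile


def concatenate_parts(part_paths, combined_path):
    """Append each part to combined_path in the given order."""
    with open(combined_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)


def sizes_match(part_paths, combined_path):
    """Return True if the combined file is exactly as large as all parts together."""
    expected = sum(os.path.getsize(p) for p in part_paths)
    return expected == os.path.getsize(combined_path)


if __name__ == "__main__":
    # Demo with small temporary files standing in for the downloaded parts.
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for name, payload in [("part-aa", b"abc"), ("part-ab", b"defg")]:
            path = os.path.join(tmp, name)
            with open(path, "wb") as handle:
                handle.write(payload)
            parts.append(path)
        combined = os.path.join(tmp, "combined-index")
        concatenate_parts(sorted(parts), combined)
        print(sizes_match(parts, combined))
```

If the sizes match, the problem more likely lies elsewhere, for example in the part order or in a version mismatch between the index and the installed BIGSI; both are guesses rather than confirmed causes.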

Updated conda package

Could an updated conda package please be built from this branch of the code? The one currently available is over a year old.

Does BIGSI support building Bloom filters directly from sequence files?

To build a BIGSI index for a large dataset, I would like to know whether BIGSI supports building Bloom filters directly from sequence files (FASTA/FASTQ). From my reading of the code, I could not find anything related to this except a statement in the docstring of the bloom function. In general, BIGSI seems to require Cortex graphs to build Bloom filters. Could you please clarify?

Thank you!
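
To make the question concrete, here is a toy sketch of what building a Bloom filter straight from FASTA records could look like. It is not BIGSI's implementation: the SHA-256 based hashing is a stand-in, reverse complements (canonical k-mers) are ignored, and every function name is hypothetical.

```python
import hashlib


def read_fasta(lines):
    """Yield one sequence string per FASTA record from an iterable of lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, upper-cased."""
    seq = seq.upper()
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def bit_positions(kmer, m, h):
    """Map a k-mer to h bit positions in [0, m) using seeded SHA-256 (stand-in hash)."""
    for seed in range(h):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % m


def build_bloom(seqs, k, m, h):
    """Return a bytearray of m bits with the positions of all k-mers set."""
    bits = bytearray((m + 7) // 8)
    for seq in seqs:
        for kmer in kmers(seq, k):
            for pos in bit_positions(kmer, m, h):
                bits[pos // 8] |= 1 << (pos % 8)
    return bits


def might_contain(bits, kmer, m, h):
    """True if every bit for kmer is set (false positives are possible)."""
    return all(bits[pos // 8] & (1 << (pos % 8))
               for pos in bit_positions(kmer.upper(), m, h))


if __name__ == "__main__":
    records = [">sample1", "ACGTACGTAC", ">sample2", "GGGTTTAAAC"]
    bloom = build_bloom(read_fasta(records), k=4, m=1024, h=3)
    print(might_contain(bloom, "ACGT", m=1024, h=3))
```

A filter built this way would only be usable in an index if its hash functions and parameters matched what BIGSI expects, which this sketch does not attempt.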

pyfasta.fasta.FastaNotFound: "True"

Hi all,

When running:

bigsi bulk_search -c config_10K_00.yaml -f csv -t 0.0 --score True foo.fas

using the files from this paper:
https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000499

I get this error:

Traceback (most recent call last):
  File "/usr/local/bin/bigsi", line 11, in <module>
    load_entry_point('bigsi==0.3.5', 'console_scripts', 'bigsi')()
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.5-py3.7.egg/bigsi/__main__.py", line 307, in main
  File "/usr/local/lib/python3.7/dist-packages/hug/api.py", line 399, in __call__
    result = self.commands.get(command)()
  File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 546, in __call__
    raise exception
  File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 542, in __call__
    result = self.output(self.interface(**pass_to_function), context)
  File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 100, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.5-py3.7.egg/bigsi/__main__.py", line 259, in bulk_search
  File "/usr/local/lib/python3.7/dist-packages/pyfasta/fasta.py", line 67, in __init__
    raise FastaNotFound('"' + fasta_name + '"')
pyfasta.fasta.FastaNotFound: "True"

Looking at the code, --score should accept True or False.

Without --score it works properly.

Where am I going wrong?

Thank you in advance,
Alex
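
The traceback shows pyfasta being asked to open a file named "True", which would be consistent with --score being a switch that takes no value, so that True is read as the FASTA path. Whether BIGSI's hug-based CLI really parses it this way is an assumption. The sketch below reproduces the same effect with argparse from the standard library; it is an analogy, not BIGSI's actual parser.

```python
import argparse


def build_parser():
    """Hypothetical CLI with one positional FASTA path and a boolean switch."""
    parser = argparse.ArgumentParser(prog="bulk_search_demo")
    parser.add_argument("fasta")
    parser.add_argument("--score", action="store_true")
    return parser


# "--score" consumes no value, so "True" becomes the positional FASTA path
# and "foo.fas" is left over as an unrecognised extra argument.
args, extras = build_parser().parse_known_args(["--score", "True", "foo.fas"])
print(args.fasta, args.score, extras)  # True True ['foo.fas']

# Passing the switch without a value keeps the real path in place.
args, extras = build_parser().parse_known_args(["--score", "foo.fas"])
print(args.fasta, args.score, extras)  # foo.fas True []
```

If that is what is happening here, passing --score with no value after it might be worth trying.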

Improve testing framework

This is the test coverage report from PR #1, which adds more tests to BIGSI, sorted by source file with lowest test coverage to highest:
Coverage report.pdf
We should improve the test coverage in many files.

I wonder whether you find the tests cumbersome to run because they use real DBs. One option would be to mock the DBs, but it is nice to have tests that use real DBs as well. Many of the tests are actually integration tests: they exercise not only the function or method under test but also everything it calls. I could start adding unit tests with mocking if you wish, although integration tests may be more complete, if less specific, than unit tests.

Anyway, this issue is meant as a place to discuss how we should improve the testing framework.
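
As a starting point for that discussion, here is a minimal sketch of what a mocked unit test could look like. The function `colour_to_sample` below is a hypothetical stand-in written for this example, not BIGSI's code; only the key layout (`metadata:<colour>:string`) is borrowed from a traceback in another issue.

```python
from unittest import mock


def colour_to_sample(storage, colour):
    """Hypothetical helper: look up the sample name stored for a colour index."""
    return storage["metadata:%d:string" % colour]


def test_colour_to_sample_with_mocked_storage():
    # The mock replaces the real DB, so the test checks only the key
    # construction and the return value, not the storage backend.
    storage = mock.MagicMock()
    storage.__getitem__.return_value = "s1"
    assert colour_to_sample(storage, 0) == "s1"
    storage.__getitem__.assert_called_once_with("metadata:0:string")


def test_colour_to_sample_with_dict_storage():
    # A plain dict behaves like a tiny in-memory backend, closer to an
    # integration test but still without a real DB on disk.
    storage = {"metadata:7:string": "s8"}
    assert colour_to_sample(storage, 7) == "s8"


if __name__ == "__main__":
    test_colour_to_sample_with_mocked_storage()
    test_colour_to_sample_with_dict_storage()
    print("ok")
```

The mocked test pins down one function's behaviour, while tests against real DBs would still be needed to catch problems in the storage layer itself.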

bigsi search failed when using docker

Hi,

I tried to follow the manual at https://bigsi.readme.io/docs/your-first-bigsi and run BIGSI in the Docker image (phelimb/bigsi:63768c2).

I got the following error in the search step:

❯ docker run -v $PWD/example-data:/data phelimb/bigsi:63768c2 bigsi search --config /data/configs/berkeleydb.yaml CGGCGAGGAAGCGTTAAATCTCTTTCTGACG
Traceback (most recent call last):
  File "/usr/local/bin/bigsi", line 11, in <module>
    load_entry_point('bigsi==0.3.2', 'console_scripts', 'bigsi')()
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/__main__.py", line 178, in main
  File "hug/api.py", line 390, in hug.api.CLIInterfaceAPI.__call__
  File "hug/interface.py", line 551, in hug.interface.CLI.__call__
  File "hug/interface.py", line 547, in hug.interface.CLI.__call__
  File "hug/interface.py", line 100, in hug.interface.Interfaces.__call__
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/__main__.py", line 158, in search
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/graph/bigsi.py", line 133, in __init__
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/graph/index.py", line 23, in __init__
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/matrix/bitmatrix.py", line 16, in __init__
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/storage/base.py", line 67, in get_integer
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.2-py3.7.egg/bigsi/storage/base.py", line 21, in __getitem__
  File "/usr/local/lib/python3.7/dist-packages/bsddb3/__init__.py", line 239, in __getitem__
    return _DeadlockWrap(lambda: self.db[key])  # self.db[key]
  File "/usr/local/lib/python3.7/dist-packages/bsddb3/dbutils.py", line 67, in DeadlockWrap
    return function(*_args, **_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/bsddb3/__init__.py", line 239, in <lambda>
    return _DeadlockWrap(lambda: self.db[key])  # self.db[key]
KeyError: b'number_of_rows:int'

Here is the output of the build step:

❯ docker run -v $PWD/example-data:/data phelimb/bigsi:63768c2 bigsi build --config /data/configs/berkeleydb.yaml /data/test1.bloom /data/test2.bloom -s s1 -s s2
INFO:bigsi.cmds.build:Building index: 0/1
DEBUG:bigsi.cmds.build:Loading /data/test1.bloom/test1.bloom
DEBUG:bigsi.cmds.build:Loading /data/test2.bloom/test2.bloom
DEBUG:bigsi.graph.bigsi:Insert sample metadata
DEBUG:bigsi.graph.bigsi:Create signature index
DEBUG:bigsi.graph.index:Transpose bitarrays
DEBUG:bigsi.graph.index:Insert rows
DEBUG:bigsi.storage.base:set bitarrays

I also tried the latest Docker image, phelimb/bigsi:310ef4c, but it failed at the build step:

❯ docker run -v $PWD/example-data:/data phelimb/bigsi:310ef4c bigsi build --config /data/configs/berkeleydb.yaml /data/test1.bloom /data/test2.bloom -s s1 -s s2

Traceback (most recent call last):
  File "/usr/local/bin/bigsi", line 11, in <module>
    load_entry_point('bigsi==0.3.8', 'console_scripts', 'bigsi')()
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.8-py3.7.egg/bigsi/__main__.py", line 324, in main
  File "/usr/local/lib/python3.7/dist-packages/hug/api.py", line 439, in __call__
    result = self.commands.get(command)()
  File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 631, in __call__
    raise exception
  File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 627, in __call__
    result = self.output(self.interface(**pass_to_function), context)
  File "/usr/local/lib/python3.7/dist-packages/hug/interface.py", line 123, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/bigsi-0.3.8-py3.7.egg/bigsi/__main__.py", line 157, in build
AssertionError

Could you please provide any suggestions?

Ensure users and devs have a similar environment

We have had issues caused by dependencies updating and changing their APIs, which modified BIGSI's behaviour. I feel very strongly that we should ensure that users and devs have exactly the same environment, so that execution and issues are reproducible and we don't get unexpected bugs. The only way to fully achieve this is to require everyone to use the containers, but I don't think that is feasible. So we should at least pin the Python dependency versions, so that the Python environment and its dependencies are always the same.

I don't think pinning means our dependencies will never get upgraded: we are just controlling their versions. We will be responsible for upgrading them from time to time, but before each upgrade we can make sure everything works by running a comprehensive test suite.
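
As an illustration, a small check like the following could run at the start of the test suite to flag drift between the pinned versions and what is actually installed. It is a sketch that assumes pins are written as `name==version` lines; the package names and function names in it are hypothetical.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Parse 'name==version' lines into a dict, ignoring comments and blank lines."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        if "==" not in line:
            raise ValueError(f"not an exact pin: {line!r}")
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return a human-readable message for each missing or differently versioned package."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, want {wanted}")
    return problems


if __name__ == "__main__":
    # Hypothetical pins; a real project would read its own requirements file.
    example = "examplepkg==1.2.3  # hypothetical package\n"
    for problem in find_mismatches(parse_pins(example)):
        print(problem)
```

A check like this only reports drift; the pins themselves would still live in the project's requirements or environment file.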

Bigsi for 600000 genomes

Hello,

I work at a ministry of health on pathogen detection.
I read about BIGSI (congratulations!), and I think it could be useful.
However, I didn't find the docs very useful (sorry), and the demo link https://bigsi.readme.io/ doesn't work.
We need to detect the presence of specific genes in 600,000 Salmonella genomes.
I am in charge of finding the best tool for this.
Could you please tell me whether you think this is doable with BIGSI, how much disk space I would need,
and how long (approximately, of course) it would take?

Thank you very much

David
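
For a rough sense of scale, the disk question can be approached with a back-of-the-envelope calculation. The sketch below assumes the index is stored as an uncompressed bit matrix with m rows and one column per genome. The values m = 25,000,000 and h = 3 are simply taken from the config quoted in another issue and are not a recommendation for Salmonella; the per-genome k-mer count is a placeholder. Storage-backend overhead and query time are not estimated.

```python
import math


def index_size_bytes(m, n_samples):
    """Bytes needed for an uncompressed m x n_samples bit matrix."""
    return (m * n_samples + 7) // 8


def false_positive_rate(m, h, n_kmers):
    """Standard Bloom filter approximation for one filter of m bits and h hashes."""
    return (1.0 - math.exp(-h * n_kmers / m)) ** h


if __name__ == "__main__":
    size = index_size_bytes(m=25_000_000, n_samples=600_000)
    print(f"{size / 1e12:.3f} TB")  # 1.875 TB

    # Placeholder k-mer count per genome; replace with a measured value.
    fpr = false_positive_rate(m=25_000_000, h=3, n_kmers=5_000_000)
    print(f"per-k-mer false positive rate: {fpr:.4f}")
```

Because the size grows linearly in both m and the number of genomes, choosing a smaller m for small bacterial genomes would reduce the disk footprint proportionally, at the cost of a higher false positive rate.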
