cruizperez / fastaai
License: MIT License
Hello,
I just want to confirm if the current commit (c016d8b) is ready to deploy in MiGA 1.0, or if I should keep the version currently in the submodule (50e0303) until we test it.
I'm talking about the submodule here:
https://github.com/bio-miga/miga/tree/main/utils
In the kmer_extract function, selection of best HMM hits happens via parsing the filtered HMM file and selecting the highest score for each protein.
The problem is that str.split() returns a list of strings and does not convert each field to a numeric type. The HMM scores are therefore compared as strings, so "80" > "300" for the protein scores.
score = line[8]
in that function must be replaced by
score = float(line[8])
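The effect is easy to reproduce outside FastAAI. The sketch below is not the tool's actual code: `best_hits`, the sample rows, and the assumption that the protein name sits in column 0 are hypothetical. Only the score column index (8) comes from the report above.

```python
def best_hits(lines, score_col=8, name_col=0):
    """Keep the highest-scoring row per protein, comparing scores numerically."""
    best = {}
    for raw in lines:
        fields = raw.split()
        name = fields[name_col]  # assumed position of the protein name
        score = float(fields[score_col])  # split() yields strings; convert first
        if name not in best or score > best[name][0]:
            best[name] = (score, fields)
    return best

# String comparison is lexicographic, so "80" sorts after "300".
print("80" > "300")                # True
print(float("80") > float("300"))  # False

# Made-up rows: protein name, seven filler columns, score, HMM name.
rows = [
    "protA x x x x x x x 80 hmm1",
    "protA x x x x x x x 300 hmm2",
]
print(best_hits(rows)["protA"][0])  # 300.0
```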
Hey there! Thanks so much for this tool, it's exactly what I was looking for!
I was able to install FastAAI easily using pip install, and made sure all the dependencies installed properly. However, I keep getting the same error message anytime I run build_db (this is the only module I've tried thus far). Here's the line of code I'm running:
fastaai build_db --genomes Genomes_85_5/ --threads 20 --verbose --output Halomonas_fAAI_Build --database Halomonas_Build_DB.db --compress
I tried it both on a server that has Python 3.6 installed, and on my own computer that has Python 3.9 installed. Here's the error I get with Python 3.6:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 10: invalid start byte
And here's the error I get with Python 3.9 (similar but slightly different):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 26: invalid continuation byte
Any suggestions on what could be going wrong, or how to resolve this issue?
Thank you!!
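Neither message says which file was being read. One possible first check, assuming the error comes from an input file (which this thread does not confirm), is to look for files in the genomes directory that contain bytes that are not valid UTF-8. `find_non_utf8_files` is a hypothetical helper, not part of FastAAI, and the demo files are made up.

```python
import os
import tempfile

def find_non_utf8_files(directory):
    """Return (filename, byte offset) for each file that is not valid UTF-8."""
    problems = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as fh:
            data = fh.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as err:
            problems.append((name, err.start))
    return problems

# Demo on a throwaway directory with one clean file and one containing byte 0xa0.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "good.fasta"), "wb") as fh:
    fh.write(b">seq1\nACGT\n")
with open(os.path.join(demo_dir, "bad.fasta"), "wb") as fh:
    fh.write(b">seq2\nAC\xa0GT\n")
print(find_non_utf8_files(demo_dir))
```

Gzip-compressed or otherwise binary inputs would also be flagged by this check, since they are not UTF-8 text.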
Hi,
I successfully installed FastAAI; however, I could not find the fastaai conda package in your repo (I installed the other dependencies using conda and cloned your GitHub repo).
I also succeeded in running a first sample pair (two FASTA files).
However, I do not understand the output:
sample-107.fasta sample-100.fasta 0.45 0.2385 83 83 65.25653091282805
Could you also provide the column headers or describe the output in the readme?
Thanks for this very nice tool
Dear developer,
I tested 3 protein groups and found that the FastAAI results are about 5% lower than those from CompareM. Have you compared this software against the classic blastp method, and how accurate is it?
fastaai:
query_genome A_Mi.faa B_Ms.faa C_Ms1.faa
A_Mi >90% 84.67 84.67
B_Ms 84.67 >90% >90%
C_Ms1 84.67 >90% >90%
comparem:
#Genome A Genes in A Genome B Genes in B # orthologous genes Mean AAI Std AAI Orthologous fraction (OF)
A_Mi 11642 C_Ms1 13182 8222 89.58 10.79 70.62
A_Mi 11642 B_Ms 13182 8222 89.58 10.79 70.62
C_Ms1 13182 B_Ms 13182 13090 100.00 0.00 99.30
Another question: is there a way to show the exact AAI value when it is below 30 or above 90, instead of the token?
Looking forward to your reply. Thanks a lot.
shutil.rmtree(td)
in the build_db code is trying to remove a directory whose files are still in use by other processes. Wrapping the call in try: + except: will help, but using temp directories that differ for each run could also help (e.g., naming the directories based on UUIDs).
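A minimal sketch of both suggestions, assuming a failed cleanup can be tolerated and reported rather than aborting the run. `make_run_tempdir` and `remove_quietly` are hypothetical helper names, not FastAAI functions.

```python
import os
import shutil
import tempfile
import uuid

def make_run_tempdir(parent=None):
    """Create a temp directory whose name is unique to this run."""
    parent = parent or tempfile.gettempdir()
    path = os.path.join(parent, "fastaai_tmp_" + uuid.uuid4().hex)
    os.makedirs(path)
    return path

def remove_quietly(path):
    """Try to delete a directory tree; report failure instead of raising."""
    try:
        shutil.rmtree(path)
        return True
    except OSError as err:
        # On NFS, files still held open appear as '.nfs*' entries and
        # cannot be unlinked until the owning process closes them.
        print("Could not fully remove {}: {}".format(path, err))
        return False

td = make_run_tempdir()
with open(os.path.join(td, "scratch.txt"), "w") as fh:
    fh.write("temporary data\n")
removed = remove_quietly(td)
```

`tempfile.mkdtemp()` from the standard library also returns a uniquely named directory and may be a simpler choice than building the name by hand.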
The specific error:
File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/bin/fastaai", line 8, in <module>
sys.exit(main())
File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/site-packages/fastaai/FastAAI.py", line 3927, in main
build_db(genomes, proteins, hmms, db_name, output, threads, verbose, do_comp)
File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/site-packages/fastaai/FastAAI.py", line 1618, in build_db
shutil.rmtree(td)
File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/shutil.py", line 494, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/shutil.py", line 452, in _rmtree_safe_fd
onerror(os.unlink, fullname, sys.exc_info())
File "/tmp/global2/nyoungblut/code/dev/ll_pipelines/llg/.snakemake/conda/e597b1bc4c3f6c65a46887160aeefc74/lib/python3.7/shutil.py", line 450, in _rmtree_safe_fd
os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfs0000001c0007a6050021804d'
Conda env:
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
ca-certificates 2022.6.15 ha878542_0 conda-forge
fastaai 0.1.15 pypi_0 pypi
hmmer 3.3.2 h87f3376_2 bioconda
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
libblas 3.9.0 16_linux64_openblas conda-forge
libcblas 3.9.0 16_linux64_openblas conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.1.0 h8d9b700_16 conda-forge
libgfortran-ng 12.1.0 h69a702a_16 conda-forge
libgfortran5 12.1.0 hdcd56e2_16 conda-forge
libgomp 12.1.0 h8d9b700_16 conda-forge
liblapack 3.9.0 16_linux64_openblas conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge
libsqlite 3.39.3 h753d276_0 conda-forge
libstdcxx-ng 12.1.0 ha89aaad_16 conda-forge
libzlib 1.2.12 h166bdaf_2 conda-forge
ncurses 6.3 h27087fc_1 conda-forge
numpy 1.21.6 py37h976b520_0 conda-forge
openssl 3.0.5 h166bdaf_1 conda-forge
pigz 2.6 h27826a3_0 conda-forge
pip 22.2.2 pyhd8ed1ab_0 conda-forge
prodigal 2.6.3 hec16e2b_4 bioconda
psutil 5.9.2 pypi_0 pypi
pyhmmer 0.6.2 pypi_0 pypi
pyrodigal 1.1.2 pypi_0 pypi
python 3.7.12 hf930737_100_cpython conda-forge
python_abi 3.7 2_cp37m conda-forge
readline 8.1.2 h0f457ee_0 conda-forge
setuptools 65.3.0 py37h89c1867_0 conda-forge
sqlite 3.39.3 h4ff8645_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zlib 1.2.12 h166bdaf_2 conda-forge
I recently updated FastAAI, and now I get the following error when running fastaai aai_index:
I couldn't find the module you specified. Please select one of the following modules:
So, it appears that the UI was dramatically changed, but I can't find a changelog or release notes stating this. Which is the last PyPI release to contain aai_index?
Also, it appears that your docs still include aai_index:
If you wish to query multiple genomes against themselves in all vs. all AAI search, use aai_index instead.
If you wish to query multiple genomes against multiple targets, use multi_query instead.
...but aai_index was commented-out a few lines lower in the code:
#print(" multi_query |" + " Create a query DB and a target DB, then calculate query vs. target AAI")
#print(" aai_index |" + " Create a database from multiple genomes and do an all vs. all AAI index of the genomes")
Please make FastAAI compatible with the latest version of Python.
Commands:
conda create -n fastaai python=3.12.1
conda activate fastaai
conda install pip -y
pip install FastAAI
Error:
....
pyrodigal/_pyrodigal.c:80984:16: note: in expansion of macro ‘__Pyx_IsTracing’
80984 | return __Pyx_IsTracing(tstate, 0, 0) && retval;
| ^~~~~~~~~~~~~~~
error: command '/usr/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pyrodigal
Failed to build pyrodigal
ERROR: Could not build wheels for pyrodigal, which is required to install pyproject.toml-based projects
When a genome name starts with a number, the tool fails:
Traceback (most recent call last):
File "...../multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "...../site-packages/fastaai/FastAAI.py", line 2281, in do_sql_query_no_SD
database.cursor.execute(temp_tab)
sqlite3.OperationalError: unrecognized token: "**11_0_1__xyz_PF01813_17**"
These are the lines apparently causing the error:
temp_tab = "CREATE TEMP TABLE " + temp_name + " (kmer INTEGER)"
database.cursor.execute(temp_tab)
do_sql_query() has a similar fragment.
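SQLite rejects an unquoted identifier that begins with a digit, which matches the error above. One possible fix is to quote the table name before building the statement. The sketch below uses only the standard sqlite3 module; `quote_identifier` is a hypothetical helper, and the table name is the one from the error message.

```python
import sqlite3

def quote_identifier(name):
    """Wrap a name in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
temp_name = "11_0_1__xyz_PF01813_17"

# The unquoted form fails because the name starts with a digit.
try:
    cur.execute("CREATE TEMP TABLE " + temp_name + " (kmer INTEGER)")
    unquoted_ok = True
except sqlite3.OperationalError as err:
    unquoted_ok = False
    print("Unquoted name failed:", err)

# The quoted form is accepted and can be used like any other table.
quoted = quote_identifier(temp_name)
cur.execute("CREATE TEMP TABLE " + quoted + " (kmer INTEGER)")
cur.execute("INSERT INTO " + quoted + " VALUES (42)")
rows = cur.execute("SELECT kmer FROM " + quoted).fetchall()
print(rows)
conn.close()
```

Every later statement that refers to the table would need the same quoting, so applying it once where `temp_name` is created is probably the least invasive change.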
Hello @KGerhardt
I noticed fastaai single_query seems to be broken.
The following command:
fastaai single_query -qp data/06.cds/gb_AQYU00000000.faa.gz -tp data/06.cds/Microbacterium_sediminis_GCA_001689915_1.faa.gz -o xxxx
Produces the output:
Query start: Genome [Protein] Protein+HMM
Target start: Genome [Protein] Protein+HMM
Output will be located at xxxx/results/gb_AQYU00000000_vs_Microbacterium_sediminis_GCA_001689915_1.aai.txt
/home/migagw/miniconda3/envs/miga-beta/share/rubygems/gems/miga-base-1.3.4.2/utils/FastAAI/fastaai/fastaai:273: DeprecationWarning: `Sequence.taxonomy_id` is not supported consistently in Easel and will be removed in `v0.8.0`
easel_seq = easel_seq.digitize(pyhmmer.easel.Alphabet.amino())
Traceback (most recent call last):
File "/home/migagw/miniconda3/envs/miga-beta/share/rubygems/gems/miga-base-1.3.4.2/utils/FastAAI/fastaai/fastaai", line 4803, in <module>
main()
File "/home/migagw/miniconda3/envs/miga-beta/share/rubygems/gems/miga-base-1.3.4.2/utils/FastAAI/fastaai/fastaai", line 4622, in main
single_query(query_file, target_file, output, verbose, threads, do_compress)
File "/home/migagw/miniconda3/envs/miga-beta/share/rubygems/gems/miga-base-1.3.4.2/utils/FastAAI/fastaai/fastaai", line 3784, in single_query
print(query.partial_timings())
File "/home/migagw/miniconda3/envs/miga-beta/share/rubygems/gems/miga-base-1.3.4.2/utils/FastAAI/fastaai/fastaai", line 919, in partial_timings
protein_pred = self.prot_pred_time-self.init_time
TypeError: unsupported operand type(s) for -: 'float' and 'datetime.datetime'
The resulting folder doesn't have results other than the HMMs.
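The TypeError suggests that one timestamp is stored as a float and the other as a datetime.datetime. That is an inference from the traceback, not something confirmed in the FastAAI source. The sketch below reproduces the mismatch with stand-in variables and shows that using a single type for both values avoids it.

```python
import datetime

# Stand-ins for the two attributes named in the traceback.
init_time = datetime.datetime.now()
prot_pred_time = datetime.datetime.now().timestamp()  # a float

try:
    prot_pred_time - init_time
    mixed_ok = True
except TypeError as err:
    mixed_ok = False
    print("Mixed types fail:", err)

# With both values as datetime objects, subtraction yields a timedelta
# that can be reported in seconds.
start = datetime.datetime.now()
end = datetime.datetime.now()
elapsed = (end - start).total_seconds()
print(elapsed)
```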
Hi,
Thanks so much for the nice tool! I encountered an error when using this tool.
Code and error:
fastaai build_db -p split -o fastaai_out --threads 90 --verbose --compress
Processing inputs
Completion |##################################################| 100.00% ( 229 of 229 ) at 04/09/2023 20:36:00
Collecting results
Database build complete!
fastaai db_query -q fastaai_out/database/FastAAI_database.sqlite.db -t fastaai_out/database/FastAAI_database.sqlite.db -o out --threads 80 --verbose
Query database improperly formatted. Exiting FastAAI
Do you have any suggestions? Thank you!
fastaai db_query runs smoothly on a formatted database. However, it breaks when matrix output is requested:
fastaai db_query --query ./FastAAI/database/bac_proteidb --target ./FastAAI/database/bac_proteidb --threads 14 --verbose --output FastAAI_matrix --output_style matrix
Performing an all vs. all query on ./FastAAI/database/bac_proteidb
Perusing database metadata
Calculating AAI
Completion |##################################################| 100.00% ( 548 of 548 ) at 25/05/2023 14:44:00
Finalizing results.
Completion |################# | 35.71% ( 5 of 14 ) at 25/05/2023 14:44:00
Traceback (most recent call last):
File "/home/filipe/.local/bin/fastaai", line 8, in <module>
sys.exit(main())
File "/home/filipe/.local/lib/python3.10/site-packages/fastaai/fastaai.py", line 4613, in main
db_query(query, target, verbose, output, threads, do_stdev, style, in_mem, store)
File "/home/filipe/.local/lib/python3.10/site-packages/fastaai/fastaai.py", line 3399, in db_query
mdb.run()
File "/home/filipe/.local/lib/python3.10/site-packages/fastaai/fastaai.py", line 3335, in run
self.db_on_disk()
File "/home/filipe/.local/lib/python3.10/site-packages/fastaai/fastaai.py", line 3265, in db_on_disk
self.write_mat_from_files(result_files, tempdir_path)
File "/home/filipe/.local/lib/python3.10/site-packages/fastaai/fastaai.py", line 3288, in write_mat_from_files
fh = open(f, "r")
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpgi3knqay/partial_results_group_7.txt'
The matrix file is produced with only a small part of the accessions. Any idea how to fix this?
MiGA base code uses this repo, but it cannot be pulled because it's private