fastsubtrees's Introduction

  • 👋 Hi, I’m @ggonnella
  • 👀 I’m interested in genomics, bioinformatics, string algorithms, DNA sequences and more.
  • 🐍 I am the author of several open source Python packages, which can be installed using 'pip'.
  • 🌱 I’m currently learning Rust.
  • 💞️ I’m looking to collaborate on interesting life science-related software projects.
  • 📫 How to reach me on LinkedIn: https://www.linkedin.com/in/giorgio-gonnella-36677620/

fastsubtrees's People

Contributors

ggonnella

Forkers

konradhoeffner

fastsubtrees's Issues

Dockerfile does not build from local copy

In docker/Dockerfile:

RUN pip install 'ntmirror @ git+https://github.com/ggonnella/fastsubtrees/#subdirectory=ntmirror'
RUN ntmirror-download ntdumpdir
RUN pip install 'fastsubtrees @ git+https://github.com/ggonnella/fastsubtrees'

This builds from the latest copy on Git, which causes four major problems:

  1. local fixes that are not yet pushed to Git cannot be tested
  2. when checking out an older state of fastsubtrees, the Dockerfile is out of sync with the fastsubtrees code, which may cause any number of detectable or undetectable errors
  3. you cannot use the Dockerfile as is in a fork
  4. it can break new versions when old versions are in the Docker build cache (see issue #5)

I recommend building from the local filesystem instead to avoid all of these problems.

simplify the installation of the example application

The example application does not need database access. However, it still requires ntmirror to download the taxonomy data from NCBI.

Splitting the ntmirror package into two packages (ntdownload, ntmirror) would further simplify running the example application: it would depend only on ntdownload and would no longer require setting up the database.

ntmirror will remain for the comparative tests, which require the data to be loaded into a database.

scripts and commands too complicated

There is a mix of makefiles, pip install, external dependencies, Docker and more, and some of it is inside other folders.
I would prefer never having to mix these. For example, when I'm using the Docker image I should not need to execute a makefile, because Docker can wrap all of that in the container.
With Docker, for example, it could then just be:

docker run fastsubtrees test
docker run fastsubtrees benchmark
docker run fastsubtrees ...

constructing tree data and index parallelizable?

Constructing the tree data and index takes several minutes and only seems to use one CPU core. Is it possible to parallelize that?

fastsubtrees$ docker exec fastsubtrees benchmarks
# Downloading the NCBI taxonomy dump...
# Building the fastsubtrees NCBI taxonomy tree...
2022-10-11 08:09:41 INFO: Constructing tree from NCBI taxonomy dump file ntdumpdir/nodes.dmp
2022-10-11 08:09:41 INFO: Constructing temporary parents table...
2022-10-11 08:09:41 INFO: Reading data from file "ntdumpdir/nodes.dmp" ...
2447844it [00:01, 1349487.11it/s]
2022-10-11 08:09:43 INFO: Constructing subtree sizes table...
2984676it [00:06, 451387.87it/s]
2022-10-11 08:09:50 INFO: Constructing tree data and index...
100%|██████████| 2984676/2984676 [05:57<00:00, 8351.77it/s]
2022-10-11 08:15:47 SUCCESS: Tree data structure constructed
2022-10-11 08:15:47 SUCCESS: Tree constructed
2022-10-11 08:15:47 SUCCESS: Tree written to file "ncbi-taxonomy.tree"

benchmark runs on one CPU core only?

I am currently running the benchmark, which takes a long time, and it only seems to use one CPU core. Is that only the benchmark part or is fastsubtrees in general not parallelized?
Or is it just the SQL part?
At least modern PostgreSQL versions can run a single query on multiple cores, which is an important factor with current CPUs having around 6-16 cores.

pip install fails

Using Python 3.10.7 on Arch Linux, pip install fastsubtrees fails with the output below.
This is an acceptance blocker for the JOSS review openjournals/joss-reviews#4755.

 pip install fastsubtrees
Defaulting to user installation because normal site-packages is not writeable
Collecting fastsubtrees
  Downloading fastsubtrees-1.1-py3-none-any.whl (17 kB)
Collecting tqdm>=4.57.0
  Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 78.5/78.5 kB 3.6 MB/s eta 0:00:00
Collecting sh>=1.14.2
  Downloading sh-1.14.3.tar.gz (62 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 62.9/62.9 kB 6.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting loguru>=0.5.1
  Downloading loguru-0.6.0-py3-none-any.whl (58 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 58.3/58.3 kB 5.0 MB/s eta 0:00:00
Collecting docopt>=0.6.2
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Collecting ntmirror>=1.2
  Downloading ntmirror-1.2-py3-none-any.whl (10 kB)
Collecting schema>=0.7.4
  Downloading schema-0.7.5-py2.py3-none-any.whl (17 kB)
Requirement already satisfied: PyYAML>=6.0 in /usr/lib/python3.10/site-packages (from fastsubtrees) (6.0)
Collecting mariadb
  Downloading mariadb-1.1.4.zip (97 kB)
     โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ” 97.4/97.4 kB 7.5 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [19 lines of output]
      /bin/sh: line 1: mariadb_config: command not found
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-mj6twek8/mariadb_db9a53888c2a40b99114f01c2c82d527/setup.py", line 27, in <module>
          cfg = get_config(options)
        File "/tmp/pip-install-mj6twek8/mariadb_db9a53888c2a40b99114f01c2c82d527/mariadb_posix.py", line 62, in get_config
          cc_version = mariadb_config(config_prg, "cc_version")
        File "/tmp/pip-install-mj6twek8/mariadb_db9a53888c2a40b99114f01c2c82d527/mariadb_posix.py", line 28, in mariadb_config
          raise EnvironmentError(
      OSError: mariadb_config not found.
      
      This error typically indicates that MariaDB Connector/C, a dependency which
      must be preinstalled, is not found.
      If MariaDB Connector/C is not installed, see installation instructions
      If MariaDB Connector/C is installed, either set the environment variable
      MARIADB_CONFIG or edit the configuration file 'site.cfg' to set the
       'mariadb_config' option to the file location of the mariadb_config utility.
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

implement CLI tools as subcommands

@vinisalazar wrote:

CLI subcommands (fastsubtrees-construct, -query, -add-subtree etc) should be made into subparsers of a single command, fastsubtrees, which, when called, lists all available subcommands, each with a one-line description.
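
A minimal sketch of what such a single entry point could look like, using argparse subparsers from the Python standard library. This is not the project's actual implementation: the subcommand names follow the existing tools, but the argument names (treefile, nodesfile, node_id) and the help texts are hypothetical placeholders.

```python
import argparse


def build_parser():
    # Single top-level command, as proposed above.
    parser = argparse.ArgumentParser(
        prog="fastsubtrees",
        description="Query subtrees of large trees.",
    )
    subparsers = parser.add_subparsers(dest="command", title="subcommands")

    # Each former fastsubtrees-<name> script becomes one subparser.
    # Argument names and help texts below are hypothetical.
    construct = subparsers.add_parser("construct", help="construct a tree file")
    construct.add_argument("treefile")
    construct.add_argument("nodesfile")

    query = subparsers.add_parser("query", help="list the nodes of a subtree")
    query.add_argument("treefile")
    query.add_argument("node_id", type=int)

    return parser


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.command is None:
        # Called without a subcommand: print the list of subcommands,
        # each with its one-line help text.
        parser.print_help()
        return 1
    print(f"would run {args.command!r} with {vars(args)}")
    return 0
```

Calling main([]) prints the usage text with one line per subcommand; main(["query", "ncbi-taxonomy.tree", "1129"]) dispatches to the query subcommand.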

Troubles with genome-attributes-viewer in the new Docker image

@vinisalazar wrote:

I rebuilt the Docker image today after the updates. I can still run the benchmarks successfully from the Docker container, but I had trouble with the genome-attributes-viewer app: it started, I ran a couple of queries, and then it crashed with this traceback:

 root@d8798c592652:/fastsubtrees/docker# ./start-example-app
Dash is running on http://0.0.0.0:8050/

 * Serving Flask app 'start' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
./start-example-app: line 15:   823 Killed                  ./start.py
root@d8798c592652:/fastsubtrees/docker# Traceback (most recent call last):
  File "/fastsubtrees/genomes-attributes-viewer/start.py", line 171, in <module>
    app.run_server(debug=True, host='0.0.0.0')
  File "/usr/local/lib/python3.10/dist-packages/dash/dash.py", line 2033, in run_server
    self.server.run(host=host, port=port, debug=debug, **flask_run_options)
  File "/usr/local/lib/python3.10/dist-packages/flask/app.py", line 920, in run
    run_simple(t.cast(str, host), port, self, **options)
  File "/usr/local/lib/python3.10/dist-packages/werkzeug/serving.py", line 1000, in run_simple
    _rwr(
  File "/usr/local/lib/python3.10/dist-packages/werkzeug/_reloader.py", line 418, in run_with_reloader
    ensure_echo_on()
  File "/usr/local/lib/python3.10/dist-packages/werkzeug/_reloader.py", line 398, in ensure_echo_on
    termios.tcsetattr(sys.stdin, termios.TCSANOW, attributes)
termios.error: (5, 'Input/output error')

tests fail

Using Python 3.10.7 on Arch Linux, pytest fails with the output below.
This is an acceptance blocker for the JOSS review openjournals/joss-reviews#4755.

fastsubtrees$ pytest
==================================================================== test session starts =====================================================================
platform linux -- Python 3.10.7, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/konrad/tmp/fastsubtrees
collected 10 items / 1 error                                                                                                                                 

=========================================================================== ERRORS ===========================================================================
________________________________________________________ ERROR collecting tests/test_fastsubtrees.py _________________________________________________________
ImportError while importing test module '/home/konrad/tmp/fastsubtrees/tests/test_fastsubtrees.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_fastsubtrees.py:6: in <module>
    from tests.reference_results import *
E   ModuleNotFoundError: No module named 'tests'
====================================================================== warnings summary ======================================================================
ntmirror/tests/test_dbload_cli.py:8
  /home/konrad/tmp/fastsubtrees/ntmirror/tests/test_dbload_cli.py:8: PytestUnknownMarkWarning: Unknown pytest.mark.script_launch_mode - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.script_launch_mode('subprocess')

ntmirror/tests/test_dbload_cli.py:21
  /home/konrad/tmp/fastsubtrees/ntmirror/tests/test_dbload_cli.py:21: PytestUnknownMarkWarning: Unknown pytest.mark.script_launch_mode - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.script_launch_mode('subprocess')

ntmirror/tests/test_downloader_cli.py:20
  /home/konrad/tmp/fastsubtrees/ntmirror/tests/test_downloader_cli.py:20: PytestUnknownMarkWarning: Unknown pytest.mark.script_launch_mode - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.script_launch_mode('subprocess')

ntmirror/tests/test_downloader_cli.py:33
  /home/konrad/tmp/fastsubtrees/ntmirror/tests/test_downloader_cli.py:33: PytestUnknownMarkWarning: Unknown pytest.mark.script_launch_mode - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.script_launch_mode('subprocess')

ntmirror/tests/test_downloader_cli.py:46
  /home/konrad/tmp/fastsubtrees/ntmirror/tests/test_downloader_cli.py:46: PytestUnknownMarkWarning: Unknown pytest.mark.script_launch_mode - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.script_launch_mode('subprocess')

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================== short test summary info ===================================================================
ERROR tests/test_fastsubtrees.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================================ 5 warnings, 1 error in 0.17s ================================================================

community guidelines in README

@KonradHoeffner wrote:

There are no explicit community guidelines in the README.md. It is a GitHub repository with an issue tracker for reporting issues and seeking support, but the way the checklist is written implies this is not enough.

GitHub Actions

@vinisalazar wrote:

it would be nice to have a GitHub Actions workflow to run the tests, and perhaps an additional action for things like linting/code style/etc -- could be done with pre-commit.

requirements file

@vinisalazar wrote:

The requirements file should explicitly list the requirements. Having it as is (only content is -e ., presumably for
pip install or a setup command) is bad practice. Either remove this file or modify it to list the requirements.

unsupported operand type in app

I tried to compare two taxa, which worked a few times, but then I got the following error:

Callback error updating ..boxplot.figure...boxplot.style...histogram.figure...histogram.style...add-comparison.disabled..
Traceback (most recent call last):
  File "/fastsubtrees/genomes-attributes-viewer/start.py", line 140, in update_figure
    boxplot_dict[id + ')'] = []
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
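
The traceback shows that id was None when the dictionary key was built. A minimal sketch of one possible guard follows, assuming the callback can simply skip a comparison when no taxon is selected. The helper names make_label and add_series, and the example ID string, are hypothetical and do not come from start.py.

```python
def make_label(taxon_id, suffix=")"):
    """Return the dictionary key for a taxon, or None if no taxon is selected."""
    if taxon_id is None:
        return None
    # str() also protects against integer IDs, which would fail with "+" too.
    return str(taxon_id) + suffix


def add_series(boxplot_dict, taxon_id):
    """Add an empty series for the taxon; return False if it had to be skipped."""
    label = make_label(taxon_id)
    if label is None:
        return False
    boxplot_dict[label] = []
    return True
```

With this guard, a missing selection leaves the dictionary unchanged instead of raising a TypeError inside the callback.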

output of fastsubtrees query

@vinisalazar wrote:

I also struggled with the output of the fastsubtrees-attr-query command, for example:

root@d8798c592652:/fastsubtrees# fastsubtrees-attr-query ncbi-taxonomy.tree GC_content 1129
2022-10-11 03:51:44 SUCCESS: Tree loaded from file "ncbi-taxonomy.tree"
2022-10-11 03:51:44 INFO: Subtree of node 1129 has size 1762
[None, None, None, None, None, [0.49187], None, None, None, None, [0.633354, 0.63345], [0.524473], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.563698], None, None, None, None, [0.600861], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.493706], None, None, None, [0.485047], None, None, None, None, None, None, None, None, None, None, None, None, [0.491264], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.604614], None, None, None, [0.597736], None, None, None, None, [0.603592], None, [0.644399], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.64059], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.541645], None, None, [0.602373], None, None, None, None, [0.584503], None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.492615], [0.493446], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.613678], None, [0.590853], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.606186], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.607883], None, None, None, None, None, None, None, None, [0.58221], None, [0.599123], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.534223], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.406211], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.574981], None, None, None, None, [0.646371], [0.67057], [0.645688], [0.636933], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.534347], None, None, [0.549756], [0.583811], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.603577], None, [0.613608], [0.59087], [0.588178], [0.592676], [0.539637], [0.607604], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.49424], None, [0.581548], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.639157], None, None, [0.538882], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.679814], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.625627], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.66265], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 
None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.631121], None, None, None, [0.618893], None, None, None, None, None, None, [0.554843, 0.554375], [0.554356], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, [0.55123], None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

This output is confusing to me. I understand that these are GC content values, but how can I see which value belongs to which node? When redirecting this to a file, for example, one gets this Python representation of a list, whereas it would be preferable to print one value per line to stdout.
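
A minimal sketch of the requested format: one tab-separated line per node, with the node ID first. It assumes the node IDs of the subtree are available as a list parallel to the attribute values. The function name format_attribute_lines and the "NA" placeholder are hypothetical, and in the usage example only node 1129 is taken from the output above; 1130 and 1131 are made up.

```python
def format_attribute_lines(node_ids, values, missing="NA"):
    """Return one 'node_id<TAB>values' line per node.

    values[i] is None when node_ids[i] has no attribute value,
    otherwise a list with one or more values.
    """
    lines = []
    for node_id, node_values in zip(node_ids, values):
        if node_values is None:
            text = missing
        else:
            text = ",".join(str(v) for v in node_values)
        lines.append(f"{node_id}\t{text}")
    return lines


# Usage with a shortened list in the shape of the output above.
for line in format_attribute_lines(
    [1129, 1130, 1131], [None, [0.49187], [0.633354, 0.63345]]
):
    print(line)
```

Output in this form can be redirected to a file and processed with standard line-oriented tools.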

can't build Docker image: Failed to connect to ftp.ncbi.nih.gov port 21: Connection refused

Either the file does not exist anymore or my organization is blocking the FTP ports for security reasons. Would it be possible to host that file somewhere else? For example on GitHub over HTTPS.
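
One possible workaround is to request the same path over HTTPS instead of FTP. The sketch below only rewrites the URL scheme. That the NCBI server serves the same path over HTTPS is an assumption that should be verified, and the helper name ftp_to_https is hypothetical, not part of ntmirror.

```python
from urllib.parse import urlsplit, urlunsplit


def ftp_to_https(url):
    """Return url with an ftp:// scheme replaced by https://.

    URLs with any other scheme are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        return url
    return urlunsplit(
        ("https", parts.netloc, parts.path, parts.query, parts.fragment)
    )


# The FTP URL is the one used by ntmirror-download in the log below.
print(ftp_to_https("ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"))
```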

~$ cd tmp/fastsubtrees/docker 
docker$ docker build . -t fastsubtrees
[+] Building 90.4s (22/30)                                                                                                                                    
 => [internal] load build definition from Dockerfile                                                                                                     0.9s
 => => transferring dockerfile: 1.59kB                                                                                                                   0.0s
 => [internal] load .dockerignore                                                                                                                        0.8s
 => => transferring context: 2B                                                                                                                          0.0s
 => resolve image config for docker.io/docker/dockerfile:1                                                                                               1.2s
 => CACHED docker-image://docker.io/docker/dockerfile:1@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                          0.0s
 => [internal] load .dockerignore                                                                                                                        0.0s
 => [internal] load build definition from Dockerfile                                                                                                     0.0s
 => [internal] load metadata for docker.io/library/mariadb:10.6                                                                                          0.7s
 => [ 1/22] FROM docker.io/library/mariadb:10.6@sha256:fa578ca359c2d56d9f194abd69c7603077429adc03a1e28e00bf3ed864e5a162                                 10.7s
 => => resolve docker.io/library/mariadb:10.6@sha256:fa578ca359c2d56d9f194abd69c7603077429adc03a1e28e00bf3ed864e5a162                                    0.0s
 => => sha256:fa578ca359c2d56d9f194abd69c7603077429adc03a1e28e00bf3ed864e5a162 979B / 979B                                                               0.0s
 => => sha256:34b09daa97dd250d4f95430737cf6b92048e45a647b270aedb04ff3d00d638b2 8.28kB / 8.28kB                                                           0.0s
 => => sha256:9bbcda1c9ef5a7c692fcd746e20ff8cf4432fd32e8bf9bb8efe01fa325e83a99 2.62kB / 2.62kB                                                           0.0s
 => => sha256:c994327ef3cc11acbe7e8fa87751b72723e88d9cf38ff3ebf91b05a1a2945548 149B / 149B                                                               0.5s
 => => sha256:40d12c21872a92b0b6dd483317d136caa739fa9b0cbe7d444d0267b0ebfe2adc 2.26MB / 2.26MB                                                           0.4s
 => => sha256:dd18621bd4ea9c95166fdbbfede9511a59701cd8000726cf6e8a6f6b429617f7 2.49kB / 2.49kB                                                           0.4s
 => => extracting sha256:c994327ef3cc11acbe7e8fa87751b72723e88d9cf38ff3ebf91b05a1a2945548                                                                0.0s
 => => sha256:582572716dc8228c293bb77ede11deb84fb1c1f137618356e61ba11ff48b45fa 325B / 325B                                                               0.7s
 => => sha256:f0857ffe38ff40e8be228e963b996397db15ccf67118527159f4e458621113e2 88.49MB / 88.49MB                                                         2.0s
 => => extracting sha256:40d12c21872a92b0b6dd483317d136caa739fa9b0cbe7d444d0267b0ebfe2adc                                                                0.2s
 => => sha256:e3f76ad8eca3132254de759ea88d7ea655129f6e170b4d426d76d1843f38260c 3.49kB / 3.49kB                                                           0.8s
 => => sha256:edffdfb1bdec3b59b6b56bbe6cd2c8296939ffddb4bece07e3dee8cd6cef4e7b 7.05kB / 7.05kB                                                           0.9s
 => => extracting sha256:dd18621bd4ea9c95166fdbbfede9511a59701cd8000726cf6e8a6f6b429617f7                                                                0.0s
 => => extracting sha256:582572716dc8228c293bb77ede11deb84fb1c1f137618356e61ba11ff48b45fa                                                                0.0s
 => => extracting sha256:f0857ffe38ff40e8be228e963b996397db15ccf67118527159f4e458621113e2                                                                7.0s
 => => extracting sha256:e3f76ad8eca3132254de759ea88d7ea655129f6e170b4d426d76d1843f38260c                                                                0.1s
 => => extracting sha256:edffdfb1bdec3b59b6b56bbe6cd2c8296939ffddb4bece07e3dee8cd6cef4e7b                                                                0.0s
 => [internal] load build context                                                                                                                        0.3s
 => => transferring context: 194B                                                                                                                        0.0s
 => [ 2/22] RUN apt-get update -y                                                                                                                        2.5s
 => [ 3/22] RUN apt-get install -y software-properties-common                                                                                           26.9s
 => [ 4/22] RUN add-apt-repository -y ppa:deadsnakes/ppa                                                                                                 4.0s 
 => [ 5/22] RUN apt-get install -y python3.8 python3-pip wget curl git time                                                                             17.4s 
 => [ 6/22] RUN ln -s /usr/bin/python3.8 /usr/bin/python                                                                                                 0.6s 
 => [ 7/22] RUN wget https://downloads.mariadb.com/MariaDB/mariadb_repo_setup                                                                            0.8s 
 => [ 8/22] RUN chmod +x mariadb_repo_setup                                                                                                              0.5s 
 => [ 9/22] RUN ./mariadb_repo_setup --mariadb-server-version=mariadb-10.6                                                                               4.6s 
 => [10/22] RUN apt-get install -y libmariadb3 libmariadb-dev                                                                                            2.6s 
 => [11/22] RUN pip install mariadb pytest PyYAML sqlalchemy sh pytest-console-scripts                                                                   4.9s 
 => [12/22] RUN pip install schema snacli                                                                                                                6.4s 
 => [13/22] RUN pip install 'ntmirror @ git+https://github.com/ggonnella/fastsubtrees/#subdirectory=ntmirror'                                            4.4s 
 => ERROR [14/22] RUN ntmirror-download ntdumpdir                                                                                                        0.8s 
------                                                                                                                                                        
 > [14/22] RUN ntmirror-download ntdumpdir:                                                                                                                   
#21 0.692 2022-10-07 10:14:33 | ERROR |                                                                                                                       
#21 0.692                                                                                                                                                     
#21 0.692   RAN: /usr/bin/curl ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz -o ntdumpdir/taxdump.tar.gz -w '%{size_download}' -R                        
#21 0.692                                                                                                                                                     
#21 0.692   STDOUT:
#21 0.692 0
#21 0.692 
#21 0.692   STDERR:
#21 0.692   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#21 0.692                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
#21 0.692 curl: (7) Failed to connect to ftp.ncbi.nih.gov port 21: Connection refused
#21 0.692 
------
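
The log above shows curl being refused on FTP port 21 during the image build. One possible workaround, sketched here under assumptions, is to fetch the same archive over HTTPS instead. The helper names `to_https` and `download` are hypothetical and not part of ntmirror, and the sketch assumes the NCBI server exposes the same path over HTTPS; it does not reflect how `ntmirror-download` is implemented.

```python
import shutil
import urllib.request
from urllib.parse import urlsplit


def to_https(url: str) -> str:
    """Rewrite an ftp:// URL to https://; other schemes are returned unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        return url
    return parts._replace(scheme="https").geturl()


def download(url: str, dest: str, timeout: float = 60.0) -> None:
    """Stream the file at `url` to `dest`, using HTTPS in place of FTP.

    Assumption: the host serves the same path over HTTPS. Needs network access.
    """
    with urllib.request.urlopen(to_https(url), timeout=timeout) as resp:
        with open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)


# Example call (not executed here, since it requires network access):
# download("ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz", "taxdump.tar.gz")
```

Whether the refusal comes from the build environment blocking outbound FTP or from the server itself is not visible in the log, so this is only one option to try.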

Automatic download and construction of the tree and inclusion of taxon names

@vinisalazar wrote

I was also able to run some fastsubtrees-query commands, which got me thinking it would be nice to have other attributes of a taxonomy entry (e.g. Scientific Name, Lineage, etc) in the output. Maybe this could be achieved by using taxonkit along with fastsubtrees.

I maintain my impression that fastsubtrees has legitimate and useful applications, but I think users will struggle to adopt it given the current installation procedure and command interface. I think that more focus could be given to letting users easily (i.e. automatically) index and build the NCBI Taxonomy Tree and run queries on it. It should also be possible to deploy the genome-attributes-viewer app automatically with a single command after install. Ideally, upon installing fastsubtrees with conda, users should already have a prebuilt NCBI Taxonomy Tree and could start the Dash app right away.
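
As an illustration of the taxon-name request in the comment above, here is a minimal sketch that maps taxon IDs to scientific names using the `names.dmp` file from the NCBI taxonomy dump. It does not use taxonkit or any fastsubtrees API; the function names `scientific_names` and `annotate` are hypothetical, and the sample rows are made-up stand-ins that follow the `names.dmp` field layout (tax ID, name, unique name, name class, separated by tab-pipe-tab).

```python
from typing import Dict, Iterable, List, Optional, Tuple


def scientific_names(lines: Iterable[str]) -> Dict[int, str]:
    """Build a tax ID -> scientific name map from names.dmp-style lines."""
    names: Dict[int, str] = {}
    for line in lines:
        row = line.rstrip("\n")
        if row.endswith("\t|"):
            row = row[:-2]  # drop the end-of-row terminator
        fields = row.split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        tax_id, name, _unique_name, name_class = fields[:4]
        if name_class == "scientific name":
            names[int(tax_id)] = name
    return names


def annotate(tax_ids: Iterable[int],
             names: Dict[int, str]) -> List[Tuple[int, Optional[str]]]:
    """Pair each tax ID with its scientific name, or None if unknown."""
    return [(tax_id, names.get(tax_id)) for tax_id in tax_ids]


# Illustrative rows only; a real run would read the names.dmp file instead.
sample = [
    "1\t|\tall\t|\t\t|\tsynonym\t|\n",
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n",
    "2\t|\teubacteria\t|\t\t|\tgenbank common name\t|\n",
]
names = scientific_names(sample)
annotated = annotate([2, 1, 99], names)
```

The IDs passed to `annotate` stand in for whatever a subtree query returns; how that output would be wired in is left open here.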
