
amber's People

Contributors

abremges, alphasquad, fernandomeyer, graingert, p-hofmann, pbelmann

amber's Issues

Two bugs in v2.0.17-beta

Hello @fernandomeyer,
I have encountered two bugs when using AMBER v2.0.17-beta.

  1. index.html
    [screenshot]
    The worst/medium/best example images fail to load.

  2. purity_completeness_seq.png
    [screenshot]
    purity_completeness_seq.png is blank.

Error creating HTML page

Hello,

I am encountering an error (pasted below) while running AMBER for taxonomic binning on the CAMI medium complexity dataset.

2020-09-08 19:10:25,561 INFO Loading NCBI files
2020-09-08 19:10:43,554 INFO Loading Gold standard
2020-09-08 19:10:43,599 INFO Loading predictions_10
2020-09-08 19:10:43,616 INFO Creating output directories
2020-09-08 19:10:43,696 INFO Evaluating Gold standard (sample gs, taxonomic binning)
2020-09-08 19:11:05,368 INFO Evaluating predictions_10 (sample gs, taxonomic binning)
/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/src/binning_classes.py:306: RuntimeWarning: invalid value encountered in double_scalars
  (utils_labels.F1_SCORE_BP, [2 * self.__precision_avg_bp * self.__recall_avg_bp / (self.__precision_avg_bp + self.__recall_avg_bp)]),
/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/src/binning_classes.py:313: RuntimeWarning: invalid value encountered in double_scalars
  (utils_labels.F1_SCORE_SEQ, [2 * self.__precision_avg_seq * self.__recall_avg_seq / (self.__precision_avg_seq + self.__recall_avg_seq)]),
/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/src/binning_classes.py:319: RuntimeWarning: invalid value encountered in double_scalars
  (utils_labels.F1_SCORE_PER_BP, [2 * self.__precision_weighted_bp * self.__recall_weighted_bp / (self.__precision_weighted_bp + self.__recall_weighted_bp)]),
/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/src/binning_classes.py:320: RuntimeWarning: invalid value encountered in double_scalars
  (utils_labels.F1_SCORE_PER_SEQ, [2 * self.__precision_weighted_seq * self.__recall_weighted_seq / (self.__precision_weighted_seq + self.__recall_weighted_seq)]),
2020-09-08 19:11:22,665 INFO Saving computed metrics
2020-09-08 19:11:22,872 INFO Creating taxonomic binning plots
/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/src/plots.py:343: UserWarning: FixedFormatter should only be used together with FixedLocator
  axs.set_xticklabels(['{:3.0f}'.format(x * 100) for x in vals], fontsize=11)
...
(The warning above is repeated many times.)

2020-09-08 19:11:46,422 INFO Creating HTML page
Traceback (most recent call last):
  File "/cbio/donnees/rmenegaux/miniconda3/envs/amber/bin/amber.py", line 302, in <module>
    main()
  File "/cbio/donnees/rmenegaux/miniconda3/envs/amber/bin/amber.py", line 297, in main
    args.desc)
  File "/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/src/amber_html.py", line 848, in create_html
    metrics_row_t = create_taxonomic_binning_html(df_summary, pd_bins[pd_bins['rank'] != 'NA'], labels, sample_ids_list, options)
  File "/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/src/amber_html.py", line 777, in create_taxonomic_binning_html
    rank_to_sample_to_html[rank].append(create_table_html(pd_mean_rank.T, is_taxonomic=True))
  File "/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/src/amber_html.py", line 450, in create_table_html
    html += df_metrics.style.apply(get_heatmap_colors, df_metrics=df_metrics, axis=1).set_precision(3).set_table_styles(this_style).render()
  File "/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/pandas/io/formats/style.py", line 540, in render
    self._compute()
...
  File "/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/pandas/core/frame.py", line 467, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 283, in init_dict
    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 78, in arrays_to_mgr
    index = extract_index(arrays)
  File "/cbio/donnees/rmenegaux/miniconda3/envs/amber/lib/python3.7/site-packages/pandas/core/internals/construction.py", line 397, in extract_index
    raise ValueError("arrays must all be same length")
ValueError: arrays must all be same length

The command that produces this error is the following:

amber.py predictions_10.binning --gold_standard_file ground_truth_10.binning --ncbi_nodes_file nodes.dmp --ncbi_names_file names.dmp --ncbi_merged_file merged.dmp --filter 1 --output_dir output_filter_1

The NCBI dump files are freshly downloaded, and the toy ground truth and prediction files are:

$ cat predictions_10.binning
@Version:0.9.1
@SampleID:gs

@@SEQUENCEID	TAXID
RM2|S1|R0	222805
RM2|S1|R1	187303
RM2|S1|R2	1525
RM2|S1|R3	146919
RM2|S1|R4	1488
RM2|S1|R5	305
$ cat ground_truth_10.binning
@Version:0.9.1
@SampleID:gs

@@SEQUENCEID	BINID	TAXID	_READID	_LENGTH
RM2|S1|R0	1030896	1123266	scaffold00002_27-953956	100
RM2|S1|R1	1220_BD	169973	scaffold9.1_8-4249	100
RM2|S1|R2	1036704	1123003	scaffold00002_48-138142	100
RM2|S1|R3	1285_CK	460257	scaffold2.1_10-583737	100
RM2|S1|R4	evo_1035921.028	745369	contig_5_4-113862	100
RM2|S1|R5	1139_Y	169973	scaffold15.1_21-8412	100

PS: This error does not occur systematically; AMBER ran successfully on some prediction files.
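
For context, the four RuntimeWarnings in the log above come from the F1-score lines of binning_classes.py, where the harmonic mean of average precision and recall evaluates 0/0 whenever both are zero. A minimal reproduction, assuming NumPy float scalars as in that code:

import numpy as np

# When precision and recall are both exactly 0.0, the harmonic mean
# divides 0 by 0; NumPy scalars return NaN with a RuntimeWarning
# instead of raising ZeroDivisionError.
precision_avg = np.float64(0.0)
recall_avg = np.float64(0.0)
f1 = 2 * precision_avg * recall_avg / (precision_avg + recall_avg)
print(f1)  # nan  (RuntimeWarning: invalid value encountered in double_scalars)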

Issue when running add_length_column.py

I am trying to display the script's help with python3 add_length_column.py -h, and I am getting the following output:

Traceback (most recent call last):
  File "/home/users/pnovikova/binning-refinement/scripts/add_length_column.py", line 26, in <module>
    import argparse_parents
ModuleNotFoundError: No module named 'argparse_parents'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/users/pnovikova/binning-refinement/scripts/add_length_column.py", line 30, in <module>
    import argparse_parents
ModuleNotFoundError: No module named 'argparse_parents'

The argparse module is installed, and as far as I can tell from searching, there is no package named argparse_parents. What am I doing wrong?

Thanks!
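
For what it's worth, the site-packages paths in other AMBER tracebacks suggest argparse_parents is a module inside AMBER's own source tree (under src/utils) rather than a PyPI package. A hedged guess at a workaround, with a hypothetical checkout path:

import sys

# Assumption: argparse_parents lives in AMBER's src/utils directory, as the
# paths in other tracebacks suggest; it is not installable from PyPI. The
# path below stands in for a local AMBER checkout.
sys.path.insert(0, "/path/to/AMBER/src/utils")
import argparse_parents  # should resolve once AMBER's sources are on sys.path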

AMBER in Dockerhub

AMBER should be registered on Docker Hub, and the README should be updated accordingly. That way, users do not have to build the image on their own systems but can use it directly. The correctness of the build should also be tested on CircleCI.

Updates for AMBER requirements

Dear developer,
While setting up AMBER via pip, I found that a few more packages with specific versions need to be installed:

  1. scipy=1.8.0 (to be compatible with NumPy 1.18.3)
  2. jinja2=2.10.1
  3. MarkupSafe=2.0.1
  4. Flask=1.0.3

It would be useful to include these in the requirements file for future users.

Thank you.

Best,
Yazhini

Remove warning when building html plots

When building the HTML plots, the following warning is thrown.

Example:

"/home/fmeyer/.local/lib/python3.5/site-packages/bokeh/util/deprecation.py:34:
BokehDeprecationWarning:
Supplying a user-defined data source AND iterable values to glyph
methods is deprecated.

See https://github.com/bokeh/bokeh/issues/2056 for more information.

   warn(message)"

Question about the output

Hi,
I have tried AMBER, but I am confused by the output: there are both Average completeness (bp) and CAMI 1 average completeness (bp). What is the difference between them? Thanks for your suggestions!

gold standard mapping column BINID vs binning assignment file

Hi,

I am trying to evaluate bin quality with AMBER. Unfortunately, I am a bit confused about the naming of the columns in the gold standard mapping file and the binning assignment file.
My gold standard mapping file looks like this:
[screenshot]

Whereas the binning assignment looks like this:
[screenshot]

I am confused about how BINID can refer to different objects in each file. I also tried using the gold standard assignment mapping file with the TAXID column.

When AMBER is executed with the following command:
amber.py -g gold_standard_file.tsv binning_file.tsv -o output_dir
the BINID and genome_id columns show the same value:
[screenshot]

Thank you so much for your help,
Pau

Error creating genome binning plots

Hi,
I get the following error when running AMBER on some binnings of the CAMI medium complexity toy dataset:

2020-02-07 10:00:58,977 INFO done
2020-02-07 10:00:58,979 INFO Computing metrics for Gold standard - genome binning, CAMI_toy_medium...
2020-02-07 10:01:01,114 INFO done
2020-02-07 10:01:01,118 INFO Computing metrics for CONCOCT - genome binning, CAMI_toy_medium...
2020-02-07 10:01:01,628 INFO done
2020-02-07 10:01:01,628 INFO Computing metrics for MaxBin2 - genome binning, CAMI_toy_medium...
2020-02-07 10:01:02,127 INFO done
2020-02-07 10:01:02,128 INFO Computing metrics for MetaBAT2 - genome binning, CAMI_toy_medium...
2020-02-07 10:01:02,619 INFO done
2020-02-07 10:01:02,619 INFO Computing metrics for MetaWrap - genome binning, CAMI_toy_medium...
2020-02-07 10:01:02,950 INFO done
2020-02-07 10:01:02,950 INFO Computing metrics for MetaWrap_ra - genome binning, CAMI_toy_medium...
2020-02-07 10:01:03,245 INFO done
2020-02-07 10:01:03,245 INFO Computing metrics for MetaWrap_qc - genome binning, CAMI_toy_medium...
2020-02-07 10:01:03,575 INFO done
2020-02-07 10:01:03,575 INFO Computing metrics for DAS_tool - genome binning, CAMI_toy_medium...
2020-02-07 10:01:03,900 INFO done
2020-02-07 10:01:03,902 INFO Saving computed metrics...
2020-02-07 10:01:03,980 INFO done
2020-02-07 10:01:03,981 INFO Creating genome binning plots...
Traceback (most recent call last):
  File "../../tools/AMBER/amber.py", line 412, in <module>
    main()
  File "../../tools/AMBER/amber.py", line 398, in main
    plot_genome_binning(sample_id_to_queries_list, df_summary, pd_bins, args.plot_heatmaps, output_dir)
  File "../../tools/AMBER/amber.py", line 269, in plot_genome_binning
    plots.plot_avg_precision_recall(df_summary_g, output_dir)
  File "/mnt/lscratch/users/ohickl/binning/tools/AMBER/src/plots.py", line 262, in plot_avg_precision_recall
    'Average completeness per genome [%]')
  File "/mnt/lscratch/users/ohickl/binning/tools/AMBER/src/plots.py", line 240, in plot_summary
    plt.tight_layout()
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/pyplot.py", line 1352, in tight_layout
    fig.tight_layout(pad=pad, h_pad=h_pad, w_pad=w_pad, rect=rect)
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/figure.py", line 2307, in tight_layout
    pad=pad, h_pad=h_pad, w_pad=w_pad, rect=rect)
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/tight_layout.py", line 349, in get_tight_layout_figure
    pad=pad, h_pad=h_pad, w_pad=w_pad)
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/tight_layout.py", line 114, in auto_adjust_subplotpars
    tight_bbox_raw = union([ax.get_tightbbox(renderer) for ax in subplots
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/tight_layout.py", line 115, in <listcomp>
    if ax.get_visible()])
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/axes/_base.py", line 4198, in get_tightbbox
    bb_xaxis = self.xaxis.get_tightbbox(renderer)
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/axis.py", line 1145, in get_tightbbox
    ticks_to_draw = self._update_ticks(renderer)
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/axis.py", line 1028, in _update_ticks
    tick_tups = list(self.iter_ticks())  # iter_ticks calls the locator
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/axis.py", line 978, in iter_ticks
    minorTicks = self.get_minor_ticks(len(minorLocs))
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/axis.py", line 1415, in get_minor_ticks
    tick = self._get_tick(major=False)
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/axis.py", line 1792, in _get_tick
    return XTick(self.axes, 0, '', major=major, **tick_kw)
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/axis.py", line 178, in __init__
    self.gridline = self._get_gridline()
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/axis.py", line 503, in _get_gridline
    **self._grid_kw)
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/lines.py", line 391, in __init__
    self.set_linestyle(linestyle)
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/lines.py", line 1125, in set_linestyle
    self._us_dashOffset, self._us_dashSeq, self._linewidth)
  File "/home/users/ohickl/anaconda3/envs/amber/lib/python3.7/site-packages/matplotlib/lines.py", line 68, in _scale_dashes
    scaled_offset = offset * lw
TypeError: can't multiply sequence by non-int of type 'float'

Could this be a Python 2 to Python 3 problem?

Best

Oskar

Error creating html - character encoding

Hi, AMBER runs until the HTML phase and creates the expected outputs. In the HTML phase I get the following output:

2020-11-04 19:31:20,810 INFO Creating HTML page
Traceback (most recent call last):
  File "cami-env/bin/amber.py", line 302, in <module>
    main()
  File "cami-env/bin/amber.py", line 297, in main
    args.desc)
  File "/<path>/cami-env/lib/python3.7/site-packages/src/amber_html.py", line 872, in create_html
    f.write(html)
UnicodeEncodeError: 'latin-1' codec can't encode character '\ufffd' in position 775969: ordinal not in range(256)

I am running in a Python 3.7 virtual environment and installed AMBER using pip3 per the instructions on GitHub.

Do you know what causes this error?
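
One plausible reading of the traceback (an assumption, not a confirmed cause): open() without an explicit encoding falls back to the locale codec, here latin-1, which cannot represent the replacement character U+FFFD. A minimal sketch of the failing pattern and the usual fix:

# Writing a string containing '\ufffd' through a latin-1 default codec
# reproduces the UnicodeEncodeError; an explicit encoding avoids it.
html = "<html>\ufffd</html>"  # toy stand-in for the generated report
with open("index.html", "w", encoding="utf-8") as f:
    f.write(html)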

Bug that makes the genome_coverage option unusable

Hello,

After the version update, the genome_coverage option is not usable, since the new code does not work with it. I opened PR #56 with a possible fix; I hope that was okay to do. Could you also test it on your side? I did not see a difference in the output compared with version 2.0.4. After the review, would it be possible to release a new version?

Thank you!

Filter contigs by length

First of all, great work with AMBER!

I ran AMBER on the mouse gut toy dataset, which contains many very small contigs in the GSA.
Some bins exclusively contain small contigs and are not recoverable by common genome binners.

It is already possible to manually exclude a set of genomes; I propose a complementary feature (sketched below):
filter contigs by size (threshold set by the user, default e.g. 2.5 kb) and exclude them from all analyses.
This will also remove some gold standard bins completely (if they contain no longer contigs).
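
A minimal sketch of the proposed filter; the data layout and threshold are illustrative, not AMBER's internals:

MIN_CONTIG_LEN = 2500  # proposed user-settable default of 2.5 kb

def filter_short_contigs(bins, contig_lengths, min_len=MIN_CONTIG_LEN):
    """bins: bin id -> set of contig ids; contig_lengths: contig id -> bp."""
    kept = {bin_id: {c for c in contigs if contig_lengths[c] >= min_len}
            for bin_id, contigs in bins.items()}
    # Bins consisting only of short contigs disappear from the evaluation.
    return {bin_id: contigs for bin_id, contigs in kept.items() if contigs}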

A question about AMBER 2.0.21-beta

I installed AMBER 2.0.21-beta and a team member installed another version. We found that the values of Completeness (bp) and Purity (bp) in AMBER 2.0.21-beta are the same as avg_completeness_per_bp and avg_purity_per_bp in the other version. Have the names of the metrics changed from those used in the paper? And what is the difference between Average purity (bp) in AMBER 2.0.21-beta and avg_purity_per_bp in the older version? Thanks.

Mixed dtype for BINID can cause bins with the same ID to be split into separate bins

If BINID is a mixture of strings and ints, I have noticed that individual int values can be imported as both strings and ints, essentially splitting a single bin into two.

I believe it is an issue with how pandas imports the dataframes, and it may only happen with large files. The issue appears to be resolved by using either all-int or all-string values for BINID.
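
A sketch of the suspected mechanism and a workaround; the read call is illustrative, not AMBER's actual loader:

import io
import pandas as pd

# With large files pandas may infer dtypes per chunk, so an int-looking
# BINID such as 1030896 can be read as both int and str, splitting one bin
# into two. dtype={'BINID': str} makes the column uniformly string-typed.
data = io.StringIO("SEQUENCEID\tBINID\nseq1\t1030896\nseq2\t1220_BD\n")
df = pd.read_csv(data, sep="\t", dtype={"BINID": str})
print(df["BINID"].map(type).unique())  # only <class 'str'>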

Does AMBER only consider bins with completeness > 0?

Hi,
I have a sample of which I only want to bin a fraction. For example, I have 10 bins in the gold standard, and the sample contains many other sequences that I do not want to put into bins. A binning tool creates 20 bins, of which 15 match 5 gold standard bins, i.e. 5 of the binner's bins do not match any gold standard bin. Is there any way for AMBER to handle a situation like this?

(I can send you more details and example data by email if you wish - mine is dpellow post.tau.ac.il)

Old file archives are not correct

Hello,

Sorry for writing the issue here, since I did not find an email to contact any of the CAMI staff. I am currently working on my bachelor thesis, which includes building a workflow on the web server https://usegalaxy.eu/, which hosts many tools from the bioinformatics field. Since AMBER is up there now, I need some benchmarks to test the workflow, and I discovered that you provide the old archives such as CAMI low or mouse gut toy. I have worked with the CAMI low and mouse gut toy archives, but I also want to test the high and medium archives, and here is the problem: I downloaded both tarballs [from http://gigadb.org/dataset/100344] and unzipped them, only to get the samples without any other files, while the gsa and binning files should also be there; they are missing from both tarballs. Is it possible to fix this, or is there another source that contains the correct tarballs?

This would be a great help; thank you in advance, and again, I am sorry if this is the wrong place for this topic!

ImportError: cannot import name 'Markup' from 'jinja2'

Hi there, I ran AMBER a few days ago. Since then I updated some packages to run CAMISIM, because some of them were incompatible, and now I cannot get AMBER to run anymore.

I get the following error message:

Traceback (most recent call last):
  File "/Users/eparisis/miniconda3/envs/amber/bin/amber.py", line 26, in <module>
    from src import amber_html
  File "/Users/eparisis/miniconda3/envs/amber/lib/python3.7/site-packages/src/amber_html.py", line 41, in <module>
    from bokeh.plotting import figure
  File "/Users/eparisis/miniconda3/envs/amber/lib/python3.7/site-packages/bokeh/plotting/__init__.py", line 2, in <module>
    from ..document import Document; Document
  File "/Users/eparisis/miniconda3/envs/amber/lib/python3.7/site-packages/bokeh/document/__init__.py", line 7, in <module>
    from .document import Document ; Document
  File "/Users/eparisis/miniconda3/envs/amber/lib/python3.7/site-packages/bokeh/document/document.py", line 35, in <module>
    from ..core.templates import FILE
  File "/Users/eparisis/miniconda3/envs/amber/lib/python3.7/site-packages/bokeh/core/templates.py", line 20, in <module>
    from jinja2 import Environment, Markup, FileSystemLoader, PackageLoader
ImportError: cannot import name 'Markup' from 'jinja2' (/Users/eparisis/miniconda3/envs/amber/lib/python3.7/site-packages/jinja2/__init__.py)

I get this same error message running it on Linux and on my local Mac. Installing it from scratch in a new conda env also produces this error.

I've looked up some fixes for jinja2 but nothing worked.
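
For background (a known jinja2 change, offered as context rather than a confirmed AMBER fix): Markup was deprecated in jinja2 3.0 and removed in 3.1, where it lives in markupsafe. Older bokeh releases still import it from jinja2's top level, so pinning jinja2 below 3.1, or upgrading bokeh, typically resolves this import:

# Markup moved: jinja2 >= 3.1 no longer re-exports it, markupsafe does.
try:
    from jinja2 import Markup      # works on jinja2 < 3.1 (what bokeh does)
except ImportError:
    from markupsafe import Markup  # its home on newer installs
print(Markup("<b>escaped?</b>"))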

Trying to build the Docker image as in the instructions also didn't work:

$ docker build -t amber:latest .
Sending build context to Docker daemon   74.1MB
Step 1/10 : FROM python:3.7-slim
3.7-slim: Pulling from library/python
3f4ca61aafcd: Pull complete 
3f487a3359db: Pull complete 
e87858cc8912: Pull complete 
471900aadde7: Pull complete 
37bdaa58825f: Pull complete 
Digest: sha256:62209b7fcd75e157220c682de6c81e737a3d36a06ce05f449757c7b9ef271f99
Status: Downloaded newer image for python:3.7-slim
 ---> 74e5f3c48333
Step 2/10 : ADD image /usr/local
 ---> efb47449e5cb
Step 3/10 : ADD *.py /usr/local/bin/
 ---> ce875502bb87
Step 4/10 : ADD src /usr/local/bin/src
 ---> c663e9a0e4ef
Step 5/10 : ADD src/utils /usr/local/bin/src/utils
 ---> 2b80cb51a18f
Step 6/10 : ADD requirements /requirements
failed to export image: failed to create image: failed to get layer sha256:3da0f9e1caa5774c47974ea1948ca723ac5a3fad7bebeaeb513002f3ca3cabc4: layer does not exist

The package versions in requirements.txt are all installed, so I don't know what's wrong.

Evaluating Gold standard encountered RuntimeWarning: overflow

Hello,
Running AMBER on my dataset produced a numerical overflow:

Evaluating Gold standard (sample marine, genome binning)
~/.local/lib/python3.8/site-packages/src/binning_classes.py:306: RuntimeWarning: overflow encountered in long_scalars
  return (n * (n - 1)) / 2.0

This comes from the function compute_rand_index in binning_classes.py. What could be the reason for this warning?

Thanks.

Best,
Yazhini
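
A minimal reproduction, assuming n is a NumPy 64-bit integer as it typically is after a pandas aggregation: n * (n - 1) exceeds the int64 range once n passes roughly 3e9, which base-pair counts easily can:

import numpy as np

# np.int64 arithmetic wraps around with the overflow RuntimeWarning seen
# in the log; converting to Python's arbitrary-precision int (or to float)
# first keeps (n * (n - 1)) / 2.0 exact.
n = np.int64(4_000_000_000)           # e.g. base pairs in a large sample
print((n * (n - 1)) / 2.0)            # overflows, result is wrong
print((int(n) * (int(n) - 1)) / 2.0)  # 7.999999998e+18, correct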

Output metrics: bp vs. seq

Hi,
could you clarify the difference between the AMBER 2.x seq- and bp-based metrics? As you explained in #36, Average purity (bp) stems from equation 6 in the original paper. Is this then the Average purity (bp) under "Quality of bins: all bins have the same weight" in the index.html output? Does the seq-based metric mean that the value is based on the number of contigs from the most abundant gold standard genome in each bin? Does that value deviate from the bp metric because of differing contig lengths, so that a large contig of the "correct" genome increases the bp purity but not the seq purity? If so, why should we care about the seq metric? Wouldn't the bp-based measure always be more informative in showing how close bins are to the gold standard?

Best
Oskar
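
A toy illustration of how the two metrics can diverge, using the assumed definitions (true positives over bin size, counted in base pairs versus in number of sequences): one long contig from the correct genome dominates the bp purity while counting as a single sequence:

# (label, length in bp) for contigs in one bin; 'correct' marks contigs
# from the bin's most abundant gold standard genome.
contigs = [("correct", 20_000), ("wrong", 1_000),
           ("wrong", 1_000), ("wrong", 1_000)]
tp_bp = sum(length for label, length in contigs if label == "correct")
purity_bp = tp_bp / sum(length for _, length in contigs)               # ~0.87
purity_seq = sum(label == "correct" for label, _ in contigs) / len(contigs)  # 0.25
print(purity_bp, purity_seq)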

Improve error message(s)

I ran AMBER using a corrupt gold standard file, where _LENGTH was in the header but the data rows contained only 2 fields. Example:

@Version:0.9.1
@SampleID:gsa

@@SEQUENCEID    BINID    _LENGTH
A    42
B    13

I got a plain IndexError: list index out of range message. I therefore suggest checking that the data rows indeed contain either 2 or 3 fields and printing a sensible error message; a sketch follows below.


For bonus points, I suggest checking for more such edge cases, where better reporting would help the user find the mistake and ultimately give a better user experience.

P.S. Eventually I figured out my mistake (a faulty regex had removed the BINID field) and AMBER worked like a charm! 👍
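
A sketch of the suggested check; the function name and message are hypothetical, not AMBER's parser:

def parse_data_row(line, n_header_fields, line_no):
    """Split a tab-separated data row, failing loudly on a field mismatch."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != n_header_fields:
        raise ValueError(
            f"line {line_no}: expected {n_header_fields} fields as declared "
            f"by the @@ header, found {len(fields)}: {fields!r}"
        )
    return fields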

CHANGELOG

We should maintain a CHANGELOG.md that always records the changes made to the code. By reading this file, a user can always see which bugs were fixed and which features were added.

VERSION

AMBER should be versioned following the semantic versioning conventions: http://semver.org/
We should create a setup.cfg that states the current release/version.
This information could also be parsed by various scripts and included in a PDF/HTML/PNG, etc. AMBER's output should always state the release that produced it.
The config is also necessary for issue #5.

"nan" values in Purity (bp) and Purity (seq)

Hi,

I'm using AMBER to evaluate a set of bins I obtained from a metagenome assembled from the CAMI Toy Mouse Gut Dataset reads. I've noticed that some bins have nan values in the Purity (bp) and Purity (seq) columns. What might be causing that?

To build the gold standard, I aligned the reassembled contigs to the original genomes using BLAST, as described in Vamb's paper:

We removed any hits shorter than 500 bp or with lower nucleotide identity than 95%. If a query (reassembled) contig was aligned to multiple reference (original) contigs, we accepted the reference with the longest alignment, if the alignment was more than twice as long of the next longest. If that was not the case for any reference, we accepted the reference with highest nucleotide identity, if the reference was longer than 10 kbp, had an alignment length of at least 90% of the longest-aligning reference, and had at least 0.05% higher nucleotide identity than the second-highest identity reference. If no reference fit those criteria, they were ignored in the benchmarking.

| Bin ID | Most abundant genome | Purity (bp) | Completeness (bp) | Bin size (bp) | True positives (bp) | True size of most abundant genome (bp) | Purity (seq) | Completeness (seq) | Bin size (seq) | True positives (seq) | True size of most abundant genome (seq) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mouse_gut_5.vamb.83 | denovo8255.1 | 0.659 | 0.983 | 741631 | 488937 | 497562 | 0.655 | 0.980 | 226 | 148 | 151 |
| mouse_gut_5.vamb.9 | 269125.1 | 0.998 | 0.977 | 1809513 | 1806528 | 1848507 | 0.994 | 0.977 | 172 | 171 | 175 |
| mouse_gut_5.vamb.81 | 179513.0 | 1.000 | 0.963 | 807379 | 807379 | 838686 | 1.000 | 0.962 | 280 | 280 | 291 |
| mouse_gut_5.vamb.126 | 228785.0 | 1.000 | 0.960 | 308660 | 308660 | 321520 | 1.000 | 0.954 | 103 | 103 | 108 |
| mouse_gut_5.vamb.72 | 661259.1 | 1.000 | 0.940 | 241230 | 241230 | 256496 | 1.000 | 0.944 | 85 | 85 | 90 |
| mouse_gut_5.vamb.182 | 259993.0 | 1.000 | 0.922 | 647949 | 647949 | 702765 | 1.000 | 0.917 | 211 | 211 | 230 |
| mouse_gut_5.vamb.454 | denovo11208.0 | 0.525 | 0.919 | 1065913 | 559173 | 608144 | 0.531 | 0.895 | 305 | 162 | 181 |
| mouse_gut_5.vamb.11 | 133719.0 | 0.760 | 0.915 | 534069 | 405959 | 443893 | 0.748 | 0.888 | 127 | 95 | 107 |
| mouse_gut_5.vamb.111 | denovo11993.0 | 1.000 | 0.875 | 636280 | 636280 | 727174 | 1.000 | 0.882 | 194 | 194 | 220 |
| mouse_gut_5.vamb.793 | 4471135.0 | 0.992 | 0.863 | 1913155 | 1898009 | 2200037 | 0.990 | 0.845 | 583 | 577 | 683 |
| mouse_gut_5.vamb.333 | denovo12532.0 | nan | 0.857 | 202574 | 199887 | 233197 | nan | 0.852 | 76 | 75 | 88 |
| mouse_gut_5.vamb.51 | denovo2465.0 | nan | 0.848 | 218613 | 218613 | 257907 | nan | 0.863 | 82 | 82 | 95 |
| mouse_gut_5.vamb.115 | denovo1032.0 | nan | 0.816 | 44891 | 24716 | 30297 | nan | 0.750 | 11 | 6 | 8 |
| mouse_gut_5.vamb.71 | denovo10679.0 | 0.333 | 0.787 | 654629 | 218217 | 277372 | 0.341 | 0.622 | 82 | 28 | 45 |
| mouse_gut_5.vamb.451 | denovo11206.0 | 0.893 | 0.761 | 1096925 | 979172 | 1287345 | 0.872 | 0.718 | 298 | 260 | 362 |
| mouse_gut_5.vamb.428 | denovo2609.0 | 0.998 | 0.733 | 1077877 | 1075782 | 1468385 | 0.997 | 0.717 | 325 | 324 | 452 |
| mouse_gut_5.vamb.1036 | 263992.0 | nan | 0.231 | 66518 | 41954 | 181693 | nan | 0.242 | 23 | 15 | 62 |

How does mapping work?

Dear developers,
I am trying to understand the evaluation method more deeply. How do you get (i) the fraction of base pairs of a genome covered by bins and (ii) the overlap in base pairs between a bin and a genome without mapping through sequence alignment? The input for the gold standard and the final bins has only contig names and lengths, but no coordinates. Your explanation is much appreciated.

Thanks.

Best,
Yazhini
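
A sketch of my understanding from the AMBER paper (not its code): no alignment is needed because the gold standard and the predicted bins name the same contigs, so the base-pair overlap between a bin and a genome is the summed length of the sequence IDs they share:

# genome -> {contig id: length}; a predicted bin uses the same contig ids.
gold_genome = {"c1": 1_000, "c2": 500}
predicted_bin = {"c1": 1_000, "c3": 200}

overlap_bp = sum(l for c, l in predicted_bin.items() if c in gold_genome)
completeness = overlap_bp / sum(gold_genome.values())  # fraction of genome covered
purity = overlap_bp / sum(predicted_bin.values())      # fraction of bin that is correct
print(completeness, purity)  # 0.666..., 0.833...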

Increase performance of the produced HTML

The HTML gets quite big and slow.
Its size can be reduced by recomputing the plots on demand; this way, we do not have to embed every plot when the HTML is built.
