Definitions: "new method" = use a very large k-mer size, put in ternary search tri

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Multiple k-mer sizes confirmation and testing about cmash HOT 2 OPEN

dkoslicki commented on July 21, 2024

Multiple k-mer sizes confirmation and testing

from cmash.

Comments (2)

dkoslicki commented on July 21, 2024

@ShaopengLiu1 #19 should be addressed now. I am not closing #19 or #2 until we have a better testing environment spun up, as all tests I have done are locally (very not optimal).

from cmash.

dkoslicki commented on July 21, 2024

@ShaopengLiu1 just a note: I added a class that will now compute the absolute ground truth containment indicies. Recall that the last column of StreamingQueryDNADatabase.py is still an estimate of the containment index (just using un-truncated k-mers). The class to compute the ground truth is at /CMash/CMash/GroundTruth.py. You can see in this comment how the results by the tests/script_tests/./run_small_tests.sh correspond quite nicely with the ground truth values.

If you would like to utilize this ground truth class, I strongly suggest you use my personal server (ping me if you forgot the IP address and login info) as it takes quite a bit of time and memory to brute-force calculate all the k-mers and their reverse complements.

To interact with the class, you can do something like:

import CMash.GroundTruth as G
training_database_file = "<snip>/TrainingDatabase.h5"
query_file = "<snip>/taxid_1192839_4_genomic.fna.gz"
g = G.TrueContainment(training_database_file=training_database_file, k_sizes="4-6-1")  # this step will take a long time if the k_sizes are realistically large 
df = g.return_containment_data_frame(query_file=query_file1, location_of_thresh=-1, coverage_threshold=.1)

Note that the query_file need not be in the TrainingDatabase.h5 (as its k-mers will still be enumerated if it's not in the training database).

from cmash.

Recommend Projects

Multiple k-mer sizes confirmation and testing about cmash HOT 2 OPEN

Comments (2)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent