Comments (8)

Zhihan1996 commented on August 26, 2024

Thanks for letting me know! Our data reader treats the first line of a .tsv file as a header and automatically skips it, and I forgot to add a header to the sample data. That's why one data sample is missing. Please add an arbitrary header line to the sample file, and it should work fine. I will update the sample data soon.
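
The effect described above can be reproduced with a small sketch. It assumes a hypothetical reader, read_tsv_skip_header, that always drops the first line; it is not DNABERT's actual data reader, and the sequences are made up for illustration.

```python
import csv
import io


def read_tsv_skip_header(text):
    """Hypothetical reader: parse tab-separated text and drop the first line as a header."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[1:]


# Made-up sample data: two examples and no header line.
no_header = "ACGTAC CGTACG\t1\nTTGACA TGACAT\t0\n"
# The same data with an arbitrary header line prepended.
with_header = "sequence\tlabel\n" + no_header

rows_without = read_tsv_skip_header(no_header)
rows_with = read_tsv_skip_header(with_header)

print(len(rows_without))  # 1: the first sample was consumed as the header
print(len(rows_with))     # 2: both samples survive
```

With the header in place, the skipped line is the header itself rather than a data sample.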

from dnabert.

mtinti commented on August 26, 2024

Hi, thanks for the prompt response!
Can you also please check the example code for the Motif analysis step? I think it is truncated at the end, probably missing the --return_idx flag.

jerryji1993 commented on August 26, 2024

Hi @mtinti, thanks for pointing that out! We have removed the trailing '\' at the end of the command.

For the purpose of running the example scripts, the --return_idx flag is not necessary. It is only needed if you explicitly want the indices of those motifs.

mtinti commented on August 26, 2024

Hi, thanks again for double-checking.
I'm getting this error now:

*** Begin motif analysis ***
* pos_seqs: 0; neg_seqs: 0
Traceback (most recent call last):
  File "DNABERT/motif/find_motifs.py", line 110, in <module>
    main()
  File "DNABERT/motif/find_motifs.py", line 106, in main
    return_idx = args.return_idx
  File "/content/DNABERT/motif/motif_utils.py", line 475, in motif_analysis
    max_seq_len = len(max(pos_seqs, key=len))
ValueError: max() arg is an empty sequence

I think that the sequences in dev.tsv are not correctly tab-separated, am I correct?
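
The ValueError in the traceback above comes from calling max() on an empty collection, which is consistent with the reported pos_seqs: 0. A minimal sketch of a guard that fails with a more descriptive message follows; the helper name max_seq_len is hypothetical and is not part of DNABERT.

```python
def max_seq_len(seqs):
    """Return the length of the longest sequence, or fail clearly if there are none."""
    seqs = list(seqs)
    if not seqs:
        # max() on an empty list raises "max() arg is an empty sequence";
        # raise a message that points at the likely input problem instead.
        raise ValueError(
            "no sequences to analyse; check that dev.tsv is tab-separated "
            "and that its label column contains the expected values"
        )
    return len(max(seqs, key=len))


print(max_seq_len(["ACGT", "ACGTACGT"]))  # 8
```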

Joseph-Vineland commented on August 26, 2024

I would also like to report this bug. Step 6, motif finding, does not work, even with the provided example data and commands. I really hope this bug can be fixed, because DNABERT is working out great so far and I don't want to be stopped at the last step. Motif analysis converts patterns detected by DNABERT into actionable biological insights.

Did you ever solve this, @mtinti? I'm now getting the same error you did (pos_seqs: 0; neg_seqs: 0).

Thank you.

Joseph-Vineland commented on August 26, 2024

The solution to the error posted by @mtinti on 19 Oct 2020 is the following:

I modified the script 'find_motifs.py', and now it runs without error. See below. I also edited the example dev.tsv file so the header is tab-separated, not space-separated.

However, no motifs are output for the provided example data/commands.

import os
import pandas as pd
import numpy as np
import argparse
import motif_utils as utils


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data_dir",
        default=None,
        type=str,
        required=True,
        help="The input data dir. Should contain the sequence+label .tsv files (or other data files) for the task.",
    )

    parser.add_argument(
        "--predict_dir",
        default=None,
        type=str,
        required=True,
        help="Path where the attention scores were saved. Should contain both pred_results.npy and atten.npy",
    )

    parser.add_argument(
        "--window_size",
        default=24,
        type=int,
        help="Specified window size to be final motif length",
    )

    parser.add_argument(
        "--min_len",
        default=5,
        type=int,
        help="Specified minimum length threshold for contiguous region",
    )

    parser.add_argument(
        "--pval_cutoff",
        default=0.005,
        type=float,
        help="Cutoff FDR/p-value to declare statistical significance",
    )

    parser.add_argument(
        "--min_n_motif",
        default=3,
        type=int,
        help="Minimum instance inside motif to be filtered",
    )

    parser.add_argument(
        "--align_all_ties",
        action='store_true',
        help="Whether to keep all best alignments when ties encountered",
    )

    parser.add_argument(
        "--save_file_dir",
        default='.',
        type=str,
        help="Path to save outputs",
    )

    parser.add_argument(
        "--verbose",
        action='store_true',
        help="Verbosity controller",
    )

    parser.add_argument(
        "--return_idx",
        action='store_true',
        help="Whether the indices of the motifs are only returned",
    )

    # TODO: add the conditions
    args = parser.parse_args()

    # Load the saved attention scores and predictions
    atten_scores = np.load(os.path.join(args.predict_dir, "atten.npy"))
    pred = np.load(os.path.join(args.predict_dir, "pred_results.npy"))
    # Read dev.tsv, taking the column names from its tab-separated first line
    dev = pd.read_csv(os.path.join(args.data_dir, "dev.tsv"), sep='\t', header=0)
    #dev.columns = ['sequence','label']
    #dev['seq'] = dev['sequence']
    # Convert each k-mer string back to a plain sequence
    dev['sequence'] = dev['sequence'].apply(utils.kmer2seq)
    # Split the examples into positives and negatives by label
    dev_pos = dev[dev['label'] == 1]
    dev_neg = dev[dev['label'] == 0]
    # Select the attention rows that belong to each class
    pos_atten_scores = atten_scores[dev_pos.index.values]
    neg_atten_scores = atten_scores[dev_neg.index.values]
    assert len(dev_pos) == len(pos_atten_scores)


    # run motif analysis
    merged_motif_seqs = utils.motif_analysis(dev_pos['sequence'],
                                        dev_neg['sequence'],
                                        pos_atten_scores,
                                        window_size = args.window_size,
                                        min_len = args.min_len,
                                        pval_cutoff = args.pval_cutoff,
                                        min_n_motif = args.min_n_motif,
                                        align_all_ties = args.align_all_ties,
                                        save_file_dir = args.save_file_dir,
                                        verbose = args.verbose
                                        #return_idx  = args.return_idx
                                    )

if __name__ == "__main__":
    main()
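
The header problem mentioned above (space-separated instead of tab-separated) can be checked before running the script. This is a minimal sketch under stated assumptions: header_fields is a hypothetical helper, and the file names and contents are made up for illustration rather than taken from the DNABERT sample data.

```python
import os
import tempfile


def header_fields(path, delimiter="\t"):
    """Return the fields of the first line of a file, split on the delimiter."""
    with open(path, encoding="utf-8") as handle:
        return handle.readline().rstrip("\r\n").split(delimiter)


with tempfile.TemporaryDirectory() as tmp:
    bad = os.path.join(tmp, "bad.tsv")
    good = os.path.join(tmp, "good.tsv")
    # Header separated by a space: the whole line is read as a single field.
    with open(bad, "w", encoding="utf-8") as handle:
        handle.write("sequence label\nACGTAC CGTACG\t1\n")
    # Header separated by a tab: two fields, matching the data rows.
    with open(good, "w", encoding="utf-8") as handle:
        handle.write("sequence\tlabel\nACGTAC CGTACG\t1\n")
    bad_fields = header_fields(bad)
    good_fields = header_fields(good)

print(bad_fields)   # ['sequence label']
print(good_fields)  # ['sequence', 'label']
```

A header that yields a single field is a sign that the 'sequence' and 'label' columns used by the script will not be found under those names.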

Joseph-Vineland commented on August 26, 2024

FYI - After editing the script as described above, the motif-finding procedure worked on our data and identified motifs. GREAT! Thank you very much for the awesome tool! However, if I set --max_seq_length to 56 instead of 64 during the (4) prediction and (5) visualization steps, the following error is thrown:

python find_motifs.py \
>     --data_dir $DATA_PATH \
>     --predict_dir $PREDICTION_PATH \
>     --window_size 24 \
>     --min_len 5 \
>     --pval_cutoff 0.05 \
>     --min_n_motif 3 \
>     --align_all_ties \
>     --save_file_dir $MOTIF_PATH \
>     --verbose
*** Begin motif analysis ***
* pos_seqs: 17938; neg_seqs: 18061
* Finding high attention motif regions
* Filtering motifs by hypergeometric test
Traceback (most recent call last):
  File "find_motifs.py", line 116, in <module>
    main()
  File "find_motifs.py", line 111, in main
    verbose = args.verbose
  File "/home/joneill/DNABERT/motif/motif_utils.py", line 514, in motif_analysis
    **kwargs)
  File "/home/joneill/DNABERT/motif/motif_utils.py", line 233, in filter_motifs
    pvals = motifs_hypergeom_test(pos_seqs, neg_seqs, motifs, **kwargs)
  File "/home/joneill/DNABERT/motif/motif_utils.py", line 196, in motifs_hypergeom_test
    motif_count_all = count_motif_instances(pos_seqs+neg_seqs, motifs, allow_multi_match=allow_multi_match)
  File "/home/joneill/DNABERT/motif/motif_utils.py", line 152, in count_motif_instances
    matches = sorted(map(itemgetter(1), A.iter(seq)))
AttributeError: Not an Aho-Corasick automaton yet: call add_word to add some keys and call make_automaton to convert the trie to an automaton.
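
As the message itself says, the matcher was queried before any keys had been added to it. One plausible trigger, not confirmed in this thread, is that no candidate motifs survived the earlier steps, so the motif list was empty. A minimal sketch of guarding against that case follows. It uses a plain substring search as a stand-in for the Aho-Corasick automaton, and count_motif_instances_safe is a hypothetical name, not DNABERT's function.

```python
def count_motif_instances_safe(seqs, motifs):
    """Count, for each motif, how many sequences contain it at least once."""
    counts = {motif: 0 for motif in motifs}
    if not counts:
        # Nothing to search for: return early instead of querying an empty matcher.
        return counts
    for seq in seqs:
        for motif in counts:
            if motif in seq:
                counts[motif] += 1
    return counts


print(count_motif_instances_safe(["ACGTAC", "TTACGT"], ["ACGT", "GGG"]))  # {'ACGT': 2, 'GGG': 0}
print(count_motif_instances_safe(["ACGTAC"], []))  # {}
```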

jerryji1993 commented on August 26, 2024

Hi,

Thanks very much for reporting. We have recently updated the test data and made general bug fixes to this motif analysis module. The code should work now; please let me know if it doesn't. I will close this issue for now. Feel free to reopen it if needed.

Thanks,
Jerry
