Hi, thanks for this package! I was replicating your example and I got an error

shape of atten_scores about dnabert HOT 8 CLOSED

jerryji1993 commented on August 26, 2024

shape of atten_scores

from dnabert.

Comments (8)

Zhihan1996 commented on August 26, 2024

Thanks for letting me know! Our data reader treats the first line of a .tsv file as a header and automatically skips it, and I forgot to add a header to the sample data. That's why one data sample is missing. Please add an arbitrary header line to the sample file, and it should work fine. I will update the sample data soon.
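
To illustrate the header behavior described above, here is a minimal sketch. It is not DNABERT's data reader; it uses pandas only to show the effect, and the two sample rows are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical two-sample file with NO header line (tab-separated).
no_header = "ACGTAC CGTACG\t1\nTTGACA TGACAT\t0\n"

# A reader that treats the first line as a header consumes the first sample.
df_skipped = pd.read_csv(io.StringIO(no_header), sep="\t", header=0)

# Prepending a header line keeps both samples as data rows.
with_header = "sequence\tlabel\n" + no_header
df_ok = pd.read_csv(io.StringIO(with_header), sep="\t", header=0)

print(len(df_skipped), len(df_ok))
```

Without the header line the first sample is silently consumed as column names, so the data ends up one row short of what was written to the file.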


mtinti commented on August 26, 2024

Hi, thanks for the prompt response!
Could you also please check the example code for the step:

Motif analysis
I think it is truncated at the end, probably missing the --return_idx flag.


jerryji1993 commented on August 26, 2024

Hi @mtinti, thanks for pointing that out! We have removed the trailing '\' at the end of the command.

For the purpose of running the example scripts, the --return_idx flag is not necessary. It is only needed if you explicitly want the indices of those motifs.


mtinti commented on August 26, 2024

Hi, thanks again for double-checking.
I'm getting this error now:

*** Begin motif analysis ***

pos_seqs: 0; neg_seqs: 0
Traceback (most recent call last):
  File "DNABERT/motif/find_motifs.py", line 110, in <module>
    main()
  File "DNABERT/motif/find_motifs.py", line 106, in main
    return_idx = args.return_idx
  File "/content/DNABERT/motif/motif_utils.py", line 475, in motif_analysis
    max_seq_len = len(max(pos_seqs, key=len))
ValueError: max() arg is an empty sequence

I think that the sequences in dev.tsv are not correctly tab-separated. Am I correct?
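
One way to test that suspicion is to check how the file splits on tabs. The sketch below is a hypothetical helper (`check_tsv` is not part of DNABERT) and assumes the expected columns are named `sequence` and `label`; the sample strings are made up.

```python
import csv
import io

def check_tsv(handle, expected_columns=("sequence", "label")):
    """Return a list of problems found in a TSV stream (empty list = looks fine)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    if tuple(header) != tuple(expected_columns):
        problems.append(f"header parsed as {header!r}, expected {list(expected_columns)!r}")
    # Data rows start on line 2 of the file.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(expected_columns):
            problems.append(f"line {line_no}: {len(row)} tab-separated field(s)")
    return problems

good = "sequence\tlabel\nACGTAC CGTACG\t1\n"
bad = "sequence label\nACGTAC CGTACG\t1\n"  # header separated by a space, not a tab

print(check_tsv(io.StringIO(good)))
print(check_tsv(io.StringIO(bad)))
```

For a file on disk, the same function can be called as `check_tsv(open(path, newline=""))`.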


Joseph-Vineland commented on August 26, 2024

I would also like to report this bug. Step 6, motif finding, does not work, even with the provided example data and commands. I really hope this bug can be fixed, because DNABERT is working out great so far and I don't want to be stopped at the last step. Motif analysis converts patterns detected by DNABERT into actionable biological insights.

Did you ever solve this, @mtinti? I'm now getting the same error you are. (pos_seqs: 0; neg_seqs: 0)

Thank you.


Joseph-Vineland commented on August 26, 2024

The solution to the error posted by @mtinti on 19 Oct 2020 is the following:

I modified the script 'find_motifs.py', and now it runs without error. See below. I also edited the example dev.tsv file so the header is tab-separated, not space-separated.

However, no motifs are output for the provided example data/commands.

import os
import pandas as pd
import numpy as np
import argparse
import motif_utils as utils


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--data_dir",
        default=None,
        type=str,
        required=True,
        help="The input data dir. Should contain the sequence+label .tsv files (or other data files) for the task.",
    )

    parser.add_argument(
        "--predict_dir",
        default=None,
        type=str,
        required=True,
        help="Path where the attention scores were saved. Should contain both pred_results.npy and atten.npy",
    )

    parser.add_argument(
        "--window_size",
        default=24,
        type=int,
        help="Specified window size to be final motif length",
    )

    parser.add_argument(
        "--min_len",
        default=5,
        type=int,
        help="Specified minimum length threshold for contiguous region",
    )

    parser.add_argument(
        "--pval_cutoff",
        default=0.005,
        type=float,
        help="Cutoff FDR/p-value to declare statistical significance",
    )

    parser.add_argument(
        "--min_n_motif",
        default=3,
        type=int,
        help="Minimum instance inside motif to be filtered",
    )

    parser.add_argument(
        "--align_all_ties",
        action='store_true',
        help="Whether to keep all best alignments when ties encountered",
    )

    parser.add_argument(
        "--save_file_dir",
        default='.',
        type=str,
        help="Path to save outputs",
    )

    parser.add_argument(
        "--verbose",
        action='store_true',
        help="Verbosity controller",
    )

    parser.add_argument(
        "--return_idx",
        action='store_true',
        help="Whether the indices of the motifs are only returned",
    )

    # TODO: add the conditions
    args = parser.parse_args()

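    # Load the attention scores and predictions saved by the prediction step.
    atten_scores = np.load(os.path.join(args.predict_dir,"atten.npy"))
    pred = np.load(os.path.join(args.predict_dir,"pred_results.npy"))
    # Read dev.tsv, taking the column names ('sequence', 'label') from its
    # tab-separated first line instead of assigning them afterwards.
    dev = pd.read_csv(os.path.join(args.data_dir,"dev.tsv"),sep='\t',header=0)
    #dev.columns = ['sequence','label']
    #dev['seq'] = dev['sequence']
    # Convert each k-mer string back to a plain sequence.
    dev['sequence'] = dev['sequence'].apply(utils.kmer2seq)
    # Split samples by label and pick the matching rows of attention scores.
    dev_pos = dev[dev['label'] == 1]
    dev_neg = dev[dev['label'] == 0]
    #print(dev_pos)
    #print(dev_neg)
    pos_atten_scores = atten_scores[dev_pos.index.values]
    neg_atten_scores = atten_scores[dev_neg.index.values]
    assert len(dev_pos) == len(pos_atten_scores)

    #print(dev_pos['sequence'])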


    # run motif analysis
    merged_motif_seqs = utils.motif_analysis(dev_pos['sequence'],
                                        dev_neg['sequence'],
                                        pos_atten_scores,
                                        window_size = args.window_size,
                                        min_len = args.min_len,
                                        pval_cutoff = args.pval_cutoff,
                                        min_n_motif = args.min_n_motif,
                                        align_all_ties = args.align_all_ties,
                                        save_file_dir = args.save_file_dir,
                                        verbose = args.verbose
                                        #return_idx  = args.return_idx
                                    )

if __name__ == "__main__":
    main()
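
The script above selects attention rows with `atten_scores[dev_pos.index.values]`, which relies on the DataFrame's default integer index lining up with the rows of atten.npy. A minimal sketch of that indexing, with made-up data standing in for dev.tsv and atten.npy and an explicit row-count check added as an assumption of what one might want to verify first:

```python
import numpy as np
import pandas as pd

# Made-up stand-ins: 4 samples, one attention vector of length 3 per sample.
dev = pd.DataFrame({"sequence": ["AAA", "CCC", "GGG", "TTT"],
                    "label": [1, 0, 1, 0]})
atten_scores = np.arange(12, dtype=float).reshape(4, 3)

# If a sample was lost while reading the file, the row counts disagree and
# the label-based selection below would pick the wrong attention rows.
if len(dev) != atten_scores.shape[0]:
    raise ValueError(
        f"dev has {len(dev)} rows but atten_scores has {atten_scores.shape[0]}"
    )

# With the default RangeIndex, index labels equal row positions.
pos_idx = dev.index[dev["label"] == 1].to_numpy()
pos_atten_scores = atten_scores[pos_idx]

print(pos_idx.tolist())
print(pos_atten_scores.shape)
```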


Joseph-Vineland commented on August 26, 2024

FYI - After editing the script as per above, the motif finding procedure worked on our data and identified motifs. GREAT! Thank you very much for the awesome tool! However, if I set --max_seq_length to 56 instead of 64 during the (4) prediction and (5) visualization steps, then the following error is thrown:

python find_motifs.py \
>     --data_dir $DATA_PATH \
>     --predict_dir $PREDICTION_PATH \
>     --window_size 24 \
>     --min_len 5 \
>     --pval_cutoff 0.05 \
>     --min_n_motif 3 \
>     --align_all_ties \
>     --save_file_dir $MOTIF_PATH \
>     --verbose
*** Begin motif analysis ***
* pos_seqs: 17938; neg_seqs: 18061
* Finding high attention motif regions
* Filtering motifs by hypergeometric test
Traceback (most recent call last):
  File "find_motifs.py", line 116, in <module>
    main()
  File "find_motifs.py", line 111, in main
    verbose = args.verbose
  File "/home/joneill/DNABERT/motif/motif_utils.py", line 514, in motif_analysis
    **kwargs)
  File "/home/joneill/DNABERT/motif/motif_utils.py", line 233, in filter_motifs
    pvals = motifs_hypergeom_test(pos_seqs, neg_seqs, motifs, **kwargs)
  File "/home/joneill/DNABERT/motif/motif_utils.py", line 196, in motifs_hypergeom_test
    motif_count_all = count_motif_instances(pos_seqs+neg_seqs, motifs, allow_multi_match=allow_multi_match)
  File "/home/joneill/DNABERT/motif/motif_utils.py", line 152, in count_motif_instances
    matches = sorted(map(itemgetter(1), A.iter(seq)))
AttributeError: Not an Aho-Corasick automaton yet: call add_word to add some keys and call make_automaton to convert the trie to an automaton.
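
The wording of this message suggests the automaton was queried before any keys had been added to it. One possible, unconfirmed reading is that no candidate motifs reached the counting step in this run; why that would follow from the shorter sequence length is not established here. The sketch below is not DNABERT's code and does not use the Aho-Corasick library: `count_motif_instances_safe` is a hypothetical stand-in that uses plain substring search to show where an early return for an empty motif list could sit.

```python
def count_motif_instances_safe(seqs, motifs):
    """Count occurrences of each motif across seqs (overlapping matches allowed)."""
    # Guard: with no motifs there is nothing to search for, so return early
    # instead of building and querying an empty search structure.
    if not motifs:
        return {}
    counts = {motif: 0 for motif in motifs}
    for seq in seqs:
        for motif in motifs:
            start = seq.find(motif)
            while start != -1:
                counts[motif] += 1
                start = seq.find(motif, start + 1)
    return counts

print(count_motif_instances_safe(["ACGTACGT", "TTACGA"], ["ACG"]))
print(count_motif_instances_safe(["ACGTACGT"], []))
```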


jerryji1993 commented on August 26, 2024

Hi,

Thanks very much for reporting. We have recently updated the test data and performed general bug fixes for this motif analysis module. The code should work now and please let me know if it doesn't. Will close this one for now. Feel free to reopen if needed.

Thanks,
Jerry

