Code Monkey home page Code Monkey logo

Comments (20)

iqbal-lab avatar iqbal-lab commented on September 21, 2024

OK, about the lex_count. Sorina found when she ran it, she had left==right for a call to lex_count.
When I ran it through gdb, I got

Program received signal SIGSEGV, Segmentation fault.
0x0000000000438e7c in sdsl::rank_support_v5<1u, 1u>::rank (this=0x7fffffffcd60, idx=249406034312) at /home/sm0179/include/sdsl/rank_support_v5.hpp:123
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.9.x86_64 libgcc-4.4.7-11.el6.x86_64 libstdc++-4.4.7-11.el6.x86_64
(gdb) bt
#0 0x0000000000438e7c in sdsl::rank_support_v5<1u, 1u>::rank (this=0x7fffffffcd60, idx=249406034312) at /home/sm0179/include/sdsl/rank_support_v5.hpp:123
#1 0x0000000000438fba in sdsl::rank_support_v5<1u, 1u>::operator() (this=0x7fffffffcd60, idx=249406034312) at /home/sm0179/include/sdsl/rank_support_v5.hpp:136
#2 0x0000000000407298 in sdsl::wt_intsdsl::int_vector<1u, sdsl::rank_support_v5<1u, 1u>, sdsl::select_support_mcl<1u, 1u>, sdsl::select_support_mcl<0u, 1u> >::lex_count<std::tupl
e<unsigned long, unsigned long, unsigned long> > (this=0x7fffffffcd30, i=249406034312, j=72198331389279224, c=4) at /home/sm0179/include/sdsl/wt_int.hpp:551
#3 0x0000000000406946 in bidir_search (csa=..., left=@0x883bcf0, right=@0x883bcf8, left_rev=@0x883f5f0, right_rev=@0x883f5f8, c=4 '\004') at ./bidir_search.cpp:43
#4 0x0000000000409f54 in bidir_search_bwd (csa=..., left=249406034312, right=72198331389279224, left_rev=140737488343504, right_rev=140737488343504, pat_begin=1 '\001', pat_end=4 '
\004', sa_intervals=std::list = {...}, sa_intervals_rev=std::list = {...}, sites=std::list = {...}, mask_a=std::vector of length 3141174, capacity 4194304 = {...}, maxx=185, first_d
el=@0x7fffffffd1ef) at ./bidir_search_bwd.cpp:143
#5 0x0000000000440b61 in main (argc=12, argv=0x7fffffffd538) at ./map.cpp:101

and

(gdb) frame 3
#3 0x0000000000406946 in bidir_search (csa=..., left=@0x883bcf0, right=@0x883bcf8, left_rev=@0x883f5f0, right_rev=@0x883f5f8, c=4 '\004') at ./bidir_search.cpp:43
(gdb) p left
$1 = (uint64_t &) @0x883bcf0: 249406034312
(gdb) p right
$2 = (uint64_t &) @0x883bcf8: 72198331389279224
(gdb) frame 4
#4 0x0000000000409f54 in bidir_search_bwd (csa=..., left=249406034312, right=72198331389279224, left_rev=140737488343504, right_rev=140737488343504, pat_begin=1 '\001', pat_end=4 '
\004', sa_intervals=std::list = {...}, sa_intervals_rev=std::list = {...}, sites=std::list = {...}, mask_a=std::vector of length 3141174, capacity 4194304 = {...}, maxx=185, first_d
el=@0x7fffffffd1ef) at ./bidir_search_bwd.cpp:143
(gdb) p (_it).first
$3 = 249406034312
(gdb) p (_it.second)
$4 = 72198331389279224
(gdb) p (*it_rev).first
$5 = 140737488343504
(gdb) p sa_intervals
$6 = std::list = {
[0] = {first = 249406034312, second = 72198331389279224}

(gdb) frame 5
#5 0x0000000000440b61 in main (argc=12, argv=0x7fffffffd538) at ./map.cpp:101
(gdb) p (*it)
$7 = (std::pair<unsigned long, unsigned long> &) @0x7fffffffd1e0: {first = 249406034312, second = 72198331389279224}

So what the hell?

Where are these values coming from? From sa_intervals. Where is this set?

if (kmer_idx.find(kmer)!=kmer_idx.end() && kmer_idx_rev.find(kmer)!=kmer_idx_rev.end() && kmer_sites.find(kmer)!=kmer_sites.end()) {
sa_intervals=kmer_idx[kmer];
sa_intervals_rev=kmer_idx_rev[kmer];
sites=kmer_sites[kmer];

              it=sa_intervals.begin();
              it_rev=sa_intervals_rev.begin();

Ah. So - are we reading stuff in right from the precalc?

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

Assign to @coelias briefly, so he sees this

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

Also for the record this happens after the 84th read is processed ok:

vBWT /data3/projects/pf3k/pilot_paper_analysis/prg-constr/new_format/Chr10_div_in_msp34_prg_letter_added_between_adj_sites.in csa_file /data3/projects/pf3k/pilot_paper_analysis/prg-constr/simulate_reads_from_alt_ref/sim_reads_from_alt.fq /data3/projects/pf3k/pilot_paper_analysis/prg-constr/new_format/Mask_sites /data3/projects/pf3k/pilot_paper_analysis/prg-constr/new_format/Mask_alleles Coverages Reads_processed int_alphabet_file csa_memory 9 /data3/projects/gramtools/wabi/precalc-kmers/kmersandrev


Tue Apr 26 00:14:54 2016
CSA construction
PRG size: 3413540
Alphabet size: 185


Tue Apr 26 00:15:38 2016
Reading KMERS...
DONE!


Tue Apr 26 00:15:43 2016
completed zam 0
completed zam 1
completed zam 2
completed zam 3
...
completed zam 83
completed zam 84
Segmentation fault (core dumped)

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

Here;s that read

completed zam 84
read is ATTTATATATTTATCATGTAAAATATATTATTATTCATACATAAAATTTATATGAAATGTATGTACAAAATTTATATTATTTCTCCCTTATTACAGGTCG
Segmentation fault (core dumped)
bash-4.1$

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

When I

  1. Move that 85th read to the top of the fastq
  2. rerun
    It still segfaults during the (new) 85th read

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

If I step into the iteration of the loop that will segfault, i have

read is ATTTATATATTTATCATGTAAAATATATTATTATTCATACATAAAATTTATATGAAATGTATGTACAAAATTTATATTATTTCTCCCTTATTACAGGTCG

(gdb) p kmer_idx[kmer]
$9 = empty std::list

here in map.cpp

              if (kmer_idx.find(kmer)!=kmer_idx.end() && kmer_idx_rev.find(kmer)!=kmer_idx_rev.end() && kmer_sites.find(kmer)!=kmer_sites.end()) {
                sa_intervals=kmer_idx[kmer]; <<<<<

That should never happen. What is the kmer in question

p kmer
$10 = std::vector of length 9, capacity 9 = {4 '\004', 1 '\001', 2 '\002', 1 '\001', 3 '\003', 3 '\003', 4 '\004', 2 '\002', 3 '\003'}

ie TACAGGTCG

ie the last kmer in the read....

Are we failing to store the sa_intervals for last kmer in read?

from gramtools.

sm0179 avatar sm0179 commented on September 21, 2024

Ok so I guess that’s the problem. That kier should never be in the hash if its list is empty. Need to check precalc function

From: Zamin Iqbal <[email protected]mailto:[email protected]>
Reply-To: iqbal-lab/gramtools <[email protected]mailto:[email protected]>
Date: Tuesday, 26 April 2016 at 00:49
To: iqbal-lab/gramtools <[email protected]mailto:[email protected]>
Subject: Re: [iqbal-lab/gramtools] valgrind issues in parse_masks (#8)

If I step into the iteration of the loop that will segfault, i have

(gdb) p kmer_idx[kmer]
$9 = empty std::list

here in map.cpp

          if (kmer_idx.find(kmer)!=kmer_idx.end() && kmer_idx_rev.find(kmer)!=kmer_idx_rev.end() && kmer_sites.find(kmer)!=kmer_sites.end()) {
            sa_intervals=kmer_idx[kmer]; <<<<<

That should never happen


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHubhttps://github.com//issues/8#issuecomment-214565524

from gramtools.

sm0179 avatar sm0179 commented on September 21, 2024

Right so we need to put a condition in precalc to remove the khmers with empty values from the hash table.

From: Zamin Iqbal <[email protected]mailto:[email protected]>
Reply-To: iqbal-lab/gramtools <[email protected]mailto:[email protected]>
Date: Tuesday, 26 April 2016 at 00:49
To: iqbal-lab/gramtools <[email protected]mailto:[email protected]>
Subject: Re: [iqbal-lab/gramtools] valgrind issues in parse_masks (#8)

If I step into the iteration of the loop that will segfault, i have

(gdb) p kmer_idx[kmer]
$9 = empty std::list

here in map.cpp

          if (kmer_idx.find(kmer)!=kmer_idx.end() && kmer_idx_rev.find(kmer)!=kmer_idx_rev.end() && kmer_sites.find(kmer)!=kmer_sites.end()) {
            sa_intervals=kmer_idx[kmer]; <<<<<

That should never happen


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHubhttps://github.com//issues/8#issuecomment-214565524

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

I've just deleted my last two comments as they were confusing. So we are finding the following

  1. Carlos' script makes a list of kmers in the graph
  2. We store kmer->interval in the hash. Since all these kmers come from the graph, they must have nonzero intervals
  3. Some time later, we query the hash and get a zero-length interval from a kmer.

This can happen because
a) carlos' script accidentally puts in a wrong kmer into the list
b) precalc puts in a wrong kmer
c) precalc correctly finds a non-empty interval for a kmer, but somehow we fail to push it onto the list - not impossible as there are a bunch of if's around there
d) precalc does everything right, but somehow the memory gets corrupted.

I'm digging

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

OK, have found that the reason we were not hitting Sorina's assert()'s is that NDEBUG was set to 1.
Once we remove that, we get this

vBWT /data3/projects/pf3k/pilot_paper_analysis/prg-constr/new_format/Chr10_div_in_msp34_prg_letter_added_between_adj_sites.in csa_file zorro /data3/projects/pf3k/pilot_paper_analysis/prg-constr/new_format/Mask_sites /data3/projects/pf3k/pilot_paper_analysis/prg-constr/new_format/Mask_alleles Coverages Reads_processed int_alphabet_file csa_memory 9 bof


Tue Apr 26 11:04:47 2016
CSA construction
PRG size: 3413540
Alphabet size: 185


Tue Apr 26 11:05:20 2016
Reading KMERS...
DONE!


Tue Apr 26 11:05:25 2016
vBWT: ./bidir_search.cpp:70: uint64_t bidir_search(sdsl::csa_wtsdsl::wt_int<sdsl::int_vector<1u, sdsl::rank_support_v5<> >, 2u, 2u>&, uint64_t&, uint64_t&, uint64_t&, uint64_t&, uint8_t): Assertion `right_rev-left_rev == right-left' failed.
Aborted (core dumped)
bash-4.1$

Will run through gdb

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

(gdb) p left_rev
$4 = (uint64_t &) @0x12b8ce0: 3393726
(gdb) p right_rev
$5 = (uint64_t &) @0x12b8ce8: 3406648
(gdb) p right
$6 = (uint64_t &) @0x88455f8: 3409822
(gdb) p left
$7 = (uint64_t &) @0x88455f0: 3396910
(gdb) p right-left
$8 = 12912
(gdb) p right_rev-left_rev
$9 = 12922
(gdb)

from gramtools.

sm0179 avatar sm0179 commented on September 21, 2024

Oh god, I know what the problem is. Search doesn’t cope with adjacent numbers, so K-mers overlapping adjacent numbers won’t map and will have an empty interval. Quick fix would be to put a condition in precalc, after biddersearchbwd is called, to remove the kmer from hash is the list is empty. When the kmer file is correct and the PRG is correct, this condition will never be applied. But Probably good to have it anyway, in case there are wrong kmers in the file

From: Zamin Iqbal <[email protected]mailto:[email protected]>
Reply-To: iqbal-lab/gramtools <[email protected]mailto:[email protected]>
Date: Tuesday, 26 April 2016 at 10:13
To: iqbal-lab/gramtools <[email protected]mailto:[email protected]>
Cc: Sorina Maciuca <[email protected]mailto:[email protected]>, Comment <[email protected]mailto:[email protected]>
Subject: Re: [iqbal-lab/gramtools] valgrind issues in parse_masks (#8)

I've just deleted my last two comments as they were confusing. So we are finding the following

  1. Carlos' script makes a list of kmers in the graph
  2. We store kmer->interval in the hash. Since all these kmers come from the graph, they must have nonzero intervals
  3. Some time later, we query the hash and get a zero-length interval from a kmer.

This can happen because
a) carlos' script accidentally puts in a wrong kmer into the list
b) precalc puts in a wrong kmer
c) precalc correctly finds a non-empty interval for a kmer, but somehow we fail to push it onto the list - not impossible as there are a bunch of if's around there
d) precalc does everything right, but somehow the memory gets corrupted.


You are receiving this because you commented.
Reply to this email directly or view it on GitHubhttps://github.com//issues/8#issuecomment-214679064

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

I would never have figured that out!

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

OK, tried this fix in precalc_kmer_matches.cpp

            kmer_idx[kmer]=temp;
            kmer_idx_rev[kmer]=temp;
            kmer_sites[kmer]=temp2;
            first_del=false;
            std::vector<uint8_t>::iterator res_it=bidir_search_bwd(csa,0,csa.size(),0,csa.size(),(kmer).begin(),(kmer).end(),kmer_idx[kmer],kmer_idx_rev[kmer],kmer_sites[kmer],m\

ask_a,maxx,first_del);
if ( !((kmer_idx[kmer]).empty()) && !((kmer_idx_rev[kmer]).empty()) ) <<<<<<<<<
{
if (!first_del) kmers_in_ref.insert(kmer);
}

This fixed that problem, and we now go a bit further before hitting this assert just after the 84th read (as above):

completed zam 84
read is ATTTATATATTTATCATGTAAAATATATTATTATTCATACATAAAATTTATATGAAATGTATGTACAAAATTTATATTATTTCTCCCTTATTACAGGTCG
vBWT: ./bidir_search_bwd.cpp:32: std::vector::iterator bidir_search_bwd(sdsl::csa_wtsdsl::wt_int<sdsl::int_vector<1u, sdsl::rank_support_v5<> >, 2u, 2u>&, uint64_t, uint64_t, uint64_t, uint64_t, std::vector::iterator, std::vector::iterator, std::list<std::pair<long unsigned int, long unsigned int> >&, std::list<std::pair<long unsigned int, long unsigned int> >&, std::list<std::vector<std::pair<unsigned int, std::vector > > >&, std::vector&, uint64_t, bool&): Assertion `right<=csa.size()' failed.
Aborted (core dumped)

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

Run in gdb:

(gdb) bt
#0 0x0000003a11e32625 in raise () from /lib64/libc.so.6
#1 0x0000003a11e33e05 in abort () from /lib64/libc.so.6
#2 0x0000003a11e2b74e in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000003a11e2b810 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000004099bc in bidir_search_bwd (csa=..., left=249406034312, right=72198331389279224, left_rev=140737488343216, right_rev=140737488343216, pat_begin=1 '\001', pat_end=4 '
\004', sa_intervals=empty std::list, sa_intervals_rev=empty std::list, sites=empty std::list, mask_a=std::vector of length 3141174, capacity 4194304 = {...}, maxx=185, first_del=@0x
7fffffffd0cf: true) at ./bidir_search_bwd.cpp:32
#5 0x0000000000441d70 in main (argc=12, argv=0x7fffffffd418) at ./map.cpp:111
(gdb) frame 4
#4 0x00000000004099bc in bidir_search_bwd (csa=..., left=249406034312, right=72198331389279224, left_rev=140737488343216, right_rev=140737488343216, pat_begin=1 '\001', pat_end=4 '
\004', sa_intervals=empty std::list, sa_intervals_rev=empty std::list, sites=empty std::list, mask_a=std::vector of length 3141174, capacity 4194304 = {...}, maxx=185, first_del=@0x
7fffffffd0cf: true) at ./bidir_search_bwd.cpp:32
(gdb) p right
$1 = 72198331389279224

from gramtools.

sm0179 avatar sm0179 commented on September 21, 2024

????????
What does <<<<<<<<<<<< mean after the if

From: Zamin Iqbal <[email protected]mailto:[email protected]>
Reply-To: iqbal-lab/gramtools <[email protected]mailto:[email protected]>
Date: Tuesday, 26 April 2016 at 11:47
To: iqbal-lab/gramtools <[email protected]mailto:[email protected]>
Cc: Sorina Maciuca <[email protected]mailto:[email protected]>, Comment <[email protected]mailto:[email protected]>
Subject: Re: [iqbal-lab/gramtools] valgrind issues in parse_masks (#8)

OK, tried this fix in precalc_kmer_matches.cpp

        kmer_idx[kmer]=temp;
        kmer_idx_rev[kmer]=temp;
        kmer_sites[kmer]=temp2;
        first_del=false;
        std::vector<uint8_t>::iterator res_it=bidir_search_bwd(csa,0,csa.size(),0,csa.size(),(kmer).begin(),(kmer).end(),kmer_idx[kmer],kmer_idx_rev[kmer],kmer_sites[kmer],m\

ask_a,maxx,first_del);
if ( !((kmer_idx[kmer]).empty()) && !((kmer_idx_rev[kmer]).empty()) ) <<<<<<<<<
{
if (!first_del) kmers_in_ref.insert(kmer);
}

This fixed that problem, and we now go a bit further before hitting this assert just after the 84th read (as above):

completed zam 84
read is ATTTATATATTTATCATGTAAAATATATTATTATTCATACATAAAATTTATATGAAATGTATGTACAAAATTTATATTATTTCTCCCTTATTACAGGTCG
vBWT: ./bidir_search_bwd.cpp:32: std::vector::iterator bidir_search_bwd(sdsl::csa_wtsdsl::wt_int<sdsl::int_vector<1u, sdsl::rank_support_v5<> >, 2u, 2u>&, uint64_t, uint64_t, uint64_t, uint64_t, std::vector::iterator, std::vector::iterator, std::list >&, std::list >&, std::list > > >&, std::vector&, uint64_t, bool&): Assertion `right<=csa.size()' failed.
Aborted (core dumped)


You are receiving this because you commented.
Reply to this email directly or view it on GitHubhttps://github.com//issues/8#issuecomment-214699838

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

(gdb) frame 5
#5 0x0000000000441d70 in main (argc=12, argv=0x7fffffffd418) at ./map.cpp:111
(gdb) p (*it)
$2 = (std::pair<unsigned long, unsigned long> &) @0x7fffffffd0c0: {first = 249406034312, second = 72198331389279224}
(gdb) p sa_intervals
$3 = empty std::list
(gdb) p it
$4 = {first = 249406034312, second = 72198331389279224}

What happens if you take an empty list and call .begin()? - looks like we have garbage in the iterator

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

I've just rerun after modifying/fixing my fix so it erases kmers from the kmerhash if they have zero interval. I still end up in the above assert

from gramtools.

iqbal-lab avatar iqbal-lab commented on September 21, 2024

The <<<< is just meant to draw your attention to that line - its not there in the real code

from gramtools.

ffranr avatar ffranr commented on September 21, 2024

I think this issue has been solved.

from gramtools.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.