
urmap's Introduction

urmap's People

Contributors

rcedgar


urmap's Issues

Bad fastq file

00:00 84Mb 100% Reading index wp2740.bin.37.ufi
00:05 6.2Gb 0.71% Mapping (paired) WP2740_1.fastq

urmap -map2 WP2740_1.fastq -reverse WP2740_2.fastq -ufi wp2740.bin.37.ufi -samout wp2740.bin.37.sam -threads 80
Elapsed time 00:05
Max memory 6.2Gb

---Fatal error---
Bad FASTQ record: 150 bases, 134 quals line 2831660 file WP2740_2.fastq label A00199:323:HJMY7DSXX:2:1107:9426:25113 2:N:0:AACCAGAG+TCTTTCCC
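The message says one record in WP2740_2.fastq has a 150-base sequence line but only a 134-character quality line. This typically points to a truncated or corrupted input file rather than a mapping problem. A minimal C++ sketch for locating such records before running urmap, assuming plain unwrapped four-line FASTQ records; FirstBadFastqRecord is a hypothetical helper, not part of urmap:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of urmap). Scans unwrapped four-line FASTQ
// records (header, sequence, '+' line, qualities) and returns the 1-based line
// number of the first malformed or truncated record, or 0 if all look valid.
long FirstBadFastqRecord(std::istream &In)
{
    std::string Header, Seq, Plus, Qual;
    long LineNo = 0;
    while (std::getline(In, Header))
    {
        const long RecordStart = ++LineNo;
        // A complete record needs three more lines after the header.
        if (!std::getline(In, Seq) || !std::getline(In, Plus) ||
            !std::getline(In, Qual))
            return RecordStart;
        LineNo += 3;
        const bool HeaderOk = !Header.empty() && Header[0] == '@';
        const bool PlusOk = !Plus.empty() && Plus[0] == '+';
        // The check that failed in the report: one quality character per base.
        if (!HeaderOk || !PlusOk || Seq.size() != Qual.size())
            return RecordStart;
    }
    return 0;
}

// Convenience overload for in-memory text (handy for quick tests).
long FirstBadFastqRecordInText(const std::string &Text)
{
    std::istringstream In(Text);
    return FirstBadFastqRecord(In);
}
```

For a real file, pass a std::ifstream opened on the FASTQ path to FirstBadFastqRecord.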

Compile error at myutils.cpp:1403:27

/usr/local/opt/llvm/bin/clang -fopenmp -msse -mfpmath=sse -O3 -DNDEBUG -c -o o/myutils.o myutils.cpp
myutils.cpp:928:2: error: 'va_start' cannot be used in a captured statement
va_start(ArgList, Format);

/usr/local/Cellar/llvm/10.0.1/lib/clang/10.0.1/include/stdarg.h:17:29: note: expanded from macro 'va_start'
#define va_start(ap, param) __builtin_va_start(ap, param)
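Clang outlines the body of an OpenMP region (for example, a #pragma omp critical block) into a separate "captured" function, so va_start inside that block cannot reach the enclosing function's variadic arguments. A common workaround, sketched here on the assumption that line 928 of myutils.cpp calls va_start inside such a region (the source is not shown in this report), is to do all va_list handling before entering the region and serialize only the output. LogLine is a hypothetical name, not urmap's actual function:

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical logging helper illustrating the workaround: va_start/va_end run
// in the variadic function itself, outside any OpenMP region, and only the
// shared write is serialized. The formatted text is also returned so callers
// can inspect it. Messages longer than the buffer are truncated.
std::string LogLine(FILE *f, const char *Format, ...)
{
    char Buf[4096];
    va_list ArgList;
    va_start(ArgList, Format); // fine here: not inside a captured statement
    std::vsnprintf(Buf, sizeof(Buf), Format, ArgList);
    va_end(ArgList);

#pragma omp critical
    {
        std::fputs(Buf, f); // only the shared I/O is inside the critical section
    }
    return std::string(Buf);
}
```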

myutils.cpp:1403:27: warning: format specifies type 'unsigned long long' but the argument has type 'uint64' (aka 'unsigned long') [-Wformat]
sprintf(Tmp, "%" PRIu64, i);
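The warning appears because urmap's uint64 typedef is unsigned long on this platform, while PRIu64 describes the standard uint64_t, which is unsigned long long on macOS. A portable fix, sketched under that assumption, is to convert the argument to uint64_t before formatting it (and to use snprintf with an explicit buffer size). The typedef below only stands in for urmap's own, and Uint64ToStr is a hypothetical name:

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

typedef unsigned long uint64; // stand-in for urmap's typedef (assumption)

// Hypothetical helper: formats a uint64 with PRIu64 without triggering
// -Wformat, by converting to the exact type PRIu64 describes.
std::string Uint64ToStr(uint64 i)
{
    char Tmp[32];
    std::snprintf(Tmp, sizeof(Tmp), "%" PRIu64, static_cast<uint64_t>(i));
    return std::string(Tmp);
}
```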

bug in ExtendPen (invalid memory access, reading past the end of DBSeq)

I have found a memory access violation in ExtendPen that causes a segfault. Adding a bounds check to the loop condition resolves the segfault, but I do not understand urmap's code well enough to know whether this is the best possible fix.

Please feel free to use the proposed fix or fix it in whatever way you deem appropriate. Thanks.

Original code:

urmap/src/extendpen.cpp

Lines 29 to 32 in 836dd6f

for (int QPos = EndPos + 1; QPos < int(QL); ++QPos)
{
byte q = QSeq[QPos];
byte t = DBSeq[QPos];

The failing line is byte t = DBSeq[QPos], which reads past the end of DBSeq.

...

Proposed fix to the loop:

for (int QPos = EndPos + 1; (QPos < int(QL)) && (QPos + DBLo < m_UFI->m_SeqDataSize); ++QPos)

Another possible fix is to add the following check before the loop:

if (QL + DBLo > m_UFI->m_SeqDataSize)
{
  return -1;
}
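Both proposed fixes rely on the same relationship: DBSeq points DBLo bytes into the index's sequence buffer, so DBSeq[QPos] is only safe while DBLo + QPos < m_SeqDataSize. A self-contained sketch of the first variant with the bound computed once before the loop, assuming that pointer relationship (the report implies it but does not show where DBSeq is set) and using a plain mismatch count in place of ExtendPen's actual scoring, which is not shown here. CountMismatchesRight is a hypothetical name:

```cpp
#include <cstdint>

// Sketch (not urmap's code): extend to the right of EndPos, comparing query
// bases with the reference that starts at SeqData + DBLo. The loop stops at
// whichever ends first: the query (QL) or the reference buffer (SeqDataSize).
// Returns the number of mismatches, a stand-in for ExtendPen's penalty.
int CountMismatchesRight(const uint8_t *QSeq, uint32_t QL,
                         const uint8_t *SeqData, uint64_t SeqDataSize,
                         uint64_t DBLo, int EndPos)
{
    if (DBLo >= SeqDataSize)
        return 0; // no reference bases available at all
    const uint8_t *DBSeq = SeqData + DBLo;

    // Exclusive upper bound on QPos so that DBSeq[QPos] stays in the buffer.
    int64_t Limit = int64_t(QL);
    if (DBLo + QL > SeqDataSize)
        Limit = int64_t(SeqDataSize - DBLo);

    int Mismatches = 0;
    for (int64_t QPos = int64_t(EndPos) + 1; QPos < Limit; ++QPos)
    {
        uint8_t q = QSeq[QPos];
        uint8_t t = DBSeq[QPos];
        if (q != t)
            ++Mismatches;
    }
    return Mismatches;
}
```

Computing the limit once keeps the loop body identical to the original and avoids a second comparison on every iteration; it is equivalent to the proposed two-part loop condition.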

This bug might also explain the previously reported segfault issue.

00:32 22020Gb 8 threads
00:32 22020Gb Mapping ../181025_I600_CL100097983_L1_PL1809120059-532_1.fq
00:35 22020Gb 2.0% Mapping unpaired  181025_I600_CL100097983_L1_PL1809120059-532_1.fq
=================================================================
==23064==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f6d52208dc1 at pc 0x555dbb18914d bp 0x7f6c998cc840 sp 0x7f6c998cc830
READ of size 1 at 0x7f6d52208dc1 thread T2
    #0 0x555dbb18914c in State1::ExtendPen(unsigned int, unsigned int, bool) /home/john/orig/urmap/src/extendpen.cpp:32
    #1 0x555dbb1a41b8 in State1::Search_Lo() /home/john/orig/urmap/src/search1m6.cpp:81
    #2 0x555dbb1a1cb7 in State1::Search(SeqInfo*) /home/john/orig/urmap/src/search1.cpp:21
    #3 0x555dbb184a8e in MapThread /home/john/orig/urmap/src/map.cpp:22
    #4 0x555dbb18552a in cmd_map() [clone ._omp_fn.0] /home/john/orig/urmap/src/map.cpp:60
    #5 0x7f739eb4d78d  (/lib/x86_64-linux-gnu/libgomp.so.1+0x1a78d)
    #6 0x7f739eafe608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
    #7 0x7f739ea23292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)

0x7f6d52208dc1 is located 0 bytes to the right of 3088287169-byte region [0x7f6c9a0d0800,0x7f6d52208dc1)
allocated by thread T0 here:
    #0 0x7f739efb3bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
    #1 0x555dbb19f74e in UFIndex::FromFile(_IO_FILE*) /home/john/orig/urmap/src/ufindexio.cpp:108
    #2 0x555dbb19ecbd in UFIndex::FromFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/john/orig/urmap/src/ufindexio.cpp:55
    #3 0x555dbb184e00 in cmd_map() /home/john/orig/urmap/src/map.cpp:43
    #4 0x555dbb1bd41b in main /home/john/orig/urmap/src/cmds.h:12
    #5 0x7f739e9280b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

Thread T2 created by T0 here:
    #0 0x7f739eee0805 in pthread_create (/lib/x86_64-linux-gnu/libasan.so.5+0x3a805)
    #1 0x7f739eb4ddea  (/lib/x86_64-linux-gnu/libgomp.so.1+0x1adea)
    #2 0x7f739eb458e0 in GOMP_parallel (/lib/x86_64-linux-gnu/libgomp.so.1+0x128e0)
    #3 0x7fffd646bcdf  ([stack]+0x1ecdf)
    #4 0x7fffd646be00  ([stack]+0x1ee00)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/john/orig/urmap/src/extendpen.cpp:32 in State1::ExtendPen(unsigned int, unsigned int, bool)
Shadow bytes around the buggy address:
  0x0fee2a439160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee2a439170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee2a439180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee2a439190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fee2a4391a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0fee2a4391b0: 00 00 00 00 00 00 00 00[01]fa fa fa fa fa fa fa
  0x0fee2a4391c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fee2a4391d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fee2a4391e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fee2a4391f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0fee2a439200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==23064==ABORTING
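The addresses in this report pin the failure down exactly. Subtracting the start of the allocated region from the faulting address gives the region's size, so the one-byte read lands on the first byte past the 3,088,287,169-byte buffer allocated in UFIndex::FromFile. That is consistent with DBSeq being read past the end of the index's sequence data. A small compile-time check of the arithmetic, with the constants copied from the report:

```cpp
#include <cstdint>

// Constants copied from the AddressSanitizer report above.
constexpr uint64_t RegionStart = 0x7f6c9a0d0800ULL; // start of malloc'd region
constexpr uint64_t RegionEnd   = 0x7f6d52208dc1ULL; // one past its last byte
constexpr uint64_t FaultAddr   = 0x7f6d52208dc1ULL; // address of the bad READ

// Offset of the faulting read from the start of the buffer.
constexpr uint64_t FaultOffset = FaultAddr - RegionStart;

// Valid offsets are 0 .. size - 1, so an offset equal to the size is exactly
// one byte past the end.
static_assert(RegionEnd - RegionStart == 3088287169ULL, "region size matches report");
static_assert(FaultOffset == RegionEnd - RegionStart, "read is one byte past the end");
```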

stdin for -map not working due to unnecessary optimizations

uint32 ReadStdioFile_NoFail(FILE *f, void *Buffer, uint32 Bytes)

This function appears to deliberately call fread(Buffer, Bytes, 1, f) instead of fread(Buffer, 1, Bytes, f). Because fread returns the number of complete elements read, this form does not report how many bytes were read, so the function instead computes the count from the change in file position reported by ftello(). However:

  1. There is no performance benefit to doing it this way in any environment that I know of.
  2. ftello() will not work on a stream such as STDIN.

I therefore recommend something like:

uint32 ReadStdioFile_NoFail(FILE *f, void *Buffer, uint32 Bytes)
        {
        asserta(f != 0);
        size_t ElementsRead = fread(Buffer, 1, Bytes, f);
        uint32 BytesRead = uint32(ElementsRead);
        IncIO(f, BytesRead);
        return BytesRead;
        }

This "optimization" pattern is used in many places, so I defer to your judgement on how to address it. Thanks.
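The difference between the two fread forms can be seen in a short standalone example. fread returns the number of complete elements read, so fread(Buffer, Bytes, 1, f) can only return 0 or 1, while fread(Buffer, 1, Bytes, f) returns the byte count directly and needs no ftello(), which fails (with errno set to ESPIPE) on pipes such as stdin. ReadBytes and ReadOneElement are hypothetical names; ReadBytes mirrors the proposed ReadStdioFile_NoFail without urmap's asserta and IncIO helpers:

```cpp
#include <cstdint>
#include <cstdio>

// Reads up to Bytes bytes and returns how many were actually read. Works on
// regular files and on non-seekable streams (pipes, stdin) because it never
// consults the file position.
uint32_t ReadBytes(FILE *f, void *Buffer, uint32_t Bytes)
{
    size_t ElementsRead = std::fread(Buffer, 1, Bytes, f); // size 1: count == bytes
    return uint32_t(ElementsRead);
}

// For contrast: one element of size Bytes. A short final read yields 0
// complete elements, so the number of bytes actually read is lost.
size_t ReadOneElement(FILE *f, void *Buffer, uint32_t Bytes)
{
    return std::fread(Buffer, Bytes, 1, f);
}
```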

Compile error and solution on RHEL 6

When compiling from source on RHEL 6.7 (Santiago), the build fails with this error:
makebitvec.cpp:82:18: error: expected ‘)’ before ‘PRIu64’
ProgressLog("%" PRIu64 " bits (%s)\n", BitCount, Int64ToStr(BitCount));
^~~~~~
make: *** [o/makebitvec.o] Error 1

It can be solved by adding the line

#define __STDC_FORMAT_MACROS

to myutils.h, before the line

#include "Utils.h"

I found the solution here:

https://stackoverflow.com/questions/14535556/why-doesnt-priu64-work-in-this-code

Thanks,

Jianshu
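Why this works: with older glibc headers such as those on RHEL 6, <inttypes.h> only defines PRIu64 and the other PRI* macros for C++ code when __STDC_FORMAT_MACROS is defined before that header is first included, directly or indirectly. C++11 dropped the requirement and newer glibc releases no longer need the macro, which is why the error only appears on older systems. A minimal standalone sketch of the required ordering; FormatBitCount is a hypothetical helper that mirrors the ProgressLog call in makebitvec.cpp:

```cpp
// Must appear before the first (direct or indirect) inclusion of <inttypes.h>;
// older glibc only exposes PRIu64 and friends to C++ when it is defined.
#ifndef __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS
#endif
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a bit count the way the failing line does.
std::string FormatBitCount(uint64_t BitCount)
{
    char Buf[64];
    std::snprintf(Buf, sizeof(Buf), "%" PRIu64 " bits", BitCount);
    return std::string(Buf);
}
```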

Possible to output NM aux tag?

Hi,

I'm the maintainer of CoverM, which filters SAM alignments by thresholding on the NM tag (edit distance to the reference), as output by BWA and minimap2. Would it be possible to add this tag to urmap's SAM output?

Thanks, ben
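For reference, the SAM specification defines NM as the edit distance to the reference: mismatches plus inserted and deleted bases, excluding clipping. A sketch of how a mapper or a post-processing tool could compute it from an alignment's CIGAR string, read sequence, and the reference starting at the alignment position. ComputeNM is a hypothetical helper, not an existing urmap option, and it assumes the reference slice covers every reference-consuming operation:

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: NM = mismatches in M/=/X columns + inserted bases (I)
// + deleted bases (D). Soft clips (S) consume the read only, hard clips (H)
// and padding (P) consume nothing, and skipped regions (N) consume the
// reference without counting as edits. Comparison is case-insensitive.
int ComputeNM(const std::string &Cigar, const std::string &Read,
              const std::string &Ref)
{
    std::size_t ReadPos = 0, RefPos = 0, Len = 0;
    int NM = 0;
    for (char c : Cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            Len = Len * 10 + std::size_t(c - '0');
            continue;
        }
        switch (c)
        {
        case 'M': case '=': case 'X':
            for (std::size_t k = 0; k < Len; ++k)
                if (std::toupper(static_cast<unsigned char>(Read.at(ReadPos + k))) !=
                    std::toupper(static_cast<unsigned char>(Ref.at(RefPos + k))))
                    ++NM;
            ReadPos += Len;
            RefPos += Len;
            break;
        case 'I': NM += int(Len); ReadPos += Len; break;
        case 'D': NM += int(Len); RefPos += Len; break;
        case 'S': ReadPos += Len; break;
        case 'N': RefPos += Len; break;
        case 'H': case 'P': break;
        default: throw std::invalid_argument("unknown CIGAR operation");
        }
        Len = 0;
    }
    return NM;
}
```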

Multithreading

Hi @rcedgar,

I ran URMAP in my benchmark of short-read aligners and noticed that, while it is very fast, its CPU usage is low (between 10% and 90%) regardless of -threads N, for N = 1, 4, 8, or 16. I ran URMAP as follows:

urmap -map2 reads1.fq -reverse reads2.fq -ufi index.ufi -samout outfile.sam -threads N  2>&1 | tee time_and_mem.xt 

I also copied the reads and the index to the local compute node before aligning, to minimize disk latency.

Some details are in this Twitter thread: https://twitter.com/krsahlin/status/1469327435103621127

Do you have any idea why that is?

Also, I noticed that URMAP fails with a segfault on all of my test instances with 2x300 nt paired-end reads.

Best,
Kristoffer
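Assuming the CPU percentages above come from GNU time (the time_and_mem output name suggests so, but the report does not say), 100% corresponds to one fully busy core, so with N threads that all stay busy the figure should approach 100 x N percent. A small sketch of that arithmetic; the function names are hypothetical:

```cpp
// CPU utilization as GNU time reports it: (user + system) CPU seconds divided
// by wall-clock seconds, in percent of one core.
double CpuPercent(double UserSec, double SysSec, double WallSec)
{
    return 100.0 * (UserSec + SysSec) / WallSec;
}

// Fraction of the requested threads' capacity actually used (1.0 = all busy).
double ThreadEfficiency(double CpuPct, int Threads)
{
    return CpuPct / (100.0 * Threads);
}
```

For example, 90% with -threads 16 gives an efficiency of about 0.056, i.e. less than one core's worth of work spread over 16 threads.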
