yfukasawa / longqc

LongQC is a tool for quality control of PacBio and ONT long reads.

License: MIT License

longqc's Issues

AttributeError: 'Rectangle' object has no property 'normed'

Hello. I just tried your program. After increasing the value for -p I was able to get further, but now I get a new error. My input file was demultiplexed with qcat and concatenated into one fastq. Any ideas? Looking forward to using this software!

python ~/programs/LongQC/longQC.py sampleqc -x ont-ligation -o ontqc -p 4 Cf_KPC_all.fastq
longQC:2020-06-04 13:38:43,599:166:INFO:Cmd: /home/tarah/programs/LongQC/longQC.py sampleqc -x ont-ligation -o ontqc -p 4 Cf_KPC_all.fastq
longQC:2020-06-04 13:38:43,599:216:INFO:Preset "ont-ligation" was applied. Options --pb(--ont) is overwritten.
longQC:2020-06-04 13:38:43,733:288:INFO:Computation of the low complexity region started for a chunk 0
lq_mask:2020-06-04 13:38:43,784:111:INFO:New job was submitted: in->ontqc/analysis/tmp_0.fastq, out->ontqc/analysis/tmp_0.txt
longQC:2020-06-04 13:38:43,785:293:INFO:Adapter search is starting for a chunk 0.
longQC:2020-06-04 13:38:43,785:309:INFO:Computation of the GC fraction started for a chunk 0
lq_utils:2020-06-04 13:38:44,020:358:INFO:list for subsample is not initialized. Initializing now.
lq_adapt:2020-06-04 13:38:44,086:76:INFO:694 reads were skipped due to their short lengths.
lq_adapt:2020-06-04 13:38:44,086:89:INFO:Adapter Sequence: AATGTACTTCGTTCAGTTACGTATTGCT, max identity:0.758621 and the number of trimmed reads: 1
lq_adapt:2020-06-04 13:38:44,218:41:INFO:694 reads were skipped due to their short lengths.
lq_adapt:2020-06-04 13:38:44,218:91:INFO:Adapter Sequence: GCAATACGTAACTGAACG, max identity:0.842105 and the number of trimmed reads: 20
longQC:2020-06-04 13:38:44,563:314:INFO:Adapter search has done for a chunk 0.
longQC:2020-06-04 13:38:44,563:324:INFO:subsample finished for chunk 0.
longQC:2020-06-04 13:38:44,563:344:INFO:Input file parsing was finished. #seqs:3591, #bases: 11155437
lq_mask:2020-06-04 13:38:44,563:114:INFO:Waiting completion of all of jobs...
lq_mask:2020-06-04 13:38:44,658:117:INFO:sdust jobs finished.
lq_mask:2020-06-04 13:38:44,661:87:INFO:sdust output file ontqc/longqc_sdust.txt was made.
lq_mask:2020-06-04 13:38:44,669:93:INFO:tmp file ontqc/analysis/tmp_0.fastq was removed.
lq_mask:2020-06-04 13:38:44,669:93:INFO:tmp file ontqc/analysis/tmp_0.txt was removed.
longQC:2020-06-04 13:38:44,669:348:INFO:Summary table ontqc/longqc_sdust.txt was made.
longQC:2020-06-04 13:38:44,695:354:DEBUG:Highly masked seq list:

longQC:2020-06-04 13:38:44,739:393:INFO:Subsampled seqs were written to a file. #seqs:3591
lq_exec:2020-06-04 13:38:44,749:26:INFO:below command is executed: -Y -l 0 -q 160 -k 15 -w 5 -I 4G -p 160 -t 4 Cf_KPC_all.fastq ontqc/analysis/subsample.fastq
lq_exec:2020-06-04 13:38:44,749:27:INFO:/home/tarah/programs/LongQC/minimap2_mod/minimap2-coverage is started.
longQC:2020-06-04 13:38:44,750:421:INFO:Overlap computation started. Process is 14834
lq_gcfrac:2020-06-04 13:38:44,750:52:INFO:Mean GC composition: 0.510
Traceback (most recent call last):
  File "/home/tarah/programs/LongQC/longQC.py", line 917, in <module>
    main(args)
  File "/home/tarah/programs/LongQC/longQC.py", line 63, in main
    args.handler(args)
  File "/home/tarah/programs/LongQC/longQC.py", line 424, in command_sample
    gc_read_mean, gc_read_sd = lg.plot_unmasked_gc_frac(fp=fig_path_gc)
  File "/home/tarah/programs/LongQC/lq_gcfrac.py", line 54, in plot_unmasked_gc_frac
    plt.hist(self.r_frac, alpha=0.3, bins=np.arange(min(self.r_frac), max(self.r_frac) + b_width, b_width), color='blue', normed=True)
  File "/home/tarah/miniconda3/envs/py36/lib/python3.6/site-packages/matplotlib/pyplot.py", line 2610, in hist
    if data is not None else {}), **kwargs)
  File "/home/tarah/miniconda3/envs/py36/lib/python3.6/site-packages/matplotlib/__init__.py", line 1565, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/home/tarah/miniconda3/envs/py36/lib/python3.6/site-packages/matplotlib/axes/_axes.py", line 6808, in hist
    p.update(kwargs)
  File "/home/tarah/miniconda3/envs/py36/lib/python3.6/site-packages/matplotlib/artist.py", line 1006, in update
    ret = [_update_property(self, k, v) for k, v in props.items()]
  File "/home/tarah/miniconda3/envs/py36/lib/python3.6/site-packages/matplotlib/artist.py", line 1006, in <listcomp>
    ret = [_update_property(self, k, v) for k, v in props.items()]
  File "/home/tarah/miniconda3/envs/py36/lib/python3.6/site-packages/matplotlib/artist.py", line 1002, in _update_property
    .format(type(self).__name__, k))
AttributeError: 'Rectangle' object has no property 'normed'
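
A likely cause, as far as I can tell: newer Matplotlib releases no longer accept the `normed` keyword of `hist` and expect `density` instead (I believe `normed` was removed in Matplotlib 3.1). A later traceback in this tracker shows the same `plt.hist` call in lq_gcfrac.py written with `density=True`. The rename can be sketched as follows; `modernize_hist_kwargs` is a hypothetical helper for illustration and is not part of LongQC or Matplotlib.

```python
def modernize_hist_kwargs(kwargs):
    """Return a copy of hist() keyword arguments with `normed` renamed to `density`.

    Hypothetical helper for illustration; not part of LongQC or Matplotlib.
    """
    fixed = dict(kwargs)
    if "normed" in fixed:
        # Keep an explicit `density` if the caller already passed one.
        normed = fixed.pop("normed")
        fixed.setdefault("density", normed)
    return fixed


old_kwargs = {"alpha": 0.3, "color": "blue", "normed": True}
new_kwargs = modernize_hist_kwargs(old_kwargs)
print(new_kwargs)
```

In practice the simplest route is probably to replace `normed=True` with `density=True` in the local copy of lq_gcfrac.py, or to pull a newer LongQC checkout.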

AttributeError: 'float' object has no attribute 'split'

Hello....

When I run the command below,

python longQC.py sampleqc -x pb-rs2 -o prueba14 /home/jforero/mis_datos/tutoriales/anaconda3/prueba10.fasta -p 72 -d --fast -m 2 -i 10 --trim_output trimmedsequences

it fails with AttributeError: 'float' object has no attribute 'split'.

I would like to know why this error comes about.

lq_coverage:2020-08-03 10:38:59,423:122:INFO:Estimation of coverage distribution finished.
Traceback (most recent call last):
  File "longQC.py", line 932, in <module>
    main(args)
  File "longQC.py", line 62, in main
    args.handler(args)
  File "longQC.py", line 592, in command_sample
    adp3_pos=np.mean(adp_pos3) if args.adp3 and adp_pos3 and np.mean(adp_pos3) > 0 else None)
  File "/datos/datosjforero/tutoriales/anaconda3/LongQC/lq_coverage.py", line 373, in plot_unmapped_frac_terminal
    t5l, t3l, il = self.__region_analysis(3, 1)
  File "/datos/datosjforero/tutoriales/anaconda3/LongQC/lq_coverage.py", line 596, in __region_analysis
    regs = [(int(reg.split('-')[0]), int(reg.split('-')[1])) for reg in str.split(',')]
AttributeError: 'float' object has no attribute 'split'
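
One plausible reading, offered as an assumption since only the failing line is visible here: the value being split is a float because the region column was empty for some read and was loaded as NaN. A defensive parser could look like the sketch below. `parse_regions` is a hypothetical name, and the "start-end,start-end" format is inferred from the traceback rather than taken from LongQC documentation.

```python
import math


def parse_regions(value):
    """Parse a string such as "10-25,40-90" into [(10, 25), (40, 90)].

    Anything that is not a non-empty string (for example float NaN from an
    empty table cell) yields an empty list instead of raising AttributeError.
    Hypothetical helper; the input format is inferred from the traceback.
    """
    if not isinstance(value, str) or not value.strip():
        return []
    regions = []
    for reg in value.split(","):
        start, end = reg.split("-")
        regions.append((int(start), int(end)))
    return regions


print(parse_regions("10-25,40-90"))  # [(10, 25), (40, 90)]
print(parse_regions(math.nan))       # []
```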

UnicodeDecodeError

Hello

I hope all is well.

Sorry for the bother but I have an issue running longQC and was hoping for some help as I haven't been able to figure it out.

I believe I have everything installed correctly as per the instructions. However, when I run the command below to get sampleqc on a PacBio Sequel BAM file called 37.bam:

python /pub01/mgemmell/programs_chos_7/longqc/LongQC/longQC.py sampleqc -x pb-sequel -o 37_longqc_results 37.bam

I get the below:

longQC:2021-01-19 13:43:57,198:169:INFO:Cmd: /pub01/mgemmell/programs_chos_7/longqc/LongQC/longQC.py sampleqc -x pb-sequel -o 37_longqc_results 37.bam
longQC:2021-01-19 13:43:57,198:233:INFO:Preset "pb-sequel" was applied. Options --pb(--ont) is overwritten.
Traceback (most recent call last):
  File "/pub01/mgemmell/programs_chos_7/longqc/LongQC/longQC.py", line 956, in <module>
    main(args)
  File "/pub01/mgemmell/programs_chos_7/longqc/LongQC/longQC.py", line 62, in main
    args.handler(args)
  File "/pub01/mgemmell/programs_chos_7/longqc/LongQC/longQC.py", line 235, in command_sample
    file_format_code = guess_format(args.input)
  File "/pub01/mgemmell/programs_chos_7/longqc/LongQC/lq_utils.py", line 125, in guess_format
    l = f.readline()
  File "/pub01/mgemmellprograms_chos_8/anaconda3/envs/longqc/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x99 in position 4: invalid start byte

Any help would be appreciated and feel free to ask me any questions if it helps.
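
The traceback shows `guess_format` reading the first line of the input as text, which cannot work reliably for binary data. It may also be worth confirming what 37.bam actually contains: a BAM file is a BGZF (gzip-compatible) stream starting with the bytes 1f 8b, whereas this error reports the first undecodable byte at position 4. The sketch below inspects the leading bytes in binary mode. `sniff_format` is a hypothetical helper, not LongQC's own `guess_format`.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"   # gzip and BGZF (BAM) files start with these two bytes
BAM_MAGIC = b"BAM\x01"     # first four bytes of a BAM file after decompression


def sniff_format(path):
    """Guess a read file's format from its leading bytes without decoding text.

    Hypothetical helper for illustration; not LongQC's own guess_format.
    """
    with open(path, "rb") as handle:
        head = handle.read(2)
    if head == GZIP_MAGIC:
        with gzip.open(path, "rb") as handle:
            inner = handle.read(4)
        return "bam" if inner == BAM_MAGIC else "gzip"
    if head[:1] == b"@":
        return "fastq"
    if head[:1] == b">":
        return "fasta"
    return "unknown"


# Small self-check with temporary files standing in for real inputs.
with tempfile.TemporaryDirectory() as tmp:
    fq = os.path.join(tmp, "reads.fastq")
    with open(fq, "wb") as out:
        out.write(b"@read1\nACGT\n+\nIIII\n")
    bam_like = os.path.join(tmp, "reads.bam")
    with gzip.open(bam_like, "wb") as out:
        out.write(BAM_MAGIC + b"\x00" * 8)
    results = (sniff_format(fq), sniff_format(bam_like))

print(results)
```

If the real file comes back as "unknown", the problem is more likely the file itself than the LongQC installation.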

minimap2-coverage fatal error: zlib.h: No such file or directory

Hi, I tried to compile minimap2-coverage within the cloned LongQC repo and it seems to be missing a file:

cd LongQC/minimap2-coverage && make
cc -c -g -O2 -Wall -Wc++-compat -DHAVE_KALLOC minimap2-coverage.c -o minimap2-coverage.o
minimap2-coverage.c:4:10: fatal error: zlib.h: No such file or directory
    4 | #include <zlib.h>
      |          ^~~~~~~~
compilation terminated.
make: *** [Makefile:29: minimap2-coverage.o] Error 1

Any ideas?

conda error for pysam and edlib

Hi, I am trying to install pysam and edlib. It shows the following error. Please advise.

--
b) conda install -c bioconda pysam
c) conda install -c bioconda edlib

error-----------
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.

ResolvePackageNotFound:

python=3.1

More explanations on `sampleqc` module

Hi @yfukasawa ,

Again, thanks heaps for this wonderful tool.

Could you kindly explain a bit of what happens in/logic behind (or point me to a location where I can find a thorough explanation) the sampleqc step? It seems to be generating multiple fastq files at the moment (I started processing only ~1 hour ago). I executed the following script:
python longQC.py sampleqc -x pb-sequel -p 8 -o ${OUT_DIR}/${subdir} ${BAM}

Cheers,
Shani.

UnboundLocalError: local variable 'tuple_3'

Hello,
I've just installed the software. When running:

conda activate LongQC_env
export PATH=/home/m.sevi/software/LongQC/minimap2_mod/:$PATH
python /home/m.sevi/software/LongQC/longQC.py sampleqc -p 4 -x ont-rapid -o fne_qc_out_dir /scratch/m.sevi/processing/WW_HRSD/data/long/basecalled_fq/fne/fne_fastq/fne.fastq

The run fails with standard output:

longQC:2020-06-18 15:11:19,615:166:INFO:Cmd: /home/m.sevi/software/LongQC/longQC.py sampleqc -p 4 -x ont-rapid -o fne_qc_out_dir /scratch/m.sevi/processing/WW_HRSD/data/long/basecalled_fq/fne/fne_fastq/fne.fastq
longQC:2020-06-18 15:11:19,615:216:INFO:Preset "ont-rapid" was applied. Options --pb(--ont) is overwritten.
longQC:2020-06-18 15:11:21,621:288:INFO:Computation of the low complexity region started for a chunk 0
lq_mask:2020-06-18 15:11:22,791:111:INFO:New job was submitted: in->fne_qc_out_dir/analysis/tmp_0.fastq, out->fne_qc_out_dir/analysis/tmp_0.txt
longQC:2020-06-18 15:11:22,792:293:INFO:Adapter search is starting for a chunk 0.
longQC:2020-06-18 15:11:22,792:309:INFO:Computation of the GC fraction started for a chunk 0
lq_utils:2020-06-18 15:11:25,979:380:INFO:list for subsample is not initialized. Initializing now.
lq_adapt:2020-06-18 15:11:30,810:76:INFO:7038 reads were skipped due to their short lengths.
lq_adapt:2020-06-18 15:11:30,811:96:INFO:Adapter Sequence: GTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTTCGTGCGCCGCTTCA, max identity:-1.000000 and the number of trimmed reads: 0
longQC:2020-06-18 15:11:45,124:320:INFO:Adapter search has done for a chunk 0.
longQC:2020-06-18 15:11:45,125:324:INFO:subsample finished for chunk 0.
Traceback (most recent call last):
  File "/home/m.sevi/software/LongQC/longQC.py", line 920, in <module>
    main(args)
  File "/home/m.sevi/software/LongQC/longQC.py", line 63, in main
    args.handler(args)
  File "/home/m.sevi/software/LongQC/longQC.py", line 335, in command_sample
    if tuple_3:
UnboundLocalError: local variable 'tuple_3' referenced before assignment

Below find information about my environment:

_libgcc_mutex 0.1 main
blas 1.0 mkl
bzip2 1.0.8 h7b6447c_0
ca-certificates 2020.1.1 0 anaconda
certifi 2020.4.5.2 py37_0 anaconda
cycler 0.10.0 py_2 conda-forge
dbus 1.13.6 he372182_0 conda-forge
edlib 1.2.3 h2d50403_1 bioconda
expat 2.2.9 he1b5a44_2 conda-forge
fontconfig 2.13.1 he4413a7_1000 conda-forge
freetype 2.10.2 he06d7ca_0 conda-forge
glib 2.63.1 h3eb4bd4_1
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb31296c_0
h5py 2.10.0 py37h7918eee_0
hdf5 1.10.4 hb1b8bf9_0
icu 58.2 hf484d3e_1000 conda-forge
intel-openmp 2019.4 243
jinja2 2.11.2 py_0 anaconda
joblib 0.15.1 py_0 anaconda
jpeg 9d h516909a_0 conda-forge
kiwisolver 1.2.0 py37h99015e2_0 conda-forge
krb5 1.17.1 h173b8e3_0
ld_impl_linux-64 2.33.1 h53a641e_7
libcurl 7.69.1 h20c2e04_0
libdeflate 1.6 h516909a_0 conda-forge
libedit 3.1.20181209 hc058e9b_0
libffi 3.3 he6710b0_1
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libpng 1.6.37 hed695b0_1 conda-forge
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.1.0 hdf63c60_0
libuuid 2.32.1 h14c3975_1000 conda-forge
libxcb 1.13 h14c3975_1002 conda-forge
libxml2 2.9.10 he19cac6_1
markupsafe 1.1.1 py37h7b6447c_0 anaconda
matplotlib 3.2.1 0 conda-forge
matplotlib-base 3.2.1 py37hef1b27d_0
mkl 2019.4 243
mkl-service 2.3.0 py37he904b0f_0
mkl_fft 1.0.14 py37hd81dba3_0 r
mkl_random 1.0.4 py37hd81dba3_0 r
ncurses 6.2 he6710b0_1
numpy 1.17.0 py37h7e9f1db_0 r
numpy-base 1.17.0 py37hde5b4d6_0 r
openssl 1.1.1g h7b6447c_0 anaconda
pandas 1.0.4 py37h0573a6f_0 anaconda
pcre 8.44 he1b5a44_0 conda-forge
pip 20.1.1 py37_1
pthread-stubs 0.4 h14c3975_1001 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyqt 5.9.2 py37hcca6a23_4 conda-forge
pysam 0.16.0.1 py37hc501bad_0 bioconda
python 3.7.7 hcff3b4d_5
python-dateutil 2.8.1 py_0 anaconda
python-edlib 1.3.8.post1 py37h99015e2_1 bioconda
python_abi 3.7 1_cp37m conda-forge
pytz 2020.1 py_0 anaconda
qt 5.9.7 h5867ecd_1
readline 8.0 h7b6447c_0
scikit-learn 0.22.1 py37hd81dba3_0 anaconda
scipy 1.4.1 py37h0b6359f_0 anaconda
setuptools 47.3.0 py37_0
sip 4.19.8 py37hf484d3e_0
six 1.15.0 py_0
sqlite 3.31.1 h62c20be_1
tk 8.6.8 hbc83047_0
tornado 6.0.4 py37h8f50634_1 conda-forge
wheel 0.34.2 py37_0
xorg-libxau 1.0.9 h14c3975_0 conda-forge
xorg-libxdmcp 1.1.3 h516909a_0 conda-forge
xz 5.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3

I'd appreciate any feedback.

Thank you,
Maria
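
The UnboundLocalError above usually means a name is assigned only inside a branch that was skipped on this run. The log shows a single adapter search for the ont-rapid preset, so one guess, not confirmed against the LongQC source, is that `tuple_3` is only set when a 3' adapter is searched. The general pattern and a fix can be sketched like this; `search_adapter` and `summarize` are hypothetical stand-ins, not LongQC functions.

```python
def search_adapter(reads, adapter):
    """Return (adapter, hit_count), or None when no read contains the adapter.

    Hypothetical stand-in for an adapter search; not LongQC's implementation.
    """
    hits = sum(1 for read in reads if adapter in read)
    return (adapter, hits) if hits else None


def summarize(reads, adp5=None, adp3=None):
    # Bind both names up front so the later checks are always safe,
    # even when one of the branches below is skipped.
    tuple_5 = None
    tuple_3 = None
    if adp5:
        tuple_5 = search_adapter(reads, adp5)
    if adp3:
        tuple_3 = search_adapter(reads, adp3)

    found = []
    if tuple_5:
        found.append(("5'", tuple_5))
    if tuple_3:
        found.append(("3'", tuple_3))
    return found


reads = ["AAACGTTT", "GGGACGTCC"]
# No 3' adapter is given here, yet `if tuple_3:` no longer raises.
print(summarize(reads, adp5="ACGT"))
```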

runqc module errors

Hi @yfukasawa ,

Thanks for this toolkit. It is definitely very flexible and important. I am wondering whether you have fully implemented the runqc module within LongQC, or is it still in progress? I can't find a proper tutorial/readme on this process, and I am facing some issues when running it in Python 3.7 (accessing the scripts manually through a cloned repo). If the implementation of runqc is complete, I am happy to share the errors so that, hopefully, they can be troubleshot.

Thanks heaps,
Shani.

docker install issue

Hi @yfukasawa
I am having trouble installing the Docker image of LongQC; it shows an error related to the glibc version.
error:
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.

Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:

feature:/linux-64::__glibc==2.28=0
pysam -> libgcc-ng[version='>=9.3.0'] -> __glibc[version='>=2.17']
python=3.8 -> libgcc-ng[version='>=7.5.0'] -> __glibc[version='>=2.17']

Your installed version is: 2.28

Is there any option other than the Dockerfile to install LongQC? Or is there a way I can fix this bug?

Thanks in Advance
Saraswati Awasthi

Nanopore reads without adapters

I received processed nanopore data and wanted to see the overall quality of the final dataset.
However I have the feeling that the adapters are already removed from the dataset.
Is there a way to run LongQC without adapter information?

How did the author find the adapter sequences for Pacbio sequel?

I read the longQC.py script and found some hard-coded adapter sequences. For example,

    if args.preset:
        p = args.preset
        if p == 'pb-rs2':
            args.pb = True
            args.adp5 = "ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT" if not args.adp5 else args.adp5
            args.adp3 = "ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT" if not args.adp3 else args.adp3
            minimap2_params    = "-Y -l 0 -q 160"
            minimap2_med_score_threshold = 80
            if args.short:
                minimap2_med_score_threshold_short = 60
        elif p == 'pb-sequel':
            args.pb = True
            args.sequel = True
            args.adp5 = "ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT" if not args.adp5 else args.adp5
            args.adp3 = "ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT" if not args.adp3 else args.adp3
            minimap2_params = "-Y -l 0 -q 160"
            minimap2_med_score_threshold = 80
            if args.short:
                minimap2_med_score_threshold_short = 60

For example, I have done a 'pb-sequel' sequencing, and I don't know what the adapter sequence is. Are the adapter sequences in the longQC.py script correct for my case? How can I judge it?
How did the author find the adapter sequences for Pacbio sequel? Could you show us the reference website?
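
One thing that can be checked directly from the quoted code: the pb-rs2 and pb-sequel strings are reverse complements of each other, so the two presets describe the same sequence read from opposite strands. This does not say where the sequences came from. The sketch below verifies that relationship and adds a rough way to look for the adapter in your own reads. `count_reads_with_adapter` is a hypothetical helper that uses exact matching, a simplification that will undercount on error-prone long reads.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Adapter strings copied from the preset block quoted above.
PB_RS2_ADAPTER = "ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT"
PB_SEQUEL_ADAPTER = "ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT"

same_adapter = reverse_complement(PB_RS2_ADAPTER) == PB_SEQUEL_ADAPTER
print(same_adapter)


def count_reads_with_adapter(reads, adapter):
    """Count reads containing the adapter or its reverse complement exactly.

    Hypothetical helper; exact matching is a deliberate simplification.
    """
    rc = reverse_complement(adapter)
    return sum(1 for read in reads if adapter in read or rc in read)
```

A noticeable number of hits in a sample of raw subreads would be some evidence that the preset adapter matches the library; LongQC's own log also reports a "max identity" per adapter search, as seen in other issues here.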

incomprehensible issue

Good morning,

I ran LongQC on ONT data with the following command:
srun longQC.py sampleqc -x ont-rapid -s ${sample} -p 30 -o ${folder_sampleqc}/${sample}_500X_rapid ${long_read}
Some of my samples succeeded, but others crashed and I don't understand why (the error is incomprehensible to me, I'm a biologist; see below).
I asked the cluster manager @lecorguille and he thinks it is a tool-dependent issue rather than an installation one.
Could you help us resolve it?

Thanks a lot
Regards

Chloé
lq_coverage:2021-03-16 18:57:46,714:374:INFO:Coordinates of coverage analysis were parsed.
Traceback (most recent call last):
  File "/opt/LongQC/longQC.py", line 933, in <module>
    main(args)
  File "/opt/LongQC/longQC.py", line 63, in main
    args.handler(args)
  File "/opt/LongQC/longQC.py", line 598, in command_sample
    lc.plot_length_vs_coverage(fig_path_cl)
  File "/opt/LongQC/lq_coverage.py", line 461, in plot_length_vs_coverage
    self.__check_outlier_coverage(interval)
  File "/opt/LongQC/lq_coverage.py", line 482, in __check_outlier_coverage
    meds = stats['median'][np.where(stats['size']>=LqCoverage.LENGTH_BIN_THRESHOLD)[0]]
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/series.py", line 908, in __getitem__
    return self._get_with(key)
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/series.py", line 943, in _get_with
    return self.loc[key]
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py", line 1099, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py", line 1037, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py", line 1254, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/indexing.py", line 1315, in _validate_read_indexer
    raise KeyError(
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([2, 3], dtype='int64', name='Binned read length'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"
srun: error: cpu-node-18: task 0: Exited with exit code 1
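
A possible explanation for why only some samples fail: the line in lq_coverage.py passes the output of `np.where`, which holds row positions, to `[]` on a Series whose integer index holds bin labels. When the labels of the non-empty length bins do not happen to equal their positions, recent pandas versions raise this KeyError. The sketch below uses made-up numbers and a hypothetical threshold to show a label-safe selection with a boolean mask. It is not a patch taken from LongQC.

```python
import pandas as pd

LENGTH_BIN_THRESHOLD = 50  # hypothetical value, chosen only for this example

# Made-up per-bin statistics; note that the index labels do not start at 0.
stats = pd.DataFrame(
    {"median": [10.0, 12.0, 11.0, 30.0], "size": [5, 80, 120, 60]},
    index=pd.Index([4, 5, 6, 7], name="Binned read length"),
)

# Select rows with a boolean mask, which is aligned on labels,
# instead of passing positional indices from np.where to [].
mask = stats["size"] >= LENGTH_BIN_THRESHOLD
meds = stats.loc[mask, "median"]
print(meds.tolist())
```

Positional selection with `.iloc` would be the other consistent choice if the positions from `np.where` are kept.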

Value error

Good afternoon,
I'm trying to run LongQC on a pacbio sequel library in fastq format.
The software starts just fine, but then crashes with the following error:

longQC:2020-06-08 15:02:23,100:573:INFO:Generating coverage related plots...
lq_coverage:2020-06-08 15:02:23,199:120:INFO:Estimating coverage distribution..
Traceback (most recent call last):
  File "/PATH/TO/Andrea/LongQC/longQC.py", line 920, in <module>
    main(args)
  File "/PATH/TO/Andrea/LongQC/longQC.py", line 63, in main
    args.handler(args)
  File "/PATH/TO/Andrea/LongQC/longQC.py", line 577, in command_sample
    lc = LqCoverage(cov_path, isTranscript=args.transcript, control_filtering=pb_control)
  File "/PATH/TO/Andrea/LongQC/lq_coverage.py", line 121, in __init__
    self.__est_coverage()
  File "/PATH/TO/Andrea/LongQC/lq_coverage.py", line 220, in __est_coverage
    model_main_comp = self.__est_coverage_dist_gmm(k_i=2)
  File "/PATH/TO/Andrea/LongQC/lq_coverage.py", line 545, in __est_coverage_dist_gmm
    nonzeros  = self.df[LqCoverage.COVERAGE_COLUMN].values[np.nonzero(self.df[LqCoverage.COVERAGE_COLUMN])]
  File "<__array_function__ internals>", line 6, in nonzero
  File "/PATH/TO/Andrea/myanaconda/longqc/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1896, in nonzero
    return _wrapfunc(a, 'nonzero')
  File "/PATH/TO/Andrea/myanaconda/longqc/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return _wrapit(obj, method, *args, **kwds)
  File "/PATH/TO/Andrea/myanaconda/longqc/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 51, in _wrapit
    result = wrap(result)
  File "/PATH/TO/Andrea/myanaconda/longqc/lib/python3.7/site-packages/pandas/core/generic.py", line 1918, in __array_wrap__
    return self._constructor(result, **d).__finalize__(self)
  File "/PATH/TO/Andrea/myanaconda/longqc/lib/python3.7/site-packages/pandas/core/series.py", line 292, in __init__
    f"Length of passed values is {len(data)}, "
ValueError: Length of passed values is 1, index implies 5000.

Not sure if the problem is with the data or with the dependencies.
Thanks for the help
Andrea

pandas.errors.EmptyDataError: No columns to parse from file

Hi,

Thanks for the great work.

I experience a similar issue as described here #28 and here #34.

longQC:2021-10-27 08:06:14,443:598:INFO:Generating coverage related plots...
Traceback (most recent call last):
  File "/storage/home/hcoda1/3/apfennig3/LongQC/longQC.py", line 956, in <module>
    main(args)
  File "/storage/home/hcoda1/3/apfennig3/LongQC/longQC.py", line 62, in main
    args.handler(args)
  File "/storage/home/hcoda1/3/apfennig3/LongQC/longQC.py", line 602, in command_sample
    lc = LqCoverage(cov_path, isTranscript=args.transcript, control_filtering=pb_control)
  File "/storage/home/hcoda1/3/apfennig3/LongQC/lq_coverage.py", line 88, in __init__
    self.df = pd.read_table(table_path, sep='\t', header=None, dtype={3: str, 4: str})
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 683, in read_table
    return _read(filepath_or_buffer, kwds)
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/storage/home/hcoda1/3/apfennig3/.conda/envs/GBL/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas/_libs/parsers.pyx", line 549, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

However, I don't think it's a memory issue. I already reduced the index size to 100M. The peak RSS values are 6.7G and 22.7G during the spiked-in control, which seems to run through normally. I requested 64G of RAM, which is why I don't think memory is the issue here. This is the command I used to execute the pipeline:

python ${home_dir}LongQC/longQC.py sampleqc -o ${home_dir}scratch/QC/ -i 100M -x pb-sequel --sample_name gbl -m 1 -p 64 ${home_dir}scratch/gbl.subreads.bam

The coverage_out.txt file is empty, causing the error. I attached the coverage_err.txt file, the log file, and the files corresponding to the spiked-in control:

coverage_err_gbl.txt
qc.log
spiked_in_control_gbl.txt
spiked_in_control_gbl_stderr.txt

Any thoughts on this?

Thanks,
Aaron
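
Since the empty coverage_out.txt is what makes `pd.read_table` fail, a size check before parsing would at least move the error message to the step that really failed (the overlap computation). A minimal sketch, with `is_nonempty_file` as a hypothetical helper and temporary files standing in for the real output directory:

```python
import os
import tempfile


def is_nonempty_file(path):
    """True when the path is a regular file containing at least one byte."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


# Self-contained check using a temporary directory in place of the real output folder.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "coverage_out.txt")
    open(empty, "w").close()
    filled = os.path.join(tmp, "coverage_err.txt")
    with open(filled, "w") as out:
        out.write("some stderr text\n")
    checks = (is_nonempty_file(empty), is_nonempty_file(filled))

print(checks)
```

When the output file is empty, the contents of coverage_err.txt are the place to look for the underlying cause.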

Sequencing kit for -x

Hi, as stated on the GitHub page, for ONT data, there are two optional kits to choose from: 1D ligation and rapid sequencing kit. But as I understand from the ONT document here: https://nanoporetech.com/sites/default/files/s3/Product_brochure_Final_July_2018.pdf (pages 4 to 7 about available kits for library preparation), both are kits for DNA library preparation. But how about RNA? We actually used cDNA-PCR Sequencing Kit in our project, which is a type of RNA library.

SyntaxError

hi, @yfukasawa
When I tried to run LongQC, it gave me the following error:

python ~/LongQC/longQC.py -h
  File "~/longQC.py", line 255
    le.exec(*le_args, out=cov_path, err=cov_path_e)
          ^
SyntaxError: invalid syntax

It seems that "exec" is a keyword in Python. How can I avoid this?

Thank you,
Xiucz.
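
`exec` is a reserved word only in Python 2. In Python 3 it is an ordinary function name, so `le.exec(...)` parses fine there. That suggests the `python` on the PATH here is a Python 2 interpreter. Because a SyntaxError is raised while the file is compiled, a version guard has to live in a separate small launcher rather than inside longQC.py. The check itself can be sketched as follows; `is_python3` is a hypothetical helper.

```python
import sys


def is_python3(version_info=sys.version_info):
    """True when the interpreter's major version is 3 or newer."""
    return tuple(version_info[:2]) >= (3, 0)


if not is_python3():
    sys.exit("This script needs Python 3; found %d.%d" % tuple(sys.version_info[:2]))
print("Running under Python %d.%d" % tuple(sys.version_info[:2]))
```

Running `python --version`, or calling the script with `python3` explicitly, should confirm whether this is the cause.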

Question about bases masked

Hi, Yoshinori. First thanks for developing this nice tool.
You explained that the second column of the 'longqc_sdust.txt' table is 'the number of bases masked (MDUST)'. I am wondering how you define a masked base, since I could not find any lower-case bases in my reads.

Error in running LongQC sampleqc for ONT ligation data

Hi @yfukasawa ,

Thanks for this useful tool kit. I was running LongQC on my ONT direct cDNA sequencing data using the following script.
python longQC.py sampleqc -x ont-ligation -p 4 -o $out/barcode01 $input/barcode01.fq.gz

The analysis/subsample.fastq file was generated successfully, together with the minimap coverage error and out txt files. However, the figs folder is empty, and I got the following error:

lq_gcfrac:2020-07-20 15:02:39,582:58:INFO:Kernel density estimation done for read GC composition
Traceback (most recent call last):
  File "longQC.py", line 932, in <module>
    main(args)
  File "longQC.py", line 62, in main
    args.handler(args)
  File "longQC.py", line 435, in command_sample
    gc_read_mean, gc_read_sd = lg.plot_unmasked_gc_frac(fp=fig_path_gc)
  File "/stornext/Home/data/allstaff/d/dong.x/Programs/LongQC/lq_gcfrac.py", line 60, in plot_unmasked_gc_frac
    plt.hist(self.c_frac, alpha=0.3, bins=np.arange(min(self.c_frac), max(self.c_frac) + b_width, b_width), color='red', density=True)
ValueError: min() arg is an empty sequence

Could you please tell me how to fix the problem?

Thanks,
Xueyi
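
The ValueError means `self.c_frac` was empty when the histogram bins were computed. Why it is empty for this dataset is not visible in the log. A guard around the bin computation would let the rest of the report proceed, as sketched below. `hist_bin_edges` is a hypothetical helper that mimics `np.arange(min, max + width, width)` in plain Python.

```python
def hist_bin_edges(values, width):
    """Return bin edges covering `values` with a fixed width, or [] if there is no data.

    Hypothetical helper; it mimics np.arange(min, max + width, width) in plain Python.
    """
    values = list(values)
    if not values:
        return []
    start, stop = min(values), max(values) + width
    edges = []
    count = 0
    while start + count * width < stop:
        edges.append(start + count * width)
        count += 1
    return edges


c_frac = []  # stands in for an empty self.c_frac
edges = hist_bin_edges(c_frac, 0.25)
if edges:
    print("would plot with", len(edges), "bin edges")
else:
    print("no data for this histogram; skipping the plot")
```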

longQC.py error

Hi, I am trying to run longQC.py with the following command. However, it shows the following error. Please advise.

python longQC.py sampleqc -x pb-rs2 -o /longqc/ merge.fastq.gz

error:

Traceback (most recent call last):
  File "/media/bmaurice/Data2/Hybrid_assembly_virus/DMV10_IBV/longqc/LongQC/longQC.py", line 20, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'
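
This error means the interpreter that ran longQC.py cannot see pandas, typically because the conda environment holding the dependencies was not activated. A quick way to list what is missing from the active interpreter is sketched below. The module list is an assumption assembled from import errors and package listings in these issues, not LongQC's official requirements.

```python
import importlib.util

# Module names taken from import errors and package lists in these issues;
# treat the list as an assumption, not LongQC's official requirements.
REQUIRED = ["pandas", "numpy", "scipy", "matplotlib", "sklearn",
            "pysam", "edlib", "jinja2", "h5py"]


def missing_modules(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing:", missing_modules(REQUIRED) or "none")
```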

minimap2-coverage.c:565:9: error: implicit declaration of function 'compute_reliable_region' is invalid in C99

Hi, I'm new to this, but I'm currently working with fast5 files generated by a MinION.
While trying to install LongQC on my Mac, I ran `LongQC/minimap2-coverage && make` and got the following error:

minimap2-coverage.c:565:9: error: implicit declaration of function 'compute_reliable_region' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        compute_reliable_region(v, fopt.min_coverage, &regs, &mregs);
        ^
1 error generated.
make: *** [minimap2-coverage.o] Error 1

Can anyone help me with this?

Thank you

How to run LongQC with Singularity

Hi, I have no problem running LongQC with Docker with the command in the Doc:

docker run -it \
-v YOUR_INPUT_DIR:/input \
-v YOUR_OUTPUT_DIR:/output \
longqc sampleqc \
-x pb-sequel \ **specify a preset and change accordingly.**
-p $(nproc) \ **number of process/cores, this uses all of your cores. change accordingly.**
-o /output/YOUR_SAMPLE_NAME \ **keep /output as this is binded.**
/input/YOUR_INPUT_READ_FILE **keep /input as this is binded.**

However, Docker is not allowed on our server because of its high privilege requirement, whereas Singularity is supported. So I converted LongQC's Docker image into a Singularity file, longqc.sif, but I'm having problems getting it to run with Singularity. Do you have any suggestions on how to run this longqc.sif with Singularity? Thank you in advance!

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported

I'm running LongQC in my Snakemake Nanopore genome pipeline on a cluster with no admin rights.

rule LongQC:
    input:
        'filtered_reads/{sample}.fastq.gz',
    output:
        'QC_results/{sample}',
    threads: t
    shell:
        '''
        python ~/SCRATCH_NOBAK/workplace_5z/LongQC/longQC.py sampleqc -x ont-ligation --ncpu {threads} -o {output} {input}
        '''

It runs fine, with no error from minimap, until the end.
I attached the output (without subsample.fastq, which is too big). I think only the .html report is missing.

A22.zip

Strangely, it worked twice when testing after make:

python ~/SCRATCH_NOBAK/workplace_5z/LongQC/longQC.py sampleqc -x ont-ligation --ncpu 100 -o ~/SCRATCH_NOBAK/workplace_5z/LongQC/hybrid/A22 ~/SCRATCH_NOBAK/workplace_5z/hybrid/reads/long/A22.fastq.gz

Error

longQC:2021-04-07 12:55:48,036:489:INFO:Genarated the sample read length plot.
longQC:2021-04-07 12:55:48,037:491:INFO:Throughput: 500003168
longQC:2021-04-07 12:55:48,037:492:INFO:Length of longest read: 69248
longQC:2021-04-07 12:55:48,037:493:INFO:The number of reads: 57175
longQC:2021-04-07 12:55:48,038:524:INFO:Calculating overlaps of sampled reads...
longQC:2021-04-07 12:55:58,047:524:INFO:Calculating overlaps of sampled reads...
longQC:2021-04-07 12:56:08,055:524:INFO:Calculating overlaps of sampled reads...
longQC:2021-04-07 12:56:18,063:524:INFO:Calculating overlaps of sampled reads...
longQC:2021-04-07 12:56:28,071:524:INFO:Calculating overlaps of sampled reads...
longQC:2021-04-07 12:56:38,079:524:INFO:Calculating overlaps of sampled reads...
longQC:2021-04-07 12:56:48,087:524:INFO:Calculating overlaps of sampled reads...
longQC:2021-04-07 12:56:58,095:522:INFO:Process 66315 for ~/SCRATCH_NOBAK/workplace_5z/LongQC/minimap2-coverage/minimap2-coverage terminated.
longQC:2021-04-07 12:56:58,096:526:INFO:Overlap computation finished.
longQC:2021-04-07 12:57:08,103:598:INFO:Generating coverage related plots...
lq_coverage:2021-04-07 12:57:08,118:121:INFO:Estimating coverage distribution..
lq_coverage:2021-04-07 12:57:08,186:571:DEBUG:GaussianMixture(n_components=2)
lq_coverage:2021-04-07 12:57:08,192:576:INFO:The order of componens 0.009008786180148729 0.001502131247968907 
lq_coverage:2021-04-07 12:57:08,192:577:INFO:Means of components: 77.92692972768067 52.171159446345385 k=2
lq_coverage:2021-04-07 12:57:08,192:578:INFO:Covariances of components: 76.03986535434747 209.68414862634 k=2
lq_coverage:2021-04-07 12:57:08,229:123:INFO:Estimation of coverage distribution finished.
lq_coverage:2021-04-07 12:57:08,795:392:INFO:Coordinates of coverage analysis were parsed.
Traceback (most recent call last):
  File "~/SCRATCH_NOBAK/workplace_5z/LongQC/longQC.py", line 956, in <module>
    main(args)
  File "~/SCRATCH_NOBAK/workplace_5z/LongQC/longQC.py", line 62, in main
    args.handler(args)
  File "~/SCRATCH_NOBAK/workplace_5z/LongQC/longQC.py", line 611, in command_sample
    lc.plot_length_vs_coverage(fig_path_cl)
  File "/scratch/blumenscheitc/workplace_5z/LongQC/lq_coverage.py", line 479, in plot_length_vs_coverage
    self.__check_outlier_coverage(interval)
  File "/scratch/blumenscheitc/workplace_5z/LongQC/lq_coverage.py", line 500, in __check_outlier_coverage
    meds = stats['median'][np.where(stats['size']>=LqCoverage.LENGTH_BIN_THRESHOLD)[0]]
  File "~/.conda/envs/bio39/lib/python3.8/site-packages/pandas/core/series.py", line 877, in __getitem__
    return self._get_with(key)
  File "~/.conda/envs/bio39/lib/python3.8/site-packages/pandas/core/series.py", line 912, in _get_with
    return self.loc[key]
  File "~/.conda/envs/bio39/lib/python3.8/site-packages/pandas/core/indexing.py", line 895, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "~/.conda/envs/bio39/lib/python3.8/site-packages/pandas/core/indexing.py", line 1113, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "~/.conda/envs/bio39/lib/python3.8/site-packages/pandas/core/indexing.py", line 1053, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "~/.conda/envs/bio39/lib/python3.8/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "~/.conda/envs/bio39/lib/python3.8/site-packages/pandas/core/indexing.py", line 1321, in _validate_read_indexer
    raise KeyError(
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([0], dtype='int64', name='Binned read length'). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"
Error in job LongQC while creating output file QC_results/A22.
RuleException:
CalledProcessError in line 33 of /scratch/blumenscheitc/workplace_5z/long_reads_preprocess/Snakefile:
Command '
        python ~/SCRATCH_NOBAK/workplace_5z/LongQC/longQC.py sampleqc -x ont-ligation --ncpu 100 -o QC_results/A22 filtered_reads/A22.fastq.gz
        ' returned non-zero exit status 1
  File "/scratch/blumenscheitc/workplace_5z/long_reads_preprocess/Snakefile", line 33, in __rule_LongQC
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Removing output files of failed job LongQC since they might be corrupted:
QC_results/A22
Skipped removing non-empty directory QC_results/A22
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

Blank QV plots?

Hello,

Thanks for making this awesome tool! I recently installed it and am trying to use the sampleqc tool. I am running it with the following call:

longQC.py sampleqc -p 16 -m 2 -d -x pb-hifi -s <sample_name> -o <out_dir> <sample_name.bam>

The tool appears to have run without any errors, but some of the output is not what I expected.

The fig_longQC_sampleqc_average_qv_<sample_name>.png and fig_longQC_sampleqc_olp_qv_<sample_name>.png files appear as below:

Is this expected for a given error, or do I need to review my installation? No error messages were created.

Many thanks in advance!

Adaptor trimming

Hi Yoshinori,
I am trying to trim adaptors from ONT reads:
Do we just use the flags --adapter_5 ADP5 --adapter_3 ADP3?
I have used this:
for s in $(cat samples_ont.txt);do
longQC.py sampleqc -x ont-rapid -o ${today}OntQC_results/${s} LRfastqs/${s}.fastq.gz --adapter_5 ADP5 --adapter_3 ADP3 --trim_output ${today}OntQC_results --ncpu ${SLURM_CPUS_PER_TASK}
done

I keep getting this error:
IsADirectoryError: Is a directory: '20220405OntQC_results/barcode02'
Thanks,
TJ

LongQC installation issue

cd LongQC-1.2.0c/minimap2-coverage && make
cc -c -g -O2 -Wall -Wc++-compat -DHAVE_KALLOC minimap2-coverage.c -o minimap2-coverage.o
minimap2-coverage.c:1:10: fatal error: stdlib.h: No such file or directory
1 | #include <stdlib.h>
| ^~~~~~~~~~
compilation terminated.
make: *** [Makefile:29: minimap2-coverage.o] Error 1
I'm facing this issue. Kindly help me sort it out. Thanks.

Handling of no coverage case

Hello @yfukasawa,

I started with the same error as @PerisD. After adding -p 4 it actually ran (and produced some stats), but then failed with an error that looks like this:
longQC:2020-07-07 08:11:12,923:475:INFO:Genarated the sample read length plot.
longQC:2020-07-07 08:11:12,923:477:INFO:Throughput: 737228
longQC:2020-07-07 08:11:12,923:478:INFO:Length of longest read: 25168
longQC:2020-07-07 08:11:12,923:479:INFO:The number of reads: 121
longQC:2020-07-07 08:11:12,924:508:INFO:Process 450 for /home/user/LongQC/minimap2_mod/minimap2-coverage terminated.
longQC:2020-07-07 08:11:12,924:512:INFO:Overlap computation finished.
longQC:2020-07-07 08:11:22,933:584:INFO:Generating coverage related plots...
lq_coverage:2020-07-07 08:11:22,938:120:INFO:Estimating coverage distribution..
/opt/conda/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:253: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
res_values = method(rvalues)
Traceback (most recent call last):
File "longQC.py", line 932, in <module>
main(args)
File "longQC.py", line 62, in main
args.handler(args)
File "longQC.py", line 588, in command_sample
lc = LqCoverage(cov_path, isTranscript=args.transcript, control_filtering=pb_control)
File "/home/user/LongQC/lq_coverage.py", line 121, in __init__
self.__est_coverage()
File "/home/user/LongQC/lq_coverage.py", line 220, in __est_coverage
model_main_comp = self.__est_coverage_dist_gmm(k_i=2)
File "/home/user/LongQC/lq_coverage.py", line 546, in __est_coverage_dist_gmm
m_f = mixture.GaussianMixture(n_components=k).fit(nonzeros[nonzeros < th_per].reshape(-1,1),1)
File "/opt/conda/lib/python3.7/site-packages/sklearn/mixture/_base.py", line 193, in fit
self.fit_predict(X, y)
File "/opt/conda/lib/python3.7/site-packages/sklearn/mixture/_base.py", line 220, in fit_predict
X = _check_X(X, self.n_components, ensure_min_samples=2)
File "/opt/conda/lib/python3.7/site-packages/sklearn/mixture/_base.py", line 53, in _check_X
ensure_min_samples=ensure_min_samples)
File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
return f(**kwargs)
File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 654, in check_array
context))
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 2 is required.

What went wrong and how can I fix it?
Thanks
Asta

Originally posted by @astulaaa in #3 (comment)
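
A note on the ValueError above: scikit-learn was handed an array with zero rows, so nothing survived the `nonzeros[nonzeros < th_per]` filter before the fit. With only 121 reads, that is plausibly because almost no overlaps were found. The sketch below shows the kind of guard that would avoid the crash. `select_for_fit` and the threshold value are hypothetical and not part of LongQC; the minimum of 2 samples is taken from the error message.

```python
def select_for_fit(coverages, upper, min_samples=2):
    """Return nonzero coverages below `upper`, or None if too few remain.

    Hypothetical helper: mirrors the filter seen in the traceback and
    refuses to proceed when fewer than `min_samples` values are left.
    """
    kept = [c for c in coverages if 0 < c < upper]
    if len(kept) < min_samples:
        return None
    return kept


print(select_for_fit([0, 0, 0], upper=50))       # nothing to fit
print(select_for_fit([0, 3, 7, 120], upper=50))  # two usable values
```

A caller could treat a `None` result as "coverage could not be estimated" and skip the mixture fit rather than abort the whole run.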

Long time for processing

I am using version 1.1.1 from Docker. I used the following command:
python longQC.py sampleqc -x ont-ligation -p 6 -d -o /data/out_dir221 /data/Arun1.CCV.fastq
I have aborted the process because it did not finish even after 15 minutes.
My computer has 64 GB RAM and 6 CPU cores

Can you help me to optimize the performance?

longqc

While running LongQC I got this error. Can someone tell me what the problem is?
ValueError: truncated quality string in [my path to the fastq file]
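
In FASTQ parsers this message usually means that at least one record has a quality line shorter than its sequence line, for example because the file was cut off during a copy or download. Here is a minimal sketch for locating such records. It assumes plain four-line FASTQ records without wrapped sequences, and `find_bad_records` is a hypothetical helper, not something LongQC provides.

```python
import io


def find_bad_records(handle):
    """Yield (record_number, read_name) where sequence and quality lengths differ.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    record = 0
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        record += 1
        if len(seq) != len(qual):
            yield record, header.strip().lstrip("@")


# Small in-memory example: the second record has a shortened quality line.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIII\n")
print(list(find_bad_records(demo)))
```

For a real file, pass `open(path)` or, for gzipped input, `gzip.open(path, "rt")` instead of the in-memory example.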

pandas.errors.EmptyDataError: No columns to parse from file - no Html file is generated

Hi,
I am having the following issue with a few samples; two other samples completed successfully and generated the HTML file.
Could you please take a look?
Thank you in advance

longQC:2021-09-10 13:56:57,131:598:INFO:Generating coverage related plots...
Traceback (most recent call last):
  File "/home/kgagalova/src/LongQC/longQC.py", line 956, in <module>
    main(args)
  File "/home/kgagalova/src/LongQC/longQC.py", line 62, in main
    args.handler(args)
  File "/home/kgagalova/src/LongQC/longQC.py", line 602, in command_sample
    lc = LqCoverage(cov_path, isTranscript=args.transcript, control_filtering=pb_control)
  File "/home/kgagalova/src/LongQC/lq_coverage.py", line 88, in __init__
    self.df = pd.read_table(table_path, sep='\t', header=None, dtype={3: str, 4: str})
  File "/home/kgagalova/miniconda3/envs/py3.6bis/lib/python3.6/site-packages/pandas/io/parsers.py", line 767, in read_table
    return read_csv(**locals())
  File "/home/kgagalova/miniconda3/envs/py3.6bis/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/kgagalova/miniconda3/envs/py3.6bis/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/kgagalova/miniconda3/envs/py3.6bis/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/home/kgagalova/miniconda3/envs/py3.6bis/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/kgagalova/miniconda3/envs/py3.6bis/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
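
pandas raises EmptyDataError when the file it is asked to parse contains no data. In this traceback the file is the coverage table read in `LqCoverage.__init__`, which suggests the preceding overlap step wrote nothing for these samples. A quick way to confirm this before digging further is to check the file size. The sketch below uses a hypothetical helper name and placeholder file names.

```python
import os
import tempfile


def table_is_empty(path):
    """True if the file is missing or has zero bytes."""
    return not os.path.exists(path) or os.path.getsize(path) == 0


with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty_table.txt")    # placeholder name
    open(empty, "w").close()
    filled = os.path.join(tmp, "filled_table.txt")  # placeholder name
    with open(filled, "w") as fh:
        fh.write("read1\t10\t200\n")
    print(table_is_empty(empty), table_is_empty(filled))
```

If the table is empty, the problem lies upstream of the plotting code, so the log of the overlap step is the place to look.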

Docker build apt signature errors

I am getting signature errors when trying to build the docker image. I'm trying on macOS 11.5.2, Docker Desktop 3.6.0.

% docker build -t longqc .
[+] Building 2.9s (6/11)
 => [internal] load build definition from Dockerfile                                                                                                                                                                             0.0s
 => => transferring dockerfile: 1.44kB                                                                                                                                                                                           0.0s
 => [internal] load .dockerignore                                                                                                                                                                                                0.1s
 => => transferring context: 2B                                                                                                                                                                                                  0.0s
 => [internal] load metadata for docker.io/continuumio/miniconda3:latest                                                                                                                                                         1.8s
 => CACHED [1/7] FROM docker.io/continuumio/miniconda3@sha256:592a60b95b547f31c11dc6593832e962952e3178f1fa11db37f43a2afe8df8d7                                                                                                   0.0s
 => CACHED https://api.github.com/repos/yfukasawa/longqc/git/refs/heads/minimap2_update                                                                                                                                          0.0s
 => ERROR [2/7] RUN apt-get clean all &&     apt-get update &&     apt-get upgrade -y &&     apt-get install -y      git     build-essential     libc6-dev     zlib1g-dev &&     apt-get clean &&     apt-get purge              0.9s
------
 > [2/7] RUN apt-get clean all &&     apt-get update &&     apt-get upgrade -y &&     apt-get install -y      git     build-essential     libc6-dev     zlib1g-dev &&     apt-get clean &&     apt-get purge:
#4 0.566 Get:1 http://security.debian.org/debian-security buster/updates InRelease [65.4 kB]
#4 0.570 Get:2 http://deb.debian.org/debian buster InRelease [122 kB]
#4 0.603 Get:3 http://deb.debian.org/debian buster-updates InRelease [51.9 kB]
#4 0.712 Err:1 http://security.debian.org/debian-security buster/updates InRelease
#4 0.712   At least one invalid signature was encountered.
#4 0.777 Err:2 http://deb.debian.org/debian buster InRelease
#4 0.777   At least one invalid signature was encountered.
#4 0.849 Err:3 http://deb.debian.org/debian buster-updates InRelease
#4 0.849   At least one invalid signature was encountered.
#4 0.858 Reading package lists...
#4 0.871 W: GPG error: http://security.debian.org/debian-security buster/updates InRelease: At least one invalid signature was encountered.
#4 0.871 E: The repository 'http://security.debian.org/debian-security buster/updates InRelease' is not signed.
#4 0.871 W: GPG error: http://deb.debian.org/debian buster InRelease: At least one invalid signature was encountered.
#4 0.871 E: The repository 'http://deb.debian.org/debian buster InRelease' is not signed.
#4 0.871 W: GPG error: http://deb.debian.org/debian buster-updates InRelease: At least one invalid signature was encountered.
#4 0.871 E: The repository 'http://deb.debian.org/debian buster-updates InRelease' is not signed.
------
executor failed running [/bin/sh -c apt-get clean all &&     apt-get update &&     apt-get upgrade -y &&     apt-get install -y      git     build-essential     libc6-dev     zlib1g-dev &&     apt-get clean &&     apt-get purge]: exit code: 100

OSError: [Errno 5] Input/output error

Hi, yfukasawa,
I am running LongQC on a batch of my PacBio unaligned BAMs. For the small samples, it runs smoothly. But for one somewhat larger BAM file, it takes a very long time and finally reports an OSError. My command is
longqc sampleqc -x pb-sequel -p 8 -o HG002_90pM_read_LongQC ./m64304e_211014_201856.reads.bam
I have pasted the error-related messages below. Could you help me figure out the reason? Additionally, it took close to two weeks to run this sample before hitting this OSError. Can I specify more processes (such as -p 32) to speed it up linearly? Thank you in advance.
Wenchao
File "/opt/LongQC/longQC.py", line 63, in main
args.handler(args)
File "/opt/LongQC/longQC.py", line 829, in command_sample
tpl = env.get_template('web_summary.tpl.html')
File "/opt/conda/lib/python3.9/site-packages/jinja2/environment.py", line 997, in get_template
File "/opt/conda/lib/python3.9/site-packages/jinja2/environment.py", line 958, in _load_template
File "/opt/conda/lib/python3.9/site-packages/jinja2/loaders.py", line 125, in load
File "/opt/conda/lib/python3.9/site-packages/jinja2/loaders.py", line 201, in get_source
OSError: [Errno 5] Input/output error

too high non-sense read fraction

When I run:
python longQC.py sampleqc -x pb-hifi -o longqc ccs.bam
I get a message saying too high non-sense read fraction
Non-sense fraction is 0.647
but if I use:
python longQC.py sampleqc -x pb-sequel -o longqc ccs.bam
Non-sense fraction goes down to 0.282

I am trying to analyse a Sequel II HiFi run. Which result would be the correct one?

It looks like the only thing that changes for minimap2-coverage is the database k-mer size parameter?
HiFi : -k 15
Sequel : -k 12

docker error: ImportError: /opt/conda/lib/python3.9/site-packages/h5py/defs.cpython-39-x86_64-linux-gnu.so: undefined symbol: H5Pset_fapl_ros3

Hi,
I just built the docker image (version 1.2) , but when I try to run it I get this error:

Traceback (most recent call last):
  File "/root/LongQC/longQC.py", line 29, in <module>
      import lq_nanopore             
  File "/root/LongQC/lq_nanopore.py", line 1, in <module>
    import os, sys, time, h5py, json  
  File "/opt/conda/lib/python3.9/site-packages/h5py/__init__.py", line 33, in <module>        
    from . import version       
  File "/opt/conda/lib/python3.9/site-packages/h5py/version.py", line 15, in <module> 
    from . import h5 as _h5     
  File "h5py/h5.pyx", line 1, in init h5py.h5    
ImportError: /opt/conda/lib/python3.9/site-packages/h5py/defs.cpython-39-x86_64-linux-gnu.so: undefined symbol: H5Pset_fapl_ros3

Any idea what the problem might be?
Cheers

Can you explain the meaning of each column in longqc_sdust.txt? Thank you

Can you explain the meaning of each column in longqc_sdust.txt?

longqc_sdust.txt is one of the output files of LongQC.

For example, with a PacBio Sequel BAM as input:

	column2	column3	column4	column5
m64062_47008_47682	197	674	0.292	-0.000
m64062_0_35905	976	35905	0.027	-0.000
m64062_0_157	7	157	0.045	-0.000

Thank you.

Error in running LongQC `sampleqc`

Hello @yfukasawa ,

I am running LongQC on my PacBio data with the following script.
python longQC.py sampleqc -x pb-sequel -d -p 8 -m 2 -o ${OUT_DIR}/RAW/${subdir} ${BAM}

It seems like something is happening (it generates a fastq file in the analysis directory), but in the end, I get the following error:

longQC:2020-06-19 01:31:56,627:348:INFO:Summary table /stornext/General/data/
/long_read_benchmark/LongQC_output/RAW/S1R/longqc_sdust.txt was made.
Traceback (most recent call last):
  File "/stornext/General/data/long_read_benchmark/LongQC/longQC.py", line 920, in <module>
    main(args)
  File "/stornext/General/data/long_read_benchmark/LongQC/longQC.py", line 63, in main
    args.handler(args)
  File "/stornext/General/data/long_read_benchmark/LongQC/longQC.py", line 351, in command_sample
    df_mask      = pd.read_table(lm.get_outfile_path(), sep='\t', header=None)
  File "/home/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/home/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/.local/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 532, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

Also, the figs and the analysis/minimap2 folders are empty.

I have loaded minimap2 into my PATH:
echo $PATH: /stornext/System/data/apps/minimap2/minimap2-2.17/bin:/stornext/System/data/apps/python/python-3.7.0/bin:/stornext/System/data/apps/anaconda2/anaconda2-2019.10/condabin:/stornext/System/data/apps/R/R-3.6.1/lib64/R/bin:/stornext/System/data/apps/hdf5/hdf5-1.8.20/bin:/stornext/System/data/apps/java/java-1.8.0_131/bin:/stornext/System/data/apps/scala/scala-2.12.2/bin:/usr/local/bioinf/bin:/stornext/System/data/apps/fastqc/fastqc-0.11.8/bin:/stornext/System/data/apps/samtools/samtools-1.7/bin:/usr/local/bioinf/bin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/users/allstaff/bin/Linux:/home/users/all_users/bin:/home/users/user/.local/bin

What is happening? Can you please explain whether I'm doing something wrong here. It would be great to use your tool for our analysis!

Thanks and I hope to hear from you soon!
Shani.

"UnsatisfiableError" during installation

Hello! I encountered several errors while installing. Please, help to solve it.

Steps to reproduce:

With manual installation:

Install Anaconda 3:

# install anaconda prerequisites
sudo apt install libgl1-mesa-glx libegl1-mesa libxrandr2 libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6
# install anaconda itself
cd ./Downloads
wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
bash ./Anaconda3-2020.11-Linux-x86_64.sh
# allow to install to /home/username/anaconda3
# reject running conda init at start-up of system
# add conda to system $PATH variable
export PATH="/home/username/anaconda3/bin:$PATH"

This part works fine.
2. Install prerequisites, as described in README.md file:

# the following line works without problem:
conda install h5py
# the next line returns the 'UnsatisfiableError'
conda install -c bioconda pysam
# the following line works without problem:
conda install -c bioconda edlib
# the next line returns the 'UnsatisfiableError'
conda install -c bioconda python-edlib

The error message for pysam is:

~$ conda install -c bioconda pysam
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The error message for python-edlib is:

~$ conda install -c bioconda python-edlib
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

With DockerFile

Installing Docker

sudo apt update && sudo apt upgrade -y
sudo apt install -y apt-transport-https ca-certificates curl gnupg
sudo apt install -y docker-ce docker-ce-cli containerd.io

It works. No error here.
2. following instructions from README.md file:

cd /mnt/d/biotools
wget https://raw.githubusercontent.com/yfukasawa/LongQC/master/Dockerfile
sudo docker build -t longqc .

And the last line raises an error with the following log:

$ sudo docker build -t longqc .
Sending build context to Docker daemon  278.5MB
Step 1/26 : FROM continuumio/miniconda3
 ---> 52daacd3dd5d
Step 2/26 : MAINTAINER Yoshinori Fukasawa <[email protected]>
 ---> Using cache
 ---> 9cfac16065e5
Step 3/26 : RUN apt-get clean all &&     apt-get update &&     apt-get upgrade -y &&     apt-get install -y      git     build-essential     libc6-dev     zlib1g-dev &&     apt-get clean &&     apt-get purge
 ---> Using cache
 ---> 793730a77067
Step 4/26 : ENV USER user
 ---> Using cache
 ---> 80979e2b1106
Step 5/26 : ENV HOME /home/${USER}
 ---> Using cache
 ---> 0155bb0b6c1a
Step 6/26 : LABEL base_image="miniconda3"
 ---> Using cache
 ---> 0a7264391517
Step 7/26 : LABEL software="LongQC docker"
 ---> Using cache
 ---> 4b21c78cfca5
Step 8/26 : LABEL software.version="1.2"
 ---> Using cache
 ---> 90faa00bc325
Step 9/26 : RUN useradd -m ${USER}
 ---> Using cache
 ---> 243296d01b69
Step 10/26 : RUN echo "${USER}:test_pass" | chpasswd
 ---> Using cache
 ---> 9c07be4b060a
Step 11/26 : ADD https://api.github.com/repos/yfukasawa/longqc/git/refs/heads/minimap2_update version.json
Downloading     373B
 ---> Using cache
 ---> e41881c5db42
Step 12/26 : RUN git clone https://github.com/yfukasawa/LongQC.git $HOME/LongQC
 ---> Using cache
 ---> e0e39f42f46a
Step 13/26 : RUN cd $HOME/LongQC/minimap2-coverage && make
 ---> Using cache
 ---> 99e02c3d2a24
Step 14/26 : RUN conda update -y conda
 ---> Using cache
 ---> 2e13d687d385
Step 15/26 : RUN conda install -y numpy
 ---> Using cache
 ---> 0f257a341327
Step 16/26 : RUN conda install -y pandas'>=0.24.0'
 ---> Using cache
 ---> 6d627a9d63ec
Step 17/26 : RUN conda install -y scipy
 ---> Using cache
 ---> 822e6bb47c1f
Step 18/26 : RUN conda install -y jinja2
 ---> Using cache
 ---> fc01b5c48375
Step 19/26 : RUN conda install -y h5py
 ---> Using cache
 ---> 8652a02b3a1e
Step 20/26 : RUN conda install -y matplotlib'>=2.1.2'
 ---> Using cache
 ---> 6dc66c09087f
Step 21/26 : RUN conda install -y scikit-learn
 ---> Using cache
 ---> 987f6afa4db1
Step 22/26 : RUN conda install -y -c bioconda pysam
 ---> Running in b4ee5f7ad191
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.

Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

The command '/bin/sh -c conda install -y -c bioconda pysam' returned a non-zero code: 1

Please advise on a fix.

System info:

OS : Ubuntu 20.04.2 LTS x64
Python 3.8.5

docker build error

Hello,

I am using docker build to install LongQC, but it occupied more than 1 TB of tmp space (as shown below). Is this expected? This is my first time using Docker.

$docker build -t longqc .
Sending build context to Docker daemon 745G

Thank you

Tian
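
A note on the 745G figure: `docker build -t longqc .` sends everything under the current directory (the build context) to the Docker daemon, apart from anything excluded by a `.dockerignore` file. The size therefore most likely reflects the directory the command was run from, not LongQC itself. A rough way to check a directory before building is sketched below; `dir_size_bytes` is a hypothetical helper.

```python
import os
import tempfile


def dir_size_bytes(root):
    """Sum the sizes of all regular files under `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


# A context holding only a Dockerfile is tiny.
with tempfile.TemporaryDirectory() as ctx:
    with open(os.path.join(ctx, "Dockerfile"), "w") as fh:
        fh.write("FROM continuumio/miniconda3\n")
    print(dir_size_bytes(ctx), "bytes")
```

Running the build from a fresh directory that contains only the Dockerfile keeps the context small.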

lq_coverage failing

Hi,

I am using LongQC for my Sequel II data. It ran fine in the past, but currently it gives errors for some samples while other samples run fine.
The surprising thing is that all the figures are generated, but it then fails with the following error:

longQC::INFO:Generated coverage related plots.
lq_coverage : WARNING:Mode of lognormal has no value. Do estimation first.
Traceback (most recent call last):
File "longQC.py", line 918, in <module>
main(args)
File "/longQC.py", line 63, in main
args.handler(args)
File "longQC.py", line 591, in command_sample
or (lc.is_low_coverage() and float(lc.get_logn_mode()) < very_low_coverage_threshold)
TypeError: float() argument must be a string or a number, not 'NoneType'

I have tried different values of very_low_coverage_threshold, going as low as 1.

I am using the following version:
Project Name: longQC.py
Start Date: 2017-10-10
Version: 0.1

This is my command line
python longQC.py sampleqc -o Sample/longQC -x pb-sequel -s Sample Sample.ccs.fastq

Please let me know where I am going wrong.

Thanks for your help!
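
Reading the log above: the warning says the lognormal mode was never estimated, so `lc.get_logn_mode()` returned None and `float(None)` raised the TypeError. Changing the threshold cannot help, because the failure happens before the comparison. A minimal sketch of a guard follows; `is_very_low` is a hypothetical function, not LongQC code.

```python
def is_very_low(mode, threshold):
    """Compare an optional mode estimate with a threshold.

    Returns None when no estimate is available, so the caller can report
    that case explicitly instead of crashing on float(None).
    """
    if mode is None:
        return None
    return float(mode) < threshold


print(is_very_low(None, 5))
print(is_very_low("3.2", 5))
print(is_very_low(12.0, 5))
```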

Error for read length related plots (numpy.linalg.LinAlgError: Singular matrix)

Self posting.

The issue occurred in the qscore and length_vs_coverage plotting when analyzing datasets with very small variance in read length.
The cause was that the number of read-length bins can be very small (< 5) in such cases.
The number of bins must be checked before plotting.
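
The bin check described above could look roughly like this sketch. The bin width of 500 and the function names are illustrative assumptions, not the values or code used in LongQC; only the minimum of 5 bins comes from the description.

```python
def count_length_bins(read_lengths, bin_width=500):
    """Number of distinct fixed-width bins occupied by the read lengths."""
    return len({length // bin_width for length in read_lengths})


def enough_bins(read_lengths, bin_width=500, min_bins=5):
    """True if the lengths spread over at least `min_bins` bins."""
    return count_length_bins(read_lengths, bin_width) >= min_bins


narrow = [1490, 1502, 1510, 1495]          # nearly identical lengths
wide = [300, 900, 1600, 2400, 3100, 4800]  # spread over many bins
print(enough_bins(narrow), enough_bins(wide))
```

When the check fails, the length-dependent plots can be skipped instead of fitting a curve to too few points.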

"What's the difference of ont-ligation, ont-rapid, ont-1dsq"

Hi, I have a question about some options under the -x parameter.
What is the difference between "ont-ligation", "ont-rapid", "ont-1dsq" options for ONT sequencing and which option should I choose for Nanopore PromethION Ultra-long sequencing?
Thank you!

Unable to open object (object 'UniqueGlobalKey' doesn't exist)

Dear All,

I got an error that I can't find a solution for. Below is the output; the process breaks right at the beginning.

Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/disk2/Users/FLIR/Software/LongQC/lq_nanopore.py", line 155, in wrapper
c_id = get_channel_id(fast5) -1
File "/disk2/Users/FLIR/Software/LongQC/lq_nanopore.py", line 120, in get_channel_id
return int(f['/UniqueGlobalKey']['channel_id'].attrs['channel_number'])
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/disk2/Software/Venv/venv36/lib/python3.6/site-packages/h5py/_hl/group.py", line 288, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'UniqueGlobalKey' doesn't exist)"

Do you have any clue about how to fix this and make the script run normally?

Thank you in advance.

Flanking region analysis plot

In the Flanking region analysis plot, what does the y-axis represent and how is it calculated?

stuck at "INFO:list for subsample is not initialized. Initializing now" for about a day

info:
SRR10520596.fastq is ~3 GB in size

command:
python longQC.py sampleqc -x pb-rs2 -o /data/out_dir /data/SRR10520596.fastq

logs:
longQC:2020-12-30 16:26:53,142:169:INFO:Cmd: longQC.py sampleqc -x pb-rs2 -o /data/out_dir /data/SRR10520596.fastq
longQC:2020-12-30 16:26:53,143:233:INFO:Preset "pb-rs2" was applied. Options --pb(--ont) is overwritten.
longQC:2020-12-30 16:26:58,169:306:INFO:Computation of the low complexity region started for a chunk 0
lq_mask:2020-12-30 16:27:13,623:111:INFO:New job was submitted: in->/data/out_dir/analysis/tmp_0.fastq, out->/data/out_dir/analysis/tmp_0.txt
longQC:2020-12-30 16:27:13,630:311:INFO:Adapter search is starting for a chunk 0.
longQC:2020-12-30 16:27:13,635:327:INFO:Computation of the GC fraction started for a chunk 0
lq_utils:2020-12-30 16:27:21,706:380:INFO:list for subsample is not initialized. Initializing now.

It froze here for a day, so I had to force quit.

Number of processes must be at least 1

Hi Yoshinori,
Finally, I tried to use LongQC with real data, but I got this error:
The command line
MYPATH/conda_env/circlator/bin/python MYPATH/software/programs/LongQC/longQC.py sampleqc -x pb-rs2 -o longqc/ MYPATH/CompGenomics_Species/raw_reads/PacBio/TF100210M3/TF100210M3_1.fq -p 1
The error output:
Traceback (most recent call last):
  File "MYPATH/software/programs/LongQC/longQC.py", line 919, in <module>
    main(args)
  File "MYPATH/software/programs/LongQC/longQC.py", line 65, in main
    args.handler(args)
  File "MYPATH/software/programs/LongQC/longQC.py", line 268, in command_sample
    lm = LqMask(os.path.join(path_minimap2, "sdust"), args.out, suffix=suffix, max_n_proc=10 if ncpu > 10 else ncpu)
  File "MYPATH/software/programs/LongQC/lq_mask.py", line 41, in __init__
    self.pool = mp.Pool(self.n_proc)
  File "MYPATH/conda_env/circlator/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "MYPATH/conda_env/circlator/lib/python3.6/multiprocessing/pool.py", line 168, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1

Do you know how I could fix it?

Thanks,

Peris
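
A note on the traceback above: `mp.Pool(self.n_proc)` received a value below 1 even though `-p 1` was given. How `LqMask` derives `n_proc` from `max_n_proc` is not visible here, so the following is only a sketch of the kind of clamp that would prevent the error. `clamp_procs` is a hypothetical helper; the upper bound of 10 is taken from the `max_n_proc` expression in the traceback.

```python
def clamp_procs(requested, upper=10):
    """Keep a requested worker count within [1, upper]."""
    return max(1, min(upper, requested))


print([clamp_procs(n) for n in (0, 1, 4, 64)])
```

Other reports in this list got past the same error by passing a larger value such as `-p 4`.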

pandas.errors.EmptyDataError: No columns to parse from file

After the command:
python longQC.py sampleqc -x pb-rs2 -o /data/out_dir /data/SRR10520596.fastq

The error:

Questions about Index Size and Short Mode

Hi @yfukasawa,
First of all, thank you for developing LongQC.
I am currently testing the tool to better understand all the parameters and choose their optimal configuration. However, I have several questions about the index size and the short mode, since my test results are unclear.
I have used two public datasets for my tests: flnc.bam (PacBio, transcriptomic, ~4 Gb) and pb.bam (PacBio, genomic, ~12 Gb).
These are the results of my tests:

Test 1 - flnc.bam

Command (Only modifying the index size on each iteration):

longQC.py sampleqc -o /tmp/results -x pb-hifi -n 10000 -p 8 -m 2 -i 1G -t /data/input/flnc.bam

Results

Metrics table

CPUs and Memory use over time

Index Size = 1G

Index Size = 8G

Test 2 - pb.bam

Command (This time I have modified both index size and short mode):

longQC.py sampleqc -o /tmp/results -x pb-sequel -n 10000 -p 8 -m 2 -i 1G -b /data/input/pb.bam

Results

Metrics table

CPUs and Memory use over time

Index Size = 1G

Index Size = 8G

Conclusions

There is a significant variation of results across index sizes.
However, non-sense reads fraction is quite similar between different runs.
I have not found a linear correlation between index size and the execution time.
Enabling the Short Mode tends to reduce the non-sense read fraction slightly.

Based on these results, my questions are as follows:

Since bigger index sizes are much more costly in terms of memory and time, what are the advantages of selecting a big index size (e.g., 8G) compared with a smaller one (e.g., 1G)?
In your opinion, what Index size value should provide the most accurate results?
In which cases is it advisable to activate the short mode?

Thanks!
Adolfo

fastq: file does not exist

Hi, I am running LongQC with Docker:
sudo docker run -it longqc sampleqc -x pb-sequel -p $(nproc) -o /home/haley/Desktop/LongQC/testPlato/H10_60411L2/barcode05_60411L2/longqctest_fastq1 /home/haley/Desktop/LongQC/testPlato/H10_60411L2/barcode05_60411L2/EC-D-6041-1L2-D-CuP-CeP_LR_1.fastq

I tried with one fastq file but plan to loop the command over many.
I got this error: Error: input file /home/haley/Desktop/LongQC/testPlato/H10_60411L2/barcode05_60411L2/EC-D-6041-1L2-D-CuP-CeP_LR_1.fastq does not exist.

Does anyone know why longqc will not recognize the files?

Thanks!
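
A container only sees host paths that are bind-mounted into it, so a file under `/home/...` on the host does not exist inside the container unless its directory is mounted. A minimal sketch follows, assuming the image is tagged `longqc` as in the command above and that the data directory is mounted at `/data`. `build_docker_cmd` and the `/data` mount point are illustrative choices, not something LongQC prescribes, and the file names are placeholders.

```python
import os
import shlex


def build_docker_cmd(host_dir, fastq_name, out_name, preset="pb-sequel"):
    """Build a docker command that bind-mounts host_dir at /data (assumed mount point)."""
    host_dir = os.path.abspath(host_dir)
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/data",   # make the host directory visible in the container
        "longqc", "sampleqc",
        "-x", preset,
        "-o", f"/data/{out_name}",   # output path as seen inside the container
        f"/data/{fastq_name}",       # input path as seen inside the container
    ]


if __name__ == "__main__":
    # Print one command per FASTQ file; the names below are placeholders.
    for name in ["sample_1.fastq", "sample_2.fastq"]:
        out = name.replace(".fastq", "_longqc")
        print(shlex.join(build_docker_cmd("/path/to/fastq_dir", name, out)))
```

Both the input and the output path are given relative to the mount point, so results written under `/data` end up in the mounted host directory.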

pandas.errors.EmptyDataError: No columns to parse from file

Hi Yoshinori

I am getting the following error when running:

python longQC.py sampleqc -x pb-hifi -o longqc ccs.bam

I made sure that ccs.bam is a 16GB file

Thank you for the help

longQC:2021-05-07 09:18:02,824:592:INFO:Filteration finished.
longQC:2021-05-07 09:18:12,834:598:INFO:Generating coverage related plots...
Traceback (most recent call last):
File "/LongQC-1.2.0b/longQC.py", line 956, in <module>
main(args)
File "/LongQC-1.2.0b/longQC.py", line 62, in main
args.handler(args)
File "/LongQC-1.2.0b/longQC.py", line 602, in command_sample
lc = LqCoverage(cov_path, isTranscript=args.transcript, control_filtering=pb_control)
File "/LongQC-1.2.0b/lq_coverage.py", line 88, in __init__
self.df = pd.read_table(table_path, sep='\t', header=None)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers.py", line 689, in read_table
return _read(filepath_or_buffer, kwds)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers.py", line 462, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers.py", line 819, in __init__
self._engine = self._make_engine(self.engine)
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers.py", line 1050, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers.py", line 1898, in __init__
self._reader = parsers.TextReader(self.handles.handle, **kwds)
File "pandas/_libs/parsers.pyx", line 521, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
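
The traceback ends in `pd.read_table` raising `EmptyDataError`, which pandas does when the file it is given has no content, so the coverage table at `cov_path` was presumably written empty by an earlier step. A minimal diagnostic sketch using only the standard library follows. `table_has_data` is a hypothetical helper, and the path to check has to be taken from your own output directory, since the report does not show it.

```python
import os
import tempfile


def table_has_data(path):
    """Return True if path exists and contains at least one non-blank line."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return any(line.strip() for line in handle)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # An empty file stands in for an empty coverage table.
        empty = os.path.join(tmp, "empty.txt")
        open(empty, "w").close()
        print(table_has_data(empty))
```

An empty table would point at the step that writes it rather than at the plotting code, which is where the error finally surfaces.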

struct.error: 'i' format requires

Hi Yoshinori,

Thank you for this tool.

After processing some reads I get a format error. Previously you solved a problem related to the UTF-8 format, which you guessed was caused by the new PacBio Sequel encoding format. I can now start longQC.py "sampleqc", but I get the following error, and I do not know if it is related to the same problem. Any help will be appreciated. Thank you,

python /opt/LongQC/longQC.py sampleqc -x pb-sequel -o ./PacBio_QC/ chk_PacBio_/m54336U_201216_191137.subreads.bam -p 15 -m 2 -i 128
longQC:2021-01-25 16:29:47,375:169:INFO:Cmd: /opt/LongQC/longQC.py sampleqc -x pb-sequel -o ./PacBio_QC/ chk_PacBio_/m54336U_201216_191137.subreads.bam -p 15 -m 2 -i 128
longQC:2021-01-25 16:29:47,376:233:INFO:Preset "pb-sequel" was applied. Options --pb(--ont) is overwritten.
lq_utils:2021-01-25 16:29:47,383:127:DEBUG:chk_PacBio_/m54336U_201216_191137.subreads.bam is a compressed BAM.
longQC:2021-01-25 16:29:47,383:238:INFO:Temporary work file was made at ./PacBio_QC/analysis/pbbam_converted_seq_file.fastq
longQC:2021-01-25 16:31:03,240:306:INFO:Computation of the low complexity region started for a chunk 0
lq_mask:2021-01-25 16:31:17,932:111:INFO:New job was submitted: in->./PacBio_QC/analysis/tmp_0.fastq, out->./PacBio_QC/analysis/tmp_0.txt
longQC:2021-01-25 16:31:17,933:311:INFO:Adapter search is starting for a chunk 0.
longQC:2021-01-25 16:31:17,933:327:INFO:Computation of the GC fraction started for a chunk 0
lq_adapt:2021-01-25 16:32:09,463:77:INFO:2330 reads were skipped due to their short lengths.
lq_adapt:2021-01-25 16:32:09,519:90:INFO:Adapter Sequence: ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT, max identity:0.977778 and the number of trimmed reads: 67
lq_utils:2021-01-25 16:32:17,424:380:INFO:list for subsample is not initialized. Initializing now.
lq_adapt:2021-01-25 16:32:19,634:42:INFO:2333 reads were skipped due to their short lengths.
lq_adapt:2021-01-25 16:32:19,635:92:INFO:Adapter Sequence: ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT, max identity:0.956522 and the number of trimmed reads: 64
longQC:2021-01-25 16:33:17,551:332:INFO:Adapter search has done for a chunk 0.
longQC:2021-01-25 16:33:17,551:342:INFO:subsample finished for chunk 0.
longQC:2021-01-25 16:34:31,081:306:INFO:Computation of the low complexity region started for a chunk 1
lq_mask:2021-01-25 16:34:46,149:111:INFO:New job was submitted: in->./PacBio_QC/analysis/tmp_1.fastq, out->./PacBio_QC/analysis/tmp_1.txt
longQC:2021-01-25 16:34:46,149:311:INFO:Adapter search is starting for a chunk 1.
longQC:2021-01-25 16:34:46,150:327:INFO:Computation of the GC fraction started for a chunk 1
lq_adapt:2021-01-25 16:35:46,389:77:INFO:1358 reads were skipped due to their short lengths.
lq_adapt:2021-01-25 16:35:46,389:90:INFO:Adapter Sequence: ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT, max identity:0.957447 and the number of trimmed reads: 63
lq_adapt:2021-01-25 16:35:55,091:42:INFO:1363 reads were skipped due to their short lengths.
lq_adapt:2021-01-25 16:35:55,091:92:INFO:Adapter Sequence: ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT, max identity:0.957447 and the number of trimmed reads: 62
longQC:2021-01-25 16:36:40,406:332:INFO:Adapter search has done for a chunk 1.
Traceback (most recent call last):
  File "/opt/LongQC/longQC.py", line 956, in <module>
    main(args)
  File "/opt/LongQC/longQC.py", line 62, in main
    args.handler(args)
  File "/opt/LongQC/longQC.py", line 341, in command_sample
    s_reads = pool_res['subsample'].get()
  File "/home/mldbotero/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/mldbotero/anaconda3/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/home/mldbotero/anaconda3/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/mldbotero/anaconda3/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
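
The last frame, `struct.pack("!i", n)`, packs the length of the pickled message as a signed 32-bit integer, so it fails once `n` exceeds 2**31 - 1 bytes. A minimal sketch of that limit in isolation follows. `fits_in_int32_header` is a hypothetical helper, and the conclusion that the object sent to the worker here was larger than about 2 GiB is an inference from the traceback, not something the log states.

```python
import struct

# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1


def fits_in_int32_header(n_bytes):
    """Return True if n_bytes can be packed with the "!i" format used in the traceback."""
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    print(fits_in_int32_header(INT32_MAX))      # largest size that still fits
    print(fits_in_int32_header(2 * 1024**3))    # 2 GiB, one byte past the limit
```

Another traceback on this page shows `chunk_size=args.mem*1024**3`. If `-m` sets `args.mem` (an assumption), `-m 2` gives a chunk of exactly 2**31 bytes, one more than the limit, so trying `-m 1` may be worth a test.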

AttributeError: module 'pysam' has no attribute 'FastxFile'

Hello,

I tried to execute LongQC with the following command:

python LongQC/longQC.py sampleqc -o longqc -x ont-rapid -p 4 test/test.fasta

However, I got the following error:

longQC:2020-08-04 11:17:56,650:165:INFO:Cmd: LongQC/longQC.py sampleqc -o longqc -x ont-rapid -p 4 test/test.fasta
longQC:2020-08-04 11:17:56,650:215:INFO:Preset "ont-rapid" was applied. Options --pb(--ont) is overwritten.
Traceback (most recent call last):
  File "LongQC/longQC.py", line 932, in <module>
    main(args)
  File "LongQC/longQC.py", line 62, in main
    args.handler(args)
  File "LongQC/longQC.py", line 281, in command_sample
    for (reads, n_seqs, n_bases) in open_seq_chunk(args.input, file_format_code, chunk_size=args.mem*1024**3, is_upper=True):
  File "/home/jovyan/LongQC/lq_utils.py", line 65, in open_seq_chunk
    yield from parse_fastx_chunk(fn, chunk_size, is_upper=is_upper)
  File "/home/jovyan/LongQC/lq_utils.py", line 268, in parse_fastx_chunk
    with pysam.FastxFile(fn) as f:
AttributeError: module 'pysam' has no attribute 'FastxFile'

Can you please help with that?

Thank you in advance!
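
`pysam.FastxFile` is part of current pysam releases, so this error usually points at either an old pysam or a different module named `pysam` shadowing it on the import path. Both are guesses from the message alone. A minimal sketch for checking which module Python actually imports follows; `describe_module` is a hypothetical helper.

```python
import importlib


def describe_module(name, attr):
    """Report where a module was imported from and whether it exposes attr."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return {"found": False, "file": None, "version": None, "has_attr": False}
    return {
        "found": True,
        "file": getattr(module, "__file__", None),        # reveals a shadowing file or directory
        "version": getattr(module, "__version__", None),  # None if the module defines no version
        "has_attr": hasattr(module, attr),
    }


if __name__ == "__main__":
    print(describe_module("pysam", "FastxFile"))
```

If `file` points somewhere unexpected, such as a local `pysam.py` or a directory in the working tree, that module is being picked up instead of the installed package.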

Can you explain the meaning of each column in longqc_sdust.txt?

longqc_sdust.txt is one of the output files of LongQC.

For example, with a PacBio Sequel BAM as input,

Thank you.

With manual installation:

With DockerFile
