The remixt from amcpherson

How to specify hg38 in create_ref_data process?

I am going to analyze BAM files that were mapped to the hg38 reference. How can I run "create_ref_data" for hg38?
Thank you.

Default pipeline create_ref_data -> mappability_bwa does not work properly

Hi,

The following command

remixt create_ref_data $ref_data_dir

works and ends properly. However, the subsequent command

remixt mappability_bwa $ref_data_dir

fails because bwa does not find the index of the reference genome.

If the default behavior is to generate a mappability file based on bwa alignments, would it be possible to add the bwa indexing step to remixt create_ref_data?
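
Until indexing is part of create_ref_data, one possible workaround is to build the index by hand before running mappability_bwa. The sketch below checks for the five files that bwa index writes next to a FASTA (.amb, .ann, .bwt, .pac, .sa) and runs bwa index only if any are missing. The FASTA path is an assumption: this thread does not say which file inside $ref_data_dir remixt passes to bwa, so that needs to be checked in the reference data directory.

```python
import os
import subprocess

# Suffixes of the index files that `bwa index` writes next to the FASTA.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(fasta_path):
    """Return the expected bwa index files that do not exist yet."""
    return [fasta_path + suffix for suffix in BWA_INDEX_SUFFIXES
            if not os.path.exists(fasta_path + suffix)]


def ensure_bwa_index(fasta_path):
    """Run `bwa index` only when at least one index file is missing."""
    if missing_bwa_index_files(fasta_path):
        # Requires the bwa executable to be on PATH.
        subprocess.check_call(["bwa", "index", fasta_path])
```

Calling ensure_bwa_index on the genome FASTA in the reference data directory (path hypothetical) before remixt mappability_bwa should let bwa find its index, provided that is the file the pipeline aligns against.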

Migration to OpenSSL 1.1.1 in bioconda

Hi,

With the latest versions of conda and bioconda, the ReMixT pipeline fails when running samtools, with the following error:

samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory

This is due to the recent transition to OpenSSL 1.1.1 in bioconda. To solve this issue, samtools must be updated to at least the following build:

conda install samtools=1.9=*_11

Would it be possible to add this updated version of samtools to the conda distribution of remixt?

AttributeError: 'DataFrame' object has no attribute 'sort'

Hi @amcpherson ,

When trying the example code for ReMixT, we encountered the following error:

2018-05-17 12:29:25,210 - pypeliner.scheduler - INFO - job /remixt_seqdata_workflow/create_segments executing
2018-05-17 12:29:25,219 - pypeliner.scheduler - INFO - job /remixt_seqdata_workflow/create_segments -> remixt.analysis.segment.create_segments('/gpfs/commons/groups/imielinski_lab/git/mskilab/flows/modules/remixt/testing/raw_data/segments.tsv.tmp', {'ensembl_assemblies': ['chromosome.15'], 'chromosomes': ['15']}, '/gpfs/commons/groups/imielinski_lab/git/mskilab/flows/modules/remixt/testing/ref_data', breakpoint_filename='/gpfs/commons/groups/imielinski_lab/git/mskilab/flows/modules/remixt/testing/HCC1395_breakpoints.tsv')
2018-05-17 12:29:36,620 - pypeliner.scheduler - ERROR - job /remixt_seqdata_workflow/create_segments failed to complete
--- stdout ---
--- stderr ---
Traceback (most recent call last):
  File "/gpfs/commons/home/mimielinski/software/anaconda2/lib/python2.7/site-packages/pypeliner/jobs.py", line 286, in __call__
    self.ret_value = self.func(*self.callset.args, **self.callset.kwargs)
  File "/gpfs/commons/home/mimielinski/software/anaconda2/lib/python2.7/site-packages/remixt/analysis/segment.py", line 71, in create_segments
    changepoints.sort(['chromosome', 'position'], inplace=True)
  File "/gpfs/commons/home/mimielinski/software/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'sort'

I suspect this is related to the pandas version; we are using pandas 0.22.0.

Please let me know how we can resolve this. Thanks!
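
The pandas version is the likely cause: DataFrame.sort was deprecated in pandas 0.17 and removed in 0.20, and its replacement for sorting by columns is DataFrame.sort_values. The sketch below mirrors the failing call from the traceback on a small made-up DataFrame; the data is illustrative only and is not remixt's real changepoint table.

```python
import pandas as pd

# Toy stand-in for the changepoints table (illustrative data only).
changepoints = pd.DataFrame({
    "chromosome": ["15", "15", "15"],
    "position": [300, 100, 200],
})

# Old call that fails on pandas >= 0.20:
#   changepoints.sort(['chromosome', 'position'], inplace=True)
# Equivalent call that works on pandas >= 0.17:
changepoints.sort_values(["chromosome", "position"], inplace=True)

positions = changepoints["position"].tolist()
```

Under that assumption there are two ways forward: install a pandas release older than 0.20 in the remixt environment, or change the call at line 71 of remixt/analysis/segment.py to sort_values. Other places in the package may use the same removed method, so downgrading pandas is probably the less invasive option.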

Error: "Exception: No submit queue specified"

Hello,

I ran into a problem when running remixt. The command I executed was:

remixt run ref_data_dir result_remix breakpoit/bp.hg19.txt --normal_sample_id oec --normal_bam_file bam/OEC130618.rlg.bam --tumour_sample_ids lm130227 --tumour_bam_files bam/LM130227.rlg.bam --results_files lm130227.hd --tmpdir temp

The error was:

/home/yuzh/miniconda2/lib/python2.7/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
  from pandas.core import datetools
Traceback (most recent call last):
  File "/home/yuzh/miniconda2/bin/remixt", line 11, in <module>
    load_entry_point('remixt==0.5.5', 'console_scripts', 'remixt')()
  File "/home/yuzh/miniconda2/lib/python2.7/site-packages/remixt/ui/main.py", line 24, in main
    func(**args)
  File "/home/yuzh/miniconda2/lib/python2.7/site-packages/remixt/ui/run.py", line 32, in run
    pyp = pypeliner.app.Pypeline([remixt], pypeliner_config)
  File "/home/yuzh/miniconda2/lib/python2.7/site-packages/pypeliner/app.py", line 195, in __init__
    config_filename=self.config['submit_config'])
  File "/home/yuzh/miniconda2/lib/python2.7/site-packages/pypeliner/execqueue/factory.py", line 6, in create
    raise Exception('No submit queue specified')
Exception: No submit queue specified

By the way, the reference genome I used is hg19 rather than GRCh37. Will this affect the results?

I do not know how to solve this problem and would appreciate any help.

Thank you!

BAM files with chr notation

Dear authors,

I would like to test this very interesting method! I have a collection of BAM files corresponding to multiple samples from the same patient, plus a matched normal sample. Unfortunately, these BAM files name their chromosomes with the chr notation (e.g. chr1, chr2, chr3, ..., chr22), which seems to differ from what the default values in the config file assume.

Is there a simple way to run ReMixT without changing the BAM files? Is it sufficient to list the chr-prefixed names in the config file, e.g. chromosomes: ['chr1', ...]? Is there anything else that needs to change? For example, the names of the corresponding 1000G files are specified as _chr{chromosome} and should probably become _{chromosome}.

Thank you
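
The filename concern in the question can be illustrated with plain string formatting. This is a minimal sketch, assuming the config holds a list of chromosome names and filename templates with a {chromosome} placeholder, as described above. The template strings here are hypothetical stand-ins, not remixt's actual defaults.

```python
# Chromosome names as they appear in a BAM that uses "chr" notation.
chromosomes = ["chr{}".format(i) for i in range(1, 23)] + ["chrX"]

# Hypothetical filename templates; the real ones live in the remixt config.
old_template = "panel_chr{chromosome}.vcf.gz"
new_template = "panel_{chromosome}.vcf.gz"

# With chr-prefixed names the old template doubles the prefix.
doubled = old_template.format(chromosome=chromosomes[0])
# Dropping the literal "chr" from the template restores the expected name.
fixed = new_template.format(chromosome=chromosomes[0])
```

So if the chromosome list is switched to chr-prefixed names, every template that hard-codes a chr prefix would need the prefix removed. Whether other reference files (genome FASTA, mappability, SNP positions) also need matching contig names is not answered in this thread.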

Problem downloading reference data

I have installed remixt and started downloading the reference data, but the process failed with this message:

   --2023-10-11 06:02:33--  http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.vcf.gz
   Resolving ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)... 193.62.193.167
   Connecting to ftp.1000genomes.ebi.ac.uk (ftp.1000genomes.ebi.ac.uk)|193.62.193.167|:80... connected.
   HTTP request sent, awaiting response... 404 Not Found
   2023-10-11 06:05:37 ERROR 404: Not Found.

Using a web browser, I visited http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV and saw that the file

1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.vcf.gz

is not there; instead, I see a similarly-named file

1kGP_high_coverage_Illumina.chrX.filtered.SNV_INDEL_SV_phased_panel.v2.vcf.gz

How should I address this situation?

Can I restart the download process from the point where it halted without having to download the files that I have already?

Regards,
Eric Sisson
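
One way to approach both questions outside of remixt is sketched below: build the original URL plus the ".v2" variant observed on the server, and list only the files that are still absent locally. The helper names and the download directory are hypothetical, and whether remixt accepts a file fetched under the v2 name (or expects it renamed to the original name) is an assumption that would need checking against the remixt config.

```python
import os

BASE_URL = ("http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/"
            "1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV")


def candidate_urls(filename):
    """Return the original URL followed by the '.v2' variant seen on the server."""
    urls = [BASE_URL + "/" + filename]
    if filename.endswith(".vcf.gz"):
        v2_name = filename[:-len(".vcf.gz")] + ".v2.vcf.gz"
        urls.append(BASE_URL + "/" + v2_name)
    return urls


def files_still_needed(download_dir, filenames):
    """Return the filenames that are absent or empty in download_dir."""
    needed = []
    for name in filenames:
        path = os.path.join(download_dir, name)
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            needed.append(name)
    return needed
```

Note that a partially downloaded, non-empty file would be treated as present by files_still_needed, so the file that was in flight when the download halted should be deleted or verified first.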

Fail when X server is not running

Hi,

When running on a server without an active X server, the ReMixT pipeline fails because it cannot save some .pdf files.
Would it be possible to provide an optional argument that switches matplotlib and similar libraries to the Agg backend instead?

A temporary workaround is to manually set the corresponding environment variable.
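
For matplotlib, the variable in question is MPLBACKEND, which recent matplotlib versions read at import time. The sketch below builds an environment with MPLBACKEND=Agg and launches a command under it. The helper names are hypothetical, and the approach assumes remixt does not select a backend explicitly in its own code, since an explicit matplotlib.use(...) call would take precedence over the variable.

```python
import os
import subprocess


def headless_env(base_env=None):
    """Return a copy of the environment with matplotlib forced to the Agg backend."""
    env = dict(os.environ if base_env is None else base_env)
    env["MPLBACKEND"] = "Agg"
    return env


def run_headless(command):
    """Run a command (list of arguments) with MPLBACKEND=Agg set."""
    return subprocess.call(command, env=headless_env())
```

For example, run_headless(["remixt", "run", ...]) with the usual arguments would start the pipeline with the non-interactive backend. Exporting MPLBACKEND=Agg in the shell before calling remixt has the same effect.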
