h3abionet / h3agwas
GWAS Pipeline for H3Africa
License: Other
Hello,
Thank you for the great effort putting different aspects of GWAS analysis into handy pipelines. I'm trying out the plink-qc pipeline with the slurmSingularity profile. The core bioinformatics processes work as one would hope, but the pipeline fails at all the steps of generating plots. Below is an error report from such a run that fails on the generateSnpMissingnessPlot process:
Command error:
Traceback (most recent call last):
File ".command.sh", line 44, in <module>
plt.savefig(args.output)
File "/home/p287664/.local/lib/python3.6/site-packages/matplotlib/pyplot.py", line 716, in savefig
res = fig.savefig(*args, **kwargs)
File "/home/p287664/.local/lib/python3.6/site-packages/matplotlib/figure.py", line 2180, in savefig
self.canvas.print_figure(fname, **kwargs)
File "/home/p287664/.local/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 2014, in print_figure
canvas = self._get_output_canvas(format)
File "/home/p287664/.local/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 1950, in _get_output_canvas
canvas_class = get_registered_canvas_class(fmt)
File "/home/p287664/.local/lib/python3.6/site-packages/matplotlib/backend_bases.py", line 125, in get_registered_canvas_class
backend_class = importlib.import_module(backend_class).FigureCanvas
File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/p287664/.local/lib/python3.6/site-packages/matplotlib/backends/backend_pdf.py", line 42, in <module>
from matplotlib import _png
ImportError: Error relocating /home/p287664/.local/lib/python3.6/site-packages/matplotlib/.libs/libz-a147dcb0.so.1.2.3: __fprintf_chk: symbol not found
For completeness: for now, I'm using the sample data in this repo, with nextflow version 19.04.1.
I'm also attaching the complete nextflow.log.
Thank you,
Azza
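The traceback above shows matplotlib being imported from /home/p287664/.local/... rather than from inside the container. This suggests (an assumption, not a confirmed diagnosis) that the host's per-user site-packages directory is visible inside the Singularity image. A minimal sketch for checking where a module was loaded from; the helper name loaded_from_user_site is hypothetical:

```python
import os
import site


def loaded_from_user_site(module_file, user_site=None):
    """Return True if module_file lives under the per-user site-packages dir."""
    if user_site is None:
        user_site = site.getusersitepackages()
    module_file = os.path.abspath(module_file)
    user_site = os.path.abspath(user_site)
    # commonpath equals user_site only when module_file is inside it
    return os.path.commonpath([module_file, user_site]) == user_site
```

Calling it with matplotlib.__file__ inside the container would show whether the user directory is shadowing the image's own packages. If it is, setting the environment variable PYTHONNOUSERSITE=1 makes Python skip the user site directory.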
Hi,
Sorry for making another issue
When I try running the association pipeline with Gemma using the following command:
nextflow run assoc.nf --input_dir sample/ --input_pat sampleA --data sample/sample.phe --pheno PHE --covariates SEX --gemma 1 --gemma_num_cores 4 -profile docker
I get the following error:
ERROR ~ Error executing process > 'getGemmaRel (1)'
Caused by:
Process `getGemmaRel (1)` terminated with an error exit status (127)
Command executed:
export OPENBLAS_NUM_THREADS=4
cat sampleA.fam |awk '{print $1" "$2" "0.2}' > pheno
gemma -bfile sampleA -gk 1 -o sampleA -p pheno -n 3
Command exit status:
127
Command output:
(empty)
Command error:
.command.sh: line 4: gemma: command not found
I think this is because the getGemmaRel process is using the default docker container (quay.io/h3abionet_org/py3plink) rather than the gemmaImage, so the gemma command cannot be found.
However, if I try changing the container to the gemmaImage I get a segmentation fault:
Caused by:
Process `getGemmaRel (1)` terminated with an error exit status (139)
Command executed:
export OPENBLAS_NUM_THREADS=4
cat sampleA.fam |awk '{print $1" "$2" "0.2}' > pheno
gemma -bfile sampleA -gk 1 -o sampleA -p pheno -n 3
Command exit status:
139
Command output:
Reading Files ...
Command error:
.command.sh: line 4: 34 Segmentation fault gemma -bfile sampleA -gk 1 -o sampleA -p pheno -n 3
Any ideas what's causing the issue? Any help would be much appreciated.
Many thanks in advance,
Phil
Hi,
I am trying to run the association analysis on sample files using plink-assoc.nf. The analysis completed successfully. However, in the report that is generated, the PCA plot (of SNPs) seems to be missing.
For example: nextflow run assoc.nf --input_pat raw-GWA-data --chi2 1 --logistic 1 --adjust 1
Can you please let me know how we can include the PCA plot in the report?
Although the 'print_pca' option is set to 1, the PCA plot is generated separately and is not included in the final analysis report.
pairA and pairB for curr may require str formatting to work.
h3agwas/templates/batchReport.py
Lines 339 to 340 in 525272b
Suggest replacing with: str(pairA) & str(pairB)
sudo ./nextflow run h3abionet/h3agwas/assoc/main.nf
Docker, Singularity and Nextflow are all installed.
Error on running code:
N E X T F L O W ~ version 22.10.0
Launching https://github.com/h3abionet/h3agwas
[grave_stonebraker] DSL1 - revision: c8a7881 [master]
Unknown parameter : Check parameter <shared-storage-mount=/mnt/shared>
No phenotype given -- set params.pheno
I tried to use a mask file to exclude some individuals in the call2plink workflow, but the removal did not seem to work, and there were several other issues when using a mask file.
I supplied the following one-column mask file to exclude these individuals, with mask_type=sample-label:
G4968
G4969
G4970
G4971
G4972
sheet2fam.py reported an error at line 130 because the mask file doesn't have a second column:
indivs[(data[0],data[1])]=data[-1]
so I modified the code to:
indivs[(data[0],data[0])]=data[-1]
The plink log file shows that a --remove option was added to the process, but no individuals were removed. I have no idea what causes this bug.
PLINK v1.90b7 64-bit (16 Jan 2023)
Options in effect:
--a2-allele emptyZ0ref.txt
--bed raw.bed
--bim raw.bim
--fam raw.fam
--flip flips.lst
--make-bed
--out plink_out
--remove mask.inds
4819 people (0 males, 0 females, 4819 ambiguous) loaded from .fam.
--remove: 4819 people remaining.
The fixfam process marked the masked individuals with 'MSK', but why are they not removed?
G4969_MSK__ G4969 0 0 1 0
G4970_MSK__ G4970 0 0 1 0
G4971_MSK__ G4971 0 0 2 0
G4972_MSK__ G4972 0 0 2 0
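For the one-column mask file described above, a tolerant reader could fall back to using the single value as both family and individual ID. This is a minimal sketch, not the actual sheet2fam.py code: the function name read_mask is hypothetical, and treating FID as equal to IID for one-column lines is an assumption.

```python
def read_mask(lines):
    """Parse mask-file lines into a dict keyed by (FID, IID).

    A one-column line is treated as FID == IID (an assumption); the last
    field on each line is kept as the stored value, mirroring data[-1].
    """
    indivs = {}
    for line in lines:
        data = line.split()
        if not data:
            continue  # skip blank lines
        fid = data[0]
        iid = data[1] if len(data) > 1 else data[0]
        indivs[(fid, iid)] = data[-1]
    return indivs
```

Whether plink's --remove then matches anything still depends on the IDs in mask.inds agreeing with the FID and IID columns of the .fam file.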
When using AWS it is possible to launch the ami-9ca7b2fa AMI, but any attempts to copy produce the error message: "You do not have permission to access the storage of this ami".
I suspect that although the AMI itself is public, the underlying snapshot is not, which prevents copying the snapshot to (1) a different region or (2) an encrypted snapshot in order to enable root volume encryption.
Could this be enabled, or is there a reason that the underlying storage should not be publicly accessible?
We have some Groovy code that checks e.g. that the phenotype file is good. This requires that the data is local -- but some users want it to be in S3
I'm running the pipeline on an Ubuntu 14.04 machine, with Python 3.4 (locally installed dependencies). The behavior with Python seems odd though. Kindly see below:
Line 92 in templates/drawPCA.py produces the error
Command error:
File ".command.sh", line 91
draw(*evecs,labels,the_colours)
^
SyntaxError: only named arguments may follow *expression
A dirty fix is to change that line so that it reads:
draw(*(evecs+(labels,the_colours)))
With that fix, the pipeline continues until a similar problem occurs in the script batchReport.py, line 503.
Command error:
output = TAB.join(map(str, [*i,bfrm.loc[i][args.batch_col],"%5.3f"%row['F_MISS'],pfrm.loc[i]]))+ TAB+TAB.join(map(xstr,sxAnalysis.loc[i]))+EOL
^
SyntaxError: can use starred expression only as assignment target
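Both syntax errors above involve the extended unpacking forms that PEP 448 added in Python 3.5. A minimal sketch of equivalent spellings that also parse on Python 3.4; the draw function and all values below are hypothetical stand-ins, not the pipeline's real data:

```python
def draw(pc1, pc2, labels, colours):
    """Hypothetical stand-in for the plotting function; returns its arguments."""
    return (pc1, pc2, labels, colours)


evecs = ([0.1, 0.2], [0.3, 0.4])   # stand-in for the eigenvector pair
labels = ["PC1", "PC2"]
the_colours = ["red", "blue"]

# Python >= 3.5 only: draw(*evecs, labels, the_colours)
# Portable form: build one tuple first, then unpack it.
result = draw(*(tuple(evecs) + (labels, the_colours)))

# Python >= 3.5 only: [*i, extra1, extra2]
# Portable form: concatenate lists explicitly.
i = ("FID1", "IID1")
row = list(i) + ["batch1", "%5.3f" % 0.01234]
```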
Reading around, the advice is to solve this problem by using Python 3.5, but in the latter case the error reads:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'all'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/users_current/amelbadawi/training_data/h3agwas/work/36/ebc4dbc425b2eb3db495983cda14a5/.command.sh", line 588, in <module>
text = text + detailedSexAnalysis(pfrm,ifrm,args.sx_pickle,args.pheno_col,bfrm)
File "/home/users_current/amelbadawi/training_data/h3agwas/work/36/ebc4dbc425b2eb3db495983cda14a5/.command.sh", line 517, in detailedSexAnalysis
dumpMissingSexTable(sex_fname,ifrm,sxAnalysis,pfrm[pheno_col],bfrm)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'all'
Would you kindly advise?
Thank you,
Alyaa
The name aux is reserved in Windows: https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file#naming-conventions
Therefore, it isn't possible to clone the repository to a Windows system.
Not that anyone is going to use this workflow on Windows, but this also prevents me from making edits on my local machine.
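A quick way to spot such names before they reach the repository is to scan tracked paths for Windows reserved device names. This is a minimal sketch; the helper name reserved_components is hypothetical, and the reserved list follows the Microsoft naming conventions page linked above.

```python
# Reserved device names on Windows (case-insensitive, with or without extension).
RESERVED = {"CON", "PRN", "AUX", "NUL"}
RESERVED |= {"COM%d" % n for n in range(1, 10)}
RESERVED |= {"LPT%d" % n for n in range(1, 10)}


def reserved_components(path):
    """Return the components of a '/'-separated path that Windows reserves."""
    bad = []
    for part in path.split("/"):
        stem = part.split(".")[0].upper()
        if stem in RESERVED:
            bad.append(part)
    return bad
```

Feeding it each line of `git ls-files` output would list every offending path component.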
Hi,
I would like to perform fine mapping analysis on a Coronary Artery Disease (CAD) dataset (GWAS file available here: https://data.mendeley.com/datasets/gbbsrpx6bs/1). Can I use your pipeline with this data? Do I need to use plink format? Can you give me some clues about the command line to use?
Thanks for your help
Hi,
If I try running the following command:
nextflow run assoc.nf --input_dir sample --input_pat sampleA --data sample/sample.phe --pheno PHE --covariates SEX -profile docker
I get the following error:
ERROR ~ Error executing process > 'doReport'
Caused by:
Missing output file(s) `out-report.pdf` expected by process `doReport`
Command executed [/h3agwas/templates/make_assoc_report.py]:
..........
Command exit status:
0
Command output:
(empty)
Command error:
sh: pdflatex: not found
chown: unrecognized option: from
BusyBox v1.27.2 (2017-12-12 10:41:50 GMT) multi-call binary.
Usage: chown [-RhLHPcvf]... USER[:[GRP]] FILE...
Change the owner and/or group of each FILE to USER and/or GRP
-R Recurse
-h Affect symlinks instead of symlink targets
-L Traverse all symlinks to directories
-H Traverse symlinks on command line only
-P Don't traverse symlinks (default)
-c List changed files
-v List all files
-f Hide errors
The following files are in the work dir:
cleaned-pca.pdf input.1 out-report.tex
So it looks like the doReport process failed because the output PDF could not be made (or rather converted from the LaTeX report). The problem seems to be with the pdflatex command in the make_assoc_report.py script, which I guess then calls chown incorrectly?
Perhaps the problem is that pdflatex is not installed in the quay.io/h3abionet_org/py3plink docker container?
I think this is the case because if I enter the docker container interactively & type pdflatex I get bash: pdflatex: command not found
Do you think this might be the case? Or could something else be causing the issue?
Thanks again & sorry for continuing to open issues
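The log above points to two separate problems: pdflatex is not on the PATH, and the BusyBox chown in the image does not accept a "from" option. A report script could check for its external tools up front and fail with a clearer message. This is a minimal sketch; the helper name missing_tools is hypothetical and is not part of make_assoc_report.py.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the subset of tools that cannot be found on PATH."""
    return [t for t in tools if which(t) is None]
```

For example, missing_tools(["pdflatex"]) returning a non-empty list would be a reason to stop before trying to compile out-report.tex.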
The documentation on the parameters needed for running call2plink is excellent. What would make it even better is if we had some "fake" data to give users an idea of the structure of the inputs they need to provide.
If there are no non-missing SNPs, the reports are odd because there are extensive graphs etc talking about this. This needs cleaning up.
I am not sure if this is the right place to ask, but I was wondering if there is a specific Illumina cluster file for the H3Africa chip.
I found the manifest (.bpm and .csv) files. I also found the human reference suitable for the H3Africa chip from http://www.bioinf.wits.ac.za/data/h3agwas/
However, I cannot seem to find any cluster file suitable for the H3Africa chip.
Thanks in advance for assistance.
Esoh
Hi there,
I'm trying to run this pipeline but I'm getting some errors:
./nextflow run qc.nf -profile docker
Error executing process > 'findRelatedIndiv (1)'
Caused by:
Process `findRelatedIndiv (1)` terminated with an error exit status (1)
Command executed [/netapp/raony/gwas/src/h3agwas/templates/removeRelInds.py]:
Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2890, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 128, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 91, in pandas._libs.index.Int64Engine._check_type
KeyError: '1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pandas/_libs/index.pyx", line 672, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
File "/usr/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2892, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 128, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 91, in pandas._libs.index.Int64Engine._check_type
KeyError: '1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 2815, in get_loc_level
return (self._engine.get_loc(key), None)
File "pandas/_libs/index.pyx", line 675, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
KeyError: ('1', '1')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".command.sh", line 46, in <module>
rel, deg = getDegrees(remove)
File ".command.sh", line 33, in getDegrees
deg[x]=deg[x]+1
File "/usr/lib/python3.6/site-packages/pandas/core/series.py", line 1106, in __getitem__
return self._get_with(key)
File "/usr/lib/python3.6/site-packages/pandas/core/series.py", line 1120, in _get_with
return self._get_values_tuple(key)
File "/usr/lib/python3.6/site-packages/pandas/core/series.py", line 1168, in _get_values_tuple
indexer, new_index = self.index.get_loc_level(key)
File "/usr/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 2817, in get_loc_level
raise KeyError(key) from e
KeyError: ('1', '1')
Any ideas what could be wrong?
Hi,
Is there any way to use the plink-assoc.nf (or rather assoc.nf) pipeline with continuous phenotypic data or phenotypic data with more factors than just case and control?
As I understand it currently the pipeline accepts a phenotype file eg sample.phe which looks like the following:
FID IID PAT MAT SEX PHE
1 1 0 0 1 2
2 2 0 0 1 1
3 3 0 0 1 2
4 4 0 0 1 2
5 5 0 0 2 2
Here the phenotype PHE is encoded as 1 = control & 2 = case.
Am I able to use a phenotype with more than two values, e.g. 1, 2, 3, 4, etc.?
Or can I use a phenotype with continuous values, e.g. 1.23, 12.32, 7.32, etc.?
If it's not currently possible, do you have any idea how either this pipeline could be adapted or any other tools which may allow this to be done?
Many thanks in advance.
Any help would be much appreciated
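As background to the question above: PLINK itself offers --linear for quantitative traits alongside --logistic for case/control traits. Whether assoc.nf exposes this is not stated here. A minimal sketch of a pre-check that decides which kind a phenotype column is; the function name phenotype_kind and the missing-value codes are assumptions:

```python
def phenotype_kind(values, missing=("-9", "NA")):
    """Classify a phenotype column as 'binary' or 'quantitative'.

    Assumption: binary phenotypes are coded 1 (control) / 2 (case), with 0
    or a missing code for unknown, as in the sample.phe excerpt above.
    """
    observed = {v for v in values if v not in missing}
    if observed <= {"0", "1", "2"}:
        return "binary"
    for v in observed:
        float(v)  # raises ValueError on non-numeric entries
    return "quantitative"
```

Note that a multi-level code such as 1, 2, 3, 4 would be treated as a number on a scale, which is only meaningful if the categories are ordered.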
Hi,
I am trying to run the qc.nf script. It works with the sample data; however, when I try to run it with my own data it fails during the getDuplicateMarkers process (see nextflow.log).
I think the problem comes from this line in dups.py, as the error message is (edited for brevity):
Caused by:
Process `getDuplicateMarkers (1)` terminated with an error exit status (1)
..........
Command error:
Traceback (most recent call last):
File ".command.sh", line 71, in <module>
removeOnBP(sys.argv[1],out)
File ".command.sh", line 43, in removeOnBP
elif (chrom, bp) > (old_chrom, old_bp):
TypeError: '>' not supported between instances of 'str' and 'int'
I have tried printing the value of the chromosomes that cause a problem (see print_chr_nextflow.log) & it seems to be due to the chromosome being chr1 (it also fails with chr X). The problem occurs because this is not an int and Python cannot make the comparison.
Any ideas on the best way to fix this?
(It's also worth noting that I am running my fork of the pipeline which I have made changes to but I think this is also a problem for this pipeline)
Many thanks in advance,
Phil
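One possible way to make the (chrom, bp) comparison above safe is to map every chromosome label to a key that always compares cleanly. This is a minimal sketch, not the actual dups.py code; the helper name chrom_key is hypothetical.

```python
def chrom_key(chrom):
    """Map a chromosome label to a (rank, name) tuple that sorts consistently.

    Numeric labels ("1", "chr1", 1) sort numerically first; everything else
    ("X", "chrX", "MT", ...) sorts afterwards by name.
    """
    name = str(chrom)
    if name.lower().startswith("chr"):
        name = name[3:]
    if name.isdigit():
        return (int(name), "")
    return (float("inf"), name)
```

The comparison would then read (chrom_key(chrom), bp) > (chrom_key(old_chrom), old_bp), which never mixes str and int.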
If sex information isn't available, the workflow crashes.
The underlying issue is that Batch demands that file channels must contain files. You can't output a value onto a channel and interpret it as a file. We do this in a few places to deal with optional input.
Having just used the information in the Wiki to get things set up last night, I suggest converting some, if not all, of the information in the Wiki to one or more doc files within a dedicated directory in the repo. IMO, doing this would have several benefits, including:
I would be willing to create a first draft of this over the next week or so if you're interested and barring any unforeseen demands on my time from my employer.
Having just used the information in the Wiki last night for installation, I found what I would consider installation information was actually spread out over several different sections of the Wiki. This meant that instead of going to one section and finding what I needed, I read the installation section, didn't find what I needed there, and would have given up if I didn't have such a strong desire to use the project that I read the entire Wiki over several times. I understand projects of any size usually have a hard time getting people to write documentation, since most people prefer to just write code, and I also understand that small projects also have a huge challenge finding the resources to create and maintain good documentation. If you would like, I would be willing to at least see what I could do to improve and consolidate the documentation regarding installation and setup.
Hello, I'm trying to run plink-qc.nf on my Linux HPC cluster with:
nextflow run -c my_nf.config /Users/mchiment/.nextflow/assets/h3abionet/h3agwas/plink-qc.nf -profile sgeSingularity --samplesheet "0" --idatpat ""
Where "sgeSingularity" is:
sgeSingularity {
process.executor = "sge"
singularity.autoMounts = true
singularity.enabled = true
process.queue = queue
}
I keep getting errors like this:
N E X T F L O W ~ version 19.01.0
Launching /Users/mchiment/.nextflow/assets/h3abionet/h3agwas/plink-qc.nf
[agitated_lavoisier] - revision: 418f49aacb
Check idatpat=true ************** is it a valid parameter -- are you using one rather than two - signs or vice-versa
ERROR ~ Argument of `file` function cannot be empty
-- Check script 'plink-qc.nf' at line: 237 or see '.nextflow.log' file for more details
Line 237:
sample_sheet_ch = file(params.samplesheet)
Why is it checking for a samplesheet when the directions say we only need a BIM/BED/FAM file as input for this pipeline?
Similarly, if the pipeline also requires a "sample.phe" file, you should put that in the instructions. I had to create this by hand.
Need to improve the way in which we handle (a) problems in the samplesheet and (b) duplicates (e.g. for qc)
While running nextflow run qc -profile docker in a Mac terminal, I am having the following problem:
raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'float'
N E X T F L O W ~ version 22.10.6
Launching qc/main.nf
[cranky_brahmagupta] DSL1 - revision: 7d401873bd
Downloading plugin [email protected]
The batch file is 0
Sexinfo available command
WARN: The `echo` directive has been deprecated - use to `debug` instead
0
executor > local (13)
[3e/4a0c85] process > inMD5 (1) [100%] 1 of 1 ✔
[2d/169242] process > noSampleSheet [100%] 1 of 1 ✔
[41/1da057] process > getDuplicateMarkers (1) [100%] 1 of 1 ✔
[42/26b684] process > removeDuplicateSNPs (1) [100%] 1 of 1 ✔
[29/8db8f9] process > getX (1) [100%] 1 of 1 ✔
[- ] process > analyseX (1) -
[1f/38fd77] process > identifyIndivDiscSexinfo (1) [100%] 1 of 1 ✔
[- ] process > generateSnpMissingnessPlot (1) -
[07/c7f4df] process > generateIndivMissingnessPlot (1) [100%] 1 of 1, failed: 1 ✘
[4f/31c923] process > getInitMAF (1) [100%] 1 of 1 ✔
[- ] process > showInitMAF (1) -
[- ] process > showHWEStats (1) -
[- ] process > removeQCPhase1 (1) -
[- ] process > compPCA -
[- ] process > drawPCA -
[- ] process > pruneForIBD -
[- ] process > findRelatedIndiv -
[- ] process > calculateSampleHeterozygosity -
[- ] process > generateMissHetPlot -
[- ] process > getBadIndivsMissingHet -
[- ] process > removeQCIndivs -
[- ] process > calculateSnpSkewStatus -
[- ] process > generateDifferentialMissingnessPlot -
[- ] process > findSnpExtremeDifferentialMissingness -
[- ] process > removeSkewSnps -
[- ] process > calculateMaf -
[- ] process > generateMafPlot -
[- ] process > findHWEofSNPs -
[- ] process > generateHwePlot -
[- ] process > outMD5 -
[- ] process > batchProc -
[- ] process > produceReports -
Error executing process > 'generateIndivMissingnessPlot (1)'
Caused by:
Process `generateIndivMissingnessPlot (1)` terminated with an error exit status (1)
Command executed [/Users/devina/h3agwas/qc/templates/missPlot.py]:
#!/usr/bin/env python3
#Load SNP frequency file and generate histogram
import pandas as pd
import numpy as np
import sys
from matplotlib import use
use('Agg')
import argparse
import matplotlib
import matplotlib.pyplot as plt
import sys
def parseArguments():
    if len(sys.argv)<=1:
        sys.argv="snpmissPlot.py sampleA-nd.imiss samples sampleA-nd-indmiss_plot.pdf".split()
    parser=argparse.ArgumentParser()
    parser.add_argument('input', type=str, metavar='input'),
    parser.add_argument('label', type=str, metavar='label'),
    parser.add_argument('output', type=str, metavar='output'),
    args = parser.parse_args()
    return args
args = parseArguments()
data = pd.read_csv(args.input,delim_whitespace=True)
fig = plt.figure(figsize=(17,14))
fig,ax = plt.subplots()
matplotlib.rcParams['ytick.labelsize']=13
matplotlib.rcParams['xtick.labelsize']=13
miss = data["F_MISS"]
big = min(miss.mean()+2*miss.std(),miss.nlargest(4).iloc[3])
interesting = miss[miss<big]
if len(interesting)>0.9 * len(miss):
    miss = interesting
miss = np.sort(miss)
n = np.arange(1,len(miss)+1) / np.float(len(miss))
ax.step(miss,n)
ax.set_xlabel("Missingness",fontsize=14)
ax.set_ylabel("Proportion of %s"%args.label,fontsize=14)
ax.set_title("Cumulative prop. of %s with given missingness"%args.label,fontsize=16)
fig.tight_layout()
plt.savefig(args.output)
Command exit status:
1
Command output:
(empty)
Command error:
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-afe5mj8_ because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Traceback (most recent call last):
File ".command.sh", line 38, in <module>
n = np.arange(1,len(miss)+1) / np.float(len(miss))
File "/usr/local/lib/python3.8/dist-packages/numpy/__init__.py", line 284, in __getattr__
raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'float'
Work dir:
/Users/devina/h3agwas/work/07/c7f4dfd568806c00cebe0c50ebf0d7
Tip: you can replicate the issue by changing to the process work dir and entering
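The np.float alias used in the failing line was deprecated in NumPy 1.20 and removed in NumPy 1.24; the built-in float is a drop-in replacement. A minimal sketch of the cumulative-proportion step with that one change. The function name cumulative_proportion is hypothetical and this is not the actual missPlot.py patch.

```python
import numpy as np


def cumulative_proportion(miss):
    """Return sorted missingness values and their cumulative proportions."""
    miss = np.sort(np.asarray(miss, dtype=float))
    # Built-in float replaces the removed np.float alias.
    n = np.arange(1, len(miss) + 1) / float(len(miss))
    return miss, n
```

Pinning NumPy below 1.24 in the container would be an alternative workaround that leaves the script untouched.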
Suggest labelling the 'time' and 'memory' params in the 'illumina2lgen' process in order to accommodate the processing of studies with larger sample sizes.
The sheet2fam.py script fails if there are characters with accents in them. Having these is probably a bad idea, but who are we to judge -- should be protected against.
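One possible guard, assuming that ASCII-only identifiers are acceptable downstream, is to strip accents from names before they are written out. This is a minimal sketch; the helper name strip_accents is hypothetical and the exact failure in sheet2fam.py is not shown here.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accent marks removed (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

If the failure is instead a decoding error while reading the sheet, opening the file with an explicit encoding such as encoding="utf-8" may be the more direct fix.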
Detect and report number of autosomal SNPs
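A minimal sketch of how such a count could be taken from a PLINK .bim file, assuming human data where autosomes are coded 1 to 22; the function name count_autosomal is hypothetical:

```python
def count_autosomal(bim_lines):
    """Count .bim records whose chromosome code is 1-22.

    Assumes the first whitespace-separated field is the chromosome, as in
    PLINK .bim files, optionally prefixed with 'chr'.
    """
    count = 0
    for line in bim_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom = fields[0]
        if chrom.lower().startswith("chr"):
            chrom = chrom[3:]
        if chrom.isdigit() and 1 <= int(chrom) <= 22:
            count += 1
    return count
```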
Dear Prof Scott
Thank you for making GWAS so simple with this h3agwas pipeline. I am using the pipeline in parallel with local gwas tools like Plink, QCTOOL, IMPUTE2 etc to compare my results.
I just recently installed the pipeline on my account on the DELGEME bio server since it requires no root privileges and will run faster there.
However, it produced some errors (below) when I tried running the tutorial.
nextflow run plink-qc.nf --input_pat sampleA --input_dir sample
N E X T F L O W ~ version 18.10.1
Launching plink-qc.nf
[cranky_perlman] - revision: f986962b17
Sexinfo available command
[warm up] executor > local
[8a/01c9f4] Submitted process > inMD5 (1)
[40/3c2c85] Submitted process > getDuplicateMarkers (1)
[58/52ead8] Submitted process > removeDuplicateSNPs (1)
ERROR ~ Error executing process > 'removeDuplicateSNPs (1)'
Caused by:
Process removeDuplicateSNPs (1)
terminated with an error exit status (1)
Command executed:
plink --keep-allele-order --bfile sampleA --must-have-sex --exclude sampleA.dups --missing --make-bed --out sampleA-nd
wc -l sampleA.bim > sampleA.orig
wc -l sampleA.fam >> sampleA.orig
Command exit status:
1
Command output:
(empty)
Command error:
plink: unknown option "--keep-allele-order"
plink: unknown option "--bfile"
plink: unknown option "--must-have-sex"
plink: unknown option "--exclude"
I looked at the plink-qc.nf file and noticed that plink is invoked simply as plink, which is the same name as the connectivity tool:
Plink: command-line connection utility
Release 0.67
Usage: plink [options] [user@]host [command]
("host" can also be a PuTTY saved session name)
I am not sure how significant the error is but I just thought it is worth pointing out since some users managing the pipeline with Git might potentially encounter this.
We have plink on the server, but the executable was renamed to plink1 to prevent the conflict. We also have plink1.9, whose executable is named plink1.9.
One fix for the conflict that I thought of was to manually change plink to plink1.9 in all the executables.
However, would it be possible to add the program name as a parameter in the .config file, so that users can specify which version they want to use, and then reference it as a variable in the .nf and .sh executables? This might also help the pipeline support plink2, which you raised as an issue, and other versions.
Please pardon the typos, I am just a novice :)
Thanks,
Esoh
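The idea of a configurable executable name can be illustrated outside Nextflow. This is a minimal sketch that builds the command from the removeDuplicateSNPs log above with the binary name as a parameter; the function name plink_command and the plink_bin parameter are hypothetical, not existing pipeline options:

```python
def plink_command(bfile, out, exclude=None, plink_bin="plink"):
    """Build a PLINK argument list with a configurable executable name."""
    cmd = [plink_bin, "--keep-allele-order", "--bfile", bfile]
    if exclude is not None:
        cmd += ["--exclude", exclude]
    cmd += ["--make-bed", "--out", out]
    return cmd
```

On the server described above, passing plink_bin="plink1.9" would avoid picking up the PuTTY utility of the same name.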
Plink 2 has different naming conventions for output files. It would be good to have code that supports both plink 1.9 and 2.
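One way to support both versions is to keep the version-specific output suffixes in a single lookup table. This is a minimal sketch; the table below covers only a few outputs, its keys are hypothetical names, and the suffixes should be checked against the PLINK documentation for the versions actually in use.

```python
# Assumed suffix mapping between PLINK 1.9 and PLINK 2 outputs.
OUTPUT_SUFFIXES = {
    "sample_missing": {"1.9": ".imiss", "2": ".smiss"},
    "variant_missing": {"1.9": ".lmiss", "2": ".vmiss"},
    "allele_freq": {"1.9": ".frq", "2": ".afreq"},
    "hardy_weinberg": {"1.9": ".hwe", "2": ".hardy"},
}


def output_name(prefix, kind, version):
    """Return the expected output file name for a given PLINK version."""
    return prefix + OUTPUT_SUFFIXES[kind][version]
```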
Quick question about the intended behavior of templates/removeRelInds.py.
The line if x in remove or y in remove : pass doesn't actually do anything, since pass is a null operation in Python, so it is as if the if didn't happen at all.
Shouldn't the line actually be if x in remove or y in remove : continue if the true intention is to return control to the beginning of the loop?
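The difference can be shown with a small stand-alone loop. This is an illustration only, with a hypothetical function name and made-up pairs, not the code from removeRelInds.py:

```python
def count_pairs(pairs, remove, skip_removed):
    """Count pairs processed by the loop body.

    skip_removed=False mimics `pass` (nothing is skipped);
    skip_removed=True mimics `continue` (pairs touching `remove` are skipped).
    """
    processed = 0
    for x, y in pairs:
        if x in remove or y in remove:
            if skip_removed:
                continue
            pass  # no-op: execution falls through to the loop body
        processed += 1
    return processed
```

With three pairs and one removed individual, the `pass` variant still processes all three pairs, while the `continue` variant processes only the two that do not involve that individual.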
Nice to have
The batchProc.py is probably the worst code I have written in 25 years. Needs to be redone from scratch.
Hi, I am trying to run the association pipeline with the test data included in the sample directory.
I am familiar with nextflow and docker but less familiar with GWAS/plink etc.
I have run the following command:
nextflow run plink-assoc.nf --input_dir sample --input_pat sampleA --chi2 1 --logistic 1 --adjust 1 --data sample/sample.phe --pheno "SEX,PHE" --covariates "PAT,MAT" -profile docker
It ran to completion producing a very nice report:
out-report.pdf
However, I am not completely sure I understand exactly what the pipeline is doing.
For example, how are the pheno & covariates parameters used?
The documentation says the following:
pheno: a comma-separated list of phenotypes that you want to test. Each phenotype is tested separately.
covariates: a comma-separated list of phenotypes that you want to use
So is each of the pheno parameters tested against the covariates parameters to see if they vary together at particular genetic regions?
Also, what are the observed and expected p-values in the output report for? Are they for each trait/phenotype used in the covariates, showing how strongly they correlate with particular genomic regions, for example? And why are some of the plots empty?
Great pipeline though and many thanks in advance. Any help would be much appreciated
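On the observed versus expected p-values: assuming the report's plot is a standard QQ plot (an assumption, since the report itself is not shown here), the expected values are what sorted p-values would look like if no SNP were associated with the trait. A minimal sketch of that calculation, using i / (n + 1) as one common convention for the expected quantiles; the function name qq_points is hypothetical:

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs for a QQ plot.

    Under the null hypothesis p-values are uniform on (0, 1), so the i-th
    smallest of n p-values is expected to be about i / (n + 1).
    """
    observed = sorted(pvalues)
    n = len(observed)
    points = []
    for i, p in enumerate(observed, start=1):
        expected = i / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points
```

Points that rise above the diagonal at the small-p end indicate SNPs with stronger association than chance alone would produce.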