atulkakrana / phasis

Suite for discovering, comparing, and annotating phased clusters and for identifying their miRNA triggers; uses a MapReduce model [In review]

License: Other

Python 87.01% Shell 0.76% Perl 12.22%

phasis's Introduction

👋 Hi, I’m @atulkakrana
👀 I’m interested in precision medicine, RWE, and applying computer science to improve human health
💞️ I would love to collaborate on technologies that help improve human health or health outcomes
📫 You can reach me at [email protected]

phasis's People

Stargazers

Watchers

Forkers

unix0000 alexharkess asparagusgenome swuklu blakemeyerslab ucdavis hj1994412 thalescherubino

phasis's Issues

Indexing the genome

Hi Atul,

In response to your email from yesterday: it's weird that it works on your computer. Yes, I used ‘python3’ and bowtie ‘v1’.

To troubleshoot, I tried a few things:
(1) I built the index myself and set its path in the ‘phaser.set’ file. It did not work.
(2) I used only the 7 chromosomes, without the unassembled scaffolds and contigs. It did not work.
(3) I downloaded the genome a second time and tried again. It did not work.

See the log information below:

mbpdesebastien:phasiRNA sebel76$ python3 phaser

Verifying User Authorization

Hello 'sebel76' - Please report issues at: https://github.com/atulkakrana/phasTER/issues

Fn: checkLibs

--Python v3.0 or higher : found
--Perl v5.14 or higher : found
--Bowtie (v1) : found
--Scalar::Util (perl) : found
--Data::Dumper (perl) : found
--Parallel::ForkManager (perl) : found
--Getopt::Long (perl) : found
--phasTER Core : found

Fn: Settings Reader

User Input runType: G
User Input reference location: /Users/sebel76/NucleicAcid/ensembl_relase31/dna/HvuASM32608v1Chr.fa
User Input Libs: ['UNMDay0clippedCollapsed.txt']
User Input to auto fetch libs: T
User Input for phase length: 21
User Input index location:

7 cores reserved for analysis

7 threads assigned to one lib

This is first run - create index

Fn: indexBuilder

Reference file located - Preparing to create index
PHASER uses FASTA header as key for identifying the phased loci
Cleaning header '/Users/sebel76/NucleicAcid/ensembl_relase31/dna/HvuASM32608v1Chr.fa' reference FASTA file
Traceback (most recent call last):
File "phaser.py", line 990, in <module>
File "phaser.py", line 854, in main
File "phaser.py", line 572, in indexBuilder
File "phaser.py", line 653, in FASTAClean
OSError: [Errno 22] Invalid argument

Do you have any idea how I can resolve this problem?
Thanks,
Sébastien
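One plausible cause, not confirmed against phaser.py's code: on macOS, older CPython builds are known to fail a single file write() larger than about 2 GiB with OSError: [Errno 22] Invalid argument, and the log shows Python 3.5 on a Mac writing a cleaned barley genome. If FASTAClean writes the whole cleaned FASTA as one string, writing in bounded slices avoids that limit. A minimal sketch; write_in_chunks and the 1 GiB slice size are hypothetical choices:

```python
import io

# Hypothetical cap per write() call: 1 GiB of characters (ASCII FASTA, so roughly 1 GiB of bytes),
# below the ~2 GiB single-write limit reported for older macOS Python builds.
CHUNK_SIZE = 1 << 30


def write_in_chunks(handle, text, chunk_size=CHUNK_SIZE):
    """Write `text` to an open text-mode handle in slices of at most `chunk_size` characters.

    Hypothetical helper; returns the total number of characters written.
    """
    written = 0
    for start in range(0, len(text), chunk_size):
        written += handle.write(text[start:start + chunk_size])
    return written


# Demo with an in-memory buffer and a tiny chunk size; a real run would pass
# a handle from open("genome.clean.fa", "w") and keep the default chunk size.
demo = io.StringIO()
demo_written = write_in_chunks(demo, ">1\nACGTACGTAC\n>2\nGGGG\n", chunk_size=4)
```

Streaming the cleaned records to the output file line by line, instead of first building one large string, sidesteps the same limit.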

Script hangs while handling large amount of sRNA files

I'm using PHASdetect on ~100 sRNA libraries, and the script hangs while processing this many libraries. However, the results folder is generated and contains output for every sRNA library.
So the analysis appears to finish, but with many sRNA libraries the script never reports completion on screen; it just hangs.

Thanks
Parth.

phasmerge: error message

Hi Atul,

I used your program PHASIS on my personal computer. I first ran phasdetect with no problem. Then I tried to use phasmerge as explained in your wiki tutorial, but I got an error message.

Can you help me to resolve the problem? Please check the following error message:

MacBook-Pro-de-Sebastien:24-nt sebel76$ python3 phasmerge -mode merge -dir D0

Fn: checkLibs

--Python v3.0 or higher : found

Fn: usagedata

Traceback (most recent call last):
File "phasmerge.py", line 2704, in <module>
File "phasmerge.py", line 2457, in main
File "phasmerge.py", line 2439, in usagedata
File "/Users/sebel76/miniconda3/lib/python3.5/smtplib.py", line 251, in __init__
(code, msg) = self.connect(host, port)
File "/Users/sebel76/miniconda3/lib/python3.5/smtplib.py", line 335, in connect
self.sock = self._get_socket(host, port, self.timeout)
File "/Users/sebel76/miniconda3/lib/python3.5/smtplib.py", line 306, in _get_socket
self.source_address)
File "/Users/sebel76/miniconda3/lib/python3.5/socket.py", line 711, in create_connection
raise err
File "/Users/sebel76/miniconda3/lib/python3.5/socket.py", line 702, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused

Do you have any recommendation to help me?

Many thanks and best regards,
Sébastien
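The traceback shows phasmerge's usagedata step opening an SMTP connection (smtplib.SMTP) and the refused connection aborting the whole run. A minimal sketch, assuming the usage report is optional, of how such a call can be made non-fatal; send_usage_report, its arguments, and the timeout are hypothetical and not phasmerge's actual code:

```python
import smtplib


def send_usage_report(host, port, sender, recipient, message, timeout=5.0):
    """Hypothetical helper: try to send an optional usage e-mail.

    Returns True on success, False on any network or SMTP error, so that a
    blocked or missing mail server cannot stop the main analysis.
    """
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as server:
            server.sendmail(sender, [recipient], message)
        return True
    except (OSError, smtplib.SMTPException) as err:
        # ConnectionRefusedError and socket timeouts are both subclasses of OSError
        print(f"Usage report skipped ({err.__class__.__name__}: {err})")
        return False
```

The design choice is simply to treat telemetry as best effort: the function reports what went wrong and returns, rather than letting the exception propagate into main().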

phasmerge- OperationalError: unrecognized token

Hi Atul,

After successfully running the phasdetect step (runType = T), I tried to run phasmerge (merge mode) with a GTF annotation file (from Phytozome). My command was:
$ python3 phasmerge -mode merge -dir phased_09_18_15_31/ -pval 0.005 -gtf SBI.gtf -debug F

However, the run stopped and I got this error message:

Fn: Annotate

Preparing SQL table with GTF features
Feature table made with 313837 entries
Analysis for overlapping trans initialized
Traceback (most recent call last):
File "phasmerge.py", line 2704, in <module>
File "phasmerge.py", line 2651, in main
File "phasmerge.py", line 1984, in overlapChecker
File "phasmerge.py", line 2069, in overlapTrans
sqlite3.OperationalError: unrecognized token: ".001G077000"

Could it be a problem with the format of the gene locus names?

Cheers,
Louis
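Both this error and the flax error in a later issue (no such column: C7915081) are what SQLite reports when an ID is pasted unquoted into an SQL string. We have not inspected overlapTrans, so this is an assumption, but the sketch below reproduces both messages with an in-memory table. The table layout, query_unquoted, and the full ID Sobic.001G077000 are hypothetical; the error only shows the part after the dot.

```python
import sqlite3


def query_unquoted(conn, gene_id):
    """Hypothetical: paste the ID into the SQL unquoted (the unsafe pattern); return SQLite's error."""
    sql = f"SELECT * FROM features WHERE gene = {gene_id}"
    try:
        conn.execute(sql)
    except sqlite3.OperationalError as err:
        return str(err)
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (chr TEXT, start INTEGER, stop INTEGER, gene TEXT)")

# 'Sobic' is read as a name, then '.001G077000' as a malformed number-like token.
dotted_error = query_unquoted(conn, "Sobic.001G077000")
# An ID that starts with a letter is read as a column name.
plain_error = query_unquoted(conn, "C7915081")
print(dotted_error)  # unrecognized token: ".001G077000"
print(plain_error)   # no such column: C7915081
```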

phasdetect_phasing analysis error

Hello,
When phasdetect reaches the phasing analysis step, the program reports this error:
sh: ./phased_08_31_09_47//home/appl/software/PHASIS-3.3/test-data/2600_norm.txt.GoEz7MqgxwW: No such file or directory
Tue Aug 31 09:48:52 2021: aligning is done
Cannot open the bowtie outputed DATA at /home/appl/.phasis/phasclust.genome.v2.pl line 111.
** Problem with Phasing script - Return code not 0

Could you please help me with this?
Thank you.
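The failing path ./phased_08_31_09_47//home/appl/... looks like the output folder name concatenated with the library's absolute path. That is an inference from the log, not from the code, but if so, listing the library by file name and running from the directory that holds it may avoid the problem. The sketch contrasts plain concatenation with joining only the base name; both function names and the .tmp suffix are hypothetical:

```python
import os


def naive_output_path(outdir, lib_path, suffix):
    # Hypothetical: plain concatenation keeps the library's absolute path inside the output folder
    return outdir + "/" + lib_path + suffix


def safe_output_path(outdir, lib_path, suffix):
    # Hypothetical: keep only the library file name, then join it onto the output folder
    return os.path.join(outdir, os.path.basename(lib_path) + suffix)


lib = "/home/appl/software/PHASIS-3.3/test-data/2600_norm.txt"
naive = naive_output_path("./phased_08_31_09_47", lib, ".tmp")
safe = safe_output_path("./phased_08_31_09_47", lib, ".tmp")
print(naive)  # ./phased_08_31_09_47//home/appl/software/PHASIS-3.3/test-data/2600_norm.txt.tmp
print(safe)   # ./phased_08_31_09_47/2600_norm.txt.tmp
```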

Memory issue

Hi Atul,
While running phasTER, I ran into a strange issue: ** Problem with Phasing script - Return code not 0

Cannot fork: Cannot allocate memory at /usr/local/perls/lib/site_perl/5.20.0/Parallel/ForkManager.pm line 515, line 7791362.
Cannot fork: Cannot allocate memory at /usr/local/perls/lib/site_perl/5.20.0/Parallel/ForkManager.pm line 515, line 8033261.
Cannot fork: Cannot allocate memory at /usr/local/perls/lib/site_perl/5.20.0/Parallel/ForkManager.pm line 515, line 6822425.
Cannot fork: Cannot allocate memory at /usr/local/perls/lib/site_perl/5.20.0/Parallel/ForkManager.pm line 515, line 6170777.
Cannot fork: Cannot allocate memory at /usr/local/perls/lib/site_perl/5.20.0/Parallel/ForkManager.pm line 515, line 6252872.
The program then terminates on its own without reporting any phased loci.
Can you please help me find a solution?

Thanks,
Suresh

Using another version of bowtie does work for phasTER?

Hi Atul,
I was trying to run phasTER on wheat small RNA libraries, but phasTER couldn't build the index for the wheat genome, which is 3.6 GB, because the default bowtie version on tarkan is limited to around 4 GB or less. So I used a newer version of bowtie, bowtie-1.1.2, and ran phasTER again. It builds the index but cannot proceed further: it searches for .ebwt files, but for a large genome bowtie outputs .ebwtl files.
The error message was:
Traceback (most recent call last):
File "phaser.py", line 969, in <module>
File "phaser.py", line 833, in main
File "phaser.py", line 595, in indexBuilder
FileNotFoundError: [Errno 2] No such file or directory: '~/index/Triticum_aestivum.TGACv1.dna.toplevel.clean.1.ebwt'
Can you please help sort out this issue?
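Two details in this traceback may matter. Bowtie 1 names a small index <prefix>.1.ebwt and a large index <prefix>.1.ebwtl, so code that hard-codes .ebwt will not find a large index. The path also starts with a literal '~', which Python's open() and os.path functions do not expand unless os.path.expanduser is called. A hedged sketch of a lookup that handles both; resolve_index_suffix is hypothetical, not part of phasTER:

```python
import os
import tempfile


def resolve_index_suffix(prefix):
    """Hypothetical helper: return '.ebwt' or '.ebwtl', whichever Bowtie 1 index exists for `prefix`.

    '~' is expanded first, because open() and os.path.exists() treat it literally.
    Raises FileNotFoundError if neither <prefix>.1.ebwt nor <prefix>.1.ebwtl exists.
    """
    prefix = os.path.expanduser(prefix)
    for suffix in (".ebwt", ".ebwtl"):  # small (32-bit) index first, then large index
        if os.path.exists(f"{prefix}.1{suffix}"):
            return suffix
    raise FileNotFoundError(f"No Bowtie 1 index found for prefix {prefix!r}")


# Demo: a throwaway directory holding only the first file of a large index.
with tempfile.TemporaryDirectory() as tmp:
    demo_prefix = os.path.join(tmp, "Triticum_aestivum.clean")
    open(demo_prefix + ".1.ebwtl", "w").close()
    demo_suffix = resolve_index_suffix(demo_prefix)
print(demo_suffix)  # .ebwtl
```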

many perl processes in phasedetect

We found that during phasdetect, a huge number of perl processes run in parallel and leave our computer in a "hung" state. Is there an option to limit the number of processors or threads?
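Whether phasdetect exposes such a setting is not stated here. As a general illustration only, not a phasdetect option: when a Python driver launches external jobs (such as the Perl phasing script), a fixed-size worker pool caps how many child processes run at once, which also bears on the "Cannot fork: Cannot allocate memory" report above. run_bounded and the stand-in commands are hypothetical:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_bounded(commands, max_workers):
    """Hypothetical helper: run each argv list with at most `max_workers` children alive at once.

    Each pool thread waits on one child process at a time, so the number of
    live children never exceeds max_workers. Returns exit codes in input order.
    """
    def run_one(argv):
        return subprocess.run(argv, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


# Hypothetical stand-ins for per-library jobs; real jobs would invoke the phasing script.
jobs = [[sys.executable, "-c", "import time; time.sleep(0.1)"] for _ in range(6)]
codes = run_bounded(jobs, max_workers=2)
print(codes)  # [0, 0, 0, 0, 0, 0]
```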

gtf file error

Hi Atul,
I was trying to annotate the PHAS loci using a GTF file, as recommended in the phasmerge wiki. It works for Arabidopsis but not for flax. Here is the error:

Fn: gtfParser

Parsing 'Lusitatissimum_200_v1.0.gene_exons.gtf' gtf file
Total entries fetched from GTF file:262280

Fn: Annotate

Preparing SQL table with GTF features
Feature table made with 262280 entries
Analysis for overlapping trans initialized
Traceback (most recent call last):
File "phasmerge.py", line 2704, in <module>
File "phasmerge.py", line 2651, in main
File "phasmerge.py", line 1984, in overlapChecker
File "phasmerge.py", line 2069, in overlapTrans
sqlite3.OperationalError: no such column: C7915081

The genome and GFF files are from Phytozome. Can you please help resolve the issue?

Thanks,
Suresh
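If the cause is an ID pasted unquoted into SQL (an assumption; we have not read overlapTrans), the usual remedy is to bind values as parameters, so SQLite never parses them as SQL. A minimal sketch with a hypothetical features table and overlap query; overlapping_features is not phasmerge's function:

```python
import sqlite3


def overlapping_features(conn, chrom, start, stop):
    """Hypothetical helper: return (gene, start, stop) rows on `chrom` overlapping [start, stop].

    Values are bound with '?' placeholders, so IDs such as 'C7915081' or
    'Sobic.001G077000' are treated as data, never as SQL identifiers.
    """
    return conn.execute(
        "SELECT gene, start, stop FROM features "
        "WHERE chr = ? AND start <= ? AND stop >= ? ORDER BY start",
        (chrom, stop, start),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (chr TEXT, start INTEGER, stop INTEGER, gene TEXT)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?)",
    [("scaffold_1", 100, 500, "C7915081"), ("scaffold_1", 900, 1200, "Sobic.001G077000")],
)
rows = overlapping_features(conn, "scaffold_1", 450, 1000)
print(rows)  # [('C7915081', 100, 500), ('Sobic.001G077000', 900, 1200)]
```

Two intervals overlap when each one starts before the other ends, which is why the query compares the feature start with the locus stop and vice versa.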

Will will read more on phasiRNAs and piRNAs

Error when running phasmerge: too many files open

I'm trying to run phasmerge and it's throwing the following error:

Traceback (most recent call last):
File "phasmerge.py", line 2704, in <module>
File "phasmerge.py", line 2539, in main
File "phasmerge.py", line 1708, in PPResults
File "/grid/sw/python/3.9.5/lib/python3.9/multiprocessing/context.py", line 119, in Pool
File "/grid/sw/python/3.9.5/lib/python3.9/multiprocessing/pool.py", line 212, in __init__
File "/grid/sw/python/3.9.5/lib/python3.9/multiprocessing/pool.py", line 303, in _repopulate_pool
File "/grid/sw/python/3.9.5/lib/python3.9/multiprocessing/pool.py", line 326, in _repopulate_pool_static
File "/grid/sw/python/3.9.5/lib/python3.9/multiprocessing/process.py", line 121, in start
File "/grid/sw/python/3.9.5/lib/python3.9/multiprocessing/context.py", line 277, in _Popen
File "/grid/sw/python/3.9.5/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
File "/grid/sw/python/3.9.5/lib/python3.9/multiprocessing/popen_fork.py", line 65, in _launch
OSError: [Errno 24] Too many open files

I tried increasing the ulimit on my server, but I can't increase it beyond 4096.
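It can help to confirm which limit the Python process actually sees, since a ulimit change in one shell applies only to processes started from that shell, and an unprivileged process can raise its soft limit only up to the hard limit (which may be the 4096 ceiling you are hitting). A minimal sketch using the standard resource module (Unix only); raise_nofile_limit and open_fd_count are hypothetical helpers:

```python
import os
import resource


def raise_nofile_limit(target):
    """Hypothetical helper: raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # unprivileged processes cannot exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough; leave it unchanged
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


def open_fd_count():
    """Hypothetical helper: count this process's open descriptors (Linux /proc only; None elsewhere)."""
    fd_dir = "/proc/self/fd"
    return len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None


print("soft/hard before:", resource.getrlimit(resource.RLIMIT_NOFILE))
print("soft after:", raise_nofile_limit(4096))
print("open descriptors:", open_fd_count())
```

Calling open_fd_count() before and after each pool is created would show whether descriptors accumulate across runs.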

There is some problem preparing index of reference

Hi,
I have got the following error in the logs:

In case of any issue at any point in PHASIS analyses, contact authors at:
https://github.com/atulkakrana/phasis/issues

Done:      phasdetect
Done:      phasmerge
Done:      phastrigs
Done:      core scripts
Deleted:   source files

Note:'install.sh' cannot be re-used after a successful installation
Note: If installation fails, then recopy all files and try again

phasis suite is ready be used
See readme here: https://github.com/atulkakrana/phasis

Settings:
  Output files: "/lustre/scratch/waterhouse_team/phasirna/test2/fastp2fasta/index/NbV1ChF.clean.*.ebwt"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 5 (one in 32)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /lustre/scratch/waterhouse_team/phasirna/test2/fastp2fasta/NbV1ChF.clean.fa
Reading reference sizes
  Time reading reference sizes: 00:01:34
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:19
bmax according to bmaxDivN setting: 685716146
Using parameters --bmax 514287110 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 514287110 --dcv 1024
	Constructing suffix-array element generator
	Building DifferenceCoverSample
	  Building sPrime
	  Building sPrimeOrder
	  V-Sorting samples
	  V-Sorting samples time: 00:01:46
	  Allocating rank array
	  Ranking v-sort output
	  Ranking v-sort output time: 00:00:25
	  Invoking Larsson-Sadakane on ranks
	  Invoking Larsson-Sadakane on ranks time: 00:00:37
	  Sanity-checking and returning
	Building samples
	Reserving space for 12 sample suffixes
	Generating random suffixes
	QSorting 12 sample offsets, eliminating duplicates
	QSorting sample offsets, eliminating duplicates time: 00:00:00
	Multikey QSorting 12 samples
	  (Using difference cover)
	  Multikey QSorting samples time: 00:00:00
	Calculating bucket sizes
	Splitting and merging
	  Splitting and merging time: 00:00:00
	Avg bucket size: 2.74286e+09 (target: 514287109)
	Converting suffix-array elements to index image
	Allocating ftab, absorbFtab
	Entering Ebwt loop
	Getting block 1 of 1
	  No samples; assembling all-inclusive block

	#### Checking user ###########################
	Hello 'lorencm' - Please report issues at: https://github.com/atulkakrana/PHASIS/issues
	#### Fn: checkLibs ###########################
--Python v3.0 or higher          : found
--Perl v5.14 or higher           : found
--Bowtie (v1)                    : found
--Scalar::Util (perl)            : found
--Data::Dumper (perl)            : found
--Parallel::ForkManager (perl)   : found
--Getopt::Long (perl)            : found

#### Fn: Settings Reader #####################
User Input runType               : G
User Input reference location    : /scratch/waterhouse_team/Taek_Test/NbV1ChF_JBrowse/NbV1ChF.fasta
User Input Libs                  : SRR1555764.fastp.fasta,SRR1555765.fastp.fasta,SRR6074038.fastp.fasta,SRR6074039.fastp.fasta,SRR935013.fastp.fasta,SRR947064.fastp.fasta
user library format              : F
User Input for phase length      : 21
User Input index location        : None
User Input for min. sRNA depth   : 3
User Input distance b/w clusters : 300

#### 8 computing core(s) reserved for analysis ##########
#### 3 computing core(s) assigned to one lib ############

This is first run - create index

#### Fn: indexBuilder #######################
Reference file located - Preparing to create index
phasdetect uses FASTA header as key for identifying the phased loci
Caching '/scratch/waterhouse_team/Taek_Test/NbV1ChF_JBrowse/NbV1ChF.fasta' reference FASTA file
Fasta file with reduced header: '/lustre/scratch/waterhouse_team/phasirna/test2/fastp2fasta/NbV1ChF.clean.fa' with total entries 19 is prepared
There were 0 entries found with empty sequences and were removed

**Deleting old index 'folder' !!!!!!!!!!!**
Creating index of cDNA/genomic sequences:/lustre/scratch/waterhouse_team/phasirna/test2/fastp2fasta/index/NbV1ChF.clean**

There is some problem preparing index of reference '/scratch/waterhouse_team/Taek_Test/NbV1ChF_JBrowse/NbV1ChF.fasta'
Is 'Bowtie' installed? And added to environment variable?
Script will exit now

However, ls -hal index shows:

-rw-rw---- 1 lorencm default  16K Aug  1 21:57 NbV1ChF.clean.1.ebwt
-rw-rw---- 1 lorencm default    0 Aug  1 21:56 NbV1ChF.clean.2.ebwt
-rw-rw---- 1 lorencm default  16K Aug  1 21:55 NbV1ChF.clean.3.ebwt
-rw-rw---- 1 lorencm default 654M Aug  1 21:55 NbV1ChF.clean.4.ebwt

What did I miss?

Thank you in advance,

Michal
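A complete Bowtie 1 index has six files: <prefix>.1, .2, .3, .4, .rev.1 and .rev.2, each ending in .ebwt (or .ebwtl for a large index). The listing above has no .rev.* files and a zero-byte .2.ebwt, and the bowtie-build output stops at "assembling all-inclusive block", which suggests the build was interrupted (for example by a memory or job limit; this is a guess the log does not confirm) rather than Bowtie being missing. A hedged sketch of a completeness check; missing_index_files is hypothetical:

```python
import os
import tempfile

# The six files of a Bowtie 1 index, without the extension.
BOWTIE1_PARTS = ("1", "2", "3", "4", "rev.1", "rev.2")


def missing_index_files(prefix, ext=".ebwt"):
    """Hypothetical helper: return the index files for `prefix` that are absent or empty (0 bytes)."""
    problems = []
    for part in BOWTIE1_PARTS:
        path = f"{prefix}.{part}{ext}"
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(os.path.basename(path))
    return problems


# Recreate the listing from this issue: parts 1-4 exist, .2 is empty, no .rev files.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "NbV1ChF.clean")
    for part, size in (("1", 16), ("2", 0), ("3", 16), ("4", 64)):
        with open(f"{prefix}.{part}.ebwt", "wb") as fh:
            fh.write(b"\0" * size)
    report = missing_index_files(prefix)
print(report)  # ['NbV1ChF.clean.2.ebwt', 'NbV1ChF.clean.rev.1.ebwt', 'NbV1ChF.clean.rev.2.ebwt']
```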

Header row missing for phasdetect output

Hi. I was able to successfully run phasdetect, but in the output_all_sRNA_21_out.txt file there is no header row explaining the columns. I also couldn't find this information on the wiki. Can you please provide me with an explanation of the columns?

Thanks,
Julie

1 + 524790 seq_74542|14 CCGTCTACTTGTACAATGGGT 21 14 5 1 524790 0.217327026756377
1 - 524802 seq_151702|4 TACAAGTAGACGGCACATGGC 21 4 2 1 524802 0.0930688978411321
1 - 524809 seq_139924|9 CCCATTGTACAAGTAGACGGC 21 9 5 1 524809 0.217327026756377
1 - 524810 seq_365561|10 ACCCATTGTACAAGTAGACGG 21 10 5 1 524810 0.217327026756377

phasemerge: _collapsed.txt and _summary.txt report different number of loci

When I run phasmerge, the resulting summary folder contains _collapsed.txt and _summary.txt, both listing predicted PHAS loci. However, they report different numbers of loci regardless of the parameters and libraries used, with the _summary.txt loci being a subset of the _collapsed.txt loci.

For example:
21PHAS_p1e-06_collapsed.txt:

Name	p-val	Chr	Start	End	Strand	Lib
Phas-1	1e-07	1	18549462	18549648	NONE	nd
Phas-2	1e-07	1	23178442	23178628	NONE	nd
Phas-3	1e-07	1	23299603	23299831	NONE	nd
Phas-4	1e-07	1	23413412	23413682	NONE	nd
Phas-5	1e-07	1	23419942	23420149	NONE	nd
Phas-6	1e-07	1	23490185	23490371	NONE	nd
Phas-7	1e-07	1	23507890	23508076	NONE	nd
Phas-8	5e-07	2	11721883	11722090	NONE	nd
Phas-9	1e-07	2	16539751	16540000	NONE	nd
Phas-10	5e-07	5	23394349	23394430	NONE	nd

21PHAS_p1e-06_summary.txt:

Name	P-val	Chr	Start	End	Identifier	Best k-val	Phasi ratio	 Max Tag Ratio	SRR1634280.fa	Total Phasi Abundance	Most Abun Tag (MAT)	 MAT Abun	MAT2	MAT2 Abun	BestLib
Phas-1	1e-07	1	18549462	18549649	1_18549462_18549648	15	0.92	0.25	1484	1484	TATTATCAGAGTAGTTATGAT	368	TTCTAAGTCCAACATAGCGTA	340	nd
Phas-3	1e-07	1	23299603	23299832	1_23299603_23299831	11	0.84	0.46	699	699	ATGGGATATAAACCTGATACC	323	AACGGATTATGTAAGAGAGGT	115	nd
Phas-5	1e-07	1	23419942	23420150	1_23419942_23420149	10	0.84	0.46	700	700	ATGGGATATAAACCTGATACC	323	AACGGATTATGTAAGAGAGGT	115	nd
Phas-8	5e-07	2	11721883	11722091	2_11721883_11722090	14	0.87	0.33	1129	1129	ATGATATTTGTAGTAATGGCG	373	TTCTAAGTCCAACATAGCGTA	340	nd
Phas-9	1e-07	2	16539751	16540001	2_16539751_16540000	21	0.91	0.45	5252	5252	TTTGAACTTGTGTATTTTGAA	2338	TCCAAGCGAATGATGATACTT	1347	nd

Is there a reason for this?
Note that I've specified a p-value of 0.01 (and tried a few more settings), so the p-value is not the discriminating factor, as can also be seen above.
It would be nice if _summary.txt contained the complete _collapsed.txt list, since it includes useful additional information for each locus.
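To list exactly which loci are dropped, the two tables can be compared on their Name column. A minimal sketch, assuming both files are tab-separated with a header row containing a Name column, as in the excerpts above; loci_missing_from_summary is a hypothetical helper:

```python
import csv
import io


def read_names(handle):
    """Return the values of the 'Name' column of a tab-separated table, in file order."""
    return [row["Name"] for row in csv.DictReader(handle, delimiter="\t")]


def loci_missing_from_summary(collapsed_handle, summary_handle):
    """Hypothetical helper: list loci present in _collapsed.txt but absent from _summary.txt."""
    in_summary = set(read_names(summary_handle))
    return [name for name in read_names(collapsed_handle) if name not in in_summary]


# Trimmed copies of the excerpts above (only the columns needed here).
collapsed = io.StringIO("Name\tp-val\tChr\nPhas-1\t1e-07\t1\nPhas-2\t1e-07\t1\nPhas-3\t1e-07\t1\n")
summary = io.StringIO("Name\tP-val\tChr\nPhas-1\t1e-07\t1\nPhas-3\t1e-07\t1\n")
missing = loci_missing_from_summary(collapsed, summary)
print(missing)  # ['Phas-2']
```

With the real files, pass handles from open("21PHAS_p1e-06_collapsed.txt", newline="") and the matching _summary.txt.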

Phastrings error

Hi Atul,

I used your program PHASIS. I first ran phasdetect and phasmerge (in merge and compare modes) without any problem. But when I try to run phastrigs in the correct folder (where phasis.set, the scripts, the libraries, the miRNA FASTA file, and the phasmerge results are located), I get this error:

$ python3 phastrigs -mode auto -dir summary_09_05_16_59 -mir miRNAs_secuencias_para_buscar_blancos.fasta

Fn: checkLibs

--sPARTA : found
--Bowtie2 : found
--scipy : found
--numpy : found

Fn: Settings Reader

User Input runType : G
User Input reference : /home/dinkova/sRNAs_analysis/phasis_final_genome_Zeamays_AGPv4/Zea_mays.AGPv4.dna.toplevel.fa
User Input Libs : ZmaEm1-1.txt,ZmaEm1-2.txt
User Input for phase length : 21
User Input index location : /home/dinkova/sRNAs_analysis/phasis_final_genome_Zeamays_AGPv4/index/Zea_mays.AGPv4.dna.toplevel.clean

Fn: memReader

collapse phase : 21
collapse pval : 1e-05
collapse file : ./summary_09_05_16_59/21PHAS_p1e-05_collapsed.txt
Creating dictionary of phased loci
This is the phaseList [-105, -84, -63, -42, -21, 0, 21, 42, 63, 84, 105]

Fn: PHAS Reader

Head dictionary made with entries:8
Tail dictionary made with entries:8
** Strange, as you shouldn't have reached to this end of logic
** There is some problem validating correct reference location
** Reference file with clean headers located in /newdata/data2/homes/dinkova/sRNAs_analysis/phasis_final_genome_Zeamays_AGPv4/Zea_mays.AGPv4.dna.toplevel.clean.fa will be used
** You might face some issues in phastrigs run - Keep this message in mind

Fn: cacheGenome

Traceback (most recent call last):
File "phastrigs.py", line 1751, in <module>
main()
File "phastrigs.py", line 1630, in main
coordsfile,extractseq = extractSeq(reference,PHASList,phasbuff) ## runType aware
File "phastrigs.py", line 371, in extractSeq
fastaD,fastalenD = cacheGenome(fastaclean)
File "phastrigs.py", line 969, in cacheGenome
name = ent[0].split()[0].strip()
IndexError: list index out of range

I used the same reference and index locations to run phasdetect and phasmerge. My phasis.set file is:

<<< Settings file for PHASIS >>>

<<< Mandatory Settings, see descriptions below >>>
runType = G
reference = /home/dinkova/sRNAs_analysis/phasis_final_genome_Zeamays_AGPv4/Zea_mays.AGPv4.dna.toplevel.fa
userLibs = ZmaEm1-1.txt,ZmaEm1-2.txt
libFormat = T
phase = 21

<<< Optional Settings, leave empty to make index on fly and reuse, value in text>>>
index = /home/dinkova/sRNAs_analysis/phasis_final_genome_Zeamays_AGPv4/index/Zea_mays.AGPv4.dna.toplevel.clean

<<< Advanced Settings, value in text>>>
minDepth = 3
clustBuffer = 300
mismat = 0

<<>>
<runType - G: Running on whole genome | T: running on transcriptome | S: running on scaffolde$
<reference - If @runtype = ‘G’ then genome FASTA | @runtype = ’S’ or ’T’ then your scaffolds or transc$
<userLibs - Specify library IDs in comma separated format to fetch data. Used only if @fetchLi$
<libFormat - Specify the sRNA library format. F: FASTA Format | T: Tag count format>
<phase - Desired phase to use for prediction. 21 for 21 nt PHAS | 24 for 24 nt PHAS>
<index - If bowtie index exist already provide the path and index suffix. If not, then leave blank,$
<minDepth - Minimum depth of sRNA to be considered for p-value computation>
<clustBuffer - Minimum distance between two clusters>
<mismat - Number of mismatches allowed between sRNA and reference for mapping>

Can you help me with that?

I really appreciate your answer :)

Thanks,
Thamara
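The failing line, name = ent[0].split()[0].strip(), raises IndexError when a record's header line is empty, because ''.split() returns an empty list. That can happen if the cleaned FASTA contains a bare '>' line, or if the file is split on '>' and an empty chunk is not skipped; the log does not show which. A sketch of a header-tolerant reader, assuming records are split on '>'; cache_fasta is hypothetical and is not phastrigs' cacheGenome:

```python
def cache_fasta(text):
    """Hypothetical helper: map each record's first header token to its sequence and length.

    Records whose header line is blank (e.g. a bare '>' line) are skipped and counted,
    instead of raising IndexError on header.split()[0].
    """
    seqs, lengths, skipped = {}, {}, 0
    for chunk in text.split(">"):
        if not chunk.strip():
            continue  # text before the first '>' or an empty chunk
        lines = chunk.splitlines()
        header_tokens = lines[0].split()
        if not header_tokens:
            skipped += 1  # blank header: count it rather than crash
            continue
        name = header_tokens[0]
        seq = "".join(line.strip() for line in lines[1:])
        seqs[name] = seq
        lengths[name] = len(seq)
    return seqs, lengths, skipped


demo = ">1 dna:chromosome\nACGT\nAC\n>\nNNNN\n>2\nGGG\n"
seqs, lengths, skipped = cache_fasta(demo)
print(lengths, skipped)  # {'1': 6, '2': 3} 1
```

A non-zero skipped count points to a malformed reference, which would be worth fixing at the source rather than silently ignoring.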

No result found

My phasdetect run completed successfully. However, the result files are empty.
I used a transcriptome as the reference.

Could you please suggest which parameters can be relaxed?

Thanks

phasdetect error

Hi Atul,
What could cause the following problem?
The input was the test data file 2599_norm.txt.
Can you help me with that?

I really appreciate your answer!
Thanks,
Shiqi

reads processed: 9603241
reads with at least one reported alignment: 3527326 (36.73%)
reads that failed to align: 5938655 (61.84%)
reads with alignments suppressed due to -m: 137260 (1.43%)
Reported 3527330 alignments to 1 output stream(s)
Use of uninitialized value $seqid in pattern match (m//) at /home/.phasis/phasclust.genome.v2.pl line 119, line 679287.
Use of uninitialized value $seqid in pattern match (m//) at /home/.phasis/phasclust.genome.v2.pl line 122, line 679287.
we can't locate the abundance from the seq_id at /home/student/.phasis/phasclust.genome.v2.pl line 126, line 679287.
Tue Sep 18 17:02:31 2018: aligning is done

Fn: phaser

sRNA library located - Running phasing analysis
2599_norm.txt
** Problem with Phasing script - Return code not 0
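The Perl warnings say the phasing script could not extract a read's abundance from its ID while reading line 679287 of one of its inputs (the filehandle name is cut off in the log). One thing worth ruling out is a malformed library line, such as a blank line or a non-integer count; "norm" in the file name hints at normalized values, but that is only a guess. The checker assumes the tag-count format (libFormat T) is one SEQUENCE<TAB>COUNT pair per line, which these issues do not spell out; check_tagcount_lines and the sample lines are hypothetical:

```python
import re

SEQ_RE = re.compile(r"^[ACGTUN]+$", re.IGNORECASE)


def check_tagcount_lines(lines):
    """Hypothetical helper: yield (line_number, problem) for lines that are not 'SEQUENCE<TAB>COUNT'.

    Assumes one sequence and one positive integer count per line.
    """
    for number, raw in enumerate(lines, start=1):
        fields = raw.rstrip("\r\n").split("\t")
        if len(fields) != 2:
            yield number, f"expected 2 tab-separated fields, found {len(fields)}"
            continue
        seq, count = fields
        if not SEQ_RE.match(seq):
            yield number, f"unexpected characters in sequence {seq!r}"
        if not count.isdecimal() or int(count) == 0:
            yield number, f"count {count!r} is not a positive integer"


sample = ["TCGGACCAGGCTTCATTCCCC\t120\n", "\n", "AAGCTCAGGAGGGATAGCGCC\t3.5\n"]
problems = list(check_tagcount_lines(sample))
for number, problem in problems:
    print(number, problem)
```

For a real library, pass an open file handle instead of the list; the generator reads it line by line.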

phasmerge, ValueError: too many values to unpack (expected 2)

After phasdetect reported a successful run, I typed the following command:
'python3 phasmerge -mode merge -dir phased_04_26_21_06'
A similar command gave correct results with the test dataset, but this time I got the following messages:

Fn: checkLibs

--Python v3.0 or higher : found

Fn: usagedata

Fn: Settings Reader

User Input runType : S
User Input reference : ABD_Genome.20170209.fa
User Input Libs: : WheatsRNA_tagcount.txt
User Input for library type : T
User Input for phase length : 21
User Input index location : None

Fn: pvalue capture

Total 10 clusters cached at different p-values: 0.005, 0.001, 0.0005, 0.0001, 5e-05, 1e-05, 5e-06, 1e-06, 5e-07, 1e-07
Best guessed p-value being used: 1e-06
User can specify a lower or higher p-val cutoff using '-pval' option

Fn: prepare

WARNING: Temporary files exits from earlier run, these will be deleted
1 list files found
10 clust files found
--1 files prepared for collapsing
--Working folder:summary_04_26_21_10 | Temporary Folder:./summary_04_26_21_10/temp

List files selected for analysis - Converting them to readable format
Total files to analyze: 1

Fn: listConvertor

Traceback (most recent call last):
File "phasmerge.py", line 2704, in <module>
File "phasmerge.py", line 2484, in main
File "phasmerge.py", line 747, in listConverter
ValueError: too many values to unpack (expected 2)

Is there something I can do to fix this ?

Thanks!
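We have not seen listConverter's source, so this is only one common way the message "too many values to unpack (expected 2)" arises: unpacking str.split() into two names when the delimiter occurs more than once. The library name here, WheatsRNA_tagcount.txt, already contains an underscore, so any file name built from it would split into more than two parts on '_'. The sketch shows the failure and the usual fix of splitting only at the last delimiter; the file name and both functions are hypothetical:

```python
def split_naive(filename):
    # Hypothetical: assumes exactly one '_' in the name; raises ValueError otherwise
    lib, rest = filename.split("_")
    return lib, rest


def split_last(filename):
    # Hypothetical: split only at the last '_', so underscores inside the library name survive
    lib, rest = filename.rsplit("_", 1)
    return lib, rest


name = "WheatsRNA_tagcount_21.list"  # hypothetical file name
try:
    split_naive(name)
    naive_error = None
except ValueError as err:
    naive_error = str(err)
print("naive:", naive_error)
print("rsplit:", split_last(name))  # ('WheatsRNA_tagcount', '21.list')
```

If this is the cause, renaming the library file without underscores would be a quick way to test it.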

Read papers on Phased siRNAs and piRNAs

Will will read papers on these topics

Not supporting large genome

Hello, thanks for creating this package for phased siRNA analysis. We are working with a large genome (17 Gb), which is more than the 4 billion bp allowed by bowtie's 32-bit indexing. As a result, bowtie treats it as a 'large genome' and creates 'ebwtl' files rather than 'ebwt' files, and phasdetect reports an error (attached). Is there a way to work with large genomes in phasis? Thanks!

Fn: checkLibs

Fn: Settings Reader

User Input runType : S
User Input reference location : /home/wheat_0421/ABD_Genome.20170209.fa
User Input Libs : /home/wheat_0421/WheatsRNA_tagcount.txt
user library format : T
User Input for phase length : 21
User Input index location : None
User Input for min. sRNA depth : 3
User Input distance b/w clusters : 300

7 cores reserved for analysis

7 cores assigned to one lib

This is first run - create index

Fn: indexBuilder

Reference file located - Preparing to create index
PHASER uses FASTA header as key for identifying the phased loci
Cleaning header '/home/wheat_0421/ABD_Genome.20170209.fa' reference FASTA file
Fasta file with reduced header: '/home/wheat_0421/ABD_Genome.20170209.clean.fa' with total entries 10663309 is prepared
There were 113398 entries found with empty sequences and were removed

Deleting old index 'folder' !!!!!!!!!!!
If its a mistake cancel now by pressing ctrl+D and continue from index step by turning off earlier steps- You have 2 seconds
Creating index of cDNA/genomic sequences:/home/wheat_0421/index/ABD_Genome.20170209.clean**

Generating MD5 hash for reference
Generating MD5 hash for Bowtie index
File extension for index couldn't be determined properly
It could be an issue from Bowtie
This needs to be reported to 'PHASwroks' developer - Script will exit
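To anticipate which kind of index bowtie-build will write, one can total the reference length before indexing. The sketch sums sequence lengths in a FASTA file and compares the total with a cutoff; the 4-billion-bp figure comes from this issue, and treating it as exactly 2**32 - 1 is an assumption about Bowtie's internals. total_fasta_length and expected_index_extension are hypothetical:

```python
import io

SMALL_INDEX_MAX_BP = 2**32 - 1  # assumption: ~4 billion bp ceiling for 32-bit (.ebwt) indexes


def total_fasta_length(handle):
    """Hypothetical helper: sum the sequence lengths in a FASTA file, streaming line by line."""
    total = 0
    for line in handle:
        if not line.startswith(">"):
            total += len(line.strip())
    return total


def expected_index_extension(total_bp):
    """Hypothetical helper: guess whether Bowtie 1 will write a small (.ebwt) or large (.ebwtl) index."""
    return ".ebwt" if total_bp <= SMALL_INDEX_MAX_BP else ".ebwtl"


demo = io.StringIO(">chr1A\nACGTACGT\nACGT\n>chr1B\nGGGG\n")
bp = total_fasta_length(demo)
print(bp, expected_index_extension(bp))  # 16 .ebwt
print(expected_index_extension(17_000_000_000))  # .ebwtl (about the size of the genome in this issue)
```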

Stuck at Clusters have been scored

Hi,
It appears that PHASIS gets stuck at the second round of "Clusters have been scored":

#### Fn: phaser #############################
sRNA library located - Running phasing analysis
SRR1555764.fix.fastp.fas
** Problem with Phasing script - Return code not 0
Thu Aug 29 07:57:10 2019: Clusters have been scored...
# reads processed: 2971890
# reads with at least one reported alignment: 1335851 (44.95%)
# reads that failed to align: 1136504 (38.24%)
# reads with alignments suppressed due to -m: 499535 (16.81%)
Reported 3187434 alignments
# reads processed: 4152909
# reads with at least one reported alignment: 1743245 (41.98%)
# reads that failed to align: 1791404 (43.14%)
# reads with alignments suppressed due to -m: 618260 (14.89%)
Reported 4013018 alignments
Thu Aug 29 07:57:26 2019: aligning is done
Thu Aug 29 07:57:39 2019: finished the loading of sRNA alignment data.
Thu Aug 29 07:57:41 2019: finished the sorting of presplitted sRNA clusters data
Thu Aug 29 07:57:41 2019: total 2599 presplitted sRNA clusters
# reads processed: 7739944
# reads with at least one reported alignment: 5575799 (72.04%)
# reads that failed to align: 1174960 (15.18%)
# reads with alignments suppressed due to -m: 989185 (12.78%)
Reported 10475353 alignments
Thu Aug 29 07:57:30 2019: aligning is done
Thu Aug 29 07:57:43 2019: finished the loading of sRNA alignment data.
Thu Aug 29 07:57:44 2019: finished the sorting of presplitted sRNA clusters data
Thu Aug 29 07:57:45 2019: total 2796 presplitted sRNA clusters
Thu Aug 29 07:58:07 2019: Clusters have been scored...
Thu Aug 29 07:57:43 2019: aligning is done
Thu Aug 29 07:58:09 2019: finished the loading of sRNA alignment data.
Thu Aug 29 07:58:14 2019: finished the sorting of presplitted sRNA clusters data
Thu Aug 29 07:58:17 2019: total 41726 presplitted sRNA clusters
Thu Aug 29 07:58:17 2019: Clusters have been scored...
Thu Aug 29 08:02:16 2019: Clusters have been scored...
=>> PBS: job killed: walltime 154899 exceeded limit 154800

What did I miss?

Thank you in advance,

Michal

Python and PERL executable paths should probably be linked to environment variable

Hey there,

I've been meaning to check out this tool for a while, and finally got around to running the test data through. We had issues due to the python and perl binaries being specified in different locations depending on the script.

For example, phasdetect.py's shebang line looks in /usr/local/bin/python3 but many systems have python installed in /usr/bin and manage local python versions with pyenv or similar. After editing the line to our system's location it runs without issue.

Likewise, phasclust.genome.v2.pl looks for perl in /usr/local/bin/perl and ours is in /local/bin.

Changing these paths to /usr/bin/env python3 and /usr/bin/env perl will enable it to run regardless of where the binary is located, solving the issue.

Just a suggestion. Thanks for the tool!
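A minimal sketch of the suggested change, assuming each script's first line is a shebang naming an absolute interpreter path: it rewrites that line to #!/usr/bin/env <interpreter> so the interpreter is found on PATH. use_env_shebang is hypothetical and edits files in place, so try it on a copy first:

```python
import os
import tempfile


def use_env_shebang(path, interpreter):
    """Hypothetical helper: replace an absolute-path shebang with '#!/usr/bin/env <interpreter>'.

    Returns True if the file was changed, False if it had no shebang.
    Reopening the existing file for writing keeps its permission bits.
    """
    with open(path, encoding="utf-8") as fh:
        lines = fh.readlines()
    if not lines or not lines[0].startswith("#!"):
        return False
    lines[0] = f"#!/usr/bin/env {interpreter}\n"
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(lines)
    return True


# Demo on a throwaway file that mimics a script header.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "phasdetect.py")
    with open(script, "w", encoding="utf-8") as fh:
        fh.write("#!/usr/local/bin/python3\nprint('hello')\n")
    changed = use_env_shebang(script, "python3")
    with open(script, encoding="utf-8") as fh:
        first_line = fh.readline()
print(changed, first_line.strip())  # True #!/usr/bin/env python3
```

The same call with "perl" would apply to phasclust.genome.v2.pl.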
