ska's People

Contributors

johnlees, simonrharris

ska's Issues

Less memory for `merge` and `align`

We just had a chat about this. I was trying to merge two kmerge files of about 450-500 MB each and ran out of memory on a 32 GB machine. I have a feeling you could merge with a lot less memory if you did some memory mapping, or if you held the kmers in memory but only kept a reference to the bitstring for the alleles.

You might be able to do something similar to make `align` use a lot less memory, but I'm less sure.

Understanding split kmers produced from fasta files

In trying to understand split kmers produced from fasta files, I produced 3 simple and very non-biologically relevant fasta files.

ref.fas (a string of 160 Gs)

As expected, this produced 1 split kmer, as shown by the humanized summary.

The 2nd fasta file (sample1) had just one polymorphism (G->A) at position 90.

This file produced 31 split kmers as expected, but I think the kmer that should have had an A as the middle base is reported with an N.

The 3rd fasta file (sample2) had a different polymorphism (G->T) at the same position, 90.

The split kmer file showed a similar thing.

And when I calculated the distance between sample 1 and sample 2 using `ska distance sample1.skf sample2.skf -o sample1_vs_sample2`, the output showed no SNPs.
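The counts reported above can be reproduced with a small Python sketch. It is not SKA's implementation: it assumes a flank of 15 bases on each side of the middle base, ignores reverse complements, and assumes that a split kmer seen with more than one middle base is treated as ambiguous and skipped when counting SNPs. The names `split_kmers` and `snp_count` are made up for illustration.

```python
from collections import defaultdict

FLANK = 15  # assumed flank length on each side of the middle base


def split_kmers(seq, flank=FLANK):
    """Map each flanking pair (left + right) to the set of middle bases seen."""
    table = defaultdict(set)
    width = 2 * flank + 1
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        key = window[:flank] + window[flank + 1:]
        table[key].add(window[flank])
    return dict(table)


def snp_count(a, b):
    """Count shared split kmers whose single, unambiguous middle bases differ."""
    count = 0
    for key in a.keys() & b.keys():
        # Assumption: a key with several middle bases is ambiguous and skipped.
        if len(a[key]) == 1 and len(b[key]) == 1 and a[key] != b[key]:
            count += 1
    return count


ref = "G" * 160
sample1 = ref[:89] + "A" + ref[90:]  # G->A at position 90 (1-based)
sample2 = ref[:89] + "T" + ref[90:]  # G->T at position 90 (1-based)

ref_kmers = split_kmers(ref)
s1 = split_kmers(sample1)
s2 = split_kmers(sample2)

print(len(ref_kmers), len(s1), len(s2))
print(sorted(s1["G" * 30]), sorted(s2["G" * 30]))
print(snp_count(s1, s2))
```

Under these assumptions the all-G flanking pair occurs in each sample with two different middle bases (the polymorphism and the surrounding Gs), which would be one possible reason for the ambiguous middle base and for no SNP being counted between the two samples.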

What is `XXXXXX_ddl.fa` ?

I found a file *_ddl.fa in my folder, possibly leftover from an interrupted ska process.

If it is from ska, could you please honour $TMPDIR or use mktemp() which honours it?

It will speed things up for most people, especially where `.` is on slow NFS.
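To illustrate the behaviour being asked for, here is a minimal Python sketch (SKA itself is not written in Python, so this only shows the idea). Python's `tempfile` module consults `TMPDIR` when choosing a directory, much as `mktemp` does. The helper name `make_scratch_fasta` and the file contents are made up.

```python
import os
import tempfile


def make_scratch_fasta(suffix="_ddl.fa"):
    """Create a scratch FASTA file in the system temp directory."""
    # gettempdir() checks TMPDIR, then TEMP and TMP, before platform defaults.
    return tempfile.NamedTemporaryFile(
        mode="w", suffix=suffix, dir=tempfile.gettempdir(), delete=False
    )


with make_scratch_fasta() as fh:
    fh.write(">ddl\nACGT\n")
    scratch_path = fh.name

try:
    print(os.path.dirname(scratch_path))
finally:
    os.remove(scratch_path)  # clean up even if later steps fail
```

The `try`/`finally` cleanup is there so that no stray scratch file is left behind once the file has been created, even if a later step raises.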

Order of `type` columns

Currently
Sample adk atpA ddl gdh gyd pstS purK ST

Any chance of making it compatible with mlst?
Sample ST adk atpA ddl gdh gyd pstS purK

This has the advantage that no matter how many alleles are in the scheme, the ST column will always be in the same place.

But I understand that ST is not compulsory, and that this could be used for assaying all sorts of amplicons such as AMR genes.

Publish the thing

This is a really nice and simple tool to use and it's a shame that it's only the two of us using it 🙂

SKA Type crashes when no allele is found for a locus

If a sample is missing any of the loci supplied to SKA type, it reports a mismatch between the number of alleles and the profile. It would be really useful if SKA type could tolerate a missing locus and put a hyphen in its place, since for some schemes the locus could be genuinely missing in some lineages. Most of the time it would be the result of poor quality sequence data, but it would still be better for the program to report the locus as missing instead of erroring out.
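A minimal Python sketch of the requested behaviour, not SKA's code: missing loci become `-`, and the ST is `-` whenever the profile cannot be resolved. The function name `type_row`, the data structures, and the one-profile scheme are all made up for illustration; the ST is placed in the second column, as suggested in the column-order issue above.

```python
def type_row(sample, loci, calls, profiles):
    """Build one output row: sample, ST, then one allele column per locus.

    calls maps locus -> allele number for the loci that were found.
    profiles maps a tuple of allele numbers -> ST.
    """
    alleles = [calls.get(locus) for locus in loci]
    if None in alleles:
        st = "-"  # the profile cannot be resolved with a locus missing
    else:
        st = profiles.get(tuple(alleles), "-")
    cells = ["-" if a is None else str(a) for a in alleles]
    return [sample, str(st)] + cells


loci = ["adk", "atpA", "ddl", "gdh", "gyd", "pstS", "purK"]
profiles = {(1, 1, 1, 1, 1, 1, 1): 1}  # made-up scheme for illustration

complete = type_row("s1", loci, dict.fromkeys(loci, 1), profiles)
partial = type_row("s2", loci, {"adk": 1, "ddl": 1}, profiles)

print("\t".join(["Sample", "ST"] + loci))
print("\t".join(complete))
print("\t".join(partial))
```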

ska distances: adding isolates to database

Is there any way to add isolates to a database of distances without having to recalculate all of the distances in the database? It would be great if this was possible and the original cluster names stayed the same too for easy modelling over a long time period. Thanks
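The idea of adding isolates without recomputing existing pairs could look like the Python sketch below. It is not part of SKA: `add_isolates`, the `dist_fn` callback, and the toy Hamming distance are hypothetical, and keeping cluster names stable is not addressed here.

```python
def add_isolates(distances, existing, new, dist_fn):
    """Extend a pairwise distance table without recomputing existing pairs.

    distances: dict keyed by frozenset({a, b}) -> distance.
    existing / new: lists of isolate names.
    dist_fn(a, b): hypothetical function returning the distance for one pair.
    Returns the number of pairs that had to be computed.
    """
    computed = 0
    known = list(existing)
    for isolate in new:
        for other in known:
            pair = frozenset((isolate, other))
            if pair not in distances:
                distances[pair] = dist_fn(isolate, other)
                computed += 1
        known.append(isolate)
    return computed


# Toy data: equal-length strings compared by Hamming distance.
seqs = {"a": "ACGT", "b": "ACGA", "c": "ACTT", "d": "TCGT", "e": "ACGT"}


def hamming(x, y):
    return sum(p != q for p, q in zip(seqs[x], seqs[y]))


table = {}
first = add_isolates(table, [], ["a", "b", "c"], hamming)
second = add_isolates(table, ["a", "b", "c"], ["d", "e"], hamming)
print(first, second, len(table))
```

With n existing isolates and m new ones, only the n*m new-versus-old pairs and the m*(m-1)/2 new-versus-new pairs are computed.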

Version the kmers and/or kmerge files

At some point I suspect someone will change the format of the files output by ska, so it might be a good idea to specify a version at the top of the kmers file. That way you can tell whether a given version of the software is compatible with a given input file.
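One way such a check could work, sketched in Python with a text header. The magic string `SKA_KMERS`, the `version=` field, and both function names are invented for illustration and say nothing about SKA's actual file layout.

```python
import io

FORMAT_NAME = "SKA_KMERS"  # hypothetical magic string
FORMAT_VERSION = 1         # hypothetical current format version


def write_header(stream, version=FORMAT_VERSION):
    """Write a one-line header naming the format and its version."""
    stream.write(f"#{FORMAT_NAME}\tversion={version}\n")


def read_header(stream, supported=(FORMAT_VERSION,)):
    """Return the file's format version, or raise if it cannot be used."""
    line = stream.readline().rstrip("\n")
    name, _, field = line.partition("\t")
    if name != f"#{FORMAT_NAME}" or not field.startswith("version="):
        raise ValueError("not a versioned kmers file")
    version = int(field[len("version="):])
    if version not in supported:
        raise ValueError(f"unsupported kmers format version {version}")
    return version


buffer = io.StringIO()
write_header(buffer)
buffer.seek(0)
print(read_header(buffer))
```

Reading the header first lets the program stop with a clear message before it tries to parse a body it does not understand.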

"make PREFIX=/my/dir install" still uses /usr/local

make PREFIX=/home/linuxbrew/.linuxbrew/Cellar/ska/1.0 install
cp bin/ska /usr/local/bin/
cp: cannot create regular file ‘/usr/local/bin/ska’: Permission denied
make: *** [Makefile:34: install] Error 1

I can't figure out why this is happening.

ska annotate --- generate a multi-sample VCF ?

Got this:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
CP006620        3509    .       T       C       .       .       NS=14;NS5=0,2,0,12,0
CP006620        4634    .       A       T       .       .       NS=14;NS5=12,0,0,2,0
CP006620        4937    .       C       T       .       .       NS=14;NS5=0,12,0,2,0

Expected:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO  SAMPLE1 SAMPLE2 SAMPLE3 ....
CP006620        3509    .       T       C       .       .       NS=14;NS5=0,2,0,12,0 GT 1 0 1 0 0 1 2
etc

SKA runs out of memory on 32 GB VM

Please help! I'm running out of memory on a 32 GB VM when trying to merge more than 1000 isolates of Mtb. It maxes out the RAM and the whole VM freezes after reading file 389 every time. Can anyone shed some light on it?

fastq -> READS ; fasta -> CONTIGS ?

Due to differences in command line parameters, I assume fastq is actually for short sequence reads, and fasta for contigs?

What if my reads are in FASTA format? Will fastq barf ?

Remove rare kmers from merge file

Please could you add a command so that I can remove rare kmers from a merge file? I've got a couple of merge files which I'd like to merge but I run out of RAM. If possible I'd like to remove all the kmers which are in fewer than 80% of input sequences because I think that'll make the merge (and later alignment) much more efficient.

ska weed -d 0.8 my_data.kmerge
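The filter requested here amounts to a presence threshold. A minimal Python sketch follows, assuming the merge file can be viewed as a mapping from each kmer to the set of samples that contain it; that representation and the function name `weed` are illustrative, not SKA's.

```python
def weed(presence, n_samples, min_fraction=0.8):
    """Keep kmers present in at least min_fraction of the samples.

    presence maps kmer -> set of sample names containing it.
    """
    threshold = min_fraction * n_samples
    return {k: s for k, s in presence.items() if len(s) >= threshold}


samples = ["s1", "s2", "s3", "s4", "s5"]
presence = {
    "AAAC": {"s1", "s2", "s3", "s4", "s5"},  # 5 of 5 samples
    "AAAG": {"s1", "s2", "s3", "s4"},        # 4 of 5 samples
    "AAAT": {"s1", "s2", "s3"},              # 3 of 5 samples
}

kept = weed(presence, len(samples))
print(sorted(kept))
```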

Consider using either kmerge or kmers as a file name, not both

We had a chat about having a consistent name for the kmer outputs from ska. I like the idea of everything being *.kmers by default. If possible, I'd also prefer you left filenames up to the user, so instead of a prefix they specify the full name for their output.

unique kmer output

Where does the unique.skf file go once we run "ska unique"? Are there any limitations to this that we should know of?
My command runs and does not give me an error. I get the following:
Output will be written to unique.skf
But then, I don't see this file...

ska fastq vs ska fasta

Hi,

I am getting different results (number of SNPs and SNP distances) when I run ska distance with .skf files generated using ska fastq and ska fasta, with the same genome database. Specifically, the number of SNPs is greater when I use the fasta option than when I use fastq. I wonder if this is associated with error/variation introduced during the assembly process (reads were trimmed and quality assessed), or is there any other issue to consider.

Thanks!

One command for `fasta`, `fastq` and `merge`

We had a chat about you combining the following commands:

  • fasta
  • fastq
  • merge

My thought was that you could have one command:

$ ls
lots_of_samples.kmers
sample_a.kmers
sample_b.fasta
sample_c_1.fastq
sample_c_2.fastq

$ ska add -o lots_of_samples.kmers sample_a.kmers sample_b.fasta sample_c_1.fastq:sample_c_2.fastq
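A combined command like this would need to work out what each argument is. The Python sketch below shows one way to do that from the file extension, treating a colon as the separator for paired fastq files as in the example above. The function `classify` and the extension table are assumptions, not an existing SKA interface.

```python
import os

# Hypothetical mapping from file extension to input type.
KINDS = {
    ".kmers": "kmers",
    ".fasta": "fasta",
    ".fa": "fasta",
    ".fastq": "fastq",
    ".fq": "fastq",
}


def classify(arg):
    """Return (kind, files) for one command-line argument."""
    files = arg.split(":")
    kinds = set()
    for name in files:
        ext = os.path.splitext(name)[1].lower()
        if ext not in KINDS:
            raise ValueError(f"unrecognised input type: {name}")
        kinds.add(KINDS[ext])
    if len(kinds) != 1:
        raise ValueError(f"mixed input types in {arg}")
    kind = kinds.pop()
    if len(files) > 1 and kind != "fastq":
        raise ValueError(f"only fastq files can be paired: {arg}")
    return kind, files


args = ["sample_a.kmers", "sample_b.fasta", "sample_c_1.fastq:sample_c_2.fastq"]
for arg in args:
    print(classify(arg))
```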

ska type multiple samples at once?

If the user used -f <file> for the loci, could you allow the positional parameters to be multiple *.skf files so we get a table of ST calls?

Tag a release

Could you tag a release (once ready)? I'll write a conda recipe, which seems to be something I am into at the moment.

kmers split by 2 adjacent SNPs for allele-specific primers

I am interested in using SKA to generate allele-specific LNA-modified primers for clinical outbreak investigation. I have successfully run through the tutorial with a test dataset but am trying to solve the following problem: from a primer-design perspective, it may be optimal to identify split kmers which differ by two adjacent nucleotides rather than just one. Is this something that might be possible with SKA, or would it be within the scope of future releases?

Many thanks,

Dustin
