Comments (4)

ivan-aksamentov commented on May 28, 2024

The hardcoded defaults for v2 are here (branch v2):

impl Default for AlignPairwiseParams {
  fn default() -> Self {
    Self {
      min_length: 100,
      penalty_gap_extend: 0,
      penalty_gap_open: 6,
      penalty_gap_open_in_frame: 7,
      penalty_gap_open_out_of_frame: 8,
      penalty_mismatch: 1,
      score_match: 3,
      max_indel: 400,
      seed_length: 21,
      min_seeds: 10,
      min_match_rate: 0.3,
      seed_spacing: 100,
      mismatches_allowed: 3,
      retry_reverse_complement: false,
      no_translate_past_stop: false,
      left_terminal_gaps_free: true,
      right_terminal_gaps_free: true,
      excess_bandwidth: 9,
      terminal_bandwidth: 50,
      gap_alignment_side: GapAlignmentSide::Right,
    }
  }
}

For v3 (not stable, branch master) the hardcoded defaults are here:

impl Default for AlignPairwiseParams {
  fn default() -> Self {
    Self {
      min_length: 100,
      penalty_gap_extend: 0,
      penalty_gap_open: 6,
      penalty_gap_open_in_frame: 7,
      penalty_gap_open_out_of_frame: 8,
      penalty_mismatch: 1,
      score_match: 3,
      max_band_area: 500_000_000, // requires around 500Mb for paths, 2GB for the scores
      max_indel: 400,             // obsolete
      seed_length: 21,            // obsolete
      min_seeds: 10,              // obsolete
      min_match_rate: 0.3,        // obsolete
      seed_spacing: 100,          // obsolete
      mismatches_allowed: 3,      // obsolete
      retry_reverse_complement: false,
      no_translate_past_stop: false,
      left_terminal_gaps_free: true,
      right_terminal_gaps_free: true,
      gap_alignment_side: GapAlignmentSide::Right,
      excess_bandwidth: 9,
      terminal_bandwidth: 50,
      min_seed_cover: 0.33,
      kmer_length: 10,       // Should not be much larger than 1/divergence of amino acids
      kmer_distance: 50,     // Distance between successive kmers
      min_match_length: 40,  // Experimentally determined, to keep off-target matches reasonably low
      allowed_mismatches: 8, // Ns count as mismatches
      window_size: 30,
      max_alignment_attempts: 3,
    }
  }
}
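
The memory figures in the max_band_area comment can be checked with a little arithmetic. The sketch below is not taken from the Nextclade source: it assumes one byte per cell for the paths matrix and four bytes per cell (for example an i32) for the scores matrix, which is one reading consistent with the "500Mb / 2GB" comment. The function and constant names are made up for illustration.

```rust
/// Default from the v3 snippet above.
pub const MAX_BAND_AREA: u64 = 500_000_000;

/// ASSUMPTION: one byte per cell in the paths matrix.
pub const BYTES_PER_PATH_CELL: u64 = 1;

/// ASSUMPTION: four bytes per cell in the scores matrix (e.g. an i32).
pub const BYTES_PER_SCORE_CELL: u64 = 4;

/// Hypothetical helper: memory needed for a band with `band_area` cells,
/// given the size of one cell in bytes.
pub fn band_memory_bytes(band_area: u64, bytes_per_cell: u64) -> u64 {
    band_area * bytes_per_cell
}
```

Under these assumptions the paths matrix needs 500_000_000 bytes (about 500 MB) and the scores matrix 2_000_000_000 bytes (about 2 GB), matching the comment.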

There are 2 important changes to consider in the upcoming Nextclade v3:

  • The alignment algorithm has changed quite a bit, so the params will change.
  • The Nextalign executable is removed. Instead, Nextclade will take over the same job. In the new dataset format most files will be optional (and the dataset itself is also optional, so individual input args can be used instead). All this is to emulate the interface of Nextalign and to facilitate incremental development of datasets.

Because we are removing Nextalign, it does not make sense to add params into its help text anymore, as we are not planning any more releases.

Regarding Nextclade: datasets can (and do) override parameters (using the virus_properties.json file in v2 and pathogen.json in v3), because different viruses sometimes need different tuning. So I think the displayed hardcoded numbers might be inaccurate and misleading, depending on which dataset you are planning to run. But let me know if you think it makes sense to add the hardcoded defaults to the Nextclade v3 help text anyway.

In the meantime, one thing you can try is adding the -v (--verbose) flag to the run command. The program should then print the final values for this particular run, already taking into account values (in this order) in:

  • dataset (if using Nextclade and if they are defined)
  • CLI args (if an arg is provided)
  • hardcoded defaults
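
To make the layering concrete, here is a minimal sketch of how three sources of values could be merged into the final parameters. It is not the Nextclade implementation. The struct is a hypothetical three-field subset of AlignPairwiseParams, the names Params, ParamsOverride and resolve are invented, and the precedence (CLI args win over dataset values, which win over hardcoded defaults) is an assumption about what "in this order" means.

```rust
/// Hypothetical, trimmed-down stand-in for AlignPairwiseParams.
#[derive(Debug, Clone, PartialEq)]
pub struct Params {
    pub penalty_gap_open: i32,
    pub score_match: i32,
    pub excess_bandwidth: i32,
}

impl Default for Params {
    /// Hardcoded defaults, copied from the snippets above.
    fn default() -> Self {
        Self { penalty_gap_open: 6, score_match: 3, excess_bandwidth: 9 }
    }
}

/// A partial set of values: `None` means "not specified by this source".
#[derive(Debug, Clone, Default)]
pub struct ParamsOverride {
    pub penalty_gap_open: Option<i32>,
    pub score_match: Option<i32>,
    pub excess_bandwidth: Option<i32>,
}

/// Merge the sources field by field.
/// ASSUMED precedence: CLI args > dataset values > hardcoded defaults.
pub fn resolve(dataset: &ParamsOverride, cli: &ParamsOverride) -> Params {
    let defaults = Params::default();
    Params {
        penalty_gap_open: cli
            .penalty_gap_open
            .or(dataset.penalty_gap_open)
            .unwrap_or(defaults.penalty_gap_open),
        score_match: cli
            .score_match
            .or(dataset.score_match)
            .unwrap_or(defaults.score_match),
        excess_bandwidth: cli
            .excess_bandwidth
            .or(dataset.excess_bandwidth)
            .unwrap_or(defaults.excess_bandwidth),
    }
}
```

With a dataset that sets penalty_gap_open and score_match, and a CLI invocation that sets only score_match, the resolved values take score_match from the CLI, penalty_gap_open from the dataset and excess_bandwidth from the defaults. This is why a single hardcoded number in the help text may not match what a given run actually uses.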

UPD:

This statement is incorrect for v2:

already taking into account values (in this order) in

Nextclade/Nextalign v2 only print the CLI args, before merging in the defaults, which is probably not very useful. This will change in v3.

from nextclade.

ivan-aksamentov commented on May 28, 2024

If you want to try Nextclade v3:

You can download prebuilt binaries on GitHub Actions:

Or you can build it from source, from master branch, using our dev guide:
https://github.com/nextstrain/nextclade/blob/master/docs/dev/developer-guide.md

But v3 is not released and not stable yet. It's still a bit of a crazy land, and things might break. If they do, you can try a slightly earlier version in the list of GitHub Actions. When things calm down a bit, we'll probably release an alpha version, or a few.

We appreciate early testing and feedback!

corneliusroemer commented on May 28, 2024

AngieHinrichs commented on May 28, 2024

Thanks @ivan-aksamentov! I will give both a try. I see v3 can be run without a dataset if --input-ref is provided, great. 🚀
