Comments (2)

alisandra commented on August 12, 2024

Hi Sofia,

Thanks for the question.

The --min-coding-length is indeed a length cutoff, but the default is 100 nucleotides of CDS.
That is much shorter than 200 aa (roughly 600 nucleotides), so it's unlikely this threshold is responsible for your missing genes.

A few other parameters may be worth playing with, namely

  • reducing --edge-threshold (default 0.1) may reduce fragmentation of genes (but will increase run time, and in rare cases lead to concatenated gene models)
  • reducing --peak-threshold (default 0.8) may increase recall (but will reduce precision)
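
If it helps to try several values side by side, here is a minimal sketch of how the two thresholds above could be swept. It only builds the argument lists and does not run anything. The helper name `threshold_sweep`, the tag format, and the placeholder base command are hypothetical, and the hyphenated flag spellings are an assumption worth checking against `Helixer.py --help`.

```python
from itertools import product


def threshold_sweep(base_cmd, edge_thresholds, peak_thresholds):
    """Build one (tag, argument list) pair per threshold combination.

    base_cmd is whatever command you already run (hypothetical here);
    the two flags are appended to a copy of it for each combination.
    """
    runs = []
    for edge, peak in product(edge_thresholds, peak_thresholds):
        tag = f"edge{edge}_peak{peak}"  # handy for naming per-run outputs
        args = list(base_cmd) + [
            "--edge-threshold", str(edge),  # assumed spelling, check --help
            "--peak-threshold", str(peak),  # assumed spelling, check --help
        ]
        runs.append((tag, args))
    return runs


# Placeholder base command: substitute your usual invocation and arguments.
base = ["Helixer.py"]
for tag, args in threshold_sweep(base, [0.1, 0.05], [0.8, 0.6]):
    print(tag, " ".join(args))
```

Each run would need its own output file (the tag can serve as a suffix) so that the resulting annotations can be compared for the gene family of interest.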

However, it's likely that the neural net simply didn't learn a good representation for this class of genes, and you're right that retraining may help. Certainly 3,000 gene copies from a single family should be enough to drastically improve performance on that family. While I haven't tried to boost performance by gene family, I can speculate on how I'd try.

Before I do that, a question: are you interested in only that gene family, or whole genome annotations that specifically perform better on that gene family?

srobb1 commented on August 12, 2024

Hi Alisandra,
Thank you for the reply. Looking at the parameters that you listed, my guess is that I would still likely miss my genes. It would probably be best to create a new model using the 3k genes in this family.

On this project (alfalfa plus other plants), I would only need to improve gene models for this gene family. There are published gene models that are good for the whole genome, but they are missing this family, so we tried Helixer specifically to see if we could find those genes, and we didn't.

I have other organisms in totally different projects for which I would like to improve the whole-genome annotations (first in line are a couple of sea anemones and corals). I will try to follow the documentation on how to build models for new organisms. I get a wide variety of species, especially invertebrates, that come to me for structural and functional annotation. Helixer seems like a great option that has been shown to provide good models in a short time for some other species (vertebrates) that I have helped with.

Sofia
