Comments (14)

Zulko commented on July 21, 2024

Gee, that's all more complicated than it should be. I believe there is still something wrong with the MANIFEST, maybe recursive-include instead of include or something like that. I will do some tests on Monday. Thanks for the reports!

from codon-usage-tables.

Zulko commented on July 21, 2024

This may have been a problem with the MANIFEST file. I have now added this file to the manifest and pushed a new version to PyPI. Could you try again and let me know if it works?

weitzner commented on July 21, 2024

Looks like we are hitting a different error now:

Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/_f/5kqpr8kx5zl9qjtkzzkb31xw0000gn/T/pip-install-m1m1kpun/python-codon-tables/setup.py", line 19, in <module>
        with open(os.path.join('python_codon_tables', 'README.rst'), 'r') as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'python_codon_tables/README.rst'

Zulko commented on July 21, 2024

The MANIFEST, again... Sorry for these, they are trivial but they are not caught by the test suite because it doesn't use pip. I'll fix it.
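
One way to catch this kind of packaging gap without involving pip is to inspect the built source distribution directly. The following is only a minimal sketch and not code from this repository: the helper name missing_from_sdist and the idea of passing a list of required files are assumptions made for illustration.

```python
import tarfile


def missing_from_sdist(sdist_path, required):
    """Return the required relative paths that are absent from an sdist.

    Paths inside an sdist are prefixed by a top-level "<name>-<version>/"
    directory, so that first component is stripped before comparing.
    """
    with tarfile.open(sdist_path) as archive:
        names = {
            member.name.split("/", 1)[1]
            for member in archive.getmembers()
            if "/" in member.name
        }
    # Anything listed in `required` but not found in the archive is reported.
    return sorted(path for path in required if path not in names)
```

A test could build the sdist, call this helper with the files that setup.py reads at install time (for example python_codon_tables/README.rst, as in the traceback above), and fail if the returned list is not empty.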

Zulko commented on July 21, 2024

I fixed it on GitHub but I'll only push the new version to PyPI tomorrow. In the meantime you can also install directly from GitHub with pip:

pip install git+https://github.com/Edinburgh-Genome-Foundry/codon-usage-tables.git

weitzner commented on July 21, 2024

Thanks!

Zulko commented on July 21, 2024

Done. Let me know if I can close this one.

weitzner commented on July 21, 2024

Pip install works, but importing fails:

FileNotFoundError: [Errno 2] No such file or directory: '.../miniconda3/envs/codon_harmony/lib/python3.7/site-packages/python_codon_tables/../data/tables'

Zulko commented on July 21, 2024

OK, I managed to reproduce your bug and fix it, at least on my machine. Could you try a pip install --upgrade and let me know if it now works for you? (I am confident it will.)

Beware that today I also changed the API in subtle ways, so make sure you use the methods highlighted in the example. On the bright side, there is a new feature, table = download_codons_table(taxid=XXX), which allows you to get the table for any taxID.

weitzner commented on July 21, 2024

Great! It seems to be working now. I am going to try to integrate this into my project https://github.com/weitzner/codon_harmony Thanks!

Zulko commented on July 21, 2024

Hey, it seems that your project could use DnaChisel, a generic DNA optimization library for Python which I am very proud of (I am surely biased!).

Here is how (some of) your project's specifications would be formulated using DnaChisel. Some specifications might not be exactly as you want them (in particular there has been some discussion around the codon harmonization) but the library is written so as to be easily extended by the user, so maybe it could work for you:

import dnachisel as dc

# GENERATE A RANDOM PROTEIN SEQUENCE FOR THE EXAMPLE
aa_sequence = dc.random_protein_sequence(1000)
dna_sequence = dc.reverse_translate(aa_sequence)

# SPECIFY THE CONSTRAINTS AND OBJECTIVES
problem = dc.DnaOptimizationProblem(
    sequence=dna_sequence,
    constraints=[
        dc.EnforceTranslation(translation=aa_sequence), # keep the protein sequence
        dc.EnforceGCContent(mini=0.3, maxi=0.7, window=70),
        dc.AvoidHairpins(stem_size=10),
        dc.AvoidPattern(dc.repeated_kmers(3, 3)),
        dc.AvoidPattern(dc.repeated_kmers(9, 2)),
        dc.AvoidPattern(enzyme='BsmBI'),
        *(dc.AvoidPattern(dc.homopolymer_pattern(c, 6)) for c in "ATGC")
    ],
    objectives=[
        dc.CodonOptimize(species='e_coli')
    ]
)

# SOLVE THE CONSTRAINTS, THEN OPTIMIZE
print("BEFORE:", problem.constraints_text_summary())
problem.resolve_constraints()
problem.optimize()
print("AFTER:", problem.constraints_text_summary())

Output:

BEFORE: ===> FAILURE: 5 constraints evaluations failed
✔PASS ┍ EnforceTranslation[0-3000(+)]
      │ All OK.
✔PASS ┍ EnforceGCContent[0-3000(+)](mini:0.30, maxi:0.70, window:70)
      │ Passed !
✔PASS ┍ AvoidHairpins[0-3000(+)](stem_size:10, hairpin_window:200)
      │ Score:         0. Locations: []
 FAIL ┍ AvoidPattern[0-3000(+)](([ATGC]{3})\1{2} (3-repeats 3-mers))
      │ Failed. Pattern found at positions [97-106(+), 98-107(+), 99-108(+),
      │ 100-109(+), 172-181(+), 1453-1462(+), 1454-1463(+), 2967-2976(+)]
 FAIL ┍ AvoidPattern[0-3000(+)](([ATGC]{9})\1{1} (2-repeats 9-mers))
      │ Failed. Pattern found at positions [1420-1438(+)]
 FAIL ┍ AvoidPattern[0-3000(+)](enzyme:BsmBI)
      │ Failed. Pattern found at positions [1594-1600(-), 337-343(-)]
 FAIL ┍ AvoidPattern[0-3000(+)](AAAAAA)
      │ Failed. Pattern found at positions [790-796(+), 1226-1232(+),
      │ 1963-1969(+), 1964-1970(+), 1965-1971(+), 2206-2212(+), 2207-2213(+),
      │ 2810-2816(+)]
 FAIL ┍ AvoidPattern[0-3000(+)](TTTTTT)
      │ Failed. Pattern found at positions [2810-2816(-), 2207-2213(-),
      │ 2206-2212(-), 1965-1971(-), 1964-1970(-), 1963-1969(-), 1226-1232(-),
      │ 790-796(-)]
✔PASS ┍ AvoidPattern[0-3000(+)](GGGGGG)
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-3000(+)](CCCCCC)
      │ Passed. Pattern not found !


AFTER: ===> SUCCESS - all constraints evaluations pass
✔PASS ┍ EnforceTranslation[0-3000(+)]
      │ All OK.
✔PASS ┍ EnforceGCContent[0-3000(+)](mini:0.30, maxi:0.70, window:70)
      │ Passed !
✔PASS ┍ AvoidHairpins[0-3000(+)](stem_size:10, hairpin_window:200)
      │ Score:         0. Locations: []
✔PASS ┍ AvoidPattern[0-3000(+)](([ATGC]{3})\1{2} (3-repeats 3-mers))
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-3000(+)](([ATGC]{9})\1{1} (2-repeats 9-mers))
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-3000(+)](enzyme:BsmBI)
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-3000(+)](AAAAAA)
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-3000(+)](TTTTTT)
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-3000(+)](GGGGGG)
      │ Passed. Pattern not found !
✔PASS ┍ AvoidPattern[0-3000(+)](CCCCCC)
      │ Passed. Pattern not found !

weitzner commented on July 21, 2024

Wow, that seems to be very close to what I could use directly. Can a dc.DnaOptimizationProblem contain multiple dc.EnforceGCContent constraints? I have a particular way of defining "harmony" as well as a strategy for determining which codons to enrich and deplete. I'd be interested to see if we could merge the strategies somehow. Let me know if that would be of interest to you!

Zulko commented on July 21, 2024

Yes, you can put several GC content constraints with different bounds, for instance:

constraints = [
    dc.EnforceGCContent(mini=0.4, maxi=0.6), # global GC
    dc.EnforceGCContent(mini=0.3, maxi=0.7, window=100), # windowed GC
    dc.EnforceGCContent(mini=0.2, maxi=0.9, window=30), # smaller-windowed GC
]

Regarding the codon harmonization I would definitely be interested in whether/how your codon optimization can be ported into a Specification class. Most DnaChisel specs implement a strategy in which they "scan" the sequence, spot "underoptimal" regions, and optimize these locally, one after another, from left to right. But a new Specification class can also define its own resolution strategy and you are not obliged to follow this pattern. Could you describe briefly how your harmonization score is computed and how it is optimized?
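
To make the "scan" step concrete, here is a rough sketch of what spotting out-of-range regions could look like for a windowed GC constraint. This is plain Python written for illustration, not DnaChisel's implementation; the function names are hypothetical and the sequence is assumed to be upper-case A/T/G/C.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a DNA string."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def windows_out_of_range(sequence, mini, maxi, window):
    """Scan left to right and return (start, end, gc) for each window
    whose GC content falls outside [mini, maxi]."""
    failures = []
    for start in range(len(sequence) - window + 1):
        gc = gc_content(sequence[start:start + window])
        if not mini <= gc <= maxi:
            failures.append((start, start + window, gc))
    return failures
```

A local optimizer would then only need to revisit the reported windows, one after another, rather than the whole sequence.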

weitzner commented on July 21, 2024

First, a few tolerances are set: a usage frequency below which a codon will be excluded from the set (currently defaults to 0.10, so if a codon is used < 10% of the time, it is not considered here), and a maximum allowed deviation from the host profile (1 + relax in the other package).
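
As a rough illustration of the frequency cutoff (a sketch under assumptions, not the actual codon_harmony code), dropping rare codons and rescaling the rest might look like the following. The function name and the table layout, amino acid mapped to {codon: frequency}, are hypothetical.

```python
def drop_rare_codons(table, threshold=0.10):
    """Remove codons used less often than `threshold` for their amino acid,
    then rescale the remaining frequencies so each amino acid sums to 1.

    `table` maps amino acid -> {codon: relative frequency}. It is assumed
    that at least one codon per amino acid reaches the threshold.
    """
    filtered = {}
    for amino_acid, codons in table.items():
        kept = {c: f for c, f in codons.items() if f >= threshold}
        total = sum(kept.values())
        filtered[amino_acid] = {c: f / total for c, f in kept.items()}
    return filtered
```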

To compute the ideal codon usage, the codon usage tables are updated (rare codons are dropped, frequencies are recomputed), and then, using the AA sequence, the desired use of each codon (as integers) is calculated. After this, the current DNA sequence is scanned with each codon's position(s) and count recorded, and the residual of the expected usage vs observed usage is computed. And then, basically, you just go through the list of codons that are over-represented and replace them with those that are under-represented.
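
The residual step could be sketched roughly as below. This is an illustration only: the function name is hypothetical, the table layout (amino acid mapped to {codon: frequency}) is an assumption, and plain rounding stands in for however the desired integer counts are actually derived.

```python
from collections import Counter


def codon_residuals(dna, aa_sequence, table):
    """Return {codon: observed - desired} for every codon in `table`.

    `table` maps amino acid -> {codon: frequency}, with the frequencies for
    each amino acid summing to 1. Positive values mean over-represented.
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    observed = Counter(codons)
    aa_counts = Counter(aa_sequence)
    residuals = {}
    for amino_acid, freqs in table.items():
        for codon, freq in freqs.items():
            # Simple rounding; it does not guarantee that the desired counts
            # for one amino acid add up to its number of occurrences.
            desired = round(freq * aa_counts[amino_acid])
            residuals[codon] = observed[codon] - desired
    return residuals
```

Codons with a positive residual are candidates to be swapped for synonymous codons with a negative one.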

After all that, the codon adaptation index is computed, and the sequence with the highest CAI that matches the host profile and doesn't have the undesirable features is written to disk.
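
For reference, the codon adaptation index is commonly defined as the geometric mean of each codon's relative adaptiveness. The sketch below follows that textbook definition; whether it matches this project's computation in every detail is an assumption, and the function name and table layout are hypothetical.

```python
import math


def codon_adaptation_index(dna, table):
    """Geometric mean of each codon's relative adaptiveness.

    `table` maps amino acid -> {codon: frequency}. A codon's relative
    adaptiveness is its frequency divided by the highest frequency among
    the codons for the same amino acid. Every codon in `dna` is assumed
    to appear in `table` with a non-zero frequency.
    """
    weights = {}
    for freqs in table.values():
        best = max(freqs.values())
        for codon, freq in freqs.items():
            weights[codon] = freq / best
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    # Average in log space to avoid underflow on long sequences.
    log_sum = sum(math.log(weights[codon]) for codon in codons)
    return math.exp(log_sum / len(codons))
```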
