Comments (2)
Worth considering that GPTMD with the canonical + isoform FASTA database makes this less necessary. smith-chem-wisc/MetaMorpheus#1842
from mzlib.
To do this, I've been getting the isoform nodes within each protein entry, and using the ref label to get which splice variants need to be applied.
Here are some methods I wrote recently in Python for this, in case that's of use later:
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = '{http://uniprot.org/uniprot}'  # UniProt XML namespace prefix


def get_splice_variants(entry: ET.Element) -> dict:
    """
    Extracts splice variants from the provided UniProt entry.

    Args:
        entry (Element): An Element object representing a UniProt entry.

    Returns:
        dict: A dictionary mapping isoform IDs to lists of tuples of
            (feature ID, description, start, end, original, variation).
            start and end are 0-based, end-exclusive slice indices.
    """
    splice_variants = {}
    comment_xpath = NS + 'comment[@type="alternative products"]'
    for comment in entry.findall(comment_xpath):
        for isoform in comment.findall(NS + 'isoform'):
            isoform_id = isoform.findtext(NS + 'id')
            seq = isoform.find(NS + 'sequence')
            # The ref attribute lists the splice variant feature IDs, space separated
            refs = seq.get('ref') if seq is not None else None
            if not refs:
                continue
            variant_info = []
            for ref in refs.split():
                variant = entry.find(NS + "feature[@type='splice variant'][@id='" + ref + "']")
                if variant is None:
                    continue
                description = variant.attrib.get('description')
                # Both are None for a deletion (no original/variation elements)
                original = variant.findtext(NS + 'original')
                variation = variant.findtext(NS + 'variation')
                location = variant.find(NS + 'location')
                begin = location.find(NS + 'begin')
                if begin is None:
                    # Single-residue feature: convert the 1-based position to a one-residue slice
                    start = int(location.find(NS + 'position').attrib['position']) - 1
                    end = start + 1
                else:
                    # 1-based inclusive range -> 0-based, end-exclusive slice
                    start = int(begin.attrib['position']) - 1
                    end = int(location.find(NS + 'end').attrib['position'])
                variant_info.append((ref, description, start, end, original, variation))
            if variant_info:
                splice_variants[isoform_id] = variant_info
    return splice_variants


def apply_splice_variants(
    sequence: str,
    refs: List[Tuple[str, Optional[str], int, int, Optional[str], Optional[str]]],
) -> str:
    """
    Applies the given splice variants to the given sequence.

    Args:
        sequence (str): The sequence to apply the splice variants to.
        refs: A list of (feature ID, description, start, end, original, variation)
            tuples, as returned by get_splice_variants.

    Returns:
        str: The modified sequence with the splice variants applied.
    """
    # Apply splice variants with later positions first so earlier indices stay valid
    for ref in sorted(refs, key=lambda x: x[3], reverse=True):
        ref_id, description, start, end, original, variation = ref
        if variation is not None:
            sequence = sequence[:start] + variation + sequence[end:]
        else:
            # No variation text means the range is deleted
            sequence = sequence[:start] + sequence[end:]
    return sequence
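For anyone who wants to try the idea on a small input, here is a minimal, self-contained sketch. The XML fragment is a made-up toy entry (not real UniProt data), and `isoform_refs`, `feature_edit`, and `apply_edits` are hypothetical helper names used only for illustration; they are not part of mzLib or of the methods above. It shows how the ref attribute ties an isoform to its splice variant features, and why applying the edits from the highest position down keeps the earlier coordinates valid.

```python
import xml.etree.ElementTree as ET

NS = "{http://uniprot.org/uniprot}"

# Toy entry, invented for illustration only (not real UniProt data).
TOY_ENTRY = """
<entry xmlns="http://uniprot.org/uniprot">
  <comment type="alternative products">
    <isoform>
      <id>TOY-2</id>
      <sequence type="described" ref="VSP_1 VSP_2"/>
    </isoform>
  </comment>
  <feature type="splice variant" id="VSP_1" description="In isoform 2.">
    <original>CD</original>
    <variation>XYZ</variation>
    <location><begin position="3"/><end position="4"/></location>
  </feature>
  <feature type="splice variant" id="VSP_2" description="In isoform 2.">
    <location><begin position="7"/><end position="8"/></location>
  </feature>
  <sequence>ABCDEFGHIJ</sequence>
</entry>
"""


def isoform_refs(entry):
    """Map each isoform ID to the feature IDs listed in its ref attribute."""
    refs = {}
    for iso in entry.iter(NS + "isoform"):
        seq = iso.find(NS + "sequence")
        if seq is not None and seq.get("ref"):
            refs[iso.findtext(NS + "id")] = seq.get("ref").split()
    return refs


def feature_edit(entry, feature_id):
    """Return (start, end, replacement) as a 0-based, end-exclusive slice."""
    feat = entry.find(NS + "feature[@id='" + feature_id + "']")
    loc = feat.find(NS + "location")
    begin = loc.find(NS + "begin")
    if begin is None:  # single-residue feature
        start = int(loc.find(NS + "position").get("position")) - 1
        end = start + 1
    else:
        start = int(begin.get("position")) - 1
        end = int(loc.find(NS + "end").get("position"))
    # A feature without a variation element is treated as a deletion.
    return start, end, feat.findtext(NS + "variation") or ""


def apply_edits(sequence, edits):
    """Apply edits right to left so earlier coordinates stay valid."""
    for start, end, replacement in sorted(edits, reverse=True):
        sequence = sequence[:start] + replacement + sequence[end:]
    return sequence


entry = ET.fromstring(TOY_ENTRY)
canonical = entry.findtext(NS + "sequence")  # top-level sequence only
refs = isoform_refs(entry)
edits = [feature_edit(entry, fid) for fid in refs["TOY-2"]]
isoform_seq = apply_edits(canonical, edits)
print(refs)         # {'TOY-2': ['VSP_1', 'VSP_2']}
print(isoform_seq)  # ABXYZEFIJ
```

Applying the same two edits left to right without adjusting offsets would give ABXYZEHIJ instead, because the three-residue replacement shifts everything after it by one.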