Comments (22)
Just for reference: the standard specification contains examples for the possible values in the different sections and elements for mzTab-M 2.0:
https://hupo-psi.github.io/mzTab/2_0-metabolomics-release/mzTab_format_specification_2_0-M_release.html#format-specification
from rmztab-m.
Thanks Johannes, just adding some details to this:
- The validator is not able to parse the different tables from our .mztabm file (even when adding "fake" SML data or a single null entry, as Johannes said). Would you know why that is the case?
- For Johannes' second question, there is also a problem with id_confidence_measure[1]: since we don't identify anything in xcms, it would need to be null, but the format does not allow that.
from rmztab-m.
@jorainer @philouail Which validator are you using? We could think about relaxing the SML section requirement to enable usage of mzTab-M as an intermediate format.
from rmztab-m.
This is the link to the validator: https://apps.lifs-tools.org/mztabvalidator/
Would you know of another one? I'm also worried that something is wrong with our file.
from rmztab-m.
The URL is the right one. Leaving the summary and evidence tables out altogether will fail the parse. Would it be possible for you to share an example file with me so that I can give you direct feedback on it?
from rmztab-m.
@jorainer Concerning semi quantitative abundances: small_molecule-quantification_unit only applies to the values in the SML table, while small_molecule_feature-quantification_unit applies to the values reported in the SMF table. In the default semantic validation mapping file (see https://github.com/HUPO-PSI/mzTab/blob/master/specification_document-releases/2_0-Metabolomics-Release/mzTab_2_0-M_mapping.xml, only applied if you use semantic validation mode), they can have cv terms that are children of any of the following root terms:
<CvTerm termAccession="PRIDE:0000392" useTerm="false" termName="Quantification unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="PRIDE"></CvTerm>
<CvTerm termAccession="UO:0000051" useTerm="false" termName="concentration unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="UO"></CvTerm>
<CvTerm termAccession="MS:1000043" useTerm="false" termName="intensity unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="MS"></CvTerm>
<CvTerm termAccession="UO:0000006" useTerm="false" termName="substance unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="UO"></CvTerm>
Children of PRIDE:0000392
Children of UO:0000051
Children of MS:1000043
Children of UO:0000006
For your particular use-case, following how mzML annotates "intensity" values, you should hopefully be able to use any of the children of MS:1000043. If none of them fit, could you check whether the MS CV contains a more suitable term? We can then update the semantic validation mapping file.
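As a quick way to see which root terms the quoted rules accept, here is a minimal sketch that parses the four CvTerm elements above with Python's standard library. The `<CvTerms>` wrapper element and the function name `allowed_roots` are assumptions made only so the fragment parses on its own; the real mapping file nests these rules inside a larger structure.

```python
import xml.etree.ElementTree as ET

# The four CvTerm rules quoted above, wrapped in a hypothetical <CvTerms>
# root element (not part of the real mapping file) so the fragment is
# well-formed by itself.
FRAGMENT = """<CvTerms>
<CvTerm termAccession="PRIDE:0000392" useTerm="false" termName="Quantification unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="PRIDE"></CvTerm>
<CvTerm termAccession="UO:0000051" useTerm="false" termName="concentration unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="UO"></CvTerm>
<CvTerm termAccession="MS:1000043" useTerm="false" termName="intensity unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="MS"></CvTerm>
<CvTerm termAccession="UO:0000006" useTerm="false" termName="substance unit" isRepeatable="false" allowChildren="true" cvIdentifierRef="UO"></CvTerm>
</CvTerms>"""


def allowed_roots(xml_text):
    """Return (accession, name) pairs for rules that allow child terms."""
    root = ET.fromstring(xml_text)
    return [
        (term.get("termAccession"), term.get("termName"))
        for term in root.iter("CvTerm")
        if term.get("allowChildren") == "true"
    ]


for accession, name in allowed_roots(FRAGMENT):
    print(accession, name)
```

Note that useTerm="false" together with allowChildren="true" means the root term itself is not meant to be used, only its children, so this only lists the branches to pick a unit from.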
from rmztab-m.
Here is an example
test.mztab.txt
I had to switch it to .txt because GitHub does not support .mztab. If you want the original, I can send it by email.
from rmztab-m.
Thanks for the test file, I will check it and report back what we may do with minimal changes to the file. We may need to publish an amendment to the standard, if we decide to relax some requirements to strong recommendations. I will need to update the parser / validator implementation jmztab-m soon anyway, since the OLS 3 validation endpoint no longer appears to work for me. A development version of the web-based validator is now deployed at https://apps.lifs-tools.org/mztabvalidator-dev
from rmztab-m.
Amazing, thanks for the help!
from rmztab-m.
I have updated your file, which now passes validation (in principle):
xcms-test-export.mztab.txt
Validation results (basic and semantic) with the current default mapping file are available here:
https://apps.lifs-tools.org/mztabvalidator/result/a07569b1-3ba5-493b-95d7-bac34ce667b3
The errors shown are all due only to the semantic validation mapping file having required terms. We can create a custom xcms semantic validation mapping file to facilitate further adoption.
This is just a first draft, though. For now, I have added 1 SML entry that does not link to any grouped feature entry, but without abundances. Depending on the workflow, subsequent tools would be able to read the features, run an identification step and record the results in a new mzTab-M file that then contains a proper SML table.
Please note that I added a charge of 1 in the SMF table to all features. This is the (net) charge (positive integer) of the ion / m/z. Not sure if your workflow allows determination of the charge of features and adducts at this level of analysis. If not, please let me know, we should be able to adapt how this is handled.
from rmztab-m.
Validation result with the following relaxed semantic validation file yields only info level messages:
https://apps.lifs-tools.org/mztabvalidator/result/bf180e3b-7e67-4fcb-8dc3-a008027d4d10
An adapted semantic validation mapping file for feature-only files based on XCMS is available here:
https://github.com/HUPO-PSI/mzTab/blob/master/examples/2_0-Metabolomics_Release/mzTab_2_0-M_mapping-xcms.xml
from rmztab-m.
Am a bit late for the party ;)
"Please note that I added a charge of 1 in the SMF table to all features. This is the (net) charge (positive integer) of the ion / m/z."
After xcms preprocessing we actually only have a feature table with (semi-quantitative) abundances, each feature being characterized by an m/z and a retention time value. No additional information (like charge etc.) is available at this stage (we would assume that most features have charge one, but we don't know). That, along with other information like adduct or compound annotation, would need to be added by separate software further downstream in the analysis.
Don't know what's better here - changing the definition of mzTab-M or simply putting (like you did) some best-guess defaults.
For our test file, did I understand correctly that you had to tweak/change the validator to be able to read a mzTab-M without SML?
from rmztab-m.
Understood. That would mean that we need to change charge to optional (nullable) in the spec and update the schema and validator implementation. I would not recommend putting in best guesses: that might lead to confusion about what is meant, and without a clearly defined way of encoding this kind of information, people and tools will each pick their own way to interpret it.
For the example I provided two days ago, I did not adapt the validator, just your file and provided a different semantic mapping file. All linked further up in this thread. But to be able to validate your file without charges, we will need to alter the spec + schema + implementation.
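To illustrate what a nullable charge could look like on the writing side, here is a minimal sketch that renders unknown values as the literal `null` that mzTab uses for missing cells. The helper names and the three-column row are hypothetical and cover only a subset of the SMF columns; whether `null` is accepted for charge is exactly the spec change under discussion, so a current validator may still reject such a row.

```python
def mztab_cell(value):
    """Render a Python value as an mzTab cell; None becomes 'null'."""
    return "null" if value is None else str(value)


def smf_fragment(feature):
    """Join a subset of SMF columns into one tab-separated fragment.

    Illustrative only: a real SMF row has more mandatory columns.
    """
    columns = ("exp_mass_to_charge", "charge", "retention_time_in_seconds")
    return "\t".join(mztab_cell(feature.get(col)) for col in columns)


# An xcms-style feature: m/z and retention time are known, charge is not.
feature = {
    "exp_mass_to_charge": 205.0972,
    "retention_time_in_seconds": 312.4,
    "charge": None,
}
print(smf_fragment(feature))
```

Writing `null` rather than a guessed `1` keeps "unknown" distinguishable from "measured as singly charged" for downstream tools.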
from rmztab-m.
Hi Nils,
Thanks for the feedback and the help.
I completely agree with your perspective. I believe making certain elements nullable would facilitate a more intermediate-style file format, which aligns better with xcms. I just wanted to summarize the points that need to be addressed to create this intermediate-style format.
Regarding the mzTab recommendation (https://hupo-psi.github.io/mzTab/2_0-metabolomics-release/mzTab_format_specification_2_0-M_release.html#metadata-section) and the validator, there are a few points to address:
- small_molecule-quantification_unit: Although it's marked as mandatory, it seems unnecessary for xcms results if the SML table has no entries.
- small_molecule-identification_reliability: This was added to our file, but it is only mandatory in certain cases and not relevant for xcms results.
- id_confidence_measure[1-n]: Similarly, this is mandatory but not relevant for xcms results.
What do you think? I believe these requirements could probably be relaxed in both the file format definition and the validator.
Furthermore:
- reliability in SML: it's nullable in the mzTab file format definition, and it would make sense to change it to optional in the validator for xcms.
- charge in SMF: Making it optional in the validator, as you suggested, is also necessary.
I will adapt my code for some of the other changes you made that make sense in the context of xcms. But in terms of general structure, was it fine for you? Would you rather we wait until the validator is adapted before we publish this export method for xcms results?
from rmztab-m.
Ideally this would be the intermediate file that xcms would provide:
test.mztab.txt
How does this look? (I mainly changed the metadata so it makes sense for an xcms output.) Do you want us to force the input of some other variable?
Also note that we allow optional columns to be passed in the SMF table, as xcms can provide more information than the file format asks for. These would of course follow the required format of opt_column_name.
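A minimal sketch of how extra xcms columns could be mapped to optional column headers. It assumes the `opt_{identifier}_{param_name}` pattern from the mzTab-M specification with the `global` identifier, and assumes that only letters, digits, `_`, `-`, `[`, `]` and `:` are permitted in the name; both points should be checked against the spec. The function name `opt_global_column` and the example xcms column names are hypothetical.

```python
import re


def opt_global_column(name):
    """Build an 'opt_global_' header, replacing disallowed characters with '_'."""
    # Assumed allowed set: A-Z, a-z, 0-9, '_', '-', '[', ']' and ':'.
    cleaned = re.sub(r"[^A-Za-z0-9_\-\[\]:]", "_", name.strip())
    return f"opt_global_{cleaned}"


# Example xcms-style column names (illustrative only).
for column in ("mzmin", "mzmax", "peak width"):
    print(opt_global_column(column))
```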
from rmztab-m.
Hi @nilshoffmann,
How are other software tools handling this? We have examples from MS-Dial in gcms_tms_height, which also had to shoehorn unidentified features into SML.
I haven't found an mzMine3 example yet.
Yours, Steffen
from rmztab-m.