
turbomam / mixs-subset-examples-first

A subset of the MIxS specification that's self-documenting and DataHarmonizer compatible. Comes with valid and invalid data examples. Subset = all checklists and all environmental packages, but only some of their combinations.

Home Page: https://turbomam.github.io/mixs-subset-examples-first/

License: MIT License

Makefile 2.20% Python 97.58% Shell 0.01% Jupyter Notebook 0.04% HTML 0.04% JavaScript 0.06% CSS 0.08%

mixs-subset-examples-first's People

Contributors

only1chunts, turbomam

mixs-subset-examples-first's Issues

make test-examples -> 'Agriculture' is not defined

ERROR:root:Error compiling generated python code: name 'Agriculture' is not defined
Traceback (most recent call last):
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/bin/linkml-run-examples", line 8, in <module>
    sys.exit(cli())
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 306, in cli
    runner.process_examples()
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 136, in process_examples
    self.process_examples_from_list(input_examples, fmt, False)
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 195, in process_examples_from_list
    obj = self._load_from_dict(input_dict, target_class=tc)
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 245, in _load_from_dict
    v2 = self._load_from_dict(v, target_class=islot.range)
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 250, in _load_from_dict
    return [self._load_from_dict(x, target_class) for x in dict_obj]
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 250, in <listcomp>
    return [self._load_from_dict(x, target_class) for x in dict_obj]
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 247, in _load_from_dict
    py_target_class = getattr(self.python_module, camelcase(target_class))
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 99, in python_module
    self._python_module = pygen.compile_module()
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml/generators/pythongen.py", line 82, in compile_module
    raise e
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml/generators/pythongen.py", line 78, in compile_module
    return compile_python(pycode)
  File "/home/mark/.cache/pypoetry/virtualenvs/mixs-subset-examples-first-1BdAgHnd-py3.9/lib/python3.9/site-packages/linkml_runtime/utils/compile_python.py", line 47, in compile_python
    exec(spec, module.__dict__)
  File "test", line 187, in <module>
  File "test", line 195, in Database
NameError: name 'Agriculture' is not defined
make: *** [Makefile:159: examples/output] Error 1
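The last frames show the failure happening while the generated Python module is exec'd: the body of the generated `Database` class refers to `Agriculture`, and that name doesn't exist at that point. One thing worth ruling out is a slot range that doesn't resolve to any class, enum or type. A minimal pre-flight sketch, assuming the schema YAML has already been loaded into a plain dict (e.g. with `yaml.safe_load`); `unresolved_ranges` and `BUILTIN_TYPES` are hypothetical names, the type list is only a subset of LinkML's built-ins, and `slot_usage` overrides are not inspected.

```python
from typing import Dict, List, Optional, Tuple

# Only a subset of LinkML's built-in type names; extend as needed.
BUILTIN_TYPES = {
    "string", "integer", "boolean", "float", "double", "decimal",
    "date", "datetime", "time", "uri", "uriorcurie", "curie", "ncname",
}


def unresolved_ranges(schema: dict) -> Dict[str, str]:
    """Map slot name -> range for every range that names no known class, enum or type.

    `schema` is assumed to be the YAML schema already loaded into a plain dict;
    this does not call LinkML itself.
    """
    known = set(BUILTIN_TYPES)
    for section in ("classes", "enums", "types"):
        known.update((schema.get(section) or {}).keys())

    # (slot name, range) pairs from top-level slots and from class attributes.
    pairs: List[Tuple[str, Optional[str]]] = [
        (name, (spec or {}).get("range"))
        for name, spec in (schema.get("slots") or {}).items()
    ]
    for cls in (schema.get("classes") or {}).values():
        for name, spec in ((cls or {}).get("attributes") or {}).items():
            pairs.append((name, (spec or {}).get("range")))

    return {name: rng for name, rng in pairs if rng and rng not in known}


if __name__ == "__main__":
    example = {
        "classes": {
            "Database": {
                "attributes": {"agriculture_set": {"range": "Agriculture", "multivalued": True}}
            }
        },
        "slots": {"samp_name": {"range": "string"}},
    }
    print(unresolved_ranges(example))  # {'agriculture_set': 'Agriculture'}
```

An empty result doesn't rule out other causes, such as a class being defined after the point where the generated code first needs it.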

filter assets/mixs_combined.tsv on Environmental package = soil, water or (empty)

No conflicts between soil, water and (empty)!

Lots of conflicts with agriculture.
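The issue doesn't define "conflict". Assuming it means the same Structured comment name carrying different values in some other column (Definition, Expected value, and so on) in different packages, a filter-and-compare could look like the sketch below. `find_conflicts` and `load_rows` are hypothetical helpers, and the column names are assumed to match the header of data/mixs_v6_environmental_packages.tsv; a blank package cell is matched by `""`.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_conflicts(
    rows: Iterable[Dict[str, str]],
    packages: Set[str],
    compare: List[str],
) -> Dict[str, Set[Tuple[str, ...]]]:
    """Structured comment names whose `compare` columns take more than one
    distinct combination of values across the selected environmental packages."""
    variants: Dict[str, Set[Tuple[str, ...]]] = defaultdict(set)
    for row in rows:
        # A blank "Environmental package" cell is treated as "".
        if (row.get("Environmental package") or "") not in packages:
            continue
        name = row["Structured comment name"]
        variants[name].add(tuple(row.get(col) or "" for col in compare))
    # More than one distinct value combination for a name counts as a conflict.
    return {name: vals for name, vals in variants.items() if len(vals) > 1}


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a TSV such as assets/mixs_combined.tsv into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))
```

Usage would be along the lines of `find_conflicts(load_rows("assets/mixs_combined.tsv"), {"soil", "water", ""}, ["Definition", "Expected value"])`, then swapping `"agriculture"` into the package set to compare.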

What might have a small number of conflicts?

Environmental package                           Count
agriculture                                     161
air                                             29
built environment                               162
food-animal and animal feed                     105
food-farm environment                           144
food-food production facility                   104
food-human foods                                112
host-associated                                 50
human-associated                                53
human-gut                                       37
human-oral                                      36
human-skin                                      37
human-vaginal                                   45
hydrocarbon resources-cores                     82
hydrocarbon resources-fluids/swabs              86
microbial mat/biofilm                           64
miscellaneous natural or artificial environment 46
plant-associated                                76
sediment                                        70
soil                                            58
symbiont-associated                             72
wastewater/sludge                               40
water                                           85
(empty)
Total Result                                    1754
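The listed counts sum exactly to 1754, and (empty) has no number, which suggests the spreadsheet's count skips blank cells. A sketch that reproduces the per-package counts from the TSV, assuming it has an "Environmental package" column; unlike the spreadsheet, it also counts blank rows, under "(empty)", and `pivot_total` leaves them out of the total. Both function names are hypothetical.

```python
import csv
import io
from collections import Counter


def package_counts(tsv_text: str) -> Counter:
    """Count rows per "Environmental package"; blank cells are reported as "(empty)"."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(
        (row.get("Environmental package") or "").strip() or "(empty)" for row in reader
    )


def pivot_total(counts: Counter) -> int:
    """Total over named packages only, mirroring a count that skips blank cells."""
    return sum(n for pkg, n in counts.items() if pkg != "(empty)")
```

With the real file: `package_counts(Path("assets/mixs_combined.tsv").read_text(encoding="utf-8"))`, with `from pathlib import Path`.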

Make a checklist oriented usage sheet?

Usage of terms in Checklist classes would be completely redundant. In the output of XXX, we informally model these usages (assignments?) in terms of a "core" grouping.

Already have

  • checklist oriented class-slot assignment and requirement sheet: data/core_requirements_recommended_required_curated.tsv

Starting on

  • environmental package usage sheet, based on data/mixs_v6_environmental_packages.tsv

Add env package requirements sheet. Include usage and/or annotations?

Would be based on data/mixs_v6_environmental_packages.tsv:

Environmental package: air
Structured comment name: samp_name
Package item: sample name
Definition: A local identifier or name that for the material sample used for extracting nucleic acids...
Expected value: text
Value syntax: {text}
Example: ISDsoil1
Requirement: M
Preferred unit:
Occurrence: 1
MIXS ID: MIXS:0001107

That doesn't take advantage of previous Environmental package and Structured comment name cleanups that went into XXX?

How to look for malformed names in YAML after the fact?
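One way to look for malformed names after the fact is to load the YAML into a dict and check every slot and class name against a naming convention. A minimal sketch; the conventions (snake_case slots starting with a letter, CamelCase classes) and the helper `malformed_names` are assumptions, not rules taken from MIxS or LinkML.

```python
import re
from typing import Dict, List

# ASSUMED conventions: slots are snake_case starting with a letter,
# classes are CamelCase starting with an uppercase letter.
SLOT_NAME = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
CLASS_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")


def malformed_names(schema: dict) -> Dict[str, List[str]]:
    """Report slot and class names in a loaded schema dict that break the conventions."""
    return {
        "slots": [n for n in (schema.get("slots") or {}) if not SLOT_NAME.fullmatch(n)],
        "classes": [n for n in (schema.get("classes") or {}) if not CLASS_NAME.fullmatch(n)],
    }
```

Names with spaces, hyphens, leading digits or stray capitals all show up in the report, which can then be compared against the cleanups already made upstream.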

Must include

  • Environmental package -> class
  • Structured comment name -> slot
  • Requirement -> recommended and required

Requirement     Count
C               163
E               7
M               191
X               1390
(empty)
Total Result    1751
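The three "Must include" mappings could be applied row by row to data/mixs_v6_environmental_packages.tsv. A sketch, assuming rows come from `csv.DictReader` with the column names in that file's header. `class_name` reproduces the CamelCase names in data/mixs_v6_checklists_env_packages_classes_curated.tsv for the packages tried (e.g. `microbial mat/biofilm` to `MicrobialMatBiofilm`), but it is a hypothetical helper, not LinkML's `camelcase`. Mapping C and E to "recommended" is an assumption; the flattened data/mixs_requirement_codes.tsv doesn't show which column each TRUE falls in.

```python
import re
from typing import Dict, Iterable, List

# M (mandatory) -> required and X (optional) -> neither follow the code names in
# data/mixs_requirement_codes.tsv; treating C and E as "recommended" is an ASSUMPTION.
REQUIREMENT_FLAGS = {
    "M": {"recommended": False, "required": True},
    "C": {"recommended": True, "required": False},
    "E": {"recommended": True, "required": False},
    "X": {"recommended": False, "required": False},
}


def class_name(package: str) -> str:
    """Split on non-alphanumerics and capitalize each word: 'human-gut' -> 'HumanGut'."""
    words = re.split(r"[^0-9A-Za-z]+", package)
    return "".join(w[:1].upper() + w[1:] for w in words if w)


def requirement_rows(rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Environmental package -> class, Structured comment name -> slot,
    Requirement -> recommended/required. Blank or unknown codes are skipped."""
    out: List[Dict[str, object]] = []
    for row in rows:
        flags = REQUIREMENT_FLAGS.get((row.get("Requirement") or "").strip())
        if flags is None:
            continue  # leave (empty) and unexpected codes for manual review
        out.append({
            "class": class_name(row["Environmental package"]),
            "slot": row["Structured comment name"],
            **flags,
        })
    return out
```

Skipping blank codes keeps the 1751 rows honest: nothing gets silently marked optional.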

catalog the manually created or curated files in /data

data/core_requirements_recommended_required_curated.tsv

slot class mixs_requirement_value recommended required
> slot class ignore recommended required
samp_name MigsBa M True

etc.

data/Database.tsv

class tree_root slot range multivalued inlined_as_list
> class tree_root slot range multivalued inlined_as_list
Database true
mimssoil_set MimsSoil true true
Database mimssoil_set
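data/Database.tsv follows a regular pattern: a tree-root container class `Database`, and for each class a multivalued, inlined-as-list `*_set` slot that `Database` uses. A sketch that generates those rows for a list of class names. `database_sheet` is a hypothetical helper, the lowercase-plus-`_set` naming rule is inferred from the single `mimssoil_set` example, and the schemasheets `>` descriptor row is left out. Listing a class such as `Agriculture` here while the class itself is missing from the schema could be one way to get a generated `Database` that references an undefined name.

```python
import csv
import io
from typing import Iterable

HEADER = ["class", "tree_root", "slot", "range", "multivalued", "inlined_as_list"]


def database_sheet(class_names: Iterable[str]) -> str:
    """Build Database.tsv-style rows (header plus data, no descriptor row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=HEADER, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerow({"class": "Database", "tree_root": "true"})
    for cls in class_names:
        # ASSUMED naming rule, inferred from MimsSoil -> mimssoil_set.
        slot = cls.lower() + "_set"
        # Slot definition row: range is the class, stored as an inlined list.
        writer.writerow({"slot": slot, "range": cls, "multivalued": "true", "inlined_as_list": "true"})
        # Assignment row: the Database container uses the slot.
        writer.writerow({"class": "Database", "slot": slot})
    return buf.getvalue()
```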

data/duplicated_scns_by_id.tsv

host_body_product 2 868 888
host_sex 2 811 862
host_symbiont 2 1298 1309
nose_throat_disord 2 270 283
soil_horizon 2 1082 1291
tot_nitro 2 102 530
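The rows above read as: structured comment name, number of IDs, then the IDs, which look like the numeric parts of MIXS IDs. Assuming that reading, a sketch that rebuilds such a list from (name, MIXS ID) pairs; `duplicated_by_id` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def duplicated_by_id(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, List[str]]]:
    """Names tied to more than one distinct ID, as (name, count, sorted IDs), sorted by name."""
    ids: Dict[str, Set[str]] = defaultdict(set)
    for name, mixs_id in pairs:
        ids[name].add(mixs_id)  # repeated (name, same ID) pairs collapse here
    return sorted((name, len(v), sorted(v)) for name, v in ids.items() if len(v) > 1)
```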

data/mixs_combined_all_modified.tsv

data/mixs_combined_all_modified_lossy_deduped.tsv

data/mixs_requirement_codes.tsv

mixs_citation = https://github.com/GenomicsStandardsConsortium/mixs/wiki/5.-MIxS-checklists

mixs_requirement_value mixs_name mixs_desc not applicable optional recommended required
- not applicable descriptor is not applicable for a given checklist type TRUE
C conditional mandatory descriptor must be present for compliance with the checklist, but only when applicable to the study, i.e. if this item is not applicable for the study the metadata data will still be checklist compliant even if it is left out TRUE
E Environment-dependent descriptor must be present depending on the environment the original sample was obtained from TRUE
M mandatory descriptor must be present for compliance with the checklist TRUE
X optional descriptor may be present, not mandatory for compliance with checklist TRUE

data/mixs_v6_checklists_env_packages_classes_curated.tsv

class title aliases class_uri description in_subset is_a mixin mixins
> class title aliases class_uri description in_subset is_a mixin mixins
MigsEu migs_eu MIXS:0010002 Checklist TRUE
MigsBa migs_ba MIXS:0010003 Checklist TRUE
MigsPl migs_pl MIXS:0010004 Checklist TRUE
MigsVi migs_vi MIXS:0010005 Checklist TRUE
MigsOrg migs_org MIXS:0010006 Checklist TRUE
Mims mims MIXS:0010007 Checklist TRUE
MimarksS mimarks_s MIXS:0010008 Checklist TRUE
MimarksC mimarks_c MIXS:0010009 Checklist TRUE
Misag misag MIXS:0010010 Checklist TRUE
Mimag mimag MIXS:0010011 Checklist TRUE
Miuvig miuvig MIXS:0010012 Checklist TRUE
Air air MIXS:0016000 EnvironmentalPackage
BuiltEnvironment built environment MIXS:0016001 EnvironmentalPackage
HostAssociated host-associated MIXS:0016002 EnvironmentalPackage
HumanAssociated human-associated MIXS:0016003 EnvironmentalPackage
HumanGut human-gut MIXS:0016004 EnvironmentalPackage
HumanOral human-oral MIXS:0016005 EnvironmentalPackage
HumanSkin human-skin MIXS:0016006 EnvironmentalPackage
HumanVaginal human-vaginal MIXS:0016007 EnvironmentalPackage
MicrobialMatBiofilm microbial mat/biofilm MIXS:0016008 EnvironmentalPackage
MiscellaneousNaturalOrArtificialEnvironment miscellaneous natural or artificial environment MIXS:0016009 EnvironmentalPackage
PlantAssociated plant-associated MIXS:0016010 EnvironmentalPackage
Sediment sediment MIXS:0016011 EnvironmentalPackage
Soil soil MIXS:0016012 EnvironmentalPackage
WastewaterSludge wastewater/sludge MIXS:0016013 EnvironmentalPackage
Water water MIXS:0016014 EnvironmentalPackage
HydrocarbonResourcesCores hydrocarbon resources-cores MIXS:0016015 EnvironmentalPackage
HydrocarbonResourcesFluidsSwabs hydrocarbon resources-fluids/swabs MIXS:0016016 EnvironmentalPackage
UnknownTerm MIXS:0016017
Agriculture agriculture MIXS:0016018 EnvironmentalPackage
FoodAnimalAndAnimalFeed food-animal and animal feed MIXS:0016019 EnvironmentalPackage
FoodFarmEnvironment food-farm environment MIXS:0016020 EnvironmentalPackage
FoodFoodProductionFacility food-food production facility MIXS:0016021 EnvironmentalPackage
FoodHumanFoods food-human foods MIXS:0016022 EnvironmentalPackage
SymbiontAssociated symbiont-associated MIXS:0016023 EnvironmentalPackage
Checklist
EnvironmentalPackage

use and improve custom mixs jinja

  • don't say class or slot (term)
  • inheritance may not be useful... say combination of ...
  • applicable classes have been lost from slot pages

add structured pattern settings

  • Will this be hard to do with schemasheets? It needs two different inner keys in two different columns.
  • Or put {dicts} in a single column?
  • How about yq?
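In LinkML a slot's `structured_pattern` has a `syntax` with `{name}` placeholders; when `interpolated` is true, each placeholder is replaced with a regex from the schema-level `settings` block. The sketch imitates that expansion for the `16s_recover_software` value syntax `{software};{version};{parameters}` listed under the digit-first-names issue. The regexes in `SETTINGS` are illustrative assumptions, and `expand_syntax` and `matches` are hypothetical helpers, not LinkML functions. As sketched, the literal text outside the braces is itself treated as regex.

```python
import re
from typing import Dict

# Hypothetical settings: placeholder names from the 16s_recover_software syntax,
# regexes invented for illustration.
SETTINGS: Dict[str, str] = {
    "software": r"[^;]+",
    "version": r"[^;]+",
    "parameters": r"[^;]+",
}


def expand_syntax(syntax: str, settings: Dict[str, str]) -> str:
    """Replace each {name} with its regex; a missing setting raises KeyError."""
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)


def matches(value: str, syntax: str, settings: Dict[str, str] = SETTINGS) -> bool:
    """True if the whole value matches the expanded syntax."""
    return re.fullmatch(expand_syntax(syntax, settings), value) is not None
```

The example value for that slot, `rambl;v2;default parameters`, matches; `rambl;v2` does not.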

make sure that linkml-run-examples works for *validation* and *conversion*

from https://github.com/microbiomedata/submission-schema/blob/main/project.Makefile

examples/output/README.md: src/nmdc_submission_schema/schema/nmdc_submission_schema.yaml \
src/data/invalid src/data/valid
	mkdir -p $(dir $@)
	# RDF/TTL generation is failing
	# https://github.com/microbiomedata/submission-schema/issues/13
	$(RUN) linkml-run-examples \
		--output-formats json \
		--output-formats yaml \
		--counter-example-input-directory $(word 2,$^) \
		--input-directory $(word 3,$^) \
		--output-directory $(dir $@) \
		--schema $< > $@

Docs aren't building because of unclean tree

https://github.com/turbomam/mixs-subset-examples-first/actions/runs/4671863919/jobs/8273368033

Or had GH pages just not been set up yet?

Harshad and Sujay have suggested deleting the GH pages branch when this happens, but I didn't even have one when I got this error.

Kai had reported a similar situation, which he solved by allowing write permissions by ? on ??

like this?

https://github.com/turbomam/mixs-subset-examples-first/actions/runs/4668090785/jobs/8264692805

temporarily remove slots with digit-first names

These can't be generated as properties of a class in the Python Data Classes, since Python identifiers can't start with a digit.

MIXS ID Structured comment name identifier Item aliases range structured_pattern minimum_value maximum_value Occurrence - > multivalued, vmap: {1: false, m: true} see_also examples Example todos comments notes Preferred unit Definition Expected value Value syntax temporal likely enum likely semantic root
> slot_uri slot identifier title aliases range structured_pattern minimum_value maximum_value annotations see_also examples ignore todos comments notes annotations annotations annotations annotations ignore ignore ignore
> inner_key: syntax inner_key: occurrence inner_key: global_preferred_unit inner_key: global_raw_definition inner_key: global_expected_value inner_key: global_value_syntax
>
> internal_separator: "|" internal_separator: "|" internal_separator: "|" internal_separator: "|" internal_separator: "|" internal_separator: "|"
MIXS:0000065 16s_recover 16S recovered boolean 1 yes Can a 16S gene be recovered from the submitted SAG or MAG?
MIXS:0000066 16s_recover_software 16S recovery software string {software};{version};{parameters} 1 rambl;v2;default parameters Tools used for 16S rRNA gene extraction names and versions of software(s), parameters used
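`"16s_recover".isidentifier()` is `False`, so a slot with that name can't become a dataclass field. A sketch of the temporary removal, assuming the schema is a plain dict loaded from YAML. `drop_digit_first_slots` is a hypothetical helper: it also prunes class `slots` lists and `slot_usage` entries, but doesn't look anywhere else a slot name might appear.

```python
from typing import Set, Tuple


def drop_digit_first_slots(schema: dict) -> Tuple[dict, Set[str]]:
    """Return a copy of the schema without digit-first slots, plus the names removed.

    The input dict is not modified.
    """
    slots = schema.get("slots") or {}
    dropped = {name for name in slots if name[:1].isdigit()}

    cleaned = dict(schema)  # shallow copy; the keys we change are replaced below
    cleaned["slots"] = {k: v for k, v in slots.items() if k not in dropped}

    classes = {}
    for cname, cdef in (schema.get("classes") or {}).items():
        cdef = dict(cdef or {})
        if "slots" in cdef:
            cdef["slots"] = [s for s in cdef["slots"] if s not in dropped]
        if "slot_usage" in cdef:
            cdef["slot_usage"] = {k: v for k, v in cdef["slot_usage"].items() if k not in dropped}
        classes[cname] = cdef
    cleaned["classes"] = classes
    return cleaned, dropped
```

Returning the dropped names makes it easy to restore them later, once they've been given valid identifiers.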
