kntkb / rna-bgsu
Create RNA dataset for QCArchive submission
Corresponding notebook:
qca-dataset-submission_TEST/generate-dataset.ipynb
Issue:
I get the following error when I try to create a dataset for QCArchive submission using a small portion of my structure (pdb/sdf) data. I can successfully create a dataset for some parts of my data, for example sample/AAA/*.pdb, but it fails for others.
Deduplication : 62%|██████▉ | 30/48 [00:00<00:00, 31.90it/s]
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Input In [101], in <cell line: 1>()
----> 1 dataset = factory.create_dataset(dataset_name="RNA Single Point Dataset v1.0",
2 molecules=mols,
3 tagline="QM dataset for ML",
4 description="Data source: https://github.com/kntkb/rna_bgsu")

File ~/mambaforge/envs/qcarchive-user-submit/lib/python3.10/site-packages/openff/qcsubmit/factories.py:438, in BaseDatasetFactory.create_dataset(self, dataset_name, molecules, description, tagline, metadata, processors, toolkit_registry, verbose)
435 toolkit_registry = GLOBAL_TOOLKIT_REGISTRY
437 # create an initial component result
--> 438 workflow_molecules = self._create_initial_component_result(
439 molecules=molecules, toolkit_registry=toolkit_registry
440 )
442 # create the dataset
443 # first we need to instance the dataset and assign the metadata
444 object_meta = self.dict(exclude={"workflow", "type"})

File ~/mambaforge/envs/qcarchive-user-submit/lib/python3.10/site-packages/openff/qcsubmit/factories.py:374, in BaseDatasetFactory._create_initial_component_result(self, molecules, toolkit_registry)
364 workflow_molecules = ComponentResult(
365 component_name=self.type,
366 component_description={"type": self.type},
(...)
370 ],
371 )
373 else:
--> 374 workflow_molecules = ComponentResult(
375 component_name=self.type,
376 component_description={"type": self.type},
377 component_provenance=self.provenance(toolkit_registry=toolkit_registry),
378 molecules=molecules,
379 )
381 return workflow_molecules

File ~/mambaforge/envs/qcarchive-user-submit/lib/python3.10/site-packages/openff/qcsubmit/workflow_components/utils.py:631, in ComponentResult.__init__(self, component_name, component_description, component_provenance, molecules, input_file, input_directory, skip_unique_check, verbose)
623 if molecules is not None:
624 for molecule in tqdm.tqdm(
625 molecules,
626 total=len(molecules),
(...)
629 disable=not verbose,
630 ):
--> 631 self.add_molecule(molecule)

File ~/mambaforge/envs/qcarchive-user-submit/lib/python3.10/site-packages/openff/qcsubmit/workflow_components/utils.py:727, in ComponentResult.add_molecule(self, molecule)
717 if not self.skip_unique_check and molecule_hash in self._molecules:
718 # we need to align the molecules and transfer the coords and properties
719 # get the mapping, drop some comparisons to match inchikey
720 isomorphic, mapping = off.Molecule.are_isomorphic(
721 molecule,
722 self._molecules[molecule_hash],
(...)
725 bond_order_matching=False,
726 )
--> 727 assert isomorphic is True
728 # transfer any torsion indexes for similar fragments
729 if "dihedrals" in molecule.properties:
730 # we need to transfer the properties; get the current molecule dihedrals indexer
731 # if one is missing create a new one

AssertionError:
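
The traceback shows the assertion firing in ComponentResult.add_molecule when a molecule's hash is already in self._molecules but are_isomorphic reports that the two molecules do not match. One way to narrow down which input files trigger this might be to replay that check outside the factory. Below is a minimal sketch in plain Python; find_hash_collisions, key, and equivalent are hypothetical names introduced here (not part of openff-qcsubmit), and the toy tuples only stand in for real molecules.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def find_hash_collisions(
    items: Sequence[T],
    key: Callable[[T], Hashable],
    equivalent: Callable[[T, T], bool],
) -> List[Tuple[int, int]]:
    """Return (first_index, later_index) pairs that share a key but fail
    the equivalence check against the first item stored under that key.

    This mirrors the pattern visible in the traceback: look the hash up,
    compare against the stored entry, and expect the comparison to pass.
    """
    first_seen: Dict[Hashable, int] = {}
    collisions: List[Tuple[int, int]] = []
    for index, item in enumerate(items):
        k = key(item)
        if k in first_seen:
            stored_index = first_seen[k]
            if not equivalent(items[stored_index], item):
                collisions.append((stored_index, index))
        else:
            first_seen[k] = index
    return collisions


# Toy stand-ins (assumption): each "molecule" is (hash-like label, connectivity).
toy_molecules = [
    ("C2H6O", "CCO"),
    ("C2H6O", "CCO"),  # same label, same connectivity: fine
    ("C2H6O", "COC"),  # same label, different connectivity: collision
    ("CH4", "C"),
]

collisions = find_hash_collisions(
    toy_molecules,
    key=lambda mol: mol[0],
    equivalent=lambda a, b: a[1] == b[1],
)
print(collisions)
```

For the real data, key and equivalent would be swapped for whatever hash and isomorphism check the factory applies. The traceback shows off.Molecule.are_isomorphic for the comparison and a comment mentioning inchikey for the hash, but the exact hash call is not visible here, so treat that part as an assumption.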