rdkit / mmpdb
A package to identify matched molecular pairs and use them to predict property changes.
License: Other
Hi, I've built my MMP database for a set of compounds but am struggling to generate the output I would like.
My use case is a pretty typical one: finding changes that lead to large property changes in compounds obtained from patents.
c1ccccc1O X1
c1ccccc1OC X2
c1ccccc1N X3
c1ccccc1OC1CC1 X4
c1cc(Cl)ccc1O X5
What I'm hoping to do is generate a list of matched pairs for a given processed compound, e.g. X1:
c1ccccc1* X1 *O X2 *OC
c1ccccc1* X1 *O X3 *N
c1ccccc1* X1 *O X4 *OC1CC1
etc.
Is this possible?
Thanks
Mike
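One way to get exactly this listing is to query the generated SQLite file directly. The following is a sketch under assumptions: the table and column names (compound, pair, rule_environment, rule, rule_smiles, public_id, and the *_id columns) are my reading of the mmpdb schema and may differ between versions, so check yours first with `sqlite3 test.mmpdb ".schema"`.

```python
import os
import sqlite3
import tempfile

# Assumed schema names throughout -- verify against your own database.
PAIRS_FOR_COMPOUND = """
SELECT c1.public_id, fs.smiles, c2.public_id, ts.smiles
  FROM pair p
  JOIN compound c1 ON p.compound1_id = c1.id
  JOIN compound c2 ON p.compound2_id = c2.id
  JOIN rule_environment re ON p.rule_environment_id = re.id
  JOIN rule r ON re.rule_id = r.id
  JOIN rule_smiles fs ON r.from_smiles_id = fs.id
  JOIN rule_smiles ts ON r.to_smiles_id = ts.id
 WHERE c1.public_id = ?
"""

def pairs_for_compound(db_path, compound_id):
    """List (id1, from_fragment, id2, to_fragment) rows for one compound."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(PAIRS_FOR_COMPOUND, (compound_id,)).fetchall()

# Demo on a minimal stand-in database (the X1 -> X2 pair from above).
path = os.path.join(tempfile.mkdtemp(), "demo.mmpdb")
with sqlite3.connect(path) as conn:
    conn.executescript("""
        CREATE TABLE compound (id INTEGER PRIMARY KEY, public_id TEXT);
        CREATE TABLE rule_smiles (id INTEGER PRIMARY KEY, smiles TEXT);
        CREATE TABLE rule (id INTEGER PRIMARY KEY, from_smiles_id INTEGER, to_smiles_id INTEGER);
        CREATE TABLE rule_environment (id INTEGER PRIMARY KEY, rule_id INTEGER);
        CREATE TABLE pair (id INTEGER PRIMARY KEY, compound1_id INTEGER,
                           compound2_id INTEGER, rule_environment_id INTEGER);
        INSERT INTO compound VALUES (1, 'X1'), (2, 'X2');
        INSERT INTO rule_smiles VALUES (1, '[*:1]O'), (2, '[*:1]OC');
        INSERT INTO rule VALUES (1, 1, 2);
        INSERT INTO rule_environment VALUES (1, 1);
        INSERT INTO pair VALUES (1, 1, 2, 1);
    """)

print(pairs_for_compound(path, "X1"))  # [('X1', '[*:1]O', 'X2', '[*:1]OC')]
```

The same query with a different WHERE clause (or none) dumps the full pair list.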
The transform rules in mmpdblib appear to miss some apparent cases.
A test case with the following structures:
OC(c(cccc1)c1O)=O mol1
CCCCCCCC(c(cc1)cc(C(O)=O)c1O)=O mol2
CCCCCC(c(cc1)cc(C(O)=O)c1O)=O mol3
with some properties:
ID prop
mol1 0.0
mol2 1.0
mol3 1.5
I performed the fragmentation, index and property loading as instructed.
python -m mmpdblib fragment test_struct.tsv --max-rotatable-bonds 20 --num-cuts 3 -o test.fragments
python -m mmpdblib index test.fragments -o test.mmpdb
python -m mmpdblib loadprops --properties test_prop.tsv test.mmpdb
The indexed pairs make sense.
However, when I run:
python -m mmpdblib transform --smiles 'OC(c(cccc1)c1O)=O' test.mmpdb --explain
I noticed that I cannot get mol2 or mol3, even though the rules mol1->mol2 and mol1->mol3 are included in the index step. Did I miss something here? Thank you for your help.
Here's the explanation output:
WARNING: APSW not installed. Falling back to Python's sqlite3 module.
Processing fragment Fragmentation(1, 'N', 7, '1', '*c1ccccc1O', '0', 3, '1', '*C(=O)O', 'O=CO')
variable '*c1ccccc1O' not found as SMILES '[*:1]c1ccccc1O'
No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(1, 'N', 3, '1', '*C(=O)O', '0', 7, '1', '*c1ccccc1O', 'Oc1ccccc1')
variable '*C(=O)O' not found as SMILES '[*:1]C(=O)O'
No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(2, 'N', 6, '11', '*c1ccccc1*', '01', 4, '12', '*C(=O)O.*O', None)
variable '*c1ccccc1*' not found as SMILES '[*:1]c1ccccc1[*:2]'
variable '*c1ccccc1*' not found as SMILES '[*:2]c1ccccc1[*:1]'
No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(1, 'N', 1, '1', '*O', '0', 9, '1', '*c1ccccc1C(=O)O', 'O=C(O)c1ccccc1')
variable '*O' not found as SMILES '[*:1]O'
No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(1, 'N', 9, '1', '*c1ccccc1C(=O)O', '0', 1, '1', '*O', 'O')
variable '*c1ccccc1C(=O)O' not found as SMILES '[*:1]c1ccccc1C(=O)O'
No matching rule SMILES found. Skipping fragment.
Processing fragment Fragmentation(2, 'N', 6, '11', '*c1ccccc1*', '01', 4, '12', '*O.*C(=O)O', None)
variable '*c1ccccc1*' not found as SMILES '[*:1]c1ccccc1[*:2]'
variable '*c1ccccc1*' not found as SMILES '[*:2]c1ccccc1[*:1]'
No matching rule SMILES found. Skipping fragment.
== Product SMILES in database: 0 ==
ID SMILES prop_from_smiles prop_to_smiles prop_radius prop_fingerprint prop_rule_environment_id prop_count prop_avg prop_std
prop_kurtosis prop_skewness prop_min prop_q1 prop_median prop_q3 prop_max prop_paired_t prop_p_value
This isn't so much a bug report as an open question: is there any software like mmpdb that works strictly off chemical formulas (as opposed to chemical structures)? I am working with mass spectrometry data and could very much use such functionality. Thank you in advance.
Currently mmpdb seems to only support SMILES. But rdkit can natively support CXSMILES. Is it possible to extend mmpdb to support CXSMILES?
For our work, we were initially using CXSMILES with a comma delimiter. Using a csvwriter, we enclosed the CXSMILES in double quotes so that a csvreader would know not to split on the commas inside the quotes. But as it turns out, mmpdb just uses Python's split() method, which does not ignore the commas inside quotes. So this doesn't work for CXSMILES.
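The failure mode is easy to reproduce with plain Python, and the standard library's csv module already handles the quoting correctly (the CXSMILES below is just an illustrative stand-in with commas in its coordinate block):

```python
import csv
import io

# A CXSMILES can contain commas (e.g. in a coordinate block), so a naive
# split(",") shreds the field even when it is properly double-quoted.
row_text = '"CCO |(0,0,;1.37,0,;2.05,1.19,)|",ethanol\n'

naive = row_text.strip().split(",")
assert len(naive) > 2  # the quoted field was split apart

# csv.reader respects the double quotes and returns two fields.
parsed = next(csv.reader(io.StringIO(row_text)))
assert parsed[1] == "ethanol"
assert parsed[0].startswith("CCO |(")  # the CXSMILES survived intact
```

Switching mmpdb's delimited-input parsing from split() to csv.reader (or making the delimiter/quoting configurable) would make quoted CXSMILES round-trip cleanly.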
This is not clear to me. Thanks
Dear developer,
I would like to ask how to build an mmpdb database from a large data set.
I tried to build mmpdb with ChEMBL 28 data. First, I made chunk files from over 1 million SMILES taken from the ChEMBL database.
Then I made fragment files from the chunk data and merged them into one file.
Finally I ran the mmpdb index command against the merged fragment data, but the process was killed due to lack of memory.
Is there any way to build mmpdb from such a large set of fragments?
My environment has 32 GB RAM.
Any advice or suggestions are greatly appreciated.
Thanks,
Taka
I have a large database (500K compounds) and I am interested in finding only a few transforms.
Ideally I would like to give the transform in the form of SMIRKS.
I understand that it might be easier to ask for a different fragmentation pattern and perform indexing on it.
I can translate the SMIRKS into SMARTS specifying the specific bonds.
For the tool to be useful I would like to be able to provide more than one SMARTS to the --cut-smarts option.
It would be excellent if an option like --cache allowed using an existing fragmentation file and enhancing it with additional cut patterns.
Thanks.
marco
Hi there,
I work at a contract research organisation (CRO) that has recently been interested in implementing mmpdb as a part of a drug discovery pipeline. A step of this implementation involves checking for potential security issues before it can be installed internally.
This check was failed due to two areas in the code where SQL injection vulnerabilities appear. For context, an SQL injection vulnerability is a technique in which a call normally used to execute SQL queries could be used by a malicious user to execute unintended actions, like the exposure of sensitive/confidential information in databases or the installation of malware etc.
Our IT team flagged two areas of the code where the vulnerabilities appear. Here is the first (in mmpdblib/peewee.py):
def execute_sql(self, sql, params=None, require_commit=True):
    logger.debug((sql, params))
    with self.exception_wrapper():
        cursor = self.get_cursor()
        try:
            cursor.execute(sql, params or ())
        except Exception as exc:
            if self.get_autocommit() and self.autorollback:
                self.rollback()
            if self.sql_error_handler(exc, sql, params, require_commit):
                raise
        else:
            if require_commit and self.get_autocommit():
                self.commit()
    return cursor
The second is in the same location, so I assume their scanner has flagged the same SQL injection issue twice.
Would you be interested in addressing this particular vulnerability? If not, this isn't an immediate problem, as we can address it internally and submit a PR with the fix. Let me know how you'd like to proceed with this, if at all.
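For context, the flagged method does pass user values as `params` to cursor.execute(sql, params), which is sqlite3's parameterized (safe) path, so this may well be a scanner false positive on the dynamic `sql` argument. The distinction the scanner cares about can be sketched with the stdlib:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (id INTEGER, public_id TEXT)")
conn.execute("INSERT INTO compound VALUES (1, 'X1')")

user_input = "X1' OR '1'='1"  # classic injection payload

# Unsafe: string interpolation lets the payload rewrite the query,
# so the WHERE clause becomes always-true and matches every row.
unsafe_sql = "SELECT id FROM compound WHERE public_id = '%s'" % user_input
assert conn.execute(unsafe_sql).fetchall() == [(1,)]

# Safe: a bound parameter is treated strictly as data, never as SQL.
safe = conn.execute(
    "SELECT id FROM compound WHERE public_id = ?", (user_input,)
).fetchall()
assert safe == []  # no compound is literally named the payload string
```

If the scanner cannot be convinced, an annotation/suppression on that line plus a note that `params` is always bound may be the least invasive fix.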
Thanks
Hi developers,
I have been trying out mmpdb's "transform" function. I found that for 2-cut and 3-cut transforms, the newly generated SMILES mess up the atom mappings in the transform rule.
For example,
Original SMILES | New SMILES | from_smiles | to_smiles |
---|---|---|---|
CC(C)c1nc(C(=O)NCc2ccccc2)no1 | CC(C)CNC(=O)N1CCC(c2ccccc2)C1 | [*:1]CNC(=O)c1noc([*:2])n1 | [*:1]CNC(=O)N1CCC([*:2])C1 |
CC(C)c1nc(C(=O)NCc2ccccc2)no1 | CC(C)CNC(=O)c1cc(-c2ccccc2)no1 | [*:1]CNC(=O)c1noc([*:2])n1 | [*:1]CNC(=O)c1cc([*:2])no1 |
I expect the transformed linkers (i.e. "to_smiles") to connect to the two unchanged fragments at the same attachment points (*1 and *2) as the old linkers (i.e. "from_smiles"). However, the newly generated molecules (i.e. "New SMILES") flip the transformed linker over. In other words, the atom mappings from "from_smiles" to "to_smiles" are correct, but the atom mappings in the newly generated whole molecule are incorrect.
Would you mind taking a look at this issue?
Thanks,
Cheng
At this point the CSV output generated with mmpdb index --out csv [...] does not contain property information, even if you specify a property file with --properties.
If a property file is explicitly given, it would be nice if information like the property values of compounds 1 and 2, and the property change during the transformation, were included in the resulting CSV file.
Hi all,
I would like to know if there is a function to show the number of pairs and rules in an already generated .mmpdb file. When I first generate the .mmpdb database, the number of pairs and rules is shown on the screen, but I would like to review this information without regenerating the database.
I would appreciate it if anyone could help me figure out a way.
Thanks,
Cheng
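Two possibilities, both hedged: if I remember correctly, the `mmpdb list` subcommand prints per-database counts (worth checking `mmpdb list --help`); failing that, the numbers can be read from the SQLite file directly. A sketch, assuming the table names (compound, rule, pair, rule_environment) match your schema:

```python
import os
import sqlite3
import tempfile

def mmpdb_summary(db_path):
    """Row counts for a few core mmpdb tables. Table names are assumptions
    based on the published schema; verify with `.schema` if yours differ."""
    tables = ("compound", "rule", "pair", "rule_environment")
    with sqlite3.connect(db_path) as conn:
        return {t: conn.execute("SELECT COUNT(*) FROM %s" % t).fetchone()[0]
                for t in tables}

# Demo on a minimal stand-in file.
path = os.path.join(tempfile.mkdtemp(), "demo.mmpdb")
with sqlite3.connect(path) as conn:
    conn.executescript("""
        CREATE TABLE compound (id INTEGER PRIMARY KEY);
        CREATE TABLE rule (id INTEGER PRIMARY KEY);
        CREATE TABLE pair (id INTEGER PRIMARY KEY);
        CREATE TABLE rule_environment (id INTEGER PRIMARY KEY);
        INSERT INTO compound VALUES (1), (2), (3);
        INSERT INTO rule VALUES (1);
        INSERT INTO pair VALUES (1), (2);
        INSERT INTO rule_environment VALUES (1);
    """)

print(mmpdb_summary(path))
# {'compound': 3, 'rule': 1, 'pair': 2, 'rule_environment': 1}
```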
When I try to execute the mmpdb index command on a large fragments file (>5M structures), Python runs out of memory even on a very big Linux node with more than 700 GB of RAM.
Can anything be done to process such big databases?
Thanks.
I'm not sure what "--output mmpa" does but it gives the following error for the GitHub version:
$ mmpdb index myfile.fragments -o myfile.mmpa --out mmpa
...
File "mmpdb/mmpdblib/index_writers.py", line 106, in add_environment_fingerprint_parent
self._W("FINGERPRINT\t%d\t%s\n" % (fp_idx, environment_fingerprint, parent_idx))
TypeError: not all arguments converted during string formatting
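The traceback looks like a plain %-formatting mismatch: the template names two placeholders while three values are supplied. A minimal reproduction, with a third placeholder added as the likely (unconfirmed) intent:

```python
# Reproduce the mismatch from the traceback: two placeholders, three values.
fp_idx, environment_fingerprint, parent_idx = 7, "A1B2", 3

try:
    "FINGERPRINT\t%d\t%s\n" % (fp_idx, environment_fingerprint, parent_idx)
except TypeError as exc:
    # This is the exact error reported above.
    assert "not all arguments converted" in str(exc)

# A template carrying all three values formats cleanly (whether the third
# field is what the author intended is for the maintainers to confirm).
line = "FINGERPRINT\t%d\t%s\t%s\n" % (fp_idx, environment_fingerprint, parent_idx)
assert line == "FINGERPRINT\t7\tA1B2\t3\n"
```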
Dear developer,
I use vscode with Miniconda3-4.5.4 (python=3.6) in Windows.
When I run the command line shown in the "Fragment structures" section of README.md:
mmpdb fragment test_data.smi -o test_data.fragments
I get the error message:
Traceback (most recent call last):
File "C:/Users/User/miniconda3/envs/mmpdb/Scripts/mmpdb", line 4, in <module>
__import__('pkg_resources').run_script('mmpdb==2.3.dev1', 'mmpdb')
File "C:\Users\User\miniconda3\envs\mmpdb\lib\site-packages\pkg_resources\__init__.py", line 651, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "C:\Users\User\miniconda3\envs\mmpdb\lib\site-packages\pkg_resources\__init__.py", line 1448, in run_script
exec(code, namespace, namespace)
File "c:\users\user\miniconda3\envs\mmpdb\lib\site-packages\mmpdb-2.3.dev1-py3.6.egg\EGG-INFO\scripts\mmpdb", line 11, in <module>
commandline.main()
File "C:\Users\User\miniconda3\envs\mmpdb\lib\site-packages\mmpdb-2.3.dev1-py3.6.egg\mmpdblib\commandline.py", line 1054, in main
parsed_args.command(parsed_args.subparser, parsed_args)
File "C:\Users\User\miniconda3\envs\mmpdb\lib\site-packages\mmpdb-2.3.dev1-py3.6.egg\mmpdblib\commandline.py", line 181, in fragment_command
do_fragment.fragment_command(parser, args)
File "C:\Users\User\miniconda3\envs\mmpdb\lib\site-packages\mmpdb-2.3.dev1-py3.6.egg\mmpdblib\do_fragment.py", line 567, in fragment_command
pool = create_pool(args.num_jobs)
File "C:\Users\User\miniconda3\envs\mmpdb\lib\site-packages\mmpdb-2.3.dev1-py3.6.egg\mmpdblib\do_fragment.py", line 396, in create_pool
pool = multiprocessing.Pool(num_jobs, init_worker)
File "C:\Users\User\miniconda3\envs\mmpdb\lib\multiprocessing\context.py", line 119, in Pool
context=self.get_context())
File "C:\Users\User\miniconda3\envs\mmpdb\lib\multiprocessing\pool.py", line 174, in __init__
self._repopulate_pool()
File "C:\Users\User\miniconda3\envs\mmpdb\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
w.start()
File "C:\Users\User\miniconda3\envs\mmpdb\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\User\miniconda3\envs\mmpdb\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\User\miniconda3\envs\mmpdb\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\User\miniconda3\envs\mmpdb\lib\multiprocessing\spawn.py", line 172, in get_preparation_data
main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'
I have tried many suggested fixes for this error, but nothing changed.
It would be great if someone could help me fix it.
Thanks.
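The error itself can be reproduced without mmpdb: multiprocessing's spawn start method reads `__main__.__spec__`, and a `__main__` that was exec'd by a pkg_resources launcher may lack that attribute entirely. A commonly reported workaround (an assumption, not an official fix) is to give `__main__` a `None` `__spec__` before the pool is created; sketched here on a throwaway module:

```python
import types

# A module created outside the normal import machinery (as pkg_resources
# script launchers effectively do for __main__) can lack __spec__.
fake_main = types.ModuleType("fake_main")
print(hasattr(fake_main, "__spec__"))  # False on the affected setups

# Workaround: provide a None __spec__, so multiprocessing's
# getattr(main_module.__spec__, "name", None) yields None instead of
# raising AttributeError.
fake_main.__spec__ = None
assert getattr(fake_main.__spec__, "name", None) is None
```

Since the traceback shows the failure inside create_pool(args.num_jobs), running the fragmenter with a single job (check `mmpdb fragment --help` for the exact flag) may also sidestep the worker pool; upgrading mmpdb/Python is the cleaner long-term fix.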
When running a transform query, the enumerator creates only one of the several possible compounds if the fragment to be replaced is symmetric.
Here is a simplified example:
Running a transform query with this input compound
C1CCC1CCN2CC2
yields (among others) this fragmentation:
constant: C1CCC1*.C2CN2*
variable: CC
Note that the variable linker is symmetric.
The MMP database now contains transformations like
CC >> C(C)C
which should produce these two compounds:
C1CCC1C(C)CN2CC2
C1CCC1CC(C)N2CC2
, depending on which way round the transformation is applied. However, it only produces one of them. (The other one is produced too, but based on a different rule with far fewer pairs.)
Hi authors,
I thought this tool could automatically find the MMPs in a group of molecules.
For example, if mmpdb is given an SDF, CSV, or SMILES file, it would generate a resulting file containing all the MMPs from the given file.
However, when I read the paper, it seems that the user needs to provide user-defined cutting patterns (the constants part in the paper).
Is mmpdb an interactive MMP generation tool?
Best,
PK
Hi all,
I am aware that we can adjust the max-radius parameter to set the maximum environment radius to be indexed in mmpdb. But I wonder if there is a way to index the database at only one specific radius. For example, could we generate the mmpdb at radius = 3 specifically?
Thanks!
Cheng
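I don't know of an index-time switch for this, but the analysis can be restricted after the fact: rule environments carry their radius in the database, so a query can keep only radius 3. A sketch (table/column names assumed from the schema discussed elsewhere in this thread; verify with `.schema`):

```python
import os
import sqlite3
import tempfile

def environments_at_radius(db_path, radius=3):
    """Return (id, rule_id, radius) for rule environments at one radius.
    Assumes a rule_environment table with a radius column."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            "SELECT id, rule_id, radius FROM rule_environment WHERE radius = ?",
            (radius,)).fetchall()

# Demo: one environment per radius 0..5 for a single rule.
path = os.path.join(tempfile.mkdtemp(), "demo.mmpdb")
with sqlite3.connect(path) as conn:
    conn.execute("CREATE TABLE rule_environment "
                 "(id INTEGER PRIMARY KEY, rule_id INTEGER, radius INTEGER)")
    conn.executemany("INSERT INTO rule_environment VALUES (?, ?, ?)",
                     [(i + 1, 1, i) for i in range(6)])

print(environments_at_radius(path, 3))  # [(4, 1, 3)]
```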
mmpdb uses __file__ to get the *.sql files. This prevents mmpdb from being installed as a wheel/zipfile.
I've switched to importlib.resources, which is the modern way to get resources like this.
The importlib.resources module was added in Python 3.7, which means this change drops Python 3.6 support!
This should not be a problem. Python 3.6 came out nearly 5 years ago, and its support period ends 2021-12, which is next month.
If it is a problem, then there are a couple of solutions: 1) use pkg_resources, or 2) use the importlib-resources backport.
The mapping from (package_name, resource_name) -> content is in the setup.cfg:
[options.package_data]
mmpdblib = schema.sql, create_index.sql, drop_index.sql, fragment_schema.sql
and loaded like this:
_schema_template = importlib.resources.read_text("mmpdblib", "schema.sql")
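As a side note (not part of the change above): from Python 3.9 the files() API is the forward-looking spelling of the same lookup. A sketch, demonstrated on a stdlib package since mmpdblib may not be importable here:

```python
import importlib.resources as resources

# Python 3.9+: Traversable-based resource access. Shown on the stdlib
# "email" package purely as a stand-in for mmpdblib.
text = (resources.files("email") / "__init__.py").read_text()
assert "email" in text

# The mmpdblib equivalent would presumably be:
# _schema_template = (resources.files("mmpdblib") / "schema.sql").read_text()
```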
Hello!
I tried to run the command mmpdb fragment tests/chembl_test.smi -o tests/chembl_test.fragments
with my own data, which looks like this (if it matters):
c1cn(-c2ccc3c(-c4cc5cc(CN6CCCCC6)ccc5[nH]4)n[nH]c3c2)nn1
CN(C)C(=O)c1ccc2c(-c3cc4cc(CN5CCOCC5)ccc4[nH]3)n[nH]c2c1
c1cnn(-c2ccc3c(-c4cc5cc(CN6CCCCC6)ccc5[nH]4)n[nH]c3c2)c1
c1cc2[nH]c(-c3n[nH]c4cc(-c5cn[nH]c5)ccc34)cc2cc1CN1CCOCC1
c1ncc(-c2cnc(Nc3cc(N4CCNCC4)ccn3)s2)cn1
and then I got AttributeError: module '__main__' has no attribute '__spec__'
with the full traceback:
Traceback (most recent call last):
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/bin/mmpdb", line 4, in <module>
__import__('pkg_resources').run_script('mmpdb==2.3.dev1', 'mmpdb')
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/site-packages/pkg_resources/__init__.py", line 672, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/site-packages/pkg_resources/__init__.py", line 1472, in run_script
exec(code, namespace, namespace)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/site-packages/mmpdb-2.3.dev1-py3.9.egg/EGG-INFO/scripts/mmpdb", line 11, in <module>
commandline.main()
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/site-packages/mmpdb-2.3.dev1-py3.9.egg/mmpdblib/commandline.py", line 1054, in main
parsed_args.command(parsed_args.subparser, parsed_args)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/site-packages/mmpdb-2.3.dev1-py3.9.egg/mmpdblib/commandline.py", line 181, in fragment_command
do_fragment.fragment_command(parser, args)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/site-packages/mmpdb-2.3.dev1-py3.9.egg/mmpdblib/do_fragment.py", line 567, in fragment_command
pool = create_pool(args.num_jobs)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/site-packages/mmpdb-2.3.dev1-py3.9.egg/mmpdblib/do_fragment.py", line 396, in create_pool
pool = multiprocessing.Pool(num_jobs, init_worker)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/multiprocessing/context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/multiprocessing/pool.py", line 212, in __init__
self._repopulate_pool()
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/multiprocessing/pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/multiprocessing/pool.py", line 326, in _repopulate_pool_static
w.start()
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Users/alisagorislav/opt/anaconda3/envs/mmpdb/lib/python3.9/multiprocessing/spawn.py", line 183, in get_preparation_data
main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'
I used macOS Monterey 12.1
mmpdb uses peewee as an adapter for different back-end databases.
I originally included a vendored version of peewee for ease of installation. I've removed that and am instead using an installation dependency on the "peewee" package.
That is, I removed peewee.py and playhouse/ and configured setup.cfg to have an installation dependency on peewee >= 3.0.
It turns out the peewee API changed from 2.x to 3.x, which occurred in 2018. A plus side of vendoring is that mmpdb was isolated from this change, so we didn't have to worry about it until now. :)
I've updated mmpdb to work with the new peewee API.
In Python 2.7 the built-in json module was significantly slower than the third-party ujson and cjson modules at parsing the fragment file. If one of the latter two is not found, mmpdb prints a warning message to suggest installing one of those two modules, then falls back to using the json module.
It appears that Python 3.6's json module, while still slower than cjson, is no longer sufficiently slower as to warrant having that warning message. In one test, json took 2m09s while cjson took 2m04s.
I need to re-run the timing tests with Python 3.5, 3.6, and 3.7. If the warning is no longer needed, then show it only for Python 2.7.
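A self-contained harness along these lines could settle it per interpreter; the record shape is a made-up stand-in for a fragment-file RECORD line, and the time bound is deliberately loose since absolute numbers are machine-dependent:

```python
import json
import time

# Parse many small JSON-list records, roughly the shape of the fragment
# file's RECORD lines (fields here are illustrative, not the real 10).
record = json.dumps(["RECORD", "X1", 10, "Oc1ccccc1", 1, 0, []])

t0 = time.perf_counter()
rows = [json.loads(record) for _ in range(100_000)]
elapsed = time.perf_counter() - t0

assert rows[0][0] == "RECORD"
assert elapsed < 60  # generous bound; the interesting output is `elapsed`
print("parsed 100k records in %.2fs" % elapsed)
```

Running the same script with ujson/cjson substituted for json (where installed) gives the apples-to-apples comparison the warning message is about.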
Hi all,
I built an mmpdb database using the example "test_data.smi" shown on the GitHub page. Then I ran a query to grab all the rules from the schema as follows (please correct me if I did something wrong with the query):
c = cursor.execute(
    "SELECT rule_environment.rule_id, from_smiles.smiles, from_smiles.num_heavies, to_smiles.smiles, to_smiles.num_heavies, "
    " rule_environment.radius, "
    " rule_environment_statistics.id, property_name_id, count, avg, std, kurtosis, skewness, min, q1, median, q3, max, paired_t, p_value "
    " FROM rule, rule_environment, rule_environment_statistics, "
    " rule_smiles as from_smiles, rule_smiles as to_smiles "
    " WHERE rule_environment.id = rule_environment_id "
    " AND rule_environment_statistics.rule_environment_id = rule_environment_id "
    " AND rule_environment.rule_id = rule.id "
    " AND rule.from_smiles_id = from_smiles.id "
    " AND rule.to_smiles_id = to_smiles.id ")
After that, I took a look at all the rules. I found it difficult to understand the rule environments for the same rule.
For example, below, for rule id 0 there are 11 environments with different rule_environment ids. But how can I get the local chemical information for each environment? That would help me understand the difference between those environments.
rule_id | from_smiles | from_smiles_nHeavies | to_smiles | to_smiles_nHeavies | environ_radius | rule_environ_id | prop_id | count | avg |
---|---|---|---|---|---|---|---|---|---|
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 0 | 1 | 0 | 2 | 1 |
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 1 | 3 | 0 | 1 | 1 |
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 2 | 5 | 0 | 1 | 1 |
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 3 | 7 | 0 | 1 | 1 |
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 4 | 9 | 0 | 1 | 1 |
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 5 | 11 | 0 | 1 | 1 |
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 1 | 214 | 0 | 1 | 1 |
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 2 | 216 | 0 | 1 | 1 |
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 3 | 218 | 0 | 1 | 1 |
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 4 | 220 | 0 | 1 | 1 |
0 | [*:1]c1ccccc1N | 7 | [*:1]c1ccccc1O | 7 | 5 | 222 | 0 | 1 | 1 |
Thanks,
Jen
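One place that local information may live is the environment fingerprint each rule_environment row points at. The sketch below uses assumed names throughout (environment_fingerprint table, fingerprint column, environment_fingerprint_id foreign key): they are my reading of the schema, so confirm with `.schema` on your file before relying on it.

```python
import os
import sqlite3
import tempfile

# Assumed schema names -- verify against your database version.
QUERY = """
SELECT re.id, re.radius, ef.fingerprint
  FROM rule_environment re
  JOIN environment_fingerprint ef ON re.environment_fingerprint_id = ef.id
 WHERE re.rule_id = ?
 ORDER BY re.radius
"""

def environments_for_rule(db_path, rule_id):
    with sqlite3.connect(db_path) as conn:
        return conn.execute(QUERY, (rule_id,)).fetchall()

# Demo stand-in: rule 0 with environments at radius 0 and 1.
path = os.path.join(tempfile.mkdtemp(), "demo.mmpdb")
with sqlite3.connect(path) as conn:
    conn.executescript("""
        CREATE TABLE environment_fingerprint (id INTEGER PRIMARY KEY, fingerprint TEXT);
        CREATE TABLE rule_environment (id INTEGER PRIMARY KEY, rule_id INTEGER,
                                       environment_fingerprint_id INTEGER, radius INTEGER);
        INSERT INTO environment_fingerprint VALUES (1, 'FP-AT-R0'), (2, 'FP-AT-R1');
        INSERT INTO rule_environment VALUES (1, 0, 1, 0), (3, 0, 2, 1);
    """)

print(environments_for_rule(path, 0))  # [(1, 0, 'FP-AT-R0'), (3, 1, 'FP-AT-R1')]
```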
Most of the cut SMARTS currently cut bonds between aliphatic carbons and halogens. This may not be desirable, since it leads to CF3 and OCF3 groups being split up. These splits may not be interesting for users.
The idea is to create new cut SMARTS that do not cut CF, CF2, CF3, OCF3, and generally C-halogen bonds.
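One illustrative direction, offered strictly as a sketch (this is not one of mmpdb's actual built-in patterns, and the real defaults should be checked in the source): forbid the cut whenever the carbon carries a halogen or the partner atom is one, e.g.

```
[CX4;!$([CX4][F,Cl,Br,I])]-!@[!#1;!$([F,Cl,Br,I])]
```

This matches a non-ring single bond at an sp3 carbon, but skips the bond if that carbon has any halogen neighbor (protecting CF, CF2, CF3, and the CF3 of OCF3) or if the other end is itself a halogen.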
Just an idea for the docs: it would be pretty helpful if there were an installation guide in the README.md file.
Dear all,
I came across a SQLite3 error when indexing the fragments. See below:
WARNING: Neither ujson nor cjson installed. Falling back to Python's slower built-in json decoder. Building index ...
Failed to execute the following SQL: CREATE INDEX pair_rule_environment_id on pair (rule_environment_id);
Traceback (most recent call last):
  File "/mmpdb/mmpdb", line 11, in <module>
    commandline.main()
  File "/mmpdb/mmpdblib/commandline.py", line 1054, in main
    parsed_args.command(parsed_args.subparser, parsed_args)
  File "/mmpdb/mmpdblib/commandline.py", line 393, in index_command
    do_index.index_command(parser, args)
  File "/mmpdb/mmpdblib/do_index.py", line 205, in index_command
    pair_writer.end(reporter)
  File "mmpdb/mmpdblib/index_algorithm.py", line 1199, in end
    self.backend.end(reporter)
  File "/mmpdb/mmpdblib/index_writers.py", line 228, in end
    schema.create_index(self.conn)
  File "/mmpdb/mmpdblib/schema.py", line 133, in create_index
    _execute_sql(c, get_create_index_sql())
  File "/mmpdb/mmpdblib/schema.py", line 119, in _execute_sql
    c.execute(statement)
sqlite3.OperationalError: database or disk is full
But I checked my disk and confirmed there was plenty of space available (1 TB). Any comments or suggestions on that? Would it help if I switched to APSW instead of sqlite3?
Thanks,
Cheng
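One thing worth checking: with SQLite, "database or disk is full" during CREATE INDEX often refers to the temporary area used for sorting the index (commonly /tmp), not the destination disk, so a small tmpfs /tmp can trigger it despite 1 TB free elsewhere. Redirecting the temp space is a documented SQLite knob; a sketch (the path below is a hypothetical scratch directory):

```python
import os
import sqlite3

# SQLITE_TMPDIR redirects SQLite's temporary files (used when building
# large indexes); it must be set before the first connection is opened.
os.environ["SQLITE_TMPDIR"] = "/scratch/big_disk"  # hypothetical path

conn = sqlite3.connect(":memory:")
# Per-connection alternative: keep temporary structures in RAM.
conn.execute("PRAGMA temp_store = MEMORY")
assert conn.execute("PRAGMA temp_store").fetchone()[0] == 2  # 2 == MEMORY
```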
The smallest-transformation-only option appears not to reduce single-cut transformations to H>>X transformations. For example, the transformation
[*:1]c1ccccc1 >> [*:1]c1ccc(F)cc1
is still in the database, although it could be reduced to
[*:1][H] >> [*:1]F
When mmpdb transform is called, the algorithm currently fragments the input molecule and searches the DB for replacements for all fragments. The results can then be narrowed by the substructure filter and others, but fundamentally all fragments are searched. A huge speedup could potentially be gained if the user could specify, in the query, the fragment they want to exchange.
This requires a check whether the specified fragment exists at all, potentially some fragment cleanup (e.g. specification of the attachment atoms), and then a filter to the specific fragment after fragmentation of the input compound.
With the current implementation of smallest-transformation-only, some reducible double-cut transformations are still present in the database. For example, this transformation
[*:1]C(F)[*:2] >> [*:1]C(Cl)[*:2]
can be reduced to this transformation:
[*:1]F >> [*:1]Cl
I don't know how to use SQL to get the table of all rules in the mmpdb file, as well as the number of pairs, and statistics for each rule. Could you please share your steps to achieve this? @chengthefang
@KramerChristian Thanks, Christian. This solved my problem. I will close the issue from my end.
Originally posted by @chengthefang in #12 (comment)
Double- and triple-cuts can produce regioisomers where the constant parts are just swapped. Examples are these transformations:
Double cut: [*:1]CC1(CC1)[*:2] >> [*:1]C1(CC1)C[*:2]
Triple cut: [*:1]c1cc([*:2])c([*:3])cc1 >> [*:1]c1cc([*:3])c([*:2])cc1
It may be useful not to store these transformations in order to reduce database size, particularly for triple cuts. If implemented, it would be good if these filters could be set separately for double and triple cuts.
Hi all,
I was trying to use mmpdb version 3 for fragmentation. However, I came across an error when running it on Linux.
The code I ran is mmpdb fragment test_data.smi -o test_data.fragdb
The error is:
Failed to execute the following SQL:
-- Version 3.0 switched to a SQLite database to store the fragments.
-- Earlier versions used JSON-Lines.
-- The SQLite database improves I/O time, reduces memory use, and
-- simplifies the development of fragment analysis tools.
-- NOTE: There is configuration information in three files!
-- 1) fragment_types.py -- the data types
-- 2) fragment_schema.sql -- (this file) defines the SQL schema
-- 3) fragment_db.py -- defines the mapping from SQL to the data types
CREATE TABLE options (
id INTEGER NOT NULL,
version INTEGER,
cut_smarts VARCHAR(1000),
max_heavies INTEGER,
max_rotatable_bonds INTEGER,
method VARCHAR(20),
num_cuts INTEGER,
rotatable_smarts VARCHAR(1000),
salt_remover VARCHAR(200),
min_heavies_per_const_frag INTEGER,
min_heavies_total_const_frag INTEGER,
max_up_enumerations INTEGER,
PRIMARY KEY (id)
);
Traceback (most recent call last):
File "/miniconda3/envs/mmpdb31/bin/mmpdb", line 8, in <module>
sys.exit(main())
File "/.local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/.local/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/.local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/.local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/.local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/miniconda3/envs/mmpdb31/lib/python3.9/site-packages/mmpdblib/cli/fragment_click.py", line 215, in make_fragment_options_wrapper
return command(**kwargs)
File "/miniconda3/envs/mmpdb31/lib/python3.9/site-packages/mmpdblib/cli/smi_utils.py", line 98, in make_input_options_wrapper
return command(**kwargs)
File "/.local/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/miniconda3/envs/mmpdb31/lib/python3.9/site-packages/mmpdblib/cli/fragment.py", line 256, in fragment
with fragment_db.open_fragment_writer(
File "/miniconda3/envs/mmpdb31/lib/python3.9/site-packages/mmpdblib/fragment_db.py", line 372, in open_fragment_writer
init_fragdb(c, options)
File "/miniconda3/envs/mmpdb31/lib/python3.9/site-packages/mmpdblib/fragment_db.py", line 92, in init_fragdb
schema._execute_sql(c, get_schema_template())
File "/miniconda3/envs/mmpdb31/lib/python3.9/site-packages/mmpdblib/schema.py", line 129, in _execute_sql
c.execute(statement)
sqlite3.OperationalError: database is locked
It probably has nothing to do with the MMPDB-v3 program since it is running fine on my Mac. If anyone has some advice/suggestions on how to solve it, it would be highly appreciated.
Thanks,
Cheng
Hi all,
I recently found some unexpected outcomes when using "mmpdb transform" with or without the property flag. The mmpdb database was generated using ChEMBL database with calculated LogP as the property.
When I used the "--no-properties" flag, I got 4632 transformed structures.
mmpdb transform chembl.mmpdb --smiles "XXXXXX" --min-pairs 5 --min-variable-size 0 --max-variable-size 20 --no-properties -o results_noprop.csv &
However, when I turned on the "--property LogP" flag, I got 591 transformed structures.
mmpdb transform chembl.mmpdb --smiles "XXXXXX" --min-pairs 5 --min-variable-size 0 --max-variable-size 20 --property LogP -o results_prop.csv &
I would expect the run with "--property LogP" to generate the same number of compounds, just with more output info.
Any thoughts on that?
Thanks!
Cheng
Hi all,
I am using mmpdb fragment to parse a subset of the SureChEMBL database, and I found that mmpdb fragment will fail for some specific SMILES. I wonder if we could add some error handling to deal with such structures.
Here is the example of test.smi.
C[C@]12CCC3c4c5cc(O)cc4[C@@]4(CC[C@@]1(C4)C3CC5)[C@@H]2O SCHEMBL9251776
Oc1ccccc1 phenol
Oc1ccccc1O catechol
Oc1ccccc1N 2-aminophenol
Oc1ccccc1Cl 2-chlorophenol
Nc1ccccc1N o-phenylenediamine
Nc1cc(O)ccc1N amidol
Oc1cc(O)ccc1O hydroxyquinol
Nc1ccccc1 phenylamine
C1CCCC1N cyclopentanol
I ran "python mmpdb/mmpdb fragment test.smi -o test_data.fragments". It failed on parsing the first SMILES and would not skip it to continue. The error is shown below:
Failure: file 'test.smi', line 1, record #1: first line starts 'C[C@]12CCC3c4c5cc(O)cc4[C@@]4(CC[C@@]1(C ...'
Traceback (most recent call last):
  File "mmpdb/mmpdb", line 11, in <module>
    commandline.main()
  File "/mmpdb/mmpdblib/commandline.py", line 1054, in main
    parsed_args.command(parsed_args.subparser, parsed_args)
  File "/mmpdb/mmpdblib/commandline.py", line 181, in fragment_command
    do_fragment.fragment_command(parser, args)
  File "/mmpdb/mmpdblib/do_fragment.py", line 581, in fragment_command
    writer.write_records(records)
  File "/mmpdb/mmpdblib/fragment_io.py", line 404, in write_records
    for rec in fragment_records:
  File "/mmpdb/mmpdblib/do_fragment.py", line 475, in make_fragment_records
    fragments = result.get()
  File "anaconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
ValueError: need more than 1 value to unpack
Appreciate any suggestions or ideas.
Thanks,
Cheng
The version indexed on PyPI is 2.1. It would be great if the latest version were also on PyPI. Thanks.
Hi!
I am getting an unexpected generated molecule when using double cuts. The transformation, which in this case is the linker, is attached the wrong way around (details below). I would really appreciate your help :)
Using the two following molecules to create an MMPDB
c1ccc(cc1)c2ccc3c(c2)c(c(cc3NCc4cn5cc(cnc5n4)F)c6ccccc6)c7ccccc7 test1
c1ccc(cc1)c2ccc3c(c2)c(c(cc3NCc4[nH]c5c(n4)ccc(c5F)F)c6ccccc6)c7ccccc7 test2
to then generate new molecules from the molecule to improve
c2ccc3c(c2)c(c(cc3NCc4[nH]c5c(n4)ccc(c5F)F)c6ccccc6)c7ccccc7
the following molecule is proposed
Fc1cn2cc(CNc3cc(-c4ccccc4)c(-c4ccccc4)c4ccccc34)cnc2n1
while the expected generated molecule would be
c2ccc3c(c2)c(c(cc3NCc4[nH]c5c(n4)ccc(c5F)F)c6ccccc6)c7ccccc7.
Please let me know if any other information is needed to better understand my issue.
Thanks a lot,
Alice
When using the --min-heavies-per-const-frag 3 option during the fragmentation stage, I noticed that I am losing the following transformation:
[*:1]O[*:2] to [*:1]C([*:2])N
in which one of the R groups is a simple methyl group. Is it possible to avoid losing this transformation by playing with any option during the indexing step?
Hi,
I wonder if there is a way to obtain all the rules, together with the number of pairs and the statistics for each rule, from a built (.mmpdb) database. A simple output table I expect looks like:
from_smiles (SMIRKS)   to_smiles (SMIRKS)   # of pairs   mean   std
rule1 ***** *****
rule2 ***** *****
.....
I am pretty interested in presenting the rules from a database in a way similar to Tables 1-5 & Figure 5 in your publication "J. Med. Chem. 2018, 61, 3277−3292". Would you mind giving me some hints on how to achieve that through the mmpdb code?
Thanks,
Cheng
The "--smallest-transformation-only" option doesn't produce the desired result with some transformations. There seems to be a conflict between the --smallest-transformation-only option (used during indexing) and the --min-heavies-per-const-frag option (used during fragmentation).
For example, consider the transformation [*:1]C(=O)Nc1ccccc1>>[*:1]C(=O)Nc1cccnc1; it is clearly reducible to [*:1]c1ccccc1>>[*:1]c1cccnc1.
However, if someone sets --min-heavies-per-const-frag to 9 during the fragmentation step, then the output is [*:1]C(=O)Nc1ccccc1>>[*:1]C(=O)Nc1cccnc1 and not [*:1]c1ccccc1>>[*:1]c1cccnc1.
This is possibly because the number of heavy atoms in the fragment [*:1]C(=O)Nc1ccccc1 (or [*:1]C(=O)Nc1cccnc1) is <= min-heavies-per-const-frag, hence no further fragmentation is possible for these fragments and the transformation is not reducible.
Resolved issues:
Currently the fragment file stores the fragmentations in JSON-Lines format. After an initial header (with version and option lines) comes a sequence of "RECORD" or "IGNORE" lines, each structured as a JSON list. RECORD lines have 10 fields, with the last being a list of fragmentations.
I propose switching the fragment command to save to a SQLite database (proposed extension: ".fragdb"). Analysis in SQL is much easier than writing one-off Python programs.
I have a pull request implementing this proposal.
I see several advantages of using SQLite instead of a flat(ish) file:
To clarify the last point, consider a tool to examine the most common fragments, or to select only a few constants. This sort of tool could be written as a SQL command rather than read the data into memory followed by some Python-based data processing.
The disadvantages I can think of are:
Maybe there are other downsides?
One issue is how to handle the default fragment file. Currently mmpdb fragment will write to stdout if no -o option is given. This does not work with a SQLite output file. (I could, for example, require an explicit -o fileinput.fragdb.)
I have decided that if you do not specify -o then the default output is the structure filename, with any extension (and .gz) removed and replaced with .fragdb. If the structure file is AB.smi.gz then the default output fragment database name is AB.fragdb. If the structure file is CD.smi then the default output name is CD.fragdb.
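The replace-the-extension rule can be sketched in a few lines (default_fragdb_name is a hypothetical helper name for illustration, not mmpdb code):

```python
from pathlib import Path

def default_fragdb_name(structure_filename: str) -> str:
    """Derive the default fragment database name from a structure filename:
    strip a trailing .gz, then replace the remaining extension with .fragdb."""
    p = Path(structure_filename)
    if p.suffix == ".gz":
        p = p.with_suffix("")  # AB.smi.gz -> AB.smi
    return str(p.with_suffix(".fragdb"))  # AB.smi -> AB.fragdb

default_fragdb_name("AB.smi.gz")  # -> "AB.fragdb"
default_fragdb_name("CD.smi")     # -> "CD.fragdb"
```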
This decision was guided by the need to distribute fragmentation across a cluster (rather than the simple Python-based multiprocess fragmentation now). In that case, your process will be something like:
o Split the input SMILES file into N parts (perhaps with a new mmpdb smi_split input.smi command?)
o Fragment each SMILES file (using a fake cluster queue submission)
o Merge the fragments back into a single file
If the mmpdb fragment step used the input SMILES filename to influence how the default output name is determined (in this case, from input.part001.smi to input.part001.fragdb) then there wouldn't need to be a filename manipulation step here.
Currently there is the option to display the fragments in "fraginfo" format. This was an undocumented option to display the text in a more human-readable format. It does not appear to be used, as the code clearly says "Fix this for Python 3". I suspect it can be removed without a problem.
Still, perhaps there is a reason to have a way to view the fragmentations in a more human readable format? For example:
mmpdb fragment_dump whatever.fragdb
mmpdb fragment_dump --id ABC123 whatever.fragdb
However, it just doesn't seem useful. It's so much easier to do the query in SQL.
Should the SMILES strings in the fragdb database be normalized? (That is, all 23,256 occurrences of *C.*C would be normalized to an integer id in a new smiles table, and the fragmentation SMILES stored by id reference, rather than storing the actual string.)
I used the ChEMBL_CYP3A4_hERG.smi test set from the original mmpdb development, with 20267 SMILES strings. Using a denormalized data set (constant and fragment SMILES are stored as-is), the resulting sizes are:
% ls -lh ChEMBL_CYP3A4_hERG.frag*
-rw-r--r-- 1 dalke admin 139M Oct 12 13:30 ChEMBL_CYP3A4_hERG.fragdb
-rw-r--r-- 1 dalke admin 146M Oct 12 12:41 ChEMBL_CYP3A4_hERG.fragments
This shows that "fragdb" is slightly more compact than "fragments".
On the other hand, gzip -9 produces a ChEMBL_CYP3A4_hERG.fragments.gz which is 1/13th the size, at 11 MB (153151552/11656549 = 13.1).
A SQL query suggests I can save about 50MB by normalizing the duplicate fragment SMILES, which is about 40% of the file size.
sqlite> SELECT sum(length(s) * N), sum(length(s) * (N-1))
   ...>   FROM (SELECT s, count(*) AS N
   ...>           FROM (SELECT constant_smiles AS s FROM fragmentation
   ...>                 UNION ALL
   ...>                 SELECT variable_smiles AS s FROM fragmentation)
   ...>          GROUP BY s);
84360180|32669074
On the other hand, that estimate doesn't fully include the normalization table, nor does it include the indices which may be needed for possible analyses.
(The constant_with_H_smiles, the record's input_smiles, and the normalized_smiles have few duplicate values so should not be normalized.)
I changed the code to normalize the fragments in a 'smiles' table and regenerated the data set. The new one is the second below, with the "hERG2" name:
% ls -lh ChEMBL_CYP3A4_hERG.fragdb ChEMBL_CYP3A4_hERG2.fragdb
-rw-r--r-- 1 dalke admin 139M Oct 12 13:30 ChEMBL_CYP3A4_hERG.fragdb
-rw-r--r-- 1 dalke admin 189M Oct 12 16:26 ChEMBL_CYP3A4_hERG2.fragdb
The resulting size is larger because it contains the SMILES normalization table, and the indexing needed for the UNIQUE constraint. It is still roughly the same size as the uncompressed fragments file, though considerably larger than the gzip-compressed fragments.
There must be an index mapping each fragmentation to its record. I tried a version without that index and mmpdb index was obviously slower, even for a test set of only 1,000 SMILES.
The database should be indexed to support the types of analyses which might be done on the fragment data. At present I don't know what these are likely to be. Some likely ones are:
I modified the two datasets to index them by the constant and variable parts. For the denormalized hERG.fragdb I indexed the constant and variable SMILES strings. For the normalized hERG2.fragdb I indexed the constant and variable SMILES ids.
-rw-r--r-- 1 dalke admin 241M Oct 12 16:55 ChEMBL_CYP3A4_hERG_with_idx.fragdb
-rw-r--r-- 1 dalke admin 235M Oct 12 17:04 ChEMBL_CYP3A4_hERG2_with_idx.fragdb
This nearly doubles the database size. It also shows that normalization makes little difference to the database size if both the constants and variables need to be indexed.
It's hard to judge if this increase in size is useful without tests on the types of analyses to do, so I used the above "likely ones".
This is trivial with the unnormalized version: merge the two sets of tables, and update the ids so they don't overlap. The normalized version is a bit more complex as the normalization tables must also be merged.
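A sketch of the unnormalized merge using SQLite's ATTACH DATABASE, assuming the record/fragmentation table layout discussed in this proposal (merge_fragdb is a hypothetical helper, not part of the pull request):

```python
import sqlite3

def merge_fragdb(dest_path, src_path):
    """Append the records and fragmentations of src into dest, offsetting the
    record ids so they don't overlap. Assumes an unnormalized schema with a
    record(id, input_smiles) table and a
    fragmentation(record_id, constant_smiles, variable_smiles) table."""
    conn = sqlite3.connect(dest_path)
    try:
        conn.execute("ATTACH DATABASE ? AS other", (src_path,))
        # Offset ids by the largest record id already in the destination.
        (offset,) = conn.execute("SELECT IFNULL(MAX(id), 0) FROM record").fetchone()
        conn.execute(
            "INSERT INTO record (id, input_smiles) "
            "SELECT id + ?, input_smiles FROM other.record", (offset,))
        conn.execute(
            "INSERT INTO fragmentation (record_id, constant_smiles, variable_smiles) "
            "SELECT record_id + ?, constant_smiles, variable_smiles "
            "FROM other.fragmentation", (offset,))
        conn.commit()
    finally:
        conn.close()
```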
The following (in the unnormalized data set) prints the distribution of constant SMILES, ordered by count:
select count(*), constant_smiles from fragmentation GROUP BY constant_smiles ORDER BY count(*) DESC
Given the fragmentations from 20K molecules, this takes a bit over 2 seconds on the unindexed, unnormalized data set.
If the SMILES strings are indexed, but still unnormalized, it takes a bit over 1 second.
If the SMILES strings are normalized and indexed it's 0.66 seconds. That's about 3x faster.
Given that this will likely not be common, I suggest staying with unnormalized strings, and no index. Perhaps there can be an mmpdb fragdb_index command to add indices if people want a 2x speedup.
Bear in mind that currently any analysis must do a linear search of the fragments file, decoding the JSON-Lines and building the search structures in memory. This probably takes much longer, though I haven't tested it.
There are a couple of ways I can think of to select a set of fragments: 1) specify the SMILES strings, or 2) request only those in the top P%, or the top N, or those with at least M occurrences.
In both cases, I think the right solution is to make a temporary table containing the selected fragments, then use that to select the subset of fragments, and of records containing those fragments. For example, the following selects all records with a fragmentation whose constant SMILES is one of the 10,000 most common.
attach database ":memory:" as tmp;
CREATE TABLE tmp.selected_fragmentation (
fragment_id INTEGER
);
INSERT INTO tmp.selected_fragmentation
SELECT id
FROM fragmentation
WHERE constant_smiles IN (
SELECT constant_smiles
FROM fragmentation
GROUP BY constant_smiles
ORDER BY count(*)
DESC
LIMIT 10000);
analyze tmp;
SELECT record.id
  FROM record, fragmentation
 WHERE record.id = fragmentation.record_id
   AND fragmentation.id IN (SELECT fragment_id FROM tmp.selected_fragmentation);
The :memory: + index + analyze approach is slightly faster than doing a straight search on the unindexed database. Note that the above only filters the records; a full export will also need to export the fragments, which requires another search. That's why I think the temporary index is worthwhile.
It looks like this can be done with a similar mechanism - create a table, randomize the order, split into M parts, and save to distinct output databases.
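The create/randomize/split mechanism can be sketched as follows, assuming a record table and a SQLite new enough for window functions (3.25+). assign_parts is hypothetical; a real implementation would then copy each part into its own output database:

```python
import sqlite3

def assign_parts(conn, m):
    """Shuffle the record ids and assign each to one of m parts, round-robin
    over the random order. Returns (part, count) rows for inspection."""
    conn.execute("DROP TABLE IF EXISTS part_assignment")
    # ROW_NUMBER() over a random order gives a shuffled 1..N numbering;
    # taking it modulo m spreads the records evenly across the parts.
    conn.execute(f"""
        CREATE TEMPORARY TABLE part_assignment AS
        SELECT id, (ROW_NUMBER() OVER (ORDER BY random()) - 1) % {int(m)} AS part
          FROM record""")
    return conn.execute(
        "SELECT part, COUNT(*) FROM part_assignment GROUP BY part ORDER BY part"
    ).fetchall()
```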
The current pull request drops support for the old 'fragments' format (filenames ending with .fragments or .fragments.gz). This simplifies the relevant code by quite a bit, compared to a version which must support both I/O formats.
The feedback from Mahendra is that this isn't an issue.
A middle solution is to only support the old 'fragments' format in the --cache option, which would let people upgrade to the new system without having to re-fragment everything. I don't think that's needed, as re-fragmentation, while slow, is doable.
Dear,
Thanks for developing the nice tool.
I would like to know how to get the environment SMILES. When an mmpdb database is created, there is an environment_fingerprint table which has id and env_fp columns. I would like to get the SMILES corresponding to each fingerprint.
Any advice or suggestions are greatly appreciated.
Thanks,
On Windows, the --min-variable-size and --min-constant-size options of the transform command aren't working. They fail at line 909 in mmpdblib/analysis_algorithms.py, saying that a Fragmentation object has no attribute 'num_variable_heavies'.
I'm on Windows 10 using the Anaconda Prompt; otherwise mmpdb works fine.
I tried the same command on Linux and it worked, but I think it's relevant to report the issue on Windows.
(base) C:\Users\me\Desktop\mmpdb-2.1>python mmpdb transform --smiles CC=CC=CC=O --min-variable-size 5 my_data.mmpdb
Traceback (most recent call last):
File "mmpdb", line 10, in <module>
commandline.main()
File "C:\Users\me\Desktop\mmpdb-2.1\mmpdblib\commandline.py", line 988, in main
parsed_args.command(parsed_args.subparser, parsed_args)
File "C:\Users\me\Desktop\mmpdb-2.1\mmpdblib\commandline.py", line 678, in transform_command
do_analysis.transform_command(parser, args)
File "C:\Users\me\Desktop\mmpdb-2.1\mmpdblib\do_analysis.py", line 116, in transform_command
explain = explain,
File "C:\Users\me\Desktop\mmpdb-2.1\mmpdblib\analysis_algorithms.py", line 764, in transform
cursor=cursor, explain=explain)
File "C:\Users\me\Desktop\mmpdb-2.1\mmpdblib\analysis_algorithms.py", line 909, in make_transform
if min_variable_size and frag.num_variable_heavies < min_variable_size:
AttributeError: 'Fragmentation' object has no attribute 'num_variable_heavies'
Hi all,
I am using the MMP transform without properties information.
For example,
python mmpdb transform test_data.mmpdb --smiles 'c1cccnc1O' --no-properties
Output
ID SMILES
1 Clc1ccccn1
2 Nc1ccccn1
3 c1ccncc1
In the output, I only got ID and SMILES. But I would like to print other useful information, including 'from_smiles', 'to_smiles', 'radius', 'rule_environment_id', and 'count'. Those statistics could give us some confidence in the transform, and I think they should be independent of properties. Is there a way to output those statistics with the generated molecules when working with an mmpdb database that has no property information?
Thanks, Cheng
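One possible route, until such an option exists, is to query the .mmpdb SQLite file directly. This sketch assumes mmpdb's rule, rule_smiles, rule_environment, and pair tables (verify the names with ".schema"); none of it requires loaded properties:

```python
import sqlite3

# Assumed schema: each rule_environment row has a rule_id and radius, and
# each matched pair references a rule_environment. Counting pair rows per
# environment gives the 'count' column without any property data.
PAIR_COUNT_SQL = """
SELECT f.smiles AS from_smiles,
       t.smiles AS to_smiles,
       re.radius,
       re.id AS rule_environment_id,
       COUNT(pair.id) AS n_pairs
  FROM rule
  JOIN rule_smiles f ON rule.from_smiles_id = f.id
  JOIN rule_smiles t ON rule.to_smiles_id = t.id
  JOIN rule_environment re ON re.rule_id = rule.id
  JOIN pair ON pair.rule_environment_id = re.id
 GROUP BY re.id
 ORDER BY n_pairs DESC
"""

def pair_counts(db_path):
    with sqlite3.connect(db_path) as conn:
        return conn.execute(PAIR_COUNT_SQL).fetchall()
```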
I would like to do fragmentation on a dataset of 145 peptides. When I tried with mmpdb it said "too many rotatable bonds". Is there any way to increase the maximum number of rotatable bonds permitted?
Hi all,
I am trying to build a MMP-DB with 10M compounds. But I got an error at the first step of fragmentation.
The command I used is as follows:
python mmpdb fragment first10M.smi --num-jobs 8 -o first10M.fragments.gz
The error I got is:
Traceback (most recent call last):
  File "/home/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/anaconda2/lib/python2.7/multiprocessing/pool.py", line 328, in _handle_workers
    pool._maintain_pool()
  File "/home/anaconda2/lib/python2.7/multiprocessing/pool.py", line 232, in _maintain_pool
    self._repopulate_pool()
  File "/home/anaconda2/lib/python2.7/multiprocessing/pool.py", line 225, in _repopulate_pool
    w.start()
  File "/home/anaconda2/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/home/anaconda2/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Does anybody have comments or suggestions on that? Also, can I run the command on distributed nodes on the cluster?
ps: I also have similar concerns about the second step, indexing, since it usually takes more time and memory than fragmentation. Can I run the indexing command in parallel or on a distributed cluster?
Thanks,
Cheng
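Until distributed fragmentation is supported directly, one workaround for the fork/memory pressure is to split the input into smaller SMILES files and fragment each part separately (possibly on different machines). A minimal round-robin splitter might look like this (split_smiles_file is a hypothetical helper, not an mmpdb command):

```python
def split_smiles_file(path, n_parts):
    """Split a SMILES file into n_parts files named <path>.partNNN,
    distributing the lines round-robin. Returns the output paths."""
    out_paths = [f"{path}.part{i:03d}" for i in range(n_parts)]
    outs = [open(p, "w") for p in out_paths]
    try:
        with open(path) as f:
            for i, line in enumerate(f):
                outs[i % n_parts].write(line)  # line i goes to part i % n_parts
    finally:
        for o in outs:
            o.close()
    return out_paths
```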
Currently, command-line processing is a mess.
There's commandline.py, which is ~1,000 lines of argparse configuration for all of the commands. Each command dispatches to a function in one of the 6 do_*.py files. For example, the help commands are in do_help.py and the analysis commands ("predict" and "transform") are in do_analysis.py.
There's a growing consensus to organize the command-line components into their own subpackage, named cli (for "command-line interface"). I propose doing this.
Further, I propose switching from argparse to click. These are both packages to simplify working with command-line processing.
The "click" package is now (I don't know what it was like 6 years ago) a mature and full-featured package. It uses a different model than argparse, so this will require extensive rewriting. One clear advantage of click is its built-in support for testing. With argparse I needed my own functions to, for example, capture stdout and stderr so I could verify they contain the right information, while click's CliRunner.invoke does that for me.
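To illustrate the testing point, here is a minimal click command (a toy example, not mmpdb code) exercised through CliRunner, which captures the output and exit code without touching the real stdout:

```python
import click
from click.testing import CliRunner

@click.command()
@click.option("--smiles", required=True, help="Query SMILES string.")
def transform(smiles):
    """Toy stand-in for an mmpdb-style subcommand."""
    click.echo(f"transforming {smiles}")

# CliRunner.invoke runs the command in-process and captures its output.
runner = CliRunner()
result = runner.invoke(transform, ["--smiles", "c1ccccc1O"])
assert result.exit_code == 0
assert "transforming c1ccccc1O" in result.output
```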
6 years ago I chose argparse because I knew it best, and because it's part of the Python standard library. I don't like having external dependencies if I can avoid it.
My preference is in large part a holdover from when I started with Python, when I always had to install packages manually. Modern Python packages can specify which dependencies they need, and modern package installers can install them automatically if needed.
This means I'll also be updating the mmpdb package configuration to use the more modern conventions.
Hi All,
I am wondering which "cut-smarts" patterns are used by default by the transform command. That is, how does the program fragment a given molecule when applying a built MMPDB to do the transform? I couldn't find the parameter that controls the fragmentation method in the "transform" command.
Ideally, the transform function will fragment a molecule in the same way as how the MMPDB was built. But I am not sure if it is the case.
Thank you!
Cheng
In 2019 kzfm gave an example of using SQLAlchemy to work with the database.
In my work to replace the JSON-Lines fragment format with a SQLite-based format I quickly discovered that I am hand-writing an ORM. Poorly.
I propose switching to SQLAlchemy in this updated version I am working on.
At this point I don't know how much work that requires. kzfm showed that defining the structure and querying the generated mmpdb database was simple.
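For a feel of the work involved, here is a minimal declarative sketch of one fragdb-style table. The column names follow this proposal's discussion and are hypothetical, not mmpdb's actual schema:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Fragmentation(Base):
    """One fragmentation row: a constant/variable SMILES pair tied to a record."""
    __tablename__ = "fragmentation"
    id = Column(Integer, primary_key=True)
    record_id = Column(Integer)
    constant_smiles = Column(String)
    variable_smiles = Column(String)

# In-memory SQLite database for the demo; a real fragdb would use a file path.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Fragmentation(record_id=1,
                              constant_smiles="*c1ccccc1",
                              variable_smiles="*O"))
    session.commit()
    n = session.query(Fragmentation).count()
```

With the mapping in place, the hand-written SQL in the rest of this proposal becomes ordinary session queries.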