phi-grib / flame
Modeling framework for eTRANSAFE project
License: GNU General Public License v3.0
Please remember to update your code frequently and avoid pushing obsolete code
Also, avoid rewriting code produced by other members of the team unless there is a good reason to do so. Even in this case, please inform the author before pushing
Moreover, before pushing code, make doubly sure it works by running simple tests. This does not replace more sophisticated quality controls, but at least it will not block development of other components
Currently it shows up as an error in the web service, but I think it should only be a warning. When calling the model molecule by molecule ("objects") this error disappears
We must create a folder with Jupyter notebooks illustrating how Flame can be used to generate predictions and how the JSONs can be easily converted to pandas and visualized in different ways
Since flame depends on RDKit, a clean, complete install using just pip is not possible. Conda handles this kind of complex dependency very easily.
Default upload directory for the prediction web service is /var/tmp. In Windows, this directory must be created by hand.
We need to detect the platform the server is running on and set up appropriate temp directories
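One possible way to avoid a hard-coded /var/tmp is to let the standard library pick the platform's temporary directory. The sketch below is only an illustration: the helper name `upload_dir` and the subfolder name `flame_uploads` are assumptions made for this example, not part of Flame.

```python
import os
import tempfile


def upload_dir(subdir="flame_uploads"):
    """Return a writable upload directory under the platform's temp dir.

    `subdir` is a hypothetical folder name chosen for this example.
    """
    # gettempdir() honours TMPDIR/TEMP/TMP and falls back to a platform default
    path = os.path.join(tempfile.gettempdir(), subdir)
    # create the folder on demand instead of requiring manual setup
    os.makedirs(path, exist_ok=True)
    return path


print(upload_dir())
```

With this approach the directory would no longer need to be created by hand on Windows.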
(flame) [kpinto@ulises 6-model]$ flame -c build -e MyModel -f tr-DEG.sdf
CRITICAL ERROR: unable to load parameter file.Running with fallback defaults
Traceback (most recent call last):
File "/home/kpinto/miniconda3/envs/flame/bin/flame", line 11, in
load_entry_point('flame', 'console_scripts', 'flame')()
File "/phi/users/kpinto/flame/flame/flame_scr.py", line 142, in main
success, results = context.build_cmd(model)
File "/phi/users/kpinto/flame/flame/context.py", line 142, in build_cmd
shutil.copy(ifile, lfile)
File "/home/kpinto/miniconda3/envs/flame/lib/python3.6/shutil.py", line 241, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/home/kpinto/miniconda3/envs/flame/lib/python3.6/shutil.py", line 121, in copyfile
with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/phi/users/kpinto/flame/flame_models/MyModel/dev/tr-DEG.sdf'
Since we are using Python 3.6, we could take advantage of pathlib (in the standard library since 3.4). It is the standard library for working with paths (either POSIX or Windows) and has a lot of useful methods. Since we are dealing with multiple SD files (when working with cpu>1), it will be helpful!
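As a small illustration of the kind of path handling pathlib simplifies, the sketch below builds the name of a per-CPU chunk file from the input file name. The function `chunk_path` is hypothetical and only mirrors the `name_<index>.ext` pattern used elsewhere in these issues.

```python
from pathlib import Path


def chunk_path(ifile, chunk_index):
    """Build the file name for one chunk, e.g. series.sdf -> series_0.sdf."""
    p = Path(ifile)
    # stem is the file name without its extension, suffix is the extension
    return p.with_name(f"{p.stem}_{chunk_index}{p.suffix}")


print(chunk_path("tr-DEG.sdf", 0))
```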
Since this will become a big project I think we should follow the PEP 8 style guide.
Here you can find a summary of the most important features.
Answering "no" to the first dialog in the config command changes the config status. It shouldn't change, since the model repo path is not updated when the config is aborted.
I saw that the argparser for file input uses -f for the short arg but --infile for the long one. I think they should start with the same letter, e.g. --filein
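A minimal sketch of how the renamed option could look with argparse. Keeping `dest="infile"` is an assumption made here so that code reading the parsed value under its old name would not need to change; the parser shown is illustrative and not Flame's actual one.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="flame")
    # short and long spellings share the same initial letter;
    # dest keeps the attribute name "infile" (an assumption for this example)
    parser.add_argument("-f", "--filein", dest="infile", help="input file")
    return parser


args = build_parser().parse_args(["--filein", "tr-DEG.sdf"])
print(args.infile)
```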
Read the requirements.txt or the environment.yml and make a list of dependencies to pass to install_requires in setup
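The requirements.txt half of this idea could be sketched as follows. The helper `read_requirements` is hypothetical, and it deliberately handles only plain requirement lines and comments, skipping pip options such as `-r` or `-e`.

```python
from pathlib import Path


def read_requirements(path="requirements.txt"):
    """Return the requirement specifiers found in a pip requirements file."""
    requirements = []
    for raw in Path(path).read_text().splitlines():
        # drop trailing comments, which pip introduces with " #"
        line = raw.split(" #", 1)[0].strip()
        # skip blank lines, full-line comments and pip options (-r, -e, ...)
        if not line or line.startswith(("#", "-")):
            continue
        requirements.append(line)
    return requirements
```

The result could then be passed as `install_requires=read_requirements()` in the call to `setup`. Dependencies that are only installable through Conda would still have to be handled separately.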
Flame is only returning a TSV with molecule descriptors in the prediction module.
Don't use:
try:
    1/0
except:
    print('something did not work')
will print something did not work
Always catch the exception (even with the generic exception class Exception):
try:
    1/0
except Exception as e:
    print(f'something did not work. Cause: {e}')
will print something did not work. Cause: division by zero
Let's build a better world together
Do we need to have /old here?
Calling flame with -c predict doesn't raise ImportError (if there is such an error)
When standardizing a molecule series, if one molecule fails in the standardization process, the whole series is rejected.
if 'standardize' in method:
    try:
        parent = standardise.run(Chem.MolToMolBlock(m))
    except standardise.StandardiseException as e:
        if e.name == "no_non_salt":
            parent = Chem.MolToMolBlock(m)
        else:  # --> here the function returns False for the whole series
            return False, e.name
    except:
        return False, "Unknown standardiser error"
Flame is returning the error message: "False {"error": "number of molecules informed and processed does not match"} " as no molecule could be processed.
Now it is in countmol(). I think it would be better to have it in a separate method like:
def chunk_to_file(*args):
    index = []
    chunksize = nmol // self.control.numCPUs
    for a in range(nmol):
        index.append(a // chunksize)
    moli = 0  # molecule counter in next loop
    chunki = 0  # chunk counter in next loop
    filename, file_extension = os.path.splitext(ifile)
    chunkname = filename + '_%d' % chunki + file_extension
    try:
        [. . .]
if self.control.numCPUs > 1:
    chunk_to_file()
We need to include scikit-learn in the environment. Also, we need to see how we can include standardizer and, if not possible, write a brief "how-to" explaining how to set up the environment
When config.yml contains a Windows path, it fails to resolve correctly and the function utils.model_repository_path() returns an invalid path.
In [19]: p = pathlib.Path('C:/Users')
In [20]: p.resolve()
Out[20]: PosixPath('/home/biel/git-repos/phi/Flame/C:/Users')
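The session above can be explained with pathlib's pure path classes, which apply one platform's rules regardless of where the code runs. The sketch below shows why the string is treated as relative on a POSIX machine. The helper `looks_like_windows_path` is a hypothetical heuristic added for illustration, not part of Flame.

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = "C:/Users"

# With POSIX rules there is no leading "/", so the path is relative and
# resolve() prepends the current working directory.
print(PurePosixPath(raw).is_absolute())    # False

# With Windows rules the drive letter plus root make it absolute.
print(PureWindowsPath(raw).is_absolute())  # True


def looks_like_windows_path(text):
    """Heuristic: does the string carry a Windows drive (or UNC share)?"""
    return PureWindowsPath(text).drive != ""
```

A check of this kind could be used to warn when a config.yml written on Windows is loaded on Linux, rather than silently producing a path under the working directory.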
If even a single letter differs between the name given in the parameters and the actual target field in the SD file, the error reported is caused by an empty result, since no activity value is found.
This error must be handled explicitly for what it is: SDFile_activty param not found in sdf
More flexible. It is the standard for Python 3
When running predict through the command line this parameter can not be modified in the parameter file as it is fixed at context.py. Should be mentioned somewhere, perhaps commented in the parameters file.
I don't know if this is useful but:
We should use the logging library to dump event messages, warnings and errors. It will improve debugging and inspection of results. The logger has different levels (DEBUG, INFO, WARNING, ERROR) and records which module produced the message. For example:
2018-07-28 12:41:12,075 - flame.build - INFO - Creating list...
2018-07-28 12:41:12,075 - flame.build - DEBUG - length of list: 10
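Output in the format shown above can be produced with the standard logging module. The following is a minimal sketch: the helper name `get_logger` is an assumption, and the logger name `flame.build` is taken from the sample lines.

```python
import logging


def get_logger(name, level=logging.DEBUG):
    """Return a logger that prints 'time - name - LEVEL - message' lines."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger


log = get_logger("flame.build")
log.info("Creating list...")
log.debug("length of list: %d", 10)
```

Raising the level to `logging.INFO` would hide the DEBUG line without touching the calling code.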
JSON format works perfectly. When TSV format is set up in the yaml file:
1- there is no complete output in the terminal:
2- the dumped output.tsv contains:
- headers: obj_nam | SMILES | c0 | c1 | ymatrix
- What do c0 and c1 mean? What about ymatrix?
3- where are sens, spec and MCC?
The way exceptions are handled is prone to cause problems and misunderstandings. For example, in the function:
def nummols(ifile):
    try:
        suppl = Chem.SDMolSupplier(ifile)
    except:
        return False, 'unable to open molfile'
    return True, len(suppl)
if the try/except catches an error, it will swallow it and always output unable to open molfile, even if the error wasn't in opening the file (a bad rdkit import, for example)
Instead, doing:
def nummols(ifile):
    try:
        suppl = Chem.SDMolSupplier(ifile)
    except:
        raise
    return len(suppl)
If the try fails because there is no Chem module, the error will now be correctly traced to:
NameError: name 'Chem' is not defined
We need to install standardise from https://github.com/flatkinson/standardiser and probably include it in the furnace environment
This could be problematic since flame will be working in numerous environments. Isn't it better to put the exported .tar in the model folder itself?
1- in yaml file:
Both JSON and TSV data serialization fails.
JSON serialization fails when dumping the variable values from results. values is given as np.int64 type, which is not JSON serializable.
TSV, in turn, fails at (line 139):
if isinstance(val, float):
    line += "%.4f" % val
else:
    line += val
As there is no check for the np.int64 type, the value is not converted to a string.
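One way the JSON side could be handled is a `default` hook for `json.dumps` that unwraps NumPy values. The sketch below relies only on the fact that NumPy scalars and arrays expose `tolist()`. The function name `to_builtin` is hypothetical.

```python
import json


def to_builtin(obj):
    """json.dumps fallback: convert NumPy-style scalars and arrays."""
    # NumPy scalars (e.g. np.int64) and arrays both provide tolist(),
    # which returns plain Python numbers or lists
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")
```

It would be used as `json.dumps(results, default=to_builtin)`. For the TSV branch, appending `str(val)` instead of `val` in the `else` clause would avoid the failed string concatenation.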
Molecular descriptors:
- It creates a file with the same name for both build and predict. I would recommend using different names.
Build:
Predict:
Would it be possible to obtain a table showing:
MD | spec_calc | sens_calc | MCC_calc | spec_CV | sens_CV | MCC_CV | Coverage_CV | Accuracy_CV | spec_extv | sens_extv | MCC_extv | Coverage_extv
RDKit_properties | | | | 0.77 | 0.79 | 0.55 | 0.47 | 0.78 | 0.74 | 0.58 | 0.32 | 0.51
RDKit_md | | | | 0.78 | 0.73 | 0.50 | 0.36 | 0.75 | 0.79 | 0.80 | 0.58 | 0.48
When building or applying models, the program takes much more time (x10 or more) to finish in Windows than in Linux. The CPU was not in use. The problem reproduces in different Windows installs, but not in VMs
1 - In context.py (build_cmd function):
ifile = model['infile']
if not os.path.isfile(ifile):
    return False, 'wrong training series file'
epd = utils.model_path(model['endpoint'], 0)
lfile = os.path.join(epd, os.path.basename(ifile))
shutil.copy(ifile, lfile)  # <---
When the input file is already in the dev folder, an exception is raised.
2 - In idata.py (workflow_objects function):
if first_mol:  # first molecule
    md_results = results[0]
    va_results = results[1]
    num_var = len(md_results)  # <---
    first_mol = False
else:
    if len(results[0]) != num_var:
        print('ERROR: (@workflow_objects) incorrect number of MD for molecule #', str(
            i+1), 'in file ' + input_file)
        continue
The indicated statement assumes the first molecule will always have the correct number of parameters.
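For the first point (copying the training file), a guarded copy could look like the sketch below. The function `copy_training_file` is hypothetical: it creates the destination folder if needed and skips the copy when source and target are the same file.

```python
import os
import shutil


def copy_training_file(ifile, dest_dir):
    """Copy ifile into dest_dir unless it is already there; return the target."""
    # creating the folder avoids FileNotFoundError when it does not exist yet
    os.makedirs(dest_dir, exist_ok=True)
    lfile = os.path.join(dest_dir, os.path.basename(ifile))
    # shutil.copy raises SameFileError if source and target are the same file
    if os.path.exists(lfile) and os.path.samefile(ifile, lfile):
        return lfile
    shutil.copy(ifile, lfile)
    return lfile
```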
When a molecule fails to compute in a PaDEL web service, the change in the matrix size is not handled properly
(flame) [kpinto@ulises 0-rdkit-properties]$ flame -c build -e INF-ql-RF -f ../../../1-test/pr-InF-3D-moka.sdf
recycling data >>> /phi/users/kpinto/flame/flame_models/INF-ql-RF/dev/data.pkl
running sumbsmapling
tune_parameters
metric: f1
best parameters: {'class_weight': None, 'max_features': 'sqrt', 'n_estimators': 25, 'oob_score': True, 'random_state': 46}
found in: 2.9187703132629395 seconds
Traceback (most recent call last):
File "/home/kpinto/miniconda3/envs/flame/bin/flame", line 11, in
load_entry_point('flame', 'console_scripts', 'flame')()
File "/phi/users/kpinto/flame/flame/flame_scr.py", line 142, in main
success, results = context.build_cmd(model)
File "/phi/users/kpinto/flame/flame/context.py", line 145, in build_cmd
success, results = build.run(lfile)
File "/phi/users/kpinto/flame/flame/build.py", line 83, in run
results = learn.run()
File "/phi/users/kpinto/flame/flame/learn.py", line 123, in run
self.run_internal()
File "/phi/users/kpinto/flame/flame/learn.py", line 96, in run_internal
success, results = model.validate()
File "/phi/users/kpinto/flame/flame/stats/base_model.py", line 391, in validate
success, results = self.CF_qualitative_validation()
File "/phi/users/kpinto/flame/flame/stats/base_model.py", line 248, in CF_qualitative_validation
self.sensitivity = (self.TP / (self.TP + self.FN))
ZeroDivisionError: division by zero
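The traceback ends in a division where TP + FN is zero. One possible guard is sketched below. The helper `safe_ratio` is hypothetical, and whether 0.0, NaN or an explicit error is the right result for an empty class is a design decision this example does not settle.

```python
def safe_ratio(numerator, denominator, default=0.0):
    """Return numerator/denominator, or `default` when the denominator is zero."""
    if denominator == 0:
        return default
    return numerator / denominator


# values chosen to mimic the failing case: no positives in the fold
TP, FN = 0, 0
sensitivity = safe_ratio(TP, TP + FN)
print(sensitivity)
```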
Now it is:
try:
    with open(os.path.join(self.dest_path, 'data.pkl'), 'wb') as fo:
        pickle.dump(md5_parameters, fo)
        pickle.dump(md5_input, fo)
        pickle.dump(self.results, fo)
except Exception as e:
    print(e)
    pass
It would be much clearer if it first checked whether data.pkl exists and then did the appropriate thing in each case.
Use yamlloader to load and write the YAML with ordered dict so it maintains the order.
The whole team must commit to keeping the external licenses document up to date
WS shows projection error when it should say something like model doesn't exist
I am doing the tutorial. When I try to build the model I get Segmentation fault: 11
I have installed the environment you provide on a macOS 10.13.3 machine.
Also the file I use as training set is caco2.sdf.
The version number must be passed to predict as an int, both from flame and from predict-ws. Avoid reconverting strings to ints at the constructor or other places
Depending on whether the model is qualitative or quantitative, flame shouldn't read an sdf with the wrong type in < activity > without raising an error or warning
manage needs some fixes and improvements in order to make the user and developer experience smooth. It would be nice to have functionality to set the root directory for the models repository and copy the config.yaml file if it needs to be read again.
It would be nice if we could discuss more about how manage should deal with the repository of models and how to propagate this information to the other classes.
This only happens if the model name is 'test' in lowercase, with other names or 'TEST' in uppercase it works
Steps to reproduce:
from flame.build import Build
d = Build("test")
d.run("/home/marc/Documents/flame_dev_api/sdf/caco2.sdf")
Output:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-5-dbf08200c5a5> in <module>()
----> 1 d.run("/home/marc/Documents/flame_dev_api/sdf/caco2.sdf")
~/Documents/flame/flame/build.py in run(self, input_source)
70 modpath = utils.module_path(self.model, 0)
71
---> 72 idata_child = importlib.import_module(modpath+".idata_child")
73 learn_child = importlib.import_module(modpath+".learn_child")
74 odata_child = importlib.import_module(modpath+".odata_child")
~/anaconda3/envs/flame_django/lib/python3.6/importlib/__init__.py in import_module(name, package)
124 break
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
127
128
~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)
~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)
~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)
~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)
~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)
~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)
~/anaconda3/envs/flame_django/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)
ModuleNotFoundError: No module named 'test.dev'
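One plausible explanation, not confirmed in the report, is that the model name collides with a module that Python can already import under the same name (many Python installations ship a top-level `test` package), so the import of `test.dev` is looked up in the wrong place. A check like the hypothetical helper below could reject such names when a model is created.

```python
import importlib.util


def clashes_with_importable_module(model_name):
    """True if `model_name` is already importable as a top-level module."""
    # find_spec returns None when no top-level module of that name is found
    return importlib.util.find_spec(model_name) is not None


print(clashes_with_importable_module("json"))
```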
When working as ws and the number of CPUs is set to 1, standardizer fails. The error is captured and a "standardizer unknown error" is issued
Changing to 2 CPUs or removing normalization solves the problem
Should we test flame on older Python versions and downgraded versions of packages and fix the issues? If so, where should we draw the compatibility line?
Print is used when working with the CLI, but a logger would be better for debugging and for inspecting the workflow of model management and use.