machine-learning-meets-pka's Issues

ValueError: Input X contains NaN.

Hi,

I tried to run the notebook modeling.ipynb. The earlier cells ran without problems, but this cell failed:

est_cls = RandomForestRegressor
rf_params = dict(n_estimators=1000, n_jobs=est_jobs, verbose=verbose, random_state=seed)
name = 'RandomForest (n_estimators=1000)'

train_all_sets(est_cls, rf_params, name)

It raised the following error:

ValueError: Input X contains NaN.
RandomForestRegressor does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

The error says the input contains NaN values, but I did not see any missing values in the data produced by the earlier cells. Where do the NaN values come from, or am I setting a parameter incorrectly?

Many thanks for your help.

Best,

Sh-Y
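
One way to narrow down the NaN error above is to check the feature matrix directly before it reaches the estimator. The following is a minimal sketch, not the notebook's own code: `report_non_finite` and `drop_non_finite_rows` are hypothetical helpers, and `X_demo` / `y_demo` are synthetic stand-ins for whatever arrays `train_all_sets` builds internally.

```python
import numpy as np


def report_non_finite(X):
    """Return indices of the rows and columns that contain NaN or inf."""
    X = np.asarray(X, dtype=float)
    bad = ~np.isfinite(X)
    rows = np.flatnonzero(bad.any(axis=1))
    cols = np.flatnonzero(bad.any(axis=0))
    return rows, cols


def drop_non_finite_rows(X, y):
    """Keep only the samples whose features are all finite."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.isfinite(X).all(axis=1)
    return X[keep], y[keep]


# Synthetic placeholder data (assumption): one sample has a missing feature.
X_demo = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
y_demo = np.array([7.1, 8.2, 9.3])

rows, cols = report_non_finite(X_demo)
print("rows with missing values:", rows)
print("columns with missing values:", cols)

X_clean, y_clean = drop_non_finite_rows(X_demo, y_demo)
print("shape after dropping:", X_clean.shape)
```

Knowing which columns are affected usually points to the descriptor that failed to compute for some molecules. Dropping rows is only one option; the imputation page linked in the error message describes alternatives.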

Make data available under CCZero?

MIT is a software license. For data, a data license is more appropriate, since copyright law often treats data and software differently. Would you consider publishing a citable Zenodo or Figshare archive of the data (novartis_cleaned_mono_unique_notraindata.sdf, etc.) under a CCZero license, which plays much the same role for data that the MIT license plays for software?

No pre-trained model

Hi,

I have set up the conda environment, but running predict_sdf.py fails because the model file is missing:

python predict_sdf.py pka_ligands.sdf pka_ligands_pred.sdf
Loading SDF...
2 molecules loaded
Loading model...
Traceback (most recent call last):
  File "predict_sdf.py", line 41, in <module>
    with open('RF_CV_FMorgan3_pKa.pkl', 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'RF_CV_FMorgan3_pKa.pkl'

Would you be able to share the pretrained model?

I understand that this might not be possible if the model contains confidential data, but I thought I would ask. :)
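
The traceback shows that the script opens 'RF_CV_FMorgan3_pKa.pkl' relative to the current working directory. As a minimal sketch, assuming the file is an ordinary pickle, a guard like the one below would report the missing model more clearly. `load_model` is a hypothetical helper and is not part of predict_sdf.py.

```python
import pickle
from pathlib import Path


def load_model(path="RF_CV_FMorgan3_pKa.pkl"):
    """Load a pickled model, failing early with a descriptive message."""
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(
            f"Model file {model_path} not found (working directory: {Path.cwd()}). "
            "Place a trained model at this path before running the prediction."
        )
    # Only unpickle files from a trusted source: pickle can execute code on load.
    with model_path.open("rb") as f:
        return pickle.load(f)
```

This does not supply the model itself; it only makes the failure explicit about which path was searched.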
