mol2vec's People

Contributors

samoturk

mol2vec's Issues

Convert back Base64encoded string of ROMol into an object of <rdkit.Chem.rdchem.Mol>

I have a base64-encoded string of an ROMol that needs to be converted back to its original object form. I use the base64-encoded string to write the data out as text, but after reading the text back I need to convert it into the object again so that I can generate embeddings from it via the function "mol2alt_sentence".

I am only able to save the base64-encoded ROMol as a string rather than as a serialized object, so I need a way to convert the string back to the object. Please suggest any alternative solution for this issue.

Transfer learning

Hi, first, thanks for making this great OSS library, much appreciated.

I'm interested in taking the pretrained ZINC-based model and training it further with sentences from my data set. The reason for this approach is that my dataset is small, only a few thousand SMILES. So I'm trying to take a page from the CNN book, where you take a pretrained image recognition model and specialize it further for your use case.

Is there any way to achieve that within your library?

Feature size

Thanks for providing the implementation. I have a question from running the code: for different input molecules, the mol2alt_sentence function outputs sentences of different lengths, so the final DfVec also contains multiple 300-dimensional feature vectors. How should I aggregate them?

I am just using the standard pipeline but with different input molecules:
sentence = mol2alt_sentence(mol, radius=1)
sentence_obj = MolSentence(sentence)
mol_vec = DfVec(sentences2vec(sentence_obj, model, unseen='UNK'))

Thanks!

Feng
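
One possible explanation, assuming that sentences2vec loops over its first argument as a list of sentences and that a MolSentence iterates over its identifiers like a list: passing a single sentence makes every identifier count as its own "sentence", so one vector per identifier comes back. The toy sketch below uses a hypothetical 3-dimensional vocabulary (`EMBEDDING`) and a simplified stand-in (`toy_sentences2vec`); neither is part of mol2vec.

```python
# Toy stand-in for the summing step of sentences2vec (not the library code).
# EMBEDDING is a hypothetical 3-dimensional vocabulary; real models use 300.
EMBEDDING = {
    "1016841875": [1.0, 0.0, 0.0],
    "198706261": [0.0, 1.0, 0.0],
    "UNK": [0.0, 0.0, 1.0],
}


def sum_vectors(vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(components) for components in zip(*vectors)]


def toy_sentences2vec(sentences, unseen="UNK"):
    """Return one summed vector per element of `sentences`."""
    result = []
    for sentence in sentences:
        # Words missing from the vocabulary fall back to the `unseen` vector.
        words = [EMBEDDING.get(word, EMBEDDING[unseen]) for word in sentence]
        result.append(sum_vectors(words))
    return result


sentence = ["1016841875", "198706261", "1016841875"]

# Bare sentence: each identifier string is treated as a sentence and iterated
# character by character, so there is one (meaningless) vector per identifier.
per_word = toy_sentences2vec(sentence)

# Sentence wrapped in a list: a single summed vector for the whole molecule.
per_molecule = toy_sentences2vec([sentence])

print(len(per_word), len(per_molecule))
```

If that assumption holds, the pipeline from the question would call `sentences2vec([sentence_obj], model, unseen='UNK')` to get one vector per molecule.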

How Do You Convert RDKIT molecule to your fingerprint key ?

Hi!

We are very excited about using your project. We have the notebook samples working and we want to try them on our own RDKit molecules.

Unfortunately, we don't know how to convert RDKit molecules into the Morgan fingerprints that you are using as keys into the embedding dictionary. We can convert RDKit molecules to bit vectors, but can't seem to map RDKit molecules onto your non-bit-vector representation (integers?).

Please advise!

Thanks!

Convert sentences to mol

After applying mol2alt_sentence to get the molecular sentence, is there any way to convert this back to the Mol object?

E.g. I have the sentence ['1016841875', '198706261', '2245384272', '2909042096', '2245384272', '2909042096', '1016841875', '198706261'] - can I convert this back to an rdkit.Chem.rdchem.Mol object?

I have found the object mol2vec.helpers.IdentifierTable, but I'm unsure what it's used for or whether it's helpful.

PS: the mol2vec project is a great implementation and very helpful for my research so far!

update sentences2vec function for gensim 4.0

```python
import numpy as np


def sentences2vec(sentences, model, unseen=None):
    """Generate vectors for each sentence (list) in a list of sentences. Vector is simply a
    sum of vectors for individual words.

    Parameters
    ----------
    sentences : list, array
        List with sentences
    model : word2vec.Word2Vec
        Gensim word2vec model
    unseen : None, str
        Keyword for unseen words. If None, those words are skipped.
        https://stats.stackexchange.com/questions/163005/how-to-set-the-dictionary-for-text-analysis-using-neural-networks/163032#163032

    Returns
    -------
    np.array
    """
    # gensim 4.0 exposes the vocabulary as wv.key_to_index instead of wv.vocab
    keys = set(model.wv.key_to_index)
    vec = []

    if unseen:
        unseen_vec = model.wv.get_vector(unseen)

    for sentence in sentences:
        if unseen:
            # Words missing from the vocabulary contribute the `unseen` vector
            vec.append(sum([model.wv.get_vector(y) if y in keys
                            else unseen_vec for y in sentence]))
        else:
            # Words missing from the vocabulary are skipped
            vec.append(sum([model.wv.get_vector(y) for y in sentence
                            if y in keys]))
    return np.array(vec)
```

Type Error when running with Python 3.8

When running with Python 3.8, I get the following error message:

Featurizing molecules.
Traceback (most recent call last):
  File "/Users/dkazempour/opt/anaconda3/bin/mol2vec", line 8, in <module>
    sys.exit(run())
  File "/Users/dkazempour/opt/anaconda3/lib/python3.8/site-packages/mol2vec/app/mol2vec.py", line 165, in run
    args.func(args)
  File "/Users/dkazempour/opt/anaconda3/lib/python3.8/site-packages/mol2vec/app/mol2vec.py", line 25, in do_featurize
    features.featurize(args.in_file, args.out_file, args.model, args.radius, args.uncommon)
  File "/Users/dkazempour/opt/anaconda3/lib/python3.8/site-packages/mol2vec/features.py", line 465, in featurize
    word2vec_model[uncommon]
TypeError: 'Word2Vec' object is not subscriptable

Which library is causing this issue?

Update: I realized that my observation is related to the other issue, titled "update sentences2vec function for gensim 4.0", by Maledive.

The library causing this is gensim: its API changed in the 4.x releases, which produces the error above.

A temporary 'fix' (really a quick-and-dirty hack) is to pin the older release: pip install -Iv gensim==3.8.2
Afterwards I could successfully run mol2vec again.
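
The failing line indexes the Word2Vec model directly (`word2vec_model[uncommon]`), which gensim 4.x no longer supports; vectors are reached through `model.wv` instead. As an alternative to pinning gensim, a small compatibility shim along the following lines might work. The helper names (`keyed_vectors`, `vocabulary`, `word_vector`) are hypothetical and are not part of mol2vec or gensim; this is a sketch, not a tested patch for the library.

```python
def keyed_vectors(model):
    """Return the object that stores the vectors.

    A full Word2Vec model keeps them under `.wv`; an object without that
    attribute (assumed to be a bare KeyedVectors) is returned unchanged.
    """
    return getattr(model, "wv", model)


def vocabulary(model):
    """Return the set of known words under gensim 3.x or 4.x."""
    kv = keyed_vectors(model)
    if hasattr(kv, "key_to_index"):  # gensim 4.x
        return set(kv.key_to_index)
    return set(kv.vocab)  # gensim 3.x


def word_vector(model, word):
    """Look up one vector without indexing the Word2Vec model itself."""
    return keyed_vectors(model)[word]
```

With helpers like these, a lookup such as `word2vec_model[uncommon]` would become `word_vector(word2vec_model, uncommon)`.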

filter criteria

Hi, first, thanks for making this great OSS library, much appreciated.

The article indicates that only the following elements are allowed to appear in a SMILES string. Are lowercase letters also included, for example atoms written as c, o, h, n, ...?
[image]

I can't download ZINC15. Could you provide a way to download it?

Error when loading model

Hi, thanks for putting together the notebook to explore mol2vec. However, I am getting the following error when loading the model using model = KeyedVectors.load('model_300dim.pkl')

AttributeError: 'Word2Vec' object has no attribute 'vocabulary'

I am using gensim v3.3.0.

Also, I noticed there are two versions of the 'model_300dim.pkl' file, one around 25 MB and another around 74 MB. Which one should be used? I have tried both versions and see the same error. Thanks for any help!
