Comments (5)

urialon avatar urialon commented on June 26, 2024

Hi Avra,
Thank you for your interest in this work! Sorry again for the delayed response.

Yes, to use code2vec for Python, you will have to train the model on a Python dataset.
Have you seen this section in the README? https://github.com/tech-srl/code2vec#extending-to-other-languages

Uri

from code2seq.

Avv22 avatar Avv22 commented on June 26, 2024

@urialon, the astminer team helped me produce training, testing, and validation python.c2v data. How should we proceed to train the code2vec model? We have 20k Python files that we split into train, test, and validation sets before feeding them to the astminer tool. Once code2vec is trained, can we feed the same Python code to it to produce embeddings? What do you think?

urialon avatar urialon commented on June 26, 2024

I am not sure what your python.c2v looks like, but try continuing the preprocess.sh script from this line: https://github.com/tech-srl/code2seq/blob/master/preprocess.sh#L54 (and adapt all paths to your files).
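
Before adapting the paths, it may help to sanity-check the astminer output. The following is a minimal sketch, not part of either repository: it assumes each line of a .c2v file is a label followed by space-separated contexts of the form `start_token,path,end_token`. The function name `summarize_c2v` and the sample lines are hypothetical and only for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_c2v(lines: Iterable[str]) -> Tuple[Counter, Counter, int]:
    """Count labels and context tokens in .c2v-style lines.

    Assumed format (not verified against your files): "label ctx ctx ...",
    where each ctx is "start_token,path,end_token".
    Returns (label counts, token counts, number of malformed contexts).
    """
    labels: Counter = Counter()
    tokens: Counter = Counter()
    malformed = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        labels[parts[0]] += 1
        for context in parts[1:]:
            fields = context.split(",")
            if len(fields) != 3:
                malformed += 1  # context does not match the assumed shape
                continue
            start, _path, end = fields
            tokens[start] += 1
            tokens[end] += 1
    return labels, tokens, malformed


# Hypothetical sample lines; real paths and labels will look different.
sample = [
    "get|name self,Name|Attribute|Name,name",
    "set|name self,Name|Assign|Name,value bad_context",
]
labels, tokens, malformed = summarize_c2v(sample)
```

If the malformed count is high for your real files, the data probably does not match the format that the rest of preprocess.sh expects, and the later steps are likely to fail or produce empty vocabularies.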

Avv22 avatar Avv22 commented on June 26, 2024

@urialon.

Thank you. So I will train your model on the 150k Python dataset. How do we save the model so that we can use it later on another Python dataset to generate embeddings? Does preprocess.sh do this automatically and save the model?

Once the model is trained on the 150k Python dataset you pointed to, we would also like to use it to generate one embedding vector for each Python file in our own dataset. Can we do that? We don't want to generate method names, just one embedding that is representative of each file. We would like to do this for our 20k Python files.

urialon avatar urialon commented on June 26, 2024

Hi @Avra2 ,

preprocess.sh only preprocesses the data; it does not train the model.
train.sh, however, trains the model and saves the checkpoints. See:
https://github.com/tech-srl/code2vec#training-a-model-from-scratch
