

urialon commented on July 28, 2024

Hi,
You can feed input with any number of classes, including two.
See:
https://github.com/tech-srl/code2vec#extending-to-other-languages
So instead of the method name, you can feed "true" and "false".
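A minimal sketch of that relabeling as a Python post-processing step. It assumes each line of the extracted file is the target name followed by space-separated `token1,path,token2` contexts (our reading of the format described in the README linked above). `relabel_line`, `relabel_file`, and the `label_for` callback are hypothetical helpers, not part of code2vec; you would supply your own mapping from each example to "true" or "false".

```python
def relabel_line(line: str, label: str) -> str:
    """Replace the first space-separated token (the target) with `label`.

    Assumed line layout (hypothetical, see lead-in):
        <target> <token1,path,token2> <token1,path,token2> ...
    """
    _target, sep, contexts = line.rstrip("\n").partition(" ")
    return label + sep + contexts


def relabel_file(src_path: str, dst_path: str, label_for) -> None:
    """Rewrite every example; `label_for(target)` is a hypothetical callback
    that returns "true" or "false" for the original target name."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            target = line.split(" ", 1)[0]
            dst.write(relabel_line(line, label_for(target)) + "\n")


if __name__ == "__main__":
    print(relabel_line("get|name a,b,c d,e,f\n", "true"))  # true a,b,c d,e,f
```

Doing this after extraction keeps the extractor untouched; the alternative is to change what the extractor prints as the label in the first place.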

By the way,
code2seq has a better encoder than code2vec, and even though it can generate longer sequences, it can be used for binary classification in the same way.

from code2vec.

munybt commented on July 28, 2024

So, all I have to do is modify the Extractor to change the first token in each line?

I visited the code2seq page and I couldn't tell the difference from code2vec. Could you please clarify?


urialon commented on July 28, 2024

Exactly.

In general, the encoder of code2seq is better because it learns paths and names using LSTMs, node-by-node, while in code2vec the paths are monolithic.
For additional differences between code2vec and code2seq, see Section 6 in the code2seq paper.
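To make the "monolithic" distinction concrete, here is a toy Python illustration (not code from either project). The AST node names and the identifier are made up for the example, and the camelCase subtoken splitter is a simple regex assumed for illustration. A code2vec-style vocabulary treats each whole path as one symbol, so a path never seen in training is out of vocabulary. A code2seq-style encoder instead reads the path node by node (with an LSTM) and splits names into subtokens, so an unseen path can still be composed from known pieces.

```python
import re

# Hypothetical AST paths from a training set (node names are illustrative).
train_paths = [
    ["NameExpr", "MethodCallExpr", "ExpressionStmt", "BlockStmt"],
    ["NameExpr", "MethodCallExpr", "ReturnStmt", "BlockStmt"],
]
# A path that never occurs in training but reuses known node types.
new_path = ["NameExpr", "MethodCallExpr", "ReturnStmt", "ExpressionStmt"]


def path_vocab(paths):
    """code2vec-style: every distinct full path is one vocabulary entry."""
    return {"^".join(p) for p in paths}


def node_vocab(paths):
    """code2seq-style: the vocabulary is over individual nodes, which an
    LSTM would read one by one to build the path representation."""
    return {node for p in paths for node in p}


def subtokens(name):
    """Split a camelCase identifier into lowercase subtokens."""
    return [t.lower() for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]


if __name__ == "__main__":
    print("^".join(new_path) in path_vocab(train_paths))  # False: whole path unseen
    print(set(new_path) <= node_vocab(train_paths))       # True: every node is known
    print(subtokens("getFileName"))                       # ['get', 'file', 'name']
```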


DavidSUN2 commented on July 28, 2024

Hi @munybt @urialon,
Sorry to comment even though this issue has been closed.
If I train this model for a binary classification problem with the labels 'true' and 'false', then according to your discussion above, I only need to modify the extractor. I have two related questions:

  1. I have checked the model.py file. I found the following loss function:
            loss = tf.reduce_sum(tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=tf.reshape(words_input, [-1]),
                logits=logits)) / batch_size

     Do I need to change this part of model.py? For example, replace tf.nn.sparse_softmax_cross_entropy_with_logits with tf.nn.sigmoid_cross_entropy_with_logits.
     Is there any other part of model.py that I need to modify for a binary classification problem?
  2. About the extractor: which specific part do I need to change? Could you please give me some hints?
Thanks in advance.


urialon commented on July 28, 2024

Hi @DavidSUN2,
I don't think you need to change the loss. Binary classification is a special case of multi-class classification: as long as the model "sees" only two possible labels during training, this is binary classification.
If you change the loss to tf.nn.sigmoid_cross_entropy_with_logits, you allow the model to assign multiple labels, i.e., the same example can be both "true" and "false".
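A small numeric check of this point, written in plain Python (no TensorFlow) so it runs anywhere. `softmax_xent` mirrors what tf.nn.sparse_softmax_cross_entropy_with_logits computes for a single example, and `sigmoid_xent` uses the numerically stable formula given in the tf.nn.sigmoid_cross_entropy_with_logits documentation. With two classes (here we arbitrarily index "false" as 0 and "true" as 1), the softmax loss equals a single sigmoid loss on the logit difference, so the existing loss already acts as a binary classifier. Independent per-label sigmoids, by contrast, need not sum to one, which is what permits multi-label outputs.

```python
import math

FALSE, TRUE = 0, 1  # arbitrary class indices chosen for this sketch


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_xent(logits, label):
    """Per-example softmax cross-entropy for an integer label."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_xent(logit, target):
    """Stable binary cross-entropy: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


if __name__ == "__main__":
    logits = [1.0, 2.0]  # scores for "false" and "true"
    diff = logits[TRUE] - logits[FALSE]
    for label in (FALSE, TRUE):
        # Two-class softmax loss equals the sigmoid loss on the logit difference.
        print(softmax_xent(logits, label), sigmoid_xent(diff, label))
    print(softmax(logits))               # sums to 1: one label per example
    print([sigmoid(z) for z in logits])  # independent: both can exceed 0.5
```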

Regarding the extractor, it depends on where you get your true/false labels from.
This is the line that prints the method name as the "label":
https://github.com/tech-srl/code2vec/blob/master/JavaExtractor/JPredict/src/main/java/JavaExtractor/FeaturesEntities/ProgramFeatures.java#L21

So instead of stringBuilder.append(name), you can append anything else, such as your own label. I hope this helps.
