Comments (5)
Hi,
You can feed input with any number of classes, including two.
See:
https://github.com/tech-srl/code2vec#extending-to-other-languages
So instead of the method name, you can feed "true" and "false".
By the way, code2seq has a better encoder than code2vec, and even though it can generate longer sequences, it can be used for binary classification in the same way.
from code2vec.
So all I have to do is modify the Extractor so that it changes the first token in each line?
I visited the code2seq page and I couldn't tell the difference from code2vec. Could you please clarify?
Exactly.
In general, the encoder of code2seq is better because it learns paths and names using LSTMs, node-by-node, while in code2vec the paths are monolithic.
For additional differences between code2vec and code2seq, see Section 6 in the code2seq paper.
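As a rough illustration of what "monolithic" means here, the sketch below compares a vocabulary of whole paths with a vocabulary of individual nodes. The path strings and the "|" separator are made up for the example, and nothing here is taken from the code2vec or code2seq code; it only shows why a path that was never seen as a whole can still be represented node by node (code2seq would then run an LSTM over those nodes, which is omitted here).

```python
# Hypothetical AST paths, written as "|"-separated node names for illustration only.
train_paths = ["Name|Call|Name", "Name|Return|Call"]
unseen_path = "Name|Return|Call|Name"

# Monolithic view: every whole path is a single vocabulary entry.
path_vocab = {path: index for index, path in enumerate(train_paths)}

# Node-by-node view: the vocabulary holds individual nodes instead.
node_vocab = {}
for path in train_paths:
    for node in path.split("|"):
        node_vocab.setdefault(node, len(node_vocab))

# The unseen path has no entry in the monolithic vocabulary ...
monolithic_id = path_vocab.get(unseen_path)

# ... but every one of its nodes is known, so it can still be encoded as a sequence.
node_ids = [node_vocab.get(node) for node in unseen_path.split("|")]

print(monolithic_id)
print(node_ids)
```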
Hi @munybt @urialon,
Sorry for commenting on a closed issue.
Suppose I train this model on a binary classification problem with the labels 'true' and 'false'. According to the discussion above, I only need to modify the extractor. I have two related questions:
1. I checked model.py and found the following loss function:
loss = tf.reduce_sum(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.reshape(words_input, [-1]),
    logits=logits)) / batch_size
Do I need to change this part of model.py, for example by replacing tf.nn.sparse_softmax_cross_entropy_with_logits with tf.nn.sigmoid_cross_entropy_with_logits? Or is there any other part of model.py that I need to modify for binary classification?
2. In the extractor, which specific part do I need to change? Could you please give me some hints?
Thanks in advance.
Hi @DavidSUN2 ,
I don't think you need to change the loss. Binary classification is a special case of multi-class classification: as long as the model "sees" only two possible labels during training, it is doing binary classification.
If you change the loss to tf.nn.sigmoid_cross_entropy_with_logits, you allow the model to assign multiple labels, i.e., the same example can be both "true" and "false".
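To make the difference between the two losses concrete, here is a minimal pure-Python sketch (no TensorFlow) with made-up logits for the two labels. The function names are illustrative and are not part of model.py. Softmax makes the two labels compete for one unit of probability, while a per-label sigmoid scores each label independently.

```python
import math


def softmax(logits):
    """Turn logits into probabilities that sum to 1 (labels compete)."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Score one label independently of all the others."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax_cross_entropy(logits, label_index):
    """Negative log-probability of the correct label under softmax."""
    return -math.log(softmax(logits)[label_index])


# Made-up logits for the labels "true" (index 0) and "false" (index 1).
logits = [2.0, 1.5]

softmax_probs = softmax(logits)               # exactly one label wins
sigmoid_probs = [sigmoid(x) for x in logits]  # both labels can pass 0.5

print(softmax_probs, sum(softmax_probs))
print(sigmoid_probs)
print(softmax_cross_entropy(logits, 0))
```

With these logits the sigmoid scores of both labels are above 0.5, which is the "both true and false" situation described above; the softmax probabilities always sum to 1.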
Regarding the extractor: it depends on where you get your true/false labels from.
This is the line that prints the method name as the "label":
https://github.com/tech-srl/code2vec/blob/master/JavaExtractor/JPredict/src/main/java/JavaExtractor/FeaturesEntities/ProgramFeatures.java#L21
So instead of stringBuilder.append(name), you can call stringBuilder.append with anything else. I hope this helps.
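If changing the Java extractor is inconvenient, one alternative (not what is suggested above, just a sketch) is to post-process the extractor's output and overwrite the first space-delimited field of each line, which is the label. The helper names, the sample lines, and the rule for deciding "true" or "false" are all hypothetical; in practice the label would come from wherever your ground truth lives.

```python
def relabel_line(line, label):
    """Replace the first space-delimited field (the target label) of one example."""
    _, separator, contexts = line.rstrip("\n").partition(" ")
    return f"{label}{separator}{contexts}"


def relabel_examples(lines, is_positive):
    """Yield each example with its original label swapped for 'true' or 'false'."""
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        old_label = line.split(" ", 1)[0]
        yield relabel_line(line, "true" if is_positive(old_label) else "false")


# Made-up examples in the "label context context ..." layout, where each
# context is "token,path,token". The path values are placeholders.
raw_examples = [
    "get|name this,111,name name,222,String",
    "set|name value,333,name",
]

# Hypothetical labeling rule, used only to show the mechanics.
relabeled = list(relabel_examples(raw_examples, lambda name: name.startswith("get")))

for example in relabeled:
    print(example)
```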