It seems that the Odex prompts fed to the model have a trailing whitespace, and this d

Stripping the prompt can improve model performance about odex HOT 4 CLOSED

zorazrw commented on September 13, 2024

Stripping the prompt can improve model performance


Comments (4)

murthyrudra commented on September 13, 2024

Hi @zorazrw, this is the command I ran:

python nl2code_codegen.py --language en --model_size 2B --model_data mono \
 --num_tests_input 0 --num_tests_eval 100 --num_examples 0 --temperature 0.8 \
 --top_p 0.95 --num_return_sequences 50

This is my environment:

- `transformers` version: 4.24.0
- Platform: Linux-4.18.0-425.13.1.el8_7.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.11
- Huggingface_hub version: 0.11.1
- PyTorch version (GPU?): 1.12.1 (True)
- Tensorflow version (GPU?): 2.12.0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No

Please let me know if you need any other information.

zorazrw commented on September 13, 2024

Nice catch on the whitespace stripping! Also, thanks a lot for doing the comparison studies.
I tried to reproduce the results: adding prompt.strip() did improve the results a lot, but the results on my end are still ~10 points lower than the ones you reported, as shown below:

Overall Pass@K Scores: 
[pass@1] 0.3465 (439)
[pass@2] 0.4220 (439)
[pass@3] 0.4615 (439)
[pass@4] 0.4861 (439)
[pass@5] 0.5027 (439)
[pass@6] 0.5147 (439)
[pass@7] 0.5236 (439)
[pass@8] 0.5304 (439)
[pass@9] 0.5358 (439)
[pass@10] 0.5399 (439)

Would you be able to provide more configuration details, or point out any settings that may differ?
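
For reference, the whitespace fix discussed above can be sketched roughly as follows. This is only a minimal illustration, not the repository's actual code: the helper name `strip_prompt` and the example prompt are hypothetical, and the only thing taken from this thread is the call to `prompt.strip()` before the prompt is passed to the model.

```python
def strip_prompt(prompt: str) -> str:
    """Remove leading and trailing whitespace from a prompt.

    Hypothetical helper: it mirrors the `prompt.strip()` call mentioned
    in this thread, so the prompt no longer ends on a dangling space,
    indentation, or newline before it is tokenized.
    """
    return prompt.strip()


# Hypothetical prompt that ends with a newline plus indentation.
raw_prompt = 'def f(x):\n    """Return the square of x."""\n    '
clean_prompt = strip_prompt(raw_prompt)

# repr() makes the trailing whitespace (or its absence) visible.
print(repr(raw_prompt))
print(repr(clean_prompt))
```

If leading whitespace in a prompt is meaningful, `prompt.rstrip()` would remove only the trailing part; which of the two is appropriate depends on how the prompts are built.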


neubig commented on September 13, 2024

@zorazrw : is this fixed?


zorazrw commented on September 13, 2024

Yes, we are able to get similar results using the current code that includes whitespace cleaning.

Overall Pass@K Scores:
[pass@1] 0.4160 (439)
[pass@2] 0.4701 (439)
[pass@3] 0.4945 (439)
[pass@4] 0.5085 (439)
[pass@5] 0.5177 (439)
[pass@6] 0.5241 (439)
[pass@7] 0.5291 (439)
[pass@8] 0.5330 (439)
[pass@9] 0.5361 (439)
[pass@10] 0.5385 (439)

Considering the randomness of sampling, this should be close enough to the results in the first comment.

The results we report in the paper have a slightly larger variance for smaller values of K since, by default, we sampled 10 predictions instead of 50.
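
To illustrate why the number of sampled predictions matters, here is a sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn for a problem and c is the number that pass the tests. That the scores in this thread are computed in exactly this way is an assumption, and the function name `pass_at_k` and the example counts are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of sampled predictions
    c: number of those predictions that pass all tests
    k: number of attempts allowed
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # With fewer than k failing samples, every size-k subset has a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that k samples drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-problem pass counts, assuming 50 samples per problem.
passes_per_problem = [0, 5, 50]
scores = [pass_at_k(50, c, 1) for c in passes_per_problem]
print(sum(scores) / len(scores))  # benchmark score = mean over problems
```

With n = 10 the pass rate of a problem can only take values in steps of 0.1, while n = 50 allows steps of 0.02, which is one way to see why estimates from fewer samples are noisier.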
