wenjinw / latin-prompt

License: MIT License

Python 93.74% Shell 6.26%

latin-prompt's People

rohan598 cszjing jordy-vl

latin-prompt's Issues

LATIN-tuning

Will you open-source your implementation of Alpaca LATIN-tuning? :)

reproduction gives better results for Llama-v2 7B

See results @ https://wandb.ai/jordy-vlan/Layout/runs/fw39mx08/overview?workspace=user-jordy-vlan

For Llama-v2-chat 13B you report a validation ANLS on DocVQA of 0.4435, whereas my reproduction reaches 0.6239. Any idea why?

Exact command used:

LATIN-Prompt/examples/llama_docvqa_due_azure.py --model_name_or_path llama2-7b-chat --dataset_name docvqa_due_azure --output_dir outputs --results_dir results --datas_dir /data/users/sbiswas/DocVQA --wandb_project Layout --run_name llama2-7b-chat__Prompt_task_instruction_space__docvqa_due_azure --prompt task_instruction_space --per_device_eval_batch_size 2

The differences from your setup are per_device_eval_batch_size=2 and that I am using "NousResearch/Llama-2-7b-chat-hf" instead of the official checkpoint.

You can check my fork here: https://github.com/Jordy-VL/LATIN-Prompt; I have added some niceties regarding:

- loading Llama with 4-bit quantization
- wandb logging
- metric extensions, so that ANLS on the validation set is also recorded per diagnostic category
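
For readers unfamiliar with the metric, here is a minimal sketch of ANLS with a per-category breakdown. It is not the code from the fork: the function names and the record keys (`prediction`, `answers`, `categories`) are assumptions made for illustration, and the lowercasing and stripping follow what evaluation scripts commonly do. ANLS scores each prediction as 1 minus the normalized Levenshtein distance to the closest gold answer, and sets the score to 0 when that distance reaches the 0.5 threshold.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls_score(prediction: str, gold_answers, threshold: float = 0.5) -> float:
    """Score one prediction against all gold answers and keep the best match."""
    pred = prediction.strip().lower()
    best = 0.0
    for gold in gold_answers:
        gold = gold.strip().lower()
        longest = max(len(pred), len(gold))
        nld = levenshtein(pred, gold) / longest if longest else 0.0
        best = max(best, 1.0 - nld if nld < threshold else 0.0)
    return best


def anls_by_category(records):
    """Average ANLS overall and per category.

    `records` is an iterable of dicts with the hypothetical keys
    'prediction' (str), 'answers' (list of str) and 'categories' (list of str).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        score = anls_score(record["prediction"], record["answers"])
        for category in ["overall", *record["categories"]]:
            totals[category] += score
            counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}
```

Calling `anls_by_category` on the validation predictions returns a dictionary with one average per category plus an `overall` entry, which can then be logged like any other metric.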

Latin-prompt

One of the innovations brought by your work is to have structure-preserving OCR (implemented as whitespacing and newlines) for the documents as part of the prompts.

From the first figure in the paper, I gathered that you literally pass the string "5_" to indicate 5 whitespaces. What is the most correct way now? Doesn't the LLM's tokenizer just filter out repeated whitespace?
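
To make the idea of structure-preserving OCR concrete, here is a minimal sketch that renders OCR segments onto a character grid using real spaces and newlines. It is not the repository's implementation: the function name `space_layout_sketch`, the `(text, x0, y0, x1, y1)` tuple format, and the way the character width is estimated are all assumptions made for illustration. It also does not settle how a given tokenizer treats runs of spaces, which would need to be checked against the tokenizer in question.

```python
def space_layout_sketch(segments, char_width=None):
    """Render OCR segments as text whose spacing mirrors the page layout.

    `segments` is a list of (text, x0, y0, x1, y1) tuples in pixels
    (a hypothetical input format chosen for this sketch).
    """
    segments = [s for s in segments if s[0]]  # drop empty strings
    if not segments:
        return ""
    if char_width is None:
        # Estimate the width of one character from the boxes themselves.
        widths = [(x1 - x0) / len(text) for text, x0, _, x1, _ in segments]
        char_width = sum(widths) / len(widths)

    # Group segments into rows: a segment joins the current row when its
    # vertical centre falls inside the row's first box.
    rows = []
    for seg in sorted(segments, key=lambda s: (s[2], s[1])):
        centre = (seg[2] + seg[4]) / 2
        if rows and rows[-1]["top"] <= centre <= rows[-1]["bottom"]:
            rows[-1]["segs"].append(seg)
        else:
            rows.append({"top": seg[2], "bottom": seg[4], "segs": [seg]})

    lines = []
    for row in rows:
        line = ""
        for text, x0, *_ in sorted(row["segs"], key=lambda s: s[1]):
            column = round(x0 / char_width)  # target column on the grid
            gap = column - len(line)
            if line and gap < 1:
                gap = 1  # always keep neighbouring segments apart
            line += " " * max(gap, 0) + text
        lines.append(line)
    return "\n".join(lines)
```

With this sketch, two columns of a small form stay aligned in the output string, so the horizontal gap between a label and its value is carried by literal spaces rather than by a symbolic marker.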

VQA groups

Thank you for sharing the code and for the nice paper.
Can you direct me to where I can find the group division for the InfographicVQA dataset (Table 5)?
If it is not public yet, can you make it available?

Fine-Tuning GPT 3.5

First off, wanted to say that this is a very interesting way to tackle the issue of layout within a prompt.

Do you have any plans to experiment with fine-tuning GPT-3.5 Turbo on the DocVQA dataset? It would be interesting to see whether results improve over zero-shot.

Reproducing Question

Hey guys!

Very interesting topic and high quality paper from the research team.

After reading the paper and reproducing it with reference to the repository, I had a few questions that led me to raise an issue.

1. Is it correct that you used either the line-level bounding boxes or the word-level (text) bounding boxes from the original OCR provided by the RRC leaderboard? If you used the line-level boxes, was performance worse with the word-level boxes?
2. Were the ANLS values in your paper calculated with your own code, or are they the results you submitted to the RRC leaderboard?
3. If the answer to 1 is LINE, is the overall process the following: take the LINE OCR TEXT and TEXT_BOXES, obtain LAYOUT_RECOVER with the SPACE_LAYOUT function, and then send it with PROMPT_TASK to the GPT-3.5 API?
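
To illustrate the difference between word-level and line-level boxes raised in question 1, here is a minimal sketch that merges neighbouring word boxes into longer segments. It is purely illustrative and not what the authors did: the function name `merge_words_into_segments`, the `(text, x0, y0, x1, y1)` tuple format, and the gap rule based on box height are all assumptions.

```python
def merge_words_into_segments(words, max_gap_ratio=1.0):
    """Merge word boxes on the same row into segments.

    `words` is a list of (text, x0, y0, x1, y1) tuples in pixels
    (a hypothetical input format). Two neighbouring words are merged when
    the horizontal gap between them is at most `max_gap_ratio` times the
    height of the segment built so far.
    """
    # Group words into rows by checking the vertical centre against the
    # first box of the current row.
    rows = []
    for word in sorted(words, key=lambda w: (w[2], w[1])):
        centre = (word[2] + word[4]) / 2
        if rows and rows[-1][0][2] <= centre <= rows[-1][0][4]:
            rows[-1].append(word)
        else:
            rows.append([word])

    segments = []
    for row in rows:
        row.sort(key=lambda w: w[1])  # left to right
        current = list(row[0])
        for text, x0, y0, x1, y1 in row[1:]:
            height = current[4] - current[2]
            if x0 - current[3] <= max_gap_ratio * height:
                # Close enough: extend the text and grow the box.
                current = [current[0] + " " + text, current[1],
                           min(current[2], y0), max(current[3], x1),
                           max(current[4], y1)]
            else:
                segments.append(tuple(current))
                current = [text, x0, y0, x1, y1]
        segments.append(tuple(current))
    return segments
```

Words that sit close together end up in one segment with a single box, while a value far to the right stays separate, which is the kind of input a layout-recovery step would then place on the page grid.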

Thanks.

(P.S. There seems to be a typo in the title of the README.md :) Promot )
