latin-prompt's People

Contributors

wenjinw

latin-prompt's Issues

LATIN-tuning

Will you open-source your implementation for Alpaca LATIN-tuning? :)

reproduction gives better results for Llama-v2 7B

See results @ https://wandb.ai/jordy-vlan/Layout/runs/fw39mx08/overview?workspace=user-jordy-vlan

For Llama-v2-chat 13B you report a validation ANLS on DocVQA of 0.4435, whereas my reproduction reaches 0.6239. Any idea why?

Exact command used:

LATIN-Prompt/examples/llama_docvqa_due_azure.py --model_name_or_path llama2-7b-chat --dataset_name docvqa_due_azure --output_dir outputs --results_dir results --datas_dir /data/users/sbiswas/DocVQA --wandb_project Layout --run_name llama2-7b-chat__Prompt_task_instruction_space__docvqa_due_azure --prompt task_instruction_space --per_device_eval_batch_size 2

The differences are that I set per_device_eval_batch_size=2 and that I am using "NousResearch/Llama-2-7b-chat-hf" instead of the official checkpoint.

You can check my fork here: https://github.com/Jordy-VL/LATIN-Prompt; I have added some niceties regarding:

  • loading Llama with 4-bit quantization
  • wandb logging
  • metric extensions so that ANLS on the validation set is also recorded per diagnostic category.
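
To make the last point concrete, here is a minimal sketch of how ANLS could be aggregated per diagnostic category. It is not the code from the fork or from the original repository. The field names `prediction`, `answers`, and `categories` are hypothetical, and the lowercasing and stripping of answers is an assumption. The score itself follows the usual ANLS definition: one minus the normalized Levenshtein distance, set to zero once that distance reaches the 0.5 threshold, taking the best match over all reference answers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls_score(prediction: str, references: List[str], threshold: float = 0.5) -> float:
    """Best ANLS of one prediction against all reference answers."""
    pred = prediction.strip().lower()  # normalization is an assumption
    best = 0.0
    for reference in references:
        gold = reference.strip().lower()
        longest = max(len(pred), len(gold))
        nld = levenshtein(pred, gold) / longest if longest else 0.0
        best = max(best, 1.0 - nld if nld < threshold else 0.0)
    return best


def anls_by_category(samples: Iterable[dict]) -> Dict[str, float]:
    """Mean ANLS overall and per category (sample keys are hypothetical)."""
    scores = defaultdict(list)
    for sample in samples:
        score = anls_score(sample["prediction"], sample["answers"])
        scores["overall"].append(score)
        for category in sample.get("categories", []):
            scores[category].append(score)
    return {name: sum(values) / len(values) for name, values in scores.items()}
```

A sample may belong to several categories, in which case its score counts towards each of them as well as towards the overall mean.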

Latin-prompt

One of the innovations brought by your work is to have structure-preserving OCR (implemented as whitespacing and newlines) for the documents as part of the prompts.

From the first figure in the paper, I gathered that you literally pass the string "5_" to indicate five whitespace characters. What is the correct way to do this now? Doesn't the LLM's tokenizer just filter out multiple whitespace characters?
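
For readers unfamiliar with the idea, the following is a rough sketch of what rendering OCR output with real spaces and newlines could look like. It is not the repository's implementation. The function name `render_layout`, the `(text, x, y)` word format, and the `char_width` and `line_height` parameters are all assumptions made for illustration. Whether a given tokenizer keeps such runs of spaces intact is a separate question that this sketch does not answer.

```python
from typing import Dict, List, Tuple

# Hypothetical OCR word format: (text, left x in pixels, top y in pixels).
Word = Tuple[str, float, float]


def render_layout(words: List[Word], char_width: float = 10.0,
                  line_height: float = 20.0) -> str:
    """Place each word at a text column/row derived from its pixel position."""
    rows: Dict[int, List[Tuple[float, str]]] = {}
    for text, x, y in words:
        row = int(round(y / line_height))  # words with similar y share a row
        rows.setdefault(row, []).append((x, text))

    lines = []
    for row in sorted(rows):  # empty rows between text rows are skipped
        line = ""
        for x, text in sorted(rows[row]):  # left to right
            column = int(round(x / char_width))
            if line and column <= len(line):
                column = len(line) + 1  # keep at least one space between words
            line += " " * (column - len(line)) + text
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    table = [("Name", 0, 0), ("Date", 200, 0),
             ("Alice", 0, 20), ("2023", 200, 20)]
    print(render_layout(table))
```

In this toy example the two columns line up because both right-hand words start at the same x coordinate, so the padding consists of literal space characters rather than a marker string.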

VQA groups

Thank you for sharing the code and for the nice paper.
Can you direct me to where I can find the group division for the InfographicVQA dataset (Table 5)?
If it is not public yet, can you make it available?

Fine-Tuning GPT 3.5

First off, wanted to say that this is a very interesting way to tackle the issue of layout within a prompt.

Do you have any plans to experiment with fine-tuning GPT-3.5 Turbo on the DocVQA dataset? It would be interesting to see whether results improve over zero-shot.

Reproducing Question

Hey guys!

Very interesting topic and a high-quality paper from the research team.

After reading the paper and reproducing it with reference to the repository, I had a few questions that led me to raise an issue.

  1. Is it correct that you used the bounding boxes of lines and the bounding boxes of texts in the original OCR provided by the RRC leaderboard? If so, was the performance not good when you used the bounding boxes of texts?

  2. Were the ANLS values in your paper calculated with your own code, or are they taken from the results you submitted to the RRC leaderboard?

  3. If the answer to 1 is LINE, is it correct that the overall process uses the LINE OCR TEXT and TEXT_BOXES to get LAYOUT_RECOVER via the SPACE_LAYOUT function, and then uses PROMPT_TASK to query the GPT-3.5 API?

Thanks.

(P.S. There seems to be a typo in the title of the README.md :) Promot )
