Comments (24)

collant commented on August 27, 2024

It should be possible to train it on new data. You can run finetune.py after you prepare your dataset.

You can use the Stanford Alpaca method for data generation: Data Generation Using OpenAI APIs

So basically, you collect some news articles here and there, you use OpenAI to generate a lot of question/answer pairs, and then you just run finetune.py on the prepared JSONL dataset.

If you want to automate all of this, it should be possible, but you won't get the answers in real time (training takes at least 5 hours on an RTX 4090).

There is also soft fine tuning, where you give GPT-4 the latest news in the prompt and ask about it; that should be possible to do in real time, and you can automate it as well.

The main reason why Alpaca-lora is not real time yet is the context length (how much information you can provide in the prompt). I have no information about Alpaca-lora's context length at the moment.

Also, OpenAI has a fine tuning API that works in less than 10 minutes. They should be using something like LoRA as well. In the near term you should be able to crawl breaking news about a town and update your fine tuned model every hour.

Another method is to do what Bing does: you query a search engine in real time, it gives you a snippet, you put that snippet into the prompt, and you append the user question to it.
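
A minimal sketch of that last approach, assuming a search_snippets() helper that wraps whatever search API you have access to (the helper name and the prompt wording are illustrative, not anything from this repo):

def search_snippets(query: str) -> list[str]:
    # Hypothetical wrapper around a web search API; returns short text snippets.
    raise NotImplementedError("plug in your own search backend here")

def build_prompt(question: str, max_snippets: int = 3) -> str:
    # Retrieve a few fresh snippets, prepend them as context,
    # then append the user question, as described above.
    snippets = search_snippets(question)[:max_snippets]
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )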

baleksey commented on August 27, 2024

@collant Thank you for the reply! But there is a problem with the mentioned approach. The current finetuning is based on input->output pairs, which makes the model learn how to behave (and can certainly add some new knowledge from the dataset answers). But when I want to teach the model about some new scientific paper, for example, I don't know what questions I will want to ask in the future or what answers I will want to get from that specific paper. In other words, I don't have those input/output pairs for the current finetuning.

What I need is to throw a whole huge paper/book into the model so it memorizes it, then use the model's analytic abilities to extract the right facts from that memorized text (I can't just include it in the input, because it's 100 times bigger than the input token limit) and give me answers.

collant commented on August 27, 2024

But when I want to teach the model about some new scientific paper for example, I don't know what type of questions I want to ask in the future

You don't actually need to come up with all the questions. You just give it some questions and answers about a specific paper; that's how it will be able to build up a sort of knowledge, and it will be able to answer new questions using its analytic abilities.

So you will need a pipeline that automates question generation. Here is an example (a code sketch follows below):

  • You provide GPT-4 with one page of the scientific paper + abstract + conclusion and ask it to generate questions.
  • You ask it to answer those questions.
  • You do this in a loop for all the pages you want.
  • With the collected dataset, you fine tune the model on the question/answer pairs generated from the list of papers.
  • When the model is fine tuned, you can ask it other questions that are not in the dataset.

You can augment the GPT-4 questions with a hidden prompt that you do not need to provide to the fine tuned model.
For example, the first line of the prompt could ask it to cite the current paper or other papers for each sentence in the answer.
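
Putting the steps above together, here is a rough sketch of such a pipeline, assuming the openai Python package (v1 client) and a pages list holding (page_text, abstract, conclusion) tuples; the prompts, model name, and output file name are illustrative assumptions, not anything prescribed by this repo:

import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def generate_pairs(page_text, abstract, conclusion, n_questions=5):
    # Ask GPT-4 for questions about one page, then ask it to answer each one.
    context = f"Abstract:\n{abstract}\n\nConclusion:\n{conclusion}\n\nPage:\n{page_text}"
    q_resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Generate {n_questions} questions about this paper excerpt, one per line:\n\n{context}"}],
    )
    questions = [q.strip("- ").strip()
                 for q in q_resp.choices[0].message.content.splitlines() if q.strip()]
    pairs = []
    for question in questions:
        a_resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user",
                       # hidden instruction: ask for citations; the fine tuned model
                       # never needs to see this line in its own prompts
                       "content": f"Cite the current paper or other papers for each sentence of your answer.\n\n{context}\n\nQuestion: {question}"}],
        )
        pairs.append({"instruction": question,
                      "input": "",
                      "output": a_resp.choices[0].message.content})
    return pairs

pages = []  # fill with (page_text, abstract, conclusion) tuples from your papers
dataset = []
for page in pages:
    dataset.extend(generate_pairs(*page))

with open("paper_qa.json", "w") as f:
    json.dump(dataset, f, indent=2)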

collant commented on August 27, 2024

You can also use Alpaca-lora (this repo) to generate questions and answer them:

Question generation
[screenshot]

Answer one question at a time
[screenshot]

But I don't think the 7B model quality would be good enough for this kind of task. Someone needs to try with the 65B + human supervision.

olihough86 commented on August 27, 2024

Hi @LoopControl, I'm looking to do the exact same thing: take a bunch of text data on a niche subject and smash it into the model via a LoRA. Any chance you could give a quick play-by-play? I can set everything up and I know Python, but the actual training process is unfamiliar to me.

collant commented on August 27, 2024

Is there a way to train from long-form text/articles/stories without using the question-answer format?

Can I just feed in 1 full article text at a time through the generate_prompt function in finetune.py (without the ### Instruction and ### Response and such) to train?

I'd like to train a model for general-purpose writing (fiction and non fiction).

Yes, that's actually the main thing LLMs are trained on; you can just provide the text, chunked each time, maybe overlapping.

You can modify the generate_prompt method to pass your dataset through as is.
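
For example, a minimal sketch of that modification, assuming each row of your dataset stores the raw article under a text field (the field name is just an assumption):

def generate_prompt(data_point):
    # Return the article as-is, with no "### Instruction" / "### Response" scaffolding,
    # so training is plain language modeling on your own corpus.
    return data_point["text"]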

I wouldn't go too broad on general purpose writing; maybe just fine tune it on one story + different endings, so the user can steer the story at inference time to their liking.

LoopControl commented on August 27, 2024

I tried to tune down all of the parameters like MICRO_BATCH_SIZE, BATCH_SIZE, EPOCH, but none seems to help. I wonder what else I can do if I want to do this on a 4090?

@collant I completed a full fine tuning with default settings with my own data (micro batch = 4, token cutoff = 256) earlier today. Thanks for the help and suggestions!

@siuying I tried to go to higher cutoff lengths but it's difficult with only 24GB of VRAM - the best I could manage was a 512 token length by dropping the micro batch size to 2. Dropping the micro batch size also multiplies the training time, so I think 256 is the best we'll have till 4-bit/GPTQ training is available.
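
In finetune.py terms that trade-off is just a couple of constants; a sketch of the values described above (for a single 24GB card, other settings left at their defaults):

MICRO_BATCH_SIZE = 2  # dropped from 4 so 512-token examples fit in 24GB
BATCH_SIZE = 128      # effective batch size stays the same...
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # ...so accumulation doubles
CUTOFF_LEN = 512      # up from the 256-token default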

LoopControl commented on August 27, 2024

@olihough86 I basically just did exactly what @collant suggested above. I used the lengths.ipynb file to generate snippets of the training data (I just took around 1600 character snippets randomly from each file). I then loaded that dataset in the finetune.py file and ran the training (and modified generate_prompt to just return the text by itself, without all the ### Input and such).
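
For reference, a rough standalone sketch of that kind of snippet extraction (the corpus folder, snippet count, and output file name are made up for illustration; the resulting text field matches the raw-text generate_prompt sketch above):

import json
import random
from pathlib import Path

SNIPPET_LEN = 1600      # roughly the snippet size mentioned above
SNIPPETS_PER_FILE = 20  # arbitrary; pick based on how much data you want

records = []
for path in Path("corpus").glob("*.txt"):  # hypothetical folder of source text files
    text = path.read_text(encoding="utf-8")
    if len(text) <= SNIPPET_LEN:
        records.append({"text": text})
        continue
    for _ in range(SNIPPETS_PER_FILE):
        start = random.randrange(len(text) - SNIPPET_LEN)
        records.append({"text": text[start:start + SNIPPET_LEN]})

# one JSON object per line, loadable with datasets.load_dataset("json", data_files=...)
with open("snippets.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")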

@T-Atlas I've run the finetune on both a 3090 and an old Tesla P40 (which is older than your V100; the P40 has only 6.1 compute capability vs 7.0 on your V100), but only at 256 context length. The P40 took around 18-20GB at 256 context. I couldn't get a 512 context length to run on the P40 because of the VRAM limit without dropping the micro-batch-size to 2.

My guess is that because the V100, like my P40, is below the compute capability of 7.5 needed for full 8-bit support, the 8-bit mode (the bitsandbytes Python module) uses a lot more VRAM than it does on a card that fully supports it (like the 3090/4090).

The bitsandbytes repo ( https://github.com/TimDettmers/bitsandbytes ) has this listed under "Hardware requirements":

* LLM.int8(): NVIDIA Turing (RTX 20xx; T4) or Ampere GPU (RTX 30xx; A4-A100); (a GPU from 2018 or newer).
* 8-bit optimizers and quantization: NVIDIA Kepler GPU or newer (>=GTX 78X).

The P40 and V100 meet the second requirement but not the first (LLM.int8() support).

HideLord commented on August 27, 2024

Isn't there a way to train embeddings? Not sure how well it translates, but in stable diffusion, if you want it to draw an object it hasn't yet seen, you feed it a couple of images of said object and train an embedding. After that, when you add it to the prompt, it knows what you're talking about.

collant commented on August 27, 2024

Isn't there a way to train embeddings? Not sure how well it translates, but in stable diffusion, if you want it to draw an object it hasn't yet seen, you feed it a couple of images of said object and train an embedding. After that, when you add it to the prompt, it knows what you're talking about.

Yes, that should also work. I think a LoRA model behaves like an embeddings NN.

See this example where they use LoRA in Stable Diffusion with pretty good results: Hugging Face LoRA training

collant commented on August 27, 2024

I might have missed the point, because they mostly use this approach to direct the model toward a certain style, not to give it new knowledge. But after all, the model memorizes things by being trained on them.

baleksey commented on August 27, 2024

@collant Thank you for the clarification! This is definitely a good option to try, to see if it gets the desired results.

Will check the Hugging Face LoRA training as well. It would be better not to use GPT-4 every time for this kind of task if this method works fine.

collant commented on August 27, 2024

I can even see one small piece of empirical evidence of new knowledge with the alpaca_data.json from Stanford.

{
        "instruction": "Discuss the causes of the Great Depression",
        "input": "",
        "output": "The Great Depression was a severe global economic downturn that lasted from 1929 to 1939. It was caused by several factors, including the stock market crash of 1929, structural weaknesses in the economy, low consumer spending, high unemployment, and a lack of confidence in the banking system. Other contributing factors included overproduction of commodities, reliance on borrowing to finance investments, agricultural overproduction, and a decrease in international trade."
    }

[screenshot of the model's answer]

After all, the model is fine tuned to provide those answers.

0xbitches commented on August 27, 2024

@collant were the screenshots from a custom gradio? looks neat.
Sadly I agree that 7b (and 13b) are nowhere near sufficient for anything too serious.

collant commented on August 27, 2024

@collant were the screenshots from a custom gradio? looks neat. Sadly I agree that 7b (and 13b) are nowhere near sufficient for anything too serious.

These are from gradio, run by the generate.py file in this repo.

bino282 commented on August 27, 2024

I can even see one small piece of empirical evidence of new knowledge with the alpaca_data.json from Stanford.

{
        "instruction": "Discuss the causes of the Great Depression",
        "input": "",
        "output": "The Great Depression was a severe global economic downturn that lasted from 1929 to 1939. It was caused by several factors, including the stock market crash of 1929, structural weaknesses in the economy, low consumer spending, high unemployment, and a lack of confidence in the banking system. Other contributing factors included overproduction of commodities, reliance on borrowing to finance investments, agricultural overproduction, and a decrease in international trade."
    }

[screenshot of the model's answer]

After all, the model is fine tuned to provide those answers.

Did you finetune only on the new data, or also on the old data?

LoopControl commented on August 27, 2024

Is there a way to train from long-form text/articles/stories without using the question-answer format?

Can I just feed in 1 full article text at a time through the generate_prompt function in finetune.py (without the ### Instruction and ### Response and such) to train?

I'd like to train a model for general-purpose writing (fiction and non fiction).

siuying commented on August 27, 2024

Hi,

I'm trying to fine tune on some other data. For this data, I believe I need a larger CUTOFF_LEN, so I set it to 1024 or 2048. However, I get a CUDA out of memory error if I do that.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 23.65 GiB total capacity; 21.58 GiB already allocated; 433.69 MiB free; 22.04 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried to tune down all of the parameters like MICRO_BATCH_SIZE, BATCH_SIZE, EPOCH, but none seems to help. I wonder what else I can do if I want to do this on a 4090?

T-Atlas commented on August 27, 2024

I tried to tune down all of the parameters like MICRO_BATCH_SIZE, BATCH_SIZE, EPOCH, but none seems to help. I wonder what else I can do if I want to do this on a 4090?

@collant I completed a full fine tuning with default settings with my own data (micro batch = 4, token cutoff = 256) earlier today. Thanks for the help and suggestions!

@siuying I tried to go to higher cutoff lengths but it's difficult with only 24GB of VRAM - the best I could manage was a 512 token length by dropping the micro batch size to 2. Dropping the micro batch size also multiplies the training time, so I think 256 is the best we'll have till 4-bit/GPTQ training is available.

Hey @LoopControl, I have noticed a strange phenomenon. I trained using the same training data on both a V100 32G and a 4090, with the same training parameters as follows.

MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 512  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05

However, I found that this setting consumes almost all VRAM on the V100 (32411/32510 MB), while only using about half (12273/23028 MB) on the 4090. I am curious about what causes this situation. I noticed this UserWarning on the V100: WARNING: Compute capability <7.5 detected! Only slow 8-bit matmul is supported for your GPU! Does it have anything to do with this? And if I want to maximize the use of the 4090's VRAM, can I simply increase MICRO_BATCH_SIZE? I tried to increase BATCH_SIZE on the V100, but it didn't affect VRAM usage.

baleksey commented on August 27, 2024

@LoopControl Could you tell us about the results of that snippet training? How does your finetuned model behave now, and what does it actually know? Are you happy with the results?

T-Atlas commented on August 27, 2024

@olihough86 I basically just did exactly what @collant suggested above. I used the lengths.ipynb file to generate snippets of the training data (I just took around 1600 character snippets randomly from each file). I then loaded that dataset in the finetune.py file and ran the training (and modified generate_prompt to just return the text by itself, without all the ### Input and such).

@T-Atlas I've run the finetune on both a 3090 and an old Tesla P40 (which is older than your V100; the P40 has only 6.1 compute capability vs 7.0 on your V100), but only at 256 context length. The P40 took around 18-20GB at 256 context. I couldn't get a 512 context length to run on the P40 because of the VRAM limit without dropping the micro-batch-size to 2.

My guess is that because the V100, like my P40, is below the compute capability of 7.5 needed for full 8-bit support, the 8-bit mode (the bitsandbytes Python module) uses a lot more VRAM than it does on a card that fully supports it (like the 3090/4090).

The bitsandbytes repo ( https://github.com/TimDettmers/bitsandbytes ) has this listed under "Hardware requirements":

* LLM.int8(): NVIDIA Turing (RTX 20xx; T4) or Ampere GPU (RTX 30xx; A4-A100); (a GPU from 2018 or newer).
* 8-bit optimizers and quantization: NVIDIA Kepler GPU or newer (>=GTX 78X).

The P40 and V100 meet the second requirement but not the first (LLM.int8() support).

Training with default finetune.py settings on the P40 took around 18-20GB, so I don't see why it would use upwards of 30GB on yours.

I think your explanation makes sense, because I used my own dataset, which is basically twice the length of the original repo's. Additionally, I used MICRO_BATCH_SIZE = 4 and CUTOFF_LEN = 512, which works perfectly on my V100 with 32GB of VRAM.

mcmonkey4eva commented on August 27, 2024

@HideLord

Isn't there a way to train embeddings? Not sure how well it translates, but in stable diffusion, if you want it to draw an object it hasn't yet seen, you feed it a couple of images of said object and train an embedding. After that, when you add it to the prompt, it knows what you're talking about.

The direct equivalent of an SD Textual Inversion Embedding would be a SoftPrompt: https://github.com/KoboldAI/KoboldAI-Client/wiki/Soft-Prompts

A LoRA in an LLM is, uh, equivalent to a LoRA in SD lol.

wiss84 commented on August 27, 2024

I have a question regarding finetuning on a custom dataset. I have a dataset with two columns, Questions and Answers, and I am trying to train a LLaMA 13B on this dataset so that I can later ask questions in the inference notebook and get an answer similar to the answer in the second column. I know it's a noob question and I'm just starting out in the world of LLMs, so I would really be thankful to anyone who can give me a basic idea of how to do that. I was inspired by the Alpaca-LoRA project from Stanford, but I'm not sure if it's related to my problem, since their dataset has three columns: instruction, input and output.
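
One way to approach it: map each Question/Answer row into the instruction/input/output records that finetune.py consumes, leaving input empty, just like the alpaca_data.json example quoted earlier in this thread. A minimal sketch, assuming the dataset is a CSV whose columns are literally named Question and Answer (the file and column names are assumptions):

import csv
import json

records = []
with open("qa_dataset.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        records.append({
            "instruction": row["Question"],  # the question plays the role of the instruction
            "input": "",                     # no extra context column, so input stays empty
            "output": row["Answer"],         # the answer is what the model should learn to produce
        })

with open("qa_alpaca.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2, ensure_ascii=False)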

tmm1 commented on August 27, 2024

@LoopControl Could you tell us about the results of that snippet training? How does your finetuned model behave now, and what does it actually know? Are you happy with the results?

Curious to hear the results as well.
