
Comments (6)

Anindyadeep avatar Anindyadeep commented on July 28, 2024

Well, that answers all my doubts. Thanks a lot, @RahulSChand. I learned some new stuff here and it seems like I need to revisit some of those nuances. But thanks again.

from gpu_poor.

RahulSChand avatar RahulSChand commented on July 28, 2024

@Anindyadeep Thanks for letting me know. I checked this issue & ran QLoRA training with a 1000 context length on my 4090 24GB GPU. Below is the memory screenshot (it takes ~23 GB, and the website also gives you the same value).

image

As for the link you provided, where only 16 GB of memory is used: that is because the finetuning is done on the Alpaca dataset, which has a context length of around 700 (link: https://github.com/gururise/AlpacaDataCleaned#finetune-considerations). For length=700, the website gives a 15 GB memory requirement (same as the image you posted).

The memory requirement depends on the context length (activation memory), since many (length, dim) & (dim, dim, head) intermediate states are generated in the forward pass (and are also needed for the backward pass). These tensors are not updated (they have no grad of their own), but they are needed to compute the grads of the LoRA params. So your memory requirement can increase a lot with context length.
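To make that scaling concrete, here is a rough sketch of the idea (this is not the website's actual code; the per-layer state count and the Llama-7B-like dimensions are illustrative assumptions):

```python
def activation_memory_gb(context_len, hidden_dim=4096, num_layers=32,
                         bytes_per_val=2, states_per_layer=10):
    """Approximate activation memory kept around for the backward pass.

    Each layer stores several (context_len, hidden_dim) intermediate
    tensors (attention inputs, MLP activations, etc.). They carry no
    gradients themselves, but are needed to compute the LoRA-param grads.
    `states_per_layer` is a guessed constant, not an exact count.
    """
    per_layer = states_per_layer * context_len * hidden_dim * bytes_per_val
    return num_layers * per_layer / 1024**3

# Per-tensor memory grows linearly with context length; the attention
# score matrices (context_len x context_len per head) add a quadratic
# term on top of this, which the sketch omits for simplicity.
print(activation_memory_gb(1000))  # ~2.4 GB with these assumed dims
```

Doubling the context length doubles this estimate, which matches the observed jump from ~15 GB at length 700 to ~23 GB at length 1000 once weights and optimizer state are added on top.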


Anindyadeep avatar Anindyadeep commented on July 28, 2024

Ahh, that makes sense. I was actually blown away by the GPU numbers, but the same blog post showed that it takes 240 GB (6 x 40 GB) for full fine-tuning on the same dataset. So it now kind of makes sense that it takes more with a 4096 context length (~quadratic increase).

However, can you please clear up one more doubt: why is the memory requirement right now > the memory requirement in LoRA?


RahulSChand avatar RahulSChand commented on July 28, 2024

@Anindyadeep sorry, I didn't get your last question. What do you mean by "memory requirement is > memory requirement in LoRA"? Do you mean that the website is giving a larger memory requirement for QLoRA than for LoRA? I checked, and this doesn't seem to be the case for your configuration (codellama-7b, 2048 context length).

Let me know if I am misunderstanding your question.


Anindyadeep avatar Anindyadeep commented on July 28, 2024

Here:

For Full-finetuning

image


For LoRA

image


For QLoRA

image


RahulSChand avatar RahulSChand commented on July 28, 2024

@Anindyadeep oh okay, got it. This is because for QLoRA (and any other bitsandbytes quantization method, https://github.com/TimDettmers/bitsandbytes) there is an overhead during the forward pass (this overhead is usually small when the context length is small). It is also present if you use bitsandbytes llm.int8 quantization.

So even though QLoRA is smaller than LoRA (theoretically), the quantization overhead introduced by bitsandbytes can offset the savings when the context length is large.

Below is an approximate way to calculate this overhead (an empirical formula I arrived at after lots of trial & error with 3b/7b/13b models & bitsandbytes QLoRA runs):

QLoRA overhead = (15 * hidden_dim + 6 * intermediate_dim) * numLayers * contextLen * 0.75 bytes

I am also not sure what happens in the high context length regime (for large context lengths, e.g. >2048, this approximation may be very wrong, and the overhead may not grow linearly with contextLen). This is something I need to check.
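As a sketch, the empirical formula above can be written out directly (the default dimensions below are Llama-7B-like values I've filled in as assumptions; they are not stated in the thread):

```python
def qlora_overhead_gb(context_len, hidden_dim=4096,
                      intermediate_dim=11008, num_layers=32):
    """Empirical bitsandbytes QLoRA forward-pass overhead.

    Implements: (15*hidden_dim + 6*intermediate_dim)
                * num_layers * context_len * 0.75 bytes
    as described above. Note the caveat: this was fitted on
    3b/7b/13b runs and may break down for context lengths > 2048.
    """
    overhead_bytes = ((15 * hidden_dim + 6 * intermediate_dim)
                      * num_layers * context_len * 0.75)
    return overhead_bytes / 1024**3

print(qlora_overhead_gb(2048))  # several GB even at a 2048 context length
```

With these assumed 7b-scale dimensions the overhead at 2048 context length already lands in the ~6 GB range, which is enough to erase QLoRA's weight-memory advantage over LoRA.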

