Comments (7)
> The limitation you mentioned would be for selectively showing the LoRA args, correct?
Yes. But also for the --data argument, the --generate subcommand, etc. These are technical details that are currently covered by jsonargparse automagically, and an alternative might require you to do something very different. This is why I insist on creating a PoC that mimics the script/args/config structure, to see how difficult it becomes.
You could also completely change the args and config structure if you are doing breaking changes anyway.
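For context, here is a minimal sketch of the "automagic" coverage mentioned above (function and parameter names are hypothetical, not litgpt's real entry points): jsonargparse builds the flags, help text, and config entries directly from a typed signature, which is what any replacement would have to reproduce.

```python
# Hypothetical sketch: jsonargparse derives the CLI from the signature, so a
# typed parameter such as `data` is exposed as --data and as a config entry
# without any per-argument wiring.
from pathlib import Path
from typing import Optional

from jsonargparse import CLI


def finetune(
    checkpoint_dir: Path,
    data: Optional[str] = None,  # stand-in for litgpt's richer --data handling
    lora_r: int = 8,
    max_steps: int = 1000,
) -> None:
    """Fine-tune a model.

    Args:
        checkpoint_dir: Directory containing the pretrained weights.
        data: Which dataset to use.
        lora_r: LoRA rank.
        max_steps: Number of optimizer steps.
    """
    print(checkpoint_dir, data, lora_r, max_steps)


if __name__ == "__main__":
    # With the `jsonargparse[signatures]` extra installed, --help shows each
    # parameter with its type, default, and docstring description.
    CLI(finetune)
```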
You should accompany any decision with a PoC of how to implement it. I say this because (to the best of my knowledge) a call like litgpt finetune <repo_id> --method "lora" will be difficult to make work with jsonargparse. If you dissect that call, it means that you have a function that is called from the finetune subcommand of the litgpt CLI:
```python
def dispatch_finetune(
    repo_id,  # required
    method="lora",
):
    if method == "lora":
        from litgpt.finetune.lora import main
        main(repo_id)
    elif ...
```
where based on the arguments you call a different function. Jsonargparse needs to understand this to pull out the arguments from the inner function (main above) to expose them in the help message and configs. When I tried this in the past, I couldn't make it work.
So it might need to be replaced with an alternative tool like click, which is more flexible for creating complex, arbitrary CLIs, with the trade-off that you might lose support for automatic types from typing signatures, and take on extra code complexity, repeated config parsing in multiple places, a different config system...
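To make that trade-off concrete, here is a rough sketch (option names are hypothetical) of what the same dispatch could look like with click: the method-specific options have to be declared and forwarded by hand instead of being pulled from the inner function's signature.

```python
# Rough click sketch: every option is declared explicitly; nothing is derived
# from the signature of litgpt.finetune.lora.main.
import click


@click.command()
@click.argument("repo_id")
@click.option("--method", type=click.Choice(["lora", "full", "adapter"]), default="lora")
@click.option("--lora-r", type=int, default=8, help="Only takes effect with --method lora.")
def finetune(repo_id: str, method: str, lora_r: int) -> None:
    """Dispatch to the selected finetuning implementation."""
    if method == "lora":
        from litgpt.finetune.lora import main
        # Any argument the inner function should receive has to be repeated
        # and forwarded manually here.
        main(repo_id)
    else:
        raise click.UsageError(f"--method {method} is not wired up in this sketch")


if __name__ == "__main__":
    finetune()
```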
It will depend strongly on the tool chosen for the job proposed.
See also my previous comment on this topic: #996 (comment)
> The limitation you mentioned would be for selectively showing the LoRA args, correct?
An alternative would be to show all finetune arguments (full, adapter, lora). I think users will know that the LoRA parameters only have an effect if they select --method lora. This would of course not be as neat as the current version, but it would at least work in the meantime. (And we can maybe revisit other parsers some day, or wait for a jsonargparse version that might support it.)
Switching to click could be an option longer term, but I think this would be a bigger lift.
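As a rough illustration of that alternative (parameter names are only indicative), a single finetune signature could expose the arguments of every method, which jsonargparse can introspect directly; the downside is simply that --help lists the LoRA options even when --method full is chosen.

```python
# Illustrative only: one entry point exposing arguments for every method.
from jsonargparse import CLI


def finetune(
    repo_id: str,
    method: str = "lora",        # "full", "lora", "adapter", ...
    lora_r: int = 8,             # only used when method == "lora"
    lora_alpha: int = 16,        # only used when method == "lora"
    lora_dropout: float = 0.05,  # only used when method == "lora"
) -> None:
    # Dispatch to the chosen implementation, forwarding the relevant arguments
    # (details omitted).
    ...


if __name__ == "__main__":
    CLI(finetune)
```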
> On 2) Could we keep it pretraining from scratch by default? If not, then there would have to be a very loud warning IMO, and a way to opt out of auto-loading a checkpoint. What would that look like?
To add to Carlos' comment, if a CLI rewrite is considered we would have to be super sure it can support all our use cases and requirements. There might also be an option to work more closely with the jsonargparse author if we're blocked by missing features.
> On 2) Could we keep it pretraining from scratch by default? If not, then there would have to be a very loud warning IMO, and a way to opt out of auto-loading a checkpoint. What would that look like?
Personally, I have a slight preference for keeping it pretraining from scratch, because that's also what most users would expect, in my opinion.
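One possible shape for this, keeping from-scratch as the default (the parameter name below is hypothetical): a checkpoint is only loaded when the user passes it explicitly, so no loud warning or opt-out flag is needed.

```python
# Hypothetical sketch: pretraining starts from scratch unless a checkpoint
# directory is given explicitly.
from pathlib import Path
from typing import Optional


def pretrain(
    model_name: str,
    initial_checkpoint_dir: Optional[Path] = None,  # None -> random init, i.e. from scratch
) -> None:
    if initial_checkpoint_dir is not None:
        print(f"Continuing pretraining from {initial_checkpoint_dir}")
    else:
        print("Pretraining from scratch")
```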
To summarize from our meeting this morning, an easier path forward might be to use

```
litgpt finetune_full
litgpt finetune_lora
litgpt finetune_adapter
litgpt finetune_adapter_v2
```

where we also keep litgpt finetune as an alias for litgpt finetune_lora.

To keep things simple for newcomers, we would only show litgpt finetune in the main readme and then introduce the other ones (litgpt finetune_full, litgpt finetune_lora, litgpt finetune_adapter, litgpt finetune_adapter_v2) in the finetuning docs (plus the litgpt --help description).
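For what it's worth, a minimal sketch of how the flat subcommands could be wired with jsonargparse (the function bodies and extra parameters are placeholders, not litgpt's real entry points): passing a list of functions to CLI makes each one its own subcommand, so litgpt finetune_lora --help would show only the LoRA arguments and the --method dispatch problem goes away.

```python
# Sketch only: placeholder functions standing in for the real finetuning entry points.
from jsonargparse import CLI


def finetune_full(repo_id: str, max_steps: int = 1000) -> None:
    """Full-parameter finetuning."""


def finetune_lora(repo_id: str, lora_r: int = 8, max_steps: int = 1000) -> None:
    """LoRA finetuning."""


def finetune(repo_id: str, lora_r: int = 8, max_steps: int = 1000) -> None:
    """Alias for finetune_lora: a thin wrapper with the same signature."""
    finetune_lora(repo_id, lora_r=lora_r, max_steps=max_steps)


if __name__ == "__main__":
    # Each function becomes a subcommand named after it
    # (finetune_adapter and finetune_adapter_v2 omitted for brevity).
    CLI([finetune_full, finetune_lora, finetune])
```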