
huggingfacemodeldownloader's Introduction

HuggingFace Model Downloader

The HuggingFace Model Downloader is a utility for downloading models and datasets from HuggingFace. It provides multithreaded downloading for LFS files and verifies the integrity of downloaded files by checking their SHA256 checksums.

Reason

Git LFS was very slow for me, and I couldn't find a single binary that I could just run to download any model. In addition, this might later be integrated into my future projects for inference using a Go/Python combination.

One Line Installer (linux/mac/windows WSL2)

The script will download the correct version based on your OS/architecture and save the binary as "hfdownloader" in the same folder.

bash <(curl -sSL https://g.bodaay.io/hfd) -h

To install it to the default OS bin folder:

bash <(curl -sSL https://g.bodaay.io/hfd) -i

It will automatically request elevated 'sudo' privileges if required. You can specify the install destination by adding -p:

bash <(curl -sSL https://g.bodaay.io/hfd) -i -p ~/.local/bin/

Quick Download and Run Examples (linux/mac/windows WSL2)

The bash script will download the binary based on your OS/architecture and run it.

Download Model: TheBloke/orca_mini_7B-GPTQ

bash <(curl -sSL https://g.bodaay.io/hfd) -m TheBloke/orca_mini_7B-GPTQ

Download Model: TheBloke/vicuna-13b-v1.3.0-GGML and get GGML Variant: q4_0

bash <(curl -sSL https://g.bodaay.io/hfd) -m TheBloke/vicuna-13b-v1.3.0-GGML:q4_0

Download Model: TheBloke/vicuna-13b-v1.3.0-GGML, get GGML variants q4_0 and q5_0, and save each one in a separate folder

bash <(curl -sSL https://g.bodaay.io/hfd) -f -m TheBloke/vicuna-13b-v1.3.0-GGML:q4_0,q5_0

Download Model: TheBloke/vicuna-13b-v1.3.0-GGML, get GGML variants q4_0 and q4_K_S, and save them into /workspace/ using 8 connections

bash <(curl -sSL https://g.bodaay.io/hfd) -m TheBloke/vicuna-13b-v1.3.0-GGML:q4_0,q4_K_S -c 8 -s /workspace/

Usage:

hfdownloader [flags]

Flags:

-m, --model string
Model name (required if dataset is not set)

You can supply filters to select the required LFS model files; separate multiple filters with commas.

Filters will discard any LFS file ending with .bin, .act, .safetensors, or .zip whose name does not contain one of the supplied filters.

-m TheBloke/WizardLM-Uncensored-Falcon-7B-GGML:fp16 # this will download LFS files whose names contain: fp16
-m TheBloke/WizardLM-33B-V1.0-Uncensored-GGML:q4_K_S,q5_K_M # this will download LFS files whose names contain: q4_K_S or q5_K_M

-d, --dataset string
Dataset name (required if model is not set)

-f, --appendFilterFolder bool
Append the filter name to the folder name; use it for filtered GGML quantization downloads only (optional)

 # this will download LFS files containing q4_K_S or q5_K_M into separate folders, by appending the filter name to the model folder name
 # all other non-LFS files, and files not ending with one of these extensions: .bin, .safetensors, .meta, .zip, will be available in each folder
-f -m TheBloke/WizardLM-33B-V1.0-Uncensored-GGML:q4_K_S,q5_K_M

-k, --skipSHA bool
Skip SHA256 checking for LFS files; useful when resuming an interrupted download and completing missing files quickly (optional)
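A minimal sketch of combining -k with a filtered download to finish an interrupted run without re-hashing already completed files (the model name is just reused from the examples above):

-k -m TheBloke/vicuna-13b-v1.3.0-GGML:q4_0 # completes missing files and skips SHA256 verification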

-b, --branch string
Model/Dataset branch (optional) (default "main")
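For example, to download a non-default branch (the repository and branch here are taken from an issue report further below):

-m turboderp/CodeLlama-34B-instruct-exl2 -b 6.0bpw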

-s, --storage string
Storage path (optional) (default "Storage")

-c, --concurrent int
Number of LFS concurrent connections (optional) (default 5)

-t, --token string
HuggingFace Access Token. It can also be supplied automatically via the 'HUGGING_FACE_HUB_TOKEN' environment variable or a .env file (recommended). Required for some models/datasets; you still need to manually accept the agreement if the model requires it (optional)
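A sketch of supplying the token through the environment instead of the -t flag (the token value below is a placeholder, not a real token):

export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxx # placeholder token
hfdownloader -m turboderp/CodeLlama-34B-instruct-exl2 -b 6.0bpw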

-i, --install bool
Install the binary to the OS default bin folder (if installPath is not specified); Unix-like operating systems only

-p, --installPath string
Used with -i to copy the binary to the specified path; defaults to /usr/local/bin/ (optional)

-h, --help
Help for hfdownloader

Model Example

hfdownloader  -m TheBloke/WizardLM-13B-V1.0-Uncensored-GPTQ -c 10 -s MyModels

Dataset Example

hfdownloader  -d facebook/flores -c 10 -s MyDatasets

Features

  • Nested file downloading of the model repository
  • Multithreaded downloading of large files (LFS)
  • Filtered downloads: specific LFS model files can be selected for downloading (useful for GGMLs), saving time and space
  • Simple utility that can be used as a library or as a single binary; all functionality is in one Go file and can be imported into any project
  • SHA256 checksum verification for downloaded LFS models
  • Skips previously downloaded files
  • Resumes progress for interrupted LFS downloads
  • Simple file-size matching for non-LFS files
  • Supports HuggingFace Access Tokens for restricted models/datasets

huggingfacemodeldownloader's People

Contributors

bodaay, crasm, fkcptlst, hoveychen, julien-c, riccardopinosio


huggingfacemodeldownloader's Issues

Unable to use filter. windows build.

As said in the title, I am unable to use the filter option;
the tool proceeds to download all model files, starting from the smallest ones.

.\hfdownloader_windows_amd64_1.2.7.exe -m TheBloke/CodeLlama-34B-Python-GGUF:Q6_K
Model: TheBloke/CodeLlama-34B-Python-GGUF:Q6_K
Filter Has been applied, will include LFS Model Files that contains: [q6_k]
Downloading CodeLLama2-Python/TheBloke_CodeLlama-34B-Python-GGUF/codellama-34b-python.Q2_K.gguf Speed: 28.44 MB/sec, 100.00%
Hash Matched for LFS file: CodeLLama2-Python/TheBloke_CodeLlama-34B-Python-GGUF/codellama-34b-python.Q2_K.gguf
Downloading CodeLLama2-Python/TheBloke_CodeLlama-34B-Python-GGUF/codellama-34b-python.Q3_K_L.gguf Speed: 28.31 MB/sec, 26.60%

I was so happy to find this tool through the author's answer on Stack Overflow.
Thanks a lot!

Support Falcon 180B Q4_0 and above

With the release of Falcon 180B, we now have much larger model files than LLaMA2, with up to 4 splits for Q8_0. In reference to #9, the download logic should be updated to handle arbitrary file splits more gracefully.

Failed hash check logic passes as a completed download, and a typo

I have been downloading several models but occasionally notice that some model folders are much smaller than they should be. My internet connection is a crappy, error-prone long-distance microwave link. It turns out that when there is a hash failure, the program somehow still exits reporting a successful download, like this:

Hash failed for LFS file: Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors
Download of alpindale/goliath-120b completed successfully

I am using the command:

bash <(curl -sSL https://g.bodaay.io/hfd) -m alpindale/goliath-120b

Here is the full output from this particular run, which was actually a re-run because the download must have quietly failed before too:

File already exists. Skipping download.
Model: alpindale/goliath-120b
Branch: main
Storage: Storage
NumberOfConcurrentConnections: 5
Append Filter Names to Folder: false
Skip SHA256 Check: false
Token: 

Getting File Download Files List Tree from: https://huggingface.co/api/models/alpindale/goliath-120b/tree/main/
Checking Existsing file: Storage/alpindale_goliath-120b/.gitattributes
file size matched for non LFS file: Storage/alpindale_goliath-120b/.gitattributes
Checking Existsing file: Storage/alpindale_goliath-120b/README.md
file size matched for non LFS file: Storage/alpindale_goliath-120b/README.md
Checking Existsing file: Storage/alpindale_goliath-120b/config.json
file size matched for non LFS file: Storage/alpindale_goliath-120b/config.json
Skipping: Storage/alpindale_goliath-120b/.gitattributes
Skipping: Storage/alpindale_goliath-120b/README.md
Skipping: Storage/alpindale_goliath-120b/config.json

Downloading Storage/alpindale_goliath-120b/model-00001-of-00024.safetensors Speed: 4.70 MB/sec, 100.00% 
Merging Storage/alpindale_goliath-120b/model-00001-of-00024.safetensors Chunks
Checking SHA256 Hash for LFS file: Storage/alpindale_goliath-120b/model-00001-of-00024.safetensors
Hash Matched for LFS file: Storage/alpindale_goliath-120b/model-00001-of-00024.safetensors

Downloading Storage/alpindale_goliath-120b/model-00002-of-00024.safetensors Speed: 4.84 MB/sec, 100.00% 
Downloading Storage/alpindale_goliath-120b/model-00002-of-00024.safetensors Speed: 4.84 MB/sec, 100.00% 
Checking SHA256 Hash for LFS file: Storage/alpindale_goliath-120b/model-00002-of-00024.safetensors
Hash Matched for LFS file: Storage/alpindale_goliath-120b/model-00002-of-00024.safetensors

Downloading Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors Speed: 4.95 MB/sec, 59.40% Error downloading chunk 2: stream error: stream ID 1; INTERNAL_ERROR; received from peer
warning: attempt 1 / 3 failed, error: Error downloading chunk 2: stream error: stream ID 1; INTERNAL_ERROR; received from peer
Downloading Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors Speed: 4.95 MB/sec, 59.62% 
Downloading Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors Speed: 4.95 MB/sec, 59.78% ee/main/
Checking Existsing file: Storage/alpindale_goliath-120b/.gitattributes
file size matched for non LFS file: Storage/alpindale_goliath-120b/.gitattributes
Checking Existsing file: Storage/alpindale_goliath-120b/README.md
file size matched for non LFS file: Storage/alpindale_goliath-120b/README.md
Checking Existsing file: Storage/alpindale_goliath-120b/config.json
file size matched for non LFS file: Storage/alpindale_goliath-120b/config.json
Downloading Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors Speed: 4.93 MB/sec, 61.79% 
Hash Matched for LFS file: Storage/alpindale_goliath-120b/model-00001-of-00024.safetensors
Downloading Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors Speed: 4.92 MB/sec, 63.59% 
Hash Matched for LFS file: Storage/alpindale_goliath-120b/model-00002-of-00024.safetensors
Skipping: Storage/alpindale_goliath-120b/.gitattributes
Skipping: Storage/alpindale_goliath-120b/README.md
Skipping: Storage/alpindale_goliath-120b/config.json
Skipping: Storage/alpindale_goliath-120b/model-00001-of-00024.safetensors
Downloading Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors Speed: 4.91 MB/sec, 63.60% 
Found existing incomplete download for the file: model-00003-of-00024.safetensors
Forcing Number of connections to: 5



Downloading Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors Speed: 8.24 MB/sec, 100.00%    0% 
Downloading Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors Speed: 3.54 MB/sec, 89.38% 
Downloading Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors Speed: 3.54 MB/sec, 89.41% 
Hash failed for LFS file: Storage/alpindale_goliath-120b/model-00003-of-00024.safetensors
Download of alpindale/goliath-120b completed successfully

And secondly, because I pasted the output above and spellcheck flagged it, I noticed a typo. I'll put it here, but would you rather I make a new issue just for this?

Checking Existsing file: Storage/alpindale_goliath-120b/.gitattributes

"Existsing" is spelled wrong, it should be, "Existing"

My Download has gone above 100 percent, 121.98%, and is still going up and downloading.

I don't know what to do. The LFS temp folder has 5 downloads; I checked their size and it is bigger than my target model,
about 21% bigger, as the download percentage says. When will it stop and finish the download? It just keeps going forever; it's been 2 nights now at my internet speed.

FYI shockingly slow transfer in wsl2

Great idea to build this. Model downloading is unnecessarily annoying with HF.

Very slow transfer in WSL2:

Downloading Storage/turboderp_CodeLlama-34B-instruct-exl2/output-00001-of-00004.safetensors Speed: 1.12 MB/sec, 1.24%

At the same time, fast.com is giving me upwards of 500 Mbit/sec.
An LFS clone is usually 20x faster on this machine.
It's also eating a pretty significant amount of CPU considering what it's doing: about 30% on an i9-12900K.

1491 root 20 0 1606540 42548 7840 S 31.7 0.1 1:08.95 hfdownloader

Ubuntu 22.04

/mnt/g/exllama# hfdownloader -m turboderp/CodeLlama-34B-instruct-exl2 -t access_token -b 6.0bpw
Model: turboderp/CodeLlama-34B-instruct-exl2
Branch: 6.0bpw
Storage: Storage
NumberOfConcurrentConnections: 5
Append Filter Names to Folder: false
Skip SHA256 Check: false
Token: some_token

Am I doing something wrong?
Thanks

download stops

I'm using an external hard drive (not sure if it's related to the issue), but the download stops at merging chunks and verifying hashes. It starts again when I press Enter (but not from the first Enter press).

Resuming aborted download does not work

  1. Start downloading a large model
  2. Break your internet connection
  3. Resume internet connection (maybe with a different source ip)

Expected result: Resumes download at previous position
Actual result: Timeout (and the timeout takes very long)

Error downloading chunk 3: read tcp 192.168.68.145:56343->108.157.214.31:443: read: can't assign requested address

  1. Start downloading a large model
  2. Abort the program (Ctrl+C)
  3. Restart the download for the same model

Expected result: Resumes download at previous position
Actual result: Seems to restart download from scratch

linux x86 binary fails on WSL

Using WSL Linux DESKTOP-MH2FEL7 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I get:

$ bash <(curl -sSL https://g.bodaay.io/hfd)
Download successful
./hfdownloader: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./hfdownloader)
./hfdownloader: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./hfdownloader)

WSL uses GLIBC 2.31 which I guess is too old to be supported by this binary.
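For reference, you can check which glibc version your WSL distribution provides with the standard ldd tool (this only diagnoses the mismatch; it does not fix it):

ldd --version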

Cursor jittering in Tmux

Cursor constantly moving back and forth if I press enter even once during download.

See the attached video: hf-downloader-issue.mp4
The attached image also shows such repeated text at times.

Error downloading a specific version

Thanks for adding a specific version setting as discussed.
I have an issue with this:
./hfdownloader -m TheBloke/WizardLM-33B-V1.0-Uncensored-GGML:q4_K_M
Model: TheBloke/WizardLM-33B-V1.0-Uncensored-GGML:q4_K_M
Branch: main
DestinationPath: Storage
NumberOfConcurrentConnections: 5
Token:

Filter Has been applied, will include LFS Model Files that contains: [q4_k_m]Error: unlinkat Storage/TheBloke_WizardLM-33B-V1.0-Uncensored-GGML/tmp: directory not empty
2023/06/25 05:56:44 Error: unlinkat Storage/TheBloke_WizardLM-33B-V1.0-Uncensored-GGML/tmp: directory not empty

I can rerun the download after removing the folder manually.

[Feature Request] Filter by filetypes?

Hey there! Great utility! It's so nice not having to code something up in python to get similar behavior to the hf_hub_download function.

One request I have, however, would be the ability to filter by model filetype.

Or, heck, maybe this already exists and I'm misunderstanding how the filtering works.

For example:

If downloading from stabilityai/stable-diffusion-xl-base-1.0, they have a myriad of different model types/weights/etc.

But, for my purposes, I really only care about the .json and .safetensors files - and even then, only the ones in subdirectories.

While I know the latter part would probably make things super-complicated, I feel like being able to pull down only .safetensors and .json files would be a bit easier to implement?

Or, if the functionality is already there...could you kindly let me know how to filter this way?

Much appreciated.

Feature Request: Silent mode

I'm using hfdownloader in a Docker container and I would really like to have the option to silence the progress output so there is only one "line" per file. Otherwise you will quickly see something like "buildx output clipped 200KiB/s reached", as every update to the progress bar is written to the build log.

Something like --progress=[interactive,static,disabled] might be good, with static only producing output once a download starts and once it finishes.

[feature] proxy need

Is it possible to add a proxy address feature? In my country, https://huggingface.co cannot be reached directly for downloads, so you need to go through a proxy or a mirror, for example:
export HF_ENDPOINT=https://hf-mirror.com
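A possible interim workaround, assuming the binary uses Go's default HTTP transport (which honors the standard proxy environment variables); the proxy address below is a placeholder:

export HTTPS_PROXY=http://127.0.0.1:7890 # placeholder local proxy address
bash <(curl -sSL https://g.bodaay.io/hfd) -m TheBloke/orca_mini_7B-GPTQ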
