Comments (19)
Hey! 👋 This is something I'm working on! :)
from transformers.js.
ONNX weights: https://huggingface.co/onnx-community/Florence-2-base-ft
Integrating into transformers.js now
@flatsiedatsie I got it working! :) Available in dev/v3 branch: #545 (comment)
Can this be slimmed? 🫣
I think it's already slimmed.
@inisis Soon! I'm still testing across the set of ~1000 Transformers.js models (link) to find issues like inisis/OnnxSlim#10, and it will be merged into the v3 branch soon!
Ah cool. I had also just fixed it :-D
const generatedNgram = new Map();
for (const ngram of ngrams) {
  const prevNgram = ngram.slice(0, ngram.length - 1);
  // Build the key with join() rather than JSON.stringify, which throws
  // "Do not know how to serialize a BigInt" on BigInt token ids.
  // (A plain incrementing counter would also avoid the error, but it makes
  // every key unique, so ngrams with the same prefix are never grouped.)
  const prevNgramKey = prevNgram.join(',');
  const prevNgramValue = generatedNgram.get(prevNgramKey) ?? [];
  prevNgramValue.push(ngram[ngram.length - 1]);
  generatedNgram.set(prevNgramKey, prevNgramValue);
}
return generatedNgram;
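A quick sanity check of the join()-keyed grouping, as a self-contained sketch (assuming trigrams of BigInt token ids, which is how ONNX Runtime returns them):

```javascript
// Group trigrams by their first two tokens. Array.prototype.join
// stringifies each element (BigInts included), so no JSON.stringify needed.
function getGeneratedNgrams(ngrams) {
  const generatedNgram = new Map();
  for (const ngram of ngrams) {
    const prevNgram = ngram.slice(0, ngram.length - 1);
    const prevNgramKey = prevNgram.join(',');
    const prevNgramValue = generatedNgram.get(prevNgramKey) ?? [];
    prevNgramValue.push(ngram[ngram.length - 1]);
    generatedNgram.set(prevNgramKey, prevNgramValue);
  }
  return generatedNgram;
}

const ngrams = [
  [1n, 2n, 3n],
  [1n, 2n, 4n], // same prefix (1,2) -> grouped with the entry above
  [2n, 3n, 5n],
];
const grouped = getGeneratedNgrams(ngrams);
console.log(grouped.get('1,2')); // [3n, 4n]
console.log(grouped.get('2,3')); // [5n]
```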
Wow, it's definitely much faster. Very nice!
Great! 🥳
The descriptions aren't as useful though? But I'm going to keep playing around with that.
You might need to use one of their pre-selected prompts: https://huggingface.co/microsoft/Florence-2-base-ft/blob/e7a5acc73559546de6e12ec0319cd7cc1fa2437c/processing_florence2.py#L115-L117
- caption:
'What does the image describe?'
- detailed:
'Describe in detail what is shown in the image.'
- more detailed:
'Describe with a paragraph what is shown in the image.'
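Those prompts are looked up from task tags in `processing_florence2.py`; a minimal JavaScript mirror of that mapping (a sketch for illustration, not the library's actual code) could look like:

```javascript
// Task-tag -> prompt mapping, mirroring task_prompts_without_inputs
// in microsoft/Florence-2's processing_florence2.py.
const TASK_PROMPTS = new Map([
  ['<CAPTION>', 'What does the image describe?'],
  ['<DETAILED_CAPTION>', 'Describe in detail what is shown in the image.'],
  ['<MORE_DETAILED_CAPTION>', 'Describe with a paragraph what is shown in the image.'],
]);

// Replace a task tag with its full prompt before tokenization;
// free-form text is passed through unchanged.
function constructPrompt(text) {
  return TASK_PROMPTS.get(text) ?? text;
}

console.log(constructPrompt('<CAPTION>')); // 'What does the image describe?'
```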
I've also uploaded the larger (800M) models: https://huggingface.co/onnx-community/Florence-2-large-ft or https://huggingface.co/onnx-community/Florence-2-large, which you can try out. If you do, I recommend selecting different quantizations with something like:
const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, {
    dtype: {
        embed_tokens: 'fp16',
        vision_encoder: 'fp32',
        encoder_model: 'fp16',
        decoder_model_merged: 'q4',
    },
});
(you may need to mix and match these values, selecting from "fp32", "fp16", "q8", "q4")
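The mix-and-match step can be sketched as a tiny helper (hypothetical, not part of transformers.js) that overlays per-submodule choices on a default configuration and validates them against the supported dtypes:

```javascript
const DTYPES = ['fp32', 'fp16', 'q8', 'q4'];

// Hypothetical helper: overlay per-submodule dtype overrides on defaults.
function buildDtypeConfig(defaults, overrides = {}) {
  const config = { ...defaults, ...overrides };
  for (const [module, dtype] of Object.entries(config)) {
    if (!DTYPES.includes(dtype)) {
      throw new Error(`Unsupported dtype '${dtype}' for ${module}`);
    }
  }
  return config;
}

const dtype = buildDtypeConfig(
  {
    embed_tokens: 'fp16',
    vision_encoder: 'fp32',
    encoder_model: 'fp16',
    decoder_model_merged: 'q4',
  },
  { decoder_model_merged: 'q8' }, // try a higher-precision decoder
);
console.log(dtype.decoder_model_merged); // 'q8'
console.log(dtype.vision_encoder); // 'fp32'
```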
I think with the WebGPU support this issue can be closed. Awesome stuff, thank you so much for your amazing work as always. I've implemented the basic CPU version in my project, but am keeping Moondream2 as the default for now, since users might otherwise get confused by the response quality when they question the image with their custom prompts.
But for mass-describing images I would certainly pick Florence 2 now.
@inisis that's right! Already slimmed :)
@xenova so is onnxslim ready to be merged? ^-^
@xenova btw, if all tests have finished, can onnxslim be merged into optimum?
@inisis I think that's a great idea! Feel free to open a feature request on that repo and I'll voice my support there!
@xenova I believe you are a member of huggingface; can you add me?
I've just tried implementing it.
I'm seeing an error, but will keep trying.
image_to_text_worker.js:715 IMAGE TO TEXT WORKER: caught error calling model.generate: TypeError: Do not know how to serialize a BigInt
at JSON.stringify (<anonymous>)
at Function.getGeneratedNgrams (logits_process.js:370:1)
at Function.calcBannedNgramTokens (logits_process.js:387:1)
at Function._call (logits_process.js:401:1)
at closure (generic.js:20:1)
at Function._call (logits_process.js:89:1)
at closure (generic.js:20:1)
at Function.generate (models.js:1466:1)
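The root cause is that JSON.stringify has no default serialization for BigInt values (ONNX Runtime returns token ids as BigInts), which can be demonstrated in isolation:

```javascript
// JSON.stringify throws on BigInt values:
let message = null;
try {
  JSON.stringify([1n, 2n]);
} catch (e) {
  message = e.message;
}
console.log(message); // 'Do not know how to serialize a BigInt'

// A simple workaround is to stringify elements explicitly:
const key = [1n, 2n].map(String).join(',');
console.log(key); // '1,2'
```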
*some time later
I tried to run your code example in a clean simple example, to rule out issues with my integration. But unfortunately the same error was raised:
(Screenshot 2024-06-22 at 17 09 31)
Ah whoops I've updated that in my local branch but forgot to push. I've pushed and you can try again now.
(Screenshot 2024-06-22 at 18 20 37)
Odd that this prompt results in less detail :-D
I'll try that, thank you!
Could it be that with the new v3 the MusicGen streamer progress callback no longer works properly? I haven't tested it separately from my code, though; it could just be an issue on my end.
(Screenshot 2024-06-22 at 21 00 40)
I'm also seeing an error with nanoLlava. It's just a number:
I'm finding that the larger models are hit or miss.
- caption: 'What does the image describe?'
- detailed: 'Describe in detail what is shown in the image.'
- more detailed: 'Describe with a paragraph what is shown in the image.'
Does this list of captions mean that the model isn't designed for free-form questions? It sure seems like it:
I need to export my own custom Florence-2 model. How can I do it?