Feature request: It describes images. Test on HF:

Add support for Florence 2? about transformers.js HOT 19 CLOSED

flatsiedatsie commented on July 17, 2024 1

Add support for Florence 2?

from transformers.js.

Comments (19)

xenova commented on July 17, 2024 5

Hey! 👋 This is something I'm working on! :)

xenova commented on July 17, 2024 3

ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft
Integrating into transformers.js now

xenova commented on July 17, 2024 2

@flatsiedatsie I got it working! :) Available in dev/v3 branch: #545 (comment)

inisis commented on July 17, 2024 1

ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft

Integrating into transformers.js now

Can this be slimmed? 🫣

I think it's already a slimmed one.

xenova commented on July 17, 2024 1

@inisis Soon! 🚀 I'm still testing across the set of ~1000 Transformers.js models (link) to find issues like inisis/OnnxSlim#10, and it will be merged into the v3 branch soon!

flatsiedatsie commented on July 17, 2024 1

Ah cool. I had also just fixed it :-D

const generatedNgram = new Map();
let nn = 0;
for (const ngram of ngrams) {
    const prevNgram = ngram.slice(0, ngram.length - 1);
    const prevNgramKey = nn++; //JSON.stringify(prevNgram);
    const prevNgramValue = generatedNgram.get(prevNgramKey) ?? [];
    prevNgramValue.push(ngram[ngram.length - 1]);
    generatedNgram.set(prevNgramKey, prevNgramValue);
}
return generatedNgram;
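
The snippet above avoids `JSON.stringify` (which throws on BigInt values) by using a running counter as the map key, so n-grams that share a prefix are no longer grouped under one entry. Below is a minimal sketch of another way to build the key, assuming the token ids may be BigInt values. The function name `groupNgramsByPrefix` is hypothetical and this is not the fix that was pushed to the library.

```javascript
// Hypothetical helper, not the transformers.js implementation.
// Maps each (n-1)-token prefix to the list of tokens that followed it.
function groupNgramsByPrefix(ngrams) {
    const generatedNgram = new Map();
    for (const ngram of ngrams) {
        const prevNgram = ngram.slice(0, ngram.length - 1);
        // String() accepts both Number and BigInt, so no serialization error is raised.
        const prevNgramKey = prevNgram.map((token) => String(token)).join(',');
        const prevNgramValue = generatedNgram.get(prevNgramKey) ?? [];
        prevNgramValue.push(ngram[ngram.length - 1]);
        generatedNgram.set(prevNgramKey, prevNgramValue);
    }
    return generatedNgram;
}

// Two n-grams share the prefix (1, 2), so their last tokens land in the same entry.
const grouped = groupNgramsByPrefix([[1n, 2n, 3n], [1n, 2n, 4n], [2n, 3n, 5n]]);
console.log(grouped.get('1,2'));
```

Joining the stringified tokens keeps identical prefixes under the same key, which is what the original `JSON.stringify(prevNgram)` key was doing before it hit BigInt values.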

xenova commented on July 17, 2024 1

Wow, it's definitely much faster. Very nice!

Great! 🥳

The descriptions aren't as useful though? But I'm going to keep playing around with that.

You might need to use one of their pre-selected prompts: https://huggingface.co/microsoft/Florence-2-base-ft/blob/e7a5acc73559546de6e12ec0319cd7cc1fa2437c/processing_florence2.py#L115-L117

caption: 'What does the image describe?'
detailed: 'Describe in detail what is shown in the image.'
more detailed: 'Describe with a paragraph what is shown in the image.'
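
As a rough illustration, the three prompts listed above could be kept in a small lookup so that only these known strings reach the model. This is only a sketch: `CAPTION_PROMPTS` and `captionPrompt` are hypothetical names, not part of transformers.js or of the Florence-2 processor.

```javascript
// Hypothetical lookup: the three pre-selected caption prompts quoted above.
const CAPTION_PROMPTS = new Map([
    ['caption', 'What does the image describe?'],
    ['detailed', 'Describe in detail what is shown in the image.'],
    ['more detailed', 'Describe with a paragraph what is shown in the image.'],
]);

// Returns the pre-selected prompt for a detail level, or throws for anything else.
function captionPrompt(level) {
    const prompt = CAPTION_PROMPTS.get(level);
    if (prompt === undefined) {
        const known = [...CAPTION_PROMPTS.keys()].join(', ');
        throw new Error(`Unknown detail level "${level}"; expected one of: ${known}`);
    }
    return prompt;
}

console.log(captionPrompt('more detailed'));
```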

I've also uploaded the larger (800M) models: https://huggingface.co/onnx-community/Florence-2-large-ft or https://huggingface.co/onnx-community/Florence-2-large, which you can try out. If you do, I recommend selecting different quantizations with something like:

const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, {
    dtype: {
        embed_tokens: 'fp16',
        vision_encoder: 'fp32',
        encoder_model: 'fp16',
        decoder_model_merged: 'q4',
    },
});

(You may need to mix and match these values, selecting from "fp32", "fp16", "q8", and "q4".)
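
When mixing and matching, it can help to check the values before loading anything. Here is a minimal sketch, assuming the four module names and four dtype values from the example above are the only valid ones. `buildDtypeConfig` is a hypothetical helper, not a transformers.js function.

```javascript
// Hypothetical helper: builds the `dtype` object for from_pretrained and rejects typos early.
const ALLOWED_DTYPES = new Set(['fp32', 'fp16', 'q8', 'q4']);

// Defaults copied from the example above.
const DEFAULT_DTYPES = {
    embed_tokens: 'fp16',
    vision_encoder: 'fp32',
    encoder_model: 'fp16',
    decoder_model_merged: 'q4',
};

function buildDtypeConfig(overrides = {}) {
    const dtype = { ...DEFAULT_DTYPES, ...overrides };
    for (const [name, value] of Object.entries(dtype)) {
        if (!Object.hasOwn(DEFAULT_DTYPES, name)) {
            throw new Error(`Unknown module "${name}"`);
        }
        if (!ALLOWED_DTYPES.has(value)) {
            throw new Error(`Unsupported dtype "${value}" for ${name}`);
        }
    }
    return dtype;
}

// Example: try an 8-bit decoder while keeping the other defaults.
console.log(buildDtypeConfig({ decoder_model_merged: 'q8' }));
```

The returned object can then be passed as the `dtype` option in the `from_pretrained` call shown above.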

flatsiedatsie commented on July 17, 2024 1

I think with the WebGPU support this issue can be closed. Awesome stuff, thank you so much for your amazing work as always. I've implemented the basic CPU version in my project, but am keeping Moondream2 as the default for now since users might otherwise get confused at the response quality when they question the image with their custom prompts.

But for mass-describing images I would certainly pick Florence 2 now.

xenova commented on July 17, 2024

@inisis that's right! Already slimmed :)

inisis commented on July 17, 2024

@xenova so is onnxslim ready to be merged? ^-^

inisis commented on July 17, 2024

@xenova btw, once all the tests have finished, can onnxslim be merged into optimum? 🚀

xenova commented on July 17, 2024

@inisis I think that's a great idea! Feel free to open a feature request on that repo and I'll voice my support there 😎

inisis commented on July 17, 2024

@xenova I believe that you are a member of Hugging Face, can you help me? 😎

flatsiedatsie commented on July 17, 2024

I've just tried implementing it.

I'm seeing an error, but will keep trying.

image_to_text_worker.js:715 IMAGE TO TEXT WORKER: caught error calling model.generate:  TypeError: Do not know how to serialize a BigInt
    at JSON.stringify (<anonymous>)
    at Function.getGeneratedNgrams (logits_process.js:370:1)
    at Function.calcBannedNgramTokens (logits_process.js:387:1)
    at Function._call (logits_process.js:401:1)
    at closure (generic.js:20:1)
    at Function._call (logits_process.js:89:1)
    at closure (generic.js:20:1)
    at Function.generate (models.js:1466:1)

*some time later

I tried to run your code example in a clean simple example, to rule out issues with my integration. But unfortunately the same error was raised:

xenova commented on July 17, 2024

Ah whoops I've updated that in my local branch but forgot to push. I've pushed and you can try again now.

flatsiedatsie commented on July 17, 2024

Wow, it's definitely much faster. Very nice!

The descriptions aren't as useful though? But I'm going to keep playing around with that.

Odd that this prompt results in less detail :-D

Moondream for comparison:

flatsiedatsie commented on July 17, 2024

I'll try that, thank you!

Could it be that with the new V3 the MusicGen streamer progress callback no longer works properly? I haven't tested these separately from my code though, could just be an issue with my code.

I'm also seeing an error with nanoLlava. It's just a number:

flatsiedatsie commented on July 17, 2024

I'm finding that the larger models are hit or miss.

good:

bad:

- caption: 'What does the image describe?'
- detailed: 'Describe in detail what is shown in the image.'
- more detailed: 'Describe with a paragraph what is shown in the image.'

Does this list of captions mean that the model isn't designed for free-form question asking?

it sure seems like it:

good:

bad:

Vasanthengineer4949 commented on July 17, 2024

I need to export my own custom Florence-2 model. How can I do it?
