
Comments (19)

xenova avatar xenova commented on July 17, 2024 5

Hey! 👋 This is something I'm working on! :)

from transformers.js.

xenova avatar xenova commented on July 17, 2024 3

ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft
Integrating into transformers.js now


xenova avatar xenova commented on July 17, 2024 2

@flatsiedatsie I got it working! :) Available in dev/v3 branch: #545 (comment)


inisis avatar inisis commented on July 17, 2024 1

ONNX weights ✅ https://huggingface.co/onnx-community/Florence-2-base-ft

Integrating into transformers.js now

Can this be slimmed🫣

I think it's already a slimmed one.


xenova avatar xenova commented on July 17, 2024 1

@inisis Soon! 🚀 I'm still testing across the set of ~1000 Transformers.js models (link) to find issues like inisis/OnnxSlim#10, and it will be merged into the v3 branch soon!


flatsiedatsie avatar flatsiedatsie commented on July 17, 2024 1

Ah cool. I had also just fixed it :-D

const generatedNgram = new Map();
let nn = 0;
for (const ngram of ngrams) {
    const prevNgram = ngram.slice(0, ngram.length - 1);
    // Workaround: a counter avoids calling JSON.stringify on BigInt values,
    // but note that every key is now unique.
    const prevNgramKey = nn++; // was: JSON.stringify(prevNgram)
    const prevNgramValue = generatedNgram.get(prevNgramKey) ?? [];
    prevNgramValue.push(ngram[ngram.length - 1]);
    generatedNgram.set(prevNgramKey, prevNgramValue);
}
return generatedNgram;
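
A counter key sidesteps the BigInt serialization error, but it also gives every n-gram its own map entry, so n-grams that share a prefix are no longer grouped together. Below is a minimal sketch of one alternative, assuming the token ids arrive as BigInt (or number) values: build the key with `join`, which converts BigInt to a string without going through `JSON.stringify`. The function name `groupNgramsByPrefix` is hypothetical, and this is not necessarily the fix that landed in the library.

```javascript
// Hypothetical helper: groups the last token of each n-gram by its prefix.
// Array.prototype.join converts each element to a string, which works for BigInt.
function groupNgramsByPrefix(ngrams) {
    const generatedNgram = new Map();
    for (const ngram of ngrams) {
        const prevNgram = ngram.slice(0, ngram.length - 1);
        const prevNgramKey = prevNgram.join(',');
        const prevNgramValue = generatedNgram.get(prevNgramKey) ?? [];
        prevNgramValue.push(ngram[ngram.length - 1]);
        generatedNgram.set(prevNgramKey, prevNgramValue);
    }
    return generatedNgram;
}

// Two n-grams share the prefix (1, 2), so they end up under the same key.
const grouped = groupNgramsByPrefix([[1n, 2n, 3n], [1n, 2n, 4n], [2n, 3n, 5n]]);
console.log(grouped.get('1,2'));
```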


xenova avatar xenova commented on July 17, 2024 1

Wow, it's definitely much faster. Very nice!

Great! 🥳

The descriptions aren't as useful though? But I'm going to keep playing around with that.

You might need to use one of their pre-selected prompts: https://huggingface.co/microsoft/Florence-2-base-ft/blob/e7a5acc73559546de6e12ec0319cd7cc1fa2437c/processing_florence2.py#L115-L117

  • caption: 'What does the image describe?'
  • detailed: 'Describe in detail what is shown in the image.'
  • more detailed: 'Describe with a paragraph what is shown in the image.'
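
The three prompts above can be kept in a small lookup so that callers choose a level of detail instead of retyping the prompt text. This is a minimal sketch: the names `CAPTION_PROMPTS` and `promptFor` are hypothetical, and only the three prompt strings come from the linked file.

```javascript
// Prompt strings as listed above; the keys are the labels used in this thread.
const CAPTION_PROMPTS = new Map([
    ['caption', 'What does the image describe?'],
    ['detailed', 'Describe in detail what is shown in the image.'],
    ['more detailed', 'Describe with a paragraph what is shown in the image.'],
]);

// Hypothetical helper: returns the prompt for a label, or throws on an unknown label.
function promptFor(level) {
    const prompt = CAPTION_PROMPTS.get(level);
    if (prompt === undefined) {
        const known = [...CAPTION_PROMPTS.keys()].join(', ');
        throw new Error(`Unknown caption level "${level}"; expected one of: ${known}`);
    }
    return prompt;
}

console.log(promptFor('detailed'));
```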

I've also uploaded the larger (800M) models: https://huggingface.co/onnx-community/Florence-2-large-ft or https://huggingface.co/onnx-community/Florence-2-large, which you can try out. If you do, I recommend selecting different quantizations with something like:

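// Package name is an assumption: '@huggingface/transformers' for the released v3;
// earlier dev builds of the v3 branch used '@xenova/transformers'.
import { Florence2ForConditionalGeneration } from '@huggingface/transformers';

const model_id = 'onnx-community/Florence-2-large-ft';
const model = await Florence2ForConditionalGeneration.from_pretrained(model_id, {
    // Per-module precision / quantization
    dtype: {
        embed_tokens: 'fp16',
        vision_encoder: 'fp32',
        encoder_model: 'fp16',
        decoder_model_merged: 'q4',
    },
});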

(you may need to mix and match these values; selecting from "fp32", "fp16", "q8", "q4")
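
Because the per-module values have to be mixed and matched by hand, it can help to validate a candidate configuration before passing it as `dtype`. This is a minimal sketch, assuming the four module names from the snippet above and the four values listed; `buildDtype` is a hypothetical helper, not part of transformers.js.

```javascript
const ALLOWED_DTYPES = new Set(['fp32', 'fp16', 'q8', 'q4']);
const MODULES = ['embed_tokens', 'vision_encoder', 'encoder_model', 'decoder_model_merged'];

// Hypothetical helper: fills unspecified modules with a fallback value and
// rejects unknown module names or unsupported values.
function buildDtype(overrides = {}, fallback = 'fp32') {
    for (const name of Object.keys(overrides)) {
        if (!MODULES.includes(name)) throw new Error(`Unknown module: ${name}`);
    }
    const dtype = {};
    for (const name of MODULES) {
        const value = overrides[name] ?? fallback;
        if (!ALLOWED_DTYPES.has(value)) throw new Error(`Unsupported dtype for ${name}: ${value}`);
        dtype[name] = value;
    }
    return dtype;
}

const dtype = buildDtype({ embed_tokens: 'fp16', encoder_model: 'fp16', decoder_model_merged: 'q4' });
console.log(dtype);
```

The resulting object can be passed as the `dtype` option in the `from_pretrained` call shown above.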


flatsiedatsie avatar flatsiedatsie commented on July 17, 2024 1

I think with the WebGPU support this issue can be closed. Awesome stuff, thank you so much for your amazing work as always. I've implemented the basic CPU version in my project, but am keeping Moondream2 as the default for now since users might otherwise get confused at the response quality when they question the image with their custom prompts.

But for mass-describing images I would certainly pick Florence 2 now.


xenova avatar xenova commented on July 17, 2024

@inisis that's right! Already slimmed :)


inisis avatar inisis commented on July 17, 2024

@xenova so is onnxslim ready to be merged? ^-^


inisis avatar inisis commented on July 17, 2024

@xenova btw, once all the tests have finished, can onnxslim be merged into optimum? 🚀


xenova avatar xenova commented on July 17, 2024

@inisis I think that's a great idea! Feel free to open a feature request on that repo and I'll voice my support there 😎


inisis avatar inisis commented on July 17, 2024

@xenova I believe that you are a member of huggingface, can you help me? 😎


flatsiedatsie avatar flatsiedatsie commented on July 17, 2024

I've just tried implementing it.

I'm seeing an error, but will keep trying.

image_to_text_worker.js:715 IMAGE TO TEXT WORKER: caught error calling model.generate:  TypeError: Do not know how to serialize a BigInt
    at JSON.stringify (<anonymous>)
    at Function.getGeneratedNgrams (logits_process.js:370:1)
    at Function.calcBannedNgramTokens (logits_process.js:387:1)
    at Function._call (logits_process.js:401:1)
    at closure (generic.js:20:1)
    at Function._call (logits_process.js:89:1)
    at closure (generic.js:20:1)
    at Function.generate (models.js:1466:1)
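
The trace points at `JSON.stringify`, which throws a `TypeError` when it encounters a BigInt value. Below is a minimal reproduction outside the library, together with one possible workaround that uses a replacer function. The helper name `stringifyWithBigInt` is hypothetical.

```javascript
// JSON.stringify has no default serialization for BigInt and throws a TypeError.
let caught = null;
try {
    JSON.stringify([1n, 2n]);
} catch (err) {
    caught = err;
}
console.log(caught instanceof TypeError); // true

// Hypothetical helper: the replacer converts BigInt values to strings before serialization.
function stringifyWithBigInt(value) {
    return JSON.stringify(value, (_key, v) => (typeof v === 'bigint' ? v.toString() : v));
}

console.log(stringifyWithBigInt([1n, 2n])); // ["1","2"]
```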

*some time later

I tried to run your code example in a clean simple example, to rule out issues with my integration. But unfortunately the same error was raised:

Screenshot 2024-06-22 at 17 09 31


xenova avatar xenova commented on July 17, 2024

Ah whoops I've updated that in my local branch but forgot to push. I've pushed and you can try again now.


flatsiedatsie avatar flatsiedatsie commented on July 17, 2024

Wow, it's definitely much faster. Very nice!

The descriptions aren't as useful though? But I'm going to keep playing around with that.

Screenshot 2024-06-22 at 18 20 37


Odd that this prompt results in less detail :-D
Screenshot 2024-06-22 at 18 27 27


Moondream for comparison:
Screenshot 2024-06-22 at 18 51 49


flatsiedatsie avatar flatsiedatsie commented on July 17, 2024

I'll try that, thank you!

Could it be that with the new V3 the MusicGen streamer progress callback no longer works properly? I haven't tested these separately from my code though, could just be an issue with my code.

Screenshot 2024-06-22 at 21 00 40

I'm also seeing an error with nanoLlava. It's just a number:
Screenshot 2024-06-22 at 21 05 03


flatsiedatsie avatar flatsiedatsie commented on July 17, 2024

I'm finding that the larger models are hit or miss.

good:
Screenshot 2024-06-26 at 19 22 30

bad:
Screenshot 2024-06-26 at 19 45 10

- caption: 'What does the image describe?'
- detailed: 'Describe in detail what is shown in the image.'
- more detailed: 'Describe with a paragraph what is shown in the image.'

Does this list of captions mean that the model isn't designed for free-form question asking?

it sure seems like it:

good:
Screenshot 2024-06-26 at 19 55 29

bad:
Screenshot 2024-06-26 at 19 57 16


Vasanthengineer4949 avatar Vasanthengineer4949 commented on July 17, 2024

I need to export my own custom Florence-2 model. How can I do it?

