
Comments (6)

feizc commented on June 10, 2024

@feizc how are you approaching the problem of generating starting from a length that is less than the prefix?

Actually, I use a fixed-length conditioning context, i.e., a prefix of the prior music, to continue writing the next melody.

In my opinion, to start from zero, we can use a special token like [pad] to fill out the prefix length, or use only the decoder to generate an initial sequence and then generate conditioned on the latents.
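For illustration, a minimal sketch of the [pad] idea (PAD_ID, SOS_ID, and prefix_len here are made-up placeholders, not names from perceiver-ar-pytorch):

import numpy as np

PAD_ID = 0           # hypothetical padding token id
SOS_ID = 1           # hypothetical start-of-sequence token id
prefix_len = 3072    # assumed cross-attention (prefix) length the model expects

def pad_prompt(prompt_ids, prefix_len, pad_id=PAD_ID):
    # Left-pad a short prompt with [pad] tokens until it reaches the prefix
    # length, so the padded sequence can be used as conditioning context as usual.
    prompt = np.asarray(prompt_ids, dtype=np.int32)
    n_missing = max(prefix_len - len(prompt), 0)
    return np.concatenate([np.full(n_missing, pad_id, dtype=np.int32), prompt])

prefix = pad_prompt([SOS_ID, 57, 812, 4], prefix_len)   # shape (3072,)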

I read the source code and found that the author begins from zero :)


def gen_initial_events():
    # Begin generation from nothing: an all-zero event buffer with only the
    # SOS token in the first position of every sequence.
    events = np.zeros([device_count, batch_size, max_events_length], np.int32)
    events[:, :, 0] = dataset.SOS_ID
    return events


usryokousha commented on June 10, 2024

After reviewing the current implementation (autoregressive_wrapper), it seems you generate each subsequent token one at a time, as would be the case in most architectures. The authors of the Perceiver AR paper outlined a strided approach (with a stride typically equal to the self-attention sequence length) in which sampled tokens are cached up to a certain size and the buffer is then freed. Have you considered implementing this? The officially released perceiver-ar implementation is relatively easy to follow.
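For reference, a rough, hypothetical sketch of the token bookkeeping behind that strided scheme — sample_next stands in for a single forward-and-sample step, and the actual activation caching from the official repo is not reproduced here:

import numpy as np

def strided_generate(sample_next, prefix, num_new, stride):
    # `stride` would typically equal the self-attention (latent) sequence length.
    # In the real scheme, the expensive cross-attention over the prefix is only
    # recomputed when the buffer is folded back in; this sketch only tracks tokens.
    tokens = list(prefix)
    buffer = []
    for _ in range(num_new):
        buffer.append(sample_next(np.asarray(tokens + buffer, dtype=np.int32)))
        if len(buffer) == stride:      # buffer full: fold into the prefix and free it
            tokens.extend(buffer)
            buffer = []
    return np.asarray(tokens + buffer, dtype=np.int32)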


lucidrains commented on June 10, 2024

🎶🤖😄


lucidrains commented on June 10, 2024

@feizc how are you approaching the problem of generating starting from a length that is less than the prefix?


lucidrains commented on June 10, 2024

After reviewing the current implementation (autoregressive_wrapper), it seems you generate each subsequent token one at a time, as would be the case in most architectures. The authors of the Perceiver AR paper outlined a strided approach (with a stride typically equal to the self-attention sequence length) in which sampled tokens are cached up to a certain size and the buffer is then freed. Have you considered implementing this? The officially released perceiver-ar implementation is relatively easy to follow.

noo not yet, i haven't implemented their special caching strategy at inference

but if i keep hearing more positive results, i may implement it! have to admit i was doubtful about the architecture initially


usryokousha commented on June 10, 2024

I'm curious to see how well this would work at inference, particularly when using a VQ-VAE / VQGAN to encode images. If you could decode in only several steps, that would really speed up generation. I suspect quality would suffer, but the paper's ImageNet results seem promising.
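As a rough, hypothetical back-of-the-envelope check (the numbers below are made up, not taken from the paper or this repo), the speedup would come from how rarely the long prefix has to be re-encoded:

codes_per_image = 256   # e.g. a 16x16 VQGAN code grid, flattened to a sequence
prefix_len = 192        # codes given as conditioning context
stride = 64             # assumed self-attention sequence length

naive_prefix_encodes = codes_per_image - prefix_len             # re-encode the prefix every token: 64
strided_prefix_encodes = (codes_per_image - prefix_len) // stride   # once per stride: 1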

