manga-reader's People

Contributors

pashpashpash, rishi23root

Forkers

rishi23root

manga-reader's Issues

[Short term] Clean up the open source code so that new people can easily set it up and run it.

This is a more general task, but it's important to clean up the code. Right now it's a bit of a mess, with Python objects being created with unstandardized parameters (for example, the volume object, narration_script, the movie_script object, etc.). These should likely be classes instead of loosely structured objects.
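As a rough illustration of what standardized classes could look like, here is a minimal sketch using dataclasses. The class names and every field (`Page`, `Chapter`, `Volume`, `index`, `image_path`, and so on) are hypothetical and not taken from the repository; the real objects would need whatever fields app.py actually passes around.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    """One page of a volume; `index` is the zero-based position in the PDF (assumed)."""
    index: int
    image_path: str


@dataclass
class Chapter:
    """A contiguous run of pages within a volume."""
    number: int
    pages: List[Page] = field(default_factory=list)


@dataclass
class Volume:
    """A manga volume with its pages grouped into chapters."""
    title: str
    number: int
    chapters: List[Chapter] = field(default_factory=list)

    def page_count(self) -> int:
        # Total pages across all chapters.
        return sum(len(chapter.pages) for chapter in self.chapters)


# Hypothetical usage: every volume is built the same way, with named fields.
volume = Volume(title="Example Manga", number=1)
volume.chapters.append(
    Chapter(number=1, pages=[Page(0, "p0.png"), Page(1, "p1.png")])
)
```

With this shape, a function that takes a `Volume` documents its own inputs, and a typo in a field name fails at construction time rather than deep inside the pipeline.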

The goal here is for the source code to be cleanly organized into understandable, standardized objects/classes/functions so that even people with a limited understanding of code can feel like they can modify the behavior of the code to their liking. This will also help other developers get involved and understand what the code does.

Lastly, the README should be updated to include all of the setup steps, as right now certain things are missing (e.g. the need to set up torch, among other libraries, before running).

[Long term] Animate key frames with SORA

Looking forward, the SORA model for AI-generated video looks promising. The pieces that we have built so far can be used to turn MangaRecap into a full-fledged animation studio: input a manga and it creates an entire animated recap for you, with accurate characters and plots.

[Short term] Better system for splitting volumes into chapters

Right now, the script feeds every single page of a manga volume into GPT vision to identify chapter start pages, so that the volume can be split into chapters later in the code. This happens as part of the "identifying important_pages" step, which is slow and expensive. There are better ways of doing this, including but not limited to identifying the table of contents and mapping its chapter page numbers to the corresponding PDF page indexes.

https://github.com/pashpashpash/manga-reader/blob/main/app.py#L39-L66
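The table-of-contents idea could be sketched roughly as follows. This assumes the table of contents has already been extracted as (title, printed page number) pairs and that printed page numbers differ from PDF page indexes by a constant offset; the function name `chapter_page_ranges` and its parameters are hypothetical and do not exist in app.py.

```python
from typing import Dict, List, Tuple


def chapter_page_ranges(
    toc: List[Tuple[str, int]],
    page_offset: int,
    total_pages: int,
) -> Dict[str, range]:
    """Map each chapter title to a range of zero-based PDF page indexes.

    toc: (chapter title, printed page number) pairs in reading order.
    page_offset: value added to a printed page number to get its PDF index
        (assumed constant for the whole volume).
    total_pages: number of pages in the PDF.
    """
    starts = [printed + page_offset for _, printed in toc]
    if starts != sorted(starts) or any(s < 0 or s >= total_pages for s in starts):
        raise ValueError("table of contents does not fit the PDF page range")
    # Each chapter ends where the next one starts; the last runs to the end.
    ends = starts[1:] + [total_pages]
    return {
        title: range(start, end)
        for (title, _), start, end in zip(toc, starts, ends)
    }


# Hypothetical example: printed page 3 sits at PDF index 5, so the offset is 2.
toc = [("Chapter 1", 3), ("Chapter 2", 25), ("Chapter 3", 47)]
ranges = chapter_page_ranges(toc, page_offset=2, total_pages=70)
```

Under these assumptions only the table of contents page itself would need to go through GPT vision (or OCR), rather than every page in the volume.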

[Medium term] Better system for character identification.

Right now users have to create a "profile reference" PDF that contains an example profile reference page for the manga. Then, as part of the "identify important_pages" step, GPT vision is used to identify a character profile page within the volume. This is far from ideal, as it makes running the script slower, both through extra setup and through slow GPT vision processing.

Perhaps this can help:
https://github.com/ragavsachdeva/magi

I'm open to other ideas on how to improve this.

[Short term] Implement retries and waiting for concurrent API calls in the case of throttling.

Some people have told me that the concurrent GPT calls I am making right now are being throttled, most likely because my OpenAI organization has higher concurrency limits than new accounts do. The calls should wait when throttled and be retried.

https://github.com/pashpashpash/manga-reader/blob/main/app.py#L47-L66

https://github.com/pashpashpash/manga-reader/blob/main/app.py#L173-L185

From someone who attempted to run the code:

openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4-vision-preview in organization org-xxxxxxxxxx on tokens per min (TPM): Limit 10000, Requested 25012. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
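One possible shape for this is sketched below, assuming the API calls can be wrapped as async callables: a semaphore caps concurrency and each call is retried with exponential backoff plus jitter. `RateLimited` is a stand-in exception for the client's rate limit error, and `call_with_retries` and `run_limited` are hypothetical helpers, not code from the repository. Note that the error quoted above says a single request asked for 25012 tokens against a limit of 10000 per minute, so retrying alone would not fix that case; the request itself would also have to be made smaller.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for the client's rate limit error (an assumption, not a real API)."""


async def call_with_retries(
    make_call: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await make_call(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await sleep(delay)
    raise ValueError("max_attempts must be at least 1")


async def run_limited(
    calls: List[Callable[[], Awaitable[T]]],
    max_concurrency: int = 3,
    **retry_kwargs,
) -> List[T]:
    """Run all calls with at most max_concurrency in flight, preserving order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(make_call: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await call_with_retries(make_call, **retry_kwargs)

    return await asyncio.gather(*(guarded(call) for call in calls))
```

In real use the `except` clause would catch the client library's own rate limit exception, and `max_concurrency` would be tuned to the account's limits.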
