
swyxkit's Introduction

SwyxKit

A lightly opinionated starter for SvelteKit blogs:

Feel free to rip out these opinions as you see fit of course.

"Does anyone know what theme that blog is using? It looks really nice." - anon

Live Demo

See https://swyxkit.netlify.app/ (see Deploy Logs)

screenshot of swyxkit in action

Users in the wild

Key Features and Design Considerations

Features

All the basic things I think a developer website should have.

Performance/Security touches

Fast (check the lighthouse scores) and secure.

Minor design/UX touches

The devil is in the details.

Developer Experience

Making this easier to maintain, so you can focus on writing rather than coding.

Overall, this is a partial implementation of https://www.swyx.io/the-surprisingly-high-table-stakes-of-modern-blogs/

Setup

Step 0: Clone project (and deploy)

npx degit https://github.com/sw-yx/swyxkit
export GH_TOKEN=your_gh_token_here # Can be skipped if just trying out this repo casually
npm install
npm run start # Launches site locally at http://localhost:5173/ 
# you can also npm run dev to spin up histoire at http://localhost:6006/

You should be able to deploy this project straight to Netlify as is, just like this project is. This project recently switched to use sveltejs/adapter-auto, so you should also be able to deploy to Vercel and Cloudflare, but these 2 deploy targets are not regularly tested (please report/help fix issues if you find them)!

However, to have new posts show up, you will need to personalize the siteConfig (see next step) - take note of APPROVED_POSTERS_GH_USERNAME in particular (this is an allowlist of people who can post to the blog by opening a GitHub issue, otherwise any rando can blog and that's not good).

Deploy to Netlify

# These are just untested, suggested commands, use your discretion to hook it up or deploy wherever
git init
git add .
git commit -m "initial commit"
gh repo create # Make a new public GitHub repo and name it whatever
git push origin master
ntl init # Use the Netlify cli to deploy, assuming you already installed it and logged in. You can also use `ntl deploy`

Step 1: Personalization Configuration

As you become ready to seriously adopt this, remember to configure /lib/siteConfig.js - just some hardcoded vars I want you to remember to configure.

export const SITE_URL = 'https://swyxkit.netlify.app';
export const APPROVED_POSTERS_GH_USERNAME = ['sw-yx']; // IMPORTANT: change this to at least your GitHub username, or add others if you want
export const GH_USER_REPO = 'sw-yx/swyxkit'; // Used for pulling GitHub issues and offering comments
export const REPO_URL = 'https://github.com/' + GH_USER_REPO;
export const SITE_TITLE = 'SwyxKit';
export const SITE_DESCRIPTION = "swyx's default SvelteKit + Tailwind starter";
export const DEFAULT_OG_IMAGE =
	'https://user-images.githubusercontent.com/6764957/147861359-3ad9438f-41d1-47c8-aa05-95c7d18497f0.png';
export const MY_TWITTER_HANDLE = 'swyx';
export const MY_YOUTUBE = 'https://youtube.com/swyxTV';
export const POST_CATEGORIES = ['Blog']; // Other categories you can consider adding: Talks, Tutorials, Snippets, Podcasts, Notes...
export const GH_PUBLISHED_TAGS = ['Published']; // List of allowed issue labels, only the issues having at least one of these labels will show on the blog.

Of course, you should then go page by page (there aren't that many) and customize some of the other hardcoded items, for example:

  • Add the Utterances GitHub app to your repo/account to let visitors comment nicely if logged in.
  • The src/Newsletter.svelte component needs to be wired up to a newsletter service (I like Buttondown and TinyLetter). Or you can remove it of course.
  • Page Cache-Control policy and SvelteKit maxage
  • Site favicons (use https://realfavicongenerator.net/ to make all the variants and stick it in /static)
  • (If migrating content from previous blog) setup Netlify redirects at /static/_redirects

This blog uses GitHub as a CMS - if you are doing any serious development at all, you should set the GH_TOKEN env variable to raise the GitHub API rate limit from 60 to 5000 requests per hour.

Step 2: Make your first post

Open a new GitHub issue on your new repo, write some title and markdown in the body, add a Published label (or any of the labels set in GH_PUBLISHED_TAGS), and then save.

You should see it refetched in local dev or in the deployed site pretty quickly. You can configure SvelteKit to build each blog page up front, or on demand. Up to you to trade off speed and flexibility.

Here's a full reference of the YAML frontmatter that swyxkit recognizes - ALL of this is optional, and some fields have aliases you can discover in /src/lib/content.js. Feel free to customize/simplify of course.

---
title: my great title
subtitle: my great subtitle
description: my great description
slug: my-title
tags: [foo, bar, baz]
category: blog
image: https://my_image_url.com/img-4.png
date: 2023-04-22
canonical: https://official-site.com/my-title
---

my great intro

## my subtitle

lorem ipsum 

If your Published post (any post with one of the labels set in GH_PUBLISHED_TAGS) doesn't show up, you may have forgotten to set APPROVED_POSTERS_GH_USERNAME to your GitHub username in siteConfig.

If all of this is annoying feel free to rip out the GitHub Issues CMS wiring and do your own content pipeline, I'm not your boss. MDSveX is already set up in this repo if you prefer not having a disconnected content toolchain from your codebase (which is fine, I just like having it in a different place for a better editing experience). See also my blogpost on the benefits of using GitHub Issues as CMS.

Optimizations to try after you are done deploying

  • Customize your JSON-LD for FAQ pages, organization, or products. There is a schema for blogposts, but it is so dead simple that SwyxKit does not include it.
  • Have a process to submit your sitemap to Google? (or configure robots.txt and hope it works)
  • Testing: make sure you have run npx playwright install and then you can run npm run test

Further Reading

Acknowledgements

Todos

  • Implement ETag header for GitHub API
  • Store results in Netlify build cache
  • Separate hydration path for mobile nav (so that we could hydrate=false some pages)
  • Custom components in MDX, and rehype plugins
  • (maybe) Dynamic RSS in SvelteKit:
    • SvelteKit Endpoints don't take over from SvelteKit dynamic param routes ([slug].svelte has precedence over rss.xml.js)
      • Aug 2022: now solved due to PlusKit
    • RSS Endpoint runs locally but doesn't run in Netlify because there's no access to the content in prod (SvelteKit issue)

swyxkit's People

Contributors

swyxio, trangml, ak4zh, dependabot[bot], mattcroat, coderj10n, vitorarjol, taismassaro, rocketrene, paoloricciuti, mytakeon, martypenner, logan-anderson, leovoon, hdoro, georgeoffley, farosato, macbraughton, dmoosocool

swyxkit's Issues

Understanding MuZero and why it hasn't been used

MuZero was one of the most impressive RL breakthroughs ever, solving a multitude of hard domain tasks without prior knowledge. But still the main RL algorithms I see used in practice are PPO variants. Why is this?

Welcome


---
title: Welcome
subtitle: Welcome to my blog, here's a bit about me
slug: welcome
tags: []
category: blog
date: 2023-09-25
canonical: https://trangml.com
---

Welcome to my blog! My name is Matthew and I am an Autonomous Systems Engineer at MIT Lincoln Laboratory.

I've previously been at Shield AI, Virginia Tech, Collins Aerospace, Moog, and George Mason University.

DevLog Chess Challenge Bot - 1


---
title: DevLog Chess Challenge Bot - 1
subtitle: Documenting my work on a token constrained AI Chess-bot
slug: devlog-1-chess-challenge
tags: [ML, AI]
category: blog
date: 2023-08-31
canonical: https://trangml.com
---

I've been working on a bot to compete in Sebastian Lague's (SebLague) Chess Coding Challenge https://github.com/SebLague/Chess-Challenge.

Problem

In this challenge, the goal is to build a C# Chess bot to compete against other submitted chess bots.

However, there is a very tight restriction on the number of tokens the Chess Bot's code can have, and additionally, only a very limited number of namespaces are allowed.

Specifically:

  • Only the following namespaces are allowed:
    • ChessChallenge.API
    • System
    • System.Numerics
    • System.Collections.Generic
    • System.Linq
  • You may not store data inside the name of a variable/function/class etc (to be extracted with nameof(), GetType().ToString(), Environment.StackTrace and so on)
  • If your bot makes an illegal move or runs out of time, it will lose the game.
    • Games are played with 1 minute per side by default (this can be changed in the settings class). The final tournament time control is TBD, so your bot should not assume a particular time control, and instead respect the amount of time left on the timer (given in the Think function).
  • Your bot may not use more than 256mb of memory for creating look-up tables (such as a transposition table).
  • There is a size limit on the code you create called the bot brain capacity. This is measured in 'tokens' and may not exceed 1024. The number of tokens you have used so far is displayed on the bottom of the screen when running the program.
    • All names (variables, functions, etc.) are counted as a single token, regardless of length. This means that both lines of code: bool a = true; and bool myObscenelyLongVariableName = true; count the same. Additionally, the following things do not count towards the limit: white space, new lines, comments, access modifiers, commas, and semicolons.

Clearly, this makes a traditional ML or RL approach pretty hard, but I still wanted to try.

Approach

I decided to use Alpha-Beta pruning to search the action space, and a machine learning based function for calculating the value of each state.
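To make the search half concrete, here is a minimal sketch of negamax-style alpha-beta pruning. It's written in Python for readability (the actual bot is C# against ChessChallenge.API), and the board methods and evaluate function are hypothetical placeholders, not the challenge API.

```python
# Minimal negamax search with alpha-beta pruning, sketched in Python for clarity.
# The board interface and evaluate() are placeholders, not the ChessChallenge.API.
def alphabeta(board, depth, alpha, beta, evaluate):
    if depth == 0 or board.is_game_over():
        return evaluate(board)  # score from the side-to-move's perspective
    best = float("-inf")
    for move in board.legal_moves():
        board.push(move)
        score = -alphabeta(board, depth - 1, -beta, -alpha, evaluate)
        board.pop()
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # the opponent would avoid this line, so prune the rest
            break
    return best
```

The evaluate function is where the machine-learning-based value model plugs in.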

Development

Of course, it wasn't that simple. The Alpha-Beta pruning algorithm itself was straightforward to implement; however, by the time I had gotten that written, I only had about half my tokens remaining. Once I wrote the code which took the board and created a single vector from the FEN representation, I was at around 750 tokens. That meant I didn't have enough space to store the weights for even a small network.

Workaround

Instead of using the weights of an entire network, I decided to try doing machine learning by updating only the biases of the network, an approach inspired by a paper called Machine Learning in under 256kb.
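As a rough illustration of the idea (again in Python rather than the C# bot code), bias-only training just freezes every weight matrix and lets the optimizer update only the bias vectors; the network shape and training step here are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical small evaluation network: 64-element board vector in, one value out.
net = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 1))

# Freeze everything, then re-enable gradients only for the bias terms.
for name, param in net.named_parameters():
    param.requires_grad = name.endswith("bias")

optimizer = torch.optim.Adam([p for p in net.parameters() if p.requires_grad], lr=1e-3)

def train_step(board_vectors, target_values):
    """One supervised step in which only the biases receive gradient updates."""
    optimizer.zero_grad()
    predictions = net(board_vectors).squeeze(-1)
    loss = nn.functional.mse_loss(predictions, target_values)
    loss.backward()
    optimizer.step()
    return loss.item()
```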

Code

The code for my agent can be found here https://github.com/trangml/Chess-Challenge.

Finally Making a Blog


---
title: Finally Making a Blog
subtitle: I should have done this a long time ago
description: After spending two years thinking about doing it and multiple attempts to program one, I've finally made an actual blog. Here I'm going to write about all stuff TrangML.
slug: finally-making-a-blog
tags: [Published]
category: blog
date: 2023-08-31
canonical: https://trangml.com
---

I've tried building my own blog via JavaScript with automatic reading from Markdown files. While a great learning experience, I realized that I despise web dev, and that making a blog look nice and be functional with my barely serviceable JavaScript knowledge was not the best use of my time. So, I've decided to scrap that and instead use this fantastic template from @swyx. I really like this template because it uses GitHub Issues as the hosting platform for all of the blog posts, and I can easily incorporate code and all the nice labels and tags that GitHub has.

What to expect

From now on, I'm going to attempt to overcome my laziness by writing about the things that I find interesting, and also writing brief DevLogs about some of the projects I work on outside of work. Expect to see writings on AI, Robotics, Machine Learning, Programming, Ethics, Math, Computer Hardware, and whatever else I find interesting.

Now that I have a functional blog, I no longer have any excuses for putting off writing about things.

End to end Learned Robotics vs Traditional methods


---
title:
subtitle:
description:
slug:
tags: []
category: blog
date: 2023-08-31
canonical: https://blog.trangml.com
---

Recently, I read the following Twitter discussion between two leaders in the robotics space about the issues facing the field.

https://twitter.com/chris_j_paxton/status/1753460957458632823?t=ta4aOQRg3pmdNqDy7GkQ9g&s=19

My experience working with the Boston Dynamics Spot has given me some thoughts on both sides of the problem.

The manipulation and grasping achieved using largely traditional techniques are incredibly impressive. Spot is able to grab many different objects with strong generalization.

But it's still clear that there's a long way to go before it can really be useful. The grasping still often fails, either by missing the object or by choosing poor grasp points. The UI for grabbing is still clunky, and autograsp often isn't available for an object. The API method of picking a pixel location in a camera image to grasp is effective, but it leads to more difficult pipeline engineering for the developer. There's no conceptual understanding of objects backing the grasps; for example, it doesn't know to grab a brush by the handle.

There are simply too many limitations preventing a traditional control-based system from working.

Algorithmic patches are the main thing I've worked on in my robotics time, and I agree with Dr. Fan that they're unsustainable. For example, the amount of hacking needed to get a simple computer vision object detection pipeline to consistently recognize an object is quite large. My code for this involves multiple checks. Others, such as the 3rd place team in the Habitat OVMM Mobile Manipulation challenge, rely on running two models at once to do the same thing, which on a compute-limited device like a robot is usually far from ideal.

AI Ethics: Is an AI a nuke? Should we all have an AI?

Various experts have vastly contrasting opinions about AI and its widespread adoption.

George Hotz says we should all have an AI.

Eliezer Yudkowsky warns heavily against the proliferation of AI to the masses.

Who's right here?

Is AI really the equivalent of a nuke? Is there value to the argument that everyone having one would lead to fewer bad actors? Or would it set off a chain reaction?

I can see both sides having value. Here are my thoughts on how we could implement both forms.

Why I became an engineer


---
title: Why I became an engineer
subtitle: A story of my personal motivation
description: I always struggle to put to words my motivation for becoming an engineer. I hope that this blog post will help me consolidate my thoughts, and also inspire me to move forward.
slug: why-i-became-an-engineer
tags: [misc]
category: blog
date: 2024-01-17
canonical: https://blog.trangml.com
---

Why I became an engineer

I've always been a lover of building things. My earliest toys were Legos, and as a kid I would spend days building. I never had any of the fancy Lego sets, just the basic 4x2s, 2x2s, and some plate pieces, so I really had to use my imagination to make my creations mean something. My father is an engineer, so he really encouraged this behavior and influenced me to work on things like that.

As I grew older, Lego playtime turned into a more structured form of building - I joined my middle school robotics team. I enjoyed working with my friends to come up with something vaguely robotic, but again, actual structure and function were lacking. Still, I enjoyed it as a fun hobby.

I inevitably grew older still, gained more experience in high school, and learned the actual methodologies of engineering. Going into high school, I wasn't really sure what I actually wanted to do. My dad's engineering background obviously had some pull, but my sisters were all biologists and medical professionals of some kind, so I thought about becoming a doctor. I considered a future pursuing my other interest, marine animals, and had spent a summer labeling dolphin bones at the Smithsonian, but I understood the financial implications of pursuing that interest.

I wondered about these paths in a general way, but felt like I had plenty of time to decide. Then something happened that changed that. We heard that my aunt had been in a car crash. She was left paralyzed from the neck down.

When I saw my aunt again, she was in a hospital-style bed, in the living room of her house, unable to move, but still able to talk. She was in the living room so that the entire family would be able to help her throughout the day and night, and so that she could use the smart device in the living room to open the blinds or control the TV. Despite everything that'd happened to her, I remember her smiling and being happy to see me.

It's difficult to describe how I felt in those first moments I spent sitting beside her. Sorrow, mournfulness, anger - all fall short. But the main thing I felt was helplessness. I despised the fact that I couldn't do anything for her. In my frustration, I resolved to become someone who would change people's lives for the better.

I realized that the best thing I could do was to use my strengths in robotics and engineering to achieve that, and my academic and career path, projects, and passions have been guided by that ever since.

I've explored developing medical and assistive devices. I worked on a smart knee sleeve that is able to monitor a patient for blood clots and facilitate proper recovery. I've worked with a fellow Virginia Tech student who is wheelchair-bound, and has difficulty using a computer, to develop a custom computer mouse.

But I realized that specific hardware solutions like these won't have as widespread an impact as something else I was exposed to: machine learning and artificial intelligence.

I got an internship at Heron Systems, where I worked on reinforcement learning for AI fighter jets, which demonstrated super-human performance. I realized after that experience how impactful RL, and AI more generally, would be. So I decided to focus entirely on machine learning.

I continued to work on machine learning projects and problems, developing solutions to DARPA-hard challenges and building up knowledge to apply to my goals in the future. I researched reinforcement learning for my master's thesis, and developed algorithms to improve the continual learning process.

I now work as an engineer at MIT Lincoln Laboratory, where I directly apply my RL background to developing more advanced autonomy capabilities for a Boston Dynamics Spot robot.

And that's where I see my story continuing to develop. One of the things I remember is how difficult it was for my cousins to take care of my aunt, all the other things around the house, and, importantly, themselves. The Alexa my aunt used to turn on the TV was important not just because she could watch her shows, but because she could do something and see it happen, even by just verbalizing her need. And automating things for my cousins was just a small thing taken off their plate, but it was off their plate.

I've realized a niche for what I, as an engineer, am capable of: delivering something that helps take back time from the daily drudgeries and win back just a few moments. Because a moment matters. I want to develop advanced robotics systems that are able to accomplish everyday tasks. I know that these systems would have helped my aunt, and beyond that, the potential applications of a generally capable robot are endless. Any task which is boring, dangerous, or otherwise undesirable can be automated by a robot. The key is to do so in equitable ways.

My goals for the future are to work on AI applications to robotics. I see a few key ways to do so. Achieving general-purpose non-embodied AI is a key step, and OpenAI and other industry leaders are doing a great job propelling that forward. However, to move to the robotics domain, embodiment data will be necessary. Open X-Embodiment and Ego-Exo are key steps toward that. Google DeepMind's Robotics Transformer papers have shown that you can train a general-purpose robot foundation model from a mixture of robot data and open Internet data. Other papers have explored using specifically human data to transfer to robotics. I think that step is the key. Being able to efficiently transfer across domain and embodiment will vastly accelerate the costly process of robots learning from teleop trajectories, and could lead to robots learning from a single example of a human performing a task, from a verbal description of a task, or even zero-shot. Work on incorporating LLMs into robot decision making has been very effective, and I think there's a lot of exciting work still to be done in that field. I hope to tackle these problems.

Background for a Robotics/AI engineer


---
title:
subtitle:
description:
slug:
tags: []
category: blog
date: 2023-08-31
canonical: https://blog.trangml.com
---

Questions:

  • What is the Jacobian? What is the Hessian? In machine learning, what are they useful for?

    • The Jacobian is a matrix of the first-order partial derivatives of a vector-valued function. In robotics, the Jacobian is crucial for understanding the relationship between the velocity of robot joints and the velocity of the end effector. In machine learning, it's used for sensitivity analysis and to understand how changes in inputs affect outputs.
    • Hessian: The Hessian matrix is a square matrix of second-order partial derivatives of a scalar-valued function. It's used in optimization problems to find local maxima and minima, which is essential in training machine learning models, especially for understanding the curvature of the loss landscape.
  • What is matrix decomposition and what are some algorithms that do it?

    • Matrix decomposition involves breaking down a matrix into a product of matrices to simplify certain matrix computations and analysis. Common algorithms include:
    • LU Decomposition: Factorizes a matrix into Lower and Upper triangular matrices.
      
    • QR Decomposition: Decomposes a matrix into an orthogonal matrix (Q) and an upper triangular matrix (R).
      
    • Eigen Decomposition: Factorizes a matrix into its eigenvectors and eigenvalues.
      
    • Singular Value Decomposition (SVD): Breaks down a matrix into singular vectors and singular values.
      
    • These decompositions are used in solving linear equations, optimization problems, and principal component analysis (PCA), among others, in robotics and machine learning (see the short numpy sketch after this list).
  • What is a transformer model? Can I implement it from scratch?

    • A Transformer model is a deep learning model that uses self-attention mechanisms to process sequential data, such as text or time-series data, without relying on recurrence.
    • Self-attention is quadratic in sequence length, which makes transformers expensive for very long sequences, though unlike RNNs they can process the whole sequence in parallel during training.
  • What is a generative model?

    • A generative model learns the underlying data distribution (rather than just a decision boundary) and can generate new samples resembling its training data. Examples include Generative Adversarial Networks and Variational Autoencoders.
  • How do you solve the vanishing gradients in LSTM?

    • LSTMs introduce a series of gates (input, forget, and output), as well as an internal cell state. The gates allow LSTMs to selectively remember or forget information, which keeps gradients from vanishing over long sequences.
  • Know the fundamentals of RNN, LSTM, Transformer, GNN, GAN, CNN

    • Recurrent Neural Networks (RNN): Designed to handle sequential data by maintaining a hidden state that captures information from previous inputs.
    • Long Short-Term Memory (LSTM): An advanced RNN that uses gating mechanisms to control the flow of information, addressing the vanishing gradient problem.
    • Transformer: Uses self-attention mechanisms to weigh the significance of different parts of the input data differently, excelling in handling sequential data without the sequential computation of RNNs.
    • Graph Neural Networks (GNN): Processes graph-structured data, updating node representations based on their neighbors, ideal for social networks, molecule structures, etc.
    • Generative Adversarial Networks (GAN): Consists of a generator and discriminator competing against each other, useful for generating realistic images, videos, etc.
    • Convolutional Neural Networks (CNN): Specialized for processing structured grid data like images, using convolutional layers to capture spatial hierarchies.
  • What is a contrastive representation learning algorithm?

    • Contrastive representation learning algorithms learn to encode similar data points closer together and dissimilar ones farther apart in the representation space. This is achieved by using contrastive loss functions, which compare pairs or groups of examples to learn discriminative features. These algorithms are essential for unsupervised learning, semi-supervised learning, and self-supervised learning tasks, improving performance in downstream tasks like classification without requiring labeled data.
  • How do classical robot navigation and path planning algorithms work?

    • Classical robot navigation involves making a grid or graph based map, then using algorithms like A*, Dijkstra's Algorithm, or Rapidly-exploring Random Trees (RRT) to find a path in the grid.
  • How do classical robot grippers work?

    • Model Predictive Control algorithms and Inverse Kinematics.
    • Inverse kinematics is the process of calculating the joint parameters needed to place the end effector of a robot at a desired position and orientation in space.
    • Model Predictive Control uses a model of the robot's dynamics to predict its future state and optimize control inputs to achieve desired outcomes. It works well because it is adaptive, re-optimizing at each step as conditions change.
  • What are some normalization algorithms and what do they do?

    • Normalization algorithms in machine learning and deep learning include Batch Normalization, Layer Normalization, Instance Normalization, and Group Normalization. These techniques adjust and scale the inputs or the activations of a network to stabilize and accelerate the training process, reduce the sensitivity to initialization, and help combat the vanishing/exploding gradient problems.
  • What is a hard problem you've had to solve?

  • What are some challenges you've overcome?

  • What is something you learned over the last year?

  • What is Bayes rule?

    • Bayes' rule is a fundamental theorem in probability theory and statistics that describes the probability of an event, based on prior knowledge of conditions that might be related to the event. In robotics and machine learning, Bayes' rule is used for Bayesian inference, filtering, and decision making under uncertainty.
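To close out the list above, here is a small worked example of Bayes' rule with made-up numbers, in the spirit of a robot deciding whether a sensor detection is real:

```python
# P(obstacle | sensor flag) from a prior and the sensor's error rates (all invented).
p_obstacle = 0.01                # prior P(obstacle)
p_flag_given_obstacle = 0.95     # sensitivity P(flag | obstacle)
p_flag_given_clear = 0.05        # false-positive rate P(flag | no obstacle)

p_flag = p_flag_given_obstacle * p_obstacle + p_flag_given_clear * (1 - p_obstacle)
p_obstacle_given_flag = p_flag_given_obstacle * p_obstacle / p_flag
print(f"P(obstacle | flag) = {p_obstacle_given_flag:.3f}")  # ~0.161
```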
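And, as referenced in the matrix decomposition answer, a short numpy sketch of the common factorizations (the example matrix is arbitrary; LU is available separately via scipy.linalg.lu):

```python
import numpy as np

A = np.array([[4.0, 2.0], [1.0, 3.0]])

Q, R = np.linalg.qr(A)                        # QR: orthogonal Q, upper-triangular R
eigenvalues, eigenvectors = np.linalg.eig(A)  # eigendecomposition
U, S, Vt = np.linalg.svd(A)                   # singular value decomposition

# Each factorization reconstructs the original matrix (up to numerical error).
assert np.allclose(Q @ R, A)
assert np.allclose(U @ np.diag(S) @ Vt, A)
assert np.allclose(eigenvectors @ np.diag(eigenvalues) @ np.linalg.inv(eigenvectors), A)
```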

Takeaways:

  • review linear algebra
  • review classical robotics
  • review different machine learning models and learning frameworks

Ideas on how I can improve myself

  • take a Crash Course or Brilliant course on linear algebra and statistics
  • read a fundamental paper each week
  • implement transformer from scratch
  • create educational content

Transforming Handwritten Notes in OneNote into A Second Brain


---
title: Transforming Handwritten Notes in OneNote into A Second Brain
subtitle: OCR Technology and Future Possibilities with LLM-Based RAG Databases
slug: transforming_notes
tags: [AI, ML]
category: blog
date: 2023-10-21
canonical: https://trangml.com
---

In the age of digital note-taking, converting handwritten notes into editable text is a transformative capability, especially for academics and students. OneNote, a popular digital note-taking application, has gained widespread use for its intuitive interface and seamless integration with the Microsoft Office Suite, and it often comes included for students. It was for these reasons that I personally used OneNote to take notes during college. The most valuable feature of OneNote to me was the ability to upload PDFs of lecture slides and annotate directly on top of them. Now, having graduated, I'd like to store these notes in a more permanent and accessible manner. To do so, however, the challenge lies in efficiently converting these handwritten notes into digital text. In this blog post, we'll explore the journey from collecting notes in OneNote to leveraging cutting-edge OCR technology and integrating large language models (LLMs) to enhance the usability of the converted text.

1. Collecting Handwritten Notes from OneNote:

OneNote offers a unique platform for collecting handwritten notes. Users can create digital notebooks, jot down notes using digital pens or touch devices, and even annotate PDFs within the application. However, the need often arises to convert these handwritten notes into editable text for easier organization, search, and sharing.

OneNote itself has OCR technology to convert handwritten text, but the performance is limited, and its ability to understand academic text such as complex equations is not sufficient.

Thus, a better approach is to download the notebooks as PDFs, then use an academically focused OCR tool, such as Nougat, to convert the handwriting and images.

Aside

The actual process of downloading the notebooks had more issues than it should have. My first attempt to download the images via Microsoft Graph API failed due to the API looking for a OneNote Business account when called through Python, despite working correctly in the Graph Explorer online.

In order to work around this, I had to install OneNote 2019 on Windows, which has the ability to download entire notebooks or sections as PDFs. This function isn't available on any other form of OneNote.

2. Researching OCR Technology:

Optical Character Recognition (OCR) technology has come a long way in recent years, owing to advancements in machine learning and computer vision. OCR algorithms work by recognizing text characters within images, making them indispensable for converting handwritten notes into digital text. Researchers have been constantly innovating in this field, improving accuracy, speed, and language support.

3. Choosing the Best OCR Technology for Academic Documents:

Selecting the right OCR technology for academic documents and handwritten notes involves considering several factors:

  • Accuracy: The OCR tool must accurately recognize handwritten characters, ensuring minimal errors in the converted text.
  • Language Support: Ensure the OCR solution supports multiple languages and various handwriting styles commonly found in academic notes.
  • Formatting Preservation: Maintaining the original formatting, such as bullet points and diagrams, is crucial for the context of academic notes.

A few models were considered more deeply:

  1. Nougat OCR:
  • Pros:
    • Accuracy: Nougat OCR is known for its high accuracy in recognizing text from various sources, including images and documents.
    • Language Support: It offers robust support for multiple languages, making it versatile for a global user base.
    • Preprocessing Capabilities: Nougat OCR includes preprocessing techniques like noise reduction and image enhancement, improving accuracy.
  • Cons:
    • Complexity: Setting up and configuring Nougat OCR might require more technical expertise compared to user-friendly alternatives.
    • Limited Community Support: The user community and available resources might be smaller compared to more established OCR engines.
  • Technology:
    • Nougat OCR employs deep learning techniques, particularly convolutional neural networks (CNNs), for character recognition. These neural networks are trained on extensive datasets to recognize patterns in handwritten and printed text.
  2. Tesseract OCR:
  • Pros:
    • Open Source: Tesseract OCR is open-source, making it accessible to developers and allowing for community-driven improvements.
    • Versatility: It can recognize a wide range of languages and various font styles, supporting both printed and handwritten text.
    • Continuous Improvement: The Tesseract community actively maintains and enhances the engine, ensuring continuous development.
  • Cons:
    • Accuracy Challenges: While Tesseract has improved significantly, its accuracy might lag behind specialized commercial solutions, especially for complex documents or handwritten text.
    • Limited Preprocessing: Tesseract may require additional preprocessing steps to handle noisy or distorted images effectively.
  • Technology:
    • Tesseract OCR uses a combination of traditional computer vision techniques and deep learning models, including LSTM (Long Short-Term Memory) networks for text recognition.
  3. JaidedAI EasyOCR:
  • Pros:
    • User-Friendly: EasyOCR is designed for ease of use, making it accessible to developers without extensive machine learning expertise.
    • Multi-language Support: It supports multiple languages and scripts, catering to a diverse user base.
    • Flexible Output Formats: EasyOCR can output recognized text in various formats, accommodating different application needs.
  • Cons:
    • Accuracy: While suitable for many applications, EasyOCR might face challenges with highly stylized fonts or intricate handwritten text.
    • Limited Customization: It might lack advanced customization options for users with specific OCR requirements.
  • Technology:
    • EasyOCR employs deep learning models, primarily based on CNNs and recurrent neural networks (RNNs), for text recognition tasks. The models are trained on extensive datasets to recognize text patterns effectively.

4. Using Nougat OCR to Convert Handwritten Notes:

The choice was made to use Nougat as the main OCR technology.
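Below is a minimal sketch of how that conversion step might be scripted. It assumes the Nougat CLI is installed and accepts a PDF path plus an -o output directory (check the project's documentation for current usage); the paths here are placeholders, not taken from the original workflow.

```python
# Hypothetical batch-conversion script: run Nougat over every exported notebook PDF.
import subprocess
from pathlib import Path

input_dir = Path("exported_notebooks")   # PDFs exported from OneNote
output_dir = Path("converted_markdown")  # Nougat writes .mmd (Markdown + math) here
output_dir.mkdir(exist_ok=True)

for pdf in sorted(input_dir.glob("*.pdf")):
    subprocess.run(["nougat", str(pdf), "-o", str(output_dir)], check=True)
```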

5. Future Prospects: Leveraging LLM-Based RAG Databases:

The future of digitized handwritten notes lies in the integration of advanced Large Language Models (LLMs) and techniques like Retrieval-Augmented Generation (RAG). A RAG setup pairs a database of embedded documents with an LLM, allowing complex and context-aware queries, which makes it ideal for handling unstructured data like handwritten notes.
Benefits of LLM-Based RAG Databases:

  • Contextual Understanding: LLMs comprehend the context of the notes, enabling more nuanced and accurate searches.
  • Semantic Queries: Users can employ natural language queries to find specific information within their notes.
  • Knowledge Mining: LLMs can identify patterns and extract insights from vast amounts of textual data, aiding in academic research.
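As a rough sketch of what the retrieval half of such a pipeline could look like over the converted notes (the sentence-transformers dependency, model name, and note texts are illustrative assumptions, not part of the original post):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical converted notes; in practice these would be the Nougat outputs.
notes = [
    "Lecture 4: Kalman filters fuse a motion model with noisy sensor measurements.",
    "Lecture 7: Transformers apply self-attention over the entire input sequence.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
note_embeddings = model.encode(notes, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k notes most similar to the query by cosine similarity."""
    query_embedding = model.encode([query], normalize_embeddings=True)[0]
    scores = note_embeddings @ query_embedding
    return [notes[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved snippets would then be passed to an LLM as context for its answer.
print(retrieve("how does attention work?"))
```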

6. Conclusion:

Converting handwritten notes from OneNote into editable text has evolved significantly, thanks to OCR technology. As we look ahead, integrating advanced Language Models and database technologies like RAG databases will revolutionize how we interact with our digitized notes. The synergy between OCR, LLMs, and databases opens doors to a future where our handwritten thoughts seamlessly integrate with the digital world, making academic pursuits and knowledge management more efficient and insightful than ever before.
