sanechain's Introduction

Sane Chain

An attempt to make langchainjs easier to work with

WIP: some things work, but nothing is tested and there are no warranties. 🥇

Contents:

  1. Utility Classes
    1. DocumentLoader
  2. Loaders
    1. ChatGPT Loader
    2. Simpler GithubRepoLoader
  3. Roadmap

Utility Classes

DocumentLoader

This class wraps all of the langchainjs loaders (plus the ones sanechain adds) in a single class, DocumentLoader, which can load your documents regardless of type.

Example:

const filesAndDirectories = [
  'path/to/somefile.md',
  'path/to/somefile.pdf',
  'path/to/somefile.text',
  'path/to/somefile.html',
  'path/to/somedirectory',
  'https://github.com/some/repo',
  'https://github.com/some/other_repo',
  'path/to/chatgpt.json'
]

const documentLoader = new DocumentLoader(filesAndDirectories)
// Both calls are async, so await them to get the results rather than pending promises.
const documents = await documentLoader.loadDocuments()
const splitDocuments = await documentLoader.splitDocuments()
// Loading might take a while; a queue system is planned to speed things up.
// @todo: reach full parity with all the langchain Python loaders.
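To make the "regardless of type" idea concrete, here is a minimal sketch of how a mixed list like the one above could be sorted into loader categories. The helper names (classifySource, groupByKind) and the category labels are hypothetical and are not taken from sanechain's source; the real DocumentLoader may dispatch differently.

```javascript
// Hypothetical mapping from file extension to a loader category.
// These labels are illustrative, not sanechain's actual internals.
const EXTENSION_KINDS = {
  '.md': 'markdown',
  '.pdf': 'pdf',
  '.txt': 'text',
  '.text': 'text',
  '.html': 'html',
  '.json': 'json',
};

// Decide which kind of loader a single source string would need.
function classifySource(source) {
  // GitHub repositories are recognised by their URL prefix.
  if (source.startsWith('https://github.com/')) return 'github';

  // Look for an extension in the last path segment only.
  const lastSegment = source.split('/').pop();
  const dot = lastSegment.lastIndexOf('.');
  if (dot <= 0) return 'directory'; // assumption: no extension means a directory

  const extension = lastSegment.slice(dot).toLowerCase();
  return EXTENSION_KINDS[extension] || 'unknown';
}

// Group a mixed list of sources by the category each one falls into.
function groupByKind(sources) {
  const groups = {};
  for (const source of sources) {
    const kind = classifySource(source);
    if (!groups[kind]) groups[kind] = [];
    groups[kind].push(source);
  }
  return groups;
}
```

Calling groupByKind(filesAndDirectories) on the list above would put both GitHub URLs in one group and each local file in the group for its extension, so that each group could then be handed to a matching loader.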

Loaders

ChatGPT Loader

import { ChatGPTLoader } from './chat_gpt_loader.js';

const loader = new ChatGPTLoader('path/to/chat/log.json', 10);
const documents = await loader.load();
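As a rough illustration of what a chat-log loader has to do, the sketch below turns a simplified chat log into langchainjs-style documents ({ pageContent, metadata }). The input shape and the chatLogToDocuments helper are assumptions made for this example; the real ChatGPT export format and ChatGPTLoader's behaviour (including the meaning of its second argument) may differ.

```javascript
// Assumed, simplified log shape (NOT necessarily the real ChatGPT export format):
//   [{ title: string, messages: [{ role: string, content: string }] }]
// Hypothetical helper: one document per conversation.
function chatLogToDocuments(conversations, source) {
  return conversations.map((conversation, index) => ({
    // Join the messages into one "role: content" line per message.
    pageContent: conversation.messages
      .map((message) => `${message.role}: ${message.content}`)
      .join('\n'),
    // Keep enough metadata to trace a document back to its conversation.
    metadata: { source, title: conversation.title, conversationIndex: index },
  }));
}

const exampleLog = [
  {
    title: 'Greeting',
    messages: [
      { role: 'user', content: 'Hi' },
      { role: 'assistant', content: 'Hello' },
    ],
  },
];

const exampleDocuments = chatLogToDocuments(exampleLog, 'path/to/chat/log.json');
```

Keeping each conversation as one document preserves the back-and-forth context; splitting into smaller chunks can then be left to a text splitter.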

Simpler GithubRepoLoader

Pass in a GitHub repository URL and get back the repository's documents.

import { GithubRepoLoader } from 'sanechain';

const loader = new GithubRepoLoader('https://github.com/owner/repo', { /* params */ });
const documents = await loader.load();
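Accepting a plain link means the loader has to work out the owner and repository name itself. A minimal sketch of that step, using the standard URL class, might look like the following. parseGithubUrl is a hypothetical helper written for this example, not part of sanechain's or langchainjs's API.

```javascript
// Hypothetical helper: extract { owner, repo } from a GitHub repository URL.
function parseGithubUrl(repoUrl) {
  const url = new URL(repoUrl); // throws a TypeError on malformed input
  if (url.hostname !== 'github.com') {
    throw new Error(`Not a github.com URL: ${repoUrl}`);
  }

  // The pathname looks like "/owner/repo" (optionally with more segments).
  const [owner, repoSegment] = url.pathname.split('/').filter(Boolean);
  if (!owner || !repoSegment) {
    throw new Error(`Expected https://github.com/<owner>/<repo>, got: ${repoUrl}`);
  }

  // Tolerate clone-style URLs that end in ".git".
  const repo = repoSegment.replace(/\.git$/, '');
  return { owner, repo };
}
```

Validating the URL up front lets a bad link fail with a clear message before any network request is made.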

Roadmap

  • Models
    • General
    • Chat
    • Embeddings
  • Prompts
    • General Templates
    • Chat Template
    • Example Selectors
    • Output Parsers
  • Indexes (Primary focus at first)
    • Document Loaders
      • Airbyte JSON
      • Apify Dataset
      • Arxiv
      • AWS S3
      • AZLyrics
      • Azure Blob Storage
      • Bilibili
      • Blackboard
      • Blockchain
      • ChatGPT Data
      • Confluence
      • CoNLL-U
      • Copy / Paste
      • CSV (langchainjs)
      • Diffbot
      • Discord
      • DuckDB
      • Email
      • EPub (langchainjs)
      • EverNote
      • Facebook Chat
      • Figma
      • File Directory (langchainjs)
      • Git (langchainjs + custom url loader)
      • GitBook
      • Google BigQuery
      • Google Cloud Storage
      • Google Drive
      • Gutenberg
      • Hacker News
      • HTML
      • HuggingFace dataset
      • iFixit
      • Images
      • Image captions
      • IMDB
      • JSON Files (langchain)
      • Jupyter Notebook
      • Markdown (sorta, just parses using TextLoader)
      • MediaWikiDump
      • Microsoft OneDrive
      • Microsoft PowerPoint
      • Microsoft Word (langchainjs)
      • Modern Treasury
      • Notion DB 1/2
      • Notion DB 2/2
      • Obsidian
      • Pandas DataFrame
      • PDF (langchain)
      • Using PyPDFium2
      • ReadTheDocs Documentation
      • Reddit
      • Roam
      • Sitemap
      • Slack
      • Spreedly
      • Stripe
      • Subtitle (langchain)
      • Telegram
      • TOML
      • Twitter
      • Unstructured File (half way)
      • URL (langchainjs via puppeteer, playwright, cheerio, etc.)
      • Selenium URL Loader
      • Playwright URL Loader (langchainjs)
      • WebBaseLoader
      • WhatsApp Chat
      • Wikipedia
      • YouTube transcripts
    • Text Splitters
      • Character Text Splitter
      • HuggingFace Length Function
      • LaTeX Text Splitter
      • Markdown Text Splitter
      • NLTK Text Splitter
      • RecursiveCharacterTextSplitter
      • Spacy Text Splitter
      • tiktoken (OpenAI) Length Function
      • TiktokenTextSplitter
    • Vector stores
    • Retrievers
  • Memory (TBD)
  • Chains (TBD)
  • Agents
    • Tools (TBD)
    • Agents (TBD)
    • Toolkits (TBD)
    • AgentExecutors (TBD)

Contributors

patrickcurl
