dust-tt / dust

Amplify your team's potential with customizable and secure AI assistants.

Home Page: https://dust.tt

License: MIT License

Rust 14.59% Shell 0.33% JavaScript 0.53% CSS 0.02% PEG.js 0.01% TypeScript 84.39% Dockerfile 0.05% MDX 0.08%

large-language-models rust

dust's People

invisible-hand mauriyouth aickin codeaudit voltek62 dillchen pbnewron garinmckayl mbrukman robertalanm thepok shreyasg101 danielbarankin alexisrobert feynmanliang justinliang1020 ljoukov zurawiki wujiang wooters oytuntez rmbrntt datakumquat goulartnogueira vidalmaxime andrewdkennedy1 fontanierh lasryaric gmpetrov baronrustamov mailmahee yashgoenka ninehills amorist lyhiving escwxyz rachelspark cdancette tonyxia2016 clayoneil sam-cohan goldzulu popdaph wierwu wolfenchill shayanjm tarunchy bhimprakashsingh dsvolk uzay-g admariner kyrolabs miken666 brittlewis12 thirteenli kyutarou univercitysolutions prestomac rand oceanumeric botogoske rush59 yutcam workflowai sal-ix afrog33k touristshaun william-dandrea weigriffiths chrisemoulton ds31 visioninhope niganuga milstan chuyuanlinzi brunoscaglione jvpoucke mvandermeulen gabhubert thomasbrach mohsinss xutuke arthurbnhm julesbelveze josephrp kumpera bm777 alessandro-gi tomaas theclash8

dust's Issues

New block empty title not orange

Dust panics and the UI hangs when the few shot count is larger than the number of few shot pieces of data

I made a Dust app with the new search block which pulls news headlines from Google and feeds them in as few shot examples to GPT3. When I used a topic that had no stories, though, the UI hung. Looking at Dust API's console, it looked like it hit an index out of bounds error and panicked, and the UI is left spinning. IMO, Dust should probably just read up to few_shot_count and not panic if there aren't enough elements in the array.
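A minimal sketch of the "read up to few_shot_count" behavior suggested above. It is written in TypeScript for illustration only; the panic is reported in the Dust API, whose actual code is not shown here, and the name `takeFewShot` and its parameters are hypothetical.

```typescript
// Hypothetical helper: return at most `fewShotCount` examples without
// ever indexing past the end of the array.
function takeFewShot<T>(examples: T[], fewShotCount: number): T[] {
  // Clamp the requested count to the range [0, examples.length].
  const n = Math.max(0, Math.min(fewShotCount, examples.length));
  return examples.slice(0, n);
}
```

With this shape, a search that returns no stories yields an empty few-shot list instead of an out-of-bounds access.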

Simpler Dust app endpoint: final block output only

Offer a simplified endpoint for deployed Dust apps that only returns the completion of the final block.
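One possible shape for the simplified response, as a sketch only. The `BlockResult` type and the `finalBlockOutput` name are assumptions made for illustration, not the existing run format.

```typescript
// Hypothetical per-block result of a run, in execution order.
interface BlockResult {
  name: string;
  output: string;
}

// Return only the output of the last block, or null for an empty run.
function finalBlockOutput(blocks: BlockResult[]): string | null {
  if (blocks.length === 0) {
    return null;
  }
  return blocks[blocks.length - 1].output;
}
```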

Add check for last update time before updating an app

If an app is open in multiple tabs, editing in one tab and then another can overwrite the state and lose data.

Apps should refresh on tab change but it's still possible to race against that.

We should check that the update time on the server is not more recent than the last update time known to the client.
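The check described above could look roughly like this. It is a sketch under assumptions: timestamps are taken to be milliseconds since the epoch, and `SaveRequest` and `canApplyUpdate` are hypothetical names.

```typescript
// Hypothetical request shape: the client sends back the last update time
// it saw when it loaded the app.
interface SaveRequest {
  clientKnownUpdatedAt: number; // milliseconds since the epoch (assumed)
}

// Returns true when the server copy has not changed since the client last
// read it, i.e. applying the edit cannot overwrite someone else's change.
function canApplyUpdate(serverUpdatedAt: number, req: SaveRequest): boolean {
  return serverUpdatedAt <= req.clientKnownUpdatedAt;
}
```

To close the race mentioned above, the comparison and the write would presumably need to happen atomically on the server (for example in one transaction); a stale client would then be asked to reload.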

Add community example apps to the user apps page + landing

UNIQUE KEY error when block_executions are repeated

The same goes for datasets if they contain the same point twice.
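One way to avoid inserting the same row twice is to de-duplicate before writing. This is a sketch, not the actual fix: `dedupeBySerialization` is a hypothetical helper, and it compares items by their JSON serialization, which is only one possible notion of equality.

```typescript
// Hypothetical helper: drop repeated items, keeping the first occurrence.
// Items are compared by their JSON serialization, so key order matters.
function dedupeBySerialization<T>(items: T[]): T[] {
  const seen: { [key: string]: boolean } = {};
  const out: T[] = [];
  for (let i = 0; i < items.length; i++) {
    const key = JSON.stringify(items[i]);
    if (!Object.prototype.hasOwnProperty.call(seen, key)) {
      seen[key] = true;
      out.push(items[i]);
    }
  }
  return out;
}
```

An alternative would be to keep the duplicates and make the insert itself tolerate conflicts; which is preferable depends on whether repeated executions need to be stored individually.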

Add "Create dataset" option to the dataset selection dropdown menu

Caching for networked blocks (curl, search)

Getting errors when a Google result includes .pdf files

example:

BrowserlessAPIError: Error with HTTP status 400 and body "net::ERR_ABORTED at https://www.juniper.net/content/dam/www/assets/factsheet/us/en/corporate-social-responsibility-report-2021.pdf"
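A possible workaround is to skip links that look like PDFs before handing them to the browser block. The sketch below is an assumption about how that could be done; `looksLikePdf` is a hypothetical name, and a URL-based check will miss PDFs served without a `.pdf` extension.

```typescript
// Hypothetical filter: true when the URL path ends in ".pdf", optionally
// followed by a query string or fragment.
function looksLikePdf(url: string): boolean {
  return /\.pdf(?:[?#].*)?$/i.test(url);
}

// Keep only the links that do not look like PDFs.
function withoutPdfLinks(urls: string[]): string[] {
  return urls.filter(function (u) {
    return !looksLikePdf(u);
  });
}
```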

Mention Browserless 5 min delay in Provider setup

+ block in between blocks

Really annoying that blocks are added at the end and that the + block button is only at the top or bottom. Need to be able to add a block "in place"

curl support for `http`

Move "New block" and "Run" CTAs to the bottom

As you build your app and start stacking blocks, having to scroll back up each time you want to add a new block or run the app doesn't feel like the best experience.

Another alternative could be to keep the top sticky (might make sense to navigate to datasets too?). Happy to try a PR if you think it makes sense.

Simple INPUT I can directly change in the UI for experiments

App cloning

dust-api: https://github.com/dust-tt/dust/pull/44/files

When a protocol is passed in the Curl block URL, automatically update the protocol and clean the URL

Copying a URL with its protocol directly into the curl block should work seamlessly.
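A sketch of the cleanup, assuming the curl block stores the scheme and the rest of the URL separately. That storage model, the `CurlTarget` type, and the `normalizeCurlUrl` name are all assumptions for illustration.

```typescript
// Hypothetical result shape: scheme and remainder stored separately.
interface CurlTarget {
  scheme: "http" | "https";
  url: string; // URL without the leading "scheme://"
}

// Split a pasted URL into scheme and remainder. When no scheme is present,
// keep the currently selected one.
function normalizeCurlUrl(pasted: string, currentScheme: "http" | "https"): CurlTarget {
  const match = /^\s*(https?):\/\/(.*)$/i.exec(pasted);
  if (match === null) {
    // No protocol in the pasted text: only trim surrounding whitespace.
    return { scheme: currentScheme, url: pasted.replace(/^\s+|\s+$/g, "") };
  }
  const scheme = match[1].toLowerCase() === "http" ? "http" : "https";
  return { scheme: scheme, url: match[2].replace(/\s+$/, "") };
}
```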

Error when running `docker compose up` on main branch.

Forked repo.
Ran docker compose up which seemed to work # Note, I did not yet init the database.
Ctrl-C, gracefully shut down.
Ran docker compose up again

> docker compose up
[+] Running 2/0
 ⠿ Container dust-api    Created                                                                  0.0s
 ⠿ Container dust-front  Created                                                                  0.0s
Attaching to dust-api, dust-front
dust-front  |
dust-front  | > dev
dust-front  | > next dev
dust-front  |
dust-front  | ready - started server on 0.0.0.0:3000, url: http://localhost:3000
dust-front  | event - compiled client and server successfully in 445 ms (209 modules)
dust-front  | wait  - compiling /_error (client and server)...
dust-front  | event - compiled client and server successfully in 175 ms (210 modules)
dust-front  | warn  - Fast Refresh had to perform a full reload. Read more: https://nextjs.org/docs/basic-features/fast-refresh#how-it-works
dust-front  | wait  - compiling / (client and server)...
dust-front  | event - compiled client and server successfully in 507 ms (897 modules)
dust-front  | [next-auth][warn][EXPERIMENTAL_API]
dust-front  | `unstable_getServerSession` is experimental and may be removed or changed in the future, as the name suggested.
dust-front  | https://next-auth.js.org/configuration/nextjs#unstable_getServerSession}
dust-front  | https://next-auth.js.org/warnings#EXPERIMENTAL_API
dust-front  | error - SerializableError: Error serializing `.ga_tracking_id` returned from `getServerSideProps` in "/".
dust-front  | Reason: `undefined` cannot be serialized as JSON. Please use `null` or omit this value.
dust-front  |     at isSerializable (/dust-front/node_modules/next/dist/lib/is-serializable-props.js:36:19)
dust-front  |     at /dust-front/node_modules/next/dist/lib/is-serializable-props.js:43:66
dust-front  |     at Array.every (<anonymous>)
dust-front  |     at isSerializable (/dust-front/node_modules/next/dist/lib/is-serializable-props.js:40:39)
dust-front  |     at Object.isSerializableProps (/dust-front/node_modules/next/dist/lib/is-serializable-props.js:63:12)
dust-front  |     at Object.renderToHTML (/dust-front/node_modules/next/dist/server/render.js:570:67)
dust-front  |     at async doRender (/dust-front/node_modules/next/dist/server/base-server.js:669:38)
dust-front  |     at async cacheEntry.responseCache.get.isManualRevalidate.isManualRevalidate (/dust-front/node_modules/next/dist/server/base-server.js:778:28)
dust-front  |     at async /dust-front/node_modules/next/dist/server/response-cache/index.js:80:36 {
dust-front  |   page: '/'
dust-front  | }
dust-api    | [i] 0 pending runs

...and I get a 404 when I try visiting http://localhost:3000/api/init per the instructions in the readme.

This is on my MBP M1. Am I doing something wrong?

Timeout errors from Browserless API interrupt runs

Error with HTTP status 400 and body "Navigation timeout of 30000 ms exceeded"

Output Transform Code

Add ability to define transformation code to run on the output of a block to avoid having too many glue blocks

Same block type naming index fails

Create dataset does not alert if unsaved

Should use the same logic as dataset editing.

Duplicate key error when repeated outputs from LLM are cached concurrently

Add support for a while loop

Need a way to express "run these blocks until they succeed or the budget is used up".
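The requested semantics could be sketched as a bounded retry loop. Everything here is hypothetical: "budget" is modeled as a maximum number of attempts, and failure is modeled as a `null` return value.

```typescript
// Outcome of a bounded retry loop.
interface LoopResult<T> {
  value: T | null; // null when every attempt failed
  attempts: number; // number of attempts actually made
}

// Call `attempt` until it returns a non-null value or `maxAttempts`
// (the "budget") is exhausted.
function runUntilSuccess<T>(attempt: (i: number) => T | null, maxAttempts: number): LoopResult<T> {
  let attempts = 0;
  while (attempts < maxAttempts) {
    const value = attempt(attempts);
    attempts++;
    if (value !== null) {
      return { value: value, attempts: attempts };
    }
  }
  return { value: null, attempts: attempts };
}
```

A budget expressed in tokens or cost instead of attempts would follow the same structure, with the loop condition checking the remaining budget.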

Implement a browserless.io block type for web scraping

Stan's webGPT demo is super cool, but one thing that may make it hard for someone to understand and reproduce is that it uses a replit for headless browser web scraping. I think that web scraping is going to be a pretty basic operation in Dust, so perhaps we should create a specific block type for it.

I would probably start with browserless.io, which offers a web scraping service as an API, with an API key, very similar to the way SerpAPI works with Google. They also let you put in a CSS selector if you want to just return a particular part of the document. It would be very similar to the SerpAPI block in terms of code, provider, etc.

Set-up the frequency penalty and presence penalty

Running status from setRunRequested is not propagated to UI due to refresh from useRunStatus

LLM block thinks that having SerpApi enabled means you have an LLM provider available.

To repro:

Disable all providers
Enable just SerpApi and no LLM providers.
Add an LLM block.

The LLM block will have an "Add Provider" button but nothing happens when you click it.

Expected behavior: The Setup Providers button shows up.
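The expected behavior amounts to checking for an LLM provider specifically, not for any provider. A sketch, with a hypothetical `Provider` shape and illustrative ids:

```typescript
// Hypothetical provider descriptor; the `kind` values are illustrative.
interface Provider {
  id: string;
  kind: "llm" | "search" | "browser";
}

// True only when at least one enabled provider can serve an LLM block.
function hasLlmProvider(enabled: Provider[]): boolean {
  return enabled.filter(function (p) {
    return p.kind === "llm";
  }).length > 0;
}
```

With only SerpApi enabled this returns false, so the LLM block would show the Setup Providers button.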

Add LLM - Dummy that generates random text for cheap testing

Please add a license to this repo

First, thank you for sharing this project with us!

Could you please add an explicit LICENSE file to the repo so that it's clear
under what terms the content is provided, and under what terms user
contributions are licensed?

Per GitHub docs on licensing:

[...] without a license, the default copyright laws apply, meaning that you
retain all rights to your source code and no one may reproduce, distribute,
or create derivative works from your work. If you're creating an open source
project, we strongly encourage you to include an open source license.

Thanks!

Map block's `from` field not big enough in most cases

CSS selectors that include special characters cause an app to error

To repro:

Create a new app.
Add a browser block.
Enter a URL, and enter the selector section[name=articleBody].
Hit Run.

Expected: the app runs and pulls down the content.
Actual: The app doesn't run, and it reports an error like

Specification error:  --> 6:20
  |
6 |   selector: section[name=articleBody]
  |                    ^---
  |
  = expected key

I'm fairly certain what's happening here is that the dust spec doesn't like special characters like '[' and ']' when the value is on the same line as the name of the attribute; we need to turn selector into a multiline.

I'm going to work on this right now; assigning to myself.

Cache unique key constraint when running model with same prompt on multiple inputs

Dataset not saved warning is fired even if nothing is changed when description is empty

put few-shot count below Examples block

init/init.sh not running in dust-front Docker image

front/Dockerfile currently has 2 CMD instructions, but Docker only honors the last CMD in a Dockerfile, so the first one is ignored.

dust/front/Dockerfile

Lines 9 to 12 in eaef34b

    CMD ["./init/init.sh"]

    EXPOSE 3000

    CMD ["npm", "run", "dev"]

The first one is the command that sets up the database, so setup is skipped silently and I then got a database error. When I logged into the container and ran the init script, it fixed things.

Grouping of Blocks in UI

Would be great to group blocks and enable some folding

App Deploy

Automatically clean-up runs that led to an unexpected error & store error

Need to move logs to flat files as a start, or to Datadog or similar, because they are currently lost.

Consider renaming map-reduce blocks some kind of "loop" in the UI

Personally, I found the names map and reduce a little misleading; it made it seem like they were part of a massive parallel processing engine, and it also made it seem like the reduce step should have logic. Having read over the code, I'm pretty sure they are more similar to a for loop.

In light of this, maybe we should rename map and reduce blocks in the UI to one of the following:

loop and end_loop
for_each and end_for_each
each and end_each
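To illustrate the "for loop" reading, here is a sketch of the semantics as described in this issue: the blocks between map and reduce run once per element, and reduce only collects the results. This is a model of that description, not Dust's implementation; `mapReduce` and `body` are hypothetical names.

```typescript
// Hypothetical model of a map ... reduce pair: `body` stands for the blocks
// in between, and the returned array is what reduce would hand on.
function mapReduce<I, O>(inputs: I[], body: (item: I) => O): O[] {
  const results: O[] = [];
  for (let i = 0; i < inputs.length; i++) {
    results.push(body(inputs[i])); // one run of the inner blocks per element
  }
  return results; // reduce applies no logic of its own, it only gathers
}
```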

llm blocks do not do variable expansion of the few shot introduction

To repro, feed in some data from a previous block and try to read it out in a few shot introduction with something like "Here are some news stories about ${INPUT.person}". The variable will not get expanded, and Dust sends the literal string "${INPUT.person}" to GPT3.
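The expected expansion could be sketched as follows. The `Env` shape and the `expandTemplate` name are assumptions; only the `${BLOCK.field}` placeholder syntax comes from the repro above, and leaving unknown references untouched is a design choice of this sketch.

```typescript
// Hypothetical environment: outputs of previous blocks, keyed by block name.
type Env = { [block: string]: { [field: string]: string } };

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Replace each ${BLOCK.field} placeholder with the matching value from
// `env`. References that cannot be resolved are left as-is.
function expandTemplate(template: string, env: Env): string {
  return template.replace(
    /\$\{(\w+)\.(\w+)\}/g,
    function (whole: string, block: string, field: string): string {
      if (hasOwn(env, block) && hasOwn(env[block], field)) {
        return env[block][field];
      }
      return whole;
    }
  );
}
```

Applying the same function to the few-shot introduction as to the prompt would make both fields behave consistently.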

Outputs green ticks get resized when text output is big (mobile)

Eg: https://dust.tt/apointil/a/ef09c8d861

Map section should show its output in a visually different way

When you run a map block over an array that is output from a previous block, the UI shows the output of the map block identically to the previous block (see picture below). When I saw this, I got confused, because it didn't seem like the map block was fanning out jobs across the array. But when I put a search block between map and reduce, it was clear that search ran once for each element of the array. Further, the output from search is different than the output from map, with a green check next to each individual run of the block.

I think I would want the output of the map block to look more like the output of the blocks inside of map-reduce. Other ideas are to have map output a flat list of all of its results with small visual indicators in between the different clusters that will reduce down together.

Note in the picture here that the output of INPUT and LOOP are the same. I think that LOOP's output should look more like SEARCH to indicate that there will be 3 runs, not one run on an array of 3:

[
  {
    "Question": "Who lived longer, Theodor Haecker or Harry Vaughan Watkins?",
    "Are follow up questions needed here": "Yes.",
    "Follow up": "How old was Harry Vaughan Watkins when he died?",
    "Intermediate answer": "Harry Vaughan Watkins was 69 years old when he died.",
    "So the final answer is": "Harry Vaughan Watkins."
  },
  {
    "Question": "Who lived longer, Theodor Haecker or Harry Vaughan Watkins?",
    "Are follow up questions needed here": "Yes.",
    "Follow up": "How old was Harry Vaughan Watkins when he died?",
    "Follow up": "How old was Harry Vaughan Watkins when he died?",
    "Intermediate answer": "Harry Vaughan Watkins was 69 years old when he died.",
    "So the final answer is": "Harry Vaughan Watkins."
  }
]

This is accepted by the front-end but refused by the back-end. Validation should be fixed.