genai's Introduction

Install | License | Code of Conduct | Contributing

GenAI: generative AI tooling for IPython

🦾 Get GPT help with code, SQL queries, DataFrames, Exceptions and more in IPython.

🌍 Supports all Jupyter environments, including IPython, JupyterLab, Jupyter Notebook, and Noteable.

TL;DR Get started now

%pip install genai
%load_ext genai

Genai In Action

Genai making a suggestion followed by running suggested code

Introduction

We've taken the context from IPython, mixed it with OpenAI's Large Language Models, and are bringing you a more informed notebook experience that works in all Jupyter environments, including IPython, JupyterLab, Jupyter Notebook, and Noteable. 🦾🌏

Requirements

Python 3.8+

Installation

Poetry

poetry add genai

Pip

pip install genai

Loading the IPython extension

Set the OPENAI_API_KEY environment variable before loading the extension in IPython or your preferred notebook platform.

%load_ext genai
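One way to make sure the key is present before loading the extension is to check the environment and prompt only when it is missing. This is a minimal sketch: the helper name ensure_openai_api_key and its parameters are hypothetical and not part of genai.

```python
import getpass
import os


def ensure_openai_api_key(env=None, ask=getpass.getpass):
    """Return the OpenAI API key, prompting only if it is not already set.

    Hypothetical helper for illustration; not part of genai.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if not key:
        # Prompt without echoing the key, then store it for this process only
        key = ask("OPENAI_API_KEY: ")
        env["OPENAI_API_KEY"] = key
    return key
```

Calling ensure_openai_api_key() in a cell before %load_ext genai keeps the key out of the notebook's saved contents.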

Features

  • %%assist magic command to generate code from natural language
  • Custom exception suggestions

Custom Exception Suggestions

In [1]: %load_ext genai

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(dict(col1=['a', 'b', 'c']), index=['first', 'second', 'third'])

In [4]: df.sort_values()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[4], line 1
----> 1 df.sort_values()

File ~/.pyenv/versions/3.9.9/lib/python3.9/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

TypeError: sort_values() missing 1 required positional argument: 'by'

💡 Suggestion

The error message is indicating that the sort_values() method of a pandas dataframe is missing a required positional argument.

The sort_values() method requires you to pass a column name or list of column names as the by argument. This is used to determine how the sorting will be performed.

Here's an example:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Carol', 'David', 'Eva'],
    'Age': [32, 24, 28, 35, 29],
    'Salary': [60000, 40000, 35000, 80000, 45000]
})

# sort by Age column:
df_sorted = df.sort_values(by='Age')
print(df_sorted)

In this example, the by argument is set to 'Age', which sorts the dataframe by age in ascending order. Note that you can also pass a list of column names if you want to sort by multiple columns.

Example

In [1]: %load_ext genai

In [2]: %%assist
   ...:
   ...: Can you explain this query to me so I can be sure we're doing the right things?
   ...:
   ...: ```sql
   ...: SELECT
   ...:   COUNT(*) AS num_downloads,
   ...:   DATE_TRUNC(DATE(timestamp), DAY) AS day
   ...: FROM `bigquery-public-data.pypi.file_downloads`
   ...: WHERE
   ...:   file.project = 'genai'
   ...:   -- Only query the last 6 months of history
   ...:   AND DATE(timestamp)
   ...:     BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH)
   ...:     AND CURRENT_DATE()
   ...: GROUP BY day
   ...: ORDER BY day DESC
   ...: ```

"This is a SQL query that counts the number of downloads for the 'genai' project on the Python Package Index (PyPI) over the last 6 months, grouped by day. The query selects the count and the date, truncating the date to the day level so that it's easier to read.

Here is a breakdown of each part of the query:

- `SELECT COUNT(*) AS num_downloads, DATE_TRUNC(DATE(timestamp), DAY) AS day`: This selects the count of the number of rows matched by the query as `num_downloads`, and the date truncated to the day level as `day`.
- `FROM `bigquery-public-data.pypi.file_downloads``: This specifies the table to query from.
- `WHERE file.project = 'genai'`: This filters the rows by only including downloads for the 'genai' project.
- `AND DATE(timestamp) BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH) AND CURRENT_DATE()`: This filters the rows by only including downloads from the last 6 months. The `DATE_SUB` function subtracts 6 months from the current date (`CURRENT_DATE()`), `DATE_TRUNC` truncates the result to be the start of the month and `DATE` converts the timestamp column to a date so the `BETWEEN` condition can filter rows between the start of 6 months ago and "today."
- `GROUP BY day`: This groups the rows by day so that the counts are aggregated by date.
- `ORDER BY day DESC`: This orders the rows so that the most recent date appears first in the result."

genai's People

Contributors

dependabot[bot], mseal, pierrebrunelle, rgbkrk, sauravmaheshkar, shouples, willingc

genai's Issues

[Prompt] Suggest how to install when ModuleNotFound

When a user hits a ModuleNotFoundError, we should send a different kind of prompt and context. Sometimes the user made a typo, and sometimes they don't have the package installed. genai should be able to tell the two apart.

Missing Package

image

Typo

image

Context

  • Send the user's code
  • Send the output of pip freeze (assuming this isn't too big) as a role: system message

Bonus Context / Optimization

As a bonus, if the pip freeze output is too big, maybe we can use something like Levenshtein distance to pull the closest string matches.

packages = !pip list --format=freeze
packages = [pkg.split('==')[0] for pkg in packages]

import Levenshtein

def find_closest_strings(target, string_list, n=20):
    """
    Finds the n closest strings in string_list to the target string.
    """
    # Compute the Levenshtein distance between the target string and each string in the list
    similarity_scores = [(string, Levenshtein.distance(target, string)) for string in string_list]

    # Sort the list based on the similarity scores
    similarity_scores.sort(key=lambda x: x[1])

    # Return the n closest strings
    return [x[0] for x in similarity_scores[:n]]

find_closest_strings("pndas", packages)

Outputs

['pandas',
 'conda',
 'dask',
 'anyio',
 'appdirs',
 'attrs',
 'dx',
 'fqdn',
 'fs',
 'genai',
 'geopandas',
 'idna',
 'jedi',
 'Jinja2',
 'openai',
 'parso',
 'partd',
 'patsy',
 'pip',
 'py']
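If adding the third-party Levenshtein package is undesirable, the standard library's difflib can produce a similar shortlist. Note that this swaps in a different similarity measure (the SequenceMatcher ratio rather than Levenshtein distance), so rankings can differ from the output above. The function name and cutoff value are illustrative assumptions.

```python
import difflib


def find_close_packages(target, package_names, n=20, cutoff=0.5):
    """Return up to n package names that resemble target, best match first."""
    # get_close_matches ranks by SequenceMatcher ratio, not Levenshtein distance
    return difflib.get_close_matches(target, package_names, n=n, cutoff=cutoff)


candidates = find_close_packages("pndas", ["pandas", "conda", "dask", "geopandas", "pip"])
```

Unlike the fixed top-n list above, the cutoff drops names that are not similar at all, which keeps the context sent to the model shorter.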

Prompt suggestion

  • Use !pip or %pip if an install is needed
  • "Here are the packages installed on the system"

[?][Design] Should %%assist return Markdown output?

I'm starting to think that we should emit markdown output instead of creating a new cell with %%assist.

Reasons:

  • Responses tend to come back in either Markdown or a plain text format
  • set_next_input (the IPython new cell creator) can't stream updates
  • The response usually needs some editing
  • The document gets littered with new cells on repeated runs

Create a way to override the default prompts

I'd like to be able to experiment with the prompts to do tuning outside of making another release.

import genai

genai.set_default_error_prompt(
    """
You are a data scientist diagnosing errors from your colleagues.
They ran into an error in their notebook. Be concise. Format your response in markdown.
Be extremely minimal with prose, aiming primarily for concise fixes.
""".strip()
)

As the Python package solidifies, the piece most likely to change over time is the prompt.
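One possible shape for such an override is a module-level registry with a setter. This is only a sketch: set_default_error_prompt is the name proposed above, while get_default_error_prompt, build_error_messages, and the placeholder prompt text are assumptions made for illustration.

```python
# Placeholder default; the real prompt text would live in the package
_PROMPTS = {"error": "You are helping a colleague diagnose a notebook error."}


def set_default_error_prompt(prompt):
    """Replace the system prompt used when building error suggestions."""
    _PROMPTS["error"] = prompt.strip()


def get_default_error_prompt():
    """Return the error prompt currently in effect."""
    return _PROMPTS["error"]


def build_error_messages(traceback_text):
    """Build chat messages using whichever prompt is active at call time."""
    return [
        {"role": "system", "content": get_default_error_prompt()},
        {"role": "user", "content": traceback_text},
    ]
```

Because the prompt is looked up when the messages are built, a change made in one cell takes effect on the next error without reloading the extension.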

Remove the pip prompting

Maybe the prompt about %pip vs !pip is too much. Sometimes GPT includes this in code segments that don't even have an install:

image

Ignore Keyboard Interrupt

We should not provide suggestions for KeyboardInterrupt or any other terminations. Probably should also stop if out of memory.
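A small filter along these lines could decide whether an exception is worth sending for a suggestion. The names SKIPPED_EXCEPTIONS and should_suggest are hypothetical, and the exact set of skipped types is an assumption based on the cases listed above.

```python
# Exception types that signal termination or resource exhaustion rather than a fixable bug
SKIPPED_EXCEPTIONS = (KeyboardInterrupt, SystemExit, MemoryError)


def should_suggest(exc_type):
    """Return True when an exception type is worth sending for a suggestion."""
    return not issubclass(exc_type, SKIPPED_EXCEPTIONS)
```

An exception handler could call should_suggest(etype) first and fall back to the normal traceback display when it returns False.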

Trim down reprs

We should set pandas display options to something reasonable when using get_historical_context, because the reprs get very large very quickly.

def craft_output_message(output):
    return {
        "content": repr(output),
        "role": "system",
    }

We should always try to keep this under a certain length as well as do any processing in advance for known useful objects.
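As a rough sketch of the length cap, the message builder could truncate long reprs before sending them. The max_chars parameter, its default, and the truncation marker are assumptions, not existing genai behavior.

```python
def craft_output_message(output, max_chars=1000):
    """Build a system message from an output, truncating long reprs."""
    text = repr(output)
    if len(text) > max_chars:
        marker = "... [truncated]"
        # Keep the head of the repr and leave room for the marker
        text = text[: max_chars - len(marker)] + marker
    return {"content": text, "role": "system"}
```

Keeping the head of the repr preserves the type and the first values, which is usually the most useful part for the model.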

Set up CI

Totally happy with the same base we use for other projects.

[MAINT] Combine the two dev deps sections for poetry

Due to poetry changes we now have two development dependencies sections:

[tool.poetry.dev-dependencies]
flake8-docstrings = "^1.6.0"
pytest = "^7.2.2"
black = "^23.1.0"
isort = "^5.10.1"
pytest-cov = "^4.0.0"
pytest-asyncio = "^0.19.0"
nox = "^2022.1.7"
nox-poetry = "^1.0.1"
pytest-mock = "^3.8.2"
bump2version = "^1.0.1"

[tool.poetry.group.dev.dependencies]
pandas = "^1.5.3"

Those all need to be under tool.poetry.group.dev.dependencies, based on the feedback poetry now gives when you run poetry add --dev:

The --dev option is deprecated, use the `--group dev` notation instead.

Increase output context sharing

We are currently limited to execute_result for available outputs for sharing with LLMs (by nature of how Out works). How can we increase output context sharing? I'd like to be able to have !pip show pkg be part of what the LLM reads from for context when assisting and working through errors.

Setting intentions

The goal of this package is to expose AI tools to IPython primitives like:

  • Cell creation set_next_input
  • Custom Exception handling
  • Completion

It should also use contextual information, such as the current in-memory variables and previous inputs (In), to provide context to ChatGPT.
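To illustrate how previous inputs might become chat context, here is a minimal sketch that assumes the inputs are available as a list of strings, like IPython's In. The function name build_context_messages, the max_cells limit, and the choice of the user role are hypothetical.

```python
def build_context_messages(inputs, max_cells=5):
    """Turn the most recent non-empty inputs into chat messages, oldest first."""
    recent = [cell for cell in inputs if cell.strip()][-max_cells:]
    return [{"role": "user", "content": cell} for cell in recent]


# In[0] is an empty string in IPython, so blank entries are skipped
history = ["", "import pandas as pd", "df = pd.DataFrame()", "df.sort_values()"]
messages = build_context_messages(history, max_cells=2)
```

Limiting the number of cells is one simple way to keep the request within the model's context window.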

Consider a feedback loop

For suggestions, we could use input to do back & forth between the user and ChatGPT.

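import openai
from IPython.display import display

# Assume messages is all the messages we sent prior as well as "assistant responses"
messages.append(completion["choices"][0]["message"])

user_input = input("chat> ")

# Compare by value; `is not ""` tests identity, not equality
while user_input != "":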
    user_message = {"role": "user", "content": user_input}
    messages.append(user_message)

    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )

    assistant_message = completion["choices"][0]["message"]
    messages.append(assistant_message)

    display(
        {
            "text/plain": assistant_message["content"],
            "text/markdown": assistant_message["content"],
        },
        raw=True,
    )

    user_input = input("chat> ")

It's not the best user experience. However, it'll work across all the Jupyter platforms, even IPython at the command line.

[Updates] Switch to progress bar for `%%assist`

The messages that get sent with %%assist don't make much sense when we tend to get markdown back. We should switch this to a progress bar. In fact... maybe we should just emit Markdown and allow the user to copy the code instead of creating a new cell.

Send unprocessed code over to ChatCompletion

Sometimes ChatCompletion will use the get_ipython().run_cell_magic line, which I assume comes from it seeing that in the original code:

image

We should make sure it's seeing the cell with the magic.
