databutler's People

Contributors

rbavishi, somaniarushi

databutler's Issues

Upgrade attrs to latest version

PAT was originally written with attrs==19.1.0. The API has changed considerably since then, mostly for the better.

Backward compatibility is good, and PAT still functions correctly with the latest version (21.4.0), but its API usage does not reflect current best practices in many places. Upgrade the API usage in PAT so that it is consistent with the rest of the code.

Configure Logging

It may be worthwhile to use the loguru library, but the benefits need to be fully assessed first. It does offer better coloring and formatting support, plus file logging with rotation.

Integrate Gauss

Integrate Gauss (https://github.com/rbavishi/gauss-oopsla-2021), the interaction-based synthesis engine for Pandas, as an API that can be invoked in a notebook environment.

An ongoing issue with the original Gauss is high memory usage. Use on-demand loading of the knowledge base (oracle) to mitigate this.
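One way to picture on-demand loading is a store that keeps the knowledge base on disk as separate shards and reads a shard only when it is first requested. The sketch below is an assumption-laden illustration, not Gauss's actual storage layout: the class name `LazyKnowledgeBase`, the one-JSON-file-per-key layout, and the cache size are all hypothetical.

```python
import json
from functools import lru_cache
from pathlib import Path


class LazyKnowledgeBase:
    """Hypothetical on-demand store: one JSON shard per key, read only when requested."""

    def __init__(self, root, max_cached=32):
        self.root = Path(root)
        self.loads = 0  # number of times a shard was actually read from disk
        # Bounded cache: at most `max_cached` shards stay in memory at once.
        self._load = lru_cache(maxsize=max_cached)(self._load_uncached)

    def _load_uncached(self, key):
        self.loads += 1
        with open(self.root / f"{key}.json") as f:
            return json.load(f)

    def get(self, key):
        """Return the shard for `key`, loading it from disk on first access."""
        return self._load(key)
```

Bounding the cache trades repeated disk reads for a fixed memory ceiling; whether that trade is right depends on how the oracle is queried in practice.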

Add framework for generating and describing code-changes

  • The framework must allow for easy addition of different classes or strategies of code changes such as removing keyword args, removing function calls, removing assignments etc.
  • It should be easy to write new code change strategies.
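A minimal sketch of what such a framework could look like, assuming Python 3.9+ (for `ast.unparse`). The names `CodeChange`, `ChangeStrategy`, `register`, and `RemoveKeywordArg` are hypothetical and only illustrate the idea of pairing each generated change with a description and adding strategies through a decorator.

```python
import ast
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List, Type


@dataclass
class CodeChange:
    description: str  # human-readable description of the change
    new_code: str     # the code after applying the change


class ChangeStrategy(ABC):
    @abstractmethod
    def generate(self, code: str) -> Iterator[CodeChange]:
        """Yield every change this strategy can make to `code`."""


STRATEGIES: List[Type[ChangeStrategy]] = []


def register(cls):
    """Class decorator: adding a new strategy is one subclass plus this line."""
    STRATEGIES.append(cls)
    return cls


@register
class RemoveKeywordArg(ChangeStrategy):
    def generate(self, code):
        # First pass: record (position in ast.walk order, keyword name) targets.
        targets = []
        for i, node in enumerate(ast.walk(ast.parse(code))):
            if isinstance(node, ast.Call):
                targets.extend((i, kw.arg) for kw in node.keywords if kw.arg)
        # Second pass: re-parse for each target so every change is independent.
        for idx, name in targets:
            tree = ast.parse(code)
            for i, node in enumerate(ast.walk(tree)):
                if i == idx:
                    node.keywords = [k for k in node.keywords if k.arg != name]
                    break
            yield CodeChange(f"Remove keyword argument '{name}'", ast.unparse(tree))
```

For example, running every registered strategy on `sns.distplot(df[col], kde=False, bins=10)` yields two changes, one per keyword argument, each with its own description.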

Finalize structure of the corpus and a corpus member

Need to finalize how to represent the corpus of code variants as a graph. This should be able to store changes along with their descriptions, and the vanilla description of a code snippet. Note that we want to represent the corpus implicitly, as something that can be generated, rather than explicitly.
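To make the "implicit" representation concrete, here is a rough sketch under the assumption that changes are independent of one another: the corpus stores only the base snippet, its vanilla description, and a list of described changes, and produces any variant on request. `Change` and `ImplicitCorpus` are hypothetical names, and changes are modeled as plain string-to-string functions purely for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class Change:
    description: str
    apply: Callable[[str], str]  # maps code to changed code


@dataclass
class ImplicitCorpus:
    base_code: str
    base_description: str
    changes: List[Change]  # stored once: linear in the number of changes

    def variant(self, selected: Sequence[int]) -> Tuple[str, List[str]]:
        """Build one variant by applying the selected changes in index order."""
        code, descriptions = self.base_code, []
        for i in sorted(selected):
            code = self.changes[i].apply(code)
            descriptions.append(self.changes[i].description)
        return code, descriptions

    def variants(self) -> Iterator[Tuple[str, List[str]]]:
        """Lazily enumerate all 2**n variants without ever storing them."""
        indices = range(len(self.changes))
        for r in range(len(self.changes) + 1):
            for selected in combinations(indices, r):
                yield self.variant(selected)
```

Each subset of changes corresponds to a node of the variant graph, and adding or removing one change corresponds to an edge, so the graph never needs to be materialized.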

Add campaign functionality for corpus creation

Once the corpus organization is finalized, the next step is to write robust functionality for undertaking campaigns for generating the corpus.

The campaign must support the following:

  • Create a campaign directory.
  • Allow the user to create and fill in few-shot examples, along with errors and warnings about mistakes / missing examples.
  • Use bidirectional consistency for generating high-fidelity descriptions for changes and code snippets.
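The first two requirements could be sketched roughly as follows. The file name `few_shot.json`, the `code`/`description` fields, and the minimum of three examples are assumptions made for illustration; the bidirectional-consistency step depends on the language model in use and is not covered here.

```python
import json
from pathlib import Path


def create_campaign(root, min_examples=3):
    """Create the campaign directory with an empty few-shot template to fill in."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    path = root / "few_shot.json"  # hypothetical file name
    if not path.exists():
        template = [{"code": "", "description": ""} for _ in range(min_examples)]
        path.write_text(json.dumps(template, indent=2))
    return path


def check_few_shot(path, min_examples=3):
    """Return (errors, warnings) describing mistakes and missing examples."""
    errors, warnings = [], []
    examples = json.loads(Path(path).read_text())
    for i, example in enumerate(examples):
        for field_name in ("code", "description"):
            if not str(example.get(field_name, "")).strip():
                errors.append(f"example {i}: missing {field_name}")
    if len(examples) < min_examples:
        warnings.append(f"only {len(examples)} example(s); {min_examples} recommended")
    return errors, warnings
```

Separating hard errors (unusable examples) from warnings (too few examples) lets a campaign proceed with a reduced prompt while still telling the user what is missing.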

Add Code Processor for Variable-name and Dataframe-Column Optimization

We need a code-processor that can optimize away unnecessary variable names and dataframe columns. For example, convert

def f(df, col1):
   import seaborn as sns
   df1 = df.dropna()
   sns.distplot(df1[col1])

to

def f(df, col1):
   import seaborn as sns
   df = df.dropna()
   sns.distplot(df[col1])

and

def f(df, col1):
   import seaborn as sns
   df["NewCol"] = df[col1].dropna()
   sns.distplot(df["NewCol"])

to

def f(df, col1):
   import seaborn as sns
   df[col1] = df[col1].dropna()
   sns.distplot(df[col1])

This optimization makes code transformations independent of one another, which is necessary for representing our corpus of code variants efficiently (linear rather than exponential space).
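A rough sketch of the variable-name half of this processor, assuming Python 3.9+ for `ast.unparse`. The function `reuse_variable` is a hypothetical name, and its safety check is a deliberately conservative, line-based heuristic that ignores control flow; a real processor would need proper data-flow analysis.

```python
import ast


class _Rename(ast.NodeTransformer):
    """Rename every occurrence of the name `src` to `dst`."""

    def __init__(self, src, dst):
        self.src, self.dst = src, dst

    def visit_Name(self, node):
        if node.id == self.src:
            node.id = self.dst
        return node


def reuse_variable(code: str, redundant: str, original: str) -> str:
    """Rename `redundant` to `original` when `original` is never read on a
    later line than the first assignment to `redundant`."""
    tree = ast.parse(code)
    names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
    first_assign = min(
        (n.lineno for n in names
         if n.id == redundant and isinstance(n.ctx, ast.Store)),
        default=None,
    )
    if first_assign is None:
        return code  # nothing to optimize
    for n in names:
        if (n.id == original and isinstance(n.ctx, ast.Load)
                and n.lineno > first_assign):
            return code  # `original` is still needed; renaming would be unsafe
    return ast.unparse(_Rename(redundant, original).visit(tree))
```

Applied to the first example with `redundant="df1"` and `original="df"`, this rewrites `df1 = df.dropna()` to `df = df.dropna()` and updates the later use accordingly.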

Add abstractions for using multiple library versions for executing code in a different process

Running mined code often requires specific versions of various libraries to function properly. Additionally, the regular dev code relies on libraries whose latest versions are often incompatible with the ones required for running the mined code. Thus, there is a need to be able to run the mined code without disturbing the primary install.

So far, the best idea is to install the other versions to a directory using something like pip install --target=<path-to-dir>. Then to execute the mined code, we spawn a new process and modify sys.path to have this directory at the front so the correct version is picked up.
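The idea above can be sketched as two small helpers. The function names `install_to` and `run_with_libs` are hypothetical; the only external behavior relied on is pip's `--target` option and the fact that entries earlier in `sys.path` take precedence at import time.

```python
import subprocess
import sys


def install_to(target_dir, requirement):
    """Install a pinned requirement (e.g. 'pandas==1.1.0') into `target_dir`
    without touching the primary environment."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", f"--target={target_dir}", requirement],
        check=True,
    )


def run_with_libs(code, lib_dir):
    """Run `code` in a fresh interpreter with `lib_dir` at the front of sys.path,
    so the versions installed there are picked up first. Returns captured stdout."""
    bootstrap = "import sys; sys.path.insert(0, {!r}); exec({!r})".format(str(lib_dir), code)
    result = subprocess.run(
        [sys.executable, "-c", bootstrap],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Because the path change happens inside the child process, the parent's imports are unaffected, which is the property the issue asks for.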

There are other methods, such as importlib, but they are hard to get right for big libraries like pandas, which themselves depend on other libraries such as numpy.

Add framework for code processors

Set up an easy-to-extend framework for adding code processing transforms, such as unnecessary variable removal, keyword-arg normalization, etc.
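One possible shape for such a framework is a registry of named AST-to-AST functions that are chained into a pipeline. This is only a sketch assuming Python 3.9+; `PROCESSORS`, `processor`, `run_pipeline`, and the example `sort_keyword_args` transform are hypothetical names. Note that reordering keyword arguments can change the evaluation order of argument expressions, so this particular normalization is only safe when those expressions have no side effects.

```python
import ast
from typing import Callable, Dict, Sequence

Processor = Callable[[ast.AST], ast.AST]
PROCESSORS: Dict[str, Processor] = {}


def processor(name: str):
    """Decorator: registering a new transform takes a single line."""
    def decorate(fn: Processor) -> Processor:
        PROCESSORS[name] = fn
        return fn
    return decorate


@processor("sort_keyword_args")
def sort_keyword_args(tree: ast.AST) -> ast.AST:
    """Keyword-arg normalization: order named keyword arguments alphabetically."""
    for node in ast.walk(tree):
        # Leave calls containing **kwargs untouched to avoid changing semantics.
        if isinstance(node, ast.Call) and all(k.arg is not None for k in node.keywords):
            node.keywords.sort(key=lambda k: k.arg)
    return tree


def run_pipeline(code: str, names: Sequence[str]) -> str:
    """Parse once, apply the named processors in order, and unparse once."""
    tree = ast.parse(code)
    for name in names:
        tree = PROCESSORS[name](tree)
    return ast.unparse(ast.fix_missing_locations(tree))
```

With this layout, `run_pipeline("f(x, b=1, a=2)", ["sort_keyword_args"])` produces `f(x, a=2, b=1)`, and further processors can be appended to the list without touching existing ones.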

Integrate VizSmith as a Mode

Provide the existing VizSmith interface as one mode in the overall Datana design, using Codex as the NL metadata provider rather than the existing comment-mining mechanism.
