
gpt-dm's Introduction

Greetings, Earthling 👋

Hi 👋, I'm Sean, a passionate full stack developer from Ireland now living in the Netherlands

  • 🌱 I’m currently learning more about Clean and Sustainable architecture, Machine Learning, Data Engineering and Functional Programming ...
  • ⚡ Fun fact: I enjoy cooking new things ...

📝 Blogs

Anti-Patterns Codin' - coding problems and solutions

UI Blunderz - software engineering and quality commentary


gpt-dm's Issues

Try removing LangChain - via extraction or a new Menu prompt

There is a dependency on LangChain, which makes porting this away from Python difficult. It may also couple the project to OpenAI.

Options:

  • extract the multi-prompt processing from LangChain
  • try a simpler “menu prompt” approach: just ask the LLM to pick the best command, or else return the menu summary
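
A minimal sketch of the “menu prompt” option, assuming the commands can be listed as name/description pairs. The command names, the prompt wording and both function names are hypothetical, and the actual LLM call is left out so the sketch stays independent of any client library:

```python
import re


def build_menu_prompt(commands, user_input):
    """Build one prompt that asks the LLM to choose a command by number.

    `commands` maps a (hypothetical) command name to a one-line description.
    """
    lines = ["Pick the single best command for the user's request."]
    for index, (name, description) in enumerate(commands.items(), start=1):
        lines.append(f"{index}. {name}: {description}")
    lines.append("Reply with the number only, or 0 if no command fits.")
    lines.append(f"User request: {user_input}")
    return "\n".join(lines)


def parse_menu_choice(reply, commands):
    """Map the LLM reply to a command name, or None to fall back to the menu summary."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    choice = int(match.group())
    names = list(commands)
    if 1 <= choice <= len(names):
        return names[choice - 1]
    return None


# Hypothetical commands, for illustration only.
COMMANDS = {
    "create-dm": "Create a new data model from a description",
    "add-entity": "Add an entity to the current data model",
}

prompt = build_menu_prompt(COMMANDS, "I need a Customer table")
```

Returning `None` for anything that is not a valid menu number keeps the fallback (showing the menu summary) in plain code rather than in the prompt.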

Try generating a higher volume of test data (via Mistral 7B?)

Approach:

  • use a larger context window

  • repeat and concatenate, with higher temperature

  • multiply programmatically (post-process)
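
One way the "multiply programmatically" step could look, as a sketch only: it assumes the LLM has already produced a few flat seed records, and it recombines the attribute values seen in them. The function name and the seed data are hypothetical.

```python
import itertools


def multiply_records(seed_records, limit):
    """Recombine attribute values seen in the seed records into new records."""
    # Collect the distinct values observed per attribute, preserving order.
    values_by_key = {}
    for record in seed_records:
        for key, value in record.items():
            if key == "id":
                continue  # ids are re-assigned below
            bucket = values_by_key.setdefault(key, [])
            if value not in bucket:
                bucket.append(value)

    keys = list(values_by_key)
    combos = itertools.product(*(values_by_key[key] for key in keys))

    results = []
    for new_id, combo in enumerate(itertools.islice(combos, limit), start=1):
        record = dict(zip(keys, combo))
        record["id"] = new_id
        results.append(record)
    return results


# Hypothetical seed records, e.g. taken from an LLM reply.
seeds = [
    {"id": 1, "color": "red", "make": "Toyota"},
    {"id": 2, "color": "blue", "make": "Ford"},
]
cars = multiply_records(seeds, limit=10)
```

Plain recombination can produce implausible rows when attributes depend on each other, so it suits volume testing better than realism.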


LLM - Mistral 7B

Context size: a sliding 4K window - see https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/4

There are many flavours of Mistral-7B:

  1. regular mistral-7B, quantized
  2. cognitivecomputations/dolphin-2.6-mistral-7b-dpo
    • mistral-7B with larger context window

4 - instruct prompting, and newer:

  1. a base version, suitable for fine-tuning (FT)
  2. hermes-2.5 - mistral-7B
  3. phi from Microsoft

Add feature to summarize a large schema

take a list of tables, with their properties

output high level summary:

  • classify tables - but grouping via associations could be more important.

  • table summary

  • class description

  • higher level class of classes
    (top level category, then table category)

use an LLM,
or just an embedding like SBERT (or word2vec) and then cluster - or DistilBERT uncased;
but then the clusters need names:

  • use an LLM for that.
    Or better: define the categories, convert them to embeddings, and pick the closest one:

dot product of 2 normalised vectors = cosine of the angle between them
cosine distance = 1 - v.w
the smaller the distance, the closer the match

  • user can add categories. different views (sets of categories).

  • stem entity names by their casing
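
The cosine-distance notes and the casing idea above can be sketched in plain Python. This assumes the embeddings already exist as lists of floats (from SBERT or any other model); the tiny 2-dimensional vectors, the category names and the function names are hypothetical.

```python
import math
import re


def split_by_casing(name):
    """Split an entity name such as 'OrderLineItem' into lower-case words."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [word.lower() for word in words]


def normalise(vector):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    if length == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / length for x in vector]


def cosine_distance(v, w):
    """1 - v.w for the normalised vectors; smaller means closer."""
    v, w = normalise(v), normalise(w)
    return 1.0 - sum(a * b for a, b in zip(v, w))


def closest_category(table_vector, category_vectors):
    """Return the category name whose embedding is nearest to the table's."""
    return min(
        category_vectors,
        key=lambda name: cosine_distance(table_vector, category_vectors[name]),
    )


# Hypothetical embeddings, for illustration only.
categories = {"sales": [1.0, 0.0], "hr": [0.0, 1.0]}
best = closest_category([0.9, 0.1], categories)
```

Splitting names first (for example `split_by_casing("OrderLineItem")`) gives the embedding model ordinary words to work with instead of one fused identifier.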

Try modelling the DM as a graph. Then can use DOT

A Data Model can be modelled as a graph:

  • dm -> entity
  • entity -> attribute [label = name, type]
  • entity -> entity [label = association-type]

Then we can use DOT instead of JSON,
and could use a smaller LLM like CodeLlama 13B
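
To make the graph idea concrete, here is a small sketch that emits DOT for the three edge kinds listed above. The shape of the `model` dictionary, the `Entity.attribute` node naming and the function names are assumptions made for illustration, not a format the project defines.

```python
def quote(text):
    """Quote a DOT identifier or label, escaping embedded double quotes."""
    return '"' + str(text).replace('"', '\\"') + '"'


def data_model_to_dot(model):
    """Render a data model as a DOT digraph.

    Edges: dm -> entity, entity -> attribute [label = name, type],
    entity -> entity [label = association-type].
    """
    lines = ["digraph dm {"]
    dm = quote(model["name"])
    for entity, attributes in model["entities"].items():
        lines.append(f"  {dm} -> {quote(entity)};")
        for attr_name, attr_type in attributes.items():
            node = quote(f"{entity}.{attr_name}")
            label = quote(f"{attr_name}, {attr_type}")
            lines.append(f"  {quote(entity)} -> {node} [label={label}];")
    for source, target, association in model["associations"]:
        lines.append(
            f"  {quote(source)} -> {quote(target)} [label={quote(association)}];"
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical data model, for illustration only.
model = {
    "name": "car_app",
    "entities": {"Car": {"make": "string"}, "Wheel": {"size": "int"}},
    "associations": [("Car", "Wheel", "one-to-many")],
}
dot = data_model_to_dot(model)
```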

see gpt-workflow issues

Try BPML as alt to DOT

BPML = Business Process Modeling Language

BPML is a DSL, so it could be a better format than DOT...

Try self hosted Mistral 7B-instruct

Mistral 7B instruct is easy to host and already seems promising, without any fine tuning:

Prompt format:

text = "<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"
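
A small helper can assemble that format from a list of turns. This is a sketch that mirrors the string shown above and nothing more; the function name and the `(user, assistant)` tuple shape are assumptions.

```python
def build_mistral_prompt(turns):
    """Build an instruct prompt from (user_message, assistant_reply) pairs.

    Use None as the reply of the final turn to leave it open for the model.
    """
    parts = ["<s>"]
    for user_message, assistant_reply in turns:
        parts.append(f"[INST] {user_message} [/INST]")
        if assistant_reply is not None:
            # A completed assistant turn is closed with the end-of-sequence token.
            parts.append(f"{assistant_reply}</s> ")
    return "".join(parts)


text = build_mistral_prompt([
    ("What is your favourite condiment?", "Well, I'm quite partial to lemon juice."),
    ("Do you have mayonnaise recipes?", None),
])
```

Some tokenizers add the leading `<s>` themselves, so it is worth checking that the token does not end up doubled.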

Example:

<s>[INST] 

Follow these steps:

- Step 1: Ask the user what is the name of the application. In later steps, this will be referred to as <application-name>.
- Step 2: Ask the user for the list of entity names. In later steps, this will be referred to as <entities-list>.
- Step 3: Ask the user to describe how the entities are related.
- Step 4: Create a database schema in JSON for the application named <application-name>.
This application manages the entities: <entities-list>.
The entities are related as follows: <entity-relationships-list>.

The output MUST be in JSON only, based on the following example:
---
{
  "application": "my-cars",
  "entity-names": ["Users","Projects","Tasks"],
  "entity-attributes": {
    <entity-A-name>: [<list of suitable attributes>],
    <entity-B-name>: [<list of suitable attributes>]
  },
  "entity-relationships": {
    <entity-A-name>: {
      <entity-B-name>: "<multiplicity>"
    }
  }
}
---

Where:
  - <multiplicity> can be one of these: one-to-one, one-to-many, many-to-one, many-to-many
  - <entity-A-name> is the name of some entity
  - <entity-B-name> is the name of some other entity

IMPORTANT: For step 4, only output valid JSON.
[/INST] Sure, I'd be happy to help you with that! Can you please provide me with more information about the entities and their relationships?</s>

[INST]
A Car has Wheels. A Wheel has a Tyre.
[/INST]
Great, thank you for providing the entity relationship details. Based on your input, here's an example of how the JSON output could look like:
---json
{
	"application": "car_app",
	"entity-names": ["Cars", "Wheels", "Tyres"],
	"entity-attributes": {
		"Cars": ["color", "make", "model", "year"],
		"Wheels": ["size", "brand", "material"],
		"Tyres": ["pressure", "tread", "wear"]
	},
	"entity-relationships": {
		"Cars": {
			"Wheels": "many-to-many"
		},
		"Wheels": {
			"Tyres": "many-to-many"
		}
	}
}
---</s>
[INST]Generate example data in JSON format[/INST]
Sure, here's an example of how the data for each entity could look like in JSON format:
---json
// Cars
{
  "id": 1,
  "color": "red",
  "make": "Toyota",
  "model": "Camry",
  "year": 2021
}

// Wheels
{
  "id": 1,
  "size": 15,
  "brand": "Goodyear",
  "material": "steel"
}

// Tyres
{
  "id": 1,
  "pressure": 30,
  "tread": 8,
  "wear": 20
}
---
Please note that the above data is just an example and may not necessarily reflect real-world data. The actual data would depend on the specific requirements of the

Note: in this example, each triple backtick was replaced with --- so that the transcript could be pasted into Markdown.
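
Because the prompt insists on valid JSON, a reply can be checked mechanically before it is used. The sketch below assumes the reply contains a single JSON object in the schema format from the prompt; the function names are hypothetical.

```python
import json

MULTIPLICITIES = {"one-to-one", "one-to-many", "many-to-one", "many-to-many"}


def extract_json(reply):
    """Parse the text between the first '{' and the last '}' of the reply."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])


def check_schema(schema):
    """Return a list of problems; an empty list means the schema is consistent."""
    problems = []
    names = set(schema.get("entity-names", []))
    for entity in schema.get("entity-attributes", {}):
        if entity not in names:
            problems.append(f"attributes for unknown entity: {entity}")
    for source, targets in schema.get("entity-relationships", {}).items():
        if source not in names:
            problems.append(f"relationship from unknown entity: {source}")
        for target, multiplicity in targets.items():
            if target not in names:
                problems.append(f"relationship to unknown entity: {target}")
            if multiplicity not in MULTIPLICITIES:
                problems.append(f"invalid multiplicity: {multiplicity}")
    return problems


reply = """Here is the schema:
{
  "application": "car_app",
  "entity-names": ["Cars", "Wheels"],
  "entity-attributes": {"Cars": ["make"], "Wheels": ["size"]},
  "entity-relationships": {"Cars": {"Wheels": "many-to-many"}}
}
"""
problems = check_schema(extract_json(reply))
```

The example-data reply in the transcript, with `//` comments between several objects, would raise a `json.JSONDecodeError` in `extract_json`, which is the kind of deviation such a check is meant to surface.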
