Have fun with cratedb
pip install crash
docker-compose -f infra/local-instance/docker-compose.yml up
or
docker run -d -p 4200:4200 -p 4300:4300 -p 5432:5432 --env CRATE_HEAP_SIZE=1g crate \
crate -Cnetwork.host=_site_
cd infra/aws-instance
terafform init
terraform plan
terraform apply
-
Terraform work but healcheck fails and I cannot connect to cluster
- problem got resolved when I used subnet connected to IGW
- but this also revealed https://community.cratedb.com/t/issue-connecting-to-cratedb-cloud-cluster-from-local-machine/1707/11
- there is undocumented
--verify-ssl=False
- psql works fine
- problem got resolved when I used subnet connected to IGW
-
Docker compose with few instances and with single instance on MacOS + colima I had to do, to make it work abiosoft/colima#384
-
Cloud shared works.
-
Crate:
- SQL import files via URL, cool feature!
-
Crash:
- works fine
-
Cloud/Console:
- console.cratedb.cloud SQL is unpleasant on Safari // I had to write this select in text editor and copy it in SELECT * FROM guestbook.countries
- edge regions sounds cool, but I'm not sure what it is and how it works, I would love to see some documentation
-
CrateDB/Console:
-
- Go implementation works fine
- It could be more consistent with methods receivers that are sometime pointers and sometimes values
- Go implementation works fine
-
-
Building RAG search on crate documentation I had problem with
- When using
crate.client
python package throws error and cratedb search in documentation is not helpfulRemoved server https://localhost:4200 from active pool
- SQLAlchemy and pandas works fine...
crash
CLI also works fine...
- When using
-
A quite flustrating expirence I had with fulltext search
CREATE TABLE IF NOT EXISTS "doc"."docs_3" ( "title" TEXT, "url" TEXT, "html" TEXT INDEX USING FULLTEXT WITH ( analyzer = 'english' ) )
This is SQL that I RUN
select d.* from doc.docs_3 as d where match("d"."html" , "find me crate") limit 100;
I get error:
io.crate.exceptions.ColumnUnknownException: Column find me crate unknown
Where error should be something along the lines use
'
maybe you want to use single quotes? -
Suggestion to AdminUI, allow binding params in SQL queries, so I can use
?
instead offind me crate
-
RAG observations
- FAISS index and OpenAI build baseline for RAG search
- I was positively suppressed how well it works and how it help me better navigate through Crate documentation
- There are few queries like "give me latest blog posts/updates" RAG fails, because it does not have time update in index.
- Adding query pre-processing with "self-querying" retriever could help to introduce features like "sort" documents by date, or filter docs by some fields like "type:blog" or "type:news" etc.
- Experimenting with Mistral-7B and sentence_transformers as replacement for commential solutions,
shows very BAD results.
- I haven't done any formal analysis, but reading through answers, there are few problems.
- Small context window " Requested tokens (552) exceed context window of 512"
- Poor instruction following, etc.
- Slow execution is also an problem
- I haven't done any formal analysis, but reading through answers, there are few problems.
- Experimenting with CrateDB and fulltext, where I store whole documents (without chunking) + using Mistral-7B show very unsatisfactory results
- I decided to chunk documents and do fulltext index on them.
- CrateDB in Docker with 3 instances took forever to index this data. FAISS was MUCH faster.
- I decided to implement dedicated Crate Vectorstore crate.py It significantly improved performance, although it revealed that there is not easy way to do bach insert in CrateDB (requires REST api)
- Chunked results got better, but I notice that either RAG.
- Rank fusion experiment.
- Crate Vectorstore allows to for changing retrival methods from fulltext search to knn, which allows for next experiment relate to rank fusion.
- I decided to use FAISS and fulltext search to retrive documents and then rank them using Borda count.
- Results look good, but not different when using semantic search. Overall, it at least help to verify implementation.
- Funny enough I wanted to find if CrateDB offers fusion search (it does not, according to my searches)
- And whe I ask RAG to find answer to question: "How to do fusion search and connect vector search with fulltext search" It got it wrong.
-
- I sent invitation to new user, and when I clicked on link in email it shows me "Invitation not found or expired." message. In the console I see that invitation is valid for 24h, but I clicked on it in less than 1h. I use Safari and I was logged in to CrateDB console Using Chrome in incognito shows login page, no info what to do with invitation.
- JobScheduler in terraform would be awesome
- Budget / costs overview on timeline
- Permission system and roles in cloud to run certain operations
- what JOB scheduler are (in context of CrateDB) using?
- what are edge regions?
- What are edge regions
- what and how BLOB tables work?
- How AWS marketplace works, and why I cannot see deployment in my account?
- How Cloud console
- What are limits of scalability?