
cratedb-examples's Introduction

CrateDB Examples

✨ A collection of clear and concise examples of how to work with CrateDB. ✨

🔗 Quick links: Application • Dataframe • Language • Testing • Topic

📖 More information: Drivers and Integrations • Integration Tutorials • Reference Documentation

👨‍💻 Usage

  • You can explore the content by browsing folders within the repository. Main sections can be explored by using the quick links in the header area.

  • If you are looking for something specific, please use GitHub search, for example, searching for "jdbc".

  • You can use the code snippets for educational and knowledge base purposes, or as blueprints within your own projects.

  • The repository is also used to support QA processes. Each example is designed to be invoked as an integration test case, accompanied by a corresponding CI validation job.

🧐 What's inside

This section gives you an overview of what's inside the relevant folders.

  • by-dataframe contains example code snippets showing how to work with dataframe libraries like pandas, Polars, Dask, Spark, and friends.

  • by-language contains demo programs / technical investigations outlining how to get started quickly with CrateDB using different programming languages and frameworks.

  • application contains integration scenarios with full-fledged applications and software frameworks.

  • testing contains reference implementations showing how to use different kinds of test layers for testing your applications with CrateDB.

  • topic mostly contains Jupyter Notebooks outlining different use cases around working with time-series data, and demonstrating machine learning technologies together with CrateDB.

✅ CI Status

Please visit the Build Status page to inspect the build status of relevant drivers, applications, and integrations for CrateDB, on one page.

🏕️ Testing

In the same way as on CI, you can easily invoke the example programs on your workstation, either to get started quickly from working example code, or to verify connectivity within your computing environment.

Prerequisites

For invoking the software integration tests, you will need installations of Docker, Python, and Git on your workstation.

Before running the tests, make sure to supply an instance of CrateDB. In order to use and verify the most recent available code, let's select the OCI image crate/crate:nightly.

docker run --rm -it --pull=always \
    --name=cratedb --publish=4200:4200 --publish=5432:5432 \
    --env=CRATE_HEAP_SIZE=4g \
    crate/crate:nightly -Cdiscovery.type=single-node
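
Once the container is up, connectivity can be verified from Python. The following is a minimal sketch, assuming the HTTP interface is reachable on the published port 4200 and accepts SQL statements at the `/_sql` path; the helper names `build_sql_request` and `run_sql` are made up for illustration.

```python
import json
import urllib.request


def build_sql_request(stmt, base_url="http://localhost:4200"):
    """Build a POST request for CrateDB's HTTP endpoint without sending it."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(stmt, base_url="http://localhost:4200", timeout=5.0):
    """Send the statement and return the decoded JSON response."""
    request = build_sql_request(stmt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `run_sql("SELECT 1")` should return a JSON document; without it, the call raises a connection error, which is a quick signal that the port mapping needs attention.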

Test Runner ngr

The repository uses a universal test runner to invoke test suites of different languages and environments, called ngr.

In order to run specific sets of test cases, you do not need to leave the top-level directory, or run any kind of environment setup procedure. If all goes well, just select one of the folders of interest, and invoke ngr test on it, like this:

ngr test by-language/java-jdbc
ngr test by-language/python-sqlalchemy
ngr test by-language/php-amphp
ngr test by-dataframe/dask
ngr test application/apache-superset
ngr test testing/testcontainers/java
ngr test topic/machine-learning/llm-langchain
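
If you want to drive several folders in one go, a small wrapper may help. This is only a sketch: it assumes `ngr` is on the `PATH`, and the helper names `ngr_command` and `run_all` are hypothetical. By default it only prints the commands, so nothing is executed by accident.

```python
import shlex
import subprocess

# Folders taken from the invocations listed above.
FOLDERS = [
    "by-language/java-jdbc",
    "by-language/python-sqlalchemy",
    "by-dataframe/dask",
]


def ngr_command(folder):
    """Return the argument list for running one folder's test suite."""
    return ["ngr", "test", folder]


def run_all(folders, dry_run=True):
    """Run `ngr test` per folder; return a mapping of folder to exit code."""
    results = {}
    for folder in folders:
        command = ngr_command(folder)
        if dry_run:
            # Only show what would be executed.
            print(shlex.join(command))
            results[folder] = None
            continue
        results[folder] = subprocess.run(command, check=False).returncode
    return results
```

Calling `run_all(FOLDERS, dry_run=False)` from the top-level directory runs the suites one after another and collects their exit codes.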

Note

It is recommended to invoke ngr from within a Python virtualenv, in order to isolate its installation from the system Python. Installing ngr works like this:

git clone https://github.com/crate/cratedb-examples
cd cratedb-examples
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Test Matrix Support

Some examples optionally accept parameters at invocation time.

One example is the test suite for Npgsql, which accepts the version number of the Npgsql driver release to be obtained from the environment at runtime, overriding any internally specified versions. Example:

ngr test by-language/csharp-npgsql --npgsql-version=6.0.9

Tip

This feature is handy if you are running a test matrix, which is responsible for driving the version numbers, instead of using the version numbers pinned within local specification files of any sort.
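
To illustrate the idea, a matrix driver could expand a list of versions into one invocation per slot. This is a sketch under assumptions: the function name `matrix_commands` is hypothetical, and the second version number is an illustrative placeholder, only `6.0.9` comes from the example above.

```python
def matrix_commands(folder, option, versions):
    """Expand a list of versions into one `ngr test` invocation per version."""
    return [["ngr", "test", folder, f"--{option}={version}"] for version in versions]


# `6.0.9` is taken from the example above; `7.0.0` is an illustrative placeholder.
NPGSQL_VERSIONS = ["6.0.9", "7.0.0"]

for command in matrix_commands("by-language/csharp-npgsql", "npgsql-version", NPGSQL_VERSIONS):
    print(" ".join(command))
```

On CI, the same expansion is usually expressed in the workflow definition; the point here is only that the version is supplied from outside the example folder.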

💁 Contributing

Interested in contributing to this project? Thanks so much for your interest!

As an open-source project, we are always looking for contributions, whether in the form of a new feature, improved infrastructure, or better documentation.

Your bug reports, feature requests, and patches are greatly appreciated.

🌟 Contributors

amotl, andnig, ckurze, dependabot[bot], hammerhead, hlcianfagna, karynzv, marijaselakovic, matriv, proddata, surister, wierdvanderhaar

cratedb-examples's Issues

Issue with requirements.txt

When installing requirements.txt in cratedb-vectorstore-rag-openai-sql.ipynb notebook in Google Colab I got the following ResolutionImpossible Error:

ERROR: Cannot install -r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 4), crate[sqlalchemy]>=0.34 and langchain[cratedb,openai]==0.1.4 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested crate[sqlalchemy]>=0.34
    cratedb-toolkit 0.0.5 depends on crate[sqlalchemy]>=0.34
    langchain[cratedb,openai] 0.1.4 depends on crate[sqlalchemy]<0.35.0 and >=0.34.0; extra == "cratedb"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

@amotl, is it possible for you to check this and loosen the range of packages?

LangChain: FileNotFoundError: [Errno 2] No such file or directory: 'mlb_teams_2012.sql'

About

Nightly scheduled tests tripped here.

FAILED test.py::test_file[document_loader.py] - FileNotFoundError: [Errno 2] No such file or directory: 'mlb_teams_2012.sql'

-- https://github.com/crate/cratedb-examples/actions/runs/6938705593/job/18874863128#step:6:697

And there.

FAILED test.py::test_notebook[document_loader.ipynb] - Failed: Direct construction of pytest_notebook.plugin.JupyterNbCollector has been deprecated, please use pytest_notebook.plugin.JupyterNbCollector.from_parent.

-- https://github.com/crate/cratedb-examples/actions/runs/6938705593/job/18874863128#step:6:689
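
The root cause of the `FileNotFoundError` is not stated here. One common cause for this kind of error is that a program looks up a data file relative to the current working directory, which differs between a local run and a test runner. As a sketch under that assumption, the file could be resolved relative to the script instead; the helper names are hypothetical.

```python
from pathlib import Path


def resolve_data_file(name, anchor_file):
    """Resolve `name` next to `anchor_file` instead of the working directory."""
    return Path(anchor_file).resolve().parent / name


def require_data_file(name, anchor_file):
    """Return the resolved path, or raise an error naming the expected location."""
    path = resolve_data_file(name, anchor_file)
    if not path.is_file():
        raise FileNotFoundError(f"Expected data file at {path}")
    return path
```

Within a script, this would be called like `require_data_file("mlb_teams_2012.sql", __file__)`, so the error message at least reports where the file was expected.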

AutoML: CI trips with `ValueError: Input contains NaN.`

Originally coming from an issue that mixed things up, GH-170, let's get things straight here.

Problem

CI on the AutoML job occasionally trips like this, failing the CI run.

FAILED test.py::test_file[automl_timeseries_forecasting_with_pycaret.py] - ValueError: Input contains NaN.
self = <joblib.parallel.BatchCompletionCallBack object at 0x7f4f737cb910>

    def _return_or_raise(self):
        try:
            if self.status == TASK_ERROR:
>               raise self._result
E               ValueError: Input contains NaN.

-- https://github.com/crate/cratedb-examples/actions/runs/7884792002/job/21514554253#step:6:1146

Outlook

@andnig shared his suggestions at #170 (comment) already. Maybe you can add them here instead?
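
For reference, a data cleansing step guarding against NaN values could look like the following. It is a plain Python sketch on a list of floats, not the actual fix and not PyCaret's own handling; the function names are hypothetical, and dropping leading NaN values changes the length of the series, so a real pipeline would have to treat the index accordingly.

```python
import math


def is_nan(value):
    """True for float NaN values, False for everything else."""
    return isinstance(value, float) and math.isnan(value)


def has_nan(values):
    """Report whether the series contains at least one NaN."""
    return any(is_nan(value) for value in values)


def forward_fill(values):
    """Replace each NaN with the most recent valid value; drop leading NaNs."""
    cleaned = []
    last = None
    for value in values:
        if is_nan(value):
            if last is None:
                continue  # No earlier value to carry forward.
            cleaned.append(last)
        else:
            cleaned.append(value)
            last = value
    return cleaned
```

Checking `has_nan()` right after loading the input data would also tell whether the NaN values come from the source data or are introduced later in the pipeline.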

LangChain: Resources need upgrades re. `openai>=1`

Problem

# APIRemovedInV1: You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0
openai==0.28
E           APIRemovedInV1: 
E           
E           You tried to access openai.ChatCompletion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.
E           
E           You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 
E           
E           Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`
E           
E           A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742

References

Workaround

I've chosen to downgrade for now. Can you have a look, @andnig?
-- 04d46e3
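
Until the resources are upgraded, it can be useful to detect which side of the `openai>=1` boundary an environment is on. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and it only inspects version numbers, it does not migrate any code.

```python
from importlib import metadata


def major_version(version):
    """Return the leading numeric component of a version string such as '0.28'."""
    return int(version.split(".")[0])


def uses_v1_interface(version):
    """True when the given `openai` version falls under `openai>=1`."""
    return major_version(version) >= 1


def installed_openai_version():
    """Return the installed `openai` version, or None if it is not installed."""
    try:
        return metadata.version("openai")
    except metadata.PackageNotFoundError:
        return None
```

For example, `uses_v1_interface("0.28")` is false for the pinned version from the workaround, while any `1.x` release yields true.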

TSML: Error in `timeseries-anomaly-detection.ipynb`

Problem

The timeseries-anomaly-detection.ipynb notebook errors out on both Python 3.10 (see footnote 1) and Python 3.11 (see footnote 2).

ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by SimpleImputer.

Observations

Because it happens on both versions of Python, it is most probably unrelated to the change per se where it started tripping.

Thoughts

Most probably another dependency flaw?

Footnotes

  1. https://github.com/crate/cratedb-examples/actions/runs/8743815972/job/23995257165?pr=425#step:6:1505 ↩

  2. https://github.com/crate/cratedb-examples/actions/runs/8743815972/job/23995257591?pr=425#step:6:1350 ↩

Dependency installation fails on Google Colab for `cratedb-vectorstore-rag-openai-sql.ipynb`

Steps to reproduce:

  1. Go to the README file of the folder for the RAG pipeline notebook: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/README.md
  2. Next to cratedb-vectorstore-rag-openai-sql.ipynb, click the Open in Colab button
  3. Uncomment the remote requirements.txt installation and run it: !pip install -r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt

After some time, it fails with:

INFO: pip is looking at multiple versions of langchain[cratedb,openai] to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install -r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 4), crate[sqlalchemy] and langchain[cratedb,openai]==0.1.4 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested crate[sqlalchemy]
    cratedb-toolkit 0.0.3 depends on crate[sqlalchemy]>=0.34
    langchain[cratedb,openai] 0.1.4 depends on crate[sqlalchemy]<0.35.0 and >=0.34.0; extra == "cratedb"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

I could so far not reproduce the issue locally with a Python 3.10 environment.

AutoML: CI trips with `CellTimeoutError` / `ValueError: Input contains NaN.`

Dear @andnig,

the CI caught an error from automl_timeseries_forecasting_with_pycaret.py (see footnote 1).

FAILED test.py::test_file[automl_timeseries_forecasting_with_pycaret.py] - ValueError: Input contains NaN.

Apparently, it started tripping like this only yesterday (see footnote 2), so the error is likely related to changed input data.

However, the result of debugging this error may well converge into a corresponding issue at PyCaret, because its promises are so high. On the other hand, the code may just need a particular data cleansing step to accommodate the situation. May I ask you to have a look?

With kind regards,
Andreas.

Footnotes

  1. https://github.com/crate/cratedb-examples/actions/runs/7027445018/job/19121837475#step:6:1492 ↩

  2. https://github.com/crate/cratedb-examples/actions/runs/7013620591/job/19080058650 ↩

Issue with time series forecasting with pycaret notebook

When running automl_timeseries_forecasting_with_pycaret.ipynb notebook the mlflow-cratedb module gets installed but it is not found when importing:

ModuleNotFoundError                       Traceback (most recent call last)
[<ipython-input-3-c9198aa96905>](https://localhost:8080/#) in <cell line: 6>()
      4 import plotly
      5 import plotly.graph_objects as go
----> 6 import mlflow_cratedb  # Required to enable the CrateDB MLflow adapter.
      7 from dotenv import load_dotenv
      8 

ModuleNotFoundError: No module named 'mlflow_cratedb'

Can you check/test the code? The issue persists both locally and in Colab.
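
A quick diagnostic for this kind of problem is to check whether the module is visible to the interpreter that runs the notebook, because a package installed into a different environment than the kernel's produces exactly this symptom. This is a sketch with hypothetical helper names; it does not claim that this is the cause here.

```python
import importlib.util
import sys


def is_importable(module_name):
    """True if a top-level module can be located by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


def import_report(module_name):
    """Collect the facts needed to compare the install and import environments."""
    return {
        "module": module_name,
        "importable": is_importable(module_name),
        "interpreter": sys.executable,
    }
```

Running `import_report("mlflow_cratedb")` in a notebook cell shows which interpreter is in use, which can then be compared with the one `pip` installed into.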

[Testing] Demonstrate CrateDB test layers with parameterization

About

GH-280 only demonstrates basic usage of CrateDB test layer variants. On a subsequent iteration, we may want to demonstrate how to parameterize them.

Requirements

The minimum set of parameters needed to cover common use cases in software testing.

  • Software version of CrateDB (GA release, testing, nightly).
  • TCP ports (HTTP and PG) CrateDB should be listening on.
  • Heap size used by CrateDB, CRATE_HEAP_SIZE.
  • Path to its data directory when aiming to expose it.
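
As a starting point for discussion, the parameter set above could be modelled like this. The sketch builds a `docker run` command line modelled on the invocation used elsewhere in this repository's README, not on any of the test layer implementations; the class and function names are hypothetical, and mounting the data directory at `/data` inside the container is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrateDBParameters:
    image: str = "crate/crate:nightly"  # Software version, selected by image tag.
    http_port: int = 4200  # Host port for the HTTP interface.
    pg_port: int = 5432  # Host port for the PostgreSQL wire protocol.
    heap_size: str = "4g"  # Value for CRATE_HEAP_SIZE.
    data_path: Optional[str] = None  # Host directory to expose, if any.


def docker_run_args(params: CrateDBParameters) -> List[str]:
    """Translate the parameters into a `docker run` argument list."""
    args = [
        "docker", "run", "--rm",
        f"--publish={params.http_port}:4200",
        f"--publish={params.pg_port}:5432",
        f"--env=CRATE_HEAP_SIZE={params.heap_size}",
    ]
    if params.data_path is not None:
        # Assumption: the container keeps its data under /data.
        args.append(f"--volume={params.data_path}:/data")
    args += [params.image, "-Cdiscovery.type=single-node"]
    return args
```

For example, `docker_run_args(CrateDBParameters(image="crate:4.8.4", http_port=4201))` yields an invocation for a specific release listening on a non-default HTTP port.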

As an outlook...

  • cr8's test layer is also capable of running clusters of multiple nodes, right? That is probably happening somewhere in crate-qa already? It should also be demonstrated in this context here.

References

Those are pointers to where parameterization is used, and not demonstrated here, yet.

Use Markdown or Python for writing Jupyter Notebooks

About

Markdown, as the lingua franca for many technical writing tasks, should be used more, as it is roughly interoperable with, for example, GitHub and Discourse (see footnote 1).

Conversion from Jupyter Notebooks

Just use nbconvert.

pip install nbconvert
jupyter nbconvert --to markdown automl_classification_with_pycaret.ipynb
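
To illustrate what such a conversion does, here is a minimal sketch using only the standard library. It relies on the fact that an .ipynb file is a JSON document with a list of cells; it ignores outputs, attachments, and metadata, so it is not a replacement for nbconvert. The function names are hypothetical.

```python
import json

FENCE = "`" * 3  # Built at runtime to keep this example free of nested fences.


def load_notebook(path):
    """Parse an .ipynb file into a Python dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def notebook_to_markdown(notebook, language="python"):
    """Render markdown cells verbatim and code cells as fenced blocks."""
    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # Cell sources are stored either as a string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            parts.append(text)
        elif cell.get("cell_type") == "code":
            parts.append(f"{FENCE}{language}\n{text}\n{FENCE}")
    return "\n\n".join(parts) + "\n"
```

A call like `notebook_to_markdown(load_notebook("automl_classification_with_pycaret.ipynb"))` returns the Markdown text as a string.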

Authoring Jupyter Notebooks

Instead of converting them afterwards, Jupyter Notebooks can be written in Markdown in the first place, see Notebooks written entirely in Markdown.

The easiest way to create a MyST notebook is to use Jupytext, a tool that allows for two-way conversion between .ipynb and a variety of text files. See also Notebooks as Markdown.

Footnotes

  1. Also with HubSpot, when throwing https://github.com/crate-workbench/hubspot-tech-writing into the mix. ↩

LangChain: `cratedb_rag_customer_support.ipynb` trips with `CellTimeoutError`

Problem

cratedb_rag_customer_support.ipynb has been introduced just recently.

It looks like the call to embeddings.embed_documents(pages_text) might take longer than expected / uses more compute resources / stalls for any other reasons?

E           nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 120 seconds.
E           The message was: Cell execution timed out.
E           Here is a preview of the cell contents:
E           -------------------
E           embeddings = OpenAIEmbeddings(deployment='my-embedding-model', chunk_size=1)
E           pages_embeddings = embeddings.embed_documents(pages_text)
E           -------------------

/opt/hostedtoolcache/Python/3.11.7/x64/lib/python3.11/site-packages/nbclient/client.py:801: CellTimeoutError
------------------------------ Captured log call -------------------------------
ERROR    pytest_notebook.execution:client.py:795 Timeout waiting for execute reply (120s).

-- https://github.com/crate/cratedb-examples/actions/runs/7881120885/job/21504241805#step:6:848

Q & A

Can you dig a bit into this, @marijaselakovic? Do you have any idea where this may be coming from, or how it can be improved?

NB: It's not a unique thing. We are also tracking the same details at GH-170 and GH-299.

`Received unexpected backend message of type ParseComplete`: `by-language/csharp-npgsql` starts failing with CrateDB 4.8.4

About

At GH-14, we discovered that the csharp-npgsql program would fail its test suite. While it still worked with CrateDB 4.8.3, it starts croaking with CrateDB 4.8.4.

Exception

The exception can be reproduced by running those commands:

docker run -it --rm --publish=4200:4200 --publish=5432:5432 crate:4.8.4
dotnet test --framework=net5.0
[xUnit.net 00:00:00.93]     demo.tests.DemoProgramTest.TestSystemQueryExample [FAIL]
[xUnit.net 00:00:00.95]     demo.tests.DemoProgramTest.TestBasicConversationExample [FAIL]
[xUnit.net 00:00:00.96]     demo.tests.DemoProgramTest.TestAsyncUnnestExample [FAIL]
  Failed demo.tests.DemoProgramTest.TestSystemQueryExample [1 ms]
  Error Message:
   System.AggregateException : One or more errors occurred. (Received unexpected backend message of type ParseComplete) (The following constructor parameters did not have matching fixture data: DatabaseFixture fixture)
---- System.Exception : Received unexpected backend message of type ParseComplete
---- The following constructor parameters did not have matching fixture data: DatabaseFixture fixture
  Stack Trace:

----- Inner Stack Trace #1 (System.Exception) -----
   at Npgsql.NpgsqlDataReader.ProcessMessage(IBackendMessage msg)
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
   at Npgsql.NpgsqlDataReader.NextResult()
   at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior)
   at Npgsql.PostgresDatabaseInfo.LoadBackendTypes(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async)
   at Npgsql.PostgresDatabaseInfo.LoadPostgresInfo(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async)
   at Npgsql.PostgresDatabaseInfoFactory.Load(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async)
   at Npgsql.NpgsqlDatabaseInfo.Load(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async)
   at Npgsql.NpgsqlConnector.LoadDatabaseInfo(Boolean forceReload, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlConnector.Open(NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
   at Npgsql.ConnectorPool.OpenNewConnector(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
   at Npgsql.ConnectorPool.<>c__DisplayClass38_0.<<Rent>g__RentAsync|0>d.MoveNext()
--- End of stack trace from previous location ---
   at Npgsql.NpgsqlConnection.<>c__DisplayClass41_0.<<Open>g__OpenAsync|0>d.MoveNext()
--- End of stack trace from previous location ---
   at Npgsql.NpgsqlConnection.Open()
   at demo.tests.DatabaseFixture..ctor() in /Users/amo/dev/crate/docs/cratedb-examples/by-language/csharp-npgsql/tests/DemoProgramTest.cs:line 22
----- Inner Stack Trace #2 (Xunit.Sdk.TestClassException) -----

-- https://github.com/crate/cratedb-examples/actions/runs/4009337440

Problem invoking Docker Compose on tutorial about Apache Kafka, Apache Flink and CrateDB

Hi there,

at [1], @jainhemant163 shared with us that he isn't able to invoke the docker-compose.yml file, either on his workstation or on AWS EC2 instances. The invocation croaks like:

ERROR: The Compose file './docker-compose.yml' is invalid because:
Unsupported config option for services: 'kafka-zookeeper'
Unsupported config option for networks: 'scada-demo'

With kind regards,
Andreas.

[1] https://dev.to/jainhemant163/comment/1eckd

RAG: Problems resolving dependencies on Google Colab

Problem

@hammerhead reported a flaw with the cratedb_rag_customer_support_langchain.ipynb Notebook when invoked on Google Colab.

Dependency resolution around Dask fails, or rather takes ages to complete, if at all.

Collecting dask (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading dask-2022.5.2-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 65.4 MB/s eta 0:00:00
Collecting distributed (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading distributed-2022.5.1-py3-none-any.whl (871 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 871.6/871.6 kB 54.8 MB/s eta 0:00:00
Collecting dask (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading dask-2022.5.1-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 57.8 MB/s eta 0:00:00
Collecting distributed (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading distributed-2022.5.0-py3-none-any.whl (856 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 856.7/856.7 kB 60.6 MB/s eta 0:00:00
Collecting dask (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading dask-2022.5.0-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 59.9 MB/s eta 0:00:00
Collecting distributed (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading distributed-2022.4.2-py3-none-any.whl (856 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 856.7/856.7 kB 61.6 MB/s eta 0:00:00
Collecting dask (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading dask-2022.4.2-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 61.3 MB/s eta 0:00:00
Collecting distributed (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading distributed-2022.4.1-py3-none-any.whl (855 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 855.5/855.5 kB 59.3 MB/s eta 0:00:00
Collecting dask (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading dask-2022.4.1-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 60.5 MB/s eta 0:00:00
Collecting distributed (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading distributed-2022.4.0-py3-none-any.whl (853 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 853.8/853.8 kB 52.3 MB/s eta 0:00:00
Collecting dask (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading dask-2022.4.0-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 56.7 MB/s eta 0:00:00
Collecting distributed (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading distributed-2022.3.0-py3-none-any.whl (851 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 851.2/851.2 kB 54.7 MB/s eta 0:00:00
Collecting dask (from fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9))
  Downloading dask-2022.3.0-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 60.3 MB/s eta 0:00:00
Requirement already satisfied: httplib2>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from oauth2client>=1.5.2->gcsfs->fsspec[abfs,dask,gcs,git,github,http,s3,smb]<2024.3->pueblo[cli,fileio,nlp]>=0.0.7->-r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/llm-langchain/requirements.txt (line 9)) (0.22.0)
  Downloading dask-2023.8.1-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 7.8 MB/s eta 0:00:00
INFO: pip is looking at multiple versions of distributed to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of distributed to determine which version is compatible with other requirements. This could take a while.
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
ERROR: Operation cancelled by user

Thoughts

It looks like it is related to the Python 3.11.9 vs. Dask hiccup from last week.

References

Maybe related; I will execute this first; maybe, it will yield some insights.

@hammerhead also provided a fix already.

CI: Collection of flukes

About

This ticket collects all sorts of flukes and anomalies observed when running validation jobs on CI.

Testcontainers for Python

About

  • Similar to GH-54, we would like to demonstrate CrateDB with testcontainers-python, a »Testcontainers« implementation for Python.
  • I've started a corresponding implementation on behalf of the LorryStream project the other day, and already reused it at the CrateDB Retention project. It needs to be reviewed and submitted to the upstream repository as a contribution before further elaborating on it.

Check tutorial about Kafka, Flink and CrateDB with the vanilla PostgreSQL JDBC Driver

Hi there,

because @proddata just asked about the state of the CrateDB JDBC Driver, I would like to put down that note here.

When refreshing the resources [1,2] the other day, building upon [3] by @kovrus, @carlotas19 converged the resources into [4] (cheers!). While supporting that, I also created some other accompanying resources at [5], which add some infrastructure and documentation to run the example in a reproducible manner out of the box.

My memories about the details are a bit faded, but I remember that the example did not work with the vanilla PostgreSQL JDBC Driver. Apparently, I already took a little note about it at the place where you would be able to switch the driver (see footnote 1), and alongside it (see footnote 2):

Currently, org.postgresql:postgresql croaks with

org.postgresql.util.PSQLException: No hstore extension installed.

We should get back to this and use latest software versions of the corresponding components when testing again, this time specifically focused on shedding some more light onto the problem discovered here.

With kind regards,
Andreas.

/cc @hammerhead

[1] https://www.ververica.com/blog/smart-systems-iot-use-case-open-source-kafka-flink-cratedb
[2] https://crate.io/resources/white-papers/lp-wp-flink-kafka-cratedb
[3] https://github.com/crate/cratedb-flink-jobs
[4] https://dev.to/crate/build-a-data-ingestion-pipeline-using-kafka-flink-and-cratedb-1h5o
[5] https://github.com/crate/cratedb-examples/tree/main/spikes/kafka-flink

Footnotes

  1. build.gradle#L53-L58 ↩

  2. TaxiRidesStreamingJob.java#L54-L59 ↩

AutoML: Test harness trips when installing `catboost` on macOS with Python 3.11, works with Python 3.10

About

When invoking the test cases on the automl folder,

git clone https://github.com/crate/cratedb-examples
cd cratedb-examples
pip install -r requirements.txt
ngr test topic/machine-learning/automl

the process fails at installation time already.

ERROR: No matching distribution found for catboost<1.2,>=0.23.2; platform_system == "Darwin" and extra == "models"
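
The error message includes the environment marker `platform_system == "Darwin"`. To compare a failing and a working environment, it can help to print the values that such markers are evaluated against. This is a small diagnostic sketch; the function name is hypothetical, and it does not explain the missing distribution by itself.

```python
import platform


def marker_environment():
    """Return the values behind a few common environment markers."""
    major, minor, _ = platform.python_version_tuple()
    return {
        "platform_system": platform.system(),
        "platform_machine": platform.machine(),
        "python_version": f"{major}.{minor}",
    }


for name, value in marker_environment().items():
    print(f"{name} = {value}")
```

Running it under both Python 3.10 and Python 3.11 shows whether the two environments differ in anything other than the Python version.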

References

May be relevant.

Evaluate mainlining of `cratedb_toolkit.sqlalchemy.patch.patch_inspector`

About

@ckurze and @amotl discovered a reproducible case of the flaw where SQLAlchemy's introspection/reflection machinery is not able to pick up the schema name correctly.

Details

832c3bc fixes it. Indeed, we apparently need a runtime fix here, when using a non-standard schema (doc vs. testdrive).

# TODO: Bring this into the `crate-python` driver.
from cratedb_toolkit.sqlalchemy.patch import patch_inspector
patch_inspector()

Originally posted by @amotl in #136 (comment)

Testcontainers for Java: Backlog

Hi.

At GH-54, we are bringing in some ready-to-run code examples for basic use of Testcontainers for Java with CrateDB. There are some items which should be addressed within subsequent iterations.

With kind regards,
Andreas.

Startup

  • [Test worker] WARN tc.crate:5.2 - Reuse was requested but the environment does not support the reuse of containers
    To enable reuse of containers, you must set 'testcontainers.reuse.enable=true' in a file located at /Users/amo/.testcontainers.properties
    

Database provisioning

Other test frameworks

Testcontainers for Java also provides integrations for other test frameworks. Currently, all test cases are based on JUnit 4.

Footnotes

  1. See also https://github.com/crate/crate-jdbc/issues/377. ↩

Backlog for Python

About

Coming from a few recent patches, this ticket collects and/or summarizes a few backlog items.

by-language/python-sqlalchemy

  • Some adjustments may be added to bring all insert_*.py programs into the same shape.
  • Other than demonstrating only write operations, also demonstrate read operations?
  • Demonstrate applicability on behalf of relevant Jupyter Notebook(s).

References

CI: Npgsql test matrix is incorrect

Problem

It is unexpected that GH-169 is green, because Npgsql 8.0 does not support .NET 6 any longer. So, why doesn't it fail?

Observation

This test matrix slot, which should invoke .NET 5.0.x, apparently also uses .NET 8, which is wrong.

dotnet test --framework=net8.0 --collect:"XPlat Code Coverage"

-- https://github.com/crate/cratedb-examples/actions/runs/7033351740/job/19138983476?pr=169#step:7:136

Conclusion

Test matrix slot value propagation is flawed somewhere and needs to be fixed.
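
A guard for this could compare the matrix slot value with the framework that is actually passed to `dotnet test`. This is a sketch with hypothetical function names; the mapping assumes SDK versions written like `5.0.x` and only applies to .NET 5 and later, where target frameworks are named `net5.0`, `net6.0`, and so on.

```python
def target_framework(dotnet_version):
    """Map an SDK version such as '5.0.x' to a target framework such as 'net5.0'."""
    major, minor = dotnet_version.split(".")[:2]
    return f"net{major}.{minor}"


def check_slot(dotnet_version, command):
    """Return True when the command targets the framework of the matrix slot."""
    return f"--framework={target_framework(dotnet_version)}" in command.split()


# The command observed on CI, checked against the slot it was supposed to serve.
observed = 'dotnet test --framework=net8.0 --collect:"XPlat Code Coverage"'
print(check_slot("5.0.x", observed))
```

Failing the job when `check_slot()` returns false would turn the silent mismatch into a visible error.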

Improve tutorials about Apache Superset

  • Carried over from #217.

Backlog

  • The tutorial Set up an Apache Superset development sandbox with CrateDB should be updated on a few spots: CrateDB version used should be latest, and CSRF_TOKEN as well as HTTP session is no longer needed.
  • There should also be a separate user-oriented tutorial on the community forum, where all developer-like steps like git clone are omitted. Most probably, a user-focused tutorial should be based on Docker Compose for running both Superset and CrateDB. However, it should not follow the approach of the upstream documentation Installing Superset Locally Using Docker Compose, because that also uses a git clone inside. This new tutorial should be the primary resource to advertise when educating users about this integration, and it should also outline how to connect to CrateDB Cloud.
  • Monitor and add support for Python 3.12, when available.

AutoML: CI trips with `CellTimeoutError`

Originally coming from an issue that mixed things up, GH-170, let's get things straight here.

Problem

CI on the AutoML job occasionally trips like this, failing the CI run.

E           nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 300 seconds.
E           The message was: Cell execution timed out.
E           Here is a preview of the cell contents:
E           -------------------
E           s = setup(data, fh=15, target="total_sales", index="month", log_experiment=True)
E           -------------------

/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/nbclient/client.py:801: CellTimeoutError

-- #170 (comment)

Outlook

@andnig suggested at #170 (comment), that maybe the PYTEST_CURRENT_TEST environment variable, and what it is guarding, is not being evaluated correctly.

However, at #170 (comment), we have been able to confirm it works well.
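
For readers unfamiliar with the mechanism: pytest sets the `PYTEST_CURRENT_TEST` environment variable while a test is running, so code can check for it and reduce its workload. The following sketch shows the pattern only; the function names and the setting in the returned dictionary are hypothetical and not taken from the notebook.

```python
import os


def running_under_pytest(environ=None):
    """True while pytest is executing a test in this environment."""
    environ = os.environ if environ is None else environ
    return "PYTEST_CURRENT_TEST" in environ


def workload_settings(environ=None):
    """Return lighter settings under test; the values are illustrative only."""
    if running_under_pytest(environ):
        return {"models_to_compare": 1}
    return {"models_to_compare": 5}
```

Passing a dictionary as `environ` makes the guard easy to verify without touching the real process environment.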
