
pykx's Introduction

PyKX

Introduction

PyKX is a Python-first interface to kdb+, the world's fastest time-series database, and to its underlying vector programming language, q. PyKX takes a Python-first approach to integrating q/kdb+ with Python, following 10+ years of integrations between these two languages. Fundamentally, it provides users with the ability to efficiently query and analyze huge amounts of in-memory and on-disk time-series data.

This interface exposes q as a domain-specific language (DSL) embedded within Python, taking the approach that q should principally be used for data processing and management of databases. This approach does not prevent users who are familiar with q, or who wish to learn more about it, from making the most of its advanced analytics and database management functionality. Rather, it empowers those who want to use the power of kdb+/q but lack this expertise to get up and running fast.

PyKX supports three principal use cases:

  • It allows users to store, query, manipulate and use q objects within a Python process.
  • It allows users to query external q processes via an IPC interface.
  • It allows users to embed Python functionality within a native q session using its under q functionality.

Users wishing to install the library can do so following the instructions here.

Once you have the library installed you can get up and running with PyKX following the quickstart guide here.

What is q/kdb+?

Mentioned throughout the documentation, q and kdb+ are, respectively, a highly efficient vector programming language and a highly optimised time-series database used to analyse streaming, real-time and historical data. Used throughout the financial sector for 25+ years, this technology has been a cornerstone of modern financial markets, providing a storage mechanism for historical market data and tooling to make the analysis of this vast data performant.

Kdb+ is a high-performance column-oriented database designed to process and store large amounts of data. Commonly accessed data is held in RAM, which makes it faster to access than disk-stored data. Because temporal data types are first-class entities, using q and its query language, qsql, against this database creates a highly performant time-series analysis tool.

q is the vector programming language used for all interactions with kdb+ databases; it is known for both its speed and its expressiveness.

For more information on using q/kdb+ and getting started with it, see the following links:

Installation

Installing PyKX using pip

Ensure you have a recent version of pip:

pip install --upgrade pip

Then install the latest version of PyKX with the following command:

pip install pykx

To install a specific version of PyKX, run the following command, replacing <INSERT_VERSION> with a specific released semver version of the interface:

pip install pykx==<INSERT_VERSION>

Warning: Python packages should typically be installed in a virtual environment. This can be done with the venv package from the standard library.
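As a minimal sketch of the warning above, the standard-library venv module can also be driven from Python. The helper names in_virtual_environment and create_environment are hypothetical and only illustrate the idea; they are not part of PyKX.

```python
import sys
import venv
from pathlib import Path


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def create_environment(target, with_pip=True):
    """Create a virtual environment at `target` and return its pyvenv.cfg path."""
    venv.create(target, with_pip=with_pip)
    return Path(target) / "pyvenv.cfg"


if __name__ == "__main__":
    if not in_virtual_environment():
        print("Not in a virtual environment; consider creating one before installing PyKX.")
```

Calling create_environment("pykx-env") is roughly equivalent to running python -m venv pykx-env; the environment still has to be activated before pip install pykx is run inside it.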

PyKX License access and enablement

Installing PyKX via pip provides access to the library with limited functional scope; full details of these limitations can be found here. To access the full functionality of PyKX you must first download and install a kdb+ license. This can be achieved through either a personal evaluation license or a commercial license.

Personal Evaluation License

The following steps outline how a user can gain access to and install a kdb Insights license, which provides access to PyKX:

  1. Visit https://kx.com/kdb-insights-personal-edition-license-download/ and fill in the attached form following the instructions provided.
  2. On receipt of an email from KX providing access to your license, download this file and save it to a secure location on your computer.
  3. Set an environment variable on your computer pointing to the folder containing the license file (instructions for setting environment variables on PyKX supported operating systems can be found here).
    • Variable Name: QLIC
    • Variable Value: /user/path/to/folder

Commercial Evaluation License

The following steps outline how a user can gain access to and install a kdb Insights license, which provides access to PyKX:

  1. Visit https://kx.com/kdb-insights-commercial-evaluation-license-download/ and fill in the attached form following the instructions provided.
  2. On receipt of an email from KX providing access to your license, download this file and save it to a secure location on your computer.
  3. Set an environment variable on your computer pointing to the folder containing the license file (instructions for setting environment variables on PyKX supported operating systems can be found here).
    • Variable Name: QLIC
    • Variable Value: /user/path/to/folder

Note: PyKX will not operate with a vanilla or legacy kdb+ license that lacks the specific feature flags PyKX requires. Without a license carrying the appropriate feature flags, PyKX will fail to initialise with full feature functionality.
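Before importing PyKX, it can help to confirm that QLIC really points at a folder holding a license file. The sketch below uses only the standard library. The file names kc.lic and k4.lic are an assumption about how the downloaded license is named, and find_license is a hypothetical helper, not a PyKX function.

```python
import os
from pathlib import Path

# Assumed license file names; adjust to match the file KX sent you.
LICENSE_FILE_NAMES = ("kc.lic", "k4.lic")


def find_license(env=None):
    """Return the path of a license file inside the QLIC folder, or None."""
    env = os.environ if env is None else env
    folder = env.get("QLIC")
    if not folder:
        return None  # QLIC is unset or empty
    for name in LICENSE_FILE_NAMES:
        candidate = Path(folder) / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_license()
    print(f"License file: {found}" if found else "No license file found via QLIC")
```

This only checks that a file exists; it cannot tell whether the license carries the feature flags PyKX needs.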

Supported Environments

KX only officially supports versions of PyKX built by KX, i.e. versions of PyKX installed from wheel files. Support for user-built installations of PyKX (e.g. built from the source distribution) is only provided on a best-effort basis. Currently, PyKX provides wheels for the following environments:

  • Linux (manylinux_2_17_x86_64) with CPython 3.8-3.11
  • macOS (macosx_10_10_x86_64) with CPython 3.8-3.11
  • Windows (win_amd64) with CPython 3.8-3.11
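The list above can be expressed as a rough compatibility check. This is a sketch only: has_prebuilt_wheel is a hypothetical helper, the machine names are those reported by platform.machine() on each system, and the check ignores the glibc and macOS version floors implied by the wheel tags.

```python
import platform
import sys

# (platform.system(), platform.machine()) pairs matching the wheel list above.
SUPPORTED_PLATFORMS = {
    ("Linux", "x86_64"),
    ("Darwin", "x86_64"),
    ("Windows", "AMD64"),
}
SUPPORTED_PYTHONS = {(3, 8), (3, 9), (3, 10), (3, 11)}


def has_prebuilt_wheel(system, machine, version):
    """Return True if the platform and Python (major, minor) appear in the list."""
    return (system, machine) in SUPPORTED_PLATFORMS and tuple(version[:2]) in SUPPORTED_PYTHONS


if __name__ == "__main__":
    ok = has_prebuilt_wheel(platform.system(), platform.machine(), sys.version_info)
    print("Prebuilt wheel expected" if ok else "A source build may be required")
```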

Dependencies

Python Dependencies

PyKX depends on the following third-party Python packages:

  • pandas>=1.2, < 2.2.0
  • numpy~=1.22, <2.0; python_version<'3.11'
  • numpy~=1.23, <2.0; python_version=='3.11'
  • numpy~=1.26, <2.0; python_version=='3.12'
  • pytz>=2022.1
  • toml~=0.10.2

They are installed automatically by pip when PyKX is installed.
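To see which versions pip actually resolved, the standard-library importlib.metadata module can be queried. In this sketch, installed_versions is a hypothetical helper; it reports versions but does not evaluate the version specifiers listed above.

```python
from importlib import metadata

# Distribution names taken from the dependency list above.
REQUIRED = ("pandas", "numpy", "pytz", "toml")


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```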

PyKX also has an optional Python dependency of pyarrow>=3.0.0, which can be included by installing the pyarrow extra, e.g. pip install pykx[pyarrow]

When using PyKX with KX Dashboards, users are required to install ast2json~=0.3. This can be installed using the dashboards extra, e.g. pip install pykx[dashboards]

When using PyKX Beta features, users are required to install dill>=0.2.0. This can be installed using the beta extra, e.g. pip install pykx[beta]

Warning: Trying to use the pa conversion methods of pykx.K objects or the pykx.toq.from_arrow method when PyArrow is not installed (or could not be imported without error) will raise a pykx.PyArrowUnavailable exception. pyarrow is supported on Python 3.8-3.10 but remains in Beta for Python 3.11.
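One way to avoid the exception described in the warning is to test for PyArrow first and fall back to another conversion. The sketch below assumes the object passed in exposes pa() and pd() conversion methods, as pykx.K objects do. The helper names are hypothetical.

```python
def pyarrow_available():
    """Return True only if pyarrow can be imported without error."""
    try:
        import pyarrow  # noqa: F401
    except Exception:
        # Covers both "not installed" and "installed but failed to import".
        return False
    return True


def to_arrow_or_pandas(k_object):
    """Convert with .pa() when PyArrow is usable, otherwise fall back to .pd()."""
    if pyarrow_available():
        return k_object.pa()
    return k_object.pd()
```

Checking up front keeps the fallback explicit; catching pykx.PyArrowUnavailable around the pa() call would be an alternative design.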

Optional Non-Python Dependencies

  • libssl for TLS on IPC connections.
  • libpthread on Linux/macOS when using the PYKX_THREADING environment variable.
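Whether these optional libraries can be located on the current machine can be probed with ctypes.util.find_library from the standard library. This is only a rough sketch: optional_native_libraries is a hypothetical helper, and a None result does not prove the library is unusable, since lookup behaviour differs between platforms.

```python
from ctypes.util import find_library


def optional_native_libraries():
    """Map each optional library name to the name find_library located, or None."""
    return {name: find_library(name) for name in ("ssl", "pthread")}


if __name__ == "__main__":
    for name, located in optional_native_libraries().items():
        print(f"lib{name}: {located or 'not found'}")
```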

Windows Dependencies

To run q or PyKX on Windows, msvcr100.dll must be installed. It is included in the Microsoft Visual C++ 2010 Redistributable.

Building from source

Installing Dependencies

The full list of supported environments is detailed here. Installation of dependencies will vary on different platforms.

apt example:

apt install python3 python3-venv build-essential python3-dev

yum example:

yum install python3 gcc gcc-c++ python3-devel.x86_64

Windows:

To install the above dependencies, you can run the w64_install.ps1 script as an administrator:

cd pykx
.\w64_install.ps1

Building

Using a Python virtual environment is recommended:

python3 -m venv pykx-dev
source pykx-dev/bin/activate

Build and install PyKX:

cd pykx
pip3 install -U '.[all]'

To run PyKX in licensed mode, follow the steps to receive a Personal Evaluation License.

Now you can run/test PyKX:

(pykx-dev) /data/pykx$ python
Python 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pykx
>>> pykx.q('1+1')
pykx.LongAtom(pykx.q('2'))

Testing

Contributions to the project must pass a linting check:

pflake8

Contributions to the project must include tests. To run tests:

export PATH="$PATH:/location/of/your/q/l64" # q must be on PATH for tests
export QHOME=/location/of/your/q # q needs QHOME available
python -m pytest -vvv -n 0 --no-cov --junitxml=report.xml
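The two environment requirements in the commands above can be checked before invoking pytest. The following sketch uses only the standard library; missing_test_prerequisites is a hypothetical helper and is not part of the PyKX test suite.

```python
import os
import shutil


def missing_test_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # The tests need the q executable to be discoverable on PATH.
    if shutil.which("q", path=env.get("PATH", "")) is None:
        problems.append("q executable not found on PATH")
    # q itself needs QHOME to be set.
    if not env.get("QHOME"):
        problems.append("QHOME is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_test_prerequisites():
        print(f"warning: {problem}")
```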

PyKX License access and enablement

This work is dual licensed under Apache 2.0 and the Software License for q.so, and users are required to abide by the terms of both licenses in their entirety.

Community Help

If you have any issues or questions, you can post them to community.kx.com. The pykx and kdb tags are also available on Stack Overflow.

Customer Support


pykx's Issues

Addition of isnull, isna, notnull, notna, idxmax, idxmin, kurt and sem functions


name: Addition of isnull, isna, notnull, notna, idxmax, idxmin, kurt and sem functions.
about: Missing pandas API functionality
title: 'Addition of isnull, isna, notnull, notna, idxmax, idxmin, kurt and sem functions.'
labels: ''
assignees: '@nipsn @tortolavivo23 @neutropolis @MiguelGomezC'


Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
An implementation of said functions.

Describe alternatives you've considered

Additional resource
Links to pandas documentation of said functions:

Passing a DataFrame with datetime64[us] to kdb errors

Describe the bug
Calling a KDB function over IPC and providing a DataFrame as an argument that contains a column of datetime64[us] results in an error: TypeError: ktype cannot be inferred from Numpy dtype datetime64[us]

I imagine datetime64[ns] is the standard pandas/numpy type; however, we receive datetime64[us] when pulling data from a SQL database using turbodbc (which we're then trying to send to KDB), and converting between the two types feels unnecessary.

To Reproduce
q('myfunction', df)
where df has a column of datetime64[us]

Expected behavior
The df to be converted to a KDB table, with column of type timestamp.
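For reference, the conversion the report describes as a workaround can be sketched with pandas alone. The helper name to_ns is hypothetical and the sample frame is invented for illustration. Widening to nanoseconds can fail for dates outside roughly the years 1677 to 2262.

```python
import numpy as np
import pandas as pd


def to_ns(df):
    """Return a copy of df with naive datetime64 columns cast to datetime64[ns]."""
    out = df.copy()
    for name in out.columns:
        dtype = out[name].dtype
        # is_datetime64_dtype is True only for timezone-naive numpy datetimes.
        if pd.api.types.is_datetime64_dtype(dtype) and str(dtype) != "datetime64[ns]":
            out[name] = out[name].astype("datetime64[ns]")
    return out


# Illustrative data: a microsecond-resolution column like the one described.
frame = pd.DataFrame({
    "ts": np.array(["2024-01-02T03:04:05.123456"], dtype="datetime64[us]"),
    "px": [1.5],
})
converted = to_ns(frame)
```

The converted frame could then be passed in place of df in the call shown under To Reproduce.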

Screenshots
N/A

Desktop (please complete the following information):

  • PyKx v2.3.0

Additional context
N/A

Thanks

Example of qsql.select to filter with Python Datetime

Is your feature request related to a problem? Please describe.
So far, all the examples for qsql.select which use "where" only filter on simple data types like string or integer. I cannot make it work for datetime, and unfortunately no examples are provided. Many thanks in advance!

Describe the solution you'd like
A few examples which use qsql.select to filter data on datetime columns using the "where" parameter.

Describe alternatives you've considered
I have tried many possible ways to do this using qsql.select, but all of them throw an exception.
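One possible approach, sketched here under assumptions, is to render the Python value as a q temporal literal and pass the resulting string as the where clause. The helpers q_literal and where_after are hypothetical, and the column name time is illustrative. The sketch assumes the where parameter accepts a q-syntax string.

```python
from datetime import date, datetime


def q_literal(value):
    """Render a Python date or datetime as a q date or timestamp literal."""
    # datetime must be tested first because it is a subclass of date.
    if isinstance(value, datetime):
        # q timestamps carry nanoseconds; pad the microseconds with three zeros.
        return value.strftime("%Y.%m.%dD%H:%M:%S.%f") + "000"
    if isinstance(value, date):
        return value.strftime("%Y.%m.%d")
    raise TypeError(f"unsupported type: {type(value).__name__}")


def where_after(column, moment):
    """Build a where clause selecting rows whose column is later than moment."""
    return f"{column}>{q_literal(moment)}"


clause = where_after("time", datetime(2024, 1, 2, 3, 4, 5, 123456))
# Assumed usage with a PyKX table named trades:
#   kx.q.qsql.select(trades, where=clause)
```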

Addition of skew, add_prefix, add_suffix, count, std functions


name: Addition of skew, add_prefix, add_suffix, count, std functions
about: Missing pandas API functionality
title: 'Addition of skew, add_prefix, add_suffix, count, std functions'
labels: ''
assignees: '@nipsn @marcosvm13 @neutropolis @MiguelGomezC'


Is your feature request related to a problem? Please describe.
No

Describe the solution you'd like
An implementation of said functions.

Describe alternatives you've considered

Additional resource
Links to pandas documentation of said functions:

Is there a way to suppress empty output (::) when using the magic %%q in Jupyter notebook?

Is your feature request related to a problem? Please describe.
When using the %%q magic within a Jupyter notebook, it seems that the output from each line of code is printed, including ::. See, e.g., the example in the documentation. Is there a way to suppress empty output (::)?


Describe the solution you'd like
I expect either of the following (or add an option in the %%q magic):

  • all output of :: to be removed; or
  • only the output of the last line of code in a cell is printed, with all results of previous lines removed (even if that line does not end with ;).

Describe alternatives you've considered
I tried adding ; to each line of code, but a :: is still printed for each line ending with ;.

[Pandas API] Unexpected behaviour for max, min, prod and sum.

Describe the bug
The functions max, min, prod and sum produce a length error when invoked from a keyed table.

To Reproduce

>>> import pykx as kx
>>> t = kx.q('([a:1 2]b:3 4;c:5 6)')
>>> t
pykx.KeyedTable(pykx.q('
a| b c
-| ---
1| 3 5
2| 4 6
'))
>>> t.max()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/j/Github/hablapps/pykx/pykx-dev/lib/python3.9/site-packages/pykx/pandas_api/__init__.py", line 57, in return_val
    res = func(*args, **kwargs)
  File "/Users/j/Github/hablapps/pykx/pykx-dev/lib/python3.9/site-packages/pykx/pandas_api/pandas_meta.py", line 80, in inner
    return q('{[x; y] y!x}', res, cols)
  File "/Users/j/Github/hablapps/pykx/pykx-dev/lib/python3.9/site-packages/pykx/embedded_q.py", line 226, in __call__
    return factory(result, False)
  File "pykx/_wrappers.pyx", line 507, in pykx._wrappers._factory
  File "pykx/_wrappers.pyx", line 500, in pykx._wrappers.factory
pykx.exceptions.QError: length

Expected behavior
In my view, these methods should behave as follows:

>>> import pykx as kx
>>> t = kx.q('([a:1 2]b:3 4;c:5 6)')
>>> t
pykx.KeyedTable(pykx.q('
a| b c
-| ---
1| 3 5
2| 4 6
'))
>>> t.max()
pykx.Dictionary(pykx.q('
b| 4
c| 6
'))

Desktop (please complete the following information):

  • OS: [macOS Sonoma 14.0]
  • KDB+ banner information [KDB+ 4.0 2022.05.11 Copyright (C) 1993-2022 Kx Systems]
  • Repository version [0.1.dev21+gd28e93a.d20240117]

Additional context
This problem originates in the preparse_computations function of the Pandas API https://github.com/KxSystems/pykx/blob/main/src/pykx/pandas_api/pandas_meta.py#L52-L70. This function filters the keys out of the keyed table but returns the original columns. Later, when the @convert_result decorator is invoked, this results in a length mismatch, since the number of columns is greater than the number of results. In fact, this problem should also affect the all and any methods of the Pandas API, but I see that they can't be invoked since they are overridden by the KeyedTable class https://github.com/KxSystems/pykx/blob/main/src/pykx/wrappers.py#L3291-L3295.
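The mismatch described above can be modelled in plain Python, with a dict of column lists standing in for the keyed table. This is an illustrative model only, not the PyKX source. Both function names are hypothetical.

```python
def column_maxima_buggy(columns, keys):
    """Model of the reported behaviour: results exclude keys, names do not."""
    value_columns = {n: v for n, v in columns.items() if n not in keys}
    results = [max(v) for v in value_columns.values()]
    names = list(columns)  # bug: still includes the key columns
    if len(names) != len(results):
        raise ValueError("length")  # mirrors the q 'length' error
    return dict(zip(names, results))


def column_maxima_fixed(columns, keys):
    """Pair each result with the same non-key columns it was computed from."""
    value_columns = {n: v for n, v in columns.items() if n not in keys}
    return {n: max(v) for n, v in value_columns.items()}


# Same shape as the keyed table in the report: key column a, values b and c.
table = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
fixed = column_maxima_fixed(table, keys=["a"])
```

With the key column excluded from both the results and the names, the fixed version yields the b and c maxima shown under Expected behavior.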
