
DataLens

DataLens is a modern business intelligence and data visualization system. It was developed and extensively used as the primary BI tool in Yandex and is also available as a part of the Yandex Cloud platform. See also our roadmap and community in Telegram.

Getting started

Installing Docker

DataLens requires Docker to be installed. Follow these instructions depending on the platform you use:

Running containers

Use the following command to start DataLens containers:

git clone https://github.com/datalens-tech/datalens && cd datalens

HC=1 docker compose up

# or with an external metadata database
METADATA_POSTGRES_DSN_LIST="postgres://{user}:{password}@{host}:{port}/{database}" HC=1 docker compose up

This command launches all containers required to run DataLens; the UI will be available at http://localhost:8080

If you want to use a different port (e.g. 8081), you can set it using the UI_PORT env variable:

UI_PORT=8081 docker compose up

Notice on Highcharts usage

Highcharts is a proprietary commercial product. If you enable Highcharts in your DataLens instance (with the `HC=1` variable), you must comply with the Highcharts license (https://github.com/highcharts/highcharts/blob/master/license.txt).

When Highcharts is disabled in DataLens, we use D3.js instead. However, currently only a few visualization types are compatible with D3.js. We are actively working on adding D3 support to more visualizations and plan to completely replace Highcharts with D3 in DataLens.

How to update

Just pull the new docker-compose.yml and restart.

docker compose down
git pull
docker compose up

All your user settings will be stored in the metadata folder.
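
If you want a safety copy of your settings before updating, one simple approach (a sketch assuming the default local metadata folder) is to archive it while the containers are stopped:

# stop the containers before copying the metadata folder
docker compose down
tar -czf metadata-backup.tar.gz metadata

# to restore, unpack the archive back into place before starting the containers again
tar -xzf metadata-backup.tar.gz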

Parts of the project

DataLens consists of three main parts:

  • UI is a single-page application (SPA) with a corresponding Node.js server part. It provides the user interface, proxies requests from users to backend services, and applies some light data post-processing for charts.
  • Backend is a set of Python applications and libraries. It is responsible for connecting to data sources, generating queries for them, and post-processing the data (including formula calculations). The result of this work is an abstract dataset that the UI can use for chart data requests.
  • UnitedStorage (US) is a Node.js service that uses PostgreSQL to store the metadata and configuration of all DataLens objects.
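
Once the stack is up, you can check that containers for these parts are running with standard Docker tooling (the service names used in the compose file are listed in the system requirements section below):

docker compose ps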

What's already available

We are releasing DataLens with a first minimal set of connectors (ClickHouse, ClickHouse over YTsaurus, and PostgreSQL) as well as other core functionality such as the data processing engine and the user interface. However, to kick off this project in a reasonable timeframe we chose to drop some features from the first release: this version does not contain middleware and components for user sessions, object ACLs, or multitenancy (although the code contains entry points for such extensions). We plan to add the missing features based on our understanding of community priorities and your feedback.

Cloud Providers

Below is a list of cloud providers offering DataLens as a service:

  1. Yandex Cloud platform
  2. DoubleCloud platform

Authentication (beta)

DataLens supports authentication via the Zitadel identity platform.

Use the following command to initialize Zitadel (you need to do this only once):

bash init.sh

Note the updated .env file after initialization: it contains Zitadel access keys. Keep that file safe and do not share its contents.

After initialization you can start DataLens containers using a special version of the docker compose file:

HC=1 docker compose -f docker-compose.zitadel.yml up

After that you can log in to DataLens at http://localhost:8080 using the default user credentials:

Username: [email protected]
Password: Password1!

You can use the same credentials to configure Zitadel and add new users via the Zitadel control panel at http://localhost:8085/. Don't forget to log in there at least once to change the default password.

FAQ

Where does DataLens store its metadata?

We use the metadata folder to store PostgreSQL data. If you want to start over, you can delete this folder: it will be recreated with demo objects on the next start of the datalens-us container.

I use the METADATA_POSTGRES_DSN_LIST param for external metadata database and the app doesn't start. What could be the reason?

We use some PostgreSQL extensions for the metadata database; the application checks them at startup and tries to install them if they haven't already been installed. Check your database user's rights for installing extensions by trying to install them manually:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
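
As a quick sanity check, you can also list the extensions that are already installed, using the same DSN placeholders as above:

psql "postgres://{user}:{password}@{host}:{port}/{database}" -c "SELECT extname FROM pg_extension;"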

If this attempt is unsuccessful, ask your database administrator to install the extensions and add the METADATA_SKIP_INSTALL_DB_EXTENSIONS=1 parameter on startup; this parameter tells the app to skip installing extensions.

If you're using a managed database, it's also possible that extensions for your database cluster are controlled by an external system and can be changed only through its UI or API. In that case, consult the documentation for the managed database service you're using. Don't forget to add METADATA_SKIP_INSTALL_DB_EXTENSIONS=1 after installing extensions this way.
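
For example, once the extensions have been installed by the administrator or through the provider's tools, the startup command could look like this (same placeholders and variables as above):

METADATA_POSTGRES_DSN_LIST="postgres://{user}:{password}@{host}:{port}/{database}" METADATA_SKIP_INSTALL_DB_EXTENSIONS=1 HC=1 docker compose up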

My PostgreSQL cluster has multiple hosts. How can I specify them in the METADATA_POSTGRES_DSN_LIST parameter?

You can write all cluster hosts separated by commas:

METADATA_POSTGRES_DSN_LIST="postgres://{user}:{password}@{host_1}:{port}/{database},postgres://{user}:{password}@{host_2}:{port}/{database},postgres://{user}:{password}@{host_3}:{port}/{database}" ...

How can I specify a custom certificate for connecting to the metadata database?

You can add additional certificates for the metadata database in ./certs/root.crt; they will be used to connect to the database from the datalens-us container.

If the datalens-us container does not start even though you provided the correct certificates, try changing METADATA_POSTGRES_DSN_LIST like this: METADATA_POSTGRES_DSN_LIST="postgres://{user}:{password}@{host}:{port}/{database}?sslmode=verify-full&sslrootcert=/certs/root.crt"
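
If you are not sure that the certificate you placed in ./certs/root.crt is the one you expect, you can print its subject and expiry date with openssl (a quick local check, independent of DataLens):

openssl x509 -in ./certs/root.crt -noout -subject -enddate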

Why do I see two compose files: docker-compose.yml & docker-compose-dev.yml?

docker-compose-dev.yml is a special compose file needed only for development purposes. When you run DataLens in production mode, always use docker-compose.yml; the docker compose up command uses it by default.
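
If you do want to run the development setup, pass the dev file explicitly, the same way the Zitadel compose file is passed above (a sketch for development only, not a supported production mode):

docker compose -f docker-compose-dev.yml up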

What are the minimum system requirements?

  • datalens-ui - 512 MB RAM

  • datalens-data-api - 1 GB RAM

  • datalens-control-api - 512 MB RAM

  • datalens-us - 512 MB RAM

  • datalens-pg-compeng - 1 GB RAM

  • datalens-pg-us - 512 MB RAM

Summary:

  • RAM - 4 GB

  • CPU - 2 CORES

These are the minimal system requirements for an open-source DataLens installation. Actual consumption of VM resources depends on the complexity of queries to connections, connection types, the number of users, and processing speed at the source level.
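
To see how much your own installation actually consumes, you can take a one-off snapshot of the running containers with standard Docker tooling:

docker stats --no-stream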

datalens's People

Contributors

berancad, bt4r9, gaploid, github-actions[bot], goshander, imsitnikov, konstantanxiety, mcpn, melikhov-dev, ovsds, pashkov-v, paveldubinin, resure, stankis, vallbull


datalens's Issues

Trino connector

Hello there.

Do you have plans to add a Trino connector or a custom JDBC connector to your BI system?

Certificate error when creating a connection to "Managed Service for PostgreSQL" in Yandex Cloud

When I create a new connection to the "Managed Service for PostgreSQL" database for use in collections, I select the root certificate, but it is not used for the connection. The following error appears:

connection to server at "***********.mdb.yandexcloud.net" (158.XX.XX.XX), port 6432 failed: could not open certificate file "/root/.postgresql/postgresql.crt": Permission denied

This is despite the fact that the US service connects to the same database using a similar connection string: METADATA_POSTGRES_DSN_LIST="postgres://{user}:{password}@{host}:{port}/{database}?sslmode=verify-full&sslrootcert=/certs/root.crt"

I tried to upload the root certificate into the container and changed the permissions on the root directory (for testing); this helps it find the certificate, but then I get another error:

connection to server at "***********.mdb.yandexcloud.net" (158.XX.XX.XX), port 6432 failed: could not open certificate file "/root/.postgresql/postgresql.key"

Add MySQL connector

I see that you've recently added an Oracle connector, as well as some other ones. It would be nice to have a MySQL connector in the open-source version.

Thanks in advance!

Please add map-charts

Guys, a very big thanks for sharing DataLens! It would be nice to have map-charts in open source version.

Access is denied.

C:\Windows\system32>git clone https://github.com/datalens-tech/datalens
Cloning into 'datalens'...
remote: Enumerating objects: 35, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (27/27), done.

Receiving objects: 100% (35/35), 13.48 KiB | 299.00 KiB/s, done.
Resolving deltas: 100% (15/15), done.

C:\Windows\system32>cd datalens

C:\Windows\System32\datalens>set HC=1

C:\Windows\System32\datalens>docker compose up
[+] Running 1/0
✔ Network datalens_default Created 0.0s

  • Container datalens-pg-us-1 Creating 0.0s
  • Container datalens-pg-compeng-1 Creating 0.0s
  • Container datalens-postgres-connection-1 Creating 0.0s
    Error response from daemon: mkdir C:\Windows\System32\datalens\us-data: Access is denied.

Add Metrica/AppMetrica connectors

Enable Metrica/AppMetrica connectors

Open questions:

  • Token field in connection & listing counters/applications
    • allow setting up own OAuth application via configs & using Metrica API to get token
    • OR allow only manual mode for these fields
  • Whether to include a demo dashboard

Vulnerabilities in Docker images

Hello!

I decided to scan the Docker images that were created after the docker compose up -d command, and Snyk found multiple vulnerabilities in the DataLens images. The full scan results are attached. The greatest concern is the critical vulnerabilities with RCE. Do they affect the current version of DataLens? Are there any plans to update the old components in the future?

Thank you!
datalens-us_0.96.0.txt
datalens-ui_0.795.0.txt
datalens-data-api_0.2037.0.txt
datalens-control-api_0.2037.0.txt

Add Snowflake connector

Hey there,

I'd love to test DataLens with Snowflake. Any plans to add Snowflake support?

White sheet when creating a connection

After installation I can't create connections. When I click "Create connection" I only see a blank white page.

In the compose file all parameters are standard.

Add Bitrix24 connector

Enable Bitrix24 connector

Prerequisites:

  • Enable RQE (remote-query-executor) or allow direct connection to Bitrix
  • Enable caches (optional; without RQE caches Bitrix connector can be very slow)

Difficult to work with large tables

I defined a dataset on a table with 21 billion records.
When I tried to create my first chart, it did this:

(Screenshot: https://github.com/datalens-tech/datalens/assets/18581488/d81e4044-9b75-4eff-8107-98b6f15dd4ce)

Running this query on ClickHouse:

SELECT `t1`.`created_at` AS `res_0` FROM `default`.`twitter` AS `t1` GROUP BY `res_0` LIMIT 1000001

The query itself runs fine, but the aggregation is performed over every unique time point, and it takes 94.5 seconds to finish on a ClickHouse Cloud service.

Add file connector

Everything needed to enable file connector as a part of DataLens OSS is present across repositories.

Steps required:

  • Add relevant package dependencies
  • Make relevant changes to env and configs
  • Include services necessary for file-uploader to work:
    • S3 (file storage)
    • Redis (task queue & meta storage)
    • ClickHouse (data processing & retrieval)
  • Implement, include and configure file-uploader services:
    • file-uploader-api
    • file-uploader-worker
  • Make new services optional (maybe)

The calendar map chart

I believe there is a need for enterprise users to be able to create a large calendar for the entire year for planning in two clicks. This is essentially a variant of a Gantt chart. What is needed here is the ability to pan the rows and scroll dates by month and by day for the entire current year and/or other periods. In addition, you need the simple ability to place multiple time periods in each row, possibly overlapping. This chart could be used for easy presentation and scheduling of vacations, training, sick leave, and business trips, and for tracking overlapping events. An example, not 100% reflective of the intended design, is provided below.

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=ru

I am losing all my preferences after restarting the docker container

After executing docker compose down && docker compose up -d I am losing all my settings. I tried to attach volumes to the pg-compeng and pg-us containers like this:

services:
  pg-compeng:
    volumes:
      - pg-compeng:/var/lib/postgresql/data   # named volume for the compeng PostgreSQL data
  pg-us:
    volumes:
      - pg-us:/var/lib/postgresql/data        # named volume for the US metadata PostgreSQL data

volumes:
  pg-compeng:
  pg-us:

Can anyone help me make the docker-compose file resistant to reboots?

Error after installation

Hi.
After installation I get an error.
There are no collections or workbooks.

ECONNABORTED
{
"title": "Error",
"description": "Response timeout exceeded"
}


[paulmer@docker-01 ~]$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="38 (Thirty Eight)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora Linux 38 (Thirty Eight)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f38/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
[paulmer@docker-01 ~]$

Cannot connect to metadata PostgreSQL with TLS/SSL

When trying to connect to metadata PostgreSQL DB with TLS/SSL, the following error appears when running scripts/preflight.sh (I was trying to connect to Yandex Managed Service for PostgreSQL):

error: odyssey: c3577ff1eb926: SSL is required

Although the correct root certificate is present in /certs/root.crt, it does not help.

It seems that the certificate path is not passed to initPosgresDB() here (although I could not find the signature of this function, since gravity-ui/postgreskit is probably a private repo).

The solution that worked for me was to add the following parameters to DSN:
postgres://{user}:{password}@{host}:{port}/{database}?sslmode=verify-full&sslrootcert=/certs/root.crt.

I think that this solution may be added to README.

Error in creating a PostgreSQL connection

I am facing an issue while connecting my PostgreSQL database. I am running the DataLens project on WSL2. How can I connect my PostgreSQL database to it? Should I run the PostgreSQL server on WSL2 as well, or can I connect via a remote connection to my Windows machine that already runs a PostgreSQL server?
I've tried both and am getting errors; I'm still unable to connect to the DB :(

Backup/restore and upgrade instructions

Please add instructions for backing up and restoring DataLens configuration data (connections, datasets, charts, dashboards). For example, I want to move DataLens to a new server. Please also add update instructions: I want to update DataLens to a new version and am afraid of losing my data.

Cannot add a line chart

Good afternoon! I am trying to build a line chart in DataLens, but when choosing a chart type there is no such option. Screenshot attached.

When I go here: (screenshot)

I get an error: (screenshot)

Please advise, maybe I am doing something wrong.

Failing to pull docker containers

C:\Users\spull\ideaprojects\datalens>docker compose up
[+] Running 4/4
 ✘ control-api Error                                                                                               0.6s
 ✘ datalens Error                                                                                                  0.6s
 ✘ data-api Error                                                                                                  0.6s
 ✘ us Error                                                                                                        0.6s
Error response from daemon: Head "https://ghcr.io/v2/datalens-tech/datalens-data-api/manifests/0.2037.0": denied: denied

Could not open certificate file (PostgreSQL)

When connecting to Yandex Cloud Managed Service for PostgreSQL, it gives an error. DataLens is deployed locally, and loading a certificate file via the advanced connection settings does not help.

Through DBeaver I can connect to the database from this machine.

OS: Linux version 6.2.0-34-generic (buildd@lcy02-amd64-025) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~23.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.40)
CPU: AMD Ryzen 5 3500U with Radeon Vega Mobile Gfx

Error text:
Database error. ERR.DS_API.DB { { "data": { { "code": "ERR.DS_API.DB", "details": { "db_message": "connection to server at "..." (...), port 6432 failed: could not open certificate file \"/root/.postgresql/postgresql.crt\": Permission denied\n"" }, "message": "Database error." "debug": { "db_message": "connection to server at "..." (...), port 6432 failed: could not open certificate file \"/root/.postgresql/postgresql.crt\": Permission denied\n". "query": "select 1" } } }

Please add MS SQL connection

In the cloud version MS SQL is available, but not in open source. We have MS SQL in a private network, so the only way to connect is to use self-hosted DataLens.

It would be great if the MS SQL connector became public.

Great thanks to your team.

Publisher/Presentation tools

Good afternoon, it would be useful to have the ability to print dashboards and export them as PDF.

How to add custom chart plugin?

Hi, DataLens community!
Is it possible to make a custom DataLens chart plugin?
Any guides or information would be appreciated.
Best regards!

Unclear error messages

Hi there!
Are there any docs (besides those in the Yandex Cloud docs) covering the creation of datasets and possible error codes? I'm trying to create a dataset with an SQL query for a connected Postgres connection, but I'm stuck with this:

Relation requires at least one condition
ERR.DS_API.FORMULA
{
    "title": "ERR.DS_API.FORMULA",
    "description": "Relation requires at least one condition"
}
ERR.DS_API.VALIDATION.ERROR
[
    {
        "id": "340a4261-635d-11ee-85cb-db624a6af76d",
        "type": "avatar_relation",
        "errors": [
            {
                "level": "error",
                "code": "ERR.DS_API",
                "message": "Relation requires at least one condition",
                "details": {}
            }
        ]
    }
]

Unfortunately, nothing was found on the Internet except this page, and it doesn't help :(

The query works on its own, and it does have conditions.

Thanks!

Start containers using non-privileged users instead of root

Currently our containers run their entry-point processes as the root user, which is far from ideal. We should switch to non-privileged users in all our containers:

  • data-api
  • control-api
  • us
  • ui

These non-root users should have the minimum possible access to the files in the container.

No connection to local databases

I have PostgreSQL and ClickHouse databases running on a server. Both are accessible via DBeaver or via HTTP.
But when I try to create a connection in DataLens, the connection attempt fails with an error like this (for ClickHouse):
Data source refused connection. ERR.DS_API.DB.SOURCE_CONNECT_ERROR { "data": { "details": { "db_message": "HTTPSConnectionPool(host='192.168.1.224', port=8123): Max retries exceeded with url: /?query_id=e7c0bdde-b9f7-11ee-bdd0-0242ac120006&database=system&join_use_nulls=1&distributed_ddl_task_timeout=280&send_progress_in_http_headers=0&output_format_json_quote_denormals=1 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f43f6287130>, 'Connection to 192.168.1.224 timed out. (connect timeout=1.0)'))" }, "message": "Data source refused connection.", "debug": { "db_message": "HTTPSConnectionPool(host='192.168.1.224', port=8123): Max retries exceeded with url: /?query_id=e7c0bdde-b9f7-11ee-bdd0-0242ac120006&database=system&join_use_nulls=1&distributed_ddl_task_timeout=280&send_progress_in_http_headers=0&output_format_json_quote_denormals=1 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f43f6287130>, 'Connection to 192.168.1.224 timed out. (connect timeout=1.0)'))", "query": "select 1" }, "code": "ERR.DS_API.DB.SOURCE_CONNECT_ERROR" } }
The same problem appears whether the DataLens container runs on the server or on another machine in the network.

Could you please tell me, what may be the issue?

No XLSX option in table export

Please advise how to enable the XLSX export option. CSV export is not working well either: with any selected encoding the results are displayed incorrectly when opened in Excel.
