
tensorbase's Introduction


Status of the Project

TensorBase has not been updated for a while. Thank you to everyone who has shown concern or sent inquiries. Our reply is as follows:

TensorBase hopes that open source does not become a copy game. TensorBase is firmly opposed to forking communities, reinventing wheels, or gaming traffic for so-called reputation (such as GitHub stars). After some thought, we decided to temporarily leave the general data warehousing field.

Here, let's recap the world firsts of TensorBase:

  1. The world's first ClickHouse-compatible open-source implementation.
  2. 2x faster write throughput than ClickHouse (based on our bug-fixed Rust client; you can get a ~1.7x speedup with another simple concurrent script of ours here).
  3. Faster query speed than ClickHouse in simple aggregations (benchmarked against ClickHouse 2021.6).
  4. The first proposed no-LSM storage layer optimized for both writes and reads.
  5. The first to make "copy-free, lock-free, async-free, dyn-free" happen in the critical path of an open-source DBMS.
  6. The first DBMS running on real-world RISC-V hardware.
  7. The first top-performance, whole-lifecycle JIT SQL query engine (not completely open-sourced, but we released the initial prototype online for you to try, and you can get more of the ideas from the blogs, presentations and videos)...

For people looking for a production-level data warehouse solution, we still recommend ClickHouse. We hope that ClickHouse can learn from this work and evolve for the better.

For people who want to learn how a database system is built, how to apply modern Rust to high-performance work, or how to embed a lightweight data analysis system into a larger system of their own: you can still try, ask about, or contribute to TensorBase. The committers are still around in the community. We will help with the many interesting things pursued in the project by us and perhaps by you. We still maintain the project and look forward to meeting more database geniuses around the world, although no new features will be added in the near future.

The core team of TensorBase has moved on to a new type of domain-specific database. We are hiring!


What is TensorBase

TensorBase is a new big data warehouse built with a modern approach.

TensorBase is built on top of Rust, Apache Arrow and Arrow DataFusion.

TensorBase hopes to change the status quo of big data systems, namely:

  • low efficiency (in the name of 'scalable')
  • hard to use (for end users) and understand (for developers)
  • not evolving with modern infrastructures (OS, hardware, engineering...)

Features

  • Ready to play out of the box (get started right now)
  • Lightning-fast architectural performance in Rust (real-world benchmarks)
  • Modern redesigned columnar storage
  • Top-performance network transport server
  • ClickHouse-compatible syntax
  • Green installation with DBA-free ops
  • Reliability and high availability (WIP)
  • Cluster (WIP)
  • Cloud-native adaptation (WIP)
  • Arrow DataLake (...)

Architecture (in 10,000 meters altitude)

[Figure: arch_base]

Quick Start

[Figure: play_out_of_the_box]

Benchmarks

TensorBase is lightning fast. It has shown better performance than ClickHouse in simple aggregation queries on the 1.47-billion-row NYC Taxi dataset.

TensorBase has enabled the full workflow for TPC-H benchmarks, from data ingestion to query.

More details about all benchmarks can be found in benchmarks.

Roadmap

Community Newsletters

Working Groups

Working Group - Engineering

This is a working group (WG) for engineering-related topics, such as code or features.

Working Group - Database

This is a higher-level WG for database-related topics, such as ideas from papers.

Join these working groups in the Discussions or on the Discord server.

Communications

A WeChat group and other channels are listed in community.

Contributing

We have a great contributing guide in Contributing.

Documents (WIP)

More documents will be prepared soon.

Read the Documents.

License

TensorBase is distributed under the terms of the Apache License (Version 2.0), which is a commercial-friendly open source license.

It would be greatly appreciated if:

  • you gave this project a star, if you find what you got from TensorBase helpful.
  • you listed yourself in Who is Using TensorBase, if you are using TensorBase in any project, product or service.
  • you contributed your changes back to TensorBase, if you want your changes to help more people.

Your encouragement and help can make more people realize the value of the project, and motivate the developers and contributors of TensorBase to move forward.

See LICENSE for details.

tensorbase's People

Contributors

abhikjain360, asloan7, chenyukang, dallasc, dependabot[bot], edenshin, fandahao17, frank-king, grapebaba, jinmingjian, kuan-li, leomikezee, luluai-cn, meijies, nautaa, pandaplusplus, pymongo, tianjiqx, ygf11


tensorbase's Issues

Status of this project?

What is the status of this project? I only see an initial commit m0 and hardly any coding changes afterwards. Is this project terminated?

build failed, caused by wrong branch name

error: failed to get `baselog` as a dependency of package `meta v0.1.0 (/Users/kaichen/Documents/projects/tensorbase/crates/meta)`

Caused by:
  failed to load source for dependency `baselog`

Caused by:
  Unable to update https://github.com/tensorbase/baselog.git

Caused by:
  failed to find branch `master`

Caused by:
  cannot locate remote-tracking branch 'origin/master'; class=Reference (4); code=NotFound (-3)

make tests runnable for anyone with a Linux system.

  • tests use an absolute data path, which causes them to fail for others.
    We should use a relative data path or the /tmp directory, which exists on any Linux system.
  • tests use data that does not exist in this repository.
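One way to address both points is sketched below: each test builds its own scratch directory under the OS temp directory and writes a small fixture itself, so nothing depends on a developer's absolute path or on files outside the repository. The helper names `scratch_dir` and `write_fixture`, the `tensorbase_test_` prefix and the CSV contents are all made up for this sketch and are not taken from the TensorBase code base.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

/// Creates a per-test scratch directory under the OS temp dir
/// (usually /tmp on Linux). The process id keeps parallel runs apart.
/// The directory name prefix is hypothetical.
fn scratch_dir(test_name: &str) -> std::io::Result<PathBuf> {
    let dir = std::env::temp_dir().join(format!(
        "tensorbase_test_{}_{}",
        test_name,
        std::process::id()
    ));
    fs::create_dir_all(&dir)?;
    Ok(dir)
}

/// Writes a tiny CSV fixture into `dir`, so the test carries its own data
/// instead of reading a file that only exists on one machine.
fn write_fixture(dir: &Path) -> std::io::Result<PathBuf> {
    let path = dir.join("sample.csv");
    let mut file = fs::File::create(&path)?;
    file.write_all(b"1,North\n2,South\n")?;
    Ok(path)
}
```

A test would call `scratch_dir("my_test")`, write its fixture, run against the returned path, and remove the directory with `std::fs::remove_dir_all` at the end.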

Issue encountered attempting to create a table prior to importing a CSV (additional documentation requested)

Hello,

I am very interested in your project and I am attempting to begin testing it out. However, the documentation for tools that exist in m0 does not seem to be accurate (baseops, baseshell). Subsequently, attempting to use the clickhouse client to create a very simple table using ddl fails. I am not sure what to use for ENGINE although it appears to be required and using MergeTree fails. I tried with and without ORDER BY. Any assistance you can provide would be greatly appreciated.

-Chris Whelan

TensorBase :) create table sales (title string) ENGINE = MergeTree ORDER BY title;

CREATE TABLE sales
(
    `title` string
)
ENGINE = MergeTree
ORDER BY title

Query id: 22fd667c-851d-4087-9fb7-5a58128003de


0 rows in set. Elapsed: 0.001 sec.

Received exception from server (version 2021.3.0):
Code: 3. DB::Exception: Received from localhost:9528. WrappingLangError(ASTError). Error when AST processing.

setup some github bots

  • github action for building
  • github action for testing
  • github action for coverage
  • add github performance bot
  • Dependabot

Error inserting string value into table

Hi, the documentation indicates that the string data type is supported, but attempting to insert a string into an existing table fails with NoFixedSizeDataTypeError.


SHOW CREATE TABLE sales

Query id: e96c5bbe-52ad-4fcc-9df0-1afdab76700e

┌─statement─────────────────────────────────────────────────┐
│ create table sales ( Region String ) ENGINE = BaseStorage │
└───────────────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.000 sec.

TensorBase :) insert into sales (Region) values ('North')

INSERT INTO sales (Region) VALUES

Query id: 98e40494-bbbc-461d-bb7d-7e7798987b4d


1 rows in set. Elapsed: 0.001 sec.

Received exception from server (version 2021.3.0):
Code: 4. DB::Exception: Received from localhost:9528. WrappingMetaError(NoFixedSizeDataTypeError). No fixed size for dynamic sized data type.

The same problem occurs with Decimal(x,y) data types.
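The error name suggests that the storage layer asks every column type for a fixed byte width and rejects types that have none. The sketch below illustrates that kind of lookup. It is an assumption about the shape of the check, not TensorBase's actual code: the `ColumnType` enum, its variants and `fixed_size_in_bytes` are hypothetical, and Decimal is rejected here only to mirror the behavior reported above.

```rust
/// A toy column type list; the variants are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum ColumnType {
    UInt8,
    Int32,
    UInt64,
    Float64,
    Decimal { precision: u8, scale: u8 },
    String,
}

/// Error named after the one in the server message above.
#[derive(Debug, PartialEq)]
enum MetaError {
    NoFixedSizeDataTypeError,
}

/// Returns the per-value width in bytes for fixed-length types.
/// Types without a mapped width produce the error from the report.
fn fixed_size_in_bytes(t: &ColumnType) -> Result<usize, MetaError> {
    match t {
        ColumnType::UInt8 => Ok(1),
        ColumnType::Int32 => Ok(4),
        ColumnType::UInt64 | ColumnType::Float64 => Ok(8),
        // String is variable-length; Decimal is treated as unmapped
        // in this sketch to match the reported failure.
        ColumnType::String | ColumnType::Decimal { .. } => {
            Err(MetaError::NoFixedSizeDataTypeError)
        }
    }
}
```

Under this reading, supporting String means adding a variable-length code path, rather than adding one more arm to a lookup like this.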

Support table function `numbers` and `numbers_mt`

There is no easy way to try TensorBase.

The blog post Hello, Base has some docs about nyc_taxi dataset benchmarks against ClickHouse. Yet the dataset is pretty large, so it is hard for users to explore TensorBase quickly.

Maybe we can implement some table functions like ClickHouse's numbers or numbers_mt.
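As a rough illustration of what such a table function has to produce: in ClickHouse, numbers(N) yields the integers 0 to N-1 and numbers(M, N) yields M to M+N-1 in a single UInt64 column. The sketch below generates that sequence in column batches. The function signature and the `batch_size` parameter are assumptions made for this example, and it does not touch TensorBase or DataFusion APIs.

```rust
/// Yields the integers `start .. start + count` as column batches of at most
/// `batch_size` values each. Arithmetic saturates instead of overflowing.
fn numbers(start: u64, count: u64, batch_size: usize) -> impl Iterator<Item = Vec<u64>> {
    assert!(batch_size > 0, "batch_size must be positive");
    let end = start.saturating_add(count);
    let step = batch_size as u64;
    (0u64..)
        // Lower bound of the i-th batch.
        .map(move |i| start.saturating_add(i.saturating_mul(step)))
        // Stop once the lower bound reaches the end of the range.
        .take_while(move |&lo| lo < end)
        // Materialize one batch, clipped to the end of the range.
        .map(move |lo| (lo..end.min(lo.saturating_add(step))).collect::<Vec<u64>>())
}
```

A multi-threaded variant in the spirit of numbers_mt could hand disjoint batch ranges to different workers, since each batch depends only on its lower bound.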

JDBC Driver Connection Times Out on Read

@jinmingjian , Jin, I attempted to connect to the TB server using the Clickhouse JDBC driver (pulled in by DBeaver) on port 9528, but the connection attempt times out on the read. I tried configuring the connection properties both with no Database/Schema specified as well as with default. I also tried with the No authentication option set. Below are screen captures illustrating the connection properties and the connection error. Note that I also confirmed that the firewall is turned off.

[Screenshot: clickhouse-jdbc-conn]

[Screenshot: clickhouse-jdbc-conn-error]

Cannot join Slack channel from the links in README or on the official website, maybe consider opening a new communication channel like Gitter?

When you click the Slack channel link in the README or on the official website, it redirects you to TensorBase's official Slack workspace, https://tensorbase.slack.com/, but without an invitation. So I think newcomers cannot log in.

I can log in to other Slack workspaces such as Kubernetes, so I guess it is just because TensorBase's Slack link is not an invitation link. On the Kubernetes one, there is a "GET MY INVITE" button for people who are not in the group.

[RFC]Enable TPC-H Benchmarks

Why

Arrow-DataFusion already supports parts of TPC-H, but TensorBase does not support storage for all of the required data types. Enabling these benchmarks makes TensorBase more feature-mature.

How

Following Arrow-DataFusion, we should support the following types: DataType::Float64, DataType::Utf8, DataType::Date32. However, this is not an economical or performant approach. As a first step, we suggest enabling Decimal, String and Datetime.

TODO

  • support Datetime type
  • support Date type #54
  • support String type #22
  • support Decimal type #26

add storage layer

  • basic data layout and data partition design
  • enable data write path
  • add server client arch
    • add server
    • add client driver(rust)
    • change baseops to use client driver
    • change baseshell to use client driver
  • basic stress tests and benchmarks

[RFC] Distributed Storage & Query

WHY

Currently, TensorBase only supports single node mode. A single node may not have enough space for all the data and we need to store them in a distributed manner. By introducing components like Ballista, we can enable TB to support distributed storage and query.

HOW

Currently, a ClickHouse compatible SQL query will be parsed and passed to TB/engine, TB/engine will then invoke DataFusion to execute the query. To support distributed storage and query, we can add a distributed engine (e.g., Ballista) between TB/engine and DataFusion.

For example, when TB is configured to use Ballista to support distributed storage and query, TB/engine can act as a Ballista client and send ExecuteQuery to the Ballista scheduler. The scheduler will then distribute the work to the executor(s). For more details about the architecture of Ballista, please refer to this doc.

In the future, TB may support different distributed engines other than Ballista. We should be able to integrate them in a similar manner.
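The routing decision described above can be pictured as a small backend switch sitting between TB/engine and the executor. The sketch below is only a shape for that idea: `ExecutionBackend`, `from_config` and `describe` are hypothetical names, no Ballista or DataFusion API is called, and the scheduler address in the usage note is a made-up value.

```rust
/// Where a parsed query would be executed. Both variants are hypothetical.
#[derive(Debug, PartialEq)]
enum ExecutionBackend {
    /// Single-node mode: hand the query to the local engine.
    Local,
    /// Distributed mode: submit the query to a remote scheduler.
    Distributed { scheduler_addr: String },
}

impl ExecutionBackend {
    /// Picks distributed mode only when a non-empty scheduler address
    /// is configured; otherwise stays in single-node mode.
    fn from_config(scheduler_addr: Option<&str>) -> ExecutionBackend {
        match scheduler_addr {
            Some(addr) if !addr.trim().is_empty() => ExecutionBackend::Distributed {
                scheduler_addr: addr.trim().to_string(),
            },
            _ => ExecutionBackend::Local,
        }
    }

    /// Describes where the query would be sent. A real engine would
    /// execute it here instead of formatting a string.
    fn describe(&self, sql: &str) -> String {
        match self {
            ExecutionBackend::Local => format!("run locally: {}", sql),
            ExecutionBackend::Distributed { scheduler_addr } => {
                format!("submit to scheduler {}: {}", scheduler_addr, sql)
            }
        }
    }
}
```

For instance, `ExecutionBackend::from_config(Some("10.0.0.1:50050"))` selects the distributed variant, while `from_config(None)` keeps the current single-node path. Adding another distributed engine later would mean adding another variant.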

TODO

  • Add Ballista as a distributed storage and query engine
    • Add arrow-datafusion/ballista to TB #87

Distributing Query

If we use DataFusion Ballista, this feature may be easy to achieve. Not sure who wants to try first :)

build failed with `rustc --explain E0433`

error[E0433]: failed to resolve: could not find `addr_of` in `ptr`
   --> /Users/kaichen/.cargo/registry/src/github.com-1ecc6299db9ec823/anyhow-1.0.40/src/error.rs:606:14
    |
606 |         ptr::addr_of!((*unerased.as_ptr())._object) as *mut E,
    |              ^^^^^^^ could not find `addr_of` in `ptr`

error[E0433]: failed to resolve: could not find `addr_of` in `ptr`
   --> /Users/kaichen/.cargo/registry/src/github.com-1ecc6299db9ec823/anyhow-1.0.40/src/error.rs:647:22
    |
647 |                 ptr::addr_of!((*unerased.as_ptr())._object) as *mut E,
    |                      ^^^^^^^ could not find `addr_of` in `ptr`

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0433`.
error: could not compile `anyhow`

To learn more, run the command again with --verbose.
warning: build failed, waiting for other jobs to finish...
error: build failed

Write Ahead Log

TB already survives application crashes. It would be nice to have a WAL to protect against kernel crashes or sudden machine shutdowns.
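For readers unfamiliar with the idea, a minimal write-ahead log can be sketched as an append-only file of length-prefixed records that is flushed to disk before a write is acknowledged. Everything below is an assumption made for illustration (the `Wal` type, the record layout, the lack of checksums) and is not a description of how TensorBase would implement it.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

/// Append-only log of records stored as [u32 little-endian length][payload].
struct Wal {
    file: File,
}

impl Wal {
    /// Opens the log for appending, creating it if it does not exist.
    fn open(path: &Path) -> io::Result<Wal> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Wal { file })
    }

    /// Appends one record and asks the OS to persist it before returning,
    /// so the record is meant to survive a kernel crash or power loss.
    fn append(&mut self, payload: &[u8]) -> io::Result<()> {
        let len = payload.len() as u32;
        self.file.write_all(&len.to_le_bytes())?;
        self.file.write_all(payload)?;
        self.file.sync_data()
    }
}

/// Reads back all complete records; a torn record at the tail is ignored.
fn replay(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    let mut records = Vec::new();
    let mut pos = 0;
    while pos + 4 <= bytes.len() {
        let len = u32::from_le_bytes([
            bytes[pos], bytes[pos + 1], bytes[pos + 2], bytes[pos + 3],
        ]) as usize;
        pos += 4;
        if pos + len > bytes.len() {
            break; // incomplete final record, e.g. from a crash mid-write
        }
        records.push(bytes[pos..pos + len].to_vec());
        pos += len;
    }
    Ok(records)
}
```

A production log would normally also carry a checksum per record and batch several records per sync, since one `sync_data` call per write is expensive.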

Join a Foundation to Ensure Open Source Continuity

Is it possible to consider joining a foundation to ensure open-source continuity? Since this is AL 2.0, perhaps the ASF may be a good fit if you can be accepted; otherwise there are other foundations, such as the Linux Foundation or the Cloud Native Computing Foundation, which you can approach.

complete main type supports

  • LowCardinality String
  • String/Blob
  • Decimal
  • several other fixed length types (low hanging fruits)
  • Nullable types

[RFC]Support Builtin functions with Datafusion

The easiest way to support builtin functions seems to be by using ScalarUDF or AggregateUDF.

Changes

To support this, the following changes would have to be made:

lightjit::builtins

We have to convert the functions into type datafusion::ScalarUDF.

Also, add a get_udf() function that matches a string to the function.

Should we still have these located here or maybe move them into a different crate?

lang::parse

Add a new function parse_builtins(p). This would be similar to the current parse_tables(p) function, but would look for any builtins and return a HashSet of the ones found.
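The two helpers proposed so far can be sketched without DataFusion to show how they fit together. This is a simplified stand-in under stated assumptions: plain `fn(f64) -> f64` pointers replace datafusion::ScalarUDF, the three registered functions are arbitrary examples, and `parse_builtins` scans the raw SQL string naively (it ignores string literals and whitespace before the parenthesis) instead of walking the parsed tree `p`.

```rust
use std::collections::HashSet;

/// Stand-in for a scalar UDF: a plain function pointer.
/// The real proposal would use datafusion's ScalarUDF instead.
type Builtin = fn(f64) -> f64;

/// Matches a function name (case-insensitively) to its implementation.
fn get_udf(name: &str) -> Option<Builtin> {
    match name.to_ascii_lowercase().as_str() {
        "abs" => Some(f64::abs as Builtin),
        "sqrt" => Some(f64::sqrt as Builtin),
        "floor" => Some(f64::floor as Builtin),
        _ => None,
    }
}

/// Collects the names of known builtins that appear as `name(` in the query.
/// Naive text scan for illustration only.
fn parse_builtins(sql: &str) -> HashSet<String> {
    let mut found = HashSet::new();
    let mut ident = String::new();
    for ch in sql.chars() {
        if ch.is_ascii_alphanumeric() || ch == '_' {
            ident.push(ch);
        } else {
            // An identifier directly followed by '(' is a call candidate.
            if ch == '(' && get_udf(&ident).is_some() {
                found.insert(ident.to_ascii_lowercase());
            }
            ident.clear();
        }
    }
    found
}
```

With these in place, the engine only needs to register the functions named in the returned set, rather than every builtin, before executing a query.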

engine::run()

Add let builtins = parse::parse_builtins(p)?;. We would also have to pass this value to datafusion::run, the same as tabs or cols.

    let (tabs, cols) = parse::parse_tables(p)?;
    let builtins = parse::parse_builtins(p)?; // <--- New
    log::debug!("projections - tabs: {:?}, cols: {:?}", tabs, cols);
    datafusion::run(ms, ps, current_db, raw_query, query_id, tabs, cols, builtins, qs) // <- also pass builtins

engine::datafusion::run()

Before running, we can check whether the SQL uses any builtins. If it does, all we need to do is call ctx.register_udf(builtin) for each one.

Pseudocode

let mut ctx = ExecutionContext::new();
...
// Register only the builtins that the query actually uses.
for f in builtins.drain() {
    let udf = get_udf(f)?;
    ctx.register_udf(udf);
}
let df = ctx.sql(raw_query)?;
...

support float type

Generally, support for any fixed-length type is easy to add. Just pick one currently implemented type as an example :)
