addb-swstarlab / addb

Analytic Distributed DBMS produced by Data Engineering Lab at Yonsei University

License: Apache License 2.0

Makefile 0.28% Shell 1.61% C 82.30% C++ 0.15% Ruby 2.00% Tcl 13.63% Smarty 0.03%

addb's People

Contributors

Stargazers

Watchers

Forkers

hshs1103 lihaolin-datae muyoutu hanseungsung kid-7391 deniskim82 shengminp rlagnlrns dkdl012

addb's Issues

[fpWriteCommand]Suggest processing of empty string is possible

Hello. I am majoring in computer science at YUST and am interested in ADDB.

While reviewing the code that stores data, I found a part that could be supplemented, so I am leaving this note.

The addb_table.c file in the v1.1.0 branch handles only null values.

Since empty strings occur frequently alongside null values, I suggest adding handling for empty strings as well.
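
One way to keep the two cases apart is to write a one-byte tag ahead of the payload. The sketch below is not taken from addb_table.c; the function name encode_field and the tag values are hypothetical and only illustrate the idea of treating NULL and the empty string as distinct values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tags for illustration; not taken from addb_table.c. */
enum field_tag { FIELD_NULL = 0, FIELD_EMPTY = 1, FIELD_VALUE = 2 };

/*
 * Writes a one-byte tag, followed by the value bytes when there are any.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
size_t encode_field(const char *value, unsigned char *out, size_t cap)
{
    if (cap < 1)
        return 0;
    if (value == NULL) {            /* missing value */
        out[0] = FIELD_NULL;
        return 1;
    }
    size_t len = strlen(value);
    if (len == 0) {                 /* present, but empty */
        out[0] = FIELD_EMPTY;
        return 1;
    }
    if (cap < 1 + len)
        return 0;
    out[0] = FIELD_VALUE;
    memcpy(out + 1, value, len);
    return 1 + len;
}
```

With this layout a reader can tell a missing value from an empty one by looking at the first byte alone.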

[ISSUE] Memory Optimization Utilizing ZipList is Required

The default data structure, the Hash type, is not appropriate when the structure holds only a small number of entries.

We need to optimize the data structure to reduce the memory usage of Redis.

Using a ziplist below a predefined threshold is a possible solution.
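
The threshold check could look roughly like the sketch below. The names choose_encoding, ENC_COMPACT and ENC_HASHTABLE are hypothetical. The two limits mirror the defaults of the Redis settings hash-max-ziplist-entries (512) and hash-max-ziplist-value (64); which values suit ADDB is an open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative thresholds; the right values for ADDB are an assumption. */
#define COMPACT_MAX_ENTRIES   512
#define COMPACT_MAX_VALUE_LEN 64

typedef enum { ENC_COMPACT, ENC_HASHTABLE } encoding_t;

/*
 * Picks the compact (ziplist-style) encoding only while both the entry
 * count and the longest value stay at or below their thresholds.
 */
encoding_t choose_encoding(size_t entries, size_t longest_value)
{
    bool small = entries <= COMPACT_MAX_ENTRIES &&
                 longest_value <= COMPACT_MAX_VALUE_LEN;
    return small ? ENC_COMPACT : ENC_HASHTABLE;
}
```

A structure that starts compact would be converted to a hash table once either limit is exceeded.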

Reduce relational model conversion memory overhead

Currently, relational conversion causes significant memory overhead when loading data. Reducing this overhead should improve performance.

Because partitionInfo already holds the column and value information, I suggest taking advantage of it.

Avoid vector iteration in Scan operation

In fpScan, transferring column values in batches would improve scan performance.

In the current version, data is transferred to SparkSQL in row format.

The many iterations this requires generate unnecessary computing overhead.

SparkRedisConnector must also support this modification.
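
The difference between the two layouts can be sketched as follows. This is not the fpScan code: the function name rows_to_columns is hypothetical and the example only handles int values. It shows how a row-major result can be regrouped so that each column's values are contiguous and can be sent as one batch.

```c
#include <stddef.h>

/*
 * Copies a row-major table (n_rows x n_cols) into column-major order.
 * After the call, column c occupies columns[c * n_rows] through
 * columns[c * n_rows + n_rows - 1], so it can be transferred as one batch.
 */
void rows_to_columns(const int *rows, size_t n_rows, size_t n_cols,
                     int *columns)
{
    for (size_t r = 0; r < n_rows; r++)
        for (size_t c = 0; c < n_cols; c++)
            columns[c * n_rows + r] = rows[r * n_cols + c];
}
```

The receiver then iterates once per column batch instead of once per row.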

[ISSUE] Increases serialization performance by using google/protobuf

Naive vector serialization (stringify) is extremely slow and produces large output.
ex) Vector = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
--> Serialization result = "V:{T:int:N:10}:D:[1:2:3:4:5:6:7:8:9:10]"
It also cannot handle values that contain the '[', ':', or ']' characters, because they are used as tokens in the serialized string. (This is a big problem.)

We need a better serialization method to improve serialization performance. I found a good serialization library, "protocol buffers", developed by Google.
google/protobuf
github
protobuf C Implementation

protobuf performance link
According to the analysis above, protocol buffers have the best serialization performance and a small data size.

So, using "protocol buffers" is a possible solution.
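
To give a feel for the size difference, the sketch below hand-encodes a vector the way protocol buffers lay out a packed repeated field: one tag byte, a length, then base-128 varints. It does not use the protobuf library. Field number 1 and the function names are assumptions made for illustration, and only unsigned 32-bit values are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes v as a base-128 varint; returns the bytes written (1 to 5). */
static size_t put_varint(uint32_t v, uint8_t *out)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);  /* low 7 bits, continuation bit set */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;               /* final byte, continuation bit clear */
    return n;
}

/*
 * Encodes values as a packed repeated field with field number 1
 * (an assumed number). Returns the total bytes written, or 0 if
 * the output buffer is too small.
 */
size_t encode_packed_u32(const uint32_t *values, size_t count,
                         uint8_t *out, size_t cap)
{
    uint8_t tmp[5];
    size_t payload = 0;
    for (size_t i = 0; i < count; i++)
        payload += put_varint(values[i], tmp);   /* measure the payload first */

    size_t header = 1 + put_varint((uint32_t)payload, tmp);
    if (cap < header + payload)
        return 0;

    size_t pos = 0;
    out[pos++] = (1 << 3) | 2;   /* field 1, wire type 2 (length-delimited) */
    pos += put_varint((uint32_t)payload, out + pos);
    for (size_t i = 0; i < count; i++)
        pos += put_varint(values[i], out + pos);
    return pos;
}
```

For the ten-element example vector this produces 12 bytes, and because the payload is length-prefixed rather than delimited, no character needs to be reserved as a token.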

RocksDB configuration modification error

To modify the RocksDB configuration, we should add knobs in addb/deps/rocksdb/examples/simple_example.cc and then compile that code.

However, compiling the file fails with errors related to libraries such as zlib and bzip2.
This can be solved by modifying addb/deps/rocksdb/Makefile, for example by changing ZLIB_VER to 1.2.12 or redefining the BZIP2_DOWNLOAD_BASE path.

The revised Makefile will be uploaded soon.
