addb-swstarlab / addb
Analytic Distributed DBMS produced by Data Engineering Lab at Yonsei University
License: Apache License 2.0
Hello. I am majoring in computer science at YUST and am interested in ADDB.
While reviewing the code that stores data, I found some parts that still need to be supplemented.
The addb_table.c file in the v1.1.0 branch handles only null values.
Since empty strings also occur frequently, not just null values, we suggest adding handling for empty strings as well.
The default data structure, the Hash type, is not well suited to structures that hold only a small number of entries.
We need to optimize the data structure to reduce Redis's memory usage.
One possible solution is to use a Ziplist while the number of entries stays below a predefined threshold.
Currently, loading data incurs significant memory overhead because of relational conversions. Reducing this overhead should improve performance.
Because partitionInfo already holds information about the columns and values, I suggest ways to take advantage of it.
In fpScan, transferring column values in batches should improve scan performance.
In the current version, data is transferred to SparkSQL in row format.
The resulting per-row iterations generate unnecessary computing overhead.
SparkRedisConnector must also be modified to support this change.
Naive vector serialization (stringify) is extremely slow and produces large output.
For example, the vector [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
is serialized as "V:{T:int:N:10}:D:[1:2:3:4:5:6:7:8:9:10]".
In addition, because '[', ':', and ']' are used as tokens in the string, values that contain these characters cannot be handled correctly. (This is a big problem.)
We need a better serialization method to optimize serialization performance. I found a good serialization library developed by Google: Protocol Buffers.
google/protobuf (GitHub)
protobuf C implementation
protobuf performance link
According to the analysis above, Protocol Buffers has the best serialization performance and a small data size.
So, using Protocol Buffers is a possible solution.
To modify the RocksDB configuration, we add knobs in addb/deps/rocksdb/examples/simple_example.cc and then compile that file.
However, compiling the file produced errors related to libraries such as zlib and bzip2.
These can be fixed by modifying addb/deps/rocksdb/Makefile, for example by changing ZLIB_VER to 1.2.12 or redefining the BZIP2_DOWNLOAD_BASE path.
The revised Makefile version will be updated soon.