
Comments (17)

dmitry-salin commented on May 13, 2024

There is an interesting article from the creator of the VictoriaMetrics database. ClickHouse added WAL support as part of the polymorphic parts work, but it is a configurable parameter.

from tensorbase.

jinmingjian commented on May 13, 2024

@dmitry-salin thanks for your references! It is interesting to see that the WAL can spark this much discussion. TB does not use an LSM. I would like to leave some thoughts after reading your references. It is a great opportunity to share some ideas with you and our community.


dmitry-salin commented on May 13, 2024

Thanks @jinmingjian! I think a good option might be the ability to change ingestion settings per database instance or table. So if we want the best possible safety guarantees and are willing to pay for them with ingestion performance, every incoming data chunk can be persisted to disk. I also looked at HSE, one of the newly created storage engines based on trie-like data structures. It also does not use a WAL, but has API calls for forcing cached data to media (c0 data).

I used to play with ClickHouse and ported it to Windows as part of a side project. It is not a classic LSM; the merge happens mainly in the storage part, but it gives excellent compression rates and query performance (when reading large ranges of data from disk). Can you point me to the TB part that implements storage?


jinmingjian commented on May 13, 2024

I have quickly reviewed your references:

  1. The VictoriaMetrics article somewhat exaggerates the problem of the WAL. Modern SSDs can absolutely afford WAL-style writes.
  2. For an OLAP system, we expect data to arrive in batches. Try inserting rows one by one into ClickHouse and watch the performance... So if we optimize for batch-style writes, the WAL implementation is quite simple: a direct I/O (DIO) write for every incoming batch. For small single-row inserts this is slow (and may waste a little more space), but making the WAL fast would not save the system from poor performance anyway, so optimizing that case is not worth it.
  3. One of my more interesting ideas about the WAL: in a complex system we have several kinds of logs. Can we piggyback the WAL onto another log? :)
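The per-batch write in point 2 can be sketched as follows. This is a minimal illustration, not TB's actual code: the file name, the length-prefix framing, and `append_batch` are all assumptions, and buffered I/O plus `sync_data` stands in for real direct I/O, which needs platform-specific open flags.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Append one incoming batch to the WAL file and force it to media
/// before the write is acknowledged. Hypothetical sketch only.
fn append_batch(wal_path: &Path, batch: &[u8]) -> std::io::Result<()> {
    let mut wal = OpenOptions::new().create(true).append(true).open(wal_path)?;
    // A length prefix lets recovery detect a torn final record.
    wal.write_all(&(batch.len() as u64).to_le_bytes())?;
    wal.write_all(batch)?;
    // One sync per batch: cheap when batches are large, which is the
    // whole point of optimizing for batch-style ingestion.
    wal.sync_data()
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("tb_wal_sketch.log");
    let _ = std::fs::remove_file(&path);
    append_batch(&path, b"row1,row2,row3")?;
    println!("wal bytes: {}", std::fs::metadata(&path)?.len());
    std::fs::remove_file(&path)
}
```

With large batches, the cost of one sync is amortized over many rows; with single-row inserts the same code pays a full device flush per row, which is exactly the slow case described above.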


dmitry-salin commented on May 13, 2024
  1. Maybe you are right. I think the main point of the article is that the WAL is a separate data structure, which implies a kind of write amplification.
  2. Yes, ClickHouse is also optimized for relatively large batches, but in this default scenario it does not use a WAL: the compressed in-memory block is simply written to disk with DIO as-is.


jinmingjian commented on May 13, 2024

> So if we want the best possible safety guarantees and are willing to pay for them with ingestion performance, every incoming data chunk can be persisted to disk.

Cool! We seem to be on the same page. 😄 I will take a look at HSE.

> It is not a classic LSM; the merge happens mainly in the storage part, but it gives excellent compression rates and query performance.

Yes. Your point that "the merge happens mainly in storage" is understandable. This is also what makes the OLTP-style ingestion (one row at a time) I mentioned prohibitive for CH; that logic belongs to OLAP. But CH's delayed merging worsens the already heavy data movement on external storage. Under heavy big-data ingestion (which should be common for big data), I have observed severe performance degradation.

> Can you point me to the TB part that implements storage?

Thanks for your interest. I should roll out more reading material for the community, but time is a bit tight. I hope to make it available soon. Not sure if people in the community are interested in helping :)

Let me make a quick note before a more detailed explanation:

TB uses a data structure that we call a "Partition Tree" (currently we use sled as an implementation; the API is in the PartStore). The storage layer is thin for now and embedded in runtime::write. The data is written directly to the partition file. This keeps append-only write performance while avoiding the subsequent compaction overhead of an LSM structure.
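The append-only partition write described above might be sketched like this. All names here, including `PartStoreSketch` and the file layout, are illustrative and not TB's real `PartStore` API:

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::PathBuf;

/// Minimal sketch of the "Partition Tree" idea: each partition key maps
/// to its own append-only file, so ingestion is a pure append with no
/// LSM-style compaction afterwards.
struct PartStoreSketch {
    root: PathBuf,
}

impl PartStoreSketch {
    /// Append a batch to its partition file; return the new file length.
    fn append(&self, partition: u64, data: &[u8]) -> std::io::Result<u64> {
        let path = self.root.join(format!("part_{partition}.dat"));
        let mut f = OpenOptions::new().create(true).append(true).open(&path)?;
        f.write_all(data)?;
        Ok(std::fs::metadata(&path)?.len())
    }
}

fn main() -> std::io::Result<()> {
    let store = PartStoreSketch { root: std::env::temp_dir() };
    let file = std::env::temp_dir().join("part_20210513.dat");
    let _ = std::fs::remove_file(&file);
    store.append(20210513, b"batch-a")?;
    let len = store.append(20210513, b"batch-b")?;
    println!("partition size: {len}");
    std::fs::remove_file(&file)
}
```

Because each batch lands as a pure append in its partition file, there is no LevelDB-style background compaction to pay for later, which matches the trade-off described above.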


jinmingjian commented on May 13, 2024
> Yes, ClickHouse is also optimized for relatively large batches, but in this default scenario it does not use a WAL: the compressed in-memory block is simply written to disk with DIO as-is.

I have not investigated the CH implementation thoroughly. But from simple observation, this seemingly does not hold, because common CH storage files are not aligned to the block size (512 B or 4 KiB), and batches smaller than the block size obviously cannot be written with DIO. DIO is not inherently fast; its real value is the full control it gives you. So it is not reasonable to use DIO blindly.
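The alignment constraint mentioned here is easy to state in code: O_DIRECT requires the transfer size (as well as the buffer address and file offset) to be a multiple of the device's logical block size, so any write that is not block-aligned must be padded up. A small sketch, where the 4 KiB block size is an assumption (real code should query the device):

```rust
/// Assumed logical block size for the sketch; query the device in real code.
const BLOCK: usize = 4096;

/// Round a write size up to the next block boundary, as O_DIRECT demands.
fn padded_len(len: usize) -> usize {
    (len + BLOCK - 1) / BLOCK * BLOCK
}

fn main() {
    // A 100-byte batch cannot be DIO-written as-is; it costs a full block:
    assert_eq!(padded_len(100), 4096);
    assert_eq!(padded_len(4096), 4096); // already aligned
    assert_eq!(padded_len(4097), 8192);
    println!("ok");
}
```

The rounding also shows the space overhead for small batches: a one-row insert still occupies a whole block, which is part of why blind DIO is a poor fit for small writes.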


GrapeBaBa commented on May 13, 2024

Any progress on this task?


jinmingjian commented on May 13, 2024

@GrapeBaBa this is not too hard, but no one is working on it yet. Personally, I hope it can be resolved soon, or at least this summer. Are you interested in it? :)


GrapeBaBa commented on May 13, 2024

> @GrapeBaBa this is not too hard, but no one is working on it yet. Personally, I hope it can be resolved soon, or at least this summer. Are you interested in it? :)

Have you done any design work for the WAL, such as the WAL record format, the WAL file format, storage (e.g. mmap, buffered I/O), or concurrency control? There are also many systems that can serve as references, such as the MySQL/PostgreSQL redo logs, the LevelDB WAL, the etcd WAL, and jraft log storage.
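For reference, a WAL record layout in the spirit of the LevelDB/etcd formats mentioned here typically carries a length and a checksum, so that replay can stop cleanly at a torn tail. A hypothetical sketch, where the trivial additive checksum stands in for a real CRC32:

```rust
/// Record layout (illustrative, not a proposal for TB's actual format):
///   | u32 payload_len | u32 checksum | payload bytes |
fn encode_record(payload: &[u8]) -> Vec<u8> {
    let checksum = payload.iter().fold(0u32, |a, &b| a.wrapping_add(b as u32));
    let mut rec = Vec::with_capacity(8 + payload.len());
    rec.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    rec.extend_from_slice(&checksum.to_le_bytes());
    rec.extend_from_slice(payload);
    rec
}

/// Decode one record from `buf`; `None` means a torn or corrupt tail,
/// which is where crash recovery stops replaying.
fn decode_record(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 8 {
        return None;
    }
    let len = u32::from_le_bytes(buf[0..4].try_into().unwrap()) as usize;
    let sum = u32::from_le_bytes(buf[4..8].try_into().unwrap());
    let payload = buf.get(8..8 + len)?;
    let actual = payload.iter().fold(0u32, |a, &b| a.wrapping_add(b as u32));
    if actual != sum {
        return None;
    }
    Some((payload, 8 + len))
}

fn main() {
    let rec = encode_record(b"insert batch #1");
    let (payload, consumed) = decode_record(&rec).unwrap();
    assert_eq!(payload, b"insert batch #1");
    assert_eq!(consumed, rec.len());
    // A truncated record (torn write) is rejected:
    assert!(decode_record(&rec[..rec.len() - 1]).is_none());
    println!("ok");
}
```

The file format is then just a sequence of such records; mmap vs. buffered I/O and the concurrency story are orthogonal choices on top of this framing.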


jinmingjian commented on May 13, 2024

Great! Please share your design. Yes, I have some ideas. We will not couple the current WAL with other techniques; it is just a pure local WAL for machine-level crashes. Wait a while and let me upgrade this issue to an RFC with more details.


GrapeBaBa commented on May 13, 2024

> Great! Please share your design. Yes, I have some ideas. We will not couple the current WAL with other techniques; it is just a pure local WAL for machine-level crashes. Wait a while and let me upgrade this issue to an RFC with more details.

That's great.


nautaa commented on May 13, 2024

The data in TensorBase is flushed directly to disk; is the purpose of the WAL to ensure the atomicity of data writes?
Would it be possible to first write a block to a log area on disk, and then write the block to the corresponding partition? That would be similar to the MySQL doublewrite buffer.
@jinmingjian


jinmingjian commented on May 13, 2024

@nautaa yeah. "Double write" is easy, but you should carefully implement the recovery process, reusing some marker in the meta store.
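A double-write scheme with that recovery step could be sketched as below. Everything here is illustrative: the surviving log file itself stands in for the "marker in the meta store", and real code must also fsync after each step for the durability argument to hold.

```rust
use std::fs;
use std::path::Path;

/// Double write: persist the block to the log area first, then write it
/// into the partition, and only then drop the log copy (the commit marker).
/// Sketch only; real code must sync after steps 1 and 2.
fn double_write(log: &Path, part: &Path, block: &[u8]) -> std::io::Result<()> {
    fs::write(log, block)?; // 1. durable copy in the log area
    fs::write(part, block)?; // 2. real write to the partition
    fs::remove_file(log) // 3. commit: the log copy is no longer needed
}

/// Crash recovery: a surviving log copy means step 2 may be torn,
/// so replay it into the partition before serving reads.
fn recover(log: &Path, part: &Path) -> std::io::Result<()> {
    if log.exists() {
        fs::copy(log, part)?;
        fs::remove_file(log)?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    let (log, part) = (dir.join("dw.log"), dir.join("dw.part"));
    double_write(&log, &part, b"block-1")?;
    assert_eq!(fs::read(&part)?, b"block-1");
    // Simulate a crash between steps 1 and 2: only the log copy survives.
    fs::write(&log, b"block-2")?;
    recover(&log, &part)?;
    assert_eq!(fs::read(&part)?, b"block-2");
    fs::remove_file(&part)
}
```

The partition write can then be torn with impunity, because recovery always has an intact copy to replay, which is exactly the guarantee the MySQL doublewrite buffer provides.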


GrapeBaBa commented on May 13, 2024

What is the progress on the RFC? @jinmingjian


jinmingjian commented on May 13, 2024

@GrapeBaBa thanks for your concern. After some thought, I decided this issue is a little tricky, so I am postponing it until after #147. The good news is that summer 2021 is coming... 😄


jinmingjian commented on May 13, 2024

It seems that current mainstream WALs still use periodic flushing, which does not give a 100% guarantee. But our TB design guarantees durability 100%, even for machine crashes. We may check in a simple version first.
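The contrast between per-write syncing and periodic flushing can be expressed as a hypothetical config knob. All names here are illustrative, not TB's API: syncing on every append survives machine crashes, while periodic flushing only guarantees the already-synced prefix.

```rust
use std::fs::File;
use std::io::Write;

/// The two durability policies contrasted above (illustrative sketch).
enum SyncPolicy {
    EveryWrite,
    Periodic { every_n: usize },
}

struct WalWriter {
    file: File,
    policy: SyncPolicy,
    pending: usize, // records written but not yet forced to media
}

impl WalWriter {
    /// Append a record; the returned bool is true iff this record is
    /// now crash-durable (forced to media).
    fn append(&mut self, rec: &[u8]) -> std::io::Result<bool> {
        self.file.write_all(rec)?;
        self.pending += 1;
        let must_sync = match self.policy {
            SyncPolicy::EveryWrite => true,
            SyncPolicy::Periodic { every_n } => self.pending >= every_n,
        };
        if must_sync {
            self.file.sync_data()?;
            self.pending = 0;
        }
        Ok(must_sync)
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("tb_policy_sketch.log");
    let mut w = WalWriter {
        file: File::create(&path)?,
        policy: SyncPolicy::Periodic { every_n: 3 },
        pending: 0,
    };
    // With periodic flushing, only every third record is guaranteed on media;
    // a machine crash after the fourth append can lose it.
    let synced: Vec<bool> = (0..4).map(|_| w.append(b"rec").unwrap()).collect();
    assert_eq!(synced, [false, false, true, false]);
    std::fs::remove_file(&path)
}
```

A `SyncPolicy::EveryWrite` writer returns true for every append; that is the 100% guarantee, paid for with one device flush per write.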

