
mini-lsm's Introduction


LSM in a Week


Build a simple key-value storage engine in a week, then extend your LSM engine in the second and third weeks!

The Mini-LSM book is available at https://skyzh.github.io/mini-lsm. You may follow this guide to implement the Mini-LSM storage engine. The tutorial has 3 weeks (parts), each consisting of 7 days (chapters).

Community

You may join skyzh's Discord server and study with the mini-lsm community.

Join skyzh's Discord Server

Add Your Solution

If you have finished at least one full week of this tutorial, you can add your solution to the community solution list in SOLUTIONS.md. Submit a pull request, and we might do a quick review of your code in return for your hard work.

Development

For Students

You should modify the code in the mini-lsm-starter directory.

cargo x install-tools
cargo x copy-test --week 1 --day 1
cargo x scheck
cargo run --bin mini-lsm-cli
cargo run --bin compaction-simulator

For Course Developers

You should modify mini-lsm and mini-lsm-mvcc.

cargo x install-tools
cargo x check
cargo x book

If you change the public API in the reference solution, you might also need to synchronize it to the starter crate. To do this, use cargo x sync.

Code Structure

  • mini-lsm: the final solution code for weeks 1 and 2
  • mini-lsm-mvcc: the final solution code for week 3 MVCC
  • mini-lsm-starter: the starter code
  • mini-lsm-book: the tutorial

We have another repo, mini-lsm-solution-checkpoint, at https://github.com/skyzh/mini-lsm-solution-checkpoint. In that repo, each commit corresponds to a chapter of the tutorial. We do not update the solution checkpoint very often.

Demo

You can run the reference solution by yourself to gain an overview of the system before you start.

cargo run --bin mini-lsm-cli-ref
cargo run --bin mini-lsm-cli-mvcc-ref

And we have a compaction simulator to experiment with your compaction algorithm implementation:

cargo run --bin compaction-simulator-ref
cargo run --bin compaction-simulator-mvcc-ref

Tutorial Structure

We have 3 weeks + 1 extra week (in progress) for this tutorial.

  • Week 1: Storage Format + Engine Skeleton
  • Week 2: Compaction and Persistence
  • Week 3: Multi-Version Concurrency Control
  • The Extra Week / Rest of Your Life: Optimizations (unlikely to be available in 2024...)

Tutorial Roadmap

Week + Chapter Topic
1.1 Memtable
1.2 Merge Iterator
1.3 Block
1.4 Sorted String Table (SST)
1.5 Read Path
1.6 Write Path
1.7 SST Optimizations: Prefix Key Encoding + Bloom Filters
2.1 Compaction Implementation
2.2 Simple Compaction Strategy (Traditional Leveled Compaction)
2.3 Tiered Compaction Strategy (RocksDB Universal Compaction)
2.4 Leveled Compaction Strategy (RocksDB Leveled Compaction)
2.5 Manifest
2.6 Write-Ahead Log (WAL)
2.7 Batch Write and Checksums
3.1 Timestamp Key Encoding
3.2 Snapshot Read - Memtables and Timestamps
3.3 Snapshot Read - Transaction API
3.4 Watermark and Garbage Collection
3.5 Transactions and Optimistic Concurrency Control
3.6 Serializable Snapshot Isolation
3.7 Compaction Filters

License

The Mini-LSM starter code and solution are under Apache 2.0 license. The author reserves the full copyright of the tutorial materials (markdown files and figures).

mini-lsm's People

Contributors

ben1009, caicancai, cypppper, dimbtp, el-even-11, felixonmars, foreverhighness, gleiphir2769, husharp, lacop, leiysky, letterbeezps, mahinshaw, pinelliac, pjzhong, ppdogg, redixhumayun, skyzh, ss18, stopire, xiaguan, xxchan, xzhseh, yangchenye323, yongxin-hu, yyin-dev, zeng1998, zxch3n


mini-lsm's Issues

Should function signatures be consistent between mini-lsm-starter and mini-lsm?

Hi, I noticed that some function signatures are different between mini-lsm-starter and mini-lsm.

For example, decode_block_meta in mini-lsm-starter

/// Decode block meta from a buffer.
pub fn decode_block_meta(buf: impl Buf) -> Vec<BlockMeta> {
    unimplemented!()
}

decode_block_meta in mini-lsm

pub fn decode_block_meta(mut buf: &[u8]) -> Result<Vec<BlockMeta>> {
    let mut block_meta = Vec::new();
    let num = buf.get_u32() as usize;

According to the commit history, it seems that decode_block_meta in mini-lsm was modified later, causing the function signature to change. I found many similar situations while reading the code.

Does it need to be fixed? If it is, let me know and I can help to fix them.

no such command: `nextest`

I got an error when using cargo x scheck to check the style and run all test cases, as described in the tutorial:

cargo fmt
cargo check
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
cargo nextest run
error: no such command: `nextest`

        Did you mean `test`?

        View all installed commands with `cargo --list`
Error: command ["cargo", "nextest", "run"] exited with code 101

After I replaced nextest with test in test(), only the style checks ran; no test cases were executed.

[Doc] Any plans to support a Chinese version of the tutorial?

Hi, I think this is a very good repository for understanding LSM and Rust. But the tutorial does not seem to have a Chinese version, which makes it difficult for developers who are not native English speakers.

Are there any plans to support a Chinese version of the tutorial? If not, I think I can help with it.

Leveled compaction crashes when recovering from manifest

The LeveledCompactionController::apply_compaction_result method tries to sort the merged SSTs by key inside the method, which causes a problem when the method is called during manifest recovery, because at that point no actual SSTs are loaded.

Tracking: Proof Reading, v1 release

  • Preface
  • Overview + Setup
  • Week 1 Overview
  • 1.1
  • 1.2
  • 1.3
  • 1.4
  • 1.5
  • 1.6
  • 1.7
  • Week 2 Overview
  • 2.1
  • 2.2
  • 2.3
  • 2.4
  • 2.5
  • 2.6
  • 2.7
  • Week 3 Overview
  • 3.1
  • 3.2
  • 3.3
  • 3.4
  • 3.5
  • 3.6
  • 3.7

Why remove the `memtable` from `imm_memtables` in `LsmStorage.sync()`?

// Add the flushed L0 table to the list.
{
    let mut guard = self.inner.write();
    let mut snapshot = guard.as_ref().clone();
    // Remove the memtable from the immutable memtables.
    snapshot.imm_memtables.pop();
    // Add L0 table
    snapshot.l0_sstables.push(sst);
    // Update SST ID
    snapshot.next_sst_id += 1;
    // Update the snapshot.
    *guard = Arc::new(snapshot);
}

In sync() we push the memtable to imm_memtables first, but remove it when flushing to an SSTable. What is the reason for removing the memtable from imm_memtables? If we did not remove it, we could still read key-value pairs from imm_memtables, which live in memory.

And what is the role of imm_memtables in LsmStorage? It's unclear to me.

Feedback after completing mini-lsm

First of all, thank you for such a great tutorial on the LSM-tree storage engine.

After finishing the 3-week project, I have a lot to share, including some questions and suggestions.

1. StorageIterator specification

Is there an implicit StorageIterator specification?

In the week 1 project, there are several iterators that implement StorageIterator.

As I implemented these iterators, I often asked myself, "Have I implemented these iterators in a coherent way?"

It turns out that I wrote the same assertions again and again, so I feel there is an implicit specification that I should apply.

The spec I summarized is listed below:

  • fn key(&self) -> KeyType<'_>
    • This function can be called iff the iterator is valid; calling it on an invalid iterator may panic.
    • The return value must be a valid key, which means a non-empty key.
  • fn value(&self) -> &[u8]
    • This function can be called iff the iterator is valid; calling it on an invalid iterator may panic.
  • fn is_valid(&self) -> bool
    • This function returns false iff the underlying storage cannot produce a new element.
    • For FuseIterator, it also returns false after next returns an error.
  • fn num_active_iterators(&self) -> usize
    • For an invalid iterator, it should return 0.
    • For a valid iterator, it should not return 0.
    • For FuseIterator, it should not return 0 while the iterator is valid; when the iterator is invalid, the return value is unspecified.
  • fn next(&mut self) -> Result<()>
    • For an invalid iterator, it is a no-op.
    • For a valid iterator, if it returns Ok(()):
      • If the underlying storage cannot produce a new element, the iterator becomes invalid.
      • Otherwise, it stores the new element.
      • The key of the new element must be strictly greater than the old key.
    • For a valid iterator, if it returns Err(e):
      • The key, value, and is_valid calls should return the same values as before, i.e. act like a no-op.
      • A compound iterator like MergeIterator, SstConcatIterator, or TwoMergeIterator should remove the underlying iterator that caused the error. (I'm not sure whether this is sound for SstConcatIterator and TwoMergeIterator.)
      • For FuseIterator, it should become an invalid iterator.

I'm not sure if it covers all situations.
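The fused-iterator rules above can be sketched as a wrapper type. This is a hypothetical, simplified version: the `StorageIterator` trait, its `String` error type, and the `VecIter` mock are stand-ins for illustration, not the repo's actual definitions.

```rust
// Simplified stand-in for the project's iterator trait.
trait StorageIterator {
    fn key(&self) -> &[u8];
    fn value(&self) -> &[u8];
    fn is_valid(&self) -> bool;
    fn next(&mut self) -> Result<(), String>;
}

/// Enforces the "fused" rules: key()/value() panic on an invalid iterator,
/// next() on an invalid iterator is a no-op, and an error from next()
/// permanently invalidates the iterator.
struct FusedIterator<I: StorageIterator> {
    inner: I,
    errored: bool,
}

impl<I: StorageIterator> FusedIterator<I> {
    fn new(inner: I) -> Self {
        Self { inner, errored: false }
    }
}

impl<I: StorageIterator> StorageIterator for FusedIterator<I> {
    fn key(&self) -> &[u8] {
        assert!(self.is_valid(), "key() on an invalid iterator");
        self.inner.key()
    }
    fn value(&self) -> &[u8] {
        assert!(self.is_valid(), "value() on an invalid iterator");
        self.inner.value()
    }
    fn is_valid(&self) -> bool {
        !self.errored && self.inner.is_valid()
    }
    fn next(&mut self) -> Result<(), String> {
        if !self.is_valid() {
            return Ok(()); // no-op per the spec
        }
        if let Err(e) = self.inner.next() {
            self.errored = true; // fused: invalid forever
            return Err(e);
        }
        Ok(())
    }
}

/// Toy iterator over an in-memory vector that can simulate an I/O error.
struct VecIter {
    entries: Vec<(Vec<u8>, Vec<u8>)>,
    pos: usize,
    fail_on_next: bool,
}

impl StorageIterator for VecIter {
    fn key(&self) -> &[u8] { &self.entries[self.pos].0 }
    fn value(&self) -> &[u8] { &self.entries[self.pos].1 }
    fn is_valid(&self) -> bool { self.pos < self.entries.len() }
    fn next(&mut self) -> Result<(), String> {
        if self.fail_on_next {
            return Err("simulated I/O error".to_string());
        }
        self.pos += 1;
        Ok(())
    }
}

fn main() {
    let inner = VecIter {
        entries: vec![(b"a".to_vec(), b"1".to_vec()), (b"b".to_vec(), b"2".to_vec())],
        pos: 0,
        fail_on_next: true,
    };
    let mut it = FusedIterator::new(inner);
    assert_eq!(it.key(), b"a");
    assert!(it.next().is_err()); // the error fuses the iterator...
    assert!(!it.is_valid());     // ...so it is now invalid,
    assert!(it.next().is_ok());  // and further next() calls are no-ops.
}
```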

2. Inconsistent function signatures

As mentioned in #72, there is an inconsistent function signature for decode_block_meta.

/// Decode block meta from a buffer.
pub fn decode_block_meta(buf: impl Buf) -> Vec<BlockMeta> {

/// Decode block meta from a buffer.
pub fn decode_block_meta(mut buf: &[u8]) -> Result<Vec<BlockMeta>> {

Until week 2 day 7 the signature is fine, but it turns out not to be sufficient for the checksum functionality.

  • The original signature implies that this function is infallible, but checksum verification can fail, so the return type should be wrapped in Result.
  • &[u8] is more convenient than impl Buf when implementing the checksum functionality.

3. Keys are always non-empty

It seems that an empty key is considered invalid, so we could enforce this by adding a ParsedKey type; the nutype crate could help achieve this goal.

But it may not be necessary for an educational project, since using an empty key is more convenient than an Option wrapper.
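A minimal hand-rolled sketch of such a ParsedKey newtype (the nutype crate would generate something similar; the names here are illustrative, not from the repo):

```rust
/// A key type that cannot be constructed empty, so "key is non-empty"
/// becomes a type-level invariant instead of scattered assertions.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct ParsedKey(Vec<u8>);

impl ParsedKey {
    /// The only construction site rejects the empty key.
    pub fn new(raw: Vec<u8>) -> Option<Self> {
        if raw.is_empty() {
            None
        } else {
            Some(Self(raw))
        }
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

fn main() {
    // Empty keys are unrepresentable by construction.
    assert!(ParsedKey::new(Vec::new()).is_none());
    let key = ParsedKey::new(b"user:1".to_vec()).unwrap();
    assert_eq!(key.as_bytes(), b"user:1");
}
```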

4. apply_result does not document what it should return

/// Apply the compaction result.
///
/// The compactor will call this function with the compaction task and the list of SST ids generated. This function applies the
/// result and generates a new LSM state. The functions should only change `l0_sstables` and `levels` without changing memtables
/// and `sstables` hash map. Though there should only be one thread running compaction jobs, you should think about the case
/// where an L0 SST gets flushed while the compactor generates new SSTs, and with that in mind, you should do some sanity checks
/// in your implementation.
pub fn apply_compaction_result(

The doc comment points out that the LsmStorageState should be the new state, but it does not say what the Vec<usize> means. Only after examining the solution code did I figure out that it holds the SST ids that need to be removed.

5. Seedable leveled compaction simulator

The leveled compaction simulator uses rand to generate the key ranges, so each run produces different output, making it difficult to verify that the output is identical to the reference solution with command-line tools such as diff or cmp.

One solution is to add a seed flag to the simulator and use rand::SeedableRng to generate the random numbers.

6. println!() for logging

In the reference solution, the implementation of apply_result contains some println!() calls for logging, which also makes it difficult to verify that the output is identical to the reference solution with tools such as diff or cmp.

I think they should be replaced with eprintln!(), which writes to stderr and does not lose the logging functionality.

7. Encourage adding tests

In a Rust project, adding tests is pretty easy, and adding tests is one of the easiest ways to contribute to a repository.

For example, when implementing the manifest functionality, it is good to try out the serde_json library first, so I quickly wrote a test inside manifest.rs:

#[cfg(test)]
#[test]
fn test_record_serde() {
    let record1 = ManifestRecord::Flush(1);
    let json = serde_json::to_vec(&record1).unwrap();
    let record2 = serde_json::from_slice(&json).unwrap();
    assert_eq!(record1, record2);
}

Most IDEs can run Rust tests directly, unlike C/C++ or Java, where you need a test framework or a main function to run even a single simple test.

8. Extract the apply-result logic behind a dedicated abstraction

When implementing the apply_result function, I used many Vec methods in my solution, e.g. Vec::splice and Vec::drain. But the levels representation Vec<(usize, Vec<usize>)> seems inefficient for compaction.

We could generalize the levels implementation: create a type alias type Levels = Vec<(usize, Vec<usize>)>;, define a LevelsImpl trait, and implement it for Vec<(usize, Vec<usize>)>. That would give us the ability to easily switch to another implementation.

But it is of low priority, since this project is just for educational purposes.
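For what it's worth, the trait indirection could look roughly like this; the method names are made up for illustration and are not taken from the repo:

```rust
/// Concrete layout: a list of (level id, SST ids in that level).
type Levels = Vec<(usize, Vec<usize>)>;

/// Hypothetical abstraction over the levels layout, so a different
/// representation could be swapped in without touching the callers.
trait LevelsImpl {
    fn sst_ids(&self, level: usize) -> &[usize];
    fn replace_level(&mut self, level: usize, new_ids: Vec<usize>);
}

impl LevelsImpl for Levels {
    fn sst_ids(&self, level: usize) -> &[usize] {
        &self[level].1
    }
    fn replace_level(&mut self, level: usize, new_ids: Vec<usize>) {
        self[level].1 = new_ids;
    }
}

fn main() {
    let mut levels: Levels = vec![(1, vec![10, 11]), (2, vec![20])];
    // A compaction rewrote the first level; callers only see the trait.
    levels.replace_level(0, vec![12]);
    assert_eq!(levels.sst_ids(0), &[12]);
    assert_eq!(levels.sst_ids(1), &[20]);
}
```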

9. Immutable memtables are not immutable

Due to the interior mutability of MemTable, there is no way to forbid programmers from modifying the immutable memtables in Vec<Arc<MemTable>>.

This may (with low probability) cause a nasty bug, though it can be avoided with a little care.

The solution is quite simple: use the newtype pattern and expose only the read-only interface.
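A sketch of the newtype idea, assuming a simplified MemTable with interior mutability (the real one wraps a concurrent skiplist; the names here are illustrative):

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Simplified stand-in for a memtable with interior mutability:
/// `put` takes `&self`, so any holder of an `Arc<MemTable>` can write.
struct MemTable {
    map: Mutex<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl MemTable {
    fn put(&self, key: &[u8], value: &[u8]) {
        self.map.lock().unwrap().insert(key.to_vec(), value.to_vec());
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.lock().unwrap().get(key).cloned()
    }
}

/// Newtype wrapper exposing only the read-only interface, so frozen
/// memtables cannot be written to by accident.
struct ImmutableMemTable(Arc<MemTable>);

impl ImmutableMemTable {
    fn freeze(inner: Arc<MemTable>) -> Self {
        Self(inner)
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.get(key)
    }
    // Deliberately no `put`: a write through the frozen handle
    // simply does not type-check.
}

fn main() {
    let mt = Arc::new(MemTable { map: Mutex::new(BTreeMap::new()) });
    mt.put(b"k", b"v");
    let frozen = ImmutableMemTable::freeze(mt);
    assert_eq!(frozen.get(b"k"), Some(b"v".to_vec()));
    // frozen.put(...) would be a compile error: the method does not exist.
}
```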

10. Migration to the MVCC version is considered painful

Refactoring the code to use the key+ts representation is dirty, frustrating work.

The code that needs to change is scattered across many files.

Asking students to write unsafe code in Rust is awkward.

Learning on code that does not compile is quite inefficient.

11. Use Arc::new_cyclic to store a weak pointer in LsmMvccInner

pub fn new_txn(&self, inner: Arc<LsmStorageInner>, serializable: bool) -> Arc<Transaction>

Strictly speaking, LsmMvccInner::new_txn should only be called with the LsmStorageInner that created it, but its signature does not enforce this.

For example, if there are two different LsmStorageInner instances named storageA and storageB, with associated LsmMvccInner instances mvccA and mvccB, nothing forbids us from calling mvccB.new_txn(storageA) or mvccA.new_txn(storageB), which is obviously a logical error.

However, as the project developers, we know that there is only one LsmStorageInner instance, so this won't be a critical issue.

But ideally, we can do better by using Arc::new_cyclic to pass a weak pointer directly into LsmMvccInner::new.
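A sketch of what that could look like with the standard library's Arc::new_cyclic; the two structs here are simplified stand-ins for the real LsmStorageInner and LsmMvccInner:

```rust
use std::sync::{Arc, Weak};

/// Simplified stand-in: holds a weak back-pointer to its owning storage,
/// so it is tied to exactly one storage instance by construction.
struct LsmMvccInner {
    storage: Weak<LsmStorageInner>,
}

/// Simplified stand-in for the storage engine.
struct LsmStorageInner {
    mvcc: LsmMvccInner,
}

impl LsmStorageInner {
    fn new() -> Arc<Self> {
        // `new_cyclic` hands us a Weak to the Arc being constructed,
        // letting the mvcc component point back at its creator.
        Arc::new_cyclic(|weak| LsmStorageInner {
            mvcc: LsmMvccInner {
                storage: weak.clone(),
            },
        })
    }
}

fn main() {
    let storage = LsmStorageInner::new();
    // The weak pointer resolves back to the same storage instance,
    // so a txn created through mvcc can never target a different storage.
    let upgraded = storage.mvcc.storage.upgrade().unwrap();
    assert!(Arc::ptr_eq(&storage, &upgraded));
}
```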

12. TxnIterator and LsmIterator are not compatible with the non-MVCC version

Currently TxnIterator uses TwoMergeIterator<TxnLocalIterator, LsmIterator> to iterate over the underlying storage.

When switching to the non-MVCC version, creating an empty transaction without MVCC can be tricky.

It might be better to use Option<A> in TwoMergeIterator and Option<Arc<Transaction>> in TxnIterator to handle this situation.


Again, thank you for such a great tutorial on the LSM-tree storage engine.

Nextest required but not included

Just a heads up: I followed the environment setup steps and was getting:

cargo nextest run
error: no such command: `nextest`

	Did you mean `test`?

	View all installed commands with `cargo --list`
	Find a package to install `nextest` with `cargo search cargo-nextest`
Error: command ["cargo", "nextest", "run"] exited with code 101

It turned out I had to run cargo install cargo-nextest to get things working. This step should be included in the environment setup docs.

WAL Atomicity

As pointed out by someone on Discord, we do not guarantee atomicity for the WAL in the case of a txn commit.

failed to load manifest for workspace member

When I ran the command cargo x install-tools, I got the following error:

$ cargo x install-tools

error: failed to load manifest for workspace member `/Users/wjjiang/rustproject/mini-lsm/mini-lsm`

Caused by:
  failed to parse manifest at `/Users/wjjiang/rustproject/mini-lsm/mini-lsm/Cargo.toml`

Caused by:
  invalid type: map, expected a string for key `package.edition`

I don't know how to fix it, could someone help?

Scheme fork of mini-lsm

I am working on a mini-lsm in Scheme. It supports:

  • datasets bigger than volatile memory;
  • in-range key counts;
  • in-range byte counts.

Open for cooperation.

Confusing SST structure figure in W1D4

The SST structure is described in W1D4 like this:

-------------------------------------------------------------------------------------------
|         Block Section         |          Meta Section         |          Extra          |
-------------------------------------------------------------------------------------------
| data block | ... | data block |            metadata           | meta block offset (u32) |
-------------------------------------------------------------------------------------------

I found that the "meta block offset (u32)" is included in the metadata instead of the Extra section. Am I missing something, or is there an error in this figure?

Wouldn't it be clearer if it were described like this?

-------------------------------------------------------------------------------------------
|         Block Section         |          Meta Section         |          Extra          |
-------------------------------------------------------------------------------------------
| data block | ... | data block |  data meta | ... | data meta  |       checksum (u32)    |
-------------------------------------------------------------------------------------------

Feedback after coding day 1

Hey,
I discovered this project recently and find it super interesting. I had a very superficial understanding of LSM engines from working a little with Cassandra, but I never had the time or need to dig in and understand how they work in detail. This tutorial sparked the motivation to do so, and so far I find it really interesting! Thanks for taking the time to build such a great learning resource.

I have only finished day 1 so far, but I already have a bit of feedback:

  • It took me some time to understand that it is normal and expected that the BlockBuilder::add method is not responsible for sorting the entries of the block; instead, the caller is responsible for inserting entries in sorted order. After thinking about it, this finally made sense, because blocks (and SSTables) are built from the in-memory tables, so all the data is available when building a block. But it was not clear in the text, and I spent some time wondering why we were implementing a bisect search in the block iterator while not enforcing any sorting in the block builder.
  • Some things don't feel really "rusty". I'm thinking of the iterator interface designed for BlockIterator, which looks very different from the iterators we are used to in Rust. In particular, the idea of "invalidating" the iterator by using empty vecs as key and value once the iterator is consumed differs from the usual pattern of next returning an Option<T>, with None marking the end of iteration. This is not bad per se, since this custom iterator has different requirements, but it is a bit surprising.
  • I think your unit tests could have more coverage and could also be split into smaller tests that each check a single property. For instance, when the tutorial specifies a property like "Key length and value length are 2B, which means their maximum length is 65536.", I expect to see a unit test that covers this property and makes sure it is enforced. I did some of this in my day 1 work in my fork, if you want to check it out (LeBoucEtMistere#1).
  • I think the tutorial steps could provide clearer entry points into the code, i.e. "to solve this task, you will need to implement this function in XXX".
  • I noticed you liberally use as conversions in your solution. They tend to be risky, especially when converting to a more restrictive type, e.g. usize to u16; in that case it is best to use try_into to handle the case where the conversion fails. That's not a big deal, but since this is a tutorial, it would be nice to leave a note teaching people about the risks of as conversions :)
  • I had never used the bytes crate before, so I had to spend some time reading the docs and understanding how it works by looking at the solution code. I would have loved a note in the tutorial mentioning that it is wise to leverage this crate (it is not mentioned anywhere except the solution, so many people might not know it is available), and maybe also a quick example of how it helps when working with bytes.
  • In the part about the encoding of blocks, you give great schemas of the format, but I would also have loved a concrete example with actual data to make it clearer.
  • I would have loved some pointers to crates/algorithms in the extra tasks. I'm not very familiar with algorithms for compression/checksums, especially in the context of LSM engines, and it felt quite hard to explore the topic without a few pointers.

These are just a bunch of suggestions from my personal experience, but I hope they help you make the tutorial even better. Thanks for the great work; looking forward to the day 2 tasks :D 🚀 !
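The point above about as conversions can be demonstrated in a few lines: as silently truncates, while try_from surfaces the failure.

```rust
// Sketch of why `try_from` is safer than `as` for narrowing conversions,
// e.g. when writing a key length into a 2-byte field.
fn main() {
    let small: usize = 1000;
    let big: usize = 70_000; // does not fit in u16

    // `as` silently truncates modulo 2^16: 70_000 wraps to 4_464,
    // silently corrupting the length field.
    assert_eq!(big as u16, 4_464);

    // `try_from` reports the failure instead.
    assert_eq!(u16::try_from(small), Ok(1000u16));
    assert!(u16::try_from(big).is_err());
}
```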

question about `MergeIterator.next()`

// Otherwise, compare with heap top and swap if necessary.
if let Some(mut inner_iter) = self.iters.peek_mut() {
    if *current < *inner_iter {
        std::mem::swap(&mut *inner_iter, current);
    }
}

I'm not familiar with Rust, so I don't clearly understand the meaning of std::mem::swap(&mut *inner_iter, current) here in the MergeIterator::next impl.

I guess it means something like heap.pop() followed by heap.push(current)? And why do we have to use the swap function here?
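For reference, the pattern can be reproduced with a plain std BinaryHeap: the swap is indeed semantically pop-then-push, but done in one operation, because the heap re-sifts the top exactly once when the PeekMut guard drops. (mini-lsm inverts the ordering via a wrapper so the heap top holds the smallest key; this sketch uses a plain max-heap of i32 for simplicity.)

```rust
use std::collections::BinaryHeap;

/// If `current` is smaller than the heap top, exchange the two through the
/// `PeekMut` guard. Dropping the guard restores the heap invariant, so this
/// is equivalent to `heap.pop()` + `heap.push(current)` with a single sift.
fn swap_if_smaller(heap: &mut BinaryHeap<i32>, current: &mut i32) {
    if let Some(mut top) = heap.peek_mut() {
        if *current < *top {
            std::mem::swap(&mut *top, current);
        }
    } // guard dropped here: the heap re-sifts its (new) top element
}

fn main() {
    let mut heap = BinaryHeap::from(vec![5, 3, 1]);
    let mut current = 2;
    swap_if_smaller(&mut heap, &mut current);
    assert_eq!(current, 5);            // old heap top moved into `current`
    assert_eq!(heap.peek(), Some(&3)); // 2 went in and the heap re-sorted
}
```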
