
Comments (17)

user8614 avatar user8614 commented on July 21, 2024 3

Here is a performance benchmark comparing NoSQL databases written in pure JavaScript (finaldb, lowdb, memory, nedb, nosql, pouchdb, tingodb).

Bench source code is available here:
https://github.com/ezpaarse-project/dbbench

Here are the results of the bench with nice graphs:
https://docs.google.com/spreadsheets/d/1oU6YG_7JK6_qBuTNdkXRw6GGiX3vQpzg3f47t84rJAA/edit#gid=1556857962

from nedb.

sergeyksv avatar sergeyksv commented on July 21, 2024 1

@andyhu Well, I don't think we can compete one-to-one with NeDB, for the following reasons:

  1. We do not keep all data in memory, because we want to handle relatively big datasets. The cost of this is that a full table scan (a non-indexed query) is known to be slow. But with any real database you learn quickly that a full scan is always slow and should be avoided. Though MongoDB can be fast with 1,000 records, it will degrade to something like TingoDB's numbers once the file system gets involved.
  2. MongoDB uses memory mapping and relies fully on OS caching. We can't, so our workaround is a partial cache. It is useful at least to avoid loading data twice when running a query and then fetching the actual documents. The problem with a partial cache is keeping it within the allowed size. The best-known solution is LRU, and we tried the corresponding module, but found it SLOW due to its complex expiration logic. We chose the simple but fastest option, a direct-mapped cache, even though it is not 100% effective (allowing it a max size of 1000 does not mean it will actually hold 1000 elements); a rough sketch of that idea follows this list.
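
For illustration, here is a minimal sketch of what "direct mapped" means in this context. This is not TingoDB's actual tcache.js code, and the hash function is just an example:

// Direct-mapped cache sketch: each key hashes to exactly one of `size` slots,
// and a colliding key simply overwrites the previous occupant, so a "size 1000"
// cache may hold fewer than 1000 live entries at any moment.
function DirectMappedCache (size) {
    this.size = size || 1000;
    this.slots = new Array(this.size);           // one {k, v} entry per slot
}

DirectMappedCache.prototype._slot = function (k) {
    var s = String(k), h = 0;
    for (var i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) | 0;
    return Math.abs(h) % this.size;              // cheap string hash -> slot index
};

DirectMappedCache.prototype.set = function (k, v) {
    this.slots[this._slot(k)] = { k: k, v: v };
};

DirectMappedCache.prototype.get = function (k) {
    var e = this.slots[this._slot(k)];
    return e && e.k === k ? e.v : undefined;     // miss if another key owns the slot
};

DirectMappedCache.prototype.unset = function (k) {
    var e = this.slots[this._slot(k)];
    if (e && e.k === k) this.slots[this._slot(k)] = undefined;
};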

You can quickly mimic NeDB's behaviour in TingoDB by making a simple unlimited cache: replace the content of tcache.js with this:

// Drop-in replacement for tcache.js: an unlimited cache that never evicts anything.
var _ = require('lodash');                   // kept from the original file, unused here
var BPlusTree = require('./bplustree');      // kept from the original file, unused here

function tcache (tdb, size) {
    this._tdb = tdb;
    this.size = size || 1000;                // kept for interface compatibility, never enforced
    this._cache = {};                        // plain object: every entry stays cached forever
}

tcache.prototype.set = function (k, v) {
    // store a deep copy so later mutations of the document don't leak into the cache
    this._cache[k] = this._tdb._cloneDeep(v);
};

tcache.prototype.unset = function (k) {
    delete this._cache[k];
};

tcache.prototype.get = function (k) {
    return this._cache[k];
};

module.exports = tcache;

Your "benchmark" will be much faster (for me it was 20x). You can play with other cache strategies (LRU and so forth). Probably we can provide option to choose not only size but cache strategy as well. Different apps might have different demands.

By the way, everybody knows how to write "benchmarks" that prove how cool they are. This one even uses the author's own time-measurement module :).

What we know in practice is that we can replace MongoDB with TingoDB in mid-sized apps with almost no performance penalty. We don't shoot for more. I'm more concerned with compatibility at the functionality level:

  1. Our goal is to mimic MongoDB's behaviour, and this is challenging. It is not only the more complex queries that should work the same way, but also other very interesting behaviours. For example, a cursor provided by MongoDB is "final", i.e. it contains the data that was available at the time you ran the query. Nothing special, except that we do not want to bind a full copy of the data to the cursor, and there are architectural constraints that force us to take a different approach to indexes, among other things.

In short, NeDB is an in-memory DB first, with optional filesystem persistence, while TingoDB is filesystem-first, with a memory cache :)
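
To make that contrast concrete, here is roughly how each database is opened, based on the two projects' READMEs (the paths and collection names are just examples):

var Datastore = require('nedb');
// NeDB: the whole datastore is held in memory; the file is only a persistence journal.
var nedb = new Datastore({ filename: './data/users.db', autoload: true });

var Db = require('tingodb')().Db;
// TingoDB: mimics the mongodb driver API; data lives in files under a directory,
// with only a partial cache kept in memory.
var tingo = new Db('./data', {});
var users = tingo.collection('users');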

from nedb.

louischatriot avatar louischatriot commented on July 21, 2024 1

Guys, a couple of comments on the whole discussion:

  1. NeDB is indeed way faster than MongoDB on small datasets since it keeps the entire database in memory. This does limit the size of the datasets you can hold, but in my opinion, if you expect your dataset to grow beyond 1GB, then you need a "real", full-blown solution like MongoDB; a pure JS database will be lacking in speed anyway.

  2. Related to #1, I don't think benchmarks and raw speed tell us much. They are always biased by the kind of dataset you use and the operations you try. It is much better to think in terms of usage, not technical performance. The goal of NeDB is to provide Node devs with a database that you can get up and running in literally one line of code, with a query language expressive enough for most small applications, allowing for rapid prototyping, while also providing a cross-browser version using the same API and with persistence capabilities (a small example follows this list). Speed is at best a secondary matter for me; my benchmarks are meant to show that NeDB is fast enough for all the datasets it can hold, so you don't need to worry about speed. An NeDB-backed server can handle thousands of requests per second, which is completely fast enough; if you want more, as I said, you will need MongoDB.
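
For illustration, a minimal sketch of that "one line to get started" usage with NeDB's documented API (the documents and field names are made up):

var Datastore = require('nedb');
var db = new Datastore();                        // in-memory only; pass { filename, autoload } to persist

db.insert([{ planet: 'Earth', r: 6371 }, { planet: 'Mars', r: 3390 }], function (err) {
    // MongoDB-style operators and cursors work out of the box
    db.find({ r: { $lt: 5000 } }).sort({ planet: 1 }).exec(function (err, docs) {
        console.log(docs);                       // -> the Mars document
    });
});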

from nedb.

louischatriot avatar louischatriot commented on July 21, 2024

Hello,

Nice to see I'm not the only one with this idea. I'm pretty close to the MongoDB API, but as you said there are a few small discrepancies where I thought they made sense (for example, I didn't reimplement ObjectID). I'll gladly compare benchmarks so that we can improve both DBs; tell me when you have some. If you don't have benchmark code yet, I think it would be best for you to use the same code as mine so the results are comparable.

Also, it may be a good idea to collaborate on the same project since they are so similar. I would rather work on NeDB since it is already quite well known (even though it's early stage); I would understand if you don't find that appealing, but please tell me if you do.

Cheers,
Louis

PS : I'm closing this issue since it's not a real NeDB issue but that doesn't prevent us from discussing here.

from nedb.

dzcpy avatar dzcpy commented on July 21, 2024

Here are the benchmark results using NeDB's benchmark scripts, ported to each database:
Tingodb

$ node find -n 1000
----------------------------
Test with 1000 documents
Don't use an index
Use a persistent datastore
----------------------------
FIND BENCH - Begin profiling
FIND BENCH - Begin inserting 1000 docs - 1ms (total: 1ms)
===== RESULT (insert) ===== 3831 ops/s
FIND BENCH - Finished inserting 1000 docs - 262ms (total: 263ms)
FIND BENCH - Finding 1000 documents - 1ms (total: 264ms)
===== RESULT (find) ===== 13 ops/s
FIND BENCH - Finished finding 1000 docs - 75s (total: 75.3s)
FIND BENCH - Benchmark finished - 0ms (total: 75.3s)

NeDB

$ node find -n 1000
----------------------------
Test with 1000 documents
Don't use an index
Use a persistent datastore
----------------------------
FIND BENCH - Begin profiling
FIND BENCH - Begin inserting 1000 docs - 1ms (total: 1ms)
===== RESULT (insert) ===== 1901 ops/s
FIND BENCH - Finished inserting 1000 docs - 527ms (total: 529ms)
FIND BENCH - Finding 1000 documents - 1ms (total: 532ms)
===== RESULT (find) ===== 471 ops/s
FIND BENCH - Finished finding 1000 docs - 2.1s (total: 2.6s)
FIND BENCH - Benchmark finished - 0ms (total: 2.6s)

MongoDB

$ node find -n 1000
----------------------------
Test with 1000 documents
Don't use an index
Use a persistent datastore
----------------------------
FIND BENCH - Begin profiling
FIND BENCH - Begin inserting 1000 docs - 0ms (total: 1ms)
===== RESULT (insert) ===== 1848 ops/s
FIND BENCH - Finished inserting 1000 docs - 542ms (total: 543ms)
FIND BENCH - Finding 1000 documents - 1ms (total: 544ms)
===== RESULT (find) ===== 960 ops/s
FIND BENCH - Finished finding 1000 docs - 1s (total: 1.5s)
FIND BENCH - Benchmark finished - 0ms (total: 1.5s)

I also tested EJDB, an embedded DB engine written in C. https://github.com/Softmotions/ejdb-node
Here are the results on the same machine.
EJDB

$ node find -n 1000
----------------------------
Test with 1000 documents
Don't use an index
----------------------------
FIND BENCH - Begin profiling
FIND BENCH - Begin inserting 1000 docs - 711ms (total: 711ms)
===== RESULT (insert) ===== 3125 ops/s
FIND BENCH - Finished inserting 1000 docs - 320ms (total: 1s)
FIND BENCH - Finding 1000 documents - 0ms (total: 1s)
===== RESULT (find) ===== 68 ops/s
FIND BENCH - Finished finding 1000 docs - 14.6s (total: 15.6s)
FIND BENCH - Benchmark finished - 0ms (total: 15.6s)

It seems like for this specific find test, NeDB is about 30x faster than TingoDB. I haven't tried the default option -n 10000, since TingoDB got stuck for a few minutes and I had to stop it.

You can find the benchmark code which I ported to TingoDB here: https://github.com/andyhu/tingodb/tree/benchmark/benchmarks

from nedb.

ralyodio avatar ralyodio commented on July 21, 2024

Can you explain what tool you're using for these benchmarks?

from nedb.

dzcpy avatar dzcpy commented on July 21, 2024

I'm on Windows 7 Home 32bit with Node.js v0.10.35

from nedb.

anthonyettinger avatar anthonyettinger commented on July 21, 2024

@andyhu I was curious about this command: node find -n 1000

from nedb.

dzcpy avatar dzcpy commented on July 21, 2024

@anthonyettinger find.js is a script in the /benchmarks folder of NeDB and of my ported TingoDB repo. So node find == node find.js, with pwd == /path/to/nedb/benchmarks.

from nedb.

sergeyksv avatar sergeyksv commented on July 21, 2024

As a quick proof, just run the benchmark test on NeDB with the "-m" key (memory only) and without it; the results will be the same. We haven't found a way to make Node use more than 1.4GB of RAM, so currently NeDB can only operate on databases smaller than about 1GB. We can handle more, and are mostly limited by cache and index size (quite similar to MongoDB :) ).

from nedb.

dzcpy avatar dzcpy commented on July 21, 2024

Not sure why NeDB is so fast. But in a real scenario it's usually rare to search against fields which are not indexed, so the benchmark doesn't quite reflect real performance. I'll try testing indexed fields next.
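
For reference, declaring such an index by hand looks like this with NeDB's documented ensureIndex call (the field name is just an example; the -i option used later in this thread ensures an index for the benchmark field):

var Datastore = require('nedb');
var db = new Datastore({ filename: './bench.db', autoload: true });

db.ensureIndex({ fieldName: 'docNumber' }, function (err) {
    // from here on, db.find({ docNumber: 42 }, callback) can use the index
    // instead of scanning every document
});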

from nedb.

dzcpy avatar dzcpy commented on July 21, 2024

I've tested again using the -i option, which ensures the index. NeDB is still significantly faster.
NeDB

$ node find -i
----------------------------
Test with 10000 documents
Use an index
Use a persistent datastore
----------------------------
FIND BENCH - Begin profiling
FIND BENCH - Begin inserting 10000 docs - 3ms (total: 3ms)
===== RESULT (insert) ===== 2116 ops/s
FIND BENCH - Finished inserting 10000 docs - 4.7s (total: 4.7s)
FIND BENCH - Finding 10000 documents - 1ms (total: 4.7s)
===== RESULT (find) ===== 15974 ops/s
FIND BENCH - Finished finding 10000 docs - 627ms (total: 5.3s)
FIND BENCH - Benchmark finished - 0ms (total: 5.3s)

TingoDB

$ node find -i
----------------------------
Test with 10000 documents
Use an index
Use a persistent datastore
----------------------------
FIND BENCH - Begin profiling
FIND BENCH - Begin inserting 10000 docs - 2ms (total: 2ms)
===== RESULT (insert) ===== 4416 ops/s
FIND BENCH - Finished inserting 10000 docs - 2.2s (total: 2.2s)
FIND BENCH - Finding 10000 documents - 1ms (total: 2.2s)
===== RESULT (find) ===== 3631 ops/s
FIND BENCH - Finished finding 10000 docs - 2.7s (total: 5s)
FIND BENCH - Benchmark finished - 0ms (total: 5s)

EJDB

$ node find -i
----------------------------
Test with 10000 documents
Use an index
----------------------------
FIND BENCH - Begin profiling
FIND BENCH - Begin inserting 10000 docs - 1.2s (total: 1.2s)
===== RESULT (insert) ===== 4887 ops/s
FIND BENCH - Finished inserting 10000 docs - 2s (total: 3.3s)
FIND BENCH - Finding 10000 documents - 1ms (total: 3.3s)
===== RESULT (find) ===== 5157 ops/s
FIND BENCH - Finished finding 10000 docs - 1.9s (total: 5.2s)
FIND BENCH - Benchmark finished - 0ms (total: 5.2s)

MongoDB (32 bit)

$ node find -i
----------------------------
Test with 10000 documents
Use an index
Use a persistent datastore
----------------------------
FIND BENCH - Begin profiling
FIND BENCH - Begin inserting 10000 docs - 3ms (total: 3ms)
===== RESULT (insert) ===== 2189 ops/s
FIND BENCH - Finished inserting 10000 docs - 4.5s (total: 4.5s)
FIND BENCH - Finding 10000 documents - 1ms (total: 4.5s)
===== RESULT (find) ===== 2834 ops/s
FIND BENCH - Finished finding 10000 docs - 3.5s (total: 8.1s)
FIND BENCH - Benchmark finished - 0ms (total: 8.1s)

It's interesting. Why is MongoDB so slow? Is it because I'm on Windows?

from nedb.

sergeyksv avatar sergeyksv commented on July 21, 2024

@andyhu I believe I wrote too many words :)... NeDB is fast because it keeps ALL data in memory, which means the DB can't be larger than about 1GB, or even half of that. If you're curious, you can find its "limit".

Also, the benchmark is TOO simplistic: it uses a document with a single field, accessed sequentially. This is mostly a JOKE; it has no practical meaning. In the real world, documents are bigger, queries are more complex, and access is concurrent (parallel). Real servers are more optimized for this. I suggest you use the async.forEachLimit function in the test iteration to simulate parallel queries; I am almost SURE that on real databases you will see a significant increase in ops/s, and I hope on ours too, but not on NeDB. Note that no realistic app executes the same query N times in sequence.
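
A hedged sketch of that parallel-access change, using the async module's forEachLimit. It assumes `collection` is an already-opened TingoDB or MongoDB collection, and the field name and concurrency limit are placeholders:

var async = require('async');

var ids = [];
for (var i = 0; i < 10000; i++) ids.push(i);

console.time('parallel find');
async.forEachLimit(ids, 10, function (id, done) {
    // at most 10 queries are in flight at any time
    collection.find({ docNumber: id }).toArray(function (err, docs) { done(err); });
}, function (err) {
    console.timeEnd('parallel find');
});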

Similarly, if you are concerned, use more complex queries, at least a search on a nested field like {"address.state":"CA"}, which assumes objects like {"address":{"state":"CA","zip":"90000","city":"Santa Rosa"}}. I'm pretty sure NeDB will degrade significantly on that.
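
In code, the nested-field case above looks roughly like this (the same query shape works against MongoDB or TingoDB; the document is just the example from the previous paragraph):

collection.insert({ name: 'Bob', address: { state: 'CA', zip: '90000', city: 'Santa Rosa' } }, function (err) {
    // dot notation reaches into the nested object; without an index this forces
    // a full scan that must dig into every document
    collection.find({ 'address.state': 'CA' }).toArray(function (err, docs) {
        console.log(docs.length);
    });
});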

Next to that, think about it from the other side. What does even 200 ops/s mean? It means a reply in 5 ms. That is good enough unless you are building Facebook :).

Finally, we also did some benchmarking during implementation, and found that such scans tend to degrade fast. For example, take an object scan that searches for an attribute in a nested object. The simple check is return obj.address.state=="CA", and with it you might get, say, 15K ops/s. But what if some objects are missing the nested address (why not)? Then you have to use something like return obj.address && obj.address.state=="CA" and, surprise, this gives you only, say, 5K ops/s. Is it safe to use the unguarded query? No: there is no restriction on data structure, so it is normal to have to work with such fuzzy data. Again, I am not sure whether NeDB handles that. Is it possible to optimize a query for a first-level attribute? Yes. Does it make sense in the real world? No.

from nedb.

sergeyksv avatar sergeyksv commented on July 21, 2024

@andyhu Some resources for you:

from nedb.

dzcpy avatar dzcpy commented on July 21, 2024

@sergeyksv Thanks for the information! Actually, for the last benchmark MongoDB is the slowest, which is quite surprising.

from nedb.

sergeyksv avatar sergeyksv commented on July 21, 2024

@andyhu Simulate parallel access; I believe this is key. When working with MySQL we got the impression that it never allocates all of its resources to a single query, which means we can load MySQL fully only by making queries in parallel. Why can a server serve more queries in parallel than sequentially? Because it can better balance CPU and IO resources. NeDB has nothing to balance, as it does not use any IO resources (the data is in memory), so on one hand it already delivers its maximum. On the other hand, if you manage to launch a really slow query on it, it will most likely freeze for that time, and not just itself but the entire Node.js process, and will not serve other requests.
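
A tiny, self-contained illustration of that last point: a long synchronous in-memory scan stops Node from servicing anything else (the dataset size here is arbitrary):

var docs = [];
for (var i = 0; i < 5e6; i++) docs.push({ n: i, payload: 'x' });

// normally fires every 100ms
var timer = setInterval(function () { console.log('tick', Date.now()); }, 100);

setTimeout(function () {
    // while this synchronous filter runs, no ticks are printed: the event loop is blocked
    var hits = docs.filter(function (d) { return d.n % 1000 === 0; });
    console.log('scan done,', hits.length, 'hits');
    clearInterval(timer);
}, 500);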

from nedb.

Ivshti avatar Ivshti commented on July 21, 2024

Hello, @sergeyksv and @louischatriot.
What I did about this issue is: I wrote a DB engine on top of NeDB / LevelUP which auto-indexes, so that each query can run against an index and avoid scanning.

Performance is very similar to MongoDB, maybe a bit faster. It doesn't load the full datastore in memory, and with large datasets it's faster than NeDB because of full indexing.
On small datasets NeDB is much faster, of course.

https://github.com/Ivshti/linvodb3
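
A conceptual sketch of the auto-indexing idea (this is not linvodb3's actual API, just the general shape, using NeDB's documented ensureIndex/find calls underneath):

var Datastore = require('nedb');
var db = new Datastore({ filename: './data/items.db', autoload: true });

// Ensure an index exists on every queried field, then run the query.
function autoIndexedFind (query, cb) {
    var fields = Object.keys(query);
    var pending = fields.length;
    if (pending === 0) return db.find(query, cb);
    fields.forEach(function (f) {
        db.ensureIndex({ fieldName: f }, function () {
            if (--pending === 0) db.find(query, cb);   // run once all indexes exist
        });
    });
}

autoIndexedFind({ category: 'books' }, function (err, docs) {
    console.log(docs);
});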

from nedb.
