
Comments (12)

rakibtg commented on August 15, 2024

Hi @anthonyrossbach, I really appreciate that you are willing to invest your time in exploring the internals of SleekDB!

When I started working on SleekDB, my primary intention was to make it suitable for heavy read operations, serving data as if it came from static files (the cached JSON files)!
That's why I wrote on the website:

SleekDB works great as the database engine for low to medium traffic websites.

You might have already realised that each insert operation creates a new JSON file behind the scenes. So in your use case there would be 10k files created behind the scenes every second, and that might cause inode issues.

An inode is allocated for every file, so if you have a huge number of files, even at 1 byte each, you'll run out of inodes long before you run out of disk. (As a rough illustration: a default ext4 filesystem allocates about one inode per 16 KB of disk, so a 100 GB partition has only around six million inodes, and 10k inserts per second would exhaust them in roughly ten minutes.) I haven't faced this myself while using SleekDB yet, but it is what I keep thinking might go wrong when a large data set is being inserted.
So I would recommend running a similar experiment if possible :)

Besides that, I have also been thinking about a solution to this problem. I came up with these very simple, basic ideas: https://github.com/rakibtg/SleekDB/projects/1#card-25190411
Here is the list for you:

Single-file-based store system

  • Each store should have at least one file, split based on a memory limit, e.g. 200 MB. So if we have 1 GB of data, the store will have 5 files.
  • Each line should contain one document, inline.
  • Cache files could also be single files split by the same memory limit, indexed by the hash token of the particular query.
  • Instead of traversing many files, we would traverse lines: the data file would be consumed line by line rather than holding the full buffer in memory (see the sketch after this list).
  • The same logic applies to update/create/delete operations.
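
A minimal sketch of that line-by-line idea, assuming a newline-delimited JSON store file (the readStore() helper and file names here are illustrative, not part of SleekDB):

// Stream documents from a "one document per line" store file without
// holding the whole file in memory.
function readStore(string $path): \Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new \RuntimeException("Cannot open store file: $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line);
            if ($line === '') {
                continue; // skip blank lines
            }
            yield json_decode($line, true); // one document per line
        }
    } finally {
        fclose($handle);
    }
}

// Usage: filter documents one at a time instead of buffering the full store.
foreach (readStore(__DIR__ . '/store_0001.json') as $document) {
    if (($document['status'] ?? null) === 'active') {
        // process $document
    }
}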

Or,

There could be a better solution to the inode issue that I am simply not aware of!

However, implementing these features will not be that easy, I think! If you have any ideas, feel free to share them here.
It would be really exciting if we could get SleekDB to handle this kind of request volume.

Oh, and for distributed systems I think we could add an API to let the nodes communicate!
What do you think?

Email me if required: [email protected]

justanthonylee commented on August 15, 2024

I love this idea and I am willing to make modifications and push changes like this. I used static-file engines a lot early on, and on larger projects I don't want the overhead of MySQL or other engines when most data is only read after the original data is written.

I will do some digging. I can think of a few ways to add the features you suggested, and maybe even a cron queue where non-critical inserts are queued and written later when the store is free; a rough sketch follows below. This would stop file locks from being a problem with large inserts.
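
A minimal sketch of that deferred-insert idea, assuming a plain queue file that a cron job drains (the function names and file layout are illustrative, not part of SleekDB):

// In the web request: append the document to a queue file instead of
// inserting it into the store right away.
function enqueueInsert(string $queueFile, array $document): void
{
    file_put_contents($queueFile, json_encode($document) . PHP_EOL, FILE_APPEND | LOCK_EX);
}

// In the cron job: rotate the queue file, then perform the real inserts
// while new requests keep appending to a fresh queue file.
function flushQueue(string $queueFile, callable $insert): void
{
    if (!file_exists($queueFile)) {
        return;
    }
    $pending = $queueFile . '.processing';
    rename($queueFile, $pending);
    foreach (file($pending, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $insert(json_decode($line, true));
    }
    unlink($pending);
}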

I will see what I come up with, and I might just add the features if that's OK with you :).

rakibtg commented on August 15, 2024

That would be great! 💪

Please send a PR, and if you need to discuss any part of the code or the idea, feel free to ping me on Twitter or any other platform of your choice.

Cheers

rakibtg commented on August 15, 2024

Hi @anthonyrossbach, do you have any updates?

justanthonylee commented on August 15, 2024

I am still playing around with the idea; I got distracted by a lot of other projects. I may soon have a good use case for this beyond the original one.

rakibtg commented on August 15, 2024

Great! I'm going to keep this issue open.

derit commented on August 15, 2024

A distributed system is possible: for example a master node, a second node, and further nodes. We could cluster horizontally and get practically unlimited storage. As for inodes, that is a separate issue with web hosting.
I think there should be two options: combine storage into a single file, or keep the default of one file per document.

Steinweber commented on August 15, 2024

It is not a good idea to store medium / large texts or images in this type of database.

Extreme example:
500 objects like this
$obj->id = token(32);
$obj->content = text (length 5,000 characters)

When saving and loading, more than 95% of the data that gets encoded, decoded, read, and written is unnecessary.
Better:
$text = new SideLoadObject(type=text,text(5,000 characters));
$obj->id = token(32);
$obj->content = $text;

$obj->content = {
  sl_type = text   // sl_type could be: text / image / json (big object)
  path    = path/to/storage/
  file    = md5($id.$content).txt
}

This keeps the size of the important files small!
Furthermore, there is no reason to JSON-encode or -decode such a long text if it is stored in a .txt file anyway.
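
A minimal sketch of this side-loading idea, assuming a plain directory for the side-loaded blobs (the helpers below are illustrative; the SideLoadObject API above is only a proposal, not part of SleekDB):

// Write the large content to its own .txt file and return the small reference
// that gets stored inside the main (indexed) document.
function sideLoadText(string $storageDir, string $id, string $content): array
{
    $file = md5($id . $content) . '.txt';
    file_put_contents($storageDir . '/' . $file, $content);
    return ['sl_type' => 'text', 'path' => $storageDir, 'file' => $file];
}

// Resolve a side-load reference back to the full content only when it is needed.
function loadSideLoaded(array $ref): string
{
    return file_get_contents($ref['path'] . '/' . $ref['file']);
}

// Usage: only the tiny reference is JSON-encoded with the main document,
// not the 5,000-character text itself.
$obj = [
    'id'      => bin2hex(random_bytes(16)),
    'content' => sideLoadText(__DIR__ . '/storage', 'doc-1', str_repeat('x', 5000)),
];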

The main document must contain everything that serves as an index (filter criterion).
Simple example
$obj = new Document();
$obj->__id = autogenerated;
$obj->username = foobar Master
$obj->firstname = Foo
$obj->lastname = Master
$obj->ip = 1.1.1.1
$obj->log = new SideLoadDocument(json,[])
$obj->tokens = ['abc','def']

$log = [
ip => '1.1.1.1'
token => 'abc'
date => time()
user-agent => 'foobar v42'
]

$obj->log->insert($log);
$obj->save();

You can also apply filters to SideLoad objects. However, ALWAYS reduce the number of documents via the indexed fields first, and only then run a "subquery" on the side-loaded JSON.
$table->filter()->where('ip')->equal('1.1.1.1')->and('log')->child('date')->between(time() - 500, time());

This allows you to pack gigabytes of data into small and performant indexes.
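
A minimal sketch of that index-first, subquery-second order, reusing the hypothetical readStore() and loadSideLoaded() helpers from the sketches above (store layout and field names are illustrative):

$matches = [];
foreach (readStore(__DIR__ . '/users_store.json') as $user) {
    // Step 1: narrow the candidate set using only fields kept in the main document.
    if (($user['ip'] ?? null) !== '1.1.1.1') {
        continue;
    }
    // Step 2: only for the remaining documents, load the side-loaded log JSON
    // and run the "subquery" over its entries.
    $log = json_decode(loadSideLoaded($user['log']), true) ?: [];
    foreach ($log as $entry) {
        if ($entry['date'] >= time() - 500 && $entry['date'] <= time()) {
            $matches[] = ['user' => $user['username'], 'entry' => $entry];
        }
    }
}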

Timu57 commented on August 15, 2024

I wanted to know how to do non-blocking file writes in PHP and found a nice article about the performance of non-blocking writes to a file with PHP:

https://grobmeier.solutions/performance-ofnonblocking-write-to-files-via-php-21082009.html

The author used a script that wrote 100 characters 10,000 times to a freshly created log file.

The two best solutions he came up with are:

// Setup assumed from the article's description: append 100 characters
// to a freshly created log file 10,000 times.
$file  = 'write-benchmark.log';
$text  = str_repeat('a', 100);
$count = 10000;
$loop  = 0;

$fp = fopen($file, 'a+');

while ($count > $loop) {
    if (flock($fp, LOCK_EX)) { // take a blocking exclusive lock before each write
        fwrite($fp, $text);
    }
    flock($fp, LOCK_UN);       // release the lock after the write
    $loop++;
}
fclose($fp);

0.148998975754 seconds

// Reuses $file, $text and $count from the setup above.
$fp = fopen($file, 'a+');
stream_set_blocking($fp, 0);   // switch the stream to non-blocking mode

$loop = 0;
while ($count > $loop) {
    if (flock($fp, LOCK_EX)) { // the lock call itself still blocks until acquired
        fwrite($fp, $text);
    }
    flock($fp, LOCK_UN);
    $loop++;
}
fclose($fp);

0.149605989456 seconds

I don't know whether this helps with implementing the feature, but it sounds really nice and I hope it gets finished soon.

If you need any help feel free to contact me.

rakibtg commented on August 15, 2024

@Timu57 I think writing is not the big issue; deleting and updating data are the most difficult things to handle if we target a single-file-based system.
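
For illustration, a minimal sketch of what a delete could look like in a one-document-per-line store file, assuming the usual rewrite approach: stream into a temporary file, skip the matching lines, then atomically swap the file in (the helper name and layout are illustrative, not part of SleekDB):

function deleteWhere(string $storeFile, callable $shouldDelete): void
{
    $tmpFile = $storeFile . '.tmp';
    $in  = fopen($storeFile, 'r');
    $out = fopen($tmpFile, 'w');

    while (($line = fgets($in)) !== false) {
        $document = json_decode(trim($line), true);
        if ($document !== null && $shouldDelete($document)) {
            continue; // drop this document from the rewritten store
        }
        fwrite($out, $line); // keep every other document unchanged
    }

    fclose($in);
    fclose($out);
    rename($tmpFile, $storeFile); // atomic swap on the same filesystem
}

// Usage: delete every document flagged as removed; updates work the same way,
// writing a modified line instead of skipping it.
deleteWhere(__DIR__ . '/store_0001.json', function (array $doc) {
    return ($doc['deleted'] ?? false) === true;
});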

BinZhiZhu commented on August 15, 2024

Are distributed deployments supported, or are there plans to support them?

rennokki commented on August 15, 2024

So I searched for "HTTP API" and found this issue. I assume a PHP process that serves an HTTP API is what would make distribution possible?
