infoforcefeed / olegdb
Enough works to use this in production
Home Page: http://olegdb.org
License: MIT License
Doooo it.
Currently we just silently destroy existing hashes if there are collisions. That causes memory leaks (can be seen by running the jar test). We probably shouldn't be discarding data on collision anyway.
Verify that the memory is freed with valgrind.
Currently the makefile specifies a command-line macro that gets passed to erlc. If the beam files were compiled with make, they have the macro defined to look for libraries in ./build/lib. This makes sense for debugging.
However, if make install is specified on the command line, we need to recompile the beam files with the PREFIX specified by the user so that Erlang knows where to look for the library.
As it stands, programs built in the standard make -> make install fashion fail because they search for libraries in a directory that doesn't exist.
Segfault real fast
The functionality in the front end for this is there, but currently everything is just defaulted to an "oleg" db in /tmp. This needs to be configurable and interchangeable.
This needs to actually do something useful.
Anything else we can think of
All char * instances should be replaced with structures that contain both the length and the value. This will let people access the size.
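A rough sketch of what such a structure could look like. The names ol_val, ol_val_init and ol_val_free are made up for illustration and are not part of the current API; the point is only that the length travels with the bytes, so values with embedded NULs survive.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type (not in liboleg): a value that carries its own length. */
typedef struct {
    size_t len;
    unsigned char *data;
} ol_val;

/* Copies len bytes from src into v.
 * Returns 0 on success, -1 if the allocation fails. */
int ol_val_init(ol_val *v, const void *src, size_t len) {
    v->data = malloc(len ? len : 1); /* avoid a zero-sized malloc */
    if (v->data == NULL) {
        v->len = 0;
        return -1;
    }
    if (len > 0)
        memcpy(v->data, src, len);
    v->len = len;
    return 0;
}

/* Releases the buffer and resets the struct so a second free is harmless. */
void ol_val_free(ol_val *v) {
    free(v->data);
    v->data = NULL;
    v->len = 0;
}
```

Callers read v.len instead of calling strlen, which also matters for keys like the binary one in the crash log further down.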
[ git:master ]
$ ./run_server.sh
[-] Starting server.
[-] Listening on port 8080
[-] Continuing to read.
[-] Continuing to read.
[-] Continuing to read.
[ERROR](src/aol.c:99: errno: Resource temporarily unavailable) Error reading
[ERROR](src/aol.c:140: errno: None) Error reading
Feb 23 21:45:15 [x] Restore failed. Corrupt AOL?
Feb 23 21:45:15 [x] Error during AOL restore...
./run_server.sh: line 5: 26692 Segmentation fault erl -pa ./build/bin -noshell -s olegdb main -s init stop
[ OK kyle@insomnia:~ ]
$ curl -vvvv -X POST -d @kyleterry.com.html http://localhost:8080/kyleterry --header "Content-Type: text/html"
< HTTP/1.1 100 Continue
POST /kyleterry HTTP/1.1
User-Agent: curl/7.35.0
Host: localhost:8080
Accept: */*
Content-Type: text/html
Content-Length: 4805
Expect: 100-continue
@kyleterry.com.html payload is here: https://gist.github.com/kyleterry/9182593
This facilitates building FFIs on top of the library, as well as allowing other people to link against it to embed into their own projects.
Seems to happen specifically in the ol_scoop call in ol_content_type after a key has expired. Very odd.
[-] Requesting <<"%00%91f53%E7%5D%CE%9D%E5">>
DEBUG src/port_driver.c:149: Command from server: 2
DEBUG src/port_driver.c:120: Key: %00%91f53%E7%5D%CE%9D%E5
DEBUG src/port_driver.c:121: get_type klen: 24
DEBUG src/port_driver.c:127: Content type: application/octet-stream
DEBUG src/oleg.c:214: New key: %00%91f53%E7%5D%CE%9D%E5 Klen: 24
DEBUG src/oleg.c:326: Made Expiration: 1396509112
DEBUG src/oleg.c:214: New key: %00%91f53%E7%5D%CE%9D%E5 Klen: 24
DEBUG src/oleg.c:214: New key: %00%91f53%E7%5D%CE%9D%E5 Klen: 24
DEBUG src/oleg.c:326: Made Expiration: 1396509112
DEBUG src/oleg.c:214: New key: %00%91f53%E7%5D%CE%9D%E5 Klen: 24
*** Error in `/usr/lib/erlang/erts-5.10.4/bin/beam.smp': double free or corruption (fasttop): 0x00007fd8ec82f0a0 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x731ff)[0x7fd91163b1ff]
/usr/lib/libc.so.6(+0x789ae)[0x7fd9116409ae]
/usr/lib/libc.so.6(+0x796b6)[0x7fd9116416b6]
./build/lib/liboleg.so(+0x4ce1)[0x7fd908f34ce1]
./build/lib/liboleg.so(ol_scoop+0x247)[0x7fd908f36203]
./build/lib/liboleg.so(ol_content_type+0xc0)[0x7fd908f36314]
./build/lib/libolegserver.so(+0x2a86)[0x7fd90913aa86]
/usr/lib/erlang/erts-5.10.4/bin/beam.smp(erts_port_output+0x117a)[0x4988fa]
/usr/lib/erlang/erts-5.10.4/bin/beam.smp(erts_port_command+0x4b9)[0x49a6b9]
/usr/lib/erlang/erts-5.10.4/bin/beam.smp(erl_send+0x84e)[0x4874fe]
/usr/lib/erlang/erts-5.10.4/bin/beam.smp(process_main+0x4359)[0x5599b9]
/usr/lib/erlang/erts-5.10.4/bin/beam.smp[0x4a283a]
/usr/lib/erlang/erts-5.10.4/bin/beam.smp[0x5dc595]
/usr/lib/libpthread.so.0(+0x80a2)[0x7fd911b800a2]
/usr/lib/libc.so.6(clone+0x6d)[0x7fd9116add1d]
======= Memory map: ========
00400000-00645000 r-xp 00000000 08:05 950201 /usr/lib/erlang/erts-5.10.4/bin/beam.smp
00844000-00845000 r--p 00244000 08:05 950201 /usr/lib/erlang/erts-5.10.4/bin/beam.smp
00845000-00897000 rw-p 00245000 08:05 950201 /usr/lib/erlang/erts-5.10.4/bin/beam.smp
00897000-008b9000 rw-p 00000000 00:00 0
02827000-02869000 rw-p 00000000 00:00 0 [heap]
7fd8cc000000-7fd8cc009000 rw-p 00000000 00:00 0
7fd8cc009000-7fd8d0000000 ---p 00000000 00:00 0
7fd8d0000000-7fd8d0009000 rw-p 00000000 00:00 0
7fd8d0009000-7fd8d4000000 ---p 00000000 00:00 0
7fd8d4000000-7fd8d4009000 rw-p 00000000 00:00 0
7fd8d4009000-7fd8d8000000 ---p 00000000 00:00 0
7fd8d8000000-7fd8d8009000 rw-p 00000000 00:00 0
7fd8d8009000-7fd8dc000000 ---p 00000000 00:00 0
7fd8dc000000-7fd8dc008000 rw-p 00000000 00:00 0
7fd8dc008000-7fd8e0000000 ---p 00000000 00:00 0
7fd8e4000000-7fd8e4008000 rw-p 00000000 00:00 0
7fd8e4008000-7fd8e8000000 ---p 00000000 00:00 0
7fd8ec000000-7fd8ec838000 rw-p 00000000 00:00 0
7fd8ec838000-7fd8f0000000 ---p 00000000 00:00 0
7fd8f0000000-7fd8f0008000 rw-p 00000000 00:00 0
7fd8f0008000-7fd8f4000000 ---p 00000000 00:00 0
7fd8f4000000-7fd8f4034000 rw-p 00000000 00:00 0
7fd8f4034000-7fd8f8000000 ---p 00000000 00:00 0
7fd8fc000000-7fd8fc008000 rw-p 00000000 00:00 0
7fd8fc008000-7fd900000000 ---p 00000000 00:00 0
7fd900000000-7fd900008000 rw-p 00000000 00:00 0
7fd900008000-7fd904000000 ---p 00000000 00:00 0
7fd904000000-7fd904008000 rw-p 00000000 00:00 0
7fd904008000-7fd908000000 ---p 00000000 00:00 0
7fd908d18000-7fd908d2d000 r-xp 00000000 08:05 919455 /usr/lib/libgcc_s.so.1
7fd908d2d000-7fd908f2d000 ---p 00015000 08:05 919455 /usr/lib/libgcc_s.so.1
7fd908f2d000-7fd908f2e000 rw-p 00015000 08:05 919455 /usr/lib/libgcc_s.so.1
7fd908f30000-7fd908f38000 r-xp 00000000 08:05 790407 /home/quinlan/src/Project-Oleg/build/lib/liboleg.so
7fd908f38000-7fd909137000 ---p 00008000 08:05 790407 /home/quinlan/src/Project-Oleg/build/lib/liboleg.so
7fd909137000-7fd909138000 rw-p 00007000 08:05 790407 /home/quinlan/src/Project-Oleg/build/lib/liboleg.so
7fd909138000-7fd90913f000 r-xp 00000000 08:05 799435 /home/quinlan/src/Project-Oleg/build/lib/libolegserver.so
7fd90913f000-7fd90933e000 ---p 00007000 08:05 799435 /home/quinlan/src/Project-Oleg/build/lib/libolegserver.so
7fd90933e000-7fd90933f000 rw-p 00006000 08:05 799435 /home/quinlan/src/Project-Oleg/build/lib/libolegserver.so
7fd909440000-7fd909540000 rw-p 00000000 00:00 0
7fd909640000-7fd909780000 rw-p 00000000 00:00 0
7fd909840000-7fd909980000 rw-p 00000000 00:00 0
7fd9099c0000-7fd909e80000 rw-p 00000000 00:00 0
7fd909eb7000-7fd909eb8000 ---p 00000000 00:00 0
7fd909eb8000-7fd90a6b8000 rw-p 00000000 00:00 0 [stack:13682]
7fd90a6b8000-7fd90a6b9000 ---p 00000000 00:00 0
7fd90a6b9000-7fd90aeb9000 rw-p 00000000 00:00 0 [stack:13681]
7fd90aeb9000-7fd90aeba000 ---p 00000000 00:00 0
7fd90aeba000-7fd90b6ba000 rw-p 00000000 00:00 0 [stack:13680]
7fd90b6ba000-7fd90b6bb000 ---p 00000000 00:00 0
7fd90b6bb000-7fd90bebb000 rw-p 00000000 00:00 0 [stack:13679]
7fd90bebb000-7fd90bebc000 ---p 00000000 00:00 0
7fd90bebc000-7fd90c6bc000 rw-p 00000000 00:00 0 [stack:13678]
7fd90c6bc000-7fd90c6bd000 ---p 00000000 00:00 0
7fd90c6bd000-7fd90cebd000 rw-p 00000000 00:00 0 [stack:13677]
7fd90cebd000-7fd90cebe000 ---p 00000000 00:00 0
7fd90cebe000-7fd90d6be000 rw-p 00000000 00:00 0 [stack:13676]
7fd90d6be000-7fd90d6bf000 ---p 00000000 00:00 0
7fd90d6bf000-7fd90debf000 rw-p 00000000 00:00 0 [stack:13675]
7fd90debf000-7fd90dec0000 ---p 00000000 00:00 0
7fd90dec0000-7fd90e900000 rw-p 00000000 00:00 0 [stack:13674]
7fd90e92f000-7fd90e930000 ---p 00000000 00:00 0
7fd90e930000-7fd90e951000 rw-p 00000000 00:00 0 [stack:13672]
7fd90e951000-7fd90e952000 ---p 00000000 00:00 0
7fd90e952000-7fd90e973000 rw-p 00000000 00:00 0 [stack:13671]
7fd90e973000-7fd90e974000 ---p 00000000 00:00 0
7fd90e974000-7fd90e995000 rw-p 00000000 00:00 0 [stack:13670]
7fd90e995000-7fd90e996000 ---p 00000000 00:00 0
7fd90e996000-7fd90e9b7000 rw-p 00000000 00:00 0 [stack:13669]
7fd90e9b7000-7fd90e9b8000 ---p 00000000 00:00 0
7fd90e9b8000-7fd90e9d9000 rw-p 00000000 00:00 0 [stack:13668]
7fd90e9d9000-7fd90e9da000 ---p 00000000 00:00 0
7fd90e9da000-7fd90e9fb000 rw-p 00000000 00:00 0 [stack:13667]
7fd90e9fb000-7fd90e9fc000 ---p 00000000 00:00 0
7fd90e9fc000-7fd90ea1d000 rw-p 00000000 00:00 0 [stack:13666]
7fd90ea1d000-7fd90ea1e000 ---p 00000000 00:00 0
7fd90ea1e000-7fd90ea3f000 rw-p 00000000 00:00 0 [stack:13665]
7fd90ea3f000-7fd90ea40000 ---p 00000000 00:00 0
7fd90ea40000-7fd90f680000 rw-p 00000000 00:00 0 [stack:13662]
7fd90f69d000-7fd90f69e000 ---p 00000000 00:00 0
7fd90f69e000-7fd90f6bf000 rw-p 00000000 00:00 0 [stack:13664]
7fd90f6bf000-7fd90f6c0000 ---p 00000000 00:00 0
7fd90f6c0000-7fd90ff00000 rw-p 00000000 00:00 0 [stack:13661]
7fd90ff1c000-7fd9115c8000 rw-p 00000000 00:00 0
7fd9115c8000-7fd911766000 r-xp 00000000 08:05 920809 /usr/lib/libc-2.19.so
7fd911766000-7fd911966000 ---p 0019e000 08:05 920809 /usr/lib/libc-2.19.so
The library broke shit.
A single LICENSE file should suffice. Add useful documentation to the top instead.
http://beej.us/guide/bgipc/output/html/singlepage/bgipc.html#fork
Basically fork() creates a child process, which has a snapshot of the parent process's memory. When that child process fucks around with its copy of the memory, nothing changes on the parent's side. Now we have to do IPC or whatever to communicate between the processes to make sure the data is synchronized.
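A small POSIX sketch of the problem, not taken from the OlegDB source. The function names are invented for illustration. The first function shows that a write in the child never reaches the parent; the second shows one possible way (a pipe) to get the value back.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child modifies its own copy of counter; the parent's copy is untouched.
 * Returns the parent's view of counter, or -1 on error. */
int counter_after_child_write(void) {
    int counter = 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter = 42; /* only changes the child's snapshot */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return counter; /* still 1 in the parent */
}

/* Same idea, but the child sends its value back over a pipe.
 * Returns the value received by the parent, or -1 on error. */
int value_sent_over_pipe(void) {
    int fds[2];
    int received = -1;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        int value = 42;
        ssize_t w;
        close(fds[0]); /* child only writes */
        w = write(fds[1], &value, sizeof value);
        _exit(w == (ssize_t)sizeof value ? 0 : 1);
    }
    close(fds[1]); /* parent only reads */
    n = read(fds[0], &received, sizeof received);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == (ssize_t)sizeof received ? received : -1;
}
```

A pipe is just the simplest option here; shared memory or sockets would work too, each with its own synchronization cost.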
When do we rewrite in Erlang?
Would be cool to be able to differentiate updated/created on a POST.
It would be a good idea to store keys in a way that is separate from the hash table. With a splay tree, we get amortized O(log n) time per operation, with the added advantage that recently accessed elements are easier to get again.
In addition to having a tree to store extra keys, this will allow us to iterate through them (in case people want to aggregate multiple records) and cleanly get keys when cleaning up the database. ol_close currently just loops through every slot in the bucket list to see if there is an ol_bucket there.
Alternate data structures are encouraged, but I figured this one could be fun because B+-trees are too boring.
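For reference, a minimal top-down splay tree sketch in the style of Sleator's classic implementation. Everything here is an assumption for illustration: the node layout, the splay_* names, and the use of NUL-terminated strings as keys (real keys would need a length). The tree stores key pointers only and does not own them.

```c
#include <stdlib.h>
#include <string.h>

typedef struct splay_node {
    struct splay_node *left, *right;
    const char *key; /* not owned by the tree in this sketch */
} splay_node;

/* Top-down splay: moves the node with key (or the last node visited
 * while searching for it) to the root and returns the new root. */
static splay_node *splay(splay_node *t, const char *key) {
    splay_node N, *l, *r, *y;
    if (t == NULL)
        return NULL;
    N.left = N.right = NULL;
    l = r = &N;
    for (;;) {
        int cmp = strcmp(key, t->key);
        if (cmp < 0) {
            if (t->left == NULL) break;
            if (strcmp(key, t->left->key) < 0) { /* rotate right */
                y = t->left; t->left = y->right; y->right = t; t = y;
                if (t->left == NULL) break;
            }
            r->left = t; r = t; t = t->left;     /* link right */
        } else if (cmp > 0) {
            if (t->right == NULL) break;
            if (strcmp(key, t->right->key) > 0) { /* rotate left */
                y = t->right; t->right = y->left; y->left = t; t = y;
                if (t->right == NULL) break;
            }
            l->right = t; l = t; t = t->right;   /* link left */
        } else {
            break;
        }
    }
    l->right = t->left; /* reassemble the three pieces */
    r->left = t->right;
    t->left = N.right;
    t->right = N.left;
    return t;
}

/* Inserts key if it is missing and returns the new root.
 * On allocation failure the (splayed) tree is returned without the key. */
splay_node *splay_insert(splay_node *t, const char *key) {
    splay_node *n;
    if (t != NULL) {
        t = splay(t, key);
        if (strcmp(key, t->key) == 0)
            return t; /* already present, now at the root */
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return t;
    n->key = key;
    if (t == NULL) {
        n->left = n->right = NULL;
    } else if (strcmp(key, t->key) < 0) {
        n->left = t->left; n->right = t; t->left = NULL;
    } else {
        n->right = t->right; n->left = t; t->right = NULL;
    }
    return n;
}

/* In-order walk; this is the kind of iteration ol_close could use. */
size_t splay_count(const splay_node *t) {
    return t == NULL ? 0 : splay_count(t->left) + 1 + splay_count(t->right);
}

void splay_free(splay_node *t) {
    if (t == NULL) return;
    splay_free(t->left);
    splay_free(t->right);
    free(t);
}
```

A lookup is just splay() followed by a comparison against the root's key, which is what makes recently accessed keys cheap to hit again.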
ol_restore reads in commands from the AOL file and then replicates them by calling the corresponding ol_* functions (JAR -> ol_jar, SCOOP -> ol_scoop). These in turn call ol_aol_write_cmd, which results in more data being written to the AOL file. This is unnecessary.
We should take note that the database is currently doing a restore, and that commands should not be written to the aol.
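One way this could look, sketched with stand-in types rather than the real ol_database. The db struct, the state enum, and the db_* function names are all hypothetical; the write counter stands in for the actual AOL file.

```c
#include <stddef.h>

/* Hypothetical state flag; the real database struct may differ. */
typedef enum { DB_RUNNING, DB_RESTORING } db_state;

typedef struct {
    db_state state;
    int aol_writes; /* stand-in for appends to the real AOL file */
} db;

/* Logs a command unless we are replaying the log. */
static int aol_write_cmd(db *d, const char *cmd, const char *key) {
    (void)cmd;
    (void)key;
    if (d->state == DB_RESTORING)
        return 0; /* replayed command: do not log it a second time */
    d->aol_writes++;
    return 0;
}

/* Stand-in for ol_jar: would store the key, then log the command. */
int db_jar(db *d, const char *key) {
    return aol_write_cmd(d, "JAR", key);
}

/* Replays keys through the normal code path with logging suppressed. */
int db_restore(db *d, const char *const *keys, size_t n) {
    size_t i;
    d->state = DB_RESTORING;
    for (i = 0; i < n; i++)
        db_jar(d, keys[i]);
    d->state = DB_RUNNING;
    return 0;
}
```

Keeping the check inside the write function means none of the ol_* callers need to know about restores.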
Commandline or otherwise. Right now I'm not sure how to interpret commandline things from erlang.
Things we want to be able to configure:
I'm getting segfaults on "test_lots_of_deletes" under FreeBSD:
Apr 23 13:24:18 [-] ----- test_lots_of_deletes -----
Apr 23 13:24:18 [-] Opened DB: 0x801407380.
Apr 23 13:24:20 [-] Records inserted: 1000000.
Apr 23 13:24:20 [-] Saw 516494 collisions.
./run_tests.sh: line 4: 98102 Segmentation fault (core dumped) ./build/bin/oleg_test test
gmake: *** [test] Error 139
OpenBSD seems to be more verbose:
Apr 23 15:29:47 [-] ----- test_lots_of_deletes -----
Apr 23 15:29:47 [-] Opened DB: 0xf48590ee800.
Apr 23 15:30:02 [-] Records inserted: 1000000.
Apr 23 15:30:02 [-] Saw 516494 collisions.
oleg_test(14280) in free(): error: chunk is already free 0xf4843086140
./run_tests.sh: line 4: 14280 Abort trap (core dumped) ./build/bin/oleg_test test
Makefile:89: recipe for target 'test' failed
gmake: *** [test] Error 134
$ curl -v -X POST -d @test_file.txt http://localhost:8080/turtles/test_book
* Hostname was NOT found in DNS cache
* Trying ::1...
* connect to ::1 port 8080 failed: Connection refused
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /turtles/test_book HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8080
> Accept: */*
> Content-Length: 1215848
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 404 Not Found
< Status: 404 Not Found
* Server OlegDB/fresh_cuts_n_jams is not blacklisted
< Server: OlegDB/fresh_cuts_n_jams
< Content-Length: 26
< Connection: close
< Content-Type: text/plain
<
These aren't your ghosts.
* Closing connection 0
Data is currently returned from the port driver as some random list of integers. Needs to be sent back to the client.
Doing an ol_jar with a key that already exists will just append a new structure. This ain't right.
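A sketch of the expected behaviour on a single collision chain: walk the list, overwrite the value if the key is already there, and only add a node otherwise. The bucket layout and the chain_* names are simplified assumptions, not the real ol_bucket. The return value also distinguishes created from updated.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a bucket in one hash slot's linked list. */
typedef struct bucket {
    char *key;
    char *value;
    struct bucket *next;
} bucket;

static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Returns 0 if a new bucket was created, 1 if an existing key was
 * updated in place, -1 on allocation failure. */
int chain_put(bucket **head, const char *key, const char *value) {
    bucket *b;
    for (b = *head; b != NULL; b = b->next) {
        if (strcmp(b->key, key) == 0) {
            char *copy = dup_str(value);
            if (copy == NULL)
                return -1;
            free(b->value); /* release the old value instead of leaking it */
            b->value = copy;
            return 1;
        }
    }
    b = malloc(sizeof *b);
    if (b == NULL)
        return -1;
    b->key = dup_str(key);
    b->value = dup_str(value);
    if (b->key == NULL || b->value == NULL) {
        free(b->key);
        free(b->value);
        free(b);
        return -1;
    }
    b->next = *head; /* genuine collision: keep both entries */
    *head = b;
    return 0;
}

size_t chain_len(const bucket *head) {
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

void chain_free(bucket *head) {
    while (head != NULL) {
        bucket *next = head->next;
        free(head->key);
        free(head->value);
        free(head);
        head = next;
    }
}
```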
Keys inserted, uptime, collisions, db size, etc.
Right now we return a pointer to the data stored in the DB. That's all we do. This has some limitations:
Refactoring this to return an integer (1 if not found, 0 if found, -1 on error or something similar) and then copy the datasize, value, content type and content type size into a passed argument is a much cleaner way to handle this.
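Roughly what that calling convention could look like. The record and unjar_result types and the unjar_copy name are invented for this sketch, and a linear scan stands in for the real hash lookup. The caller gets copies, so nothing points into the database's own memory.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for a stored record. */
typedef struct {
    const char *key;
    const char *data;
    size_t dsize;
    const char *ctype;
    size_t ctype_size;
} record;

/* Filled in by unjar_copy; the caller owns both buffers. */
typedef struct {
    char *data;
    size_t dsize;
    char *ctype;
    size_t ctype_size;
} unjar_result;

static char *copy_bytes(const char *src, size_t n) {
    char *dst = malloc(n + 1);
    if (dst != NULL) {
        memcpy(dst, src, n);
        dst[n] = '\0'; /* convenience terminator, not counted in the size */
    }
    return dst;
}

/* Returns 0 if found (out is filled), 1 if not found, -1 on error. */
int unjar_copy(const record *records, size_t count,
               const char *key, unjar_result *out) {
    size_t i;
    if (records == NULL || key == NULL || out == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        if (strcmp(records[i].key, key) != 0)
            continue;
        out->data = copy_bytes(records[i].data, records[i].dsize);
        out->ctype = copy_bytes(records[i].ctype, records[i].ctype_size);
        if (out->data == NULL || out->ctype == NULL) {
            free(out->data);
            free(out->ctype);
            out->data = out->ctype = NULL;
            return -1;
        }
        out->dsize = records[i].dsize;
        out->ctype_size = records[i].ctype_size;
        return 0;
    }
    return 1;
}

void unjar_result_free(unjar_result *out) {
    free(out->data);
    free(out->ctype);
    out->data = out->ctype = NULL;
    out->dsize = out->ctype_size = 0;
}
```

One call hands back the data, its size, and the content type together, which is what the server frontend needs to build a response.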
Somehow bucket->next is being set, only sometimes, to a bogus value that doesn't equal NULL. This breaks checks when traversing the linked lists and causes Oleg to panic.
Possibly related to ol_spoil?
Edit: It's probably due to the fact that the pointers to stack-allocated struct tm values dangle once those values go out of scope, so Oleg is trying to read bad regions of memory. Will probably be fixed with the new key expiration stuff in feature/HEAD_expiration_time.
Doing some testing with wrk, I was able to hit the system socket limit fairly easily. I think this means I'm not closing and getting rid of sockets when I'm done using them.
http://docs.couchdb.org/en/latest/maintenance/performance.html
This makes responses easier to parse by everybody else. If I put JSON in, I want JSON out.
Right now the message buffer is 1000 chars, make it UNLIMITED
Currently we have unjar, ol_content_type and ol_unjar_ds. Maybe we should just return ol_bucket objects.
Use case:
In the server frontend, I need to do two separate gets to retrieve the object/data size and the content type. All of this data is necessary when responding to a request.
Currently SLAUGHTER, CONSUME, CARESS etc. are unused.
parse_header(Data, Record) ->
...
LowercaseHeader = binary:list_to_bin(
string:to_lower(
binary:bin_to_list(Header)
)
),
...
This is slow as hell. Converting from a binary to a list and back again is not efficient. Find some way to convert binaries to lowercase natively without doing this conversion.
Modulo = 2 slow, gotta go fast
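Assuming this is about computing the bucket index with %, the usual trick is to keep the table size a power of two and mask instead. The function name and the power-of-two requirement below are assumptions for the sketch, not something the current code is known to guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Maps a hash to a slot index without a division.
 * Only valid when table_size is a power of two. */
size_t bucket_index(uint32_t hash, size_t table_size) {
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return (size_t)hash & (table_size - 1);
}
```

For power-of-two sizes the result is identical to hash % table_size. Whether it is measurably faster depends on the compiler and CPU, so it is worth benchmarking before committing to it.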
I use OlegDB to cache http://merveill.es/ and after about a day or so of running, a key is requested, expired and then hangs OlegDB. This is not awesome. Figure it out.
Full log here, includes several restarts: http://a.pomf.se/wtkugz.log
We need like, a website. With tarballs and SHA256 sums n' stuff.
We are terrible at docs
http://ericholscher.com/blog/2014/feb/27/how-i-judge-documentation-quality/
I am seeing a lot of buckets with the incorrect content-type. They're just empty strings sometimes, but the ctype_size member on the bucket struct reflects the correct value. Odd.
Hide all _ol functions when compiling library:
http://gcc.gnu.org/wiki/Visibility
We don't need to export functions like _ol_trunc that are used internally.
Also make the header C++ compatible.
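A sketch of how a header could do both, following the approach on the GCC visibility page. The OL_PUBLIC and OL_PRIVATE macros, the include guard, and the two example functions are made up for illustration.

```c
#ifndef OLEG_SKETCH_H
#define OLEG_SKETCH_H

/* Visibility attributes exist in GCC 4 and later (and in clang). */
#if defined(__GNUC__) && __GNUC__ >= 4
#define OL_PUBLIC  __attribute__((visibility("default")))
#define OL_PRIVATE __attribute__((visibility("hidden")))
#else
#define OL_PUBLIC
#define OL_PRIVATE
#endif

/* Lets C++ code include the header without name mangling problems. */
#ifdef __cplusplus
extern "C" {
#endif

OL_PUBLIC int ol_example_public(int x);    /* exported from the .so */
OL_PRIVATE int _ol_example_helper(int x);  /* internal only */

#ifdef __cplusplus
}
#endif

#endif /* OLEG_SKETCH_H */

/* Normally these would live in a .c file that includes the header. */
int _ol_example_helper(int x) {
    return x * 2;
}

int ol_example_public(int x) {
    return _ol_example_helper(x) + 1;
}
```

The alternative is to compile with -fvisibility=hidden and mark only the public functions, which avoids having to tag every internal helper.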
People want pub/sub or evented data.
Had a user say that 4.2 on OS X 10.6 doesn't have strnlen. Need to work around it.
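A possible fallback built on memchr, which is standard C and should be available everywhere. The name ol_strnlen is hypothetical, chosen so it never collides with a libc that does provide strnlen.

```c
#include <stddef.h>
#include <string.h>

/* Length of s, but never looks past maxlen bytes.
 * Behaves like POSIX strnlen for platforms that lack it. */
size_t ol_strnlen(const char *s, size_t maxlen) {
    const char *end = memchr(s, '\0', maxlen);
    return end != NULL ? (size_t)(end - s) : maxlen;
}
```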
gen_server has a lot of niceties and industry Best Practices that we should probably use for the HTTP server.
There's no reason OlegDB won't work on 32-bit platforms. Just debug the weird issues.
We need docs. Auto generate or something? Man I don't know.
https://code.google.com/p/lz4/source/browse/#svn%2Ftrunk
@Hamcha says it's super fast. We can just gank the .c/.h files and get to it.
See SophiaDB's sp_get or our very own ol_unjar_ds for an example of this paradigm. This is an architecture problem that was not foreseen (like we have any foresight...)
Fixing the functions in this way will help three-fold:
yo, make sure the test cares