mapbox / carmen-cache
C++ protobuf cache used by carmen
License: BSD 2-Clause "Simplified" License
In the spatialmatch branch I'm seeing:
CXX(target) Release/obj.target/carmen/src/binding.o
../src/binding.cpp:1107:22: warning: declaration shadows a local variable [-Wshadow]
Local<Array> array = vectorToArray(it->second);
^
../src/binding.cpp:1089:18: note: previous declaration is here
Local<Array> array = Local<Array>::Cast(args[0]);
^
Tests all pass for me currently, but fail after fixing this variable clash:
not ok 50 should be equivalent
---
operator: deepEqual
expected:
{ '11/441/770': undefined, '11/441/771': undefined, '11/442/770': undefined, '11/442/771': undefined, '11/498/724': undefined, '11/610/758': undefined, '11/610/759': undefined, '11/611/758': undefined, '9/151/188': undefined, '9/151/189': undefined, '9/152/188': undefined, '9/152/189': undefined, '9/152/190': undefined, '9/153/188': undefined, '9/153/189': undefined, '9/153/190': undefined, '9/154/188': undefined, '9/154/189': undefined, '9/154/190': undefined, '9/154/191': undefined, '9/155/188': undefined, '9/155/189': undefined, '9/155/190': undefined, '9/155/191': undefined, '9/156/189': undefined, '9/156/190': undefined, '9/156/191': undefined }
actual:
{ '11/441/770': '100007711', '11/441/771': '100007711', '11/442/770': '100007711', '11/442/771': '100007711', '11/498/724': '100131599', '11/610/758': '100014180-495', '11/610/759': '100014180-495', '11/611/758': '100014180-495', '9/151/188': '495-496', '9/151/189': '495', '9/152/188': '495', '9/152/189': '495', '9/152/190': '495', '9/153/188': '495', '9/153/189': '495', '9/153/190': '495', '9/154/188': '495', '9/154/189': '495', '9/154/190': '495', '9/154/191': '495', '9/155/188': '495', '9/155/189': '495', '9/155/190': '495', '9/155/191': '495', '9/156/189': '495', '9/156/190': '495', '9/156/191': '495' }
at: Test.<anonymous> (/Users/dane/projects/carmen-cache/test/coalesceZooms.test.js:131:7)
...
While looking at #45 I noticed two places with combinations of `while()` and `break`:

https://github.com/mapbox/carmen-cache/blob/master/src/binding.cpp#L110-L111

This can probably just be `while (item.next(1)) {`

and

https://github.com/mapbox/carmen-cache/blob/master/src/binding.cpp#L372-L375

Shouldn't that just be `if (item.next(1)) {`?
We are currently using a protozero header pinned at some git hash. There are no problems or urgency with upgrading, but I'll revisit this once the first official tag is out: mapbox/protozero#4.
With https://github.com/mapbox/carmen/commit/c16b49098463065a, running `bench/phrasematch.js` makes the process segfault.
I've been debugging this for many hours without finding a clear culprit, so I'll document what I'm seeing and thinking:
- The crash occurs on `master`, `v0.16.3`, `v0.15.0`, `v0.14.0`, and `v0.12.1`
- The crash happens in both `cache::_get` and `cache::has`
- The crash is inside the `std::string` constructor, where `std::strlen` tries to compute the string's length from a NULL pointer. This can be reduced to:

const char * ptr = NULL;
std::string shard = ptr; // will segfault on osx (libc++) and throw 'std::logic_error' on linux (libstdc++)
- Adding a `console.log` before the call to `_get` or `has` in JS slows down the benchmark and 99% of the time avoids the crash.
- I added a `console.log` to count how many times `phrasematch` had been called in the benchmark. This helped me see that `phrasematch` had run successfully ~11,000 times before crashing.
- Protecting `cache::has` by throwing an error on an empty string argument (`var->ToString().IsEmpty()`) moves the crash predictably to `cache::get`.
- Protecting both `cache::has` and `cache::_get` (by throwing a JS error on an empty string) leads to no segfault AND also no JS errors being thrown. Crazy, right? Guarding the `std::string` from a NULL appears to stop there ever being a NULL???
- My hunch for why this only crashes with the `dawg` cache is that the program runs faster and the problem condition happens predictably.
- If I slow the program down (e.g. with logging before `_get` or `has`), the problem goes away, another indication that the program needs to run fast to hit the problem.

I think there are multiple bugs in play. One bug is that we don't protect invalid/NULL strings from being sent to `std::string`. However, while we should protect against this, my sense is that a NULL pointer should not happen under normal conditions. So I suspect memory corruption or a race condition somewhere else in carmen-cache loading. This then creates a condition where other bugs, like the lack of NULL checking, manifest.
In carmen-cache we frequently convert integers from JS to C++ types. During this conversion we often need to "narrow" the integer types by casting from a type that supports a larger range (e.g. `int`) to a type that supports a smaller range (e.g. `short`).
To ensure we can cast values with predictable results in the face of invalid input, we should move away from `static_cast` to something like boost's `numeric_cast`, which throws an exception when a value cannot be safely cast to fit in a new type.
Since we don't want to depend on something as large as boost for carmen-cache, we should look for a small, header-only library to do this for us.
About a year ago @KaiBot3000, @aarthykc, and I put carmen under load, profiled with `perf`, and noticed a single, significant bottleneck in carmen-cache.
The bottleneck was the sorting of `Context` objects, specifically `contextSortByRelev`. This makes sense given that `Context` objects are large (copying them is expensive), there may be a lot of them collected in memory, and the sort function is not simple.
After #93 we should see a slight speedup / reduction in the cost of this sorting because `Context` objects are now moved rather than copied. However, I would not be surprised if the top bottleneck in carmen is still this sorting in carmen-cache. So this ticket stands to:

- `std::partial_sort`

Details:
After successfully running `make`, where is the binary placed?
We're running into some issues with `shard-carmen` that may be caused by rocksdb. I don't know where to start digging into this issue because I haven't worked with carmen-cache yet.
Let's get on the most recent stable version as an introduction to how we use it in carmen-cache.
cc'ing @aarthykc who may want to join in and @springmeyer who's agreed to spirit-guide.
To catch memory leaks, invalid memory access, and overflows before they cause bugs we should start running the clang++ sanitizers per commit on travis. The main ones to focus on are:

- the Address Sanitizer (`-fsanitize=address`), which on linux also enables the Leak Sanitizer if `ASAN_OPTIONS=detect_leaks=1`
- the Integer Sanitizer (`-fsanitize=integer`), which warns about integer overflows (you need to watch the logs to see if they happen)

The way these work is you build with the given flag in the CXXFLAGS and LDFLAGS to instrument the binaries, and then you run the tests. At runtime the sanitizer watches the program and either throws (in the case of the address/leak sanitizer) or warns (in the case of the integer sanitizer).
One gotcha, however, is that for a node module loaded with dlopen (how node C++ modules are loaded) the sanitizers need to also be built into the node binary. I've rolled a custom node.js binary that is instrumented this way: https://github.com/mapbox/mason/tree/master/scripts/node_asan/4.4.4.
TODO:

- `-fsanitize=integer`

Note: due to leaks in node v4.4.4 itself, the travis jobs will need to set `export ASAN_OPTIONS=detect_leaks=0` while building carmen-cache and then set `export ASAN_OPTIONS=detect_leaks=1` before running the tests.
@yhahn pushed a few minor fixes to https://github.com/mapbox/carmen-cache/commits/big-pants. All tests pass, but feel free to revert if these get in your way (meant to push to a branch, but jfdi)
For this task you'll be getting familiar with:

- the `binding.cpp` file
- the `encodeGrid()` cpp function
- `uint64_t`
- the `Cover` struct (https://github.com/mapbox/carmen-cache/blob/master/src/binding.cpp#L1077-L1089)
- the `grid.encode()` method in JavaScript (https://github.com/mapbox/carmen-cache/blob/master/test/grid.js#L16-L27)

Tasks:

- expose the `encodeGrid()` cpp function to JS as a MemoryCache method
- confirm `cache.encodeGrid()` behaves the same as `grid.encode()`
For this task you'll be getting familiar with writing tests.

Create a new test file for your tests at `test/cache-setgrid.test.js`. Start by adding the example API usage from the main quest ticket (#99) to your file:
var carmenCache = require('carmen-cache');
var grids = new carmenCache.MemoryCache('grids');
var a = 7036891598684201;
var b = 4785263596273714;
var c = 7036994682093628;
// store the data in the cache
grids._set('paris', [a,b,c]);
Because your test file is actually within the `carmen-cache` module, you'll need to reference it in your require using a relative path like this:

var carmenCache = require('../index.js');

Run your test file using `node test/cache-setgrid.test.js` to make sure there are no syntax errors.
Sketch out basic unit tests for the existing API:

- `_set()` using `paris` as the cache key and with a list of numbers
- `_get()` using `paris` as the cache key and get a list of numbers back

To do this, you can make use of the tape API for defining tests:
var tape = require('tape');
tape('basic API', function(assert) {
// code and assertions run here
var foo = 5 + 7;
assert.equal(foo, 12, 'addition works!');
// when your test is done you must call assert.end() explicitly
assert.end();
});
Sketch out the test flow you'd like for representing the new API:

- `_set()` using `paris` as the cache key and with a list of grid objects
- `_get()` using `paris` as the cache key and get a list of numbers back

For verifying that the conversion occurred correctly, you can make use of the `test/grid.js` library, which provides utility encode/decode functions for converting between grid properties and their numeric equivalents.
Make a Pull Request against the `setgrid` branch with your work!
Pushing a commit with `[publish binary]` along with a tag referencing that commit leads to build failures, since Travis will run duplicate build jobs that both try to publish their binaries to the same place. Whichever one finishes first will succeed; the other will fail. This causes many commits to be incorrectly flagged as failing CI, though it doesn't cause any substantive problems beyond that.
This issue was reported in mapbox/node-cpp-skel#108, and seems to have been fixed with mapbox/node-cpp-skel#139, so I think the next steps are:

- `carmen-cache/scripts/publish.sh`
We are already using protozero for fast parsing of protobuf messages.
Now that protozero has a robust and well tested interface for writing (docs at https://github.com/mapbox/protozero/blob/master/tutorial.md#writing-protobuf-encoded-messages), we can drop the external dependency on `libprotobuf`, which is currently only used in `carmen::Cache::pack`.
TODO:

- Line 21 in 5840b1f
- Lines 51 to 58 in 126f7d3
- `carmen::Cache::pack` to use the protozero writer (drop usage of `carmen::proto`): https://github.com/mapbox/carmen-cache/blob/master/src/binding.cpp#L196-L258

Now that the pieces are in place, let's look into wiring it all up:

- the `_set()` function

cc @fredngeorge
Transferring braindump notes to here. This issue affects both carmen-cache and carmen, but the problem originates mostly from the grid cache outward so capturing here for now.
Feature storage is sharded right now not just for historical reasons but because the grid storage in our PBF format currently limits us to 20-bit feature IDs.
Even with a larger feature ID space (e.g. if we went from 53 bits => 64 bits) we would probably encounter feature ID collisions at which point we'd need to store them in a nested fashion with some kind of secondary factor (like cover zxy) to disambiguate where possible.
Example:
feature A 120951912 \
+----> store together, distinguish by id + zxy
feature B 120951912 /
I think these would all be very positive changes, just a pretty solid, invasive refactor
cc @mapbox/geocoding-gang
For later, but I think we can nuke the type concept in the cache. The calling module can basically handle type segmentation by using multiple cache instances.
I think once the type convention is gone this can basically be a protobuf-powered sharded one-to-many 32-bit unsigned int cache. Not sure how generically useful that is though : )
cc @springmeyer
@springmeyer I merged the Node 8/10 support PR (#122) and have been working on getting Carmen running on Node 8, but am hitting an `Illegal instruction` crasher on circle. SSHing into the Circle environment, it looks like it's in carmen-cache -- just running `node index.js` within carmen-cache triggers the crash:
root@ae104a9c3e0f:~/project/node_modules/@mapbox/carmen-cache# lldb node index.js
(lldb) target create "node"
Current executable set to 'node' (x86_64).
(lldb) settings set -- target.run-args "index.js"
(lldb) run
Process 8818 launched: '/root/.nvm/versions/node/v8.11.3/bin/node' (x86_64)
Process 8818 stopped
* thread #1, name = 'node', stop reason = signal SIGILL: illegal instruction operand
frame #0: 0x00007ffff461cfb2 carmen.node`_GLOBAL__sub_I_env_posix.cc + 6434
carmen.node`_GLOBAL__sub_I_env_posix.cc:
-> 0x7ffff461cfb2 <+6434>: vpbroadcastq %xmm0, %xmm0
0x7ffff461cfb7 <+6439>: vmovdqu %xmm0, 0x451519(%rip) ; rocksdb::(anonymous namespace)::lockedFiles + 24
0x7ffff461cfbf <+6447>: vzeroupper
0x7ffff461cfc2 <+6450>: callq 0x7ffff45df080 ; symbol stub for: __cxa_atexit
with full (not very helpful because of lack of debug symbols) backtrace:
(lldb) bt
* thread #1, name = 'node', stop reason = signal SIGILL: illegal instruction operand
* frame #0: 0x00007ffff461cfb2 carmen.node`_GLOBAL__sub_I_env_posix.cc + 6434
frame #1: 0x00007ffff7de5733 ld-linux-x86-64.so.2`___lldb_unnamed_symbol50$$ld-linux-x86-64.so.2 + 259
frame #2: 0x00007ffff7dea1ff ld-linux-x86-64.so.2`___lldb_unnamed_symbol81$$ld-linux-x86-64.so.2 + 1087
frame #3: 0x00007ffff6bdc2df libc.so.6`_dl_catch_exception + 111
frame #4: 0x00007ffff7de97ca ld-linux-x86-64.so.2`___lldb_unnamed_symbol79$$ld-linux-x86-64.so.2 + 186
frame #5: 0x00007ffff7bd1f96 libdl.so.2`___lldb_unnamed_symbol4$$libdl.so.2 + 86
frame #6: 0x00007ffff6bdc2df libc.so.6`_dl_catch_exception + 111
frame #7: 0x00007ffff6bdc36f libc.so.6`_dl_catch_error + 47
frame #8: 0x00007ffff7bd2735 libdl.so.2`___lldb_unnamed_symbol11$$libdl.so.2 + 101
frame #9: 0x00007ffff7bd2051 libdl.so.2`dlopen + 113
frame #10: 0x0000000001414bf7 node`uv_dlopen(filename="/root/project/node_modules/@mapbox/carmen-cache/lib/carmen.node", lib=0x00007fffffffb8f0) at dl.c:36
frame #11: 0x00000000008c7d75 node`node::DLOpen(v8::FunctionCallbackInfo<v8::Value> const&) + 245
frame #12: 0x0000000000a5ff63 node`v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) + 403
frame #13: 0x0000000000ad710c node`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) + 332
frame #14: 0x0000000000ad7d5f node`v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) + 175
frame #15: 0x00003f87a27842fd
frame #16: 0x00003f87a283d1d6
frame #17: 0x00003f87a283d1d6
frame #18: 0x00003f87a283d1d6
frame #19: 0x00003f87a283d1d6
frame #20: 0x00003f87a283d1d6
frame #21: 0x00003f87a283d1d6
frame #22: 0x00003f87a283d1d6
frame #23: 0x00003f87a283d1d6
frame #24: 0x00003f87a283d1d6
frame #25: 0x00003f87a283d1d6
frame #26: 0x00003f87a283d1d6
frame #27: 0x00003f87a283d1d6
frame #28: 0x00003f87a283d1d6
frame #29: 0x00003f87a283d1d6
frame #30: 0x00003f87a283d1d6
frame #31: 0x00003f87a2784239
frame #32: 0x00003f87a2784101
frame #33: 0x0000000000d6f94a node`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 266
frame #34: 0x0000000000a428f3 node`v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) + 355
frame #35: 0x00000000008ca306 node`node::LoadEnvironment(node::Environment*) + 566
frame #36: 0x00000000008d42cd node`node::Start(uv_loop_s*, int, char const* const*, int, char const* const*) + 829
frame #37: 0x00000000008cc8fd node`node::Start(int, char**) + 365
frame #38: 0x00007ffff6a96b97 libc.so.6`__libc_start_main + 231
frame #39: 0x000000000089b1b1 node`_start + 41
So, it looks like an AVX instruction (`vpbroadcastq`, per the disassembly above) in RocksDB is what's doing it. The CPUs we're running on on Circle look like this:
processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping : 4
microcode : 0x42c
cpu MHz : 2800.092
cache size : 25600 KB
physical id : 1
siblings : 16
core id : 7
cpu cores : 8
apicid : 47
initial apicid : 47
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm kaiser fsgsbase smep erms xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
bogomips : 5675.20
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
so I think it should support this instruction? But I'm uncertain as to why this would be happening now, since if I'm not mistaken both the Node 6 and Node 8 builds use the same precompiled RocksDB static binary from Mason, so I wouldn't expect them to be any different.
Anyhow, I'm pretty unsure as to how to proceed from here. Here's the Circle build if you have a sec to take a look.
Something is odd with tests on travis. Fewer tests are running on travis than I see locally, and I don't see the expected output from `test/coalesce.bench.test.js` of `ok 116 coalesceSingle + proximity @ ...`. It seems as if this critical test is not running on travis.
/cc @apendleton @yhahn to see if they have any clues.
While testing `coalesce` I ran into an odd crash. It turns out it was API misuse on my part. Nevertheless, we should protect against invalid `Cache` objects being sent in the array arg to `coalesce`. We can do this by calling:
if (!Nan::New(Cache::constructor)->HasInstance(obj)) {
.. throw ..
}
This reduced testcase:
var coalesce = require('.').Cache.coalesce;
coalesce([{}]);
Triggers:
FATAL ERROR: v8::Object::GetAlignedPointerFromInternalField() Not a Smi
Abort trap: 6
We should try estimating the memory usage of carmen cache and setting `v8::AdjustAmountOfExternalAllocatedMemory()`. This should help in scenarios where the garbage collector is unaware of how much external memory the cache is really holding.
Using `v8::AdjustAmountOfExternalAllocatedMemory()` will likely slow down the performance of repeated calls to `cache.loadSync`, but may allow node to avoid using too much memory for temporary objects created in the same code that is calling `cache.loadSync`.
refs https://github.com/nodejs/nan/blob/master/doc/v8_internals.md#nanadjustexternalmemory
Consultation / hints / tips on this stumper please...
There is a test failure after this fix that has me stumped: https://travis-ci.org/mapbox/carmen-cache/builds/46816335. This change attempts to preserve the order of spatialMatch results when equal relevs are compared -- either the order of the results going into the sort is different or the sort itself is behaving differently.
Tips on where/how to debug here?
As part of a larger set of contributions, @apendleton and I are attempting to refactor the repo with the goal of making it more readable and useful overall.

- generate `API.md` from code comments
- break the `binding.cpp` file out into dedicated source files
- update `.travis.yml`
If a PhrasematchSubq array that is missing a property is passed to the `coalesce` function, we incorrectly attempt to cast the value to a C++ type.
This manifests as a crash (SIGILL) when using a node binary built in Debug mode.
# coalesceSingle
#
# Fatal error in ../deps/v8/src/objects-inl.h, line 1117
# Check failed: IsNumber().
#
==== C stack trace ===============================
1: V8_Fatal
2: v8::internal::Object::Number()
3: v8::Value::IntegerValue(v8::Local<v8::Context>) const
4: v8::Value::IntegerValue() const
5: carmen::Cache::coalesce(Nan::FunctionCallbackInfo<v8::Value> const&)
6: Nan::imp::FunctionCallbackWrapper(v8::FunctionCallbackInfo<v8::Value> const&)
7: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&))
8: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)1>&)
9: v8::internal::Builtin_Impl_HandleApiCall(v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)1>, v8::internal::Isolate*)
10: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*)
11: 0x3c03c66068fb
12: 0x3c03c68f1b33
Illegal instruction: 4
I noticed this because I saw this bug uncovered via a node debug build and then ran carmen-cache with this debug node build: mapbox/mason#355. In release mode on OS X it looks like the value silently returns as zero. It is possible memory is corrupted in release mode, but it may be implementation dependent whether something terrible or undefined will happen.
I'm working on a PR that I'll queue up shortly that:

- `mask` here: carmen-cache/test/coalesce.bench.test.js (Lines 13 to 17 in e44699d)
The `scoredist` method claims to be the equivalent of the method of the same name in carmen, but it really is not.

- `scoredist` in carmen-cache: Lines 111 to 123 in fd7398e
- `scoredist` in carmen: https://github.com/mapbox/carmen/blob/f831f2dfc9eb965752939f5809b18315b38b7659/lib/util/proximity.js#L81-L90

We should most likely update the version here to actually be equivalent to what is over in carmen.
@apendleton provided a `cache_bench` benchmark which exercises `cxxcache.RocksDBCache` and `cxxcache.coalesce`.
Using latest master at fec8605 (using node v8.11.3) here are the things I notice when running the benchmark locally on my OS X machine:
- The benchmark calls `cxxcache.coalesce` using `queue(1)` and therefore is not able to utilize the threadpool much.
- A large share of main event loop time is spent in `uv_work_done`.
- Part of the cost of `uv_work_done` is garbage collection of v8 objects (initiated by `node::NodePlatform::FlushForegroundTasksInteral()`).
- The rest of `uv_work_done` is due to the task of returning data from the threadpool to the main event loop, which is done in the `carmen::coalesceAfter` function. `carmen::coalesceAfter` looks to be bottlenecked on `v8::Object::Set`, which is called inside `carmen::contextToArray` and `carmen::coverToObject` to convert the carmen data into JS objects.
- The `worker` threads in the C++ threadpool (where `carmen::coalesceMulti` does its critical work) appear to be not very busy, which is surprising. Ideally the main event loop is mostly idle (the profiler mostly sees the CPU executing in `uv__io_poll`, which signifies waiting for work) and the threads are nearly 100% busy. But in this case the opposite is true: the main event loop is busier than the threads.

One hypothesis is that the threads are idle due to the `queue(1);` being used in the bench.js script causing not much work to be sent to the threads, meaning that each time work is sent it gets sent to a different thread and the work is fanned out and returns quickly. This appears to not be the case. If I run the benchmark with a threadpool size of 1, the profiling output (and the script timing) is similar:

UV_THREADPOOL_SIZE=1 node cache_bench/bench.js
So, rather it seems like the problem is that the main event loop is doing too much work. Normally you want the hard work to be done in the threads to avoid blocking the main loop, but it appears that is not happening here (on balance). The fact that this balance is so off in the benchmark is concerning because in real-world usage a lot of other functions are going to need to execute on the main event loop, so carmen-cache taking up so much time is of great concern.
Profiling in Activity Monitor shows the significant cost of the JS object conversion in `carmen::coalesceAfter`:
If I change the `queue(1);` to `queue();` in the benchmark (to allow more async usage of the threadpool), this seems to further confirm that we've got a bad bottleneck on the JS conversion in the main event loop. Running `UV_THREADPOOL_SIZE=16 node cache_bench/bench.js` finishes faster (in 14s) but at the cost of nearly entirely jamming up the main event loop with JS object conversion, with > 93% in `carmen::coalesceAfter`. In all my profiling of async C++ code I've never seen this much of the main event loop taken up by object conversion from C++ to JS (even in a micro benchmark like this):
At the time this profile was taken the threads were completely idle without any work to do, likely because the main event loop could not dispatch it, given it was blocked on object conversion. This is bad.
I don't know. Surely not as severely as this micro benchmark, but profiling in production would show the level of severity of this bottleneck, so I think that is worth doing.
A solution to this problem would involve:
As far as reducing the amount of work: we currently limit to 40 features to reduce this cost. Could we possibly limit further? To 20 features? Looks like the 40 cutoff was added in #65 perhaps?
As far as making the conversion cheaper, two ideas come to mind. Instead of converting C++ objects to JS JSON-like objects (which is clearly crazy expensive), we could either:

- expose accessors on the C++ objects, e.g. `context.getRelev()` instead of `context.relev` in JS-land
in JS-landAll C++ code at Mapbox should be able to run warning free. We should follow the approach of mapbox/node-cpp-skel#42 to ensure that any warnings in the Carmen Cache code become errors, we fix them, and then going forward we'll be protected from bugs that might be found if we were paying attention to all warnings from various compilers.
By running carmen-cache with `-fsanitize=integer` I see we only have one test that hits the `if (zDist == 0)` condition at
Line 934 in cd20c47
It is `test/geocode-unit.proximity.test.js` (specifically `# forward province - multilayer`) in carmen as of https://github.com/mapbox/carmen/commit/dbe167d609a6cc5374a20ddd7accb81ba62edf34.
Before the `if (zDist == 0)` condition the `zMult` variable overflows. This is harmless because it is not used, but it still ends up being `4294967295` because `target_z` and `z` are both `6`, and so `zDist` becomes `0` at
Line 932 in cd20c47

Tasks:

- move the `zMult` computation below the `if (zDist == 0)` early return, so the overflow cannot bite even if the condition were ever accidentally removed

Proposed fix:
diff --git a/src/binding.cpp b/src/binding.cpp
index b6bb856..a424111b 100644
--- a/src/binding.cpp
+++ b/src/binding.cpp
@@ -903,12 +903,12 @@ ZXY pxy2zxy(unsigned z, unsigned x, unsigned y, unsigned target_z) {
// Interval between parent and target zoom level
unsigned zDist = target_z - z;
- unsigned zMult = zDist - 1;
if (zDist == 0) {
zxy.x = x;
zxy.y = y;
return zxy;
}
+ unsigned zMult = zDist - 1;
// Midpoint length @ z for a tile at parent zoom level
unsigned pMid_d = static_cast<unsigned>(std::pow(2,zDist) / 2);
Our usage pattern for coalesce involves calling coalesce from carmen multiple (up to 30) times for each query. As currently structured, there's no explicit work sharing between the coalesce calls -- they don't know about each other and are often running at the same time, and in particular, each coalesce call is responsible for retrieving its own data from the underlying store via calls to `__getmatching`. We know from profiling in #126 that `__getmatching` is a major contributor to the total runtime of coalesce.
The thing is, though, that multiple stacks submitted for coalesce operations over the course of the same query will often share some stack items. For example, for "springfield, missouri, united states", if "missouri" and "united states" both come from global indexes (as they both happen to in our current production architecture), we'll generate a separate stack for each place index with a "springfield" in it, but with identical "missouri" and "united states" stack components, and each coalesce run will perform identical `__getmatching` calls for those components, and duplicate the work of retrieving, decoding, combining, and sorting them. We do potentially reap some implicit savings because RocksDB has an internal cache that probably shields us from some disk reads on calls after the first one, but the decoding and sorting steps are entirely duplicated.
A possible change in approach would be to switch from doing each coalesce call separately to a consolidated, two-phase multiple-coalesce approach: first figure out all the `__getmatching` calls we're going to need to do, then run the coalesce operations against the shared results.

Risks:

- We'd need to hold all the `__getmatching` results in memory at once -- these are potentially bulky, and we probably won't ever hold them all simultaneously in the current implementation because the size of the current threadpool implicitly limits how many coalesces we'll ever be doing at once. As a consequence, the max memory usage during coalesce will potentially increase; no idea if the increase will be big enough to be concerning.

Flagging that I noticed the asan job is failing currently (or at least did fail once with #109 in a way that is unrelated to that PR). I don't have time right now to triage, so I'm creating this ticket to document.
https://travis-ci.org/mapbox/carmen-cache/jobs/266152025
I can see at least 2 things happening:

- A leak report involving `pgrep` that I'm not familiar with / have not seen before: https://travis-ci.org/mapbox/carmen-cache/jobs/266152025#L1148. It is almost as if some program is shelling out to `pgrep` and that tool has a leak. I wonder if what changed is that `pgrep` started to be called by npm or some testing tool and this just needs to be suppressed (as it seems unrelated to carmen-cache).

Both of these issues should be fixed but are not urgent / are safe to ignore. /cc @yhahn @apendleton
The `ldictcache` is a set of all phrase IDs in an index. It's essentially a low-memory index representing what can be loaded from the actual `phrase => grid` index, e.g.

- check the `ldictcache` for a phrase id
- if present, load from the `phrase => grid` index for that id

Per discussion we could look into using truncated 32-bit ids in this cache. We'll have some ID collisions but it may be worth the memory tradeoff.
Goal: try to make this change that is mostly transparent to carmen downstream.
@apendleton - I've merged #42 into master. So the next version of carmen cache should be lighter (fewer binaries to package) and a bit faster for writes (`carmen.pack`).
I'll let you decide when to release. There is no need to get #42 deployed soon, but it should be solid whenever something else triggers a release.
/cc @yhahn
Tests are failing on linux and crashing on osx with 9b86b03. One guess is memory corruption in `sortRelevReason` (Lines 1266 to 1276 in ce7b3bf).
Goal: we're going to switch gears and start using your work as a dependency in a branch of carmen so you can start testing out your new API.

- In the `setgrid` branch, open `package.json` and update the `version` property to be a prerelease of the next version.
- Follow `CONTRIBUTING.md`, which has instructions for triggering a binary build (https://github.com/mapbox/carmen-cache/blob/master/CONTRIBUTING.md)

Next up: creating a branch in `carmen` and pointing it at your new build.
One of the core data types in carmen-cache is a grid - a record of a particular location feature (e.g. New York City or Amsterdam) that includes its x,y coordinates and its feature ID. For some time we've packed these properties into 64 bits for efficient storage, but to make this work we've made various tradeoffs. One tradeoff is that we've had to truncate feature IDs to fit with all the other data within 64 bits. This makes looking up a feature from a grid quite challenging because the truncated feature ID may no longer be unique within a dataset.
For your sprint project we'll refactor a specific part of the carmen-cache API to prepare for a more flexible storage format in the future. We'll focus on the core get/set storage functions in carmen-cache.
A grid represents a geographic area (x, y) covered by a single feature (id) with information corresponding to how relevant and important it might be to a certain textual query (relev, score).
In JavaScript, we work with grids in the carmen codebase as objects, e.g.
var grid = {
x: 3,
y: 17,
relev: 1.0,
score: 5,
id: 15
};
This would be a valid grid object.
To store grids efficiently we have a little recipe for converting the grid object into a single number value and back:
Grids are encoded as 53-bit integers (largest possible JS integer) storing the following information:
info | bits | description
---|---|---
x | 14 | x tile cover coordinate, up to z14
y | 14 | y tile cover coordinate, up to z14
relev | 2 | relev between 0.4 and 1.0 (possible values: 0.4, 0.6, 0.8, 1.0)
score | 3 | score scaled to a value between 0-7
id | 20 | feature id, truncated to 20 bits
From https://github.com/mapbox/carmen/blob/master/README.md#grid
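To make the recipe concrete, here is a minimal JS sketch of a 53-bit pack/unpack based on the bit widths in the table above. The bit ordering here (high to low: x, y, relev, score, id) is an assumption for illustration; the actual layout carmen uses may differ. Note that JS bitwise operators truncate to 32 bits, so we multiply and divide by powers of two instead.

```javascript
// Powers of two for each field boundary (id: 20 bits, score: 3,
// relev: 2, y: 14, x: 14 -- 53 bits total, within Number.MAX_SAFE_INTEGER).
var POW20 = Math.pow(2, 20);
var POW23 = Math.pow(2, 23);
var POW25 = Math.pow(2, 25);
var POW39 = Math.pow(2, 39);

function encodeGrid(grid) {
    // relev has 4 possible values (0.4, 0.6, 0.8, 1.0) -> 2 bits
    var relevBits = Math.round((grid.relev - 0.4) / 0.2); // 0..3
    return grid.x * POW39 +
           grid.y * POW25 +
           relevBits * POW23 +
           grid.score * POW20 +
           (grid.id % POW20); // truncate id to 20 bits
}

function decodeGrid(num) {
    var id = num % POW20;
    var score = Math.floor(num / POW20) % 8;
    var relevBits = Math.floor(num / POW23) % 4;
    var y = Math.floor(num / POW25) % Math.pow(2, 14);
    var x = Math.floor(num / POW39);
    return { x: x, y: y, relev: 0.4 + relevBits * 0.2, score: score, id: id };
}
```

Round-tripping the grid object from the earlier example (`{ x: 3, y: 17, relev: 1.0, score: 5, id: 15 }`) through `decodeGrid(encodeGrid(grid))` recovers every field, up to floating-point noise on relev.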
The idea of storing multiple properties in a single number is an encoding technique that is often used for optimization purposes. A simple example of this concept: suppose you've surveyed a bunch of people on their age and also whether they are happy about getting older. You might have two fields as a result:
name | age | happy_about_it |
---|---|---|
Jane | 31 | n |
John | 7 | y |
Joan | 91 | y |
Jeff | 45 | n |
A human-friendly formula to encode these two values into a single number might be: age * 10 + (1 if happy, 0 if not).
You would then end up with a single value that captures both pieces of information (if you know the recipe):
name | age_and_happy_about_it |
---|---|
Jane | 310 |
John | 71 |
Joan | 911 |
Jeff | 450 |
From a computer's perspective this recipe is pretty inefficient though - you'll never end up using digits 2-9 in the ones column. Since happy_about_it
is a boolean (only two possible values), you might use a recipe more oriented toward binary, like age * 2 + (1 if happy, 0 if not):
name | age_and_happy_about_it |
---|---|
Jane | 62 |
John | 15 |
Joan | 183 |
Jeff | 90 |
Harder to read as a human but just as easy to reverse as a computer.
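The two recipes above can be sketched in a few lines of JS. The function names are illustrative, not part of any API:

```javascript
// "Human-friendly" recipe: happy flag in the ones digit (wastes digits 2-9).
function encodeDecimal(age, happy) {
    return age * 10 + (happy ? 1 : 0);
}

// Binary-oriented recipe: happy flag in the lowest bit, age in the rest.
function encodeBinary(age, happy) {
    return age * 2 + (happy ? 1 : 0);
}

// Reversing the binary recipe is a floor-divide and a parity check.
function decodeBinary(value) {
    return { age: Math.floor(value / 2), happy: value % 2 === 1 };
}
```

For example, encodeDecimal(31, false) gives Jane's 310, encodeBinary(91, true) gives Joan's 183, and decodeBinary(183) recovers age 91 and happy true.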
The _set() API

Our current carmen-cache set API works like this:
var carmenCache = require('carmen-cache');
var grids = new carmenCache.MemoryCache('grids');
var a = 7036891598684201;
var b = 4785263596273714;
var c = 7036994682093628;
grids._set('paris', [a,b,c]);
It expects you to pass an array of numbers, where you've done the hard work of converting your grids to numbers. Because of this API design, changing how we store this data isn't a decision carmen-cache can make alone - we'd need to update all the code that calls _set() as well.
Have _set() do the encoding

The goal of this sprint is to get to the point where _set() can optionally accept an array of grid objects as well and do the conversion internally itself. So our new API will look like:
var carmenCache = require('carmen-cache');
var grids = new carmenCache.MemoryCache('grids');
var a = { x: 3, y: 17, relev: 1.0, score: 5, id: 15 };
var b = { x: 4, y: 16, relev: 0.8, score: 3, id: 16 };
var c = { x: 3, y: 10, relev: 0.6, score: 2, id: 17 };
grids._set('paris', [a,b,c]);
And it will be up to _set() to do the appropriate conversion to how it wants to store the data. For now we'll use the exact same storage strategy (converting to a single number), but this API change will give us room in the future to experiment seamlessly with more flexible ways of storing the data.
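A minimal sketch of the input handling the new _set() could perform: accept either pre-encoded numbers (the old API) or grid objects (the new one), and encode objects internally before storing. normalizeGrids and the injected encodeGrid are hypothetical helper names, not the real implementation:

```javascript
// Accept a mixed array of already-encoded numbers and grid objects;
// return an array of numbers ready for storage. The encoding function
// is injected so this sketch stays independent of the real recipe.
function normalizeGrids(items, encodeGrid) {
    return items.map(function (item) {
        if (typeof item === 'number') return item; // already encoded
        return encodeGrid(item);                   // grid object: encode it
    });
}
```

This keeps the old numeric call sites working unchanged while new callers can pass grid objects directly.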
For each item you'll create a new branch that you'll turn into a pull request:
- Write a test of the new _set() API from the JS side, where the data passed is an array of objects with grid properties instead of integers.
- Write an encodeGrid() cpp method and expose it to the JS API for testing.
- Write a grid.encode() JS method so that _set() can convert grid properties to integers in cpp.
- Update _set() to be able to handle JS object input and convert those objects to integers.
cc @fredngeorge
I've emailed support with this request. Will post back here soon.
After #116 lands, we should enable LTO for the build, since #166 splits the code into multiple .cpp files.
Learn more about LTO via https://github.com/mapbox/cpp/blob/lto-glossary-entry/glossary.md#link-time-optimization
refs mapbox/cpp#49
More poking around carmen with an -fsanitize=undefined binary of carmen-cache has revealed a potential overflow. No idea yet if this is harmless, spurious, or an actual problem.
node ./scripts/carmen-analyze.js tiles/01-ne.country.mbtiles
Analyzing tiles/01-ne.country.mbtiles ...
../src/binding.cpp:122:43: runtime error: unsigned integer overflow: 5498430554125 - 6048152813581 cannot be represented in type 'unsigned long long'
SUMMARY: AddressSanitizer: undefined-behavior ../src/binding.cpp:122:43 in
{ total: 5126,
degen: 3407,
ender: 1719,
byScore:
{ '0': 534,
'1': 279,
'2': 298,
'3': 325,
'4': 431,
'5': 627,
'6': 756,
'7': NaN },
byRelev:
{ '0.4': 271,
'0.6': 0,
'0.8': 0,
'1.0': 0,
'1638.6': NaN,
'1638.4': NaN,
'1638.2': NaN,
'1638.0': NaN,
'1637.8': NaN,
'1637.6': NaN,
'1637.4': NaN,
'1637.2': NaN,
'1637.0': NaN } }
at src/binding.cpp line 122 in bd529ff
Long term, we will need a mapping of literal index order to "logical" index order in cpp land to make intelligent decisions in setRelevance and spatialMatch. Should be straightforward to add and pass back as a std::vector of ints.