
carmen-cache's Issues

Variable shadowing bug?

In the spatialmatch branch I'm seeing:

  CXX(target) Release/obj.target/carmen/src/binding.o
../src/binding.cpp:1107:22: warning: declaration shadows a local variable [-Wshadow]
        Local<Array> array = vectorToArray(it->second);
                     ^
../src/binding.cpp:1089:18: note: previous declaration is here
    Local<Array> array = Local<Array>::Cast(args[0]);
                 ^

Tests all pass for me currently, but fail after fixing this variable clash:

not ok 50 should be equivalent
  ---
    operator: deepEqual
    expected:
      { '11/441/770': undefined, '11/441/771': undefined, '11/442/770': undefined, '11/442/771': undefined, '11/498/724': undefined, '11/610/758': undefined, '11/610/759': undefined, '11/611/758': undefined, '9/151/188': undefined, '9/151/189': undefined, '9/152/188': undefined, '9/152/189': undefined, '9/152/190': undefined, '9/153/188': undefined, '9/153/189': undefined, '9/153/190': undefined, '9/154/188': undefined, '9/154/189': undefined, '9/154/190': undefined, '9/154/191': undefined, '9/155/188': undefined, '9/155/189': undefined, '9/155/190': undefined, '9/155/191': undefined, '9/156/189': undefined, '9/156/190': undefined, '9/156/191': undefined }
    actual:
      { '11/441/770': '100007711', '11/441/771': '100007711', '11/442/770': '100007711', '11/442/771': '100007711', '11/498/724': '100131599', '11/610/758': '100014180-495', '11/610/759': '100014180-495', '11/611/758': '100014180-495', '9/151/188': '495-496', '9/151/189': '495', '9/152/188': '495', '9/152/189': '495', '9/152/190': '495', '9/153/188': '495', '9/153/189': '495', '9/153/190': '495', '9/154/188': '495', '9/154/189': '495', '9/154/190': '495', '9/154/191': '495', '9/155/188': '495', '9/155/189': '495', '9/155/190': '495', '9/155/191': '495', '9/156/189': '495', '9/156/190': '495', '9/156/191': '495' }
    at: Test.<anonymous> (/Users/dane/projects/carmen-cache/test/coalesceZooms.test.js:131:7)
  ...

Segfault in cache::get/has

With https://github.com/mapbox/carmen/commit/c16b49098463065a, running bench/phrasematch.js makes the process segfault.

I've been debugging this for many hours without finding a clear culprit, so I'll document what I'm seeing and thinking:

What I'm seeing:

  • I can replicate with every carmen-cache version I've tried so far: master, v0.16.3, v0.15.0, v0.14.0, and v0.12.1
  • The crash alternates between cache::_get and cache::has
  • The crash does not happen if the dawg cache is not dumped and reloaded, as is done in https://github.com/mapbox/carmen/commit/4a1b748360ec984f6e79c080a85a7d4b505adc77.
  • The crash is in the std::string constructor
  • It happens because a null ptr is passed to the std::string constructor and std::strlen tries to compute its length. This can be reduced to:

    const char * ptr = NULL;
    std::string shard = ptr; // will segfault on osx (libc++) and throw 'std::logic_error' on linux (libstdc++)
  • Putting a console.log before the call to _get or has in JS slows down the benchmark and 99% of the time avoids the crash.
  • One time the process did still crash after putting a console.log in to count how many times phrasematch had been called in the benchmark. This helped me see that phrasematch had run successfully ~11,000 times before crashing.
  • Protecting against the crash in cache::has here by throwing an error on an empty string argument (var->ToString().IsEmpty()) moves the crash predictably to cache::get
  • Protecting against the crash in both cache::has and cache::_get (by throwing a JS error on an empty string) leads to no segfault AND also no JS errors being thrown. Crazy, right?
  • So, protecting against creating a std::string from a NULL appears to stop there ever being a NULL???
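A minimal defensive pattern for the std::string construction (a sketch, not the actual fix; get_shard_ptr is a hypothetical stand-in for wherever the pointer comes from):

const char* ptr = get_shard_ptr(); // hypothetical source of the pointer
std::string shard = ptr ? std::string(ptr) : std::string(); // never construct from NULL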

What I'm thinking:

  • I think the reason this happens with a compact dawg cache is that the program runs faster, so the problem condition happens predictably.
  • I've noticed that if I put a print statement (like a console.log before _get or has) then the problem goes away, another indication that the program needs to run fast to hit the problem.

I think there are multiple bugs in play. One bug is that we don't prevent invalid/null strings from being sent to std::string. However, while we should protect against this, my sense is that a null ptr should not happen under normal conditions. So I suspect memory corruption or a race condition somewhere else in carmen-cache loading. This then creates the conditions under which other bugs, like the lack of NULL checking, manifest.

Use `numeric_cast` for narrowing conversions

Context

In carmen-cache we frequently convert integers from JS to C++ types. During this conversion we often need to "narrow" the integer types by casting from a type that supports a larger range (e.g. int) to a type that supports a smaller range (e.g. short).

Preventing overflows/wrap-arounds

To ensure we can cast values with predictable results in the face of invalid input, we should move away from static_cast to something like boost's numeric_cast, which throws an exception when a value cannot be safely cast to fit in a new type.

Since we don't want to depend on something as large as boost for carmen-cache we should look for a header-only, small library to do this for us.
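For what it's worth, Microsoft's GSL is header-only and its gsl::narrow does exactly this (it throws gsl::narrowing_error when the value doesn't survive the cast). A minimal sketch of the same round-trip idea if we rolled our own (checked_cast is a hypothetical name):

#include <stdexcept>

// Throws if converting to Target changes the value, similar in spirit to
// gsl::narrow / boost::numeric_cast.
template <typename Target, typename Source>
Target checked_cast(Source value) {
    Target narrowed = static_cast<Target>(value);
    if (static_cast<Source>(narrowed) != value) {
        throw std::overflow_error("narrowing conversion changed the value");
    }
    return narrowed;
}

// e.g. checked_cast<short>(70000) throws instead of silently wrapping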

Speed up sorting / reduce overhead of sorting

About a year ago @KaiBot3000, @aarthykc, and I put carmen under load, profiled with perf, and noticed a single, significant bottleneck in carmen-cache.

The bottleneck was the sorting of Context objects, specifically contextSortByRelev. This makes sense given Context objects are large (copying them is expensive), there may be a lot of them collected in memory, and the sort function is not simple.

After #93 we should see a slight speedup / reduction in the cost of this sorting because Context objects are now moved rather than copied. However I would not be surprised if the top bottleneck in carmen is still this sorting in carmen-cache. So this ticket stands to:

  • remind us that this is worth more investigation
  • document what we saw in case optimization work around sorting is picked up again
  • reference #93 which includes some ideas for optimizations not yet attempted, including using std::partial_sort (a sketch appears at the end of this issue)

Details:

  • The sort function:

    inline bool contextSortByRelev(Context const& a, Context const& b) noexcept {
        if (b.relev > a.relev)
            return false;
        else if (b.relev < a.relev)
            return true;
        else if (b.coverList[0].scoredist > a.coverList[0].scoredist)
            return false;
        else if (b.coverList[0].scoredist < a.coverList[0].scoredist)
            return true;
        else if (b.coverList[0].idx < a.coverList[0].idx)
            return false;
        else if (b.coverList[0].idx > a.coverList[0].idx)
            return true;
        return (b.coverList[0].id > a.coverList[0].id);
    }
  • The call site was identified as a bottleneck during profiling of carmen with perf (no surprise: sorting is expensive):

    std::sort(contexts.begin(), contexts.end(), contextSortByRelev);
  • The perf output previously highlighted this sorting as the primary bottleneck (screenshot not preserved).
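For reference, a sketch of the std::partial_sort idea from #93 (assuming the Context type and contextSortByRelev comparator shown above, and a hypothetical result limit): when only the top N contexts are ever returned, sorting just those N avoids paying for a full sort.

#include <algorithm>
#include <cstddef>
#include <vector>

// Sort only the best `limit` contexts into place; the rest stay unordered.
void sortTopContexts(std::vector<Context>& contexts, std::size_t limit) {
    auto middle = contexts.begin() +
                  static_cast<std::ptrdiff_t>(std::min(limit, contexts.size()));
    std::partial_sort(contexts.begin(), middle, contexts.end(), contextSortByRelev);
}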

Place of binary

After successfully running make, where is the binary placed?

Update RocksDB: 4.13 -> 5

Context:

We're running into some issues with shard-carmen that may be caused by rocksdb. I don't know where to start digging into this issue because I haven't worked with carmen-cache yet 🎉

Quest:

Let's get on the most recent stable version as an introduction to how we use it in carmen-cache.

capturing from chat: (screenshot not preserved)

cc'ing @aarthykc who may want to join in and @springmeyer who's agreed to spirit-guide.

Run clang sanitizers on travis

To catch memory leaks, invalid memory access, and overflows before they cause bugs we should start running the clang++ sanitizers per commit on travis. The main ones to focus on are:

  • Address Sanitizer (-fsanitize=address) (which on linux also enables the Leak Sanitizer if ASAN_OPTIONS=detect_leaks=1)
  • Integer Sanitizer (-fsanitize=integer) which warns about integer overflows (you need to watch the logs to see if they happen)

The way these work is you build with the given flag in the CXXFLAGS and LDFLAGS to instrument the binaries, and then you run the tests. At runtime the sanitizer watches the program and either aborts (in the case of the address/leak sanitizers) or warns (in the case of the integer sanitizer).

One gotcha however is that for a node module loaded with dlopen (how node C++ modules are loaded), the sanitizers also need to be built into the node binary. I've rolled a custom node.js binary that is instrumented this way: https://github.com/mapbox/mason/tree/master/scripts/node_asan/4.4.4.

TODO:

  • node_asan package has a glitch where npm is broken due to a hardcoded path (needs fixing in the mason package)
  • Create a travis linux job that builds in Debug mode, installs the custom node, and builds and tests with -fsanitize=address in the CXXFLAGS and LDFLAGS.
  • Do the same for -fsanitize=integer

Note: due to leaks in node v4.4.4 itself, the travis jobs will need to set export ASAN_OPTIONS=detect_leaks=0 while building carmen-cache, and then set export ASAN_OPTIONS=detect_leaks=1 before running the tests.

/cc @mikemorris @apendleton

[setgrid] Add an encodeGrid() cpp method

For this task you'll be getting familiar with

  • Defining JS API methods in C++
  • Converting JS values into C++ types and vice versa
  • Creating C++ functions in the binding.cpp file

  • Ping @apendleton or @yhahn for a walkthrough of
    • Defining a new JS method in C++
    • Grabbing an argument from JS in C++
    • Converting it to a C++ type
    • Rebuilding your C++ code and testing it from JS
  • Create an encodeGrid() cpp function
  • Expose your encodeGrid() cpp function to JS as a MemoryCache method
  • Add JS unit tests that confirm cache.encodeGrid() behaves the same as grid.encode()
  • Make a Pull Request against the setgrid branch with your work!

[setgrid] Add unit tests for _set() with grid objects

For this task you'll be getting familiar with

  • Requiring modules in node and calling methods from them
  • Unit testing in node using its native assert methods
  • Defining tests using the tape module

  • Create a new test file for your tests at test/cache-setgrid.test.js

  • Start by adding the example API usage from the main quest ticket (#99) to your file:

    var carmenCache = require('carmen-cache');
    var grids = new carmenCache.MemoryCache('grids');
    
    var a = 7036891598684201;
    var b = 4785263596273714;
    var c = 7036994682093628;
    
    // store the data in the cache
    grids._set('paris', [a,b,c]);

    Because your test file is actually within the carmen-cache module, you'll need to reference it in your require using a relative path like this:

    var carmenCache = require('../index.js');

    Run your test file using node test/cache-setgrid.test.js to make sure there are no syntax errors.

  • Sketch out basic unit tests for the existing API:

    • You should be able to call _set() using paris as the cache key and with a list of numbers
    • You should be able to call _get() using paris as the cache key and get a list of numbers back

    To do this, you can make use of the tape API for defining tests:

    var tape = require('tape');
    
    tape('basic API', function(assert) {
        // code and assertions run here
        var foo = 5 + 7;
        assert.equal(foo, 12, 'addition works!');
    
        // when your test is done you must call assert.end() explicitly
        assert.end();
    });
  • Sketch out the test flow you'd like for representing the new API:

    • You should be able to call _set() using paris as the cache key and with a list of grid objects
    • You should be able to call _get() using paris as the cache key and get a list of numbers back
    • You should be able to verify that the numbers returned are the grid objects converted to single numbers properly

    For verifying that the conversion occurred correctly, you can make use of the test/grid.js library which provides utility encode/decode functions for converting between grid properties and their numeric equivalents.

  • Make a Pull Request against the setgrid branch with your work!

Don't publish binaries for git tags

Pushing a commit with [publish binary] along with a tag referencing that commit leads to build failures, since Travis will run duplicate build jobs that both try to publish their binaries to the same place. Whichever one finishes first will succeed, and the other will fail. This causes many commits to be incorrectly flagged as failing CI, though it doesn't really cause any substantive problems beyond that.

This issue was reported in mapbox/node-cpp-skel#108, and seems to have been fixed with mapbox/node-cpp-skel#139, so I think the next steps are:

  • add the node-cpp-skel updates to carmen-cache/scripts/publish.sh
  • make sure it works
  • merge

cc @springmeyer @apendleton

Drop dependency on libprotobuf

We are already using protozero for fast parsing of protobuf messages.

Now that protozero has a robust and well-tested interface for writing (docs at https://github.com/mapbox/protozero/blob/master/tutorial.md#writing-protobuf-encoded-messages) we can drop the external dependency on libprotobuf, which is currently only used in carmen::Cache::pack.

TODO:

  • FIRST: upgrade protozero to a version that includes writing support (#34)
  • remove building of libprotobuf code here
  • remove references to libprotobuf dependency here
  • remove protoc generated header include:
    #include "index.pb.h"
  • remove need to install libprotobuf via mason:

    carmen-cache/.travis.yml

    Lines 51 to 58 in 126f7d3

    - git clone --depth 1 https://github.com/mapbox/mason.git ./.mason
    - export MASON_DIR=$(pwd)/.mason
    - export PATH=$(pwd)/.mason:$(pwd)/mason_packages/.link/bin:$PATH
    - ./.mason/mason install protobuf 2.6.1
    - ./.mason/mason link protobuf 2.6.1
    - export PKG_CONFIG_PATH=$(pwd)/mason_packages/.link/lib/pkgconfig/
    - export CXXFLAGS="-I$(pwd)/mason_packages/.link/include"
    - export LDFLAGS="-L$(pwd)/mason_packages/.link/lib"
  • port carmen::Cache::pack to use protozero writer (drop usage of carmen::proto): https://github.com/mapbox/carmen-cache/blob/master/src/binding.cpp#L196-L258
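For orientation, a minimal sketch of what writing with protozero looks like (the field tag here is hypothetical; the real tags come from the message definitions behind index.pb.h):

#include <protozero/pbf_writer.hpp>
#include <cstdint>
#include <string>

std::string pack_example(uint64_t id) {
    std::string buffer;
    protozero::pbf_writer writer(buffer); // writes into the string
    writer.add_uint64(1, id);             // field tag 1 is hypothetical
    return buffer;
}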

[setgrid] Wire up encodeGrid() and _set()


Now that the pieces are in place let's look into wiring it all up:

  • Read through the current _set() function
  • Before making changes in the code, sketch out some notes on how you want to approach this
    • How and where in the code will you optionally handle the grid objects coming in from JavaScript?
    • How will you convert that data into covers?
    • @apendleton or @yhahn can give you a good gutcheck on approach
  • Apply those changes and work through any issues as you work toward getting the original setgrid unit tests to pass
  • Make a Pull Request against the setgrid branch with your work!

cc @fredngeorge

Upstream: undo feature id sharding

Transferring braindump notes to here. This issue affects both carmen-cache and carmen, but the problem originates mostly from the grid cache outward so capturing here for now.

Why does the problem exist at all?!

Feature storage is sharded right now not just for historical reasons but because the grid storage in our PBF format currently limits us to 20-bit feature IDs.

Even with a larger feature ID space (e.g. if we went from 53 bits => 64 bits) we would probably encounter feature ID collisions at which point we'd need to store them in a nested fashion with some kind of secondary factor (like cover zxy) to disambiguate where possible.

Example:

feature A 120951912 \
                     +----> store together, distinguish by id + zxy
feature B 120951912 /

Rough braindump of how to untangle

  • We'd want unlimited feature ID storage space
  • We'd want enforcement of feature ID uniqueness (per index) at indexing time
  • We'd want to undo "sharded" feature storage
  • We'd want to stop using uint64 as a way to transfer cover data into and out of carmen-cache -- we'd probably define a new PBF message format (variable length, allowing for any size feature IDs) for encoding this information

I think these would all be very positive changes, just a pretty solid, invasive refactor

cc @mapbox/geocoding-gang

Nuke type concept

For later, but I think we can nuke the type concept in the cache. The calling module can basically handle type segmentation by using multiple cache instances.

I think once the type convention is gone this can basically be a protobuf-powered sharded one-to-many 32-bit unsigned int cache. Not sure how generically useful that is though : )

cc @springmeyer

Illegal instruction on Node 8

@springmeyer I merged the Node 8/10 support PR (#122) and have been working on getting Carmen running on Node 8, but am hitting an Illegal instruction crasher on circle. SSHing into the Circle environment, it looks like it's in carmen-cache -- just running node index.js within carmen-cache triggers the crash:

root@ae104a9c3e0f:~/project/node_modules/@mapbox/carmen-cache# lldb node index.js
(lldb) target create "node"
Current executable set to 'node' (x86_64).
(lldb) settings set -- target.run-args  "index.js"
(lldb) run
Process 8818 launched: '/root/.nvm/versions/node/v8.11.3/bin/node' (x86_64)
Process 8818 stopped
* thread #1, name = 'node', stop reason = signal SIGILL: illegal instruction operand
    frame #0: 0x00007ffff461cfb2 carmen.node`_GLOBAL__sub_I_env_posix.cc + 6434
carmen.node`_GLOBAL__sub_I_env_posix.cc:
->  0x7ffff461cfb2 <+6434>: vpbroadcastq %xmm0, %xmm0
    0x7ffff461cfb7 <+6439>: vmovdqu %xmm0, 0x451519(%rip)     ; rocksdb::(anonymous namespace)::lockedFiles + 24
    0x7ffff461cfbf <+6447>: vzeroupper
    0x7ffff461cfc2 <+6450>: callq  0x7ffff45df080            ; symbol stub for: __cxa_atexit

with full (not very helpful because of lack of debug symbols) backtrace:

(lldb) bt
* thread #1, name = 'node', stop reason = signal SIGILL: illegal instruction operand
  * frame #0: 0x00007ffff461cfb2 carmen.node`_GLOBAL__sub_I_env_posix.cc + 6434
    frame #1: 0x00007ffff7de5733 ld-linux-x86-64.so.2`___lldb_unnamed_symbol50$$ld-linux-x86-64.so.2 + 259
    frame #2: 0x00007ffff7dea1ff ld-linux-x86-64.so.2`___lldb_unnamed_symbol81$$ld-linux-x86-64.so.2 + 1087
    frame #3: 0x00007ffff6bdc2df libc.so.6`_dl_catch_exception + 111
    frame #4: 0x00007ffff7de97ca ld-linux-x86-64.so.2`___lldb_unnamed_symbol79$$ld-linux-x86-64.so.2 + 186
    frame #5: 0x00007ffff7bd1f96 libdl.so.2`___lldb_unnamed_symbol4$$libdl.so.2 + 86
    frame #6: 0x00007ffff6bdc2df libc.so.6`_dl_catch_exception + 111
    frame #7: 0x00007ffff6bdc36f libc.so.6`_dl_catch_error + 47
    frame #8: 0x00007ffff7bd2735 libdl.so.2`___lldb_unnamed_symbol11$$libdl.so.2 + 101
    frame #9: 0x00007ffff7bd2051 libdl.so.2`dlopen + 113
    frame #10: 0x0000000001414bf7 node`uv_dlopen(filename="/root/project/node_modules/@mapbox/carmen-cache/lib/carmen.node", lib=0x00007fffffffb8f0) at dl.c:36
    frame #11: 0x00000000008c7d75 node`node::DLOpen(v8::FunctionCallbackInfo<v8::Value> const&) + 245
    frame #12: 0x0000000000a5ff63 node`v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) + 403
    frame #13: 0x0000000000ad710c node`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) + 332
    frame #14: 0x0000000000ad7d5f node`v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) + 175
    frame #15: 0x00003f87a27842fd
    frame #16: 0x00003f87a283d1d6
    frame #17: 0x00003f87a283d1d6
    frame #18: 0x00003f87a283d1d6
    frame #19: 0x00003f87a283d1d6
    frame #20: 0x00003f87a283d1d6
    frame #21: 0x00003f87a283d1d6
    frame #22: 0x00003f87a283d1d6
    frame #23: 0x00003f87a283d1d6
    frame #24: 0x00003f87a283d1d6
    frame #25: 0x00003f87a283d1d6
    frame #26: 0x00003f87a283d1d6
    frame #27: 0x00003f87a283d1d6
    frame #28: 0x00003f87a283d1d6
    frame #29: 0x00003f87a283d1d6
    frame #30: 0x00003f87a283d1d6
    frame #31: 0x00003f87a2784239
    frame #32: 0x00003f87a2784101
    frame #33: 0x0000000000d6f94a node`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 266
    frame #34: 0x0000000000a428f3 node`v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) + 355
    frame #35: 0x00000000008ca306 node`node::LoadEnvironment(node::Environment*) + 566
    frame #36: 0x00000000008d42cd node`node::Start(uv_loop_s*, int, char const* const*, int, char const* const*) + 829
    frame #37: 0x00000000008cc8fd node`node::Start(int, char**) + 365
    frame #38: 0x00007ffff6a96b97 libc.so.6`__libc_start_main + 231
    frame #39: 0x000000000089b1b1 node`_start + 41

So, it looks like an AVX instruction (vpbroadcastq, per the disassembly above) in RocksDB is doing it. The CPUs we're running on on Circle look like this:

processor	: 31
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping	: 4
microcode	: 0x42c
cpu MHz		: 2800.092
cache size	: 25600 KB
physical id	: 1
siblings	: 16
core id		: 7
cpu cores	: 8
apicid		: 47
initial apicid	: 47
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm kaiser fsgsbase smep erms xsaveopt
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
bogomips	: 5675.20
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

so I think it should support this instruction? (Though note that vpbroadcastq is an AVX2 instruction, and the flags above list avx but not avx2.) But I'm uncertain as to why this would be happening now, since if I'm not mistaken both the Node 6 and Node 8 builds use the same precompiled RocksDB static library from Mason, so I wouldn't expect them to be any different?

Anyhow, I'm pretty unsure as to how to proceed from here. Here's the Circle build if you have a sec to take a look.

546 locally, 542 tests on travis

Something is odd with tests on travis. Fewer tests are running on travis than I see locally. And I don't see the expected output from test/coalesce.bench.test.js of ok 116 coalesceSingle + proximity @ .... It seems as if this critical test is not running on travis....

/cc @apendleton @yhahn to see if they have any clues.

Abort in V8 if invalid cache object is passed to coalesce

While testing coalesce I ran into an odd crash. Turns out it was API misuse on my part. Nevertheless we should protect against invalid Cache objects being sent in the array arg to coalesce. We can do this by calling:

if (!Nan::New(Cache::constructor)->HasInstance(obj)) {
    return Nan::ThrowTypeError("items in the array arg must be Cache objects");
}

This reduced testcase:

var coalesce = require('.').Cache.coalesce;
coalesce([{}]);

Triggers:

FATAL ERROR: v8::Object::GetAlignedPointerFromInternalField() Not a Smi
Abort trap: 6

v8::AdjustAmountOfExternalAllocatedMemory()

We should try estimating the memory usage of carmen-cache and reporting it via v8::AdjustAmountOfExternalAllocatedMemory(). This should help with scenarios where:

  • huge caches are loaded quickly into the C++ heap
  • v8 needs to garbage collect its JS objects to keep memory down...
  • But v8 may decide not to garbage collect since it does not see the C++ heap and thinks it has ample available memory

Using v8::AdjustAmountOfExternalAllocatedMemory() will likely slow down the performance of repeated calls to cache.loadSync but may allow node to avoid using too much memory for temporary objects created in the same code that is calling cache.loadSync.

refs https://github.com/nodejs/nan/blob/master/doc/v8_internals.md#nanadjustexternalmemory
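A minimal sketch of the call itself; the actual work in this ticket is producing a good byte estimate to pass in:

#include <v8.h>
#include <cstdint>

void report_cache_memory(int64_t bytes_delta) {
    // Tell v8 the C++ heap grew (positive) or shrank (negative) so it can
    // factor that into its GC scheduling.
    v8::Isolate::GetCurrent()->AdjustAmountOfExternalAllocatedMemory(bytes_delta);
}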

gcc 4.8 vs 4.9 vs clang

Consultation/hint/tips on this stumper pls ...

There is a test failure after this fix that has me stumped: https://travis-ci.org/mapbox/carmen-cache/builds/46816335. The change attempts to preserve the order of spatialMatch results when equal relevs are compared -- either the order of the results going into the sort is different, or the sort itself is behaving differently.

Tips on where/how to debug here?

Refactor Plan

As part of a larger set of contributions, @apendleton and I are attempting to refactor the repo with the goal of making it more readable and useful overall.

  • API Documentation using documentation.js
    • Outputs API markdown
      • memorycache
      • rocksdbcache
      • normalizationcache
      • coalesce
      • node-cpp-utils
      • cpp-utils
    • Generated API.md from code comments
    • And run via package.json or the Makefile
  • File & directory structure (see #58)
    • split most source code from binding.cpp file out into dedicated source files:
      • one for rocksdbcache
      • one for memorycache
      • one for normalizationcache
      • one for coalesce
      • one for pure-C++ (non-Node) util functions
      • one for Node util functions
    • ensure 1:1 parity between cpp files and hpp files
    • ensure that all functions, type definitions, struct definitions, etc., are represented in header files
  • Tests
    • ensure division of tests into files mirrors organization of C++ functionality into files
    • add documentation to tests to ensure that they serve a useful purpose
  • Remaining planned improvements from the core-tech team
    • Benchmarks - use node-cpp benchmark tests
    • Least Privileges
      • Remove current keys in .travis.yml
      • Use cloudformation instead
  • Consider dropping Node 4 from Travis build matrix

Protect from invalid args passed to coalesce

If a PhrasematchSubq array that is missing a property is passed to the coalesce function, we incorrectly attempt to cast the missing value to a C++ type.

This manifests as a crash (SIGILL) when using a node binary built in Debug mode.

# coalesceSingle

#
# Fatal error in ../deps/v8/src/objects-inl.h, line 1117
# Check failed: IsNumber().
#

==== C stack trace ===============================

 1: V8_Fatal
 2: v8::internal::Object::Number()
 3: v8::Value::IntegerValue(v8::Local<v8::Context>) const
 4: v8::Value::IntegerValue() const
 5: carmen::Cache::coalesce(Nan::FunctionCallbackInfo<v8::Value> const&)
 6: Nan::imp::FunctionCallbackWrapper(v8::FunctionCallbackInfo<v8::Value> const&)
 7: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&))
 8: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)1>&)
 9: v8::internal::Builtin_Impl_HandleApiCall(v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)1>, v8::internal::Isolate*)
10: v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*)
11: 0x3c03c66068fb
12: 0x3c03c68f1b33
Illegal instruction: 4

I noticed this by running carmen-cache against a node binary built in Debug mode (mapbox/mason#355), which surfaced the bug. In release mode on OS X it looks like the value silently comes back as zero. It is possible memory is corrupted in release mode, but it may be implementation-dependent whether something terrible or undefined happens.
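A minimal sketch of the kind of validation that would prevent the bad cast (assuming obj is the subquery object pulled from the array; the property name "idx" is illustrative):

v8::Local<v8::Value> prop;
if (!Nan::Get(obj, Nan::New("idx").ToLocalChecked()).ToLocal(&prop) ||
    !prop->IsNumber()) {
    return Nan::ThrowTypeError("missing or invalid idx property");
}
// only now is it safe to call prop->IntegerValue()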

I'm working on a PR that I'll queue up shortly.

Update scoredist implementation

The scoredist method claims to be equivalent to the method of the same name in carmen, but it really is not.

  • scoredist in carmen-cache:

    double scoredist(unsigned zoom, double distance, double score, double radius) {
        if (zoom < 6) zoom = 6;
        if (distance == 0.0) distance = 0.01;
        double scoredist = 0;
        // Since distance is in tiles we calculate scoredist by converting the miles into
        // a tile unit value at the appropriate zoom first.
        //
        // 32 tiles is about 40 miles at z14, use this as our mile <=> tile conversion.
        scoredist = ((radius * (32.0 / 40.0)) / _pow(1.5, 14 - static_cast<int>(zoom))) / distance;
        return score > scoredist ? score : scoredist;
    }
  • scoredist in carmen https://github.com/mapbox/carmen/blob/f831f2dfc9eb965752939f5809b18315b38b7659/lib/util/proximity.js#L81-L90

We should most likely update the version here to actually be equivalent to what is in carmen.

Documenting most recent bottlenecks detected

@apendleton provided a cache_bench benchmark which exercises cxxcache.RocksDBCache and cxxcache.coalesce.

Using latest master at fec8605 (using node v8.11.3) here are the things I notice when running the benchmark locally on my OS X machine:

  • The benchmark takes about 22-24s
  • CPU usage hovers around 80-110% (so using all of one core), which makes sense given the benchmark serializes calls to cxxcache.coalesce using queue(1) and therefore is not able to utilize the threadpool much
  • Memory usage consistently grows after the initial data load during the bench run, from 400 -> 600 MB over the course of 15-20 seconds, until exiting
  • The main event loop (aka main thread) is very busy, with 44% in uv_work_done
  • 10% of the time spent in uv_work_done is garbage collection of v8 objects (initiated by node::NodePlatform::FlushForegroundTasksInternal())
  • The majority of the 44% spent in uv_work_done on the main loop is due to the task of returning data from the threadpool to the main event loop, which is done in the carmen::coalesceAfter function.
    • carmen::coalesceAfter looks to be bottlenecked on v8::Object::Set, which is called inside carmen::contextToArray and carmen::coverToObject to convert the carmen data into JS objects.
  • The worker threads in the C++ threadpool (where carmen::coalesceMulti does its critical work) appear to be not very busy, which is surprising. Ideally the main event loop is mostly idle (the profiler mostly sees the CPU executing in uv__io_poll which signifies waiting for work) and the threads are nearly 100% busy. But in this case the opposite is true: the main event loop is busier than the threads.

Why are the threads so idle?

One hypothesis is that the threads are idle because the queue(1); in the bench.js script sends very little work to the threads at a time, so each piece of work gets fanned out to a different thread and returns quickly. This appears not to be the case: if I run the benchmark with a threadpool size of 1, the profiling output (and the script timing) is similar:

UV_THREADPOOL_SIZE=1 node cache_bench/bench.js

So it seems instead that the problem is the main event loop doing too much work. Normally you want the hard work to be done in the threads to avoid blocking the main loop, but it appears that is not happening here (on balance). The fact that this balance is so off in the benchmark is concerning: in real-world usage a lot of other functions need to execute on the main event loop, so carmen-cache taking up so much of its time is a serious concern.

Profiling in Activity Monitor shows the significant cost of the JS object conversion in carmen::coalesceAfter (screenshot not preserved).

If I change the queue(1); to queue(); in the benchmark (to allow more async usage of the threadpool), this seems to further confirm that we've got a bad bottleneck on the JS conversion in the main event loop. Running UV_THREADPOOL_SIZE=16 node cache_bench/bench.js finishes faster (in 14s) but at the cost of nearly entirely jamming up the main event loop with JS object conversion, with > 93% in carmen::coalesceAfter. In all my profiling of async C++ code I've never seen this much of the main event loop taken up by object conversion from C++ to JS (even in a micro benchmark like this):

(profiler screenshot not preserved)

At the time this profile was taken the threads were completely idle without any work to do, likely because the main event loop could not dispatch it, given it was blocked on object conversion. This is bad.

How likely is this to be impacting production?

I don't know. Surely not as severely as this micro benchmark, but profiling in production would show the level of severity of this bottleneck, so I think that is worth doing.

Potential solutions

A solution to this problem would involve:

  • Reducing the number of objects sent back from C++ to JS
  • Reducing the cost of passing objects from C++ to JS

As far as reducing the number of objects: we currently limit to 40 features to reduce this cost. Could we possibly limit further? To 20 features? Looks like the 40 cutoff was added in #65 perhaps?

As far as reducing the cost, two ideas come to mind. Instead of converting C++ objects to JS JSON-like objects (which is clearly crazy expensive) we could either:

  • Pass C++ objects directly into JS. Those C++ objects would use pointers internally to avoid copying the data, and would expose methods to access the underlying feature properties, so you would need to do context.getRelev() instead of context.relev in JS-land (a sketch of this follows the list)
  • Serialize using some small, efficient format like protobuf or the like. This way we'd serialize to protobuf in C++ and then decode lazily in JS to get access to the properties. The protobuf message could be efficiently passed from C++ to JS using this extremely fast and efficient method: mapbox/node-cpp-skel#69
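For the first idea, a minimal sketch of what the wrapper could look like (ContextWrap and getRelev are hypothetical names; a real version would hold a pointer into the C++-owned Context rather than a copied field):

#include <nan.h>

class ContextWrap : public Nan::ObjectWrap {
public:
    // Exposed to JS as context.getRelev(); nothing is converted up front.
    static NAN_METHOD(GetRelev) {
        ContextWrap* wrap = Nan::ObjectWrap::Unwrap<ContextWrap>(info.Holder());
        info.GetReturnValue().Set(Nan::New<v8::Number>(wrap->relev_));
    }

private:
    double relev_ = 0.0; // a real version would reference the Context instead
};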

Use `-Werror` and `-isystem`

All C++ code at Mapbox should be able to build warning-free. We should follow the approach of mapbox/node-cpp-skel#42 to ensure that any warnings in the carmen-cache code become errors and get fixed; going forward we'll then be protected from the bugs that surface when we pay attention to all warnings from various compilers.

Overflow in `pxy2zxy`

By running carmen-cache with -fsanitize=integer I see we only have one test that hits the if (zDist == 0) condition.

It is the test/geocode-unit.proximity.test.js (specifically # forward province - multilayer) in carmen as of https://github.com/mapbox/carmen/commit/dbe167d609a6cc5374a20ddd7accb81ba62edf34.

Before the if (zDist == 0) condition, the zMult variable overflows. This is harmless because zMult is not used in that case, but it still ends up being 4294967295 because target_z and z are both 6, so zDist becomes 0 at:

unsigned zDist = target_z - z;

Tasks:

  • Ensure we have test coverage in carmen-cache directly that triggers this condition (to protect from a regression if the if (zDist == 0) condition were ever accidentally removed)
  • Fix the harmless overflow

Proposed fix:

diff --git a/src/binding.cpp b/src/binding.cpp
index b6bb856..a424111b 100644
--- a/src/binding.cpp
+++ b/src/binding.cpp
@@ -903,12 +903,12 @@ ZXY pxy2zxy(unsigned z, unsigned x, unsigned y, unsigned target_z) {
 
     // Interval between parent and target zoom level
     unsigned zDist = target_z - z;
-    unsigned zMult = zDist - 1;
     if (zDist == 0) {
         zxy.x = x;
         zxy.y = y;
         return zxy;
     }
+    unsigned zMult = zDist - 1;
 
     // Midpoint length @ z for a tile at parent zoom level
     unsigned pMid_d = static_cast<unsigned>(std::pow(2,zDist) / 2);

Refactor coalesce to reduce duplicate get operations

Our usage pattern for coalesce involves calling coalesce from carmen multiple (up to 30) times for each query. As currently structured, there's no explicit work sharing between the coalesce calls -- they don't know about each other and are often running at the same time, and in particular, each coalesce call is responsible for retrieving its own data from the underlying store, via calls to __getmatching. We know from profiling in #126 that __getmatching is a major contributor to total runtime of coalesce.

The thing is, though, that multiple stacks being submitted for coalesce operations over the course of the same query will often share some stack items. For example, for "springfield, missouri, united states", if "missouri" and "united states" both come from global indexes (as they both happen to be in our current production architecture), we'll generate a separate stack for each place index with a "springfield" in it, but with identical "missouri" and "united states" stack components, and each coalesce run will perform identical __getmatching calls for those components, and duplicate the work of retrieving, decoding, combining, and sorting them. We do potentially reap some implicit savings because RocksDB has an internal cache that probably shields us from some disk reads on calls after the first one, but the decoding and sorting steps are entirely duplicated.

A possible change in approach would be to switch from doing each coalesce call separately to a consolidated, two-phase multiple-coalesce approach, like this:

  • we submit all of the coalesce jobs we're going to need to do over the course of a search task once, as a unit
  • on the c++ side, we deduplicate the submissions, and identify all the unique __getmatching calls we're going to need to do (see the sketch after this list)
  • we fan those calls out across the threadpool, and retain the results
  • we join on completion of those gets and regroup in the main thread -- that's phase 1
  • phase 2 is that we fan back out again to do all the actual coalesces, but pass them references to already-retrieved data instead of just the things they're going to need to look up for themselves
  • join again, and return all the results together as a set
  • bonus: if I'm reading the current code right, at present, we return up to 40 results for each coalesce call, then combine/sort them together on the carmen side, and cut that list down eventually to 20 results across all the coalesce calls. If we do this on the native side, we can copy dramatically less stuff back to carmen -- that's another major identified bottleneck in #126
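A minimal sketch of the phase-1 deduplication bookkeeping (GetMatchingKey and its fields are hypothetical; the real key is whatever uniquely identifies a __getmatching call):

#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Identifies one unique __getmatching call across all submitted stacks.
struct GetMatchingKey {
    std::string cache_id;  // which index the call targets (hypothetical field)
    std::string phrase;    // the phrase argument
    bool match_prefixes;   // the prefix flag
    bool operator<(GetMatchingKey const& other) const {
        return std::tie(cache_id, phrase, match_prefixes) <
               std::tie(other.cache_id, other.phrase, other.match_prefixes);
    }
};

// Phase 1 output: each unique call mapped to the stacks that need its result.
using GetMatchingPlan = std::map<GetMatchingKey, std::vector<std::size_t>>;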

Risks:

  • the extra fork/join step will probably introduce some inefficiencies during which threads will be idle waiting for other threads to finish, which might erode some of the benefits of deduplicating
  • writing safe threaded code in c++ is fraught and error-prone and we're proposing introducing a bunch of new complexity here; if we waited to do this until after this functionality were visible to Rust, we could consider writing coalesce 2.0 in that instead, which might be easier to accomplish
  • we'll end up holding all of the __getmatching results in memory at once -- these are potentially bulky, and we probably won't ever hold them all simultaneously in the current implementation because the size of the current threadpool implicitly limits how many coalesces we'll ever be doing at once. As a consequence, the max memory usage during coalesce will potentially increase; no idea if the increase will potentially be big enough to be concerning

Flapping test failures in asan travis job

Flagging that I noticed the asan job is failing currently (or at least did fail once with #109 in a way that is unrelated to that PR). I don't have time right now to triage so I'm creating this ticket to document.

https://travis-ci.org/mapbox/carmen-cache/jobs/266152025

I can see at least 2 things happening:

Both of these issues should be fixed but are not urgent / are safe to ignore. /cc @yhahn @apendleton

Truncate to 32-bit ids in dictionary cache

The ldictcache is a set of all phrase IDs in an index. It's essentially a low-memory index to represent what can be loaded from the actual phrase => grid index, e.g.

  1. First we check whether an ID even exists to be loaded via ldictcache
  2. If it exists, we move on to actually loading the phrase => grid index for that id

Per discussion we could look into using truncated 32-bit ids in this cache. We'll have some ID collisions but it may be worth the memory tradeoff.

Suggested approach

Goal: try to make this change mostly transparent to carmen downstream.
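The truncation itself is a one-liner; the interesting work is deciding how collisions are tolerated (a sketch, assuming today's IDs are 64-bit):

#include <cstdint>

// Keep the low 32 bits. Distinct phrase IDs may now collide, which is the
// accepted memory-vs-accuracy tradeoff discussed above.
inline uint32_t truncate_phrase_id(uint64_t phrase_id) {
    return static_cast<uint32_t>(phrase_id & 0xFFFFFFFFull);
}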


Next release

@apendleton - I've merged #42 into master. So the next version of carmen-cache should be lighter (fewer binaries to package) and a bit faster for writes (carmen.pack).

I'll let you decide when to release. There is no need to get #42 deployed soon, but it should be solid whenever something else triggers a release.

/cc @yhahn

Crashing "real" test

Tests are failing on linux and crashing on osx with 9b86b03. One guess is memory corruption in sortRelevReason (carmen-cache/src/binding.cpp, lines 1266 to 1276 in ce7b3bf):

bool sortRelevReason(SetRelev const& a, SetRelev const& b) {
    if (a.idx > b.idx) return true;
    else if (a.idx < b.idx) return false;
    else if (a.relev > b.relev) return true;
    else if (a.relev < b.relev) return false;
    else if (a.reason > b.reason) return true;
    else if (a.reason < b.reason) return false;
    else if (a.id < b.id) return true;
    else if (a.id > b.id) return false;
    return true;
}

This guess is because I've only seen crashes inside a C++ sort function once before. Prof @kkaefer explains: mapnik/mapnik#1087 (comment). Note that the final return true for two equal elements means this comparator is not a strict weak ordering, which std::sort requires; violating that requirement is undefined behavior and a classic cause of exactly this kind of crash.

[setgrid] Building dev binaries

Goal: we're going to switch gears and start using your work as a dependency in a branch of carmen so you can start testing out your new API.

  • You'll get familiar with semver and creating dev versions of a project
  • You'll learn how to trigger a binary build of a project
  • You'll learn how to point at dev dependency of another module from a package.json file

  • Create a branch off of setgrid
  • Check out the semver docs and how npm handles semver versioning, especially for prerelease builds
  • In your new branch, edit package.json and update the version property to be for a prerelease of the next version.
    • For example, since you're adding API functionality (but not breaking anything) you might increase the minor version number.
    • Since this isn't the official release yet, add a prerelease suffix to your version
    • Gutcheck from another member of @mapbox/geocoding-gang
  • Read over CONTRIBUTING.md which has instructions for triggering a binary build (https://github.com/mapbox/carmen-cache/blob/master/CONTRIBUTING.md)
  • When ready, let's trigger a binary build with a commit

Next up: creating a branch in carmen and pointing at your new build

[setgrid] Set Grid Quest

One of the core data types in carmen-cache is a grid - a record of a particular location feature (e.g. New York City or Amsterdam) that includes its x,y coordinates and its feature ID. For some time we've packed these properties into 64 bits for efficient storage, but to make this work we've made various tradeoffs. One tradeoff is that we've had to truncate feature IDs to fit with all the other data within 64 bits. This makes looking up a feature from a grid quite challenging because the truncated feature ID may no longer be unique within a dataset.

For your sprint project we'll refactor a specific part of the carmen-cache API to prepare for a more flexible storage format in the future. We'll focus on the core get/set storage functions in carmen-cache.


What is a grid?

A grid represents a geographic area (x, y) covered by a single feature (id) with information corresponding to how relevant and important it might be to a certain textual query (relev, score).

In JavaScript, we work with grids in the carmen codebase as objects, e.g.

var grid = {
  x: 3,
  y: 17,
  relev: 1.0,
  score: 5,
  id: 15
};

This would be a valid grid object.


Encoding grids as numbers

To store grids efficiently we have a little recipe for converting the grid object into a single number value and back:

Grids are encoded as 53-bit integers (largest possible JS integer) storing the following information:

info    bits  description
x       14    x tile cover coordinate, up to z14
y       14    y tile cover coordinate, up to z14
relev   2     relev between 0.4 and 1.0 (possible values: 0.4, 0.6, 0.8, 1.0)
score   3     score scaled to a value between 0-7
id      20    feature id, truncated to 20 bits

From https://github.com/mapbox/carmen/blob/master/README.md#grid
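As a concrete illustration, a sketch of that packing in C++ (the high-to-low field order here is an assumption for illustration; the authoritative layout is grid.encode in carmen):

#include <cstdint>

// Pack x(14) | y(14) | relev(2) | score(3) | id(20) = 53 bits.
// NOTE: this field order is assumed, not confirmed against grid.encode.
uint64_t encodeGrid(uint32_t x, uint32_t y, double relev, uint32_t score, uint32_t id) {
    // relev in {0.4, 0.6, 0.8, 1.0} maps to 2 bits (0..3)
    uint64_t relevBits = static_cast<uint64_t>((relev - 0.4) / 0.2 + 0.5);
    return ((static_cast<uint64_t>(x) & 0x3FFF) << 39) |
           ((static_cast<uint64_t>(y) & 0x3FFF) << 25) |
           ((relevBits & 0x3) << 23) |
           ((static_cast<uint64_t>(score) & 0x7) << 20) |
           (static_cast<uint64_t>(id) & 0xFFFFF); // id truncated to 20 bits
}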

The idea of storing multiple properties in a single number is an encoding technique that is often used for optimization purposes. A simple example of this concept: suppose you've surveyed a bunch of people on their age and also whether they are happy about getting older. You might have two fields as a result:

name    age   happy_about_it
Jane    31    n
John    7     y
Joan    91    y
Jeff    45    n

A human-friendly formula to encode these two values into a single number might be:

  1. Multiply age by 10
  2. If the person is happy about their age, add 1

You would then end up with a single value that captures both pieces of information (if you know the recipe):

name    age_and_happy_about_it
Jane    310
John    71
Joan    911
Jeff    450

From a computer's perspective this recipe is pretty inefficient though - you'll never end up using digits 2-9 in the ones column. Since happy_about_it is a boolean (only two possible values), you might use a recipe more oriented toward binary like:

  1. Multiply age by 2
  2. If the person is happy about their age, add 1

name    age_and_happy_about_it
Jane    62
John    15
Joan    183
Jeff    90

Harder to read as a human, but just as easy for a computer to reverse.
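That binary recipe, written out as code:

// age in all but the lowest bit, the happiness flag in the lowest bit
unsigned encodeAgeHappy(unsigned age, bool happy) {
    return age * 2 + (happy ? 1 : 0); // equivalently (age << 1) | happy
}

unsigned decodeAge(unsigned encoded) { return encoded / 2; }      // or encoded >> 1
bool decodeHappy(unsigned encoded) { return encoded % 2 == 1; }   // or encoded & 1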


Our current _set() API

Our current carmen cache set API works like this:

var carmenCache = require('carmen-cache');
var grids = new carmenCache.MemoryCache('grids');

var a = 7036891598684201;
var b = 4785263596273714;
var c = 7036994682093628;

grids._set('paris', [a,b,c]);

It expects you to pass an array of numbers, where you've done the hard work of converting your grids to numbers. Because of this API design, it also means that changing how we store this data isn't a decision carmen-cache can make alone - we'd need to update all the code that calls _set() as well.


Goal: let _set() do the encoding

The goal of this sprint is to get to the point where _set() can optionally accept an array of grid objects as well and do the conversion internally itself. So our new API will look like:

var carmenCache = require('carmen-cache');
var grids = new carmenCache.MemoryCache('grids');

var a = { x: 3, y: 17, relev: 1.0, score: 5, id: 15 };
var b = { x: 4, y: 16, relev: 0.8, score: 3, id: 16 };
var c = { x: 3, y: 10, relev: 0.6, score: 2, id: 17 };

grids._set('paris', [a,b,c]);

And it will be up to _set() to do the appropriate conversion to how it wants to store the data. For now we'll use the exact same storage strategy (converting to a single number) but this API change will give us room in the future to experiment with more flexible ways of storing the data seamlessly.


Backlog

For each item you'll create a new branch that you'll turn into a pull request.

  • #100 - Add tests for _set() where the data passed is an array of objects with grid properties instead of integers.
    • Language: JavaScript
    • Goal: sketch out the new _set() API from the JS side
  • #101 - Add an encodeGrid() cpp method and expose it to the JS API for testing
    • Language: C++
    • Goal: port the grid.encode() JS method so that _set() can convert grid properties to integers in cpp
  • #103 - Adjust _set() to be able to handle JS object input and convert those objects to integers
    • Language: C++
    • Goal: wire it all together and get tests passing

cc @fredngeorge

Integer overflow in delta encoding in `__get`

More poking around carmen with an -fsanitize=undefined binary of carmen-cache has revealed a potential overflow. No idea yet if this is harmless, spurious, or an actual problem.

node ./scripts/carmen-analyze.js tiles/01-ne.country.mbtiles
Analyzing tiles/01-ne.country.mbtiles ...
../src/binding.cpp:122:43: runtime error: unsigned integer overflow: 5498430554125 - 6048152813581 cannot be represented in type 'unsigned long long'
SUMMARY: AddressSanitizer: undefined-behavior ../src/binding.cpp:122:43 in 
{ total: 5126,
  degen: 3407,
  ender: 1719,
  byScore: 
   { '0': 534,
     '1': 279,
     '2': 298,
     '3': 325,
     '4': 431,
     '5': 627,
     '6': 756,
     '7': NaN },
  byRelev: 
   { '0.4': 271,
     '0.6': 0,
     '0.8': 0,
     '1.0': 0,
     '1638.6': NaN,
     '1638.4': NaN,
     '1638.2': NaN,
     '1638.0': NaN,
     '1637.8': NaN,
     '1637.6': NaN,
     '1637.4': NaN,
     '1637.2': NaN,
     '1637.0': NaN } }

at:

lastval = lastval - *it;
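Using the values from the sanitizer report, the wrap-around looks like this:

#include <cstdint>

uint64_t lastval = 5498430554125ULL;
uint64_t it_val  = 6048152813581ULL;
// The mathematical result is negative, so the unsigned subtraction wraps
// modulo 2^64 to 2^64 - 549722259456 instead.
uint64_t wrapped = lastval - it_val;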

Pass idx => group idx mapping to cpp

Long term, we will need a mapping of literal index order to "logical" index order in cpp land to make intelligent decisions in setRelevance, spatialMatch.

Should be straightforward to add + pass back as a std::vector of ints.
