Code Monkey home page Code Monkey logo

nativejit's Introduction

BitFunnel

This repo contains the code for the BitFunnel index used by Bing's super-fresh, news, and media indexes. The algorithm is described in BitFunnel: Revisiting Signatures for Search, a paper presented at SIGIR 2017. This video gives a good overview of the algorithm.

The codebase here was published to allow the research community to replicate results from the SIGIR paper. The documentation is pretty thin, but we encourage you to look at the following :

Build status Build status

Dependencies

In order to build BitFunnel you will need CMake (2.8.11+), and a modern C++ compiler (gcc 5+, clang 3.5+, or VC 2015+). You can run CMake directly to generate the appropriate build setup for your platform. Alternately, we have some scripts that have the defaults that we use available.

*nix

For *nix platforms (including OS X),

./Configure_Make.sh
cd build-make
make
make test

Note that while these instructions are for a make build, it's also possible to build using ninja by changing the cmake command to create ninja files instead of Makefiles. These aren't listed in the instructions because ninja requires installing an extra dependency for some developers, but if you want to use ninja it's available via apt-get, brew, etc., and is susbtantially faster than make.

Ubuntu

If you're on Ubuntu 15+, you can install dependencies with:

sudo apt-get install clang cmake

On Ubuntu 14 and below, you'll need to install a newer version of CMake. To install a new-enough CMake, see this link. If you're using gcc, you'll also need to make sure you have gcc-5 (sudo apt-get install g++-5).

To override the default compiler, set the CXX and CC environment variables. For example, if you have clang-3.8 installed as clang-3.8 and are using bash:

export CXX="clang++-3.8"
export CC="clang-3.8"

OS X

Install XCode and then run the following command to install required packages using Homebrew (http://brew.sh/):

brew install cmake

BitFunnel can be built on OS X using either standard *nix makefiles or XCode. In order to generate and build makefiles, in the root BitFunnel directory run:

If you want to create an Xcode project instead of using Makefiles, run:

./Configure_XCode.sh

If you use XCode, you'll have to either re-run Configure_XCode or run the ZERO_CHECK target when the CMakeLists changes, e.g., when source files are added or removed.

Windows

You will need these tools:

Note: If you install Visual Studio for the first time and select the default install options, you won't get a C++ compiler. To force the install of the C++ compiler, you need to either create a new C++ project or open an existing C++ project.

Clone the BitFunnel repository and then run the following command in BitFunnel's root folder:

.\Configure_MSVC.bat

Note: You will need to modify the CMake -G option if you use a different version of Visual Studio. Bitfunnel must be built as a 64-bit program, so 'Win64' must be part of the specified G option text.

At this point, you can open the generated solution BitFunnel_CMake.sln from Visual Studio and then build it. Alternatively, you can build from the command line using cmake --build build-MSVC.

nativejit's People

Contributors

danluu avatar hausdorff avatar jondgoodwin avatar mikehopcroft avatar nabijaczleweli avatar nibroc avatar theantony avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

nativejit's Issues

Variadic templated `Function` (instead of only allowing 0-4 params)

This was originally implemented the way it's done in the existing code because variadic templates weren't available in VC++ when the project was started, and on Windows the first four params are passed as register, so adding more than four params is a new code path.

VC++ is now at least semi-modern so this should be possible now.

Use COMBINE_FILE_LISTS in cmakelists.txt

This strategy is used in BitFunnel. Just need to define COMBINE_FILE_LISTS() macro in the top level CMakeLists.txt and then use in the lower level CMakeLists.txt files to replace

if (NATIVEJIT_PLATFORM_WINDOWS)
set(CPPFILES ${CPPFILES} ${WINDOWS_CPPFILES})
set(PUBLIC_HFILES ${PUBLIC_HFILES} ${WINDOWS_PUBLIC_HFILES})
set(PRIVATE_HFILES ${PRIVATE_HFILES} ${WINDOWS_PRIVATE_HFILES})
else (NATIVEJIT_PLATFORM_WINDOWS)
set(CPPFILES ${CPPFILES} ${POSIX_CPPFILES})
set(PUBLIC_HFILES ${PUBLIC_HFILES} ${POSIX_PUBLIC_HFILES})
set(PRIVATE_HFILES ${PRIVATE_HFILES} ${POSIX_PRIVATE_HFILES})
endif (NATIVEJIT_PLATFORM_WINDOWS)

Travis CI flakiness from failed apt-get install

https://travis-ci.org/BitFunnel/NativeJIT/builds/129207321

https://travis-ci.org/BitFunnel/NativeJIT/builds/130683483

https://travis-ci.org/BitFunnel/NativeJIT/builds/130904044

The 2nd of those three builds fails with:

E: Failed to fetch http://llvm.org/apt/precise/pool/main/l/llvm-toolchain-3.8/libllvm3.8_3.8~svn269657-1~exp1_amd64.deb  404  Not Found
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
apt-get.diagnostics
apt-get install failed

Despite no change in the travis config file.

Protect against orphan nodes

inc/NativeJIT/Nodes/Node.h

        // Increments the number of node's parents that will evaluate the node
        // through the Node<T>::CodeGen() method. The node will be evaluated
        // only once, but the result will also be stored in cache with a
        // matching number of references. The cache will be released once all
        // parents evaluate the node.
        // IMPORTANT: Currently, there's an assumption that if a node is created,
        // it must be placed inside the tree. TODO: Remove this assumption and
        // allow for optimizing away unused nodes.
        void IncrementParentCount();

Fix: maintain count of nodes created and count of nodes in tree. Assert fail if they don't match.

Note that not fixing this will only cause more code to get generated than is necessary. In our first pass, if we have an orphan root with children, code will get generated that can't be reached through any reasonable means.

Speed up Travis CI by removing sudo

We use trusty in order to get a new enough cmake without installing cmake. This presently requires sudo, which prevents us from using dockerized Travis, which is faster. This is ok for now since we have other priorities and low CI load, but we should fix this at some point.

FreeList displacement???

    // TODO: Perhaps this should return a Data*.
    // If nothing is displaced, returns nullptr.
    // Otherwise returns Data* for displaced.
    template <unsigned SIZE>
    void ExpressionTree::FreeList<SIZE>::Allocate(unsigned id)
    {

We believe this comment is irrelevant because this function doesn't displace anything. It's possible comment became obsolete when the concept of storage was added.

Static initialization order fiasco

Register.h

    // TODO: Need to avoid "static initialization order fiasco" for register definitions.
    // See http://www.parashift.com/c++-faq/static-init-order.html.
    // Plan is to use constexpr when VS2013 becomes available to Bing build.

Constexpr is available for us.

Packed should be moved

inc/NativeJIT/Packed.h

// TODO: This class and PackedMinMaxNode should probably move from NativeJIT to BitFunnel.Library.

Inject LICENSEs

I'm personally not a fan of this idea, but since we agreed to do this I don't mind taking this one.

Use constexpr for register constants

inc/NativeJIT/Nodes/ParameterNode.h

        // Integer parameters are passed in RCX, RDX, R8, and R9.
        // Use constants to encode registers. See #31.
#ifdef NATIVEJIT_PLATFORM_WINDOWS
        const uint8_t idMap[] = {1, 2, 8, 9};
#else
        const uint8_t idMap[] = {7, 6, 2, 1, 8, 9};
#endif

See also #24

Mismatched new[], delete

I've fixed a few of these in BitFunnel, but I don't think I was on the lookout for these with NativeJIT. valgrind can find these, though:

==13082== Mismatched free() / delete / delete []
==13082==    at 0x4C2F24B: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13082==    by 0x538D3A: std::default_delete<char>::operator()(char*) const (unique_ptr.h:76)
==13082==    by 0x538AC2: std::unique_ptr<char, std::default_delete<char> >::~unique_ptr() (unique_ptr.h:236)
==13082==    by 0x538833: NativeJIT::Allocator::~Allocator() (Allocator.cpp:48)
==13082==    by 0x488005: NativeJIT::TestFixture::~TestFixture() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x487F94: NativeJIT::CodeGenUnitTest::CodeGen::~CodeGen() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x487DC4: NativeJIT::CodeGenUnitTest::CodeGen_JCC_Test::~CodeGen_JCC_Test() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x487DE8: NativeJIT::CodeGenUnitTest::CodeGen_JCC_Test::~CodeGen_JCC_Test() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x568B3A: testing::Test::DeleteSelf_() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x57E7C9: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (in /home/leah/dev/Nativ
eJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x568256: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest.cc:2438)
==13082==    by 0x54D688: testing::TestInfo::Run() (gtest.cc:2661)
==13082==  Address 0x5cedf80 is 0 bytes inside a block of size 8,192 alloc'd
==13082==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13082==    by 0x53875B: NativeJIT::Allocator::Allocator(unsigned long) (Allocator.cpp:40)
==13082==    by 0x53826A: NativeJIT::TestFixture::TestFixture(unsigned int, unsigned int, std::ostream*) (TestSetup.cpp:52)
==13082==    by 0x5381B5: NativeJIT::TestFixture::TestFixture() (TestSetup.cpp:39)
==13082==    by 0x487F5E: NativeJIT::CodeGenUnitTest::CodeGen::CodeGen() (CodeGenTest.cpp:40)
==13082==    by 0x487F1E: NativeJIT::CodeGenUnitTest::CodeGen_JCC_Test::CodeGen_JCC_Test() (CodeGenTest.cpp:46)
==13082==    by 0x487ECD: testing::internal::TestFactoryImpl<NativeJIT::CodeGenUnitTest::CodeGen_JCC_Test>::CreateTest() (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x57E8F9: testing::Test* testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::TestFactoryBase, testing::Test*>(testing::internal::TestFactoryBase*, te
sting::Test* (testing::internal::TestFactoryBase::*)(), char const*) (in /home/leah/dev/NativeJIT/build-make/test/CodeGen/CodeGenTest)
==13082==    by 0x5688A6: testing::Test* testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::TestFactoryBase, testing::Test*>(testing::internal::TestFactoryBase*, testi
ng::Test* (testing::internal::TestFactoryBase::*)(), char const*) (gtest.cc:2438)
==13082==    by 0x54D61B: testing::TestInfo::Run() (gtest.cc:2647)
==13082==    by 0x54DE26: testing::TestCase::Run() (gtest.cc:2774)
==13082==    by 0x556093: testing::internal::UnitTestImpl::RunAllTests() (gtest.cc:4649)

It looks like we're newing up an array, putting a pointer to that array in a unique_ptr, which doesn't know to call delete[] on destruction.

See also BitFunnel/BitFunnel#57.

Signed testing

Should we have better coverage for signed? Most of our tests are against uint.

Non-volatile register save & restore testing

Should we have better coverage?

        //   Test register save/restore across call.
        //      Caller save volatiles
        //   Test call register indirect.
        //   Test that tries all permutations of parameter from Function to Call.

Remove SuiteCpp

We should remove SuiteCpp and then rename all of the test asserts to use the GTest syntax (e.g. TestEqual will become ASSERT_EQ). Also remove any shims that allow SuiteCpp to be used in place of GTest.

Avoid extra MOV for Return(Immediate(foo))

It would be nice if ReturnNode::CompileAsRoot could generate into RAX instead of moving.

Consider example of a tree that is just Return(Immediate(5)). Right now this generates

mov r15, 5
mov rax, r15

Decide what to do with files in inc/Temporary

These are allocator interfaces that will also be used in BitFunnel. Options seem to be

  1. Maintain separate versions, one in the NativeJIT namespace, and one in the BitFunnel namespace.
  2. Make both NativeJIT and BitFunnel take a dependency on a 3rd repo that provides these interfaces.
  3. Make BitFunnel depend on NativeJIT for these interfaces.

Replace test asserts with appropriate comparisons

We have, for example TestAssert(foo == bar), which should be replaced with TestEqual(foo, bar) so that we get better debug info on failures.

We should be able to do all of these replacements in one shot with a simple script. I'm not doing this right now because there's an outstanding PR that adds tests in a style that matches the "local" style in the file and I want to wait until that's merged so that we don't have a merge conflict.

Register allocator uses unnecessary registers

Consider the following C++ code:

    Function<float, float> expression(allocator, code);
    auto & rsquared = expression.Mul(expression.GetP1(), expression.GetP1());
    const float  PI = 3.14159265358979f;
    auto & area = expression.Mul(rsquared, expression.Immediate(PI));
    auto function = expression.Compile(area);

On Windows this code generates something like

sub    rsp, 28h                     
mov    qword ptr[rsp], rbp
movaps xmmword ptr[rsp + 10h], xmm15
lea    rbp, [rsp + 28h]

movss  xmm15, xmm0                  
mulss  xmm15, xmm0                  
mulss  xmm15, dword ptr[PI_CONSTANT]
movss  xmm0, xmm15                  

movaps xmm15, xmmword ptr[rsp + 10h]
mov    rbp, qword ptr[rsp]
add    rsp, 28h
ret

This code is inefficient because it uses xmm15 when it could have used xmm0. Would have expected

mulss  xmm0, xmm0                  
mulss  xmm0, dword ptr[PI_CONSTANT]

Better documentation system

Our current system is a very simple manual hack. We should have some way to automatically extract known-good code and put it into our docs.

CI broken because of gcc 4.9

https://travis-ci.org/BitFunnel/NativeJIT/jobs/141203696

I grabbed the CI setup from BitFunnel because that has a fix for a problem here. However, it also reverts gcc from gcc5 to gcc 4.9, which causes this failure:

/home/travis/build/BitFunnel/NativeJIT/src/CodeGen/FunctionBuffer.cpp: In member function ‘virtual void NativeJIT::FunctionBuffer::Reset()’:
/home/travis/build/BitFunnel/NativeJIT/src/CodeGen/FunctionBuffer.cpp:243:27: error: missing initializer for member ‘RUNTIME_FUNCTION::BeginAddress’ [-Werror=missing-field-initializers]
         m_runtimeFunction = {};
                           ^
/home/travis/build/BitFunnel/NativeJIT/src/CodeGen/FunctionBuffer.cpp:243:27: error: missing initializer for member ‘RUNTIME_FUNCTION::EndAddress’ [-Werror=missing-field-initializers]
/home/travis/build/BitFunnel/NativeJIT/src/CodeGen/FunctionBuffer.cpp:243:27: error: missing initializer for member ‘RUNTIME_FUNCTION::UnwindData’ [-Werror=missing-field-initializers]

I think Alex inserted some junk in BitFunnel to fix this gcc 4.9 issue in BitFunnel and that we can probably do something similar here.

JumpTable::PatchCallSites() dangerous casts

    // WARNING: Non portable. Assumes little endian machine architecture.                                                                                                                
    // WARNING: Non portable. Assumes that fixup value is labelAddress - siteAddress - size.Size().                                                                                      
    void JumpTable::PatchCallSites()
    {                                                                                                                                                                            
        for (size_t i=0; i < m_callSites.size(); ++i)
        {                                                                                                                                                                        
            const CallSite& site = m_callSites[i];
            const uint8_t* labelAddress = AddressOfLabel(site.GetLabel());
            uint8_t* siteAddress = site.Site();
            size_t delta = labelAddress - siteAddress - site.Size();

            // TODO: Evaluate whether special cases for size == 2 and size == 4 actually improve performance.                                                                            
            size_t size = site.Size();
            if (size == 2)
            {
                *((int16_t*)siteAddress) = (int16_t)delta;
                siteAddress += size;
            }
            else if (size == 4)
            {
                *((int32_t*)siteAddress) = (int32_t)delta;
                siteAddress += size;
            }
            else
            {
                while (size > 0)
                {
                    *siteAddress++ = (uint8_t)delta;
                    delta = delta >> 8;
                    size--;
                }
            }
        }
    }

I'm enabling -Wold-style-cast (see BitFunnel/BitFunnel#37) and converting these to C++ style casts, but this function still seems dangerous.

Remove sethi-ullman LabelSubtree

In CallNode.h

                // TODO: Need to account for the fact the extra registers may be used
                // when the specific parameter home is different than the one returned
                // duing code generation.

IosMiniStateRestorer

X64CodeGenerator.h:

    // TODO: Move this class to a separate header

Note that Ios stands for iostream

Set up CI

Filing bugs on our action items because our action items from the last two meetings have a very large intersection. We should just have bugs to refer to instead of having to refresh the list in meetings so it's not forgotten. Bug assignment is somewhat arbitrary. Please feel free to re-assign.

It sounded like we were going to look into Travis and Appveyor?

Yet another bogus Travis CI failure

See travis-ci/travis-ci#6120 (comment) for a workaround. Heading out, will look at this later.

More generally, the majority (all?) of the Travis-CI errors I've caused so far have been due to bogus fails. Is there a real alternative to Travis CI we can use or is this as good as it gets if we don't set up our own CI infrastructure?

Determine strategy for register allocation on conditionals

In ConditionalNode.h

        // TODO: Evaluating both expressions in advance of the test is
        // sub-optimal, but it is currently required to guarantee consistent
        // state: the execution in NativeJIT has a continuous flow regardless
        // of the outcome of the (runtime) condition whereas the generated x64
        // code has two branches and each of them can have independent impact on
        // allocated and spilled registers. To have each expression evaluated
        // only if its branch is hit, the NativeJIT code would have to ensure
        // that the state of the allocated/spilled registers (i.e. all Storages)
        // is consistent once those two x64 branches converge back.
        //
        // A related note: if this is addressed, then it may be possible to
        // also address the fact that common subexpression (CSE) code is
        // inefficient in some cases because some CSEs may not need to be
        // evaluated. For example:
        //   (v == 1)? a : ((v == 2) ? a + b : b + c)
        //
        // Depending on the value of v, either a or c may not need to be
        // evaluated. One possibility is to lazily evaluate CSEs *at runtime*
        // as they are needed.

Determine which VC++ warning removal pragmas we don't need

This code was originally written for an old version of VC++ which had a lot more quirks than 2015. Some of the warning removal pragmas can probably be removed, and many of the ones that can't be removed should probably have better documentation.

It shouldn't be that hard to write a script that removes the pragmas one-by-one to see which ones are no longer necessary.

Also, in some cases, we might be able to just change the code to work cross-platform without some kind of platform-specific #ifdef'd pragma.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.