
librapid's Introduction


What is LibRapid?

LibRapid is an extremely fast, highly optimised and easy-to-use C++ library for mathematics, linear algebra and more, with a powerful multidimensional array class at its core. Every part of LibRapid is designed to deliver the best possible performance without the sacrifices that other libraries often make.

Everything in LibRapid is templated, meaning it'll just work with almost any datatype you throw at it. In addition, LibRapid is engineered with compute-power in mind, meaning it's easy to make the most out of the hardware you have. All array operations are vectorised with SIMD instructions, parallelised via OpenMP and can even be run on external devices via CUDA and OpenCL. LibRapid also supports a range of BLAS libraries to make linear algebra operations even faster.


What's more, LibRapid provides lazy evaluation of expressions, allowing us to perform optimisations at compile-time to further improve performance. For example, dot(3 * a, 2 * transpose(b)) will be compiled into a single GEMM call, with alpha=6, beta=0, transA=false and transB=true.

Why use LibRapid?

If you need the best possible performance and an intuitive interface that doesn't sacrifice functionality, LibRapid is for you. You can fine-tune LibRapid's performance via the CMake configuration and change the device used for a computation by changing a single template parameter (e.g. librapid::backend::CUDA for CUDA compute).

Additionally, LibRapid provides highly-optimised vectors, complex numbers, multiprecision arithmetic (via custom forks of MPIR and MPFR) and a huge range of mathematical functions that operate on all of these types. LibRapid also provides a range of linear algebra functions, machine learning activation functions, and more.

When to use LibRapid

  • When you need the best possible performance
  • When you want to write one program that can run on multiple devices
  • When you want to use a single library for all of your mathematical needs
  • When you want a simple interface to develop with

When not to use LibRapid

  • When you need a rigorously tested and documented library
    • LibRapid is still in early development, so it's not yet ready for production use. That said, we still have a wide range of tests which are run on every push to the repository, and we're working on improving the documentation.
  • When you need a well-established library.
    • LibRapid hasn't been around for long, and we've got a very small community.
  • When you need a wider range of functionality.
    • While LibRapid implements a lot of functions, there are some features which are not yet present in the library. If you need these features, you may want to look elsewhere. If you would still like to use LibRapid, feel free to open an issue and I'll do my best to implement it.

Documentation

Latest Documentation
Develop Branch Docs

LibRapid uses Doxygen to parse the source code and extract documentation information. We then use a combination of Breathe, Exhale and Sphinx to generate a website from this data. The final website is hosted on Read the Docs.

The documentation is rebuilt every time a change is made to the source code, meaning it is always up-to-date.

Current Development Stage

At present, LibRapid C++ is being developed solely by me (pencilcaseman).

I'm currently a student in my first year of university, so time and money are both tight. I'm working on LibRapid in my spare time, and I'm not able to spend as much time on it as I'd like to.

If you like the library and would like to support its development, feel free to create issues or pull requests, or reach out to me via Discord and we can chat about new features. Any support is massively appreciated.

The roadmap is a rough outline of what I want to get implemented in the library and by what point, but please don't count on features being implemented quickly -- I can't promise I'll have the time to implement everything as soon as I'd like... (I'll try my best though!)

If you have any feature requests or suggestions, feel free to create an issue describing it. I'll try to get it working as soon as possible. If you really need something implemented quickly, a small donation would be appreciated, and would allow me to bump it to the top of my to-do list.

Dependencies

LibRapid has a few dependencies to improve functionality and performance. Some of these are optional, and can be configured with a CMake option. The following is a list of the external dependencies and their purpose (these are all submodules of the library -- you don't need to install anything manually):

  • OpenMP - Multi-threading library
  • CUDA - GPU computing library
  • OpenCL - Multi-device computing library
  • OpenBLAS - Highly optimised BLAS library
  • MPIR - Arbitrary precision integer arithmetic
  • MPFR - Arbitrary precision real arithmetic
  • FFTW - Fast(est) Fourier Transform library


Support

Thanks to JetBrains for providing LibRapid with free licenses for their amazing tools!


librapid's People

Contributors

athulmekkoth, dependabot[bot], nervousnullptr, pencilcaseman, tcmetzger


librapid's Issues

Array BLAS Functions

Add support for calling gemm or dot, for example, directly on Array objects. This could dramatically simplify the implementation of the code as well :)

Test Issue

This is a test issue with some code which should be Carbonited

for i in range(100):
    print("Hello, World")

print("Goodbye, World")

Does this work?

Matrix Transposition

Add support for matrix transpositions.

This should also allow the conversion of a vector array into a column vector with dimensions Nx1.

Can this be linked with matrix multiplication (#141) to provide higher-performance matrix operations? Transposing a matrix and then doing a matrix product with it could automatically use gemm with transposed arguments.

Test Issue

Below is a code block which should be turned into an image :)

for i in range(123):
    print("Hello, World!")

Matrix Transpose Bugs

Matrix transposition works fine if all the arrays are the correct size, but it's possible that memory errors could arise if the arrays are not correctly sized.

Array Manipulation Error

Not sure exactly where the error is, but the following code doesn't work under the current Development and Master branches.

lrc::Array<float> val(lrc::Shape({2, 2}));
lrc::Array<float> lower1(lrc::Shape({2, 2}));
lrc::Array<float> upper1(lrc::Shape({2, 2}));
lrc::Array<float> lower2(lrc::Shape({2, 2}));
lrc::Array<float> upper2(lrc::Shape({2, 2}));

val << 1, 2, 3, 4;
lower1 << 0, 0, 0, 0;
upper1 << 10, 10, 10, 10;
lower2 << 0, 0, 0, 0;
upper2 << 100, 100, 100, 100;

fmt::print("{}\n\n", val);
fmt::print("{}\n\n", lower1);
fmt::print("{}\n\n", upper1);
fmt::print("{}\n\n", lower2);
fmt::print("{}\n\n", upper2);
fmt::print("{}\n", lrc::map(val, lower1, upper1, lower2, upper2));

Wait for more powerful hosted-runners

Hopefully, by the end of Q3 2022, GitHub Actions will support more powerful, custom runners. This will dramatically reduce the build times for LibRapid wheels, as well as allowing for a more fully-featured package due to fewer limitations on RAM and CPU power.

github/roadmap#161

Matrix product

Support 2D gemm functionality, as well as higher-dimensional products

CUDA support is also required

Array Copying

Create a function for strided array copying, otherwise the entire array library will crash with non-trivial arrays.

The main issue will occur within the multiarray_operations.hpp file, as makeSameAccelerator does not take into account strides, which will lead to some form of segfault or simply using the wrong values.

Array Slicing

Is your feature request related to a problem? Please describe.
There is currently no way of accessing sub-arrays without manually iterating over them, which is sometimes difficult and inconvenient.

Describe the solution you'd like
Some sort of ArraySlice object that can be used as a strided view of an Array object.

Describe alternatives you've considered

  • Use a strided array to begin with? Doesn't feel optimal
  • Do not allow strided access, but allow for sub-array access (parent data pointer)

Benchmarks

Benchmark the Array type and other classes and helpers

Compare against

  • Numpy
  • XTensor
  • Eigen
  • Boost multidimensional array

Code Cleanup

The code is becoming increasingly fragmented -- this makes development more difficult and leads to unwanted and unexpected bugs.

Each file should implement only one thing, and header files should not include any other headers. Instead, files should be included in librapid.hpp in the correct order, and STL includes should be at the top of config.hpp

Matrix Transposition not working

Tested in Python



import librapid as lrp

x = lrp.Array(lrp.Extent((1000, 1000)), "f32")
x.transpose() # This doesn't work -- list type is invalid
x.transpose(lrp.Extent()) # This gives the wrong output. Possibly a missing copy?


Remove "patch" definitions

Describe the bug
The patch() functions defined in the multiprecision source files when multiprecision is NOT enabled should be removed/fixed

To Reproduce
Steps to reproduce the behavior:

  1. Include LibRapid without LIBRAPID_USE_MULTIPREC

Minimal Reproducible Example

#include <librapid>

int main() {
    fmt::print("Hello, World\n");
    return 0;
}

Expected behavior
No warnings

Stack Traces

multiprecModAbs.cpp.obj : warning LNK4006: "int __cdecl patch(int)" (?patch@@YAHH@Z) already defined in multiprecTrig.cpp.obj; second definition ignored
multiprecHypot.cpp.obj : warning LNK4006: "int __cdecl patch(int)" (?patch@@YAHH@Z) already defined in multiprecTrig.cpp.obj; second definition ignored
multiprecFloorCeil.cpp.obj : warning LNK4006: "int __cdecl patch(int)" (?patch@@YAHH@Z) already defined in multiprecTrig.cpp.obj; second definition ignored
multiprecExpLogPow.cpp.obj : warning LNK4006: "int __cdecl patch(int)" (?patch@@YAHH@Z) already defined in multiprecTrig.cpp.obj; second definition ignored
multiprecCasting.cpp.obj : warning LNK4006: "int __cdecl patch(int)" (?patch@@YAHH@Z) already defined in multiprecTrig.cpp.obj; second definition ignored

Matrix Multiplication

Add support for:

  • Matrix-matrix multiplication
  • Matrix-vector multiplication
  • Vector product

Matrix Transposition is too Slow

Matrix transposition is currently implemented trivially and is quite slow compared to other libraries. It should be optimised for all array dimensions, but especially for matrices, where the transpose is performance-critical in many applications.

Ideas:

  1. OpenBlas *omatcopy
  2. Hard-coded matrix transpose for 2D
  3. Vectorised transpose (See transpose.hpp)
  4. Help me...

Optimise Array Slice Performance

Array slicing is functional, but is not optimised to the extent required by LibRapid. Additionally, while it does work with CUDA, there are no specific routines for it and hence device->host->device copies are required for every value, making it incredibly slow.

Optimisation + Simplification

Can the code in multiarray_operations.hpp at the top of the functions (to ensure everything is in the same place) be altered to use only malloc and memcpy?

Another Test Issue

This is some code:



# -*- coding: utf-8 -*-
import os
import platform
import shutil
import sys
import site
from packaging.version import LegacyVersion
from skbuild import setup
from skbuild.cmaker import get_cmake_version
from skbuild.exceptions import SKBuildError

# Copy OpenBLAS build if present in the root directory
if os.path.exists("openblas_install") and not os.path.exists(os.path.join("src", "librapid", "openblas_install")):
    shutil.copytree("openblas_install", os.path.join("src", "librapid", "openblas_install"))

# Remove _skbuild directory if it already exists. It can lead to issues
if os.path.exists("_skbuild"):
    shutil.rmtree("_skbuild")

# Remove the _librapid_python_cmake directory if it's present. This can cause more issues...
if os.path.exists("_librapid_python_cmake"):
    shutil.rmtree("_librapid_python_cmake")

# If the directory "src/librapid/blas" is empty and "src/librapid/openblas_install" is empty,
# run CMake to automatically detect BLAS before installing the Python library
if not os.path.exists(os.path.join("src", "librapid", "blas")) and not os.path.exists(os.path.join("src", "librapid", "openblas_install")):
    out = os.system("mkdir _librapid_python_cmake && cd _librapid_python_cmake && cmake ..")
    if out != 0:
        print("\nCMake failed to run correctly, so it is likely that BLAS will not be installed with LibRapid")

# Add CMake as a build requirement if cmake is not installed or is too low a version
setup_requires = []
install_requires = []

try:
    if LegacyVersion(get_cmake_version()) < LegacyVersion("3.10"):
        setup_requires.append('cmake')
        install_requires.append("cmake")
except SKBuildError:
    setup_requires.append('cmake')
    install_requires.append("cmake")


Array to String

Add functionality for converting arrays to strings for printing purposes.

Conan Support

Hey! This library looks really cool. For your awareness, I've raised a request for a recipe for this library to be created for the Conan Center Index. Conan is a C++ package manager that I think would help distribute your library with dependency management, especially with respect to pulling openblas in. If you're interested in supporting a conan recipe, I'm sure they would love to receive a pull request.

Vector to string

Allow the creation of a string representation of a vector object (ideally with formatting options)

CUDA support in Python Wheels

Have a look into cuda-toolkit and think about getting CUDA support in Python Wheels. This would require some sort of pip install librapid_cuda_11_5, for example, which would also need to be set up

Test Issue

This is a test issue



while True:
    print("This is going to be made into an image!")

print("Hehehe this won't run")


Python Iterator Memory Leak

Arrays do not get freed (or get over-freed) when iterating in python.

To replicate:



import librapid as lrp
x = lrp.Array([1, 2, 3])
for val in x:
    print(val)


cuBLAS Handles not Initialized in Python Library

When using cuBLAS functions in the python library, the cuBLAS handles are not being initialised and hence an error is thrown.

Example:



import librapid as lrp

x = lrp.Array(lrp.Extent(1000, 1000), "f32", "gpu")
res = x.dot(x)


Should be fixable by changing where the handle initialisation occurs.

Scalar operations assignment operator

need to have a closer look, but this feels wrong:



ScalarSum<LHS, RHS> &operator=(const ScalarSum<LHS, RHS> &other) { return *this; }


Fix support for Boolean arrays

Currently, Boolean arrays pose a few issues, the first of which is that each element is stored as a single bit, not a whole byte. This dramatically improves memory efficiency and increases performance; however, it also causes a lot of problems with algorithm design and interoperability with other array datatypes.

Currently, the major issue is that any form of logical operation will result in a Vc::Mask being returned from the SIMD packet instruction, whereas the result will be expecting a Vc::Vector type. This will, of course, produce a compile-time error.

Current steps to solve problem

  1. Logical operation factory -- define a logical operation which automatically returns a Boolean array, regardless of input type
  2. Rewrite Boolean array class to operate more nicely with Vc::Mask datatypes
    • This includes adding a loadFrom function which will accept a Vc::Mask to allow for direct loading from a Vc logical comparison
  3. Casting support to and from Vc::Mask datatypes?

Other ideas:

  • Can Boolean arrays be stored directly as arrays of Vc::Mask?

IN THE MEANTIME, SUPPORT FOR BOOLEAN ARRAYS IN THE PYTHON LIBRARY WILL BE DISABLED

Assigning non-equal length array to boolean array

Once again, more problems with the boolean array.

This must be fixed by #102

Reproducible example:



lrc::Array<int, lrc::device::CPU> arr(lrc::Extent(10));
arr = lrc::Array<int>(lrc::Extent(100));
arr.fill(123);
fmt::print("Array: {}\n", arr);


Test Issue



print("This is a test issue")
# Does this get formatted?
for i in range(10):
    print("Hello, World!")


Error copying from GPU to CPU

For some reason, the following code errors after 4 iterations. Most likely a missing copy somewhere, but not quite sure.



lrc::Array<float, lrc::device::GPU> gpuArray(lrc::Extent(3, 4));
lrc::Array<float, lrc::device::CPU> cpuArray(lrc::Extent(3, 4));
for (lrc::i32 i = 0; i < 10000; i++) {
    fmt::print("Doing thing {}\n", i);
    cpuArray = (gpuArray * 2);
}

