jwetzl / cudalbfgs

This is a cross-platform, CUDA-based C++ library for general-purpose, unconstrained nonlinear optimization on the GPU. It implements the L-BFGS (“Limited-memory Broyden-Fletcher-Goldfarb-Shanno”) method, a popular quasi-Newton variant with a low memory footprint.

C 0.05% C++ 1.15% Cuda 1.71% Python 0.02% CMake 0.34% Objective-C 96.72%

cudalbfgs's Introduction

NOTE: This library was only tested with CUDA 4.x and
5.x and may not work with more recent versions. We
do not currently have the time to update it for more
recent CUDA versions, but would gladly accept pull
requests addressing this issue.

====================================================
   ___ _   _ ___   _     _       ___ ___ ___ ___ 
  / __| | | |   \ /_\   | |  ___| _ ) __/ __/ __|
 | (__| |_| | |) / _ \  | |_|___| _ \ _| (_ \__ \
  \___|\___/|___/_/ \_\ |____|  |___/_| \___|___/
                                                2012
     by Jens Wetzl           ([email protected])
    and Oliver Taubmann ([email protected])

This work is licensed under a Creative Commons
Attribution 3.0 Unported License. (CC-BY)
http://creativecommons.org/licenses/by/3.0/
====================================================

The CUDA L-BFGS library offers GPU based nonlinear
minimization implementing the L-BFGS method in CUDA.

There is no publication available that covers this 
library exclusively, but you may consider citing the 
paper it was introduced in:

Wetzl, J., Taubmann, O., Haase, S., Köhler, T., 
Kraus, M., and Hornegger, J. (2013). GPU-Accelerated 
Time-of-Flight Super-Resolution for Image-Guided 
Surgery. In Meinzer, H.-P., Deserno, T. M., Handels, 
H., and Tolxdorff, T., editors, Bildverarbeitung für 
die Medizin 2013, Informatik aktuell, pages 21–26. 
Springer Berlin Heidelberg.

====================================================
  BUILDING
====================================================

To build (and, if desired, install) the library,
you will need CMake (http://cmake.org). The default
settings should be fine for regular use, but there
are lots of options, e.g. you can

- build a reference implementation on CPU with
  either float or double precision (requires Eigen),

- build test cases,

- enable error checking, verbose output and timing,

- build example projects that demonstrate how the
  library is used (cf. /projects directory).

====================================================
  INCLUDING THE LIBRARY IN YOUR PROJECTS
====================================================

If you use CMake for your project, including the
CudaLBFGS library is jaw-droppingly easy. In your 
CMakeLists.txt file, add:

  find_package(CudaLBFGS REQUIRED)
  include_directories(${CUDALBFGS_INCLUDE_DIRS})
  # ...
  target_link_libraries(YourExecutable
                        ${CUDALBFGS_LIBRARIES})

If you installed the CudaLBFGS library in a non-
standard location, you may also have to set 
either the environment variable CMAKE_PREFIX_PATH
or the CMake variable CUDALBFGS_DIR.

====================================================
  USAGE
====================================================

The basic approach can be described as follows:

1. Implement your cost function in a class that
   inherits from the appropriate base class
   declared in cost_function.h

2. Create an object of class lbfgs (lbfgs.h)
   passing an object of your cost function class
   in the constructor. Adjust settings of lbfgs
   to your liking.
   
3. Run minimization providing an initial guess
   for the solution. Check the return code
   to know which stopping criterion was fulfilled.
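
The three steps above can be sketched as follows. This is
only an illustration of the pattern, under loud assumptions:
CostFunction, Minimizer, Status and all method names are
hypothetical stand-ins, not the declarations found in
cost_function.h or lbfgs.h, and the minimizer runs plain
fixed-step gradient descent on the CPU rather than L-BFGS
on the GPU. Consult the headers for the actual interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the base class in cost_function.h.
// The real class name and method signatures may differ.
class CostFunction {
public:
    virtual ~CostFunction() = default;
    // Returns f(x) and writes the gradient of f at x into grad.
    virtual double evaluate(const std::vector<double>& x,
                            std::vector<double>& grad) const = 0;
};

// Step 1: a concrete cost function, f(x) = sum_i (x_i - 3)^2.
class ShiftedQuadratic : public CostFunction {
public:
    double evaluate(const std::vector<double>& x,
                    std::vector<double>& grad) const override {
        assert(grad.size() == x.size());
        double f = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double d = x[i] - 3.0;
            f += d * d;
            grad[i] = 2.0 * d;
        }
        return f;
    }
};

// Hypothetical return codes; the library defines its own.
enum class Status { GradientBelowEpsilon, MaxIterationsReached };

// Hypothetical stand-in for the lbfgs class. For brevity it uses
// fixed-step gradient descent on the CPU, NOT L-BFGS on the GPU.
class Minimizer {
public:
    explicit Minimizer(const CostFunction& cf) : cf_(cf) {}
    void setMaxIterations(std::size_t n) { maxIter_ = n; }
    void setGradientEpsilon(double eps) { gradEps_ = eps; }

    // Minimizes in place, starting from the initial guess in x.
    Status minimize(std::vector<double>& x) const {
        std::vector<double> grad(x.size());
        for (std::size_t it = 0; it < maxIter_; ++it) {
            cf_.evaluate(x, grad);
            double norm2 = 0.0;
            for (double g : grad) norm2 += g * g;
            if (std::sqrt(norm2) < gradEps_)
                return Status::GradientBelowEpsilon;
            for (std::size_t i = 0; i < x.size(); ++i)
                x[i] -= step_ * grad[i];
        }
        return Status::MaxIterationsReached;
    }

private:
    const CostFunction& cf_;
    std::size_t maxIter_ = 1000;
    double gradEps_ = 1e-6;
    double step_ = 0.25;
};

// Steps 2 and 3: construct the minimizer from the cost function,
// adjust its settings, run it, and hand back the return code.
Status runExample(std::vector<double>& x) {
    ShiftedQuadratic cost;
    Minimizer minimizer(cost);
    minimizer.setMaxIterations(200);
    minimizer.setGradientEpsilon(1e-8);
    return minimizer.minimize(x);
}
```

Calling runExample on a vector of zeros overwrites it with
the minimizer's result. The return code tells the caller
whether the gradient criterion was met or the iteration
limit was hit, which is the check step 3 asks for.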

cudalbfgs's Issues

The script ignores Compute v2.1

There is a problem with cuda_compute_capability.c. It is caused by:

    if (major == 2 && minor == 1)
    {
        // There is no --arch compute_21 flag for nvcc, so force minor to 0
        minor = 0;
    }

The problem is that some Fermi cards do support compute capability 2.1 (see https://en.wikipedia.org/wiki/CUDA#Supported_GPUs). Since nvcc has no compute_21 flag, the way to target these cards is to set the flags to -arch compute_20 -code sm_21.

The script currently assumes that the sm_xx number always matches the compute_xx number. I've run into problems with sm_20 on a machine that supports sm_21 before. For instance, I vaguely recall that numerical computations were more accurate with sm_21 than with sm_20 (with the Caffe library, if I recall correctly). Considering the large number of CMake scripts out there that rely on this script, I hope the issue gets fixed :)

Unfortunately, my knowledge about CMake is rather limited, otherwise I would've fixed it and submitted a pull request.
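
A minimal sketch of the mapping requested above, written as a standalone C++ helper. The function name nvccArchFlags is hypothetical and this is not the code in cuda_compute_capability.c. It only shows the idea of keeping the real architecture (sm_21) while falling back to compute_20 for the virtual one. How the CMake script would consume the resulting string is left open.

```cpp
#include <string>

// Hypothetical helper: builds nvcc architecture flags from a
// device's compute capability (major.minor).
std::string nvccArchFlags(int major, int minor) {
    // Real architecture (sm_XY): use the capability as reported.
    const std::string sm =
        "sm_" + std::to_string(major) + std::to_string(minor);

    // Virtual architecture (compute_XY): nvcc has no compute_21,
    // so capability 2.1 falls back to compute_20 here only.
    const int virtualMinor = (major == 2 && minor == 1) ? 0 : minor;
    const std::string compute =
        "compute_" + std::to_string(major) + std::to_string(virtualMinor);

    return "-arch=" + compute + " -code=" + sm;
}
```

With this mapping, a 2.1 device gets -arch=compute_20 -code=sm_21 and every other capability keeps matching numbers, as the script does today.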

Results were different with different compute capabilities

I wrote a program with cudalbfgs and tested it with a 550 Ti and a GTX 760. The result with the former card looks normal, but the result with the GTX 760 is incorrect (most of the values are zero). So I am wondering: is there something I have to be aware of when using cards with different compute capabilities?

