libgpuvm's People

Contributors

canonizer

libgpuvm's Issues

arrays initially located on device

Add API support for arrays initially located on the device. Such arrays are not copied to the device on their first on-device use, and their memory is initially protected. If they are modified on the host before being used on the device, however, they are unprotected and treated like regular arrays.

block mono GC signal during writer lock

Otherwise, the thread holding the writer lock will be suspended until GC is over. Since handling SIGSEGV during GC requires the reader lock, this makes the situation deadlock-prone.
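
A minimal sketch of the idea, assuming a POSIX rwlock guards the library state. The names `writer_lock`, `writer_unlock` and `region_lock` are hypothetical, and `SIGUSR1` only stands in for whatever signal the Mono GC actually uses to suspend threads:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

/* Assumption: SIGUSR1 is a placeholder for the real GC suspend signal. */
#define GC_SUSPEND_SIGNAL SIGUSR1

static pthread_rwlock_t region_lock = PTHREAD_RWLOCK_INITIALIZER;
static volatile sig_atomic_t gc_signals_seen = 0;

static void on_gc_signal(int sig) { (void)sig; gc_signals_seen++; }

/* Block the GC signal first, then take the writer lock. The previous
   mask is saved so that writer_unlock() can restore it. */
static int writer_lock(sigset_t *saved_mask)
{
    sigset_t block;
    sigemptyset(&block);
    sigaddset(&block, GC_SUSPEND_SIGNAL);
    int err = pthread_sigmask(SIG_BLOCK, &block, saved_mask);
    if (err != 0)
        return err;
    err = pthread_rwlock_wrlock(&region_lock);
    if (err != 0)
        pthread_sigmask(SIG_SETMASK, saved_mask, NULL);
    return err;
}

/* Release the lock, then restore the mask; a GC signal that arrived
   while the lock was held is delivered at this point. */
static int writer_unlock(const sigset_t *saved_mask)
{
    int err = pthread_rwlock_unlock(&region_lock);
    int err2 = pthread_sigmask(SIG_SETMASK, saved_mask, NULL);
    return err != 0 ? err : err2;
}

/* Returns 1 if a signal raised under the lock was deferred until unlock. */
static int demo_deferred_delivery(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_gc_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(GC_SUSPEND_SIGNAL, &sa, NULL) != 0)
        return 0;

    sigset_t saved;
    if (writer_lock(&saved) != 0)
        return 0;
    raise(GC_SUSPEND_SIGNAL);              /* stays pending while blocked */
    int seen_while_locked = gc_signals_seen;
    if (writer_unlock(&saved) != 0)
        return 0;
    return seen_while_locked == 0 && gc_signals_seen == 1;
}
```

Blocking before locking (and unblocking after unlocking) keeps the window in which the lock holder can be suspended closed on both sides.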

one-sided transfer for arrays

Sometimes arrays are used for one-sided communication only, i.e. device -> host or host -> device. Because they can still be modified on both sides, however, the general way of handling them requires copying in both directions. It is possible, though, to add a flag indicating that an array is copied in one direction only:

GPUVM_HOST_TO_DEVICE // copied host -> device
GPUVM_DEVICE_TO_HOST // copied device -> host

...
gpuvm_link(harr, sz, idev, darr, GPUVM_ON_HOST | GPUVM_HOST_TO_DEVICE);
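
How the runtime might interpret such flags can be sketched as follows. The numeric values are hypothetical (including the one given to `GPUVM_ON_HOST`), and the helpers `copies_to_device` and `copies_to_host` are illustration only, not part of the libgpuvm API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this sketch. */
enum {
    GPUVM_ON_HOST        = 1 << 0,
    GPUVM_HOST_TO_DEVICE = 1 << 4,  /* copied host -> device only */
    GPUVM_DEVICE_TO_HOST = 1 << 5   /* copied device -> host only */
};

#define DIRECTION_MASK (GPUVM_HOST_TO_DEVICE | GPUVM_DEVICE_TO_HOST)

/* With no direction flag, the array keeps the general two-way handling. */
static bool copies_to_device(int flags)
{
    int dir = flags & DIRECTION_MASK;
    return dir == 0 || (dir & GPUVM_HOST_TO_DEVICE) != 0;
}

static bool copies_to_host(int flags)
{
    int dir = flags & DIRECTION_MASK;
    return dir == 0 || (dir & GPUVM_DEVICE_TO_HOST) != 0;
}

/* Usage example: the flags from the gpuvm_link() call above. */
static void direction_flags_self_check(void)
{
    int flags = GPUVM_ON_HOST | GPUVM_HOST_TO_DEVICE;
    assert(copies_to_device(flags));
    assert(!copies_to_host(flags));
}
```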

support for mapping host arrays to device images

Typically, host arrays are mapped to device buffers. However, performance tests indicate that textures (images) sometimes perform better. API support is therefore needed to map host arrays to images, with the same set of features as supported for mapping arrays to buffers.

support for arrays used read-only on device

Some kernels use some of their arrays read-only, i.e. they do not modify them. While this is difficult to detect at runtime, it can easily be detected at compile time by the language. The information is then passed to the runtime via API flags.

Read-only arrays can be handled in a more relaxed way:

  • they are protected from writing only; reading on the host does not cause SIGSEGV or data copying
  • they can be used simultaneously by multiple devices
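
The write-only protection described above can be sketched with `mprotect()`. This is a minimal illustration on an anonymous page, not libgpuvm code, and the helper names are hypothetical:

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Protect a page-aligned range from writing only; host reads still work. */
static int protect_read_only(void *ptr, size_t size)
{
    return mprotect(ptr, size, PROT_READ);
}

/* Restore normal read/write access. */
static int unprotect(void *ptr, size_t size)
{
    return mprotect(ptr, size, PROT_READ | PROT_WRITE);
}

/* Returns the byte read back from a write-protected page, or -1 on error. */
static int demo_read_only_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 42;
    int value = -1;
    if (protect_read_only(p, (size_t)page) == 0) {
        value = p[0];                  /* reading does not raise SIGSEGV */
        if (unprotect(p, (size_t)page) == 0)
            p[0] = 7;                  /* writable again after unprotect */
    }
    munmap(p, (size_t)page);
    return value;
}
```

A write to the page while it is protected would still raise SIGSEGV, which is where the runtime can detect a host-side modification.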

linux ati app sdk 2.5 - segfault (periodically)

Linux, AMD APP SDK 2.5, HD 5830: segfaults from time to time when running a single-threaded sample on a single GPU. Note that it has never segfaulted under NUDA (large arrays?), and there are no problems when running 2 threads simultaneously on a single GPU. There are also no problems on Linux with NVidia GPUs.

discarding data on host or device

Sometimes an array is used to transfer data one way only, host -> device or device -> host. In this case there is no need to copy data in the opposite direction, yet this copying is still done. To avoid the unnecessary copying, an ability to discard data can be added, e.g.:

void gpuvm_discard(void *ptr, int flags, unsigned dev);

If data is discarded on the device, protection on the host is removed and the host data is considered up to date; device data is not copied back. If the data is going to be overwritten on the host anyway, this saves one data copy.

do not stop OpenCL worker threads

In the multi-threaded version, stopping OpenCL worker threads results in application deadlock. Avoid this by detecting which threads are created by the OpenCL runtime and not stopping them.

use balancing trees for maps used inside libgpuvm

Use balanced trees (e.g., AVL trees) for maps inside libgpuvm. Such maps include the mapping of ranges to regions and the mapping of thread ids to per-thread locking semaphores. AVL trees are expected to improve performance mildly.

lion apple sdk ati - 2-threaded test deadlocks (rarely)

Mac OS X Lion, Apple SDK (OpenCL 1.1), ATI HD4670: the 2-threaded version of the test deadlocks on rare occasions. Note that there are no problems with the single-GPU test.

Also, there is a problem with CPU OpenCL. This one, however, may be caused entirely by stopping currently running OpenCL threads, and thus can hardly be fixed.

use getdents() to get thread list on linux

The older readdir(2) system call (not readdir(3), which is a C library call), which is currently used, is not available on 64-bit Linux systems, so use the supported getdents() call instead.
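
A minimal sketch of listing thread ids this way, assuming the thread list is read from `/proc/self/task`. It uses the `getdents64` variant through `syscall()`, since older glibc versions provide no wrapper; the function name `list_thread_ids` is hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Record layout returned by getdents64, as documented in getdents(2). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Stores up to max_tids thread ids of the calling process in tids[].
   Returns the number stored, or -1 on error. */
static int list_thread_ids(pid_t *tids, int max_tids)
{
    assert(tids != NULL && max_tids > 0);
    int fd = open("/proc/self/task", O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    _Alignas(struct linux_dirent64) char buf[4096];
    int count = 0;
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (nread < 0) {
            close(fd);
            return -1;
        }
        if (nread == 0)                /* end of directory */
            break;
        for (long pos = 0; pos < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            pos += d->d_reclen;
            if (d->d_name[0] == '.')   /* skip "." and ".." */
                continue;
            if (count < max_tids)
                tids[count++] = (pid_t)strtol(d->d_name, NULL, 10);
        }
    }
    close(fd);
    return count;
}

/* Returns 1 if the calling process' main thread id shows up in the list. */
static int demo_finds_main_thread(void)
{
    pid_t tids[256];
    int n = list_thread_ids(tids, 256);
    for (int i = 0; i < n; i++)
        if (tids[i] == getpid())
            return 1;
    return 0;
}
```

Calling the system call directly avoids memory allocation inside the C library, which may matter if the list is read while other threads are stopped.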

switch to portable semaphore implementation

It will use unnamed POSIX semaphores on Linux and Mach semaphores on Darwin (because, surprisingly, there are no unnamed POSIX semaphores on Darwin!). Hopefully it will be fast enough...
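
Such a wrapper could look roughly like the sketch below. The `psem_*` names are hypothetical, and the Darwin branch assumes the Mach calls `semaphore_create`, `semaphore_signal`, `semaphore_wait` and `semaphore_destroy`:

```c
#include <assert.h>
#include <errno.h>
#ifdef __APPLE__
#include <mach/mach.h>
#include <mach/semaphore.h>
#include <mach/task.h>
typedef semaphore_t psem_t;
#else
#include <semaphore.h>
typedef sem_t psem_t;
#endif

/* All functions return 0 on success and -1 on failure. */
static int psem_init(psem_t *s, unsigned value)
{
#ifdef __APPLE__
    return semaphore_create(mach_task_self(), s, SYNC_POLICY_FIFO,
                            (int)value) == KERN_SUCCESS ? 0 : -1;
#else
    return sem_init(s, 0, value);      /* 0: shared between threads only */
#endif
}

static int psem_post(psem_t *s)
{
#ifdef __APPLE__
    return semaphore_signal(*s) == KERN_SUCCESS ? 0 : -1;
#else
    return sem_post(s);
#endif
}

static int psem_wait(psem_t *s)
{
#ifdef __APPLE__
    kern_return_t kr;
    do {
        kr = semaphore_wait(*s);
    } while (kr == KERN_ABORTED);      /* retry if the wait was interrupted */
    return kr == KERN_SUCCESS ? 0 : -1;
#else
    int r;
    while ((r = sem_wait(s)) != 0 && errno == EINTR)
        ;                              /* retry if interrupted by a signal */
    return r;
#endif
}

static int psem_destroy(psem_t *s)
{
#ifdef __APPLE__
    return semaphore_destroy(mach_task_self(), *s) == KERN_SUCCESS ? 0 : -1;
#else
    return sem_destroy(s);
#endif
}

/* Usage example: returns 1 if init/wait/post/wait/destroy all succeed. */
static int demo_psem_roundtrip(void)
{
    psem_t s;
    if (psem_init(&s, 1) != 0)
        return 0;
    int ok = psem_wait(&s) == 0 && psem_post(&s) == 0 && psem_wait(&s) == 0;
    return psem_destroy(&s) == 0 && ok;
}
```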

deadlocks in some samples on some devices

There appear to be deadlocks in some samples on some devices:

  1. add-arrays-ngpu (NVidia, rather often)
  2. add-arrays-ondev (NVidia rarely, AMD almost immediately)

I was not able to find any bugs on NVidia for add-arrays or add-arrays-ro samples.

All bugs appear in the form of a deadlock, and never as incorrect data being read. As to the cause, I suspect Linux futexes, since the deadlock typically appears as a futex wait. Sometimes, however, it appears as a suspended tgkill() syscall. It might be an error in the Linux futex() call, although my own code is worth checking first.

true device-side caching of arrays

Implement true device-side caching of arrays, that is, automatically allocate support buffers on the GPU when a host array is first used on the GPU, and deallocate them in case of memory shortage.

non-racing thread stopping

Make thread stopping non-racing. Currently, stopping is implemented with a single semaphore; to stop a thread, it is sent a signal. In the signal handler, the thread waits on the semaphore, which is posted when the threads can be resumed. However, if there are 2 threads, say 1 and 2, it is possible for thread 1 to unblock, block again and then "eat" the semaphore post that was intended for thread 2. Thus thread 1 is not blocked when it is supposed to be, and incorrect results follow.

To avoid this, a different scheme needs to be implemented for thread stopping/resuming, so that a stopped thread cannot "eat" a resume directed to another thread.
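
One possible scheme, sketched here under assumptions, gives every thread its own semaphore so that a resume can only reach the thread it targets. The names `stop_slot`, `wait_for_resume` and `resume_thread` are hypothetical; the sketch uses unnamed POSIX semaphores (so it assumes Linux) and plain threads instead of a signal handler:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

/* One slot per thread: a resume posted here cannot be consumed elsewhere. */
struct stop_slot {
    sem_t      resume;
    atomic_int resumed;
};

/* Called by the stopped thread; in the scheme described above this wait
   would happen inside the stop signal handler. */
static void wait_for_resume(struct stop_slot *slot)
{
    while (sem_wait(&slot->resume) != 0 && errno == EINTR)
        ;                              /* retry if interrupted */
    atomic_store(&slot->resumed, 1);
}

static void resume_thread(struct stop_slot *slot)
{
    sem_post(&slot->resume);
}

static void *worker(void *arg)
{
    wait_for_resume(arg);
    return NULL;
}

/* Returns 1 if resuming thread 2 leaves thread 1 stopped. */
static int demo_targeted_resume(void)
{
    struct stop_slot slots[2];
    pthread_t threads[2];
    for (int i = 0; i < 2; i++) {
        atomic_init(&slots[i].resumed, 0);
        int err = sem_init(&slots[i].resume, 0, 0);
        assert(err == 0);
        err = pthread_create(&threads[i], NULL, worker, &slots[i]);
        assert(err == 0);
    }

    resume_thread(&slots[1]);          /* resume the second thread only */
    pthread_join(threads[1], NULL);
    int first_still_stopped = atomic_load(&slots[0].resumed) == 0;

    resume_thread(&slots[0]);
    pthread_join(threads[0], NULL);
    int both_resumed = atomic_load(&slots[0].resumed) == 1 &&
                       atomic_load(&slots[1].resumed) == 1;

    for (int i = 0; i < 2; i++)
        sem_destroy(&slots[i].resume);
    return first_still_stopped && both_resumed;
}
```

Because each post lands on a specific slot, a thread that unblocks and blocks again can only consume resumes addressed to itself.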
