Comments (3)
@LTOctum That sounds like an interesting problem. CopyFrom and CopyTo do not differ much in their functionality and are not known to cause significant overhead. I recommend allocating memory in advance with a CPUAccelerator instance for best performance. In this case, the memory buffer is allocated in native memory and the pointer is automatically pinned to reduce the time it takes to copy data to the CPU.
As far as I can see, you are using a managed memory buffer that causes a GCHandle to be allocated to pin the memory pointers. This can lead to significant runtime performance overhead. Furthermore, I recommend using AcceleratorStreams to perform async copy operations.
Maybe you can create a simplified example that reproduces this issue?
@m4rs-mt The approach with the CPUAccelerator sounds interesting. Do you have a code snippet showing how to use the buffer allocated by the CPUAccelerator within a CudaAccelerator?
var item = new ArrayPoolItem(packedBufferSize, _bytePool);
_lastOutputBuffer8Bit.CopyTo(item.Value, 0, 0, packedBufferSize);
This does not use any GCHandle; it just rents the same array from a pool (a quick optimization, as the LOH would fragment over time with arrays bigger than 85 KB; the array has a size of ~34 MB).
The time is the same with:
_lastOutputBuffer8Bit.GetAsArray()
I will provide you with a simplified example.
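For readers following along, here is a minimal sketch of the array-renting idea described above. It assumes that `_bytePool` is a `System.Buffers.ArrayPool<byte>`; the `ArrayPoolItem` wrapper is not shown in this thread, so it is left out. The buffer size and pool limits are illustrative values, not the ones from the original code.

```csharp
using System;
using System.Buffers;

class PoolSketch
{
    static void Main()
    {
        // Hypothetical size, roughly matching the ~34 MB mentioned above.
        const int packedBufferSize = 34 * 1024 * 1024;

        // A dedicated pool that is allowed to keep arrays of this size.
        // Both limits are assumptions chosen for this sketch.
        ArrayPool<byte> bytePool = ArrayPool<byte>.Create(
            maxArrayLength: 64 * 1024 * 1024,
            maxArraysPerBucket: 4);

        // Rent may hand back an array that is larger than requested,
        // so always track the logical length separately.
        byte[] rented = bytePool.Rent(packedBufferSize);
        try
        {
            rented[0] = 1; // use only the first packedBufferSize bytes
            Console.WriteLine(rented.Length >= packedBufferSize);
        }
        finally
        {
            // Returning the array makes it available for the next Rent call,
            // which avoids a fresh large-object-heap allocation per frame.
            bytePool.Return(rented);
        }
    }
}
```

Because the rented array can be longer than `packedBufferSize`, any copy into it should pass the logical length explicitly, as the `CopyTo(item.Value, 0, 0, packedBufferSize)` call above already does.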
@m4rs-mt I did find the bottleneck in my code, but maybe it is still interesting for you.
If I use a new array, the CopyTo (and also CopyFrom) method takes significantly longer than expected (see the comment in the code).
Reusing the same buffer cuts the time from 11-15 ms to 4-7 ms.
using System;
using System.Diagnostics;
using System.Linq;
using ILGPU;
using ILGPU.Runtime;

namespace ilgpuTest
{
    class Program
    {
        private static Action<Index, int, int, int, ArrayView<byte>, ArrayView<byte>> _kernel;

        static void Main(string[] args)
        {
            var width = 3840;
            var height = 2160;
            var bytesPerPixel = 3;
            var pixels = width * height;
            var length = width * height * bytesPerPixel;
            var buffer = CreateBuffer(width, height, bytesPerPixel);
            var buffer2 = new byte[length];
            using (var context = new Context())
            {
                var cudaId = Accelerator.Accelerators.First(a => a.AcceleratorType.Equals(AcceleratorType.Cuda));
                using (var accelerator = Accelerator.Create(context, cudaId))
                {
                    for (int x = 0; x < 10; x++)
                    {
                        _kernel = accelerator.LoadAutoGroupedStreamKernel<Index, int, int, int, ArrayView<byte>, ArrayView<byte>>(KernelMethod);
                        var inputBuffer = accelerator.Allocate<byte>(length);
                        var ouputBuffer = accelerator.Allocate<byte>(length);
                        var swFrom = Stopwatch.StartNew();
                        inputBuffer.CopyFrom(buffer, 0, 0, length);
                        swFrom.Stop();
                        Console.WriteLine($"{x} - CopyFrom: {swFrom.ElapsedMilliseconds}");
                        _kernel(pixels, width, height, bytesPerPixel, inputBuffer.View, ouputBuffer.View);
                        accelerator.Synchronize();
                        var result = new byte[length];
                        var swCopy = Stopwatch.StartNew();
                        ouputBuffer.CopyTo(result, 0, 0, length);
                        // ouputBuffer.CopyTo(buffer2, 0, 0, length); // Use this to increase the speed (only first run will be slow).
                        swCopy.Stop();
                        Console.WriteLine($"{x} - CopyTo: {swCopy.ElapsedMilliseconds}");
                    }
                }
            }
        }

        private static void KernelMethod(Index pixelIndex, int width, int height, int bpp, ArrayView<byte> source, ArrayView<byte> destination)
        {
            var offset = width * height;
            var r = source[pixelIndex];
            var g = source[pixelIndex + offset];
            var b = source[pixelIndex + offset * 2];
            destination[pixelIndex * bpp] = b;
            destination[pixelIndex * bpp + 1] = g;
            destination[pixelIndex * bpp + 2] = r;
        }

        private static byte[] CreateBuffer(int width, int height, int bytesPerPixel)
        {
            var length = width * height * bytesPerPixel;
            var result = new byte[length];
            byte c = 0;
            for (var p = 0; p < length; p++)
            {
                result[p] = c++;
            }
            return result;
        }
    }
}
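One possible explanation, which is an assumption and not something confirmed in this thread, is that part of the extra time comes from the host side: a freshly allocated large array has to be touched for the first time during the copy, while a reused array is already backed by committed memory. The sketch below isolates that effect without any GPU involved by copying the same data into a new array and into a reused one. The class name and run count are hypothetical, and timings will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;

class FirstTouchSketch
{
    static void Main()
    {
        // Same frame size as in the repro above (3840 x 2160, 3 bytes per pixel).
        const int length = 3840 * 2160 * 3;

        var source = new byte[length];
        new Random(42).NextBytes(source);

        // Allocated once and reused for every run.
        var reused = new byte[length];

        for (int run = 0; run < 5; run++)
        {
            // Allocated inside the loop, like "result" in the repro.
            var fresh = new byte[length];

            var swFresh = Stopwatch.StartNew();
            Buffer.BlockCopy(source, 0, fresh, 0, length);
            swFresh.Stop();

            var swReused = Stopwatch.StartNew();
            Buffer.BlockCopy(source, 0, reused, 0, length);
            swReused.Stop();

            Console.WriteLine(
                $"{run} - fresh: {swFresh.Elapsed.TotalMilliseconds:F2} ms, " +
                $"reused: {swReused.Elapsed.TotalMilliseconds:F2} ms");
        }
    }
}
```

If the fresh copy is consistently slower here as well, the difference seen with CopyTo is at least partly independent of ILGPU; if not, the cost more likely sits in how the target array is pinned or transferred.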