cutt's People

Contributors

ap-hynninen, aphynninen

cutt's Issues

Illegal instruction (core dumped)

Hello, I am using cutt to do a transpose, but I get the error 'Illegal instruction (core dumped)'. My code is:
`int main() {
    // Four dimensional tensor
    // Transpose (31, 549, 2, 3) -> (3, 31, 2, 549)
    int dim[4] = {31, 549, 2, 3};
    int permutation[4] = {3, 0, 2, 1};
    int size = 1;
    for (int i = 0; i < sizeof(permutation) / sizeof(permutation[0]); i++)
    {
        size = dim[i]*size;
    }
    double *idata = new double[size]();
    double *odata = new double[size];

    // Option 1: Create plan on NULL stream and choose implementation based on heuristics
    cuttHandle plan;
    cuttCheck(cuttPlan(&plan, 4, dim, permutation, sizeof(double), 0));

    // Option 2: Create plan on NULL stream and choose implementation based on performance measurements
    // cuttCheck(cuttPlanMeasure(&plan, 4, dim, permutation, sizeof(double), 0, idata, odata));

    // Execute plan
    cuttCheck(cuttExecute(plan, idata, odata));
    cout << odata << endl;

    // Destroy plan
    cuttCheck(cuttDestroy(plan));
    delete[](idata);
    delete[](odata);
    return 0;
}`

Using gdb, I found that the problem occurs at cuttCheck(cuttPlan(&plan, 4, dim, permutation, sizeof(double), 0));
Running cutt_test triggers the same problem.
Thanks.
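
A CPU-side check of the shapes can help separate problems in the calling code from the crash inside cuttPlan. The sketch below is only an illustration. It assumes that output dimension i takes input dimension permutation[i], which matches the comment (31, 549, 2, 3) -> (3, 31, 2, 549), and that dimension 0 varies fastest in memory, which is an assumption here. permutedDims and transposeReference are hypothetical names, not cuTT functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: output dimension i takes input dimension permutation[i].
std::vector<int> permutedDims(const std::vector<int>& dim,
                              const std::vector<int>& permutation) {
    assert(dim.size() == permutation.size());
    std::vector<int> out(dim.size());
    for (std::size_t i = 0; i < dim.size(); ++i) out[i] = dim[permutation[i]];
    return out;
}

// Hypothetical CPU reference transpose; assumes dimension 0 varies fastest.
std::vector<double> transposeReference(const std::vector<double>& in,
                                       const std::vector<int>& dim,
                                       const std::vector<int>& permutation) {
    const std::size_t rank = dim.size();
    // Stride (in elements) of each input dimension.
    std::vector<std::size_t> inStride(rank, 1);
    for (std::size_t i = 1; i < rank; ++i) inStride[i] = inStride[i - 1] * dim[i - 1];

    const std::vector<int> outDim = permutedDims(dim, permutation);
    std::vector<double> out(in.size());
    std::vector<int> idx(rank, 0);  // multi-index in output order
    for (std::size_t o = 0; o < out.size(); ++o) {
        // Output axis i walks input axis permutation[i].
        std::size_t src = 0;
        for (std::size_t i = 0; i < rank; ++i) src += idx[i] * inStride[permutation[i]];
        out[o] = in[src];
        // Advance the output multi-index, fastest dimension first.
        for (std::size_t i = 0; i < rank; ++i) {
            if (++idx[i] < outDim[i]) break;
            idx[i] = 0;
        }
    }
    return out;
}
```

Comparing such a reference against the library output on a small tensor is a cheap sanity check once the plan can be created.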

Errors when running `cutt_test` or `cutt_bench`

Error Information

Running ./cutt_test gives:

cudaGetLastError() in file src/TensorTester.cu, function setTensorCheckPattern
Error String: no kernel image is available for execution on the device

Running ./cutt_bench gives:

Using GeForce GTX 950M SM version 5.0
Clock 1.124Ghz numSM 5 ECC 0 mem BW 28.80GB/s shMemBankSize 4B
L2 2.00MB
CPU using vector type AVX2 of length 8
cudaMalloc(pp, sizeofT*len) in file src/CudaUtils.cu, function allocate_device_T
Error String: out of memory

Program Version

version: git commit 4c251c6

Environment

  • system: archlinux
  • gcc: gcc (GCC) 8.2.1 20181127
  • nvcc: Cuda compilation tools, release 10.0, V10.0.130
  • gpu: GeForce GTX 950M

More Information

make stdout: https://paste.ubuntu.com/p/xJPfMg7V3D/
make stderr: https://paste.ubuntu.com/p/5kJt82yQGJ/
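
The message "no kernel image is available for execution on the device" usually means the binary was built without code for the GPU's compute capability, which cutt_bench reports as SM 5.0 here. As a small sketch, the nvcc -gencode option for a given capability can be composed as follows. gencodeFlag is a hypothetical helper, and where the flag belongs in this project's Makefile is not shown in the report.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the nvcc -gencode option for one compute
// capability, e.g. major = 5, minor = 0 for a device reporting "SM version 5.0".
std::string gencodeFlag(int major, int minor) {
    assert(major >= 0 && minor >= 0 && minor <= 9);
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

For the GTX 950M above, gencodeFlag(5, 0) yields the option to add to the nvcc flags before rebuilding. The out-of-memory error from cutt_bench is a separate matter and is not addressed by this sketch.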

Error on Summit

Hi,
many thanks for your library - it seems to be a really useful tool for GPU codes!
I am testing it on Summit and find the following error:

cudaFuncSetSharedMemConfig(transposePacked<float, 1>, cudaSharedMemBankSizeFourByte ) in file src/calls.h, function cuttKernelSetSharedMemConfig
Error String: invalid device function

Please let me know what is going on.

c++11

I believe the -std=c++11 flag is needed in the CUDA flags as well. Without it the code did not compile in my case, because it uses "nullptr", which is a C++11 feature.

CUDA inlining on Linux

On a fresh Ubuntu 16.04 with a recent CUDA, compiling a .cu file fails because a symbol in string.h cannot be resolved. Adding the flag "-D_FORCE_INLINES" to the nvcc flags solves the problem.

Output is empty for some cases

The output is empty when one of the dimensions is 1, for example:
`int dim[4] = {W, H, C, N};
int permutation[4] = {3, 0, 1, 2};

cuttHandle handle;
cuttPlan(&handle, 4, dim, permutation, sizeof(float), streamId);
cuttExecute(handle, in, out);
cuttDestroy(handle);`

The output is empty when W == 1.

I ran the code under valgrind and found the following errors:

==21682== Conditional jump or move depends on uninitialised value(s)
==21682== at 0x41E27D: computePos0(int, int const*, int const*, int const*, int const*, int*, int*) (cuttGpuModel.cpp:249)
==21682== by 0x41E429: computePos0(int, TensorConvInOut const*, int, int*, int*) (cuttGpuModel.cpp:294)
==21682== by 0x40B9C1: cuttPlan_t::countCycles(cudaDeviceProp&, int) (cuttplan.cpp:1126)
==21682== by 0x409A30: cuttPlan(unsigned int*, int, int*, int*, unsigned long, CUstream_st*) (cutt.cpp:148)
==21682== by 0x4046D3: bool test_tensor(std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&) (cutt_test.cpp:465)
==21682== by 0x4031DE: test1() (cutt_test.cpp:151)
==21682== by 0x401D72: main (cutt_test.cpp:102)
==21682==
==21682== Conditional jump or move depends on uninitialised value(s)
==21682== at 0x41E2BD: computePos0(int, int const*, int const*, int const*, int const*, int*, int*) (cuttGpuModel.cpp:256)
==21682== by 0x41E429: computePos0(int, TensorConvInOut const*, int, int*, int*) (cuttGpuModel.cpp:294)
==21682== by 0x40B9C1: cuttPlan_t::countCycles(cudaDeviceProp&, int) (cuttplan.cpp:1126)
==21682== by 0x409A30: cuttPlan(unsigned int*, int, int*, int*, unsigned long, CUstream_st*) (cutt.cpp:148)
==21682== by 0x4046D3: bool test_tensor(std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&) (cutt_test.cpp:465)
==21682== by 0x4031DE: test1() (cutt_test.cpp:151)
==21682== by 0x401D72: main (cutt_test.cpp:102)
==21682==
==21682== Conditional jump or move depends on uninitialised value(s)
==21682== at 0x41E27D: computePos0(int, int const*, int const*, int const*, int const*, int*, int*) (cuttGpuModel.cpp:249)
==21682== by 0x41E429: computePos0(int, TensorConvInOut const*, int, int*, int*) (cuttGpuModel.cpp:294)
==21682== by 0x40BA5F: cuttPlan_t::countCycles(cudaDeviceProp&, int) (cuttplan.cpp:1154)
==21682== by 0x409A30: cuttPlan(unsigned int*, int, int*, int*, unsigned long, CUstream_st*) (cutt.cpp:148)
==21682== by 0x4046D3: bool test_tensor(std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&) (cutt_test.cpp:465)
==21682== by 0x4031DE: test1() (cutt_test.cpp:151)
==21682== by 0x401D72: main (cutt_test.cpp:102)
==21682==
==21682== Conditional jump or move depends on uninitialised value(s)
==21682== at 0x41E2BD: computePos0(int, int const*, int const*, int const*, int const*, int*, int*) (cuttGpuModel.cpp:256)
==21682== by 0x41E429: computePos0(int, TensorConvInOut const*, int, int*, int*) (cuttGpuModel.cpp:294)
==21682== by 0x40BA5F: cuttPlan_t::countCycles(cudaDeviceProp&, int) (cuttplan.cpp:1154)
==21682== by 0x409A30: cuttPlan(unsigned int*, int, int*, int*, unsigned long, CUstream_st*) (cutt.cpp:148)
==21682== by 0x4046D3: bool test_tensor(std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&) (cutt_test.cpp:465)
==21682== by 0x4031DE: test1() (cutt_test.cpp:151)
==21682== by 0x401D72: main (cutt_test.cpp:102)
==21682==
==21682== Conditional jump or move depends on uninitialised value(s)
==21682== at 0x41FCF7: countPackedShTransactions0(int, int, int, int, TensorConv const*, int, int&, int&, int&, int&) (cuttGpuModel.cpp:513)
==21682== by 0x40C29F: cuttPlan_t::countCycles(cudaDeviceProp&, int) (cuttplan.cpp:1352)
==21682== by 0x409A30: cuttPlan(unsigned int*, int, int*, int*, unsigned long, CUstream_st*) (cutt.cpp:148)
==21682== by 0x4046D3: bool test_tensor(std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&) (cutt_test.cpp:465)
==21682== by 0x4031DE: test1() (cutt_test.cpp:151)
==21682== by 0x401D72: main (cutt_test.cpp:102)
==21682==
==21682== Conditional jump or move depends on uninitialised value(s)
==21682== at 0x41FCF7: countPackedShTransactions0(int, int, int, int, TensorConv const*, int, int&, int&, int&, int&) (cuttGpuModel.cpp:513)
==21682== by 0x40C346: cuttPlan_t::countCycles(cudaDeviceProp&, int) (cuttplan.cpp:1384)
==21682== by 0x409A30: cuttPlan(unsigned int*, int, int*, int*, unsigned long, CUstream_st*) (cutt.cpp:148)
==21682== by 0x4046D3: bool test_tensor(std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&) (cutt_test.cpp:465)
==21682== by 0x4031DE: test1() (cutt_test.cpp:151)
==21682== by 0x401D72: main (cutt_test.cpp:102)
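
One possible workaround, sketched here under assumptions, is to drop dimensions of extent 1 before creating the plan. An axis of extent 1 only ever has index 0, so removing it changes neither the input nor the output memory layout. squeezeUnitDims and SqueezedShape are hypothetical names, not part of cuTT, and whether this avoids the empty output has not been tested against the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: the reduced dim and permutation arrays.
struct SqueezedShape {
    std::vector<int> dim;
    std::vector<int> permutation;
};

// Hypothetical helper: remove every axis of extent 1 and renumber the permutation.
SqueezedShape squeezeUnitDims(const std::vector<int>& dim,
                              const std::vector<int>& permutation) {
    assert(dim.size() == permutation.size());
    // newAxis[a] is the index of input axis a after squeezing, or -1 if dropped.
    std::vector<int> newAxis(dim.size(), -1);
    SqueezedShape s;
    for (std::size_t a = 0; a < dim.size(); ++a) {
        if (dim[a] != 1) {
            newAxis[a] = static_cast<int>(s.dim.size());
            s.dim.push_back(dim[a]);
        }
    }
    // Keep the relative order of the surviving axes in the permutation.
    for (int a : permutation) {
        if (newAxis[a] >= 0) s.permutation.push_back(newAxis[a]);
    }
    // A tensor whose extents are all 1 still needs one axis.
    if (s.dim.empty()) {
        s.dim.push_back(1);
        s.permutation.push_back(0);
    }
    return s;
}
```

With W == 1 in the example above, the rank-4 problem {W, H, C, N} with permutation {3, 0, 1, 2} reduces to the rank-3 problem {H, C, N} with permutation {2, 0, 1}, and the reduced arrays would be passed to cuttPlan with rank 3.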

stack smashing

Hi, I am running into this problem when constructing cuttPlan:

cuttPlan(&m_rot_plan[0], 3, dim_0, permu_0, sizeof(int), nullptr);

I also used valgrind to test it. The relevant output is:
`==8915== Process terminating with default action of signal 6 (SIGABRT)
==8915== at 0x69D9FB7: raise (raise.c:51)
==8915== by 0x69DB920: abort (abort.c:79)
==8915== by 0x6A24966: __libc_message (libc_fatal.c:181)
==8915== by 0x6ACFB60: __fortify_fail_abort (fortify_fail.c:33)
==8915== by 0x6ACFB21: __stack_chk_fail (stack_chk_fail.c:29)
==8915== by 0x21B386: cuttPlan(unsigned int*, int, int*, int*, unsigned long, CUstream_st*) (in /home/joseph/yzchen_ws/UAV/cpc_ws/devel/lib/cpc_aux_mapping/cpc_aux_mapping_node)
==8915== by 0x18C140: AuxMapper::AuxMapper() (aux_mapper.cpp:67)
==8915== by 0x184201: main (aux_mapping_node.cpp:7)
==8915==
==8915== HEAP SUMMARY:
==8915== in use at exit: 15,351,186 bytes in 16,317 blocks
==8915== total heap usage: 23,410 allocs, 7,093 frees, 62,269,008 bytes allocated

==8915== 104 bytes in 1 blocks are possibly lost in loss record 1,715 of 3,065
==8915== at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8915== by 0xE4B53C2: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.460.73.01)
==8915== by 0xE4B5B90: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.460.73.01)
==8915== by 0xE4B6690: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.460.73.01)
==8915== by 0xE3520E4: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.460.73.01)
==8915== by 0xE40C1B6: cuMemAlloc_v2 (in /usr/lib/x86_64-linux-gnu/libcuda.so.460.73.01)
==8915== by 0x1E9CAD: __cudart602 (in /home/joseph/yzchen_ws/UAV/cpc_ws/devel/lib/cpc_aux_mapping/cpc_aux_mapping_node)
==8915== by 0x1BFFAA: __cudart607 (in /home/joseph/yzchen_ws/UAV/cpc_ws/devel/lib/cpc_aux_mapping/cpc_aux_mapping_node)
==8915== by 0x1F577A: cudaMalloc (in /home/joseph/yzchen_ws/UAV/cpc_ws/devel/lib/cpc_aux_mapping/cpc_aux_mapping_node)
==8915== by 0x23BA69: allocate_device_T(void**, unsigned long, unsigned long) (in /home/joseph/yzchen_ws/UAV/cpc_ws/devel/lib/cpc_aux_mapping/cpc_aux_mapping_node)
==8915== by 0x21C430: cuttPlan_t::activate() (in /home/joseph/yzchen_ws/UAV/cpc_ws/devel/lib/cpc_aux_mapping/cpc_aux_mapping_node)
==8915== by 0x21B204: cuttPlan(unsigned int*, int, int*, int*, unsigned long, CUstream_st*) (in /home/joseph/yzchen_ws/UAV/cpc_ws/devel/lib/cpc_aux_mapping/cpc_aux_mapping_node)
==8915== by 0x18C140: AuxMapper::AuxMapper() (aux_mapper.cpp:67)
==8915== by 0x184201: main (aux_mapping_node.cpp:7)

`
Has anyone run into a similar problem?
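
The trace does not show what dim_0 and permu_0 contain, so the cause cannot be read off it. One thing that can be ruled out cheaply is a mismatch between the rank argument and the arrays passed in, since cuttPlan receives plain int* pointers and cannot know their length. The check below is a minimal sketch with a hypothetical name, isValidPlanInput; it is not part of cuTT and does not diagnose the stack smashing itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pre-flight check, not part of cuTT.
// Typical use: assert(isValidPlanInput(3, dim, perm)); before calling cuttPlan.
bool isValidPlanInput(int rank, const std::vector<int>& dim,
                      const std::vector<int>& permutation) {
    if (rank <= 0) return false;
    const std::size_t n = static_cast<std::size_t>(rank);
    // Both arrays must hold exactly `rank` entries.
    if (dim.size() != n || permutation.size() != n) return false;
    std::vector<bool> seen(n, false);
    for (std::size_t i = 0; i < n; ++i) {
        // Every extent must be positive.
        if (dim[i] <= 0) return false;
        // Every axis 0..rank-1 must appear exactly once.
        const int p = permutation[i];
        if (p < 0 || p >= rank || seen[p]) return false;
        seen[p] = true;
    }
    return true;
}
```

Holding the arrays in std::vector makes their length explicit; dim.data() and permutation.data() can then be passed where the int* arguments are expected.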
