stonexjr / back40computing
Automatically exported from code.google.com/p/back40computing
License: BSD 3-Clause "New" or "Revised" License
I am running it on a GTX 560 under Ubuntu 10.10.
When I try to run make, it gives errors such as:
/usr/local/cuda/include/thrust/detail/device/cuda/detail/b40c/radixsort_api.h(481): error: kernel launches from templates are not allowed in system files
/usr/local/cuda/include/thrust/detail/device/cuda/detail/b40c/radixsort_api.h(488): error: kernel launches from templates are not allowed in system files
/usr/local/cuda/include/thrust/detail/device/cuda/detail/b40c/radixsort_api.h(504): error: kernel launches from templates are not allowed in system files
/usr/local/cuda/include/thrust/detail/device/cuda/detail/b40c/radixsort_api.h(517): error: kernel launches from templates are not allowed in system files
/usr/local/cuda/include/thrust/detail/device/cuda/detail/b40c/radixsort_api.h(521): error: kernel launches from templates are not allowed in system files
I found that the first error occurs in:
if ((_device_sm_version == 130) && (_work_decomposition.num_elements > static_cast<unsigned int>(_device_props.multiProcessorCount * _cycle_elements * 2))) {
FlushKernel<void><<<_grid_size, B40C_RADIXSORT_THREADS, scan_scatter_attrs.sharedSizeBytes>>>();
synchronize_if_enabled("FlushKernel");
}
Can you give me some help with this?
Thanks.
Original issue reported on code.google.com by [email protected]
on 27 Nov 2011 at 5:33
The test cases fail with the following errors:
Using device 0: GeForce 8800 GTS 512
Simple key-value sort: INCORRECT: [0]: 102400 != 4128
Small-problem key-value sort: INCORRECT: [0]: 102400 != 4128
Small-problem restricted-range key-value sort: INCORRECT: [0]: 47840 != 4128
Original issue reported on code.google.com by [email protected]
on 18 Aug 2011 at 6:05
Very nice performance with your sorting function! A 2x boost on all sorting is
always welcome. Just looked at the updates and see that you guys have been
very busy, so I guess this is already on the radar. I wanted to use it in
multiple files that are compiled separately, but received linking errors.
What steps will reproduce the problem?
1. Include the radixsort_api.cu in different files and try to combine them at
linking time.
nvcc -c file1.cu
nvcc -c file2.cu
nvcc -c main.cu
nvcc -o main main.o file1.o file2.o
This gives linking errors because some variables are defined multiple times.
I attached the files, compiler_error.txt with the error I received, and the
source files.
Thanks,
Scott
Original issue reported on code.google.com by [email protected]
on 11 Aug 2010 at 11:58
What steps will reproduce the problem?
1. If N is a const unsigned int, this will not work
sort_enactor.Sort(sort_storage, N);
2. This will work
sort_enactor.Sort(sort_storage, (int) N);
3.
What is the expected output? What do you see instead?
I get unspecified launch failures in the first case
What version of the product are you using? On what operating system?
r603 Ubuntu Linux 11.04
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 10 Jul 2011 at 9:54
What steps will reproduce the problem?
1. Run make in the test/bfs folder
2. Run test_bfs_5.0_i386 random 32 128 --v --undirected
What is the expected output? What do you see instead?
The expected output is a BFS traversal of the above graph, but it displays
only partial output, as follows:
Using device 0: Tesla C2070
Selecting 128 undirected random edges in COO format... Done selecting (0s).
Converting 32 vertices, 256 edges (unordered rows) to CSR format... Done converting (0s).
Degree Histogram (32 vertices, 256 directed edges):
Degree 2^-1: 0 (0.00%)
Degree 2^0: 0 (0.00%)
Degree 2^1: 2 (6.25%)
Degree 2^2: 13 (40.62%)
Degree 2^3: 17 (53.12%)
Running non-instrumented distance-marking copied-to-device tests...
---------------------------------------------------------------
Work Histogram:
Depth, Expanded, Unique-Expanded, Discovered
0, 1, 1, 1
1, 7, 5, 5
2, 42, 22, 19
3, 158, 30, 7
Warmup iteration: 0.000 ms
GPU 0 source path: 32 elements (128 bytes)
GPU 0 collision mask: 4 elements (4 bytes)
GPU 0 queue sizes: compact 332 elements (1328 bytes), expand 332 elements (1328 bytes)
BFS min occupancy 8, level-grid size 112
Warmup iteration: 1.532 ms
BFS min occupancy 8, level-grid size 112
Warmup iteration: 0.579 ms
BFS expand min occupancy 8, level-grid size 112
BFS compact min occupancy 8, level-grid size 112
Warmup iteration: 0.986 ms
BFS one_phase min occupancy 8, level-grid size 112
BFS expand min occupancy 8, level-grid size 112
BFS compact min occupancy 8, level-grid size 112
After this there is no output for a long time. The graph is small, so why is it taking so much time? Am I missing something?
I'm using Linux and a recent version of the product.
Original issue reported on code.google.com by [email protected]
on 25 Aug 2012 at 12:05
What steps will reproduce the problem?
1. go to ../test/bfs
2. make
What is the expected output? What do you see instead?
../../b40c/partition/upsweep/kernel_policy.cuh(110): error: identifier "CUDA_ARCH" is undefined
../../b40c/partition/downsweep/kernel_policy.cuh(122): error: identifier "CUDA_ARCH" is undefined
What version of the product are you using? On what operating system?
Version v1.0.655 (SVN r655)
NVCC release 4.0, V0.2.1221
Please provide any additional information below.
ubuntu 11.04
GTS 450
Original issue reported on code.google.com by [email protected]
on 28 Nov 2011 at 3:16
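If the identifier in the messages above is the `__CUDA_ARCH__` macro with its underscores lost in the export (an assumption; the report shows only CUDA_ARCH), note that nvcc defines that macro only while compiling device code. Host-side code that refers to it unguarded sees an undeclared identifier. A minimal sketch of a guard follows; `SKETCH_CUDA_ARCH` and `SketchIsDevicePass` are hypothetical names for illustration and are not part of b40c:

```cpp
// __CUDA_ARCH__ is defined by nvcc only during device compilation passes.
// In a host pass (or under a plain C++ compiler) it is not defined at all,
// so fall back to 0 instead of using the macro unguarded.
#ifdef __CUDA_ARCH__
#define SKETCH_CUDA_ARCH __CUDA_ARCH__
#else
#define SKETCH_CUDA_ARCH 0
#endif

// Hypothetical helper: true only when this code is compiled for a device.
inline bool SketchIsDevicePass() {
    return SKETCH_CUDA_ARCH != 0;
}
```

With this guard, policy code can branch on `SKETCH_CUDA_ARCH` in both host and device passes. Whether this matches the actual cause of the errors in kernel_policy.cuh is not confirmed by the report.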
I am getting the following error on running make inside test/radix_sort:
/usr/include/c++/4.8/cstdlib:179:8: error: ‘__int128_t’ does not name a type
abs(__int128 __x) { return __x >= 0 ? __x : -__x; }
^
make: *** [bin/simple_sort_7.5_i386] Error 1
I am on Ubuntu 14.04 LTS. I have cuda-7.5, gcc-4.8, g++-4.8 and a GeForce Titan X GPU.
What steps will reproduce the problem?
1. Perform a sort of sufficient size that LARGE_SORT is used. Keys-only or
keys with values will reproduce this issue.
2. Have the keys be doubles, but all of them be integers.
3. A basic example I use is the natural numbers decreasing from 1000000 to 1.
What is the expected output? What do you see instead?
I expect to see the numbers 1 through to 1000000 in increasing order. I
instead see negative numbers for the sorted keys.
What version of the product are you using? On what operating system?
Version v1.0.655 (SVN r655). Windows 8 64-bit, targeting 64-bit,
compute_13,sm_13. Running on an NVIDIA GeForce GTX 580.
Please provide any additional information below.
I have an (inelegant) workaround of multiplying all numbers by some number
(7.76345621464357) before sorting, and dividing again afterwards, for problem
sizes over 100,000. This is of course not ideal - speed, roundoff, etc :).
Original issue reported on code.google.com by [email protected]
on 18 Feb 2013 at 9:00
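For context on the report above: a radix sort orders keys by their raw bits, so floating-point keys are commonly remapped to unsigned integers whose unsigned order matches the numeric order. The sketch below shows that common technique on the host, with `std::sort` standing in for the radix sort. The function names are hypothetical, and nothing here is taken from the b40c sources or claims to explain the bug:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a double to a 64-bit key whose unsigned order matches numeric order
// (NaNs aside): negative values have all bits flipped, non-negative values
// have only the sign bit set.
inline std::uint64_t DoubleToSortableKey(double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    const std::uint64_t sign = std::uint64_t(1) << 63;
    return (bits & sign) ? ~bits : (bits | sign);
}

// Inverse of DoubleToSortableKey.
inline double SortableKeyToDouble(std::uint64_t key) {
    const std::uint64_t sign = std::uint64_t(1) << 63;
    const std::uint64_t bits = (key & sign) ? (key & ~sign) : ~key;
    double value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Host-side stand-in: convert to keys, sort the keys as unsigned integers,
// then convert back. A radix sort would replace std::sort here.
inline std::vector<double> SortDoublesViaKeys(std::vector<double> values) {
    std::vector<std::uint64_t> keys(values.size());
    std::transform(values.begin(), values.end(), keys.begin(), DoubleToSortableKey);
    std::sort(keys.begin(), keys.end());
    std::transform(keys.begin(), keys.end(), values.begin(), SortableKeyToDouble);
    return values;
}
```

Calling `SortDoublesViaKeys` on the doubles 1000000 down to 1 gives a host-side reference to compare against the GPU output when narrowing down where the negative values come from.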
What steps will reproduce the problem?
1. Given a key array of floating-point numbers (float or double) and a value array of
integers, for example:
//key[i] = (double)N / (double)i;
double key[5] = { 0.0, 5.0, 2.5, 1.66667, 1.25 };
int value[5] = { 0, 1, 2, 3, 4 };
2. Use the radix_sort from branch FastSortSm20. Keys of the form 1.xx are
sorted as greater than all other elements.
The output is
key[] = [0, 2.5, 5, 1.25, 1.66667 ]
value[] = [0, 2, 1, 4, 3 ]
3. Using the radix_sort in trunk gives the correct result:
key[] = [0, 1.25, 1.66667, 2.5, 5 ]
value[] = [0, 4, 3, 2, 1 ]
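The expected key/value order in step 3 can be reproduced with a small host-side reference. This is a sketch that uses `std::stable_sort` rather than b40c, and `ReferenceSortByKey` is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sort (key, value) pairs by key on the host and write the result back
// into the two arrays. Both vectors are assumed to have the same length.
inline void ReferenceSortByKey(std::vector<double>& keys, std::vector<int>& values) {
    std::vector<std::pair<double, int> > pairs(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) {
        pairs[i] = std::make_pair(keys[i], values[i]);
    }
    // Compare on the key only; stable_sort keeps equal keys in input order.
    std::stable_sort(pairs.begin(), pairs.end(),
                     [](const std::pair<double, int>& a, const std::pair<double, int>& b) {
                         return a.first < b.first;
                     });
    for (std::size_t i = 0; i < pairs.size(); ++i) {
        keys[i] = pairs[i].first;
        values[i] = pairs[i].second;
    }
}
```

For the five-element example above, this yields the keys 0, 1.25, 1.66667, 2.5, 5 and the values 0, 4, 3, 2, 1, which matches the trunk output.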
What version of the product are you using? On what operating system?
Ubuntu 10.10 x64
CUDA 4.0 (GPU GTX 470)
r893, branches/FastSortSm20
r893, trunk
Please provide any additional information below.
I want to sort the elements in each row vector of a floating-point 2D matrix.
The Enactor::SmallSort() interface in the branch FastSortSm20 seems a good
fit. Even better, it allows specifying a cudaStream in the interface. I want to
split the row vectors across several streams to exploit concurrent kernel
execution. Note that Compute Capability 2.0 hardware supports 16 concurrent
kernels.
Original issue reported on code.google.com by [email protected]
on 3 Jul 2012 at 10:26
What steps will reproduce the problem?
code snippet:
b40c::radix_sort::Enactor sort_enactor;
b40c::util::PingPongStorage<unsigned int, Scalar4> sort_storage(d_keys,d_values);
sort_enactor.Sort(sort_storage, N);
What is the expected output? What do you see instead?
I get lots of compiler warnings
.../b40c/radix_sort/enactor.cuh:529:40: warning: suggest parentheses around
assignment used as truth value
The same warning appears for similar lines in enactor_base.h, etc.
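GCC emits this warning (-Wparentheses) when an assignment is used directly as a condition, as in `if (a = b)`. The sketch below shows the two usual ways to keep the behavior and silence the warning. All names are hypothetical and the code is not copied from enactor.cuh, so whether line 529 has exactly this shape is an assumption:

```cpp
// Hypothetical stand-in for a call that returns 0 on success and a
// non-zero error code otherwise.
inline int SketchLaunchStep(int code) {
    return code;
}

// Option 1: keep the assignment inside the condition, but wrap it in an
// extra pair of parentheses to mark it as intentional.
inline int SketchRunPass(int code) {
    int retval = 0;
    if ((retval = SketchLaunchStep(code))) {
        return retval;  // propagate the error code
    }
    return 0;
}

// Option 2: separate the assignment from the test with an explicit comparison.
inline int SketchRunPassExplicit(int code) {
    const int retval = SketchLaunchStep(code);
    if (retval != 0) {
        return retval;
    }
    return 0;
}
```

Both variants return the same values. The second is usually easier to read, while the first is the smaller textual change.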
What version of the product are you using? On what operating system?
r603
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 9 Jul 2011 at 3:21
What steps will reproduce the problem?
1. Run "make cull". You will get many compiler errors, but they can be fixed
easily (most of them seem to be caused by function-call arguments that do not
match the declarations of the underlying functions).
2. After fixing the compile errors,
run "microbench_bfs_5.0_x86_64 grid2d 5000 --src=randomize --i=50 --quick
--device=1 --queue-sizing=0.5"
3. You will get an "illegal addr" error, because the space for d_filter_mask is
not allocated.
Original issue reported on code.google.com by [email protected]
on 28 Mar 2013 at 4:29
What steps will reproduce the problem?
1. Sorting the particular input sequence of unsigned long long values in the file
input_data gives incorrect output. (It works on most other input, however; I
include one working example, working_data.)
2. Compiling and running the included sort_by_key.cu should reproduce the problem.
What is the expected output? What do you see instead?
I sorted them using thrust v1.2.1 as well, output is in the files
Expected:
thrust_data
thrust_indices
Received:
b40c_data
b40c_indices
What version of the product are you using? On what operating system?
Using rv208 of b40c and thrust v 1.2.1 compiled on a 64 bit linux machine, with
a C2050 GPU and using nvcc 3.1.
I tried
a) nvcc -O2 -arch=sm_20 -o sort-test sort_by_key.cu
b) nvcc -o sort-test sort_by_key.cu
Original issue reported on code.google.com by [email protected]
on 15 Aug 2010 at 5:55
The following code that uses v270 (compiled under Windows 7) works fine with
SM20, but not with SM12:
http://encode.ru/attachment.php?attachmentid=1488&d=1297084367
Replacing
EarlyExitRadixSortingEnactor<K, V> sorting_enactor;
with
SingleGridRadixSortingEnactor<K, V> sorting_enactor;
solves the problem
Original issue reported on code.google.com by [email protected]
on 9 Feb 2011 at 11:54