Comments (4)
Can you give the timings for 1e8 1e1 0 0 (no NAs and no sorting)?
from mrpowers-benchmarks.
@ghuls - Rscript groupby-datagen.R 1e8 1e1 0 0
ran in 2 minutes. Let me know if you need anything else!
@ghuls - any chance you can send me your sed script to create these data files so I can try it out? I've never used sed before and I'm interested in learning more. Thanks!
It is not a sed script, but an awk script.
I didn't have time to add support for NAs yet. Once it is there I can make a pull request.
groupby-datagen () {
    local N="${1:-1e7}";
    local K="${2:-1e2}";
    local NAs="${3:-0}";
    local sort="${4:-0}";

    frawk \
        -B cranelift \
        -v "N=${N}" \
        -v "K=${K}" \
        -v "NAs=${NAs}" \
        -v "sort=${sort}" \
        '
        function rand_int(x) {
            return 1 + int(rand() * x);
        }

        BEGIN {
            # Convert input variables to numbers (needed in case they are in scientific notation).
            N = int(N + 0);
            K = int(K + 0);
            # NAs is parsed but not used yet (NA support is not implemented).
            NAs = int(NAs + 0);

            # Set fixed seed for random number generator.
            srand(123);

            # Print header.
            print "id1,id2,id3,id4,id5,id6,v1,v2,v3";

            if (sort != 1) {
                for (i=0; i<N; i++) {
                    printf("id%03d,id%03d,id%010d,%d,%d,%d,%d,%d,%.06f\n", rand_int(K), rand_int(K), rand_int(N/K), rand_int(K), rand_int(K), rand_int(N/K), rand_int(5), rand_int(15), rand() * 100);
                }
            } else {
                # Sort on the comma-separated columns: id1-id3 as strings, id4-id6 numerically.
                for (i=0; i<N; i++) {
                    printf("id%03d,id%03d,id%010d,%d,%d,%d,%d,%d,%.06f\n", rand_int(K), rand_int(K), rand_int(N/K), rand_int(K), rand_int(K), rand_int(N/K), rand_int(5), rand_int(15), rand() * 100) | "LC_COLLATE=C sort --parallel=2 -t , -k 1,1 -k 2,2 -k 3,3 -k 4,4n -k 5,5n -k 6,6n";
                }
            }
        }
        '
}
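The int(N + 0) conversion in the BEGIN block can be tried on its own with a stock awk, without frawk. This is a minimal sketch, assuming a POSIX awk is on the PATH; sci_to_int is a hypothetical helper name used only for illustration and is not part of the script above:

```shell
# Hypothetical helper (not part of the original script): turn a count given
# in scientific notation into a plain integer, using the same
# "add zero, then int()" step as the BEGIN block of the generator.
sci_to_int () {
    awk -v "x=${1}" 'BEGIN { printf "%d\n", int(x + 0) }'
}

sci_to_int 1e8   # prints 100000000
sci_to_int 1e1   # prints 10
```

With this, arguments such as 1e8 1e1 0 0 can be checked before starting a long generation run.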