
Comments (6)

Omega359 commented on July 17, 2024

The 25-30% improvement is in line with what I expected for the box removal - very nice work!

from arrow-datafusion.

kevinmingtarja commented on July 17, 2024

I'll make a PR sometime today or tomorrow!


kevinmingtarja commented on July 17, 2024

Hi, I was curious about this and decided to test it out myself.

I first generated a flamegraph using flamegraph-rs (CARGO_PROFILE_BENCH_DEBUG=true cargo flamegraph --root --bench substr_index -- --bench).

It seems like quite a lot of time is spent on malloc, free, and memmove, as you noted in #9864 (comment).

A snippet of flamegraph.svg:
[Image: flamegraph screenshot]

I then tested a quick change that avoids the Box::new() (at the expense of some code duplication :D), and got roughly a 28% improvement!

% cargo bench --bench substr_index -- --baseline main
   Compiling datafusion-functions v37.0.0 (/Users/kevin/git/arrow-datafusion/datafusion/functions)
    Finished bench [optimized] target(s) in 56.31s
     Running benches/substr_index.rs (target/release/deps/substr_index-bcc13e06371405f5)
substr_index_array_array_1000
                        time:   [87.912 µs 88.197 µs 88.515 µs]
                        change: [-29.549% -28.257% -27.292%] (p = 0.00 < 0.05)
                        Performance has improved.

diff:

-                let splitted: Box<dyn Iterator<Item = _>> = if n > 0 {
-                    Box::new(string.split(delimiter))
-                } else {
-                    Box::new(string.rsplit(delimiter))
-                };
                 let occurrences = usize::try_from(n.unsigned_abs()).unwrap_or(usize::MAX);
-                // The length of the substring covered by substr_index.
-                let length = splitted
-                    .take(occurrences) // at least 1 element, since n != 0
-                    .map(|s| s.len() + delimiter.len())
-                    .sum::<usize>()
-                    - delimiter.len();
+                let length;
+                if n > 0 {
+                    let splitted = string.split(delimiter);
+                    length = splitted
+                        .take(occurrences)
+                        .map(|s| s.len() + delimiter.len())
+                        .sum::<usize>()
+                        - delimiter.len();
+                } else {
+                    let splitted = string.rsplit(delimiter);
+                    length = splitted
+                        .take(occurrences)
+                        .map(|s| s.len() + delimiter.len())
+                        .sum::<usize>()
+                        - delimiter.len();
+                }
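
For what it's worth, the duplication in the diff above could probably be avoided without bringing the Box back. Below is a minimal sketch, not the code that landed in the PR: it assumes a hypothetical helper `covered_length` that is generic over the iterator type, so `split` and `rsplit` are each monomorphized with no heap allocation. The names `covered_length` and `substr_index_length` and the `i64` type for `n` are assumptions made for illustration. It also uses `saturating_sub` where the original uses plain subtraction, so that `n == 0` would give 0 rather than underflow.

```rust
/// Hypothetical helper: byte length covered by the first `occurrences`
/// pieces of `parts`, counting the delimiters between them.
/// Generic over the iterator type, so no `Box<dyn Iterator>` is needed.
fn covered_length<'a>(
    parts: impl Iterator<Item = &'a str>,
    occurrences: usize,
    delimiter_len: usize,
) -> usize {
    parts
        .take(occurrences)
        .map(|s| s.len() + delimiter_len)
        .sum::<usize>()
        // Drop the trailing delimiter that the map above over-counted.
        .saturating_sub(delimiter_len)
}

/// Hypothetical wrapper mirroring the branch in the diff:
/// positive `n` counts from the left, negative `n` from the right.
fn substr_index_length(string: &str, delimiter: &str, n: i64) -> usize {
    let occurrences = usize::try_from(n.unsigned_abs()).unwrap_or(usize::MAX);
    if n > 0 {
        covered_length(string.split(delimiter), occurrences, delimiter.len())
    } else {
        covered_length(string.rsplit(delimiter), occurrences, delimiter.len())
    }
}
```

For example, `substr_index_length("www.apache.org", ".", 2)` covers "www.apache", and `n = -1` covers "org". Whether this matches the hand-duplicated version in the benchmark would need to be measured.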


alamb commented on July 17, 2024

🚀


alamb commented on July 17, 2024

FYI @Omega359


kevinmingtarja commented on July 17, 2024

take
