Code Monkey home page Code Monkey logo

micro-opt's Introduction

Micro Optimization

Code for Sierra Schlott's Micro-Optimization of an atoi converter

Best run time overall for converter_original.c: 0.059506s

Best run time overall for converter.c: 0.003674s(My implemetation)

Best run time overall for old_converter.c: 0.007489s (Stack exchange fast_atoi)

Final: About 16.2x faster than the original implementation; 2.04x faster than fast_atoi.

Recorded on laptop's VM, 10000 iters, with minimal interruptions

Important files:

  • converter.c my implementation of converter
  • old_converter.c some person on stack exchange's fast_atoi
  • converter_original.c The orignal converter (only uses atoi library call)
  • test.c was only used as a way of working through bugs for my atoi implementation. Not used in compiling process.

To run:

gcc -O3 -std=c99 -march=native -o driver -Werror -Wall -Wextra -pedantic converter.c driver.c
./driver quotes.txt 10000

Optimization Process

To start, I wanted my code to implement atoi since making library calls can be expensive in terms of time. Admittedly, I did look up an implementation of atoi by some person on stack overflow, but realized it wouldn’t be in the spirit of the assignment and abandoned it. This is located in my old_converter.c. After this, I stopped looking at the implementation this person had used and worked through the process of finding what worked best for myself.

The main change I started with was trying my own atoi. I looked up the corresponding integer values on an ascii table and found ‘0’ started at int 48. So, for any index in a string of number chars, I knew I would only have to subtract 48 in order to get the actual value. I started off using this as a function that was written outside of convert_all, but eventually brought it into my main function.

Next, I took a closer look at the actual workload that was expected of the convert_all function by having a quick loop count the number of instances of 3-character, 4-character, and 5-character strings. I kept this tally in mind and used it to order my if/else statements so that 4-character strings, which occur the most by quite a bit, would immediately move on to the else statement. I followed similarly for the 3 and 5 character strings. This brought me down to about two times as fast as the fast_atoi that I’d tested on stack overflow.

Some things I tried were less useful.

  • I tried using bitwise shifts instead of multiplication by powers of 10 to assemble the final integer values, but found the gcc optimization was faster than what I could accomplish.
  • I tried declaring the I iterator outside of the for loop, but this ended up being slightly slower.
  • I couldn’t get branch prediction to operate faster than my initial arrangement of nested if/else statements
  • I found little change with using just if(word[i]) as opposed to if (word[i]!=’\0’)
  • I found little change in using the actual integer values as opposed to declaring const ints TENS, HUNDREDS, THOUSANDS,... for multiplying by powers of ten

The final change I found to make a difference was altering my multiplication process so the program had smaller amounts to multiply by in any given instance. For example, this is how I replaced the lines in my conversion for a three character integer:

nums[i]= (word[0]-48)*TTHOUS + (word[1]-48)*THOUSANDS + (word[2]-48)*HUNDREDS +(word[3]-48)*TENS + (word[4]-48);

I started using:

nums[i]= ((word[0]-48)*TENS + (word[1]-48))*TENS +(word[2]-48);

This cut off about .0006 seconds, taking me from 14.8x faster to 16.2x faster.

Some Cool Bug I encountered

This doesn't work on my desktop computer, where the clock would only measure the time in discrete amounts. It would measure only multiples of .015625 seconds, so when my optimization became faster than that, it reported taking only zero seconds. Note: This involved me changing driver.c to include a printf(%f,t) so I could see what times were being recorded. Nothing changed about actual functionality/API. You do not need my driver.c for this to work.

For the same files compiled with the same flags I saw the following

Time as measured on my desktop computer:

alt text

Time as measured on my laptop:

alt text

micro-opt's People

Contributors

sschlott avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.