Code Monkey home page Code Monkey logo

Comments (10)

BjornTheProgrammer avatar BjornTheProgrammer commented on September 24, 2024

@core-c, does the compiler not already optimize this loop? If you have a way to verify that there is a statistically significant speed up, I could make a pull request.

from arduinofft.

core-c avatar core-c commented on September 24, 2024

I can check that out, and get back on it.

There are more things that can be optimized though: Like assigning a value inside a loop, from a line of code that always produces the same result.. after some minor calculation.
Such things seem fine enough on a fast processor, maybe, but not so on an AVR, i think. Those MCUs run at low clockspeeds, where every tick counts.

from arduinofft.

BjornTheProgrammer avatar BjornTheProgrammer commented on September 24, 2024

So I compiled the code changing this block of code

for (uint16_t i = 0; i < this->_samples; i++) {
    this->_vReal[i] /= this->_samples;
    this->_vImag[i] /= this->_samples;
}

to

for (uint16_t i = 0; i < this->_samples; i++) {
    double reciprocal = 1 / this->_samples;
    this->_vReal[i] *= reciprocal;
    this->_vImag[i] *= reciprocal;
}

The compiled code for each did change. I ran it through a disassembler, but the resulting output was completely different. It turns out that Arduino compiles with the gcc -Os flag enabled. This means that Arduino optimizes for space instead of speed. The compiler still might also be optimizing this loop, but there is no way to know, so there might actually be a speed benefit for doing this. It warrants additional testing/research.

from arduinofft.

core-c avatar core-c commented on September 24, 2024

-Os optimizes for size.
There are other options you can try, like the most common: -O1, -O2, -O3 (increasingly more speed optimization).

In any case, you should calculate that reciprocal ONCE outside the for-loop.
Now it calculates reciprocal this->_samples times, while it keeps resulting the same value (_samples value does not change).

I will see if i can make things faster, in C.
But i am really thinking of writing it in AVR assembler (because i need it for my own projects). But for that, first the algorithm/calculation must be efficient.

For example, i see that function Exponent(). It is not used very often, but still it can be changed to make it run faster:

uint8_t arduinoFFT::Exponent(uint16_t value) {
  uint8_t result = 0;
  while (((value >> result) & 1) != 1)
    result++;
  return (result);
}

The AVR has bit-shift instructions that shift only 1 bit (to the left or right).
Say, you enter a value where only bit 10 is set. That while-loop starts manipulating that value, shifting result times in each loop-pass. The function does not store the intermediate value, but performs a new bit-shift on every loop-pass. Eventually, once it gets to the 10th bit, it did perform these shifts:

value >> 0
value >> 1
value >> 2
...all the way to...
value >> 10

If you want to shift a value 10 bits, the AVR will perform 10 shift instructions (because it can only shift a byte one bit at a time). !!note: the AVR can shift bytes. To shift a 16b word, it must shift 2 bytes!!
The way Exponent() is doing it, there will be even more shift instructions being executed (1+2+3+4+5+6+7+8+9+10 in total in the while-loop). That is too much and the function can easily be improved.
I guess this code will run faster:

uint8_t arduinoFFT::Exponent(uint16_t value) {
  uint16_t v = value;
  uint8_t result = 1;
  while ((v & 1) == 0) {
    result++;
    v >>= 1;
  }
  return result;
}

!!note: Compilers are smart nowadays. With proper settings, it will not just shift 10 times, but it does shift 2 bits on the MSB of a word instead.. but you get the idea of my example.

But yeah, this is just an example of how an algorithm can be optimized.
Perhaps in C-code that little function seems longer (a few lines of code), but it does run faster.

I was also thinking of using fixed-point numbers. That would really improve performance.
The AVR has no floating-point instructions. If you use fixed-point numbers it'll make it a lot easier for the AVR to calculate with such numbers.

from arduinofft.

FintasticMan avatar FintasticMan commented on September 24, 2024

The develop branch has an FFT_SPEED_OVER_PRECISION macro that you can define that does what you're proposing.

from arduinofft.

core-c avatar core-c commented on September 24, 2024

o cool. To be honest, i had only checked out this branch (a bit). I will have a look. Thanks..

While i was tracking down the origin of this "abandoned" project, i did find some code that was already highly optimized (written in assembler). I think it is from the creator..
I have no need, for myself, to optimize any C-code anymore. Someone already did it..
There is a github source that shows several FFT implementations for Arduino.
I also found more code that inspired the original maker of this project. There you find an FHT implementation (input & output are real values.. easy to use for Arduino users, and sufficient for many/most Arduino projects).
I am unsure to post the URLs here, because i do not want to discredit this project. email me, if you want to know

Something i want to mention:
Performance-wise: Have you ever tried to use only (signed) integer values, instead of doubles/floats?
For an AVR-Arduino i think this library is used most often in combination with sampled real values from the ADC (10-bit values most probably), all integers.

from arduinofft.

kosme avatar kosme commented on September 24, 2024

@core-c I am intrigued about checking out the URLs you mention

from arduinofft.

core-c avatar core-c commented on September 24, 2024

http://wiki.openmusiclabs.com/wiki/ArduinoFHT
https://github.com/imagest108/arduino/blob/master/libraries/PlainFFT/PlainFFT.cpp
https://github.com/imagest108/arduino/tree/master/libraries
https://github.com/Evg33/ArduinoFHT
https://github.com/kosme/arduinoFFT

..

from arduinofft.

core-c avatar core-c commented on September 24, 2024

sorry for a very late reply.. i was busy with RL. kosme

from arduinofft.

kosme avatar kosme commented on September 24, 2024

Both mentioned optimizations are implemented on version 1.6.1

from arduinofft.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.