In the sourcecode i see divisions being done inside a loop. Here is one example:<b

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

So I compiled the code changing this block of code <div class="highlight highlight

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Performance++ about arduinofft HOT 10 CLOSED

core-c commented on September 24, 2024

Performance++

from arduinofft.

Comments (10)

BjornTheProgrammer commented on September 24, 2024

@core-c, does the compiler not already optimize this loop? If you have a way to verify that there is a statistically significant speed up, I could make a pull request.

from arduinofft.

core-c commented on September 24, 2024

I can check that out, and get back on it.

There are more things that can be optimized though: Like assigning a value inside a loop, from a line of code that always produces the same result.. after some minor calculation.
Such things seem fine enough on a fast processor, maybe, but not so on an AVR, i think. Those MCUs run at low clockspeeds, where every tick counts.

from arduinofft.

BjornTheProgrammer commented on September 24, 2024

So I compiled the code changing this block of code

for (uint16_t i = 0; i < this->_samples; i++) {
    this->_vReal[i] /= this->_samples;
    this->_vImag[i] /= this->_samples;
}

for (uint16_t i = 0; i < this->_samples; i++) {
    double reciprocal = 1 / this->_samples;
    this->_vReal[i] *= reciprocal;
    this->_vImag[i] *= reciprocal;
}

The compiled code for each did change. I ran it through a disassembler, but the resulting output was completely different. It turns out that Arduino compiles with the gcc -Os flag enabled. This means that Arduino optimizes for space instead of speed. The compiler still might also be optimizing this loop, but there is no way to know, so there might actually be a speed benefit for doing this. It warrants additional testing/research.

from arduinofft.

core-c commented on September 24, 2024

-Os optimizes for size.
There are other options you can try, like the most common: -O1, -O2, -O3 (increasingly more speed optimization).

In any case, you should calculate that reciprocal ONCE outside the for-loop.
Now it calculates reciprocal this->_samples times, while it keeps resulting the same value (_samples value does not change).

I will see if i can make things faster, in C.
But i am really thinking of writing it in AVR assembler (because i need it for my own projects). But for that, first the algorithm/calculation must be efficient.

For example, i see that function Exponent(). It is not used very often, but still it can be changed to make it run faster:

uint8_t arduinoFFT::Exponent(uint16_t value) {
  uint8_t result = 0;
  while (((value >> result) & 1) != 1)
    result++;
  return (result);
}

The AVR has bit-shift instructions that shift only 1 bit (to the left or right).
Say, you enter a value where only bit 10 is set. That while-loop starts manipulating that value, shifting result times in each loop-pass. The function does not store the intermediate value, but performs a new bit-shift on every loop-pass. Eventually, once it gets to the 10th bit, it did perform these shifts:

value >> 0
value >> 1
value >> 2
...all the way to...
value >> 10

If you want to shift a value 10 bits, the AVR will perform 10 shift instructions (because it can only shift a byte one bit at a time). !!note: the AVR can shift bytes. To shift a 16b word, it must shift 2 bytes!!
The way Exponent() is doing it, there will be even more shift instructions being executed (1+2+3+4+5+6+7+8+9+10 in total in the while-loop). That is too much and the function can easily be improved.
I guess this code will run faster:

uint8_t arduinoFFT::Exponent(uint16_t value) {
  uint16_t v = value;
  uint8_t result = 1;
  while ((v & 1) == 0) {
    result++;
    v >>= 1;
  }
  return result;
}

!!note: Compilers are smart nowadays. With proper settings, it will not just shift 10 times, but it does shift 2 bits on the MSB of a word instead.. but you get the idea of my example.

But yeah, this is just an example of how an algorithm can be optimized.
Perhaps in C-code that little function seems longer (a few lines of code), but it does run faster.

I was also thinking of using fixed-point numbers. That would really improve performance.
The AVR has no floating-point instructions. If you use fixed-point numbers it'll make it a lot easier for the AVR to calculate with such numbers.

from arduinofft.

FintasticMan commented on September 24, 2024

The develop branch has an FFT_SPEED_OVER_PRECISION macro that you can define that does what you're proposing.

from arduinofft.

core-c commented on September 24, 2024

o cool. To be honest, i had only checked out this branch (a bit). I will have a look. Thanks..

While i was tracking down the origin of this "abandoned" project, i did find some code that was already highly optimized (written in assembler). I think it is from the creator..
I have no need, for myself, to optimize any C-code anymore. Someone already did it..
There is a github source that shows several FFT implementations for Arduino.
I also found more code that inspired the original maker of this project. There you find an FHT implementation (input & output are real values.. easy to use for Arduino users, and sufficient for many/most Arduino projects).
I am unsure to post the URLs here, because i do not want to discredit this project. email me, if you want to know

Something i want to mention:
Performance-wise: Have you ever tried to use only (signed) integer values, instead of doubles/floats?
For an AVR-Arduino i think this library is used most often in combination with sampled real values from the ADC (10-bit values most probably), all integers.

from arduinofft.

kosme commented on September 24, 2024

@core-c I am intrigued about checking out the URLs you mention

from arduinofft.

core-c commented on September 24, 2024

http://wiki.openmusiclabs.com/wiki/ArduinoFHT
https://github.com/imagest108/arduino/blob/master/libraries/PlainFFT/PlainFFT.cpp
https://github.com/imagest108/arduino/tree/master/libraries
https://github.com/Evg33/ArduinoFHT
https://github.com/kosme/arduinoFFT

from arduinofft.

core-c commented on September 24, 2024

sorry for a very late reply.. i was busy with RL. kosme

from arduinofft.

kosme commented on September 24, 2024

Both mentioned optimizations are implemented on version 1.6.1

from arduinofft.

Performance++ about arduinofft HOT 10 CLOSED

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent