
algorithmica's People

Contributors

alexandernenninger, arnu152, craftmaster1231, dingelsz, dpaleka, gdv, hhy3, lionell, lunastorm, mcognetta, mode-six, modjular, molney239, nayuki, platelett, qrbaker, rgriege, romanpovol, sangwoo-joh, sayyoungman, seiko-iwasawa, sslotin, supercilex, tyoming, vergeev, weethet, wladslaw, yatancuyu, yistarostin, yuki0iq


algorithmica's Issues

Code snippet measures integer operations, text mentions floating point operations.

- When the whole array fits into the lowest layer of cache, the program is bottlenecked by the CPU rather than the L1 cache bandwidth. As the array becomes larger, the overhead associated with the first iterations of the loop becomes smaller, and the performance gets closer to its theoretical maximum of 16 GFLOPS.

Not sure if the meaning of FLOPS has been extended in practice, or whether integer op throughput is tightly coupled to FLOPS.
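For reference, a guess at the shape of the loop being measured (not the book's exact listing): it sums 32-bit integers, so the work consists of integer additions and loads, not floating-point operations in the strict sense of "FLOPS".

int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i]; // one integer add (plus one load) per element
    return s;
}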

Confusion with 'highest' and 'lowest' in `hierarchy.md`.

In this section, "higher" seems to mean "closer to the CPU", as suggested by the pyramid picture and sentences like:

Modern CPUs have multiple layers of cache (L1, L2, often L3, and rarely even L4). The lowest layer is shared between cores and is usually scaled with their number (e.g., a 10-core CPU should have around 10M of L3 cache).

However, this logic seems to be reversed in the following:

When accessed, the contents of a cache line are emplaced onto the lowest cache layer and then gradually evicted to higher levels unless accessed again in time.

Example does not compile

https://github.com/algorithmica-org/algorithmica/blob/master/content/english/hpc/analyzing-performance/_index.md#compiled-languages

I cannot compile the following example. If some compiler flags are required, I think we should clarify that:

gcc bench.c

bench.c:6:8: error: variably modified ‘a’ at file scope
    6 | double a[n][n], b[n][n], c[n][n];
      |        ^
bench.c:6:8: error: variably modified ‘a’ at file scope
bench.c:6:17: error: variably modified ‘b’ at file scope
    6 | double a[n][n], b[n][n], c[n][n];
      |                 ^
bench.c:6:17: error: variably modified ‘b’ at file scope
bench.c:6:26: error: variably modified ‘c’ at file scope
    6 | double a[n][n], b[n][n], c[n][n];
      |                          ^
bench.c:6:26: error: variably modified ‘c’ at file scope
bench.c: In function ‘main’:
bench.c:23:21: error: expected expression before ‘float’
   23 |     float seconds = float(clock() - start) / CLOCKS_PER_SEC;

gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
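For what it's worth, the second error suggests the snippet is really C++: the functional-style cast `float(...)` is not valid C, and a `const int n` used as a file-scope array bound is a constant expression in C++ but not in C (hence the "variably modified at file scope" errors). Building with `g++` likely gets past both. Below is a guess at a complete, compiling version; the matrix size, initialization, and output are assumptions, not the book's exact listing. If it must stay C, replace `float(...)` with `(float)(...)` and make `n` a `#define` or `enum` constant.

#include <cstdio>
#include <cstdlib>
#include <ctime>

const int n = 1024;               // compile-time constant: avoids the VLA-at-file-scope error
double a[n][n], b[n][n], c[n][n];

int main() {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            a[i][j] = (double) rand() / RAND_MAX,
            b[i][j] = (double) rand() / RAND_MAX,
            c[i][j] = 0;

    clock_t start = clock();

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i][j] += a[i][k] * b[k][j];

    float seconds = float(clock() - start) / CLOCKS_PER_SEC;
    printf("%.4f s\n", seconds);
    return 0;
}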

Interval Inclusion Statement untrue

This way we can represent numbers in the form $\pm \; m \times 2^e$ where both $m$ and $e$ are bounded *and possibly negative* integers — which would correspond to negative or small numbers respectively. The distribution of these numbers is very much non-uniform: there are as many numbers in the $[0, 1]$ range as in the $[0, +\infty)$ range.

Since [0,1] is included in [0,infinity), even for floats, this statement cannot be true.
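Indeed, the intended comparison is presumably $[0, 1]$ versus $[1, +\infty)$, and even that holds only approximately. Since the IEEE-754 bit patterns of non-negative floats are ordered the same way as the values they encode, the single-precision counts can be read straight off the bit patterns of 1.0 and $+\infty$ (a quick check):

#include <cstdio>

int main() {
    unsigned one = 0x3F800000u; // bit pattern of 1.0f
    unsigned inf = 0x7F800000u; // bit pattern of +infinity
    std::printf("non-negative floats in [0, 1):    %u\n", one);       // 1065353216
    std::printf("non-negative floats in [1, +inf): %u\n", inf - one); // 1073741824
    return 0;
}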

Misprint in SVGs

Hi,

I'm pretty sure I've found 3 misprints in the English SVGs on the AoS and SoA page.

https://github.com/algorithmica-org/algorithmica/blob/master/content/english/hpc/cpu-cache/img/aos-soa-padded-n.svg
array os structures -> array of structures

https://github.com/algorithmica-org/algorithmica/blob/master/content/english/hpc/cpu-cache/img/aos-soa-padded.svg
structure or arrays -> structure of arrays

https://github.com/algorithmica-org/algorithmica/blob/master/content/english/hpc/cpu-cache/img/aos-soa.svg
structure or arrays -> structure of arrays

I would fix them myself but when I opened the SVG files, I found out that all the text had been converted to paths so hopefully you guys can regenerate correct versions with matplotlib.
Also, I just wanted to say thanks for creating this free resource because I've learnt a lot from it. Wish you guys the best :)

The description of the insertion sort algorithm does not correspond to the algorithm itself

The description of insertion sort contains:
When this happens, it will mean that it is greater than all the elements to its left and less than all the elements of the prefix to its right,
but the fragment of the algorithm contains the condition:
a[i - 1] < a[i], which contradicts that definition and sorts the array in descending order.
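For comparison, a minimal sketch of standard ascending insertion sort (not the article's exact listing); the swap condition here is a[k - 1] > a[k], i.e. the opposite of the quoted one:

#include <utility>

void insertion_sort(int *a, int n) {
    for (int i = 1; i < n; i++)
        // move a[i] left while it is smaller than its left neighbor,
        // so the prefix a[0..i] ends up sorted in ascending order
        for (int k = i; k > 0 && a[k - 1] > a[k]; k--)
            std::swap(a[k - 1], a[k]);
}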

Suggestion for more readable text.

I propose changing the font-family of the text to a more readable one on Windows systems, or keeping the old font but adding a more readable fallback, for example Verdana (font-family: Verdana, crimson, serif;).
The existing "crimson" font has some artifacts along the top line of the text.


It would also be great to slightly increase the font size of the text (font-size: 20px;)


Small bug in OOP segment tree

In https://en.algorithmica.org/hpc/data-structures/segment-trees/:

int sum(int lq, int rq) {
    if (rb <= lq && rb <= rq) // if we're fully inside the query, return the sum
        return s;
    if (rq <= lb || lq >= rb) // if we don't intersect with the query, return zero
        return 0;
    return l->sum(k) + r->sum(k);
}

should be (last line changed):

int sum(int lq, int rq) {
    if (rb <= lq && rb <= rq) // if we're fully inside the query, return the sum
        return s;
    if (rq <= lb || lq >= rb) // if we don't intersect with the query, return zero
        return 0;
    return l->sum(lq, rq) + r->sum(lq, rq);
}
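For context, a self-contained sketch of such a pointer-based node with the corrected query (not the article's exact code; member names follow the snippet above, the node covers the half-open range [lb, rb), and the "fully inside" check is written in its usual form, lq <= lb && rb <= rq):

struct segtree {
    int lb, rb;                         // this node covers the half-open range [lb, rb)
    int s = 0;                          // sum of the covered elements
    segtree *l = nullptr, *r = nullptr; // children (null for leaves)

    segtree(int lb, int rb) : lb(lb), rb(rb) {
        if (rb - lb > 1) {
            int m = (lb + rb) / 2;
            l = new segtree(lb, m);
            r = new segtree(m, rb);
        }
    }

    void add(int k, int x) {
        s += x;
        if (rb - lb > 1)
            (k < l->rb ? l : r)->add(k, x);
    }

    int sum(int lq, int rq) {
        if (lq <= lb && rb <= rq)
            return s;                           // fully inside the query
        if (rq <= lb || lq >= rb)
            return 0;                           // no intersection with the query
        return l->sum(lq, rq) + r->sum(lq, rq); // pass the query range down, not a single index
    }
};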

Statement not strictly true

Note is that while most operations with real numbers are commutative and associative, their rounding errors are not: even the result of $(x+y+z)$ depends on the order of summation. Compilers are not allowed to produce non-spec-compliant results, so this disables some potential optimizations that involve rearranging operands. You can disable this strict compliance with the `-ffast-math` flag in GCC and Clang.

For real numbers, addition and multiplication are exactly associative and commutative. This is not the case for floating-point numbers, and thus, by extension, not for the rounding errors of these operations.

At least to me, this is quite unintuitive, and a further explanation would be greatly appreciated.
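A concrete instance of the non-associativity being discussed (the specific values are mine, chosen so that the two summation orders round differently):

#include <cstdio>

int main() {
    // 1e16 and 1.0 differ so much in magnitude that adding 1.0 to ±1e16
    // has no effect on the rounded double result
    double x = 1e16, y = -1e16, z = 1.0;
    std::printf("%.1f\n", (x + y) + z); // prints 1.0: x and y cancel exactly, then z is added
    std::printf("%.1f\n", x + (y + z)); // prints 0.0: z is absorbed when added to y first
    return 0;
}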

Section 1.2 # Types of Languages, better examples

Hello,

while it might be worth making the distinction between a programming language and its implementation(s), or adding the garbage-collection distinction (protip: reference counting is also GC), this issue was prompted by reading the native/managed duality point immediately followed by the grouping of Python (CPython, I suppose) with JavaScript/Ruby (both being JIT-compiled in their most popular and/or reference implementations) in the examples.
Even more confusingly, PyPy is mentioned a bit further.

Personally, when I have to use such categories, I use Interpreters, Bytecode compilers (aka VMs), and Native compilers, with JIT/AoT and GC/manual as possible additional differentiators.

PS: your work is amazing and very beginner friendly, I hope it'll help CS students smothered by theory and doing everything in Java/Python understand real world programming a bit more.

Log curve shows negative time

This picture can be confusing because the log curve does not start at zero, and it becomes negative as the input size becomes small, which makes no sense. Maybe it's worth showing log(x+1) instead?

Binary GCD vs. c++ std::gcd benchmark.

Hello,

Not sure how you benchmarked the GCD code but on my laptop (with Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz), std::gcd is significantly faster than the optimized binary GCD implementation. (Edit: I accidentally left my benchmark on a pathological test case instead... whoops.)

https://godbolt.org/z/3475szdz8

std::gcd     6.6 ms / 1048576 iterations =  6.294 nanoseconds
binary_gcd  23   ms / 1048576 iterations = 21.934 nanoseconds
Benchmark 1: ./std_gcd
  Time (mean ± σ):       6.6 ms ±   0.7 ms    [User: 6.2 ms, System: 0.6 ms]
  Range (min … max):     6.1 ms …   9.3 ms    252 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: ./binary_gcd
  Time (mean ± σ):      23.0 ms ±   1.7 ms    [User: 22.6 ms, System: 0.6 ms]
  Range (min … max):    22.3 ms …  30.4 ms    119 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  './std_gcd' ran
    3.35 ± 0.32 times faster than './binary_gcd'
 Performance counter stats for './std_gcd':

             11.17 msec task-clock:u              #    0.969 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               100      page-faults:u             #    8.949 K/sec
        30,609,505      cycles:u                  #    2.739 GHz
        31,468,434      instructions:u            #    1.03  insn per cycle
         5,564,990      branches:u                #  498.021 M/sec
             8,829      branch-misses:u           #    0.16% of all branches

       0.011527144 seconds time elapsed

       0.011597000 seconds user
       0.000000000 seconds sys
 Performance counter stats for './binary_gcd':

             31.39 msec task-clock:u              #    0.986 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
                98      page-faults:u             #    3.122 K/sec
        99,257,572      cycles:u                  #    3.162 GHz
       235,231,331      instructions:u            #    2.37  insn per cycle
        18,247,030      branches:u                #  581.359 M/sec
           628,198      branch-misses:u           #    3.44% of all branches

       0.031830958 seconds time elapsed

       0.031932000 seconds user
       0.000000000 seconds sys
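In case it helps reproduction, a minimal harness sketch for this kind of comparison (the data distribution, iteration count, and timing method here are assumptions, not what the godbolt link contains):

#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::mt19937 rng(42);
    std::vector<unsigned> a(n), b(n);
    for (int i = 0; i < n; i++)
        a[i] = rng(), b[i] = rng();

    volatile unsigned sink = 0; // keeps the loop from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; i++)
        sink = sink ^ std::gcd(a[i], b[i]); // swap in the binary GCD here to compare
    auto end = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(end - start).count() / n;
    std::printf("%.3f ns per gcd call (checksum %u)\n", ns, (unsigned) sink);
    return 0;
}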

improve accuracy of integer factorization with primality testing.

For numbers <2^64, not having factors <2^16 means they have roughly a 1/2 chance of being prime. As such, it is worthwhile to perform a primality check. This can be done efficiently with Miller-Rabin using the bases (2, 325, 9375, 28178, 450775, 9780504, 1795265022), which have been proven to give the correct answer for all inputs <2^64. Using Montgomery multiplication for the modular exponentiation, the check runs in ~1us for prime inputs and ~350ns for composite inputs. This will have a somewhat noticeable effect on the time the algorithm takes, but makes it very easy to never give the wrong answer (which is somewhat messing with benchmarking, since the cases where the current algorithm fails tend to be the points that take longer to factor anyway).
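A sketch of such a deterministic Miller-Rabin test for 64-bit inputs. It uses the first twelve primes as bases, a different base set than the one quoted above but also proven correct for all n < 2^64, chosen here because it keeps the a ≡ 0 (mod n) edge cases out of the picture; plain 128-bit modular multiplication is used instead of the faster Montgomery form suggested in the issue.

#include <cstdint>
#include <initializer_list>

using u64 = uint64_t;
using u128 = unsigned __int128; // GCC/Clang extension

static u64 mulmod(u64 a, u64 b, u64 m) { return (u128) a * b % m; }

static u64 powmod(u64 a, u64 e, u64 m) {
    u64 r = 1;
    for (a %= m; e; e >>= 1) {
        if (e & 1) r = mulmod(r, a, m);
        a = mulmod(a, a, m);
    }
    return r;
}

bool is_prime(u64 n) {
    if (n < 2) return false;
    for (u64 p : {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37})
        if (n % p == 0) return n == p; // small factors (and small n) handled directly
    int s = __builtin_ctzll(n - 1);    // write n - 1 = d * 2^s with d odd
    u64 d = (n - 1) >> s;
    for (u64 a : {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37}) {
        u64 x = powmod(a, d, n);
        if (x == 1 || x == n - 1) continue;
        bool witness = true;           // a proves n composite unless some square hits n - 1
        for (int r = 1; r < s && witness; r++) {
            x = mulmod(x, x, n);
            if (x == n - 1) witness = false;
        }
        if (witness) return false;
    }
    return true;
}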

Incorrect / incomplete definition of B-tree criterion

https://github.com/algorithmica-org/algorithmica/blob/master/content/english/hpc/data-structures/s-tree.md#b-tree-layout

https://en.algorithmica.org/hpc/data-structures/s-tree/#b-tree-layout

The following criterion for B-tree keys is incomplete: "...each satisfying the property that all keys in the subtrees of the first i children are not greater than the i-th key in the parent node."

The criterion as written is met even if all keys in all subtrees are zero while all keys in the parent node are greater than zero.

What's missing is that the child keys must also be greater than (or not less than, depending on how one wants to define the B-tree) the 'previous' parent key. Or you might say that the keys in child node i must be between the parent keys i-1 and i. Or something like that. I can't think of a good wording right now...
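One possible formalization (a sketch; whether the inequalities are strict depends on how duplicate keys are handled): if a node stores keys $k_1 \le k_2 \le \dots \le k_{m-1}$ and has children $c_1, \dots, c_m$, then, with the conventions $k_0 = -\infty$ and $k_m = +\infty$,

$$k_{i-1} \le x \le k_i \quad \text{for every key } x \text{ stored in the subtree of } c_i.$$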

Found the random search limit

The slowdown of randomized binary search versus the normal version is 2·ln(2), about 1.386. It doesn't require much calculus in the end: the randomized time simplifies to twice the harmonic series, which has the well-known sum ln(n) + O(1). Dividing by log2(n) causes the O(1) part to go to 0 (slowly), and ln(n) / log2(n) is the constant ln(2).

Flattening the recurrence, starting from your linear-time version:

# From random-binsearch.py
g[k] = (s[k - 1] * 2 / k / (k - 1) + 1) * k
     = 2 * s[k - 1] / (k - 1) + k
s[k] = s[k - 1] + g[k]

# Substitute g
s[k] = s[k - 1] * (1 + 2 / (k - 1)) + k
     = s[k - 1] * (k + 1) / (k - 1) + k
(s[k] / k) = s[k - 1] * (1 + 1 / k) / (k - 1) + 1
# Define r
r[k] = s[k] / k
r[k] = r[k - 1] * (1 + 1 / k) + 1

# Define f as the per-query average, f[k] = g[k] / k,
# and rewrite the original g recurrence in terms of f and r
f[k] * k = 2 * r[k - 1] + k
r[k - 1] = (f[k] - 1) * k / 2

# Substitute into r recurrence
(f[k + 1] - 1) * (k + 1) / 2 = (f[k] - 1) * k / 2 * (1 + 1 / k) + 1
                             = (f[k] - 1) * (k + 1) / 2 + 1
(f[k + 1] - 1) = (f[k] - 1) + 2 / (k + 1)
f[k + 1] = f[k] + 2 / (k + 1)
f[k] = f[k - 1] + 2 / k

Section 3.3: ambiguous code and statements to fix

Another, more complicated way to implement this whole sequence is to convert this sign bit into a mask and then use bitwise and instead of multiplication: ((a[i] - 50) >> 31 - 1) & a[i].

First, in the C language the minus operator has higher precedence than the >> operator, hence that code is equivalent to ((a[i] - 50) >> 30) & a[i] and thus is not consistent with the assembly language code shown below.
Second, as mentioned in Section 4.4, the result of the >> operation on a negative integer is implementation-defined, which is not specified in Section 3.3. Let us consider the following two cases:
Case 1. If the leftmost bit is extended (arithmetic shift): then ((a[i] - 50) >> 31) is -1 when a[i] < 50 (ignoring underflow) and 0 when a[i] >= 50. Therefore there is no need for the "- 1" after the 31 at all.
Case 2. If the leftmost bit is filled with 0 (logical shift): then (a[i] - 50) >> 31 does get the sign bit of a[i] - 50. In this case, (((a[i] - 50) >> 31) - 1) is 0 when a[i] < 50 and -1 when a[i] >= 50. This is the reverse of what we want here: we want a -1 when a[i] < 50 and a 0 when a[i] >= 50.

To sum up, I think this part is pretty confusing and needs to be fixed.
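For reference, a sketch of the mask trick as it is presumably intended: sum the elements that are less than 50, assuming that >> on a negative signed int is an arithmetic shift (as GCC and Clang implement it) and that a[i] - 50 does not overflow.

int sum_below_50(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += ((a[i] - 50) >> 31) & a[i]; // mask is -1 (all ones) iff a[i] < 50, else 0
    return s;
}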

Website hotkeys conflict with browser hotkeys

In most browsers Alt + Left arrow and Alt + Right arrow are used to go back and forward in page history.

These hotkeys don't work on Algorithmica. They get mixed up with the site's own Left arrow and Right arrow hotkeys (navigation between pages).

Consider using passive voice

One of the reasons why they are stored in this exact order is so that it would be easier to compare and sort them: you can simply use largely the same comparator circuit as for [unsigned integers](../integer) — except for maybe flipping the bits in the case of negative numbers.

The sentence reads a little weird to me. I’m not a native English speaker though.

Condition and stability are intrinsically different.

An algorithm is called *numerically stable* if its error, whatever its cause, does not grow to be much larger during the calculation. This happens if the problem is *well-conditioned*, meaning that the solution changes by only a small amount if the problem data are changed by a small amount.

Conditioning is a property of the problem at hand; stability is a property of an algorithm used to solve it. An algorithm solving a problem can be stable only if the underlying problem is well-conditioned. On the other hand, even if a problem is well-conditioned, an algorithm solving it may be unstable (a classic example is matrix inversion). This is important because the remedy for bad conditioning is modifying the problem, whereas the remedy for an unstable algorithm is finding another algorithm.

Bug in the binary gcd code

The code on the line "a >>= az, b >>= bz;" is incorrect, it should be "a >>= shift, b >>= shift;"
Similarly, in the final version "b >>= bz;" is incorrect, it should be "b >>= shift;"
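For reference, a generic binary GCD (not the book's exact listing) in which the shared power of two is recorded once in `shift` and reapplied at the end:

#include <cstdint>

uint64_t binary_gcd(uint64_t a, uint64_t b) {
    if (a == 0) return b;
    if (b == 0) return a;
    int az = __builtin_ctzll(a);
    int bz = __builtin_ctzll(b);
    int shift = az < bz ? az : bz; // gcd(a, b) contains only min(az, bz) factors of two
    a >>= az;
    b >>= bz;
    while (a != b) {               // both are odd here
        if (a > b) {
            a -= b;
            a >>= __builtin_ctzll(a); // the difference of two odd numbers is even
        } else {
            b -= a;
            b >>= __builtin_ctzll(b);
        }
    }
    return a << shift;
}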

--Laci

PS: I wanted to submit a patch and pull request, but when I tried to submit the edited page I got an error:
"Error while loading data from Github. This might be a temporary issue. Please try again later."
But it never worked in the course of 3 days... So I'm submitting the bug as an issue.

Recursion vs. work list

Recursive algorithms that can't be tail-call optimized can still be implemented without recursion by using an auxiliary work list (essentially replacing the call stack with a dynamically allocated array). It would be useful to see some discussion of the tradeoffs inherent in that approach.
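A minimal sketch of the work-list idea (my own illustrative example): the call stack of a recursive tree sum is replaced by an explicit std::vector of pending nodes.

#include <vector>

struct Node {
    int value;
    Node *left, *right;
};

long long tree_sum(Node *root) {
    long long total = 0;
    std::vector<Node*> work;      // the "work list" standing in for the call stack
    if (root) work.push_back(root);
    while (!work.empty()) {
        Node *node = work.back();
        work.pop_back();
        total += node->value;
        if (node->left)  work.push_back(node->left);
        if (node->right) work.push_back(node->right);
    }
    return total;
}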

Better by which benchmark.

This test is better than switching all computations to lower precision and checking whether the result changed by too much. The default rounding-to-nearest converges to the correct “expected” value given enough averaging: statistically, half of the time, they are rounding up, and the other half, they are rounding down — so they cancel each other.

Not a very compelling reason why changing the rounding mode is preferable to changing the precision.
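For concreteness, a sketch of the rounding-mode test being discussed: run the same computation under several rounding modes and compare the results. fesetround comes from <cfenv>; how reliably the optimizer respects the mode change without extra flags (e.g. GCC's -frounding-math) varies, so treat this as an illustration rather than a robust harness.

#include <cfenv>
#include <cstdio>

double sum_inverse_squares(int n) {
    double s = 0;
    for (int i = 1; i <= n; i++)
        s += 1.0 / (static_cast<double>(i) * i); // each division and addition rounds per the current mode
    return s;
}

int main() {
    const int modes[] = {FE_TONEAREST, FE_UPWARD, FE_DOWNWARD};
    for (int mode : modes) {
        std::fesetround(mode);
        std::printf("%.17g\n", sum_inverse_squares(1000000));
    }
    std::fesetround(FE_TONEAREST); // restore the default
    return 0;
}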

Time estimation formula is incorrect in Complexity Models

To estimate the real running time of a program, you need to sum all latencies for its executed instructions and multiply it by the clock frequency, that is, the number of cycles a particular CPU does per second.
This would mean that a higher clock frequency would cause the program to take longer to run rather than be quicker. It should instead be the sum of latencies divided by the clock frequency. Dimensionally, cycles * (cycles/second) gives us cycles^2/second, whereas cycles * (seconds/cycle) gives us seconds.
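For example, a program whose instruction latencies add up to $10^9$ cycles should take roughly $10^9 / (2 \cdot 10^9) = 0.5$ seconds on a 2 GHz processor.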

Which compiler?

Although I can imagine which compiler you mean, it’s better to write it out.

Feedback on fast sqrt

https://github.com/algorithmica-org/algorithmica/blob/master/content/english/hpc/arithmetic/rsqrt.md

Above the graph, I think there's a typo: instead of L = 23, it should say L = 2^23. It would be nice to also mention that a multiplication by a power of two is equivalent to a left shift (hence where the I_x bit pattern comes from). The bias also comes out of nowhere: it should be briefly explained that IEEE exponents are shifted by 2^(e_x - 1) - 1. It would also be nice to have labels on the graph axes, as I'm still not 100% sure what they mean. I believe vertical = binary representation while horizontal = float value.
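For readers landing here, the trick under discussion is the classic bit-level reciprocal-square-root approximation. A common form of it is below (the magic constant 0x5f3759df is the widely cited one and the page's derivation may use a different value):

#include <cstdint>
#include <cstring>

float approx_rsqrt(float x) {
    uint32_t i;
    std::memcpy(&i, &x, sizeof i);        // reinterpret the float's bits as an integer
    i = 0x5f3759df - (i >> 1);            // halve the exponent (and roughly the log) via the bit pattern
    float y;
    std::memcpy(&y, &i, sizeof y);
    return y * (1.5f - 0.5f * x * y * y); // one Newton iteration to refine the initial guess
}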

Assembly Language

Hello. I suspect some information is missing from the page: https://en.algorithmica.org/hpc/architecture/assembly/

For comparison, here is what the summation loop looks like in AT&T asm

But there is no loop on the page; the example is *c = *a + *b.

Good luck, keep up the good work! :)

EDIT: What I mean is that, because it hasn't been explained before, the example is confusing: you don't know what the original code is supposed to do, and so you don't understand the AT&T assembly code.

EDIT2: OK, the for loop you are talking about is on the next page (Loops and Conditionals) and directly in assembly. Here is my suggestion: stick with the original example on the Assembly Language page (that is, *c = *a + *b) and provide a C example at the beginning of the "Loops and Conditionals" page. I think this would be clearer.

Keep up the good work! :)

Cost of Ariane 5

Sometimes a software crash, in turn, causes a real, physical one. In 1996, the maiden flight of the [Ariane 5](https://en.wikipedia.org/wiki/Ariane_5) (the space launch vehicle that ESA uses to lift stuff into low Earth orbit) ended in [a catastrophic explosion](https://www.youtube.com/watch?v=gp_D8r-2hwk) due to the policy of aborting computation on arithmetic error, which in this case was a floating-point to integer conversion overflow, that led to the navigation system thinking that it was off course and making a large correction, eventually causing the disintegration of a $1B rocket.

The development cost of Ariane 5 was approximately 6B€, and launch prices are around 180M€.

Right-shift 31 formula in section 3.3 is not always correct

"(a[i] - 50) >> 31" is equivalent to (a[i] < 50) for most possible values of a[i]... but not the ones in [-2^31, 50 - 2^31). So this is not a valid optimization unless the compiler can prove that all a[i] are not smaller than 50 - 2^31.

Self contradiction

Of course, a more general approach would be to switch to a more precise data type, like `double`, either way effectively squaring the machine epsilon. It can sort of be scaled by bundling two `double` variables together: one for storing the value, and another for its non-representable errors, so that they actually represent $a+b$. This approach is known as *double-double* arithmetic, and it can be similarly generalized to define quad-double and higher precision arithmetic.

You are contradicting yourself: on the one hand, you say switching to higher precision is not scalable; on the other hand, that's what you recommend here.
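For readers unfamiliar with the term, the building block behind the double-double arithmetic mentioned in the quote is Knuth's TwoSum (a sketch; it relies on IEEE arithmetic, so it breaks under -ffast-math): it computes the rounded sum together with its exact rounding error, so that a + b == s + e holds exactly.

struct twosum { double s, e; };

twosum two_sum(double a, double b) {
    double s = a + b;             // rounded sum
    double ap = s - b;            // the part of s that came from a
    double bp = s - ap;           // the part of s that came from b
    double e = (a - ap) + (b - bp); // exact rounding error of the addition
    return {s, e};
}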
