
Comments (6)

gannimo commented on July 24, 2024

Still not sure if you're trolling. I do know what UB is and I get the point that you're making.

Let me repeat: this project is a playful jab at the complexity of printf implementations, and we demonstrate that common implementations are Turing complete (TC). Let me also quote the disclaimer:

Keep in mind that this printbf interpreter is supposed to be a fun example of Turing completeness that
is available in current programs and not a new generic attack vector. This demo is NOT intended to
be a generic FORTIFY_SOURCE bypass.

You're barking up the wrong tree, please stop.

from printbf.

gannimo commented on July 24, 2024

Not sure what the issue is. This was a fun side project to demonstrate Turing completeness in printf. Please clarify your issue and the actions you'd like taken, and reopen if needed.


bradenbest commented on July 24, 2024

printf is Turing complete, except when it's not. You are invoking undefined behavior.

The issue is that the program is not standards-compliant, not portable, and not proof that printf is Turing complete. You have only proven that printf on a specific set of specific versions of specific OSes using specific versions of specific compilers and libraries might have an ACE exploit that renders their implementation of printf TC, but printf is not TC and it is false to make such a claim. The standard specifies no such attack vector, making it a bug in whatever implementation you're using that just happens to work in your favor (for now). You are invoking undefined behavior. And you are relying on assumed semantics that are not and never were guaranteed nor explicitly stated to happen by ISO/IEC 9899, to execute an ACE exploit. One day your program will break in a way that is noticeable to you, and all it'll take is for one of these things to have a small change:

  • The CPU
  • The OS
  • The Kernel
  • The ABI
  • The libc implementation or version
  • The compiler implementation or version

Literally anything could change and make the program cease to function, but those are the most likely components, especially regarding memory management/protection and compiler optimization, because it would be so easy for your hack to turn into a stack smashing crash, a segmentation fault, or just be deleted by the compiler. You are invoking undefined behavior. In fact, I assure you the program is already broken somewhere. Can you really say you have tested it on every Intel, AMD and ARM chipset, every combination of hardware, and every version of every OS, compiler, library and other infrastructure? Of course not.

Just because "I compile it and it works for me" does not mean it will continue to work into the future, or that it's portable. Take this program:

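#include <stdio.h>  // declares puts; calling it undeclared is invalid since C99

int main(void){
    // the bytes of these ints are reinterpreted as a NUL-terminated string
    int msg[] = {0x6c6c6548, 0x216f};
    return puts( (char*)msg );
}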

If you run it, it will (probably!) print "Hello!". But the program relies on several assumptions and invokes behavior that is not quite undefined, but unspecified. The fact is, this program will break if run on a big-endian machine, a machine where sizeof (int) == 4 is not true (such as a 16-bit DOS machine), machines that don't use an ASCII/UTF-8 compatible charset, and possibly other things. Did you know there are other byte orders than just LE and BE, used with certain PDP and IBM architectures? In other words, it's impossible to say what that code will do on any given machine, unless you make several assumptions.

You are invoking undefined behavior.


bradenbest commented on July 24, 2024

This issue is not resolved. The program doesn't work on my PDP-11 running version 5 Unix.


gannimo commented on July 24, 2024

Not exactly sure what you're trying to demonstrate. Note that printbf is intended as a fun demonstration of a concept referenced in a paper and not as a formal proof.


bradenbest commented on July 24, 2024

Do you maybe not understand what UB is? It's one of the most important concepts you can learn regarding C programming. Undefined Behavior is when the standard that defines the language leaves a certain behavioral domain undefined. What it means is that they are saying that doing X is to be considered insane, and if you do it, you can't complain to the vendor that your program isn't working consistently. To the programmer, the standard says "You see this? This is insane. Don't do this.". To the vendor, the standard says "You see this? This is insane. You are allowed to assume whatever you want, as we are making no requirements here." The compiler can do anything when it encounters UB and still be technically correct, whether it fires a nuke, sets your PC on fire, travels back in time to kill your parents, or makes demons fly out of your nose. You can only blame yourself, because even if the compiler was designed to do malicious things when it encounters UB, you were the one who invoked it. The responsibility is shifted onto the programmer. Thus, it's important to be aware of what can cause undefined behavior, so that you can avoid it and be assured that your code will do what you expect it to do.

In other words, Undefined Behavior is a series of contracts between the programmer and the language, that the language will do what you expect as long as you comply with its restrictions.

Why would they do this? Because it allows compilers to better tailor their code generation and optimization to any given hardware, without having to introduce unnecessary overhead that would bottleneck performance. For example, because there is no bounds checking on arrays, the compiler doesn't have to emit code that checks array bounds. The compiler will, in general, assume that you never index out of bounds, an assumption that the standard explicitly allows. The compiler is allowed to do or let anything happen when that contract is violated. Another example would be null pointers. If you dereference a null pointer, you have just invoked undefined behavior. An optimizing compiler might look at a sequence like

FILE *fp = fopen("file", "r");
char buffer[4096];
size_t nread;  // fread returns size_t, not ssize_t

nread = fread(buffer, 1, 4096, fp);  // fp is used here, before the NULL check below

if(fp != NULL){
    do_the_thing_with_the_thing_please(fp);
    // do a bunch of things that are critical to the functioning of this program
    // however, if these things are done with a null pointer, it triggers a series
    // of events that somehow causes an ebola outbreak in Texas.
    // Yeah, we're just as confused as you. Linus says it's a kernel bug.
}

and decide "hey, the programmer checks if fp is NULL here, but earlier, they're using the pointer. Clearly, they already know that fp is a valid pointer and this null check is unnecessary". So what does the compiler do? It deletes the null check and keeps the body, rewriting the code as if the condition were always true (so the "bunch of things" would be executed even if the pointer were null, which means that if the file is missing, in this scenario, Texas becomes ground zero for an ebola pandemic).

Silly outbreak scenario aside, sometimes this can be blatant and easily noticeable, other times it can be so subtle that it goes unnoticed for years, because the only time it's noticeable (e.g. the program crashes or does something obviously wrong) is in those rare edge cases that you expected to be handled.

And that's basically where the crux of this issue comes from. The entire reason this program even works is because it invokes at least three different types of undefined behavior, and in this case, the stars have aligned to allow something interesting to happen. But the presence of UB isn't mentioned anywhere in this project as far as I can tell. There are programmers like us who are battle-hardened and can spin up crazy solutions to problems, and we know the risks and nuances of the systems we use. But there are also novice programmers who aren't there yet, and they might see this project and assume that printf is just "capable of weird stuff", or they might assume that C is a flawed language, even though the layers behind this are actually well-reasoned and it was a contractual violation that ultimately led to the exploit, and that the actual vulnerability lies with the OS, not with the programming language (if C didn't have UB, people would just write exploits in assembly, a language that has no UB but allows for virtually full control of the instruction set).

If this project were to go viral, wouldn't you want to have said something about UB and what it means, so that those newbie programmers can gain a better appreciation of cybersecurity and technology standards? Sorry to turn this into an ethics discussion, but it's important.

