lowlevelmemes / subleq-emu
Bootable emulator for the Dawn Operating System for x86.
License: Other
How do you expect me to convert 2D anime girls into 3D when the framerate is under 5 FPS!!!!!11!one! See, this is why Geri wants to burn down IBM and Intel HQ, because they made it so nothing is compatible. How are we going to live in a world where Waifu marriage is legal if we don't have a superfast DawnOS emulator and to do this we need KysVM!!!!!oen!1
but srsly, it gets to the point right before the screen clears, "Dawn Initializing" shows up and then it crashes :^(
The README contains the following line:
https://github.com/mintsuki/subleq-emu/blob/38abb9063ab0e4c0fef87591e9e91e4dc9715315/README#L1
When I looked inside this archive:
https://github.com/mintsuki/subleq-emu/releases/download/bin-release10/subleq.img.xz
I found an .img file instead of an .iso file.
Renaming it to .iso did not allow me to boot it in VirtualBox.
Converting it to .vdi with the VBoxManage command succeeded, but I see an empty window when I launch the Disks application, so I'm not sure whether that conversion was the right way to solve this problem.
Please either mention in the README that it is an HDD image and not an ISO 9660 one, or convert your image to .iso format.
Given the abysmal performance of Dawn on any hardware, I propose a change in design:
Instead of emulating every single instruction, we could instead convert/compile the SUBLEQ bytecode into direct CPU machine code. The idea is to let primary code run directly on the hardware, on paged RAM, and handle I/O by means of page faults.
The conversion/compilation speed will depend on whether or not Dawn separates its code from its data, because if it doesn't we'd have to perform an entire analysis of the binary before compiling. Needless to say, this suggestion wouldn't work at all if Dawn engages in self-modifying-code shenanigans.
You may be asking though why go through all this trouble? Compilation by itself won't make it run much faster, since the frequent page faults would offset the gains of direct execution... Ahh, but now, the bytecode can be optimized! We can convert sequences of subleq into faster instructions, for example:
0 : subleq A, B, 24
24:
0 : subleq A, A, 24
24:
0 : subleq A, A, off
to
mov rsi, A
mov rdi, B
mov rax, [rsi]
sub [rdi], rax
mov rdi, A
xor rax, rax
mov [rdi], rax
mov rdi, A
xor rax, rax
mov [rdi], rax
mov rax, off
jmp rax
you know, instead of the generic:
mov rsi, A
mov rdi, B
mov rax, [rsi]
sub [rdi], rax
jg .skip
mov rax, C
jmp rax
.skip:
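The special cases in that example can be sketched as a small classifier a translator might run on each instruction before emitting machine code. This is only an illustration under assumptions the post does not state: an instruction is three 64-bit operands (24 bytes), and the names subleq_kind and classify_subleq are made up here. It is also only valid if the code is not self-modifying, as noted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical instruction shapes a translator could special-case. */
enum subleq_kind {
    KIND_CLEAR,      /* a == b and c is the next instruction: just zero the operand */
    KIND_CLEAR_JUMP, /* a == b: result is always 0, so the branch to c is always taken */
    KIND_SUB_ONLY,   /* c is the next instruction: the branch has no visible effect */
    KIND_GENERIC     /* needs the full subtract, compare, branch sequence */
};

#define SUBLEQ_INSN_BYTES 24u /* assumption: three 8-byte operands per instruction */

/* pc is the byte address of the instruction "subleq a, b, c". */
static enum subleq_kind classify_subleq(uint64_t pc, uint64_t a, uint64_t b, uint64_t c)
{
    assert(pc % 8 == 0); /* assumption: instructions are qword-aligned */

    int falls_through = (c == pc + SUBLEQ_INSN_BYTES);

    if (a == b)
        return falls_through ? KIND_CLEAR : KIND_CLEAR_JUMP;
    if (falls_through)
        return KIND_SUB_ONLY;
    return KIND_GENERIC;
}
```

With the three instructions from the example placed at addresses 0, 24 and 48, this would yield KIND_SUB_ONLY, KIND_CLEAR and KIND_CLEAR_JUMP respectively.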
All addresses of course would have to be converted to the proper endianness during compilation. Or rather, not just addresses but every single qword.
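A rough sketch of that conversion, assuming the image stores its qwords most-significant byte first (the post only says the byte order differs from the host's). load_be64 and store_be64 are hypothetical helper names, not functions from the emulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one 64-bit word stored most-significant byte first. */
static uint64_t load_be64(const unsigned char *p)
{
    assert(p != NULL);
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Write one 64-bit word most-significant byte first. */
static void store_be64(unsigned char *p, uint64_t v)
{
    assert(p != NULL);
    for (size_t i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (56 - 8 * i));
}
```

Written this way the helpers do not depend on the host's byte order. On GCC, __builtin_bswap64 on a plain 64-bit load would be the shorter route when the host is known to be little-endian.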
Fam,
SubleqEmu crashes with 2GiB of RAM, it needs moar than 2GiB of RAM
Graphics.c is only an example, but there is a problematic design pattern in your extended inline assembly usage. Part of it is your belief that an input constraint is automatically also a clobber. That is false. If a constraint is an input/output (+) or an output only (=), the compiler assumes the operand can be clobbered. That is not the case for input-only constraints, unless they are paired with a corresponding output constraint.
In May, when you informed me of how clobbers allegedly worked (I knew better), I pulled up this project and within a minute found a piece of inline assembly that breaks many of GCC's inline assembly usage rules. The issues were very obvious, so I decided back then to post my concerns about your understanding of clobbers, along with an analysis of one function in particular, swap_vbufs, for all its flaws. A third party who is also well versed in inline assembly confirmed all these observations in an answer on Stack Overflow.
As well, when passing the addresses of data structures and/or arrays through registers, you have to inform the compiler that the asm accesses data indirectly. For example, when an address is passed via a register, the compiler may not be aware that what the register points at is being read and/or written. The easiest brute-force mechanism in the GCC documentation is the "memory" clobber; alternatively, you can create dummy memory constraints.
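A minimal sketch of the dummy memory constraint approach, assuming GCC or Clang targeting x86 and DF=0 on entry as the calling convention requires. read_byte_asm is a hypothetical name, and the non-x86 branch is only there so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Return the byte p points at, reading it from inside the asm. */
static unsigned char read_byte_asm(const unsigned char *p)
{
    unsigned char out;
    assert(p != NULL);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ ("lodsb"               /* AL = byte at [ESI/RSI], then the pointer advances */
             : "=a"(out), "+S"(p)  /* the pointer register is modified: in/out operand */
             : "m"(*p));           /* dummy input: the asm reads the byte p points at */
#else
    out = *p;                      /* fallback for non-x86 builds */
#endif
    return out;
}
```

Without the "m"(*p) operand (or a "memory" clobber) the compiler is not told that the asm depends on the pointed-to byte, so at higher optimization levels it may reorder or drop a store that precedes the call. For arrays of unknown length, the cast form used later in this thread, "m"(*(const char (*)[])str), plays the same role.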
If you would like to see all the issues with swap_vbufs's inline assembly, then this Stack Overflow question and answer may be of some benefit.
Regarding memset (using inline assembly), where you don't inform the compiler (assume 64-bit code) that the RDI and RCX registers are clobbered: I have godbolt output of this code:
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Compiler isn't aware that RDI or RCX are clobbered */
void *badmemset1(void *dest, int value, size_t count)
{
    asm volatile ("cld; rep stosb" :: "D"(dest), "c"(count), "a"(value) : "cc", "memory");
    return dest;
}

/* Compiler isn't aware that RCX is clobbered, but RDI is clobbered */
void *badmemset2(void *dest, int value, size_t count)
{
    void *temp = dest;
    asm volatile ("cld; rep stosb" : "+D"(dest) : "c"(count), "a"(value) : "cc", "memory");
    return temp;
}

/* Compiler is aware that RDI and RCX are clobbered */
void *goodmemset(void *dest, int value, size_t count)
{
    void *temp = dest;
    asm volatile ("cld; rep stosb" : "+c"(count), "+D"(dest) : "a"(value) : "cc", "memory");
    return temp;
}

int main()
{
    char bufa[]="0123456789";
    char bufb[]="0123456789";
    char bufc[]="0123456789";

    badmemset1(bufa, 'a', 10);
    badmemset1(bufa, 'b', 10);
    /* This should print out 10 'b' */
    printf("bufa: %10s\n", bufa);

    badmemset2(bufb, 'a', 10);
    badmemset2(bufb, 'b', 10);
    /* This should print out 10 'b' */
    printf("bufb: %10s\n", bufb);

    goodmemset(bufc, 'a', 10);
    goodmemset(bufc, 'b', 10);
    /* This should print out 10 'b' */
    printf("bufc: %10s\n", bufc);
}

#if 0
/* This version is less bruteforce and doesn't require volatile, and drops the unneeded CLD
 * the OP had, as the calling convention requires that DF=0 upon entry to a function */
void *goodmemset(void *dest, int value, size_t count)
{
    void *temp = dest;
    asm ("rep stosb"
         : "+c"(count), "+D"(dest), "=m"(*(char (*)[count])dest)
         : "a"(value));
    return temp;
}
#endif
Very simple code: memset a buffer with 'a' and then immediately memset the same buffer with 'b'. There are three variants of the code, and each should print bbbbbbbbbb. With -O0 it works as expected, as the output is:
bufa: bbbbbbbbbb
bufb: bbbbbbbbbb
bufc: bbbbbbbbbb
At -O3 it doesn't work. Versions of GCC on godbolt produce output that may often look like:
bufa:
bufb: aaaaaaaaaa
bufc: bbbbbbbbbb
The behaviour that results from violating GCC's documented inline assembly rules may vary from compiler to compiler, so the output of any one compiler may not match the results above.
badmemset1 is a version where the compiler is unaware that RDI and RCX are potentially clobbered; badmemset2 is a version where the compiler is told that RDI is clobbered but not RCX; and goodmemset is a version where the compiler is properly told that RCX and RDI are both potentially clobbered. Only the last version produces the expected results.
This is all contrary to your assertion on Discord in May that:
"c"(count) already tells the compiler c is clobbered
Contrary to your claim that I said
claims rep ** shit in x86 is good
I have never made such claims. Someone posted code using stosb in memset and I tried to explain the clobber issues, which you refuse to understand. My discussion wasn't about the merits or pitfalls of using string instructions, but about why the inline assembly was incorrect. I spent a great deal of time in your absence explaining to iiSaLMaN why his code was wrong (despite your incorrect claims), and got him an implementation that works with higher optimization levels and with the code being inlined. In fact I'd be more inclined to code memset in C, as it doesn't necessarily have to be done in assembly.
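For comparison, a plain C version could look roughly like the following. It is a sketch, not the project's code, and it is named c_memset (a made-up name) so it does not clash with the library function.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-at-a-time fill with no inline assembly, so there is nothing to
 * get wrong about clobbers: the compiler sees every store. */
static void *c_memset(void *dest, int value, size_t count)
{
    assert(dest != NULL || count == 0);
    unsigned char *p = dest;
    while (count--)
        *p++ = (unsigned char)value;
    return dest;
}
```

One thing to check in a freestanding build: GCC can recognise a loop like this and replace it with a call to memset, which would recurse if this function is itself the memset. As far as I know, -fno-tree-loop-distribute-patterns is the usual way to turn that transformation off, but verify it against your compiler version.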
You also published the following bios_print function so that iiSaLMaN could learn from your programming prowess. I'll give you an A for effort but an F on implementation. It suffers from the same issues as the other code you believe is correct. If you compile with -O0 there is a reasonable chance the code will work, because a compiler like GCC or Clang will usually generate loads/stores around every C statement. The problem with clobbers may be hidden, but it becomes a potential source of bugs at higher optimization levels, especially when the compiler decides to inline functions. This was the bios_print you touted as an example:
void bios_print(const char *str) {
    asm (
        "1:\n\t"
        "lodsb\n\t"
        "test al, al\n\t"
        "jz 2f\n\t"
        "int 0x10\n\t"
        "jmp 1b\n\t"
        "2:\n\t"
        :
        : "a"(0x0e00), "S"(str)
        : "cc", "memory"
    );
}
A simple way to break this code with -O3 is to call the function bios_print twice in a row with the same string:
/* Deliberately broken: ESI is modified by lodsb but declared input-only. */
void bios_print(const char *str) {
    asm (
        "1:\n\t"
        "lodsb\n\t"
        "test al, al\n\t"
        "jz 2f\n\t"
        "int 0x10\n\t"
        "jmp 1b\n\t"
        "2:\n\t"
        :
        : "a"(0x0e00), "S"(str)
        : "cc", "memory"
    );
}

void test_print(void)
{
    const char *mystring = "Hello, world!";
    bios_print (mystring);
    bios_print (mystring);
}
You can observe the incorrect behaviour in the generated assembly in the second godbolt pane:
.LC0:
.string "Hello, world!"
test_print:
push esi
mov eax, 3584
mov esi, OFFSET FLAT:.LC0
1:
lodsb
test al, al
jz 2f
int 0x10
jmp 1b
2:
; At this point ESI is pointing one past the end NUL byte of the string.
; The following loop will start where ESI left off printing something unexpected
; or nothing rather than printing the same string again.
1:
lodsb
test al, al
jz 2f
int 0x10
jmp 1b
2:
pop esi
ret
This occurred because "S"(str) doesn't mark ESI as a clobber. It is an input-only constraint, where the compiler expects the value not to change. The compiler assumed that the beginning of the string that was originally in ESI was still there, and it reused it. Unfortunately you changed it without telling the compiler. This is documented in the GCC manual with a warning:
Warning: Do not modify the contents of input-only operands (except for inputs tied to outputs). The compiler assumes that on exit from the asm statement these operands contain the same values as they had before executing the statement.
What happens if the code is changed to mark ESI ("S") as being clobbered by specifying it as an input/output constraint using "+S"? The code could look like:
void bios_print(const char *str) {
    asm volatile (
        "1:\n\t"
        "lodsb\n\t"
        "test al, al\n\t"
        "jz 2f\n\t"
        "int 0x10\n\t"
        "jmp 1b\n\t"
        "2:\n\t"
        : "+S"(str)
        : "a"(0x0e00)
        : "cc", "memory"
    );
}
You can see in the third pane of the godbolt example how the output differs: ESI is now reloaded with the address of the string before the second loop:
.LC0:
.string "Hello, world!"
test_print:
mov edx, OFFSET FLAT:.LC0 ; EDX is used to store a copy of the string address
push esi
mov eax, 3584
mov esi, edx
1:
lodsb
test al, al
jz 2f
int 0x10
jmp 1b
2:
mov esi, edx ; ESI is reloaded with copy of address in EDX
1:
lodsb
test al, al
jz 2f
int 0x10
jmp 1b
2:
pop esi
ret
I would have avoided having to specify the "memory" clobber by using a dummy memory constraint ("m") to inform the compiler that we are reading an unspecified number of elements from the character string. I would have also ensured BH was 0 (assuming output to page 0, of course) so that the page is set. I would have also coded the loop so that there is only one unconditional jump, executed once on entry, and the result of test al, al drives the conditional branch that closes the loop. This avoids taking an extra unconditional branch on every iteration. The code could have looked like:
void bios_print(const char *str) {
    asm volatile (
        "jmp 2f\n\t"
        "1:\n\t"
        "int 0x10\n\t"
        "2:\n\t"
        "lodsb\n\t"
        "test al, al\n\t"
        "jnz 1b\n\t"
        : "+S"(str)
        : "a"(0x0e00), "b"(0x0000), "m"(*(const char (*)[])str)
        : "cc"
    );
}
Of course the lazy way would be to have the compiler do the looping. This will likely produce a longer encoding, which is probably why you did the loop inside inline assembly. That code would have simply looked like:
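#include <stdint.h>

void bios_print(const char *str) {
    while (*str) {
        /* AH = 0x0e (teletype output), AL = character, BX = 0 (page 0).
         * The cast to unsigned char keeps characters >= 0x80 from
         * sign-extending into AH. */
        asm (
            "int 0x10\n\t"
            :
            : "a"((uint16_t)(0x0e00 | (unsigned char)*str++)), "b"(0x0000)
        );
    }
}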
Note: It probably goes without saying that it is better to reimplement the BIOS tty output, scrolling, etc. in C and dispense with calling the BIOS functions if you are writing a large quantity of real mode code in GCC.
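As a rough idea of what such a C reimplementation might involve, here is a sketch that writes character cells into an 80x25 text buffer and scrolls when the last row overflows. Everything in it is an assumption for illustration: the struct and function names are made up, the buffer is passed in as a pointer (on real hardware it would be the VGA text buffer at physical address 0xB8000), and the hardware cursor is not updated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TTY_COLS 80
#define TTY_ROWS 25
#define TTY_ATTR 0x07u /* light grey on black */

struct tty {
    uint16_t *cells; /* one 16-bit cell per character: attribute << 8 | char */
    int row, col;
};

/* Move rows 1..24 up by one and blank the last row. */
static void tty_scroll(struct tty *t)
{
    for (int i = 0; i < (TTY_ROWS - 1) * TTY_COLS; i++)
        t->cells[i] = t->cells[i + TTY_COLS];
    for (int i = (TTY_ROWS - 1) * TTY_COLS; i < TTY_ROWS * TTY_COLS; i++)
        t->cells[i] = (uint16_t)(TTY_ATTR << 8 | ' ');
    t->row = TTY_ROWS - 1;
}

static void tty_putc(struct tty *t, char ch)
{
    assert(t != NULL && t->cells != NULL);
    if (ch == '\n') {
        t->col = 0;
        t->row++;
    } else {
        t->cells[t->row * TTY_COLS + t->col] =
            (uint16_t)(TTY_ATTR << 8 | (unsigned char)ch);
        if (++t->col == TTY_COLS) { /* wrap at the right edge */
            t->col = 0;
            t->row++;
        }
    }
    if (t->row == TTY_ROWS)
        tty_scroll(t);
}

static void tty_print(struct tty *t, const char *str)
{
    while (*str)
        tty_putc(t, *str++);
}
```

Because this is ordinary C, the compiler sees every load and store and none of the clobber problems discussed above can arise.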