Page 1 of 1

IBM PC emulator written in only 4043 bytes of C code

Posted: January 15th, 2014, 4:32 am
by Malvineous
Not sure if anyone saw this, but in the 2013 winners of the International Obfuscated C Code Contest (a competition among C programmers to create the most difficult to read source code) one of the entries was a pretty much fully functional IBM PC emulator. This is quite amazing considering the source code is only 4043 bytes long (exactly half of 8086.) DOSBox emulates more than the original PC (including sound cards and other peripherals) but it is hundreds of times larger.

You can find it at http://www.ioccc.org/years.html#2013 (look under the "cable3" heading, and try hint.html for some info.)

For completeness, here is the source code of the emulator, which when coupled with a copy of a PC BIOS ROM is capable of running an unmodified MS-DOS disk image (supplied as part of the entry.) What you see below is real code and can be compiled with a standard C compiler to produce the emulator program.

Amazing!

Code: Select all

#include "SDL.h"

#define $ for(O=9
#define CX M+=(T%3+2*!(!T*t-6))
#define x ,A=4*!T,O=t,W=h=T<3?u(Q?p:D(A+3),D(A),D(A+1)[i]+D(A+2)*g+):K(t),U=V=K(a),o?U=h,W=V:V,
#define C 8*-~L
#define Z short
#define y a(Z)Y[++O]
#define B ),a--||(
#define _ ),e--||(

#define V(I,D,E)(O=a(I)h[r])&&!(A=(D)(V=(1[E+L]<<16)+*i)/O,A-(I)A)?1[E+L]=V-O*(*E=A):H(0)
#define i(B,M)B(o){return M;}
#define R(O,M,_)(S=L?a(I Z)O:O,N=L?a(I Z)O M(f=a(I Z)_):(O M(f=a(I n)_)))
#define T(_)R(r[u(10,L=4,--)],=,_)
#define u(a,r,T)16*i[a]+(I Z)(T i[r])
#define a(_)*(_*)&
#define L(_)M(W,_,U)

#define M(S,F,T)R(r[S],F,r[T])
#define A(_)(i[L=4]+=2,R(_,=,r[u(10,4,-2+)]))
#define c(R,T)(1[u=19,L+T]=(N=a(R)h[r]*(R)*T)>>16,*i=N,G(F(N-(R)N)))
#define h(_)(1&(L?a(Z)_:_)>>C-1)
#define I unsigned
#define n char
#define e(_)v(F(40[L(_##=40[E]+),E]&N==S|_ N<_(int)S))

I n t,e,l[80186],*E,m,u,L,a,T,o,r[1<<21],X,*Y,b,Q=0,R=0;I Z*i,M,p,q=3;I*localtime(),f,S,kb=0,h,W,U,c,g,d,V,A;N,O,P=983040,j[5];SDL_Surface*k=0;i(K,P+(L?2*o:2*o+o/4&7))i(D,r[a(I)E[259+4*o]+O])i(w,i[o]+=~(-2*47[E])*~L)i(v,(z((f^=S^N)&16),G(N-S&&1&(40[E]^f>>C-1))))J(){V=61442;$;O--;)V+=40[E+O]<<D(25);}i(H,(46[u=76,J(),T(V),T(9[i]),T(M),M(P+18,=,4*o+2),R(M,=,r[4*o]),E]=0))s(o){$;O--;)40[E+O]=1&&1<<D(25)&o;}i(BP,(*i+=262*o*z(F((*E&15)>9|42[E])),*E&=15))i(SP,(w(7),R&&--1[i]&&o?R++,Q&&Q++,M--:0))DX(){$,O*=27840;O--;)O[(I*)k->pixels]=-!!(1<<7-O%8&r[O/2880*90+O%720/8+(88+952[l]/128*4+O/720%4<<13)]);SDL_Flip(k);}main(BX,nE)n**nE;{9[i=E=r+P]=P>>4;$;q;)j[--q]=*++nE?open(*nE,32898):0;read(2[a(I)*i=*j?lseek(*j,0,2)>>9:0,j],E+(M=256),P);$;Y=r+16*9[i]+M,Y-r;Q|R||kb&46[E]&&KB)--64[T=1[O=32[L=(X=*Y&7)&1,o=X/2&1,l]=0,t=(c=y)&7,a=c/8&7,Y]>>6,g=~-T?y:(n)y,d=BX=y,l],!T*t-6&&T-2?T-1?d=g:0:(d=y),Q&&Q--,R&&R--x(O=*Y,O=u=D(51),e=D(8),m=D(14)_ O=*Y/2&7,M+=(n)c*(L^(D(m)[E]|D(22)[E]|D(23)[E]^D(24)[E]))_ L=*Y&8,R(K(X)[r],=,c)_ L=e+=3,o=0,a=X x a=m _ T(X[i])_ A(X[i])_ a<2?M(U,+=1-2*a+,P+24),v(f=1),G(S+1-a==1<<C-1),u=u&4?19:57:a-6?CX+2,a-3||T(9[i]),a&2&&T(M),a&1&&M(P+18,=,U+2),R(M,=,U[r]),u=67:T(h[r])_(W=U B u=m,M-=~L,R(W[r],&,d)B 0 B L(=~)B L(=-),S=0,u=22,F(N>S)B L?c(I Z,i):c(I/**/n,E)B L?c(Z,i):c(n,E)B L?V(I Z,I,i):V(I n,I Z,E)B L?V(Z,int,i):V(n,Z,E))_++e,h=P,d=c,T=3,a=m,M--_++e,13[W=h,i]=(o|=!L)?(n)d:d,U=P+26,M-=~!o,u=17+(m=a)_(a=m B L(+=),F(N<S)B L(|=)B e(+)B e(-)B L(&=)B L(-=),F(N>S)B L(^=)B L(-),F(N>S)B L(=))_!L?L=a+=8 x L(=):!o?Q=1,R(r[p=m x V],=,h):A(h[r])_ T=a=0,t=6,g=c x M(U,=,W)_(A=h(h[r]),V=m?++M,(n)g:o?31&2[E]:1)&&(a<4?V%=a/2+C,R(A,=,h[r]):0,a&1?R(h[r],>>=,V):R(h[r],<<=,V),a>3?u=19:0,a<5?0:F(S>>V-1&1)B R(h[r],+=,A>>C-V),G(h(N)^F(N&1))B A&=(1<<V)-1,R(h[r],+=,A<<C-V),G(h(N*2)^F(h(N)))B R(h[r],+=(40[E]<<V-1)+,A>>1+C-V),G(h(N)^F(A&1<<C-V))B R(h[r],+=(40[E]<<C-V)+,A<<1+C-V),F(A&1<<V-1),G(h(N)^h(N*2))B G(h(N)^F(h(S<<V-1)))B G(h(S))B 0 B V<C||F(A),G(0),R(h[r],+=,A*=~((1<<C)-1>>V)))_(V=!!--1[a=X,i]B V&=!m[E]B V&=m[E]B 0 B V=!++1[i]),M+=V*(n)c _ M+=3-o,L?0:o?9[M=0,i]=BX:T(M),M+=o*L?(n)c:c _ M(U,&,W)_ L=e+=8,W=P,U=K(X)_!R||1[i]?M(m<2?u(8,7,):P,=,m&1?P:u(Q?p:11,6,)),m&1||w(6),m&2||SP(1):0 _!R||1[i]?M(m?P:u(Q?p:11,6,),-,u(8,7,)),43[u=92,E]=!N,F(N>S),m||w(6),SP(!N==b):0 _ o=L,A(M),m&&A(9[i]),m&2?s(A(V)):o||(4[i]+=c)_ R(U[r],=,d)_ 986[l]^=9,R(*E,=,l[m?2[i]:(n)c])_ R(l[m?2[i]:(n)c],=,*E)_ R=2,b=L,Q&&Q++_ W-U?L(^=),M(U,^=,W),L(^=):0 _ T(m[i])_ A(m[i])_ Q=2,p=m,R&&R++_ L=0,O=*E,F(D(m+=3*42[E]+6*40[E])),z(D(1+m)),N=*E=D(m-1)_ N=BP(m-1)_ 1[E]=-h(*E)_ 2[i]=-h(*i)_ 9[T(9[i]),T(M+5),i]=BX,M=c _ J(),T(V)_ s(A(V))_ J(),s((V&~m)+1[E])_ J(),1[E]=V _ L=o=1 x L(=),M(P+m,=,h+2)_++M,H(3)_ M+=2,H(c&m)_++M,m[E]&&H(4)_(c&=m)?1[E]=*E/c,N=*E%=c:H(0)_*i=N=m&E[L=0]+c*1[E]_*E=-m[E]_*E=r[u(Q?p:m,3,*E+)]_ m[E]^=1 _ E[m/2]=m&1 _ R(*E,&,c)_(a=c B write(1,E,1)B time(j+3),memcpy(r+u(8,3,),localtime(j+3),m)),a<2?*E=~lseek(O=4[E][j],a(I)5[i]<<9,0)?(a?write:read)(O,r+u(8,3,),*i):0:0),O=u,D(16)?v(0):D(17)&&G(F(0)),CX*D(20)+D(18)-D(19)*~!!L,D(15)?O=m=N,41[43[44[E]=h(N),E]=!N,E]=D(50):0,!++q?kb=1,*l?SDL_PumpEvents(),k=k?k:SDL_SetVideoMode(720,348,32,0),DX():k?SDL_Quit(),k=0:0:0;}i(F,40[E]=!!o)i(z,42[E]=!!o)i(G,48[E]=o)

Re: IBM PC emulator written in only 4043 bytes of C code

Posted: January 15th, 2014, 6:09 am
by MrFlibble
That sounds pretty cool. I wonder if it will have any practical implementation though (e.g. in the DOSBox project). This story immediately brought to my mind the .kkrieger project which also has impressive capabilities in spite of a very small size, however this has not become a widely used technology none the less.

Re: IBM PC emulator written in only 4043 bytes of C code

Posted: January 15th, 2014, 9:01 am
by DOSGuy
It would be a lot larger than 4043 bytes once the code was made readable (descriptive variable names, line breaks, etc.), but it will still have the same compiled size. How that compares to other PC emulators is impossible to say without compiling it. I'll definitely check this out later.

Re: IBM PC emulator written in only 4043 bytes of C code

Posted: January 16th, 2014, 2:51 am
by Malvineous
The main point behind this emulator is not so much functionality, but rather that the 8086 CPU can be emulated at all in such a small amount of source code. The Intel x86 architecture is extremely difficult to emulate, and it's not until relatively recently that it was done successfully. Just think of all the not-quite-right attempts over the years, like the MS-DOS Prompt in Windows 95 (which despite having hardware support, often required you to reboot in MS-DOS mode to work correctly) or VDMSound, which occasionally let you emulate a sound card under an NT/2000 DOS prompt but took a certain amount of voodoo to work properly.

All these workarounds were created because of various difficulties in emulating the x86 CPU. Even today, DOSBox tends to crash on occasion when a game does something that would work on real hardware. It's really, really difficult to do.

The fact that the CPU can be emulated with such a tiny amount of code here is nothing short of astonishing. When you consider what the emulator actually has to do it's unbelievable!

Re: IBM PC emulator written in only 4043 bytes of C code

Posted: January 17th, 2014, 7:30 am
by MrFlibble
Malvineous wrote:The fact that the CPU can be emulated with such a tiny amount of code here is nothing short of astonishing. When you consider what the emulator actually has to do it's unbelievable!
Unfortunately my knowledge of programming is negligible, could you please explain what kind of breakthrough was needed to create this emulator now while all these years other authors spent a lot more effort and resources and did not achieve a similar result?

Can we say now that the x86 CPU emulation algorithm is "worked out" (deciphered) now and it will change the entire field of hardware emulation? Or is this emulator simplified in certain respects when compared to that part of DOSBox that does the same thing (but less perfectly if I understood it correctly)?

Re: IBM PC emulator written in only 4043 bytes of C code

Posted: January 20th, 2014, 5:14 am
by Malvineous
There wasn't any sort of 'breakthrough' as such, but over the years people gradually figured out how to emulate the CPU properly. But it was *so* complex that the emulators were massive beasts and very buggy, because there was so much that could go wrong. Other CPUs like those used in the Atari have been emulated gracefully and simply for many years, but the x86 was an order of magnitude more complex.

For instance, due to the segment:offset method of accessing memory, each memory location can be accessed by 4096 different addresses. All those thousands of addresses represent the exact same byte of memory.

Most other architectures accessed peripherals like keyboards by assigning special memory locations, so you communicated with the hardware by just reading and writing these memory locations just like anything else in memory. In the PC, this is how you accessed the display, but there was a separate system of 65536 I/O ports which were much slower, could only read and write one byte at a time, and used completely different instructions (assembly language commands.) Most hardware was placed here, but then because they were so slow, devices like the Sound Blaster couldn't use them for digitised audio. But fixing this by putting memory on the SB card and using memory-mapped IO like the non-Intel architectures would have been more expensive, so instead they used Direct Memory Access (DMA). This involved programming a separate chip on the motherboard to allow the SB card to read from system memory, which gave it the performance it needed without the cost of dedicated memory. Except now there is a DMA controller chip that needs to be emulated, and it uses different memory addresses so the segment:offset ones (where there are 4096 ways of accessing the same memory address) have to be converted into "physical memory addresses" where there is only one address for each byte of memory.

If that wasn't enough, there are multiple ways of doing the same thing. To set the AX register to zero for example, you can use the assembly command "MOV AX,0". Except this takes up three bytes compared to the unrelated command "XOR AX,AX" which also sets AX to zero using only two bytes. So now you have commands which don't really say what they do, but they're widely used because it makes the .exe files smaller. For an emulator this is a headache, because if there is a command you're not aware of and you read in the wrong number of bytes, you'll no longer be aligned properly and every single command for a while afterwards will be nonsense, because you're reading the next commands from the middle instead of from the start. By comparison on ARM all instructions are four bytes long, so this never happens because your emulator doesn't need to know how long each command is - they are all the same.

This is a very, very quick summary of some of the more widely known complications when emulating an x86 processor. Perhaps to put it in some perspective, compare the DOSBox source code to the code above and you will see that DOSBox is considerably longer, although of course *much* easier to read. It's quite amazing that the code above does the same as all these files here plus more, because those four are only just part of the CPU emulation - they don't fully emulate a CPU or handle any graphics or floppy drive interfaces.

I'm sorry this explanation isn't really that good, but hopefully it at least gives you a little better of an idea why this emulator is such a work of art!

Re: IBM PC emulator written in only 4043 bytes of C code

Posted: January 20th, 2014, 7:34 am
by MrFlibble
Thanks for the very detailed explanation! I suspected there were, uh, complications related to the x86 emulations, but never really gave a thought to their extent :)

So the code here can be used in the DOSBox project to optimize it, right? Do they DOSBox developers know about it already?

Re: IBM PC emulator written in only 4043 bytes of C code

Posted: January 29th, 2014, 7:04 am
by Malvineous
I'm not sure whether you need to be a programmer to fully appreciate it, but I think it's always fascinating reading up on the x86 architecture and learning just how complicated and weird it really is. It's kind of odd that it started life as an embedded CPU. These days embedded CPUs are used to control microwaves, washing machines and air conditioners. But for whatever reason, this lowly CPU architecture kept getting extended and expanded until it's the mishmash of components we all know and love today.

For instance did you know that when you first power on your PC, your quad-core 64-bit CPU comes up in the same 16-bit mode as the original 8086? Then there are special commands you can run to flip it into essentially a different CPU, turning it into a 32-bit processor completely compatible with a Pentium (this is what DOS4GW does when you run some games.) Then there are still more commands that flip it into yet another mode where it is a full proper 64-bit CPU. So modern CPUs are kind of like three CPUs all in one, fully compatible back to the 1980s. This is why you can still boot MS-DOS 1.0 on a modern PC.

I'm not sure whether the DOSBox team could benefit from it - even if they could they'd need the original code where just the CPU emulation was split out from the terminal control and other routines. But you'd have to run benchmarks to be sure. The code might well be a fraction of the size, but that doesn't necessarily mean it's faster! Also this code only implements the original 16-bit 8086, whereas DOSBox emulates 32-bit CPUs like the 386 and newer.

Re: IBM PC emulator written in only 4043 bytes of C code

Posted: January 29th, 2014, 2:33 pm
by MrFlibble
Malvineous wrote:For instance did you know that when you first power on your PC, your quad-core 64-bit CPU comes up in the same 16-bit mode as the original 8086? Then there are special commands you can run to flip it into essentially a different CPU, turning it into a 32-bit processor completely compatible with a Pentium (this is what DOS4GW does when you run some games.) Then there are still more commands that flip it into yet another mode where it is a full proper 64-bit CPU. So modern CPUs are kind of like three CPUs all in one, fully compatible back to the 1980s. This is why you can still boot MS-DOS 1.0 on a modern PC.
I certainly did not think of any of this that way :) It's probably logical to have newer layers of CPU architecture built over older ones, from the underlying binary principle standpoint at least.
Malvineous wrote:I'm not sure whether the DOSBox team could benefit from it - even if they could they'd need the original code where just the CPU emulation was split out from the terminal control and other routines. But you'd have to run benchmarks to be sure. The code might well be a fraction of the size, but that doesn't necessarily mean it's faster! Also this code only implements the original 16-bit 8086, whereas DOSBox emulates 32-bit CPUs like the 386 and newer.
While smaller programmes don't have to be faster, it is also often true that more sophisticated solutions aren't always the most efficient ones, and this is why I asked if there was some kind of breakthrough with this emulator - perhaps the author figured out an algorithm that has "elegance in simplicity" so to speak, and it could - maybe not by direct implementation in DOSBox but by providing some programming insights - turn out to be beneficial for the project.

Re: IBM PC emulator written in only 4043 bytes of C code

Posted: March 27th, 2015, 12:36 pm
by developertn
Another example is my game usually between around 150k of codes. 4043 bytes is equivalent to about 4k. Very impressive indeed.