

::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::. Oct/Nov 98
:::\_____\::::::::::. Issue 1
::::::::::::::::::::::.........................................................

A S S E M B L Y   P R O G R A M M I N G   J O U R N A L
http://asmjournal.freeservers.com
asmjournal@mailcity.com




T A B L E   O F   C O N T E N T S
----------------------------------------------------------------------
Introduction...................................................mammon_

"VGA Programming in Mode 13h".............................Lord Lucifer

"SMC Techniques: The Basics"...................................mammon_

"Going Ring0 in Windows 9x".....................................Halvar

Column: Win32 Assembly Programming
"The Basics"..............................................Iczelion
"MessageBox"..............................................Iczelion

Column: The C standard library in Assembly
"_itoa, _ltoa and _ultoa"...................................Xbios2

Column: The Unix World
"x86 ASM Programming for Linux"............................mammon_

Column: Issue Solution
"11-byte Solution"..........................................Xbios2
----------------------------------------------------------------------
+++++++++++++++++++++++Issue Challenge++++++++++++++++++++
Write a program that displays its command line in 11 bytes
----------------------------------------------------------------------




::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::..............................................INTRODUCTION
by mammon_


Welcome to the first issue of Assembly Programming Journal. Assembly language
has attracted renewed interest from a lot of programmers, in what must be a
backlash to the surge of poor-quality RAD-developed programs (from Delphi, VB,
etc) released as free/shareware over the past few years. Assembly language
code is tight, fast, and often well-coded -- you tend to find fewer
inexperienced coders writing in assembly language than you do writing in, say,
Visual Basic.

The selection of articles is somewhat eclectic and should demonstrate the
focus of this magazine: i.e., it targets the assembly-language programming
community, not any particular type of coding such as Win32, virus, or demo
programming. As the magazine is newly born and much of its purpose may seem
unclear, I will devote the rest of this column to the most common questions I
have received via email regarding the mag.


How often will an issue be released?
------------------------------------
Barring hazard, an issue will be released every other month.


What types of articles will be accepted?
----------------------------------------
Anything to do with assembly language. Obviously repeats of previously
presented material are not necessary unless they enhance or clarify the
earlier material. The focus will be on Intel x86 instruction sets; however
coding for other processors is acceptable (though out of courtesy it would be
good to point to an x86 emulator for the processor you write on).

Personally I am looking for articles on the areas of assembly language that
interest me: code optimization, demo/graphics programming, virus coding, unix
and other-OS asm coding, and OS-internals.

Demos (with source) and quality ASCII art (for issue covers, column logos,
etc) are especially welcome.


For what level of coding experience is the mag intended?
--------------------------------------------------------
The magazine is intended to appeal to asm coders of all levels. Each issue
will contain mostly beginner and intermediate level code/techniques, as these
will by nature be of the greatest demand; however one of the goals of APJ is
to include enough advanced material to make the magazine appeal to "pros" as
well.


How will the mag be distributed?
--------------------------------
Assembly Programming Journal has its own web page at
http://asmjournal.freeservers.com
which will contain the current issue and an archive of previous issues. The
page also contains a guestbook and a discussion board for article writers and
readers.

An email subscription may be obtained by sending an email to
asmjournal@mailcity.com
with the subject "SUBSCRIBE"; starting with the next issue, Assembly
Programming Journal will be emailed to the address you sent the mail from.


Wrap-up
-------
That's the bulk of the "faq". Enjoy the mag!


::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
VGA Programming in Mode 13h
by Lord Lucifer


This article will describe how to program VGA graphics Mode 13h using assembly
language. Mode 13h is the 320x200x256 graphics mode, and is fast and very
convenient from a programmer's perspective.

The video buffer begins at address A000:0000 and ends at address A000:F9FF.
This means the buffer is 64000 bytes long and that each pixel in mode 13h is
represented by one byte.

It is easy to set up mode 13h and the video buffer in assembly language:

        mov     ax,0013h        ; Int 10 - Video BIOS Services
        int     10h             ; ah = 00 - Set Video Mode
                                ; al = 13 - Mode 13h (320x200x256)

        mov     ax,0A000h       ; point segment register es to A000h
        mov     es,ax           ; we can now access the video buffer as
                                ; offsets from register es

At the end of your program, you will probably want to restore the text mode.
Here's how:

        mov     ax,0003h        ; Int 10 - Video BIOS Services
        int     10h             ; ah = 00 - Set Video Mode
                                ; al = 03 - Mode 03h (80x25x16 text)

Accessing a specific pixel in the buffer is also very easy:

        ; bx = x coordinate
        ; ax = y coordinate
        mov     dx,320          ; mul cannot take an immediate operand
        mul     dx              ; multiply y coord by 320 to get row
        add     ax,bx           ; add this with the x coord to get offset
        mov     di,ax           ; ax cannot be used to address memory

        mov     cl,es:[di]      ; now pixel x,y can be accessed as es:[di]

Hmm... That was easy, but that multiplication is slow and we should get rid of
it. That's easy to do too, simply by using bit shifting instead of
multiplication. Shifting a number left by one bit is the same as multiplying by
2. We want to multiply by 320, which is not a power of 2, but 320 = 256 + 64,
and 256 and 64 are both powers of 2. So a faster way to access a pixel is:

        ; bx = x coordinate
        ; ax = y coordinate
        mov     di,ax           ; copy the y coord to di
        shl     di,8            ; shift left by 8, which is the same as
                                ; multiplying by 2^8 = 256
        shl     ax,6            ; now shift left by 6, which is the same as
                                ; multiplying by 2^6 = 64
        add     di,ax           ; now add those two together, which is
                                ; effectively multiplying y by 320
        add     di,bx           ; finally add the x coord to this value
        mov     cl,es:[di]      ; now pixel x,y can be accessed as es:[di]

Well, the code is a little bit longer and looks more complicated, but I can
guarantee it's much faster.
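
The shift-and-add identity is easy to check outside of assembly. The following
Python sketch models the 16-bit arithmetic and compares it with a plain
multiply for every pixel of the 320x200 screen. The function names are made up
for this illustration and are not part of the article's code.

```python
WIDTH, HEIGHT = 320, 200

def offset_mul(x, y):
    """Offset of pixel (x, y) using a multiply."""
    return y * WIDTH + x

def offset_shift(x, y):
    """Same offset using shifts: y*256 + y*64 + x, kept to 16 bits."""
    return ((y << 8) + (y << 6) + x) & 0xFFFF

# Every on-screen pixel gets the same offset from both routines.
assert all(offset_mul(x, y) == offset_shift(x, y)
           for y in range(HEIGHT) for x in range(WIDTH))

# The last pixel sits at the end of the 64000-byte buffer.
print(hex(offset_shift(319, 199)))  # 0xf9ff
```

Note that it is the y coordinate that gets shifted; x is only added at the end.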

To plot colors, we use a color look-up table. This look-up table is a 768
(3x256) array. Each index of the table is really the offset index*3. The 3
bytes at each index hold the corresponding values (0-63) of the red, green,
and blue components. This gives 64 x 64 x 64 = 262144 possible colors.
However, since the table is only 256 elements big, only 256 different colors
are possible at a given time.
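
To make the table layout concrete, here is a small Python sketch of a 768-byte
palette. It assumes the table is held as a flat byte array in red, green, blue
order; `palette_offset` and `get_rgb` are hypothetical helper names, and the
grey ramp is just sample data.

```python
def palette_offset(index):
    """Byte offset of a palette entry inside the 768-byte table."""
    return index * 3

def get_rgb(table, index):
    """Return the (red, green, blue) components, each 0-63, of one entry."""
    base = palette_offset(index)
    return tuple(table[base:base + 3])

# A 256-entry grey ramp: entry i holds (i/4, i/4, i/4) as 6-bit components.
table = bytearray()
for i in range(256):
    table += bytes([i >> 2] * 3)

print(len(table))           # 768
print(get_rgb(table, 255))  # (63, 63, 63)
print(64 ** 3)              # 262144
```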

Changing the color palette is accomplished through the use of the I/O ports of
the VGA card:

Port 03C7h is the Palette Register Read port.
Port 03C8h is the Palette Register Write port
Port 03C9h is the Palette Data port

Here is how to change the color palette:

        ; al = palette index
        ; bl = red component (0-63)
        ; cl = green component (0-63)
        ; ch = blue component (0-63)

        mov     dx,03C8h        ; 03C8h = Palette Register Write port
        out     dx,al           ; choose index

        mov     dx,03C9h        ; 03C9h = Palette Data port
        mov     al,bl
        out     dx,al           ; set red value
        mov     al,cl
        out     dx,al           ; set green value
        mov     al,ch
        out     dx,al           ; set blue value

That's all there is to it. Reading the color palette is similar:

        ; al = palette index
        ; on return:
        ; bl = red component (0-63)
        ; cl = green component (0-63)
        ; ch = blue component (0-63)

        mov     dx,03C7h        ; 03C7h = Palette Register Read port
        out     dx,al           ; choose index

        mov     dx,03C9h        ; 03C9h = Palette Data port
        in      al,dx
        mov     bl,al           ; get red value
        in      al,dx
        mov     cl,al           ; get green value
        in      al,dx
        mov     ch,al           ; get blue value
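
The port sequence can be modelled in a few lines. The Python class below is a
toy, not a hardware description: `VgaDac`, `out` and `inp` are names made up
for this sketch, and it assumes a single shared position counter where real
hardware tracks reading and writing separately. It does show why one index
write followed by three data accesses is enough: the data port steps through
red, green and blue on its own.

```python
class VgaDac:
    """Toy model of the VGA palette ports 3C7h/3C8h/3C9h."""

    def __init__(self):
        self.table = bytearray(768)   # 256 entries x 3 bytes
        self.pos = 0                  # byte position used by port 3C9h

    def out(self, port, value):
        if port in (0x3C7, 0x3C8):    # select entry for reading / writing
            self.pos = (value & 0xFF) * 3
        elif port == 0x3C9:           # data: red, then green, then blue
            self.table[self.pos] = value & 0x3F
            self.pos = (self.pos + 1) % 768

    def inp(self, port):
        if port != 0x3C9:
            raise ValueError("only the data port is modelled for reads")
        value = self.table[self.pos]
        self.pos = (self.pos + 1) % 768
        return value

dac = VgaDac()
dac.out(0x3C8, 1)                     # choose entry 1 for writing
for component in (63, 0, 32):         # red, green, blue
    dac.out(0x3C9, component)

dac.out(0x3C7, 1)                     # choose entry 1 for reading
rgb = tuple(dac.inp(0x3C9) for _ in range(3))
print(rgb)                            # (63, 0, 32)
```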

Now all we need to know is how to plot a pixel of a certain color at a certain
location. It's very easy, given what we already know:

        ; bx = x coordinate
        ; ax = y coordinate
        ; dl = color (0-255)
        mov     di,ax           ; copy the y coord to di
        shl     di,8            ; shift left by 8, which is the same as
                                ; multiplying by 2^8 = 256
        shl     ax,6            ; now shift left by 6, which is the same as
                                ; multiplying by 2^6 = 64
        add     di,ax           ; now add those two together, which is
                                ; effectively multiplying y by 320
        add     di,bx           ; finally add the x coord to this value
        mov     es:[di],dl      ; copy color dl into memory location
                                ; that's all there is to it

Ok, we now know how to set up Mode 13h, set up the video buffer, plot a pixel,
and edit the color palette.

My next article will go on to show how to draw lines, utilize the vertical
retrace for smoother rendering, and anything else I can figure out by that
time...


::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
SMC Techniques: The Basics
by mammon_


One of the benefits of coding in assembly language is that you have the option
to be as tricky as you like: the binary gymnastics of viral code demonstrate
this above all else. One of the viral "tricks" that has made its way into
standard protection schemes is SMC: self-modifying code.

In this article I will not be discussing polymorphic viruses or mutation
engines; I will not go into any specific software protection scheme, or cover
any anti-debugger/anti-disassembler tricks, or even touch on the matter of the
PIQ. This is intended to be a simple primer on self-modifying code, for those
new to the concept and/or implementation.


Episode 1: Opcode Alteration
----------------------------
One of the purest forms of self-modifying code is to change the value of an
instruction before it is executed...sometimes as the result of a comparison,
and sometimes to hide the code from prying eyes. This technique essentially
has the following pattern:
mov reg1, code-to-write
mov [addr-to-write-to], reg1
where 'reg1' would be any register, and where '[addr-to-write-to]' would be a
pointer to the address to be changed. Note that 'code-to-write' would ideally
be an instruction in hexadecimal format, but by placing the code elsewhere in
the program--in an uncalled subroutine, or in a different segment--it is
possible to simply transfer the compiled code from one location to another via
indirect addressing, as follows:
          call changer
          mov dx, offset string        ;this will be performed but ignored
target:   mov ah, 09h                  ;this will never be performed
          int 21h                      ;this will exit the program
          ....
changer:  mov di, offset to_write      ;load address of code-to-write in DI
          mov ax, cs:[di]              ;fetch both bytes of 'mov ah, 4Ch'
          mov word ptr cs:target, ax   ;write code to location 'target:'
          ret                          ;return from call
to_write: mov ah, 4Ch                  ;terminate to DOS function

This small routine will cause the program to exit, though in a disassembler it
at first appears to be a simple print string routine. Note that by combining
indirect addressing with loops, entire subroutines--even programs--can be
overwritten, and the code to be written--which may be stored in the program as
data--can be encrypted with a simple XOR to disguise it from a disassembler.
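
The effect of such a patch can be previewed on the raw bytes. The Python sketch
below uses the standard encodings B4h ib for 'mov ah, imm8' and CDh ib for
'int imm8'; the `patch` helper is a made-up name. Because both versions of the
instruction start with the same opcode byte, copying one byte changes nothing
and the operand byte must be copied too.

```python
# Machine code for:  mov ah,09h / int 21h   and the replacement  mov ah,4Ch
code = bytearray([0xB4, 0x09, 0xCD, 0x21])
to_write = bytes([0xB4, 0x4C])

def patch(image, offset, new_bytes):
    """Overwrite len(new_bytes) bytes of 'image' starting at 'offset'."""
    image[offset:offset + len(new_bytes)] = new_bytes

one_byte = bytearray(code)
patch(one_byte, 0, to_write[:1])   # copy only the opcode byte
print(one_byte.hex())              # b409cd21, nothing changed

patch(code, 0, to_write)           # copy opcode and operand
print(code.hex())                  # b44ccd21, now 'mov ah,4Ch / int 21h'
```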

The following is a complete asm program to demonstrate patching "live" code;
it asks the user for a password, then changes the string to be printed
depending on whether or not the password is correct:
; smc1.asm ==================================================================
.286
.model small
.stack 200h
.DATA
;buffer for Keyboard Input, formatted for easy reference:
MaxKbLength db 05h
KbLength db 00h
KbBuffer db 5 dup (0) ;4 characters plus the terminating CR

;strings: note the password is not encrypted, though it should be...
szGuessIt db 'Care to guess the super-secret password?',0Dh,0Ah,'$'
szString1 db 'Congratulations! You solved it!',0Dh,0Ah, '$'
szString2 db 'Ah, damn, too bad eh?',0Dh,0Ah,'$'
secret_word db "this"

.CODE
;===========================================
start:
mov ax,@data ; set segment registers
mov ds, ax ; same as "assume" directive
mov es, ax
call Query ; prompt user for password
mov ah, 0Ah ; DOS 'Get Keyboard Input' function
mov dx, offset MaxKbLength ; start of buffer
int 21h
call Compare ; compare passwords and patch
exit:
mov ah,4ch ; 'Terminate to DOS' function
int 21h
;===========================================
Query proc
mov dx, offset szGuessIt ; Prompt string
mov ah, 09h ; 'Display String' function
int 21h
ret
Query endp
;===========================================
Reply proc
PatchSpot:
mov dx, offset szString2 ; 'You failed' string
mov ah, 09h ; 'Display String' function
int 21h
ret
Reply endp
;===========================================
Compare proc
cld ; compare upward through memory
mov cx, 4 ; # of bytes in password
mov si, offset KbBuffer ; start of password-input in Buffer
mov di, offset secret_word ; location of real password
repe cmpsb ; compare them, stop at first mismatch
jne bad_guess ; ZF clear = mismatch, do not patch
mov word ptr cs:PatchSpot[1], offset szString1 ;patch to GoodString
bad_guess:
call Reply ; output string to display result
ret
Compare endp
end start
; EOF =======================================================================
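
One detail of the password comparison is worth checking by hand: the state
that a repeated CMPSB leaves behind. The Python function below is a simplified
model of REPE CMPSB (hypothetical name, count assumed non-zero, direction flag
assumed clear). It suggests that CX reaches zero both on a full match and when
only the last byte differs, so the zero flag is what tells the two cases apart.

```python
def repe_cmpsb(src, dst, count):
    """Model REPE CMPSB: returns (cx, zf) as left after the instruction."""
    cx, zf, i = count, True, 0
    while cx != 0:
        zf = src[i] == dst[i]   # CMPSB sets ZF when the two bytes are equal
        i += 1
        cx -= 1                 # CX is decremented before ZF is tested
        if not zf:
            break               # REPE stops at the first mismatch
    return cx, zf

print(repe_cmpsb(b"this", b"this", 4))  # (0, True)
print(repe_cmpsb(b"thiX", b"this", 4))  # (0, False)
print(repe_cmpsb(b"that", b"this", 4))  # (1, False)
```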


Episode 2: Encryption
---------------------
Encryption is undoubtedly the most common form of SMC used today. It is
used by packers and exe-encryptors to either compress or hide code, by viruses
to disguise their contents, and by protection schemes to hide data. The basic
format of encryption SMC would be:
mov reg1, addr-to-write-to
mov reg2, [reg1]
manipulate reg2
mov [reg1], reg2
where 'reg1' would be a register containing the address (offset) of the
location to write to, and reg2 would be a temporary register which loads the
contents of the first and then modifies them via mathematical (ROL) or logical
(XOR) operations. The address to be patched is stored in reg1, its contents
modified within reg2, and then written back to the original location still
stored in reg1.

The program given in the preceding section can be modified so that it
unencrypts the password by overwriting it (so that it remains unencrypted
until the program is terminated) by first changing the 'secret_word' value as
follows:
secret_word db 06Ch, 04Dh, 082h, 0D0h

and then by changing the 'Compare' routine to patch the 'secret_word' location
in the data segment:
;===========================================
magic_key db 18h, 25h, 0EBh, 0A3h ;not very secure!

Compare proc ;Step 1: Unencrypt password
mov al, [magic_key] ; put byte1 of XOR mask in al
mov bl, [secret_word] ; put byte1 of password in bl
xor al, bl
mov byte ptr secret_word, al ; patch byte1 of password
mov al, [magic_key+1] ; put byte2 of XOR mask in al
mov bl, [secret_word+1] ; put byte2 of password in bl
xor al, bl
mov byte ptr secret_word[1], al ; patch byte2 of password
mov al, [magic_key+2] ; put byte3 of XOR mask in al
mov bl, [secret_word+2] ; put byte3 of password in bl
xor al, bl
mov byte ptr secret_word[2], al ; patch byte3 of password
mov al, [magic_key+3] ; put byte4 of XOR mask in al
mov bl, [secret_word+3] ; put byte4 of password in bl
xor al, bl
mov byte ptr secret_word[3], al ; patch byte4 of password
cld
mov cx, 4 ;Step 2: Compare Passwords
mov si,offset KbBuffer
mov di, offset secret_word
repe cmpsb
jne bad_guess
mov word ptr cs:PatchSpot[1], offset szString1
bad_guess:
call Reply
ret
Compare endp

Note the addition of the 'magic_key' location which contains the XOR mask for
the password. This whole thing could have been made more sophisticated with a
loop, but with only four bytes the above speeds debugging time (and, thereby,
article-writing time). Note how the password is loaded, XORed, and re-written
one byte at a time; using 32-bit code, the whole (dword) password could be
loaded, XORed, and re-written at once.
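
The encrypted 'secret_word' bytes can be reproduced from the plain password
and the 'magic_key' mask. This Python sketch (with `xor_bytes` as a made-up
helper name) does the byte-at-a-time version and then the single dword
operation mentioned above.

```python
magic_key = bytes([0x18, 0x25, 0xEB, 0xA3])

def xor_bytes(data, key):
    """XOR each byte of 'data' with the byte of 'key' at the same position."""
    return bytes(d ^ k for d, k in zip(data, key))

encrypted = xor_bytes(b"this", magic_key)
print(encrypted.hex())                   # 6c4d82d0
print(xor_bytes(encrypted, magic_key))   # b'this'

# The same operation on the whole dword at once (little-endian).
word = int.from_bytes(encrypted, "little") ^ int.from_bytes(magic_key, "little")
print(word.to_bytes(4, "little"))        # b'this'
```

Since XOR is its own inverse, the same routine both encrypts and decrypts.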


Episode 3. Fooling with the stack
---------------------------------
This is a trick I learned while decompiling some of SunTzu's code. What
happens here is pretty interesting: the stack is moved into the code segment
of the program, such that the top of the stack is set to the first address to
be patched (which, BTW, should be the one closest to the end of the program
due to the way the stack works); the byte at this address is then POPed into a
register, manipulated, and PUSHed back to its original location. The stack
pointer (SP) is then decremented so that the next address to be patched (1
byte lower in memory) is now at the top of the stack.

In addition, the bytes are being XORed with a portion of the program's own
code, which disguises somewhat the actual value of the XOR mask. In the
following code, I chose to use the bytes from Start: (200h when compiled)
up to --but not including-- Exit: (214h when compiled; Exit-1 = 213h).
However, as with SunTzu's original code I kept the "reverse" sequence of the
XOR mask such that byte 213h is the first byte of the XOR mask, and byte 200h
is the last. After some experimentation I found this was the easiest way to
sync a patch program--or a hex editor--to the stack-manipulative code; since
the stack moves backwards (a forward-moving stack is more trouble than it is
worth), using a "reverse" XOR mask allows both filepointers in a patcher to be
INCed or DECed in sync.

Why is this an issue? Unlike the previous two examples, the following does not
contain the encrypted version of the code-to-be-patched. It simply contains
the source code which, when compiled, results in the unencrypted bytes which
are then run through the XOR routine, encrypted, and then executed (which, if
you have followed thus far, will immediately prove to be no good...
though it is a fantastic way of crashing the DOS VM!).

Once the program is compiled you must either patch the bytes-to-be-decrypted
manually, or write a patcher to do the job for you. The former is more
expedient, the latter is more certain and is a must if you plan on maintaining
the code. In the following example I have embedded 2 CCh's (Int3) in the code
at the fore and aft end of the bytes-to-be-decrypted section; a patcher need
simply search for these, count the bytes in between, and then XOR with the
bytes between 200-213h.
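
A patcher along these lines might look like the following Python sketch. It is
an illustration under assumptions, not the tool used for this article:
`xor_descending` and `between_markers` are made-up names, the tiny `image` is
sample data standing in for the compiled file, and the exact bounds passed in
must match whatever range the runtime loop covers.

```python
def xor_descending(image, lo, hi, mask):
    """XOR image[hi] down to image[lo] with 'mask' read from its last byte
    to its first, starting over when the mask is exhausted."""
    m = len(mask) - 1
    for addr in range(hi, lo - 1, -1):
        image[addr] ^= mask[m]
        m = m - 1 if m > 0 else len(mask) - 1

def between_markers(image, marker=b"\xCC\xCC"):
    """Return (lo, hi), the inclusive range between the first two markers."""
    first = image.index(marker)
    second = image.index(marker, first + len(marker))
    return first + len(marker), second - 1

image = bytearray(b"\x01\x02\x03\xCC\xCC" + b"SECRET" + b"\xCC\xCC")
mask = bytes(image[0:3])               # stands in for the bytes at 200h-213h
lo, hi = between_markers(image)
xor_descending(image, lo, hi, mask)    # encrypt
xor_descending(image, lo, hi, mask)    # decrypt again
print(bytes(image[lo:hi + 1]))         # b'SECRET'
```

Running the routine twice restores the original bytes, which is also a quick
way to test a patcher against the decryption loop.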

Once again, this sample is a continuation of the previous example. In it, I
have written a routine to decrypt the entire 'Compare' routine of the previous
section by XORing it with the bytes between 'Start' and 'Exit'. This is
accomplished by setting the stack segment equal to the code segment, then
setting the stack pointer equal to the end (highest) address of the code to be
modified. A byte is POPed from the stack (i.e. its original location), XORed,
and PUSHed back to its original location. The next byte is loaded by
decrementing the stack pointer. Once all of the code is decrypted, control is
returned to the newly-decrypted 'Compare' routine and normal execution
resumes.

;===========================================
magic_key db 18h, 25h, 0EBh, 0A3h

Compare proc
mov cx, offset EndPatch[1] ;start addr-to-write-to + 1
sub cx, offset patch_pwd ;end addr-to-write-to
mov ax, cs
mov dx, ss ;save stack segment--important!
mov bx, sp ;save stack pointer
cli ;an interrupt now would push onto our code
mov ss, ax ;set stack segment to code segment
mov sp, offset EndPatch ;start addr-to-write-to
mov si, offset Exit-1 ;start addr of XOR mask
XorLoop:
pop ax ;get byte-to-patch into AL
xor al, cs:[si] ;XOR al with XorMask (mask is in the code segment)
push ax ;write byte-to-patch back to memory
dec sp ;load next byte-to-patch
cmp si, offset Start ;end addr of XOR mask
jne GoLoop ;if not at end of mask, keep going
mov si, offset Exit ;start XOR mask over (DEC below gives Exit-1)
GoLoop:
dec si ;load next byte of XOR mask
loop XorLoop ;XOR next byte
mov ss, dx ;restore stack segment
mov sp, bx ;restore stack pointer
sti
jmp patch_pwd
db 0CCh,0CCh ;Identification mark: START
patch_pwd: ;no changes from here
mov al, [magic_key]
mov bl, [secret_word]
xor al, bl
mov byte ptr secret_word, al
mov al, [magic_key+1]
mov bl, [secret_word+1]
xor al, bl
mov byte ptr secret_word[1], al
mov al, [magic_key+2]
mov bl, [secret_word+2]
xor al, bl
mov byte ptr secret_word[2], al
mov al, [magic_key+3]
mov bl, [secret_word+3]
xor al, bl
mov byte ptr secret_word[3], al
;compare password
cld
mov cx, 4
mov si, offset KbBuffer
mov di, offset secret_word
repe cmpsb
jne bad_guess
mov word ptr cs:PatchSpot[1], offset szString1
bad_guess:
call Reply
ret
Compare endp
EndPatch:
db 0CCh, 0CCh ;Identification Mark: END

This kind of program is very hard to debug. For testing, I replaced the
memory operand of the 'xor al' instruction first with 00h ('xor al, 00h'),
which would cause no encryption and is useful for testing code for final bugs,
and then with 0EBh ('xor al, 0EBh'), which allowed me to verify that the
correct bytes were being encrypted (it never hurts to check, after all).


Episode 4: Summation
--------------------
That should demonstrate the basics of self-modifying code. There are a few
techniques to consider to make development easier, though really any SMC
programs will be tricky.

The most important thing is to get your program running completely before you
start overwriting any of its code segments. Next, always create a program that
performs the reverse of any decryption/encryption code--not only does this
speed up compilation and testing by automating the encryption of code areas
that will be decrypted at runtime, it also provides a good tool for error
checking using a disassembler (i.e. encrypt the code, disassemble, decrypt the
code, disassemble, compare). In fact, it is a good idea to encapsulate the SMC
portion of your program in a separate executable and test it on the compiled
"release product" until all of the bugs are out of the decryption routine, and
only then add the decryption routine to your final code. The CCh 'landmarks'
(codemarks?) are extremely useful as well.

Finally, do your debugging with debug.com for DOS applications--the debugger
is quick, small, and if it crashes you simply lose a Windows DOS box. The
ability to view the program address space after the program has terminated but
before it is unloaded is another distinct advantage.

More complex examples of SMC programs can be found in Dark Angel's code, the
Rhince engine, or in any of the permutation engines used in polymorphic
viruses. Acknowledgements go to Sun-Tzu for the stack technique used in his
ghf-crackme program.


::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::...........................................FEATURE.ARTICLE
Going Ring0 in Windows 9x
by Halvar Flake


This article gives a short overview of two ways to go Ring0 in Windows 9x in
an undocumented way, exploiting the fact that none of the important system
tables in Win9x are on pages which are protected from low-privilege access.

A basic knowledge of Protected Mode and OS Internals is required; refer to
your Assembly Book for that :-) The techniques presented here are in no way a
good/clean way to get to a higher privilege level, but since they require only
a minimal coding effort, they are sometimes more desirable to implement than a
full-fledged VxD.

1. Introduction
---------------
Under all modern Operating Systems, the CPU runs in protected mode, taking
advantage of the special features of this mode to implement virtual memory,
multitasking etc. To manage access to system-critical resources (and to thus
provide stability) an OS is in need of privilege levels, so that a program can't
just switch out of protected mode etc. These privilege levels are represented
on the x86 (I refer to x86 meaning 386 and following) CPU by 'Rings', with
Ring0 being the most privileged and Ring3 being the least privileged level.
Theoretically, the x86 is capable of 4 privilege levels, but Win32 uses only
two of them, Ring0 as 'Kernel Mode' and Ring3 as 'User Mode'.

Since Ring0 is not needed by 99% of all applications, the only documented way
to use Ring0 routines in Win9x is through VxDs. But VxDs, while being the only
stable and recommended way, are big and take work to write, so in a couple of
specialized situations, other ways to go Ring0 are useful.

The CPU itself handles privilege level transitions in two ways: Through
Exceptions/Interrupts and through Callgates. Callgates can be put in the LDT or
GDT, Interrupt-Gates are found in the IDT.

We'll take advantage of the fact that these tables can be freely written to
from Ring3 in Win9x (NOT IN NT !).


2. The IDT method
-----------------
If an exception occurs (or is triggered), the CPU looks up the corresponding
descriptor in the IDT. This descriptor gives the CPU an Address and Segment
to transfer control to. An Interrupt Gate descriptor looks like this:

  31                 16 15  14 13  12      8   7   5   4       0
 +---------------------+---+-----+-----------+-------+-----------+
 | 1.Offset (16-31)    | P | DPL | 0 1 1 1 0 | 0 0 0 | R R R R R | +4
 +---------------------+---+-----+-----------+-------+-----------+
 | 2.Segment Selector  | 3.Offset (0-15)                         |  0
 +---------------------+-----------------------------------------+
 DPL == Two bits containing the Descriptor Privilege Level
 P == Present bit
 R == Reserved bits

The first word (Nr.3) contains the lower word of the 32-bit address of the
Exception Handler. The word at +6 contains the high-order word. The word at +2
is the selector of the segment in which the handler resides.

The word at +4 identifies the descriptor as Interrupt Gate, contains its
privilege and the present bit. Now, to use the IDT to go Ring0, we'll create a
new Interrupt Gate which points to our Ring0 procedure, save an old one and
replace it with ours.

Then we'll trigger that exception. Instead of passing control to Windows' own
handler, the CPU will now execute our Ring0 code. As soon as we're done, we'll
restore the old Interrupt Gate.

In Win9x, the selector 0028h always points to a Ring0-Code Segment, which spans
the entire 4 GB address range. We'll use this as our Segment selector.

The DPL has to be 3, as we're calling from Ring3, and the present bit must be
set. So the word at +4 will be 1110111000000000b => EE00h. These values can
be hardcoded into our program; we just have to add the offset of our Ring0
Procedure to the descriptor. As exception, you should preferably use one that
rarely occurs, so do not use int 0Eh (exception 14, the page fault) ;-)

I'll use int 9h, since it is (to my knowledge) not used on 486+.
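
The bit arithmetic behind the gate can be double-checked with a short Python
sketch. The helper names are invented for this illustration, and the handler
address and IDT base used below are made-up values, not ones read from a real
system.

```python
import struct

def gate_access_word(present, dpl, gate_type):
    """Build the word at +4: P | DPL | 0 | 4-bit type | 8 low bits zero."""
    return (present << 15) | (dpl << 13) | (gate_type << 8)

def make_gate(offset, selector, access):
    """Pack an 8-byte gate: offset low, selector, access word, offset high."""
    return struct.pack("<HHHH", offset & 0xFFFF, selector,
                       access, (offset >> 16) & 0xFFFF)

def idt_entry_address(idt_base, vector):
    """Each IDT entry is 8 bytes, so entry n lives at base + 8*n."""
    return idt_base + 8 * vector

INTERRUPT_GATE_32 = 0xE                        # type bits 1110
access = gate_access_word(1, 3, INTERRUPT_GATE_32)
print(hex(access))                             # 0xee00

gate = make_gate(0x00401234, 0x0028, access)   # made-up handler address
print(gate.hex())                              # 3412280000ee4000
print(hex(idt_entry_address(0x1000, 9)))       # 0x1048 (made-up IDT base)
```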

Example code follows (to be compiled with TASM 5):

-------------------------------- bite here -----------------------------------

.386P
LOCALS
JUMPS
.MODEL FLAT, STDCALL

EXTRN ExitProcess : PROC

.data

IDTR df 0 ; This will receive the contents of the IDTR
; register

SavedGate dq 0 ; We save the gate we replace in here

OurGate dw 0 ; Offset low-order word
dw 028h ; Segment selector
dw 0EE00h ;
dw 0 ; Offset high-order word



.code

Start:
mov eax, offset Ring0Proc
mov [OurGate], ax ; Put the offset words
shr eax, 16 ; into our descriptor
mov [OurGate+6], ax

sidt fword ptr IDTR
mov ebx, dword ptr [IDTR+2] ; load IDT Base Address
add ebx, 8*9 ; Address of int9 descriptor in ebx

mov edi, offset SavedGate
mov esi, ebx
movsd ; Save the old descriptor
movsd ; into SavedGate

mov edi, ebx
mov esi, offset OurGate
movsd ; Replace the old handler
movsd ; with our new one

int 9h ; Trigger the exception, thus
; passing control to our Ring0
; procedure

mov edi, ebx
mov esi, offset SavedGate
movsd ; Restore the old handler
movsd

call ExitProcess, LARGE -1

Ring0Proc PROC
mov eax, CR0
iretd
Ring0Proc ENDP

end Start

-------------------------------- bite here -----------------------------------


3. The LDT Method
-----------------
Another possibility of executing Ring0-Code is to install a so-called callgate
in either the GDT or LDT. Under Win9x it is a little bit easier to use the LDT,
since the first 16 descriptors in it are always empty, so I will only give
source for that method here.

A Callgate is similar to an Interrupt Gate and is used in order to transfer
control from a low-privileged segment to a high-privileged segment using a CALL
instruction.

The format of a callgate is:

  31                 16 15  14 13  12      8   7   5   4       0
 +---------------------+---+-----+-----------+-------+-----------+
 | 1.Offset (16-31)    | P | DPL | 0 1 1 0 0 | 0 0 0 |    DWC    | +4
 +---------------------+---+-----+-----------+-------+-----------+
 | 2.Segment Selector  | 3.Offset (0-15)                         |  0
 +---------------------+-----------------------------------------+
 P == Present bit
 DPL == Descriptor Privilege Level
 DWC == Dword Count (5 bits), number of arguments copied to the ring0 stack

So all we have to do is to create such a callgate, write it into one of the
first 16 descriptors, then do a far call to that descriptor to execute our
Ring0 code.
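
The two constants this method needs, the selector for the far call and the
word at +4 of the gate, follow directly from the bit layouts. A Python sketch,
with function names made up for the illustration:

```python
def selector(index, in_ldt, rpl):
    """Selector = descriptor index * 8, plus table-indicator bit, plus RPL."""
    return (index << 3) | (int(in_ldt) << 2) | rpl

def call_gate_access(dpl, param_count=0):
    """Word at +4 of a present 32-bit call gate (type 1100b)."""
    return (1 << 15) | (dpl << 13) | (0xC << 8) | (param_count & 0x1F)

print(hex(selector(1, True, 3)))    # 0xf   (LDT entry 1, privilege level 3)
print(hex(call_gate_access(3)))     # 0xec00
print(hex(selector(5, False, 0)))   # 0x28  (GDT entry 5, privilege level 0)
```

The last line shows that the Ring0 code selector 0028h used in the gate is
simply entry 5 of the GDT with a requested privilege level of 0.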

Example Code:

-------------------------------- bite here -----------------------------------

.386P
LOCALS
JUMPS
.MODEL FLAT, STDCALL

EXTRN ExitProcess : PROC

.data

GDTR df 0 ; This will receive the contents of the GDTR
; register

CallPtr dd 00h ; As we're using the first descriptor (8) and
dw 0Fh ; it's located in the LDT and the privilege level
; is 3, our selector will be 000Fh.
; That is because the low-order two bits of the
; selector are the privilege level, and the 3rd
; bit is set if the selector is in the LDT.

OurGate dw 0 ; Offset low-order word
dw 028h ; Segment selector
dw 0EC00h ;
dw 0 ; Offset high-order word

.code

Start:
mov eax, offset Ring0Proc
mov [OurGate], ax ; Put the offset words
shr eax, 16 ; into our descriptor
mov [OurGate+6], ax

xor eax, eax

sgdt fword ptr GDTR
mov ebx, dword ptr [GDTR+2] ; load GDT Base Address
sldt ax
add ebx, eax ; Address of the LDT descriptor in
; ebx
mov al, [ebx+4] ; Load the base address
mov ah, [ebx+7] ; of the LDT itself into
shl eax, 16 ; eax, refer to your pmode
mov ax, [ebx+2] ; manual for details

add eax, 8 ; Skip NULL Descriptor

mov edi, eax
mov esi, offset OurGate
movsd ; Move our custom callgate
movsd ; into the LDT

call fword ptr [CallPtr] ; Execute the Ring0 Procedure

xor eax, eax ; Clean up the LDT
sub edi, 8
stosd
stosd

call ExitProcess, LARGE -1

Ring0Proc PROC
mov eax, CR0
retf
Ring0Proc ENDP

end Start

-------------------------------- bite here -----------------------------------
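
The four instructions in the listing that assemble the LDT base address pick
it out of bytes 2, 3, 4 and 7 of the LDT's descriptor in the GDT. A Python
sketch of the same decoding follows; `descriptor_base` is a made-up helper and
the descriptor bytes are invented sample data.

```python
def descriptor_base(desc):
    """Extract the 32-bit base from an 8-byte segment descriptor."""
    return desc[2] | (desc[3] << 8) | (desc[4] << 16) | (desc[7] << 24)

# Made-up LDT descriptor: base 0x80012345, limit 0xFFFF, access byte 82h.
desc = bytes([0xFF, 0xFF, 0x45, 0x23, 0x01, 0x82, 0x00, 0x80])
base = descriptor_base(desc)
print(hex(base))        # 0x80012345
print(hex(base + 8))    # 0x8001234d, first descriptor after the null entry
```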

Well, that's all for now folks. This method can be easily changed to use the GDT
instead which would save a few bytes in case you have to optimize heavily.

Anyways, do use these methods with care, they will NOT run on NT and are
generally not exactly a clean or stable way to do these things.


Credits & Thanks
----------------
The IDT-Method taken from the CIH virus & Stone's example source at
http://www.cracking.net.
The LDT-Method was done by me, but without IceMan's & The_Owl's help I would
still be stuck, so all credits go to them.


::/ \::::::.
:/___\:::::::.
/| \::::::::.
:| _/\:::::::::.
:| _|\ \::::::::::.
:::\_____\:::::::::::................................WIN32.ASSEMBLY.PROGRAMMING
Win32 ASM: The Basics
by Iczelion


The required tools:
-Microsoft Macro Assembler 6.1x : MASM support of Win32 programming
starts from version 6.1. The latest version is 6.13 which
is a patch to the previous version, 6.11. The Win98 DDK includes MASM
6.11d which you can download from Microsoft at
http://www.microsoft.com/hwdev/ddk/download/win98ddk.exe
But be warned, this monstrosity is huge, 18.5 MB in size. MASM 6.13
patch can also be downloaded from
ftp://ftp.microsoft.com/softlib/mslfiles/ml613.exe
-Microsoft import libraries : You can use the import libraries from
Visual C++. Some are included in Win98 DDK.
-Win32 API Reference : You can download it from Borland's site:
ftp://ftp.borland.com/pub/delphi/techpubs/delphi2/win32.zip

Here's a brief description of the assembly process.

MASM 6.1x comes with two essential tools: ml.exe and link.exe. ml.exe is the
assembler. It takes in the assembly source code (.asm) and produces an object
file (.obj). An object file is an intermediate file between the source code
and the executable file. It needs some address fixups, which are among the
services provided by link.exe. Link.exe makes an object file into an executable
file by several means, such as adding the code from other modules to the object
file, providing the address fixups, adding resources, etc.

For example:
ml skeleton.asm ---> this produces skeleton.obj
link skeleton.obj ---> this produces skeleton.exe

The above lines are a simplification of course. In the real world, you must add
several switches to ml.exe and link.exe to customize your application. Also
there will be several files you must link with the object file in order to
create your application.

Win32 programs run in protected mode, which has been available since the
80286. But the 80286 is now history, so we only have to concern ourselves with
the 80386 and its descendants. Windows runs each Win32 program in a separate
virtual address space. That means each Win32 program will have its own 4 GB
address space. Each program is alone in its address space. This is in contrast
to the situation in Win16. All Win16 programs can *see* each other. Not so in
Win32. This feature helps reduce the chance of one program writing over another
program's code/data.

The memory model is also drastically different from the old days of the 16-bit
world. Under Win32, we need not be concerned with memory models or segments
anymore! There's only one memory model: the flat memory model. There are no
more 64K segments. The memory is a large continuous space of 4 GB. That also
means you don't have to play with segment registers. You can use any segment
register to address any point in the memory space. That's a GREAT help to
programmers. This is what makes Win32 assembly programming as easy as C.

We will examine a minimal skeleton of a Win32 assembly program. We'll add more
flesh to it later. Here's the skeleton program. If you don't understand some of
the code, don't panic. I'll explain each part later.

.386
.MODEL Flat, STDCALL
.DATA
<Your initialized data>
......
.DATA?
<Your uninitialized data>
......
.CONST
<Your constants>
......
.CODE