;       AMXJITSN.ASM: Just-In-Time compiler for the Abstract Machine of the "Pawn"
;       scripting language
;       (C) 1999-2000, Marc Peter; beta version; provided AS IS WITHOUT ANY WARRANTIES

; I reached >155 million instr./sec on my AMD K6-2/366 with the Hanoi "bench"
; (27 disks, no output, DOS4/GW under Win95) with this implementation of the
; JIT compiler.

; NOTE 1:
; There is only one pass implemented in this version. This means there is no
; way of knowing the size of the compiled code before it has actually been
; compiled. So the only option the caller has is to count the number of opcodes
; (in amx_BrowseRelocate()) and multiply this count by a "safe" factor to
; obtain a size value big enough to hold the entire code (and data, including
; the stack and heap, after adding their sizes). Afterwards it can realloc()
; this memory block to the smaller size that is actually needed.

; NOTE 2:
; The compiler destroys the opcode addresses of the given source by storing the
; respective compiled code's addresses there for the final address relocation
; step.

; NOTE 3:
; Although the Pawn compiler doesn't generate the LCTRL, SCTRL and CALL.I
; instructions, I have to point out that they don't work as expected in a JIT
; compiled program, because there is no easy way of translating between AMX
; code addresses and JIT translated ones. This might be fixed in a future
; version.

; NOTE 4:
; Stack Pointer issues (by David Anderson)
; The JIT was changed recently so it no longer uses ESP as a general purpose
; register (GPR), because it can conflict with threading/signal systems which
; rely on the stack pointer being intact to find thread-ids. My fix for this
; was to keep esp safe, but save the stack pointer in 'ecx'. As such, ecx is no
; longer the CIP or scratch register, it is the save point for pieces of the AMX
; structure on the x86 stack.
; This means that the optimization of the JIT has changed, as every amx stack
; push now takes two operations instead of one (same for pop), and pushing
; addresses takes 4 instructions instead of 1.
; As of this moment I don't see a better way around it, but the sacrifice for
; having pthread-safe code was deemed to be necessary.

; NOTE 5:
; NX ("No eXecute") and XD (eXecute Disable) bits
; (by Thiadmer Riemersma)
;
; AMD defined a bit "No eXecute" for the page table entries (for its 64-bit
; processors) and Intel came with the same design, but calling it differently.
; The purpose is to make "buffer overrun" security holes impossible, or at least
; very, very difficult, by marking the stack and the heap as memory regions
; such that an attempt to execute processor instructions will cause a processor
; exception (of course, a buffer overrun that is not explicitly handled will
; then crash the application --instead of executing the rogue code).
;
; For JIT compilers, this has the impact that you are not allowed to execute the
; code that the JIT has generated. To do that, you must adjust the attributes
; for the memory page. For Microsoft Windows, you can use VirtualAlloc() to
; allocate a memory block with the appropriate flags; on Linux (with a recent
; kernel), you would use mmap() or mprotect() with PROT_EXEC. Microsoft Windows
; also offers the function VirtualProtect() to change the page attributes of an
; existing memory block, but there are caveats in its use: if the block spans
; multiple pages, these pages must be consecutive, and if there are blocks of
; memory in a page unrelated to the JIT, their page attributes will change too.
;
; The JIT compiler itself requires only read-write access (this is the default
; for a memory block that you allocate). The execution of the JIT-compiled code
; requires full access to the memory block: read, write and execute.
; It needs write access, because the SYSREQ.C opcode is patched to SYSREQ.D
; after the first lookup (this is an optimization: the address of the native
; function is looked up only once). For processors that do not support the
; NX/XD bit, execution of code is implicitly supported if read access is
; supported.
;
; During compilation, the JIT compiler requires write-access to its own code
; segment: the JIT-compiler patches P-code parameters into its own code segment
; during compilation. This is handled in the support code for amx_InitJIT.
;
;
; NOTE 6:
; CALLING CONVENTIONS
; (by Thiadmer Riemersma)
;
; This version is the JIT that uses the "stack calling convention". In the
; original implementation, this meant __cdecl; both for the calling convention
; for the _asm_runJIT routine itself and for the callback functions.
; The current release supports __stdcall for the callback functions; to
; use it, you need to assemble the file with STDECL defined. (Since STDCALL is
; a reserved word on the assembler, I had to choose a different name for the
; macro, hence STDECL.)

; Revision History
; ----------------
; 16 august 2005  by David "BAILOPAN" Anderson (DA)
;       Changed JIT to not swap stack pointer during execution. Swapping it
;       is playing with fire, especially with pthreads and signals on Linux,
;       where the stack pointer is used to find the current thread id. If
;       the stack pointer is altered during a thread/signal switch/interrupt,
;       unexpected behaviour can occur (crashes).
; 26 july 2005  by David "BAILOPAN" Anderson (DA)
;       Fixed a bug where zero casetbl entries would crash the JIT.
; 17 february 2005  by Thiadmer Riemersma
;       Addition of the BREAK opcode, removal of the older debugging opcode
;       table. There should now be some debug support (if enabled during the
;       build of the JIT compiler), but not enough to run a debugger: the JIT
;       compiler does not keep a list that relates the code addresses of the
;       P-code versus the native code.
; 29 June 2004  by G.W.M. Vissers
;       Translated the thing into NASM. The actual generation of the code is
;       put into the data section because the code modifies itself whereas the
;       text section is usually read-only in the Unix ELF format.
; 6 march 2004  by Thiadmer Riemersma
;       Corrected a bug in OP_FILL, where a cell preceding the array would
;       be overwritten (zero'ed out). This bug was brought to my attention
;       by Robert Daniels.
; 22 december 2003  by Thiadmer Riemersma (TR)
;       Added the SYMTAG and SYSREQ.D opcodes (these are not really supported;
;       SYMTAG is a no-op).
;       Support the __stdcall calling convention for the native function "hook"
;       function (the __cdecl calling convention is also still supported).
; 14 October 2002 by Thiadmer Riemersma (TR)
;       Corrected the amx_s structure. The _hlw field was missing, which caused
;       errors for arguments to native functions that were passed by reference.
; 2002/08/05    TR
;       * store the status of the abstract machine in the AMX structure upon
;         return, so that the machine can be restarted (OP_SLEEP)
;       * added OP_NOP (for alignment, it is ignored by the JIT)
;       * make sure the JIT does not crash when NULL is passed for the
;         return value
; 2000/03/03    MP
;       * _amx_opcodelist is equipped with an underscore, again 8-P
;       * added SRANGE as a no-op, so debugging info doesn't upset the JIT
;         compiler anymore
;       * added note about LCTRL, SCTRL and CALL.I
; 2000/03/02    MP
;       * made JIT support __cdecl calling conventions
;       * removed some rather unnecessary pops in the epilog of amx_exec_asm
;       * changed the template for CALL into a DB byte sequence (tasm 4.1
;         didn't like the immediate value)
; 1999/12/07    MP
;       * fixed crash caused by JIT compiler not saving registers
; 1999/08/06    MP - design change: closer to the "iron" with native stack
;       * The JIT compiler now generates relocatable code for case tables by
;         setting FORCERELOCATABLE = 1.
;       * removed all debug hook code
;       * exchanged meaning of ESP and ESI in asm_exec(): now low-level calls/
;         pushes/pops are possible
;       * removed the run-time functions for the CALL, CALL_I and RET op-codes,
;         they are now inline
;       * All these changes gained around 80% performance increase for the
;         hanoi bench.
; 1999/08/05    MP
;       * fixed OP_LINE in the case of NODBGCALLS==1, where no compiled address
;         was stored for the LINE byte code (i.e. SWITCH would jump to totally
;         wrong addresses). The same fix was applied to OP_FILL, OP_FILE and
;         OP_SCTRL (for the no-op case).
; 1999/08/04    MP
;       * updated with 4 new opcodes (SRANGE does nothing at the moment; 2dim.
;         arrays have not been tested.)
;       * hacked relocation code to support absolute addresses for CASETBL
;         (This assumes that no generated address will be greater than
;         0x7fffffff. Bit no. 31 is used as flag for absolute addresses.)
;       * The run-time function for SWITCH uses a (hopefully) faster algorithm
;         to compute the destination address: It searches backwards now.
; 1999/07/08    MP - initial revision


;
; Support for the BREAK opcode (callback to the debugger): 0 = no, all other
; values = yes. Beware that the compiled code runs slower when this is enabled,
; and that debug support is still fairly minimal.
;
; GWMV: to generate LINE opcode, %define DEBUGSUPPORT
;
%undef DEBUGSUPPORT

;
; If this is set to 1 the JIT generates relocatable code for case tables, too.
; If set to 0, a faster variant for switch (using absolute addresses) is
; generated. I consider setting it to 0 a bad idea.
;
; GWMV: to use absolute addresses, %undef FORCERELOCATABLE
;
%define FORCERELOCATABLE

;
; This variable controls the generation of memory range checks at run-time.
; You should set this to 0 only when you are sure that there are no range
; violations in your Pawn programs and you really need that 5% speed gain.
;
; GWMV: To disable runtime checks, %undef it, instead of setting it to zero
;
%define DORUNTIMECHECKS

%define JIT     1
%include "amxdefn.asm"

;Registers used for JIT during execution:
; eax - pri
; ebx - reloc frame
; ecx - info params
; edx - alt
; esi - AMX stack
; edi - DAT
; ebp - scratch

;DA:
; These are still stored in the stack, but the stack pointer
; holding them is now kept in ecx.
%define stk     [ecx+32]    ; define some aliases to registers that will
%define alt     [ecx+28]    ;   be stored on the stack when the code is
%define pri     [ecx+24]    ;   actually being executed
%define code    [ecx+20]
%define amx     [ecx+16]
%define retval  [ecx+12]
%define stp     [ecx+8]
%define hea     [ecx+4]
%define frm     [ecx]       ; FRM is NOT stored in ebp, FRM+DAT is being held
                            ; in ebx instead.

;
; #define  PUSH(v)  ( stk-=sizeof(cell), *(cell *)(data+(int)stk)=v )
;
%macro _PUSH 1
        lea     esi,[esi-4]
        mov     dword [esi], %1
%endmacro

%macro _PUSHMEM 1
        lea     esi,[esi-4]
        mov     ebp, dword %1
        mov     dword [esi], ebp
%endmacro

;
; #define  POP(v)   ( v=*(cell *)(data+(int)stk), stk+=sizeof(cell) )
;
%macro _POP 1
        mov     %1, dword [esi]
        lea     esi,[esi+4]
%endmacro

;
; For determining the biggest native code section generated for ONE Pawn
; opcode. (See the following macro and the PUBLIC function getMaxCodeSize().)
;
; GWMV: Do NOT see the following macro. See CHECKCODESIZE instead.
;
%assign MAXCODESIZE      0

;
; This is the work horse of the whole JIT: It actually copies the code.
%macro GO_ON 2-3 4
        mov     esi, %1         ;get source address of JIT code
        mov     ecx,%2-%1       ;get number of bytes to copy
        mov     [ebx],edi       ;store address for jump-correction
        add     ebx,%3
        rep     movsb
        cmp     ebx,[end_code]
        jae     code_gen_done
        jmp     dword [ebx]     ;go on with the next opcode
%endmacro

; GWMV:
; Nasm can't handle the determination of the maximum code size as was done
; in the Masm implementation, since it only does two passes. This macro is
; called *after* the code for each Pawn instruction.
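;
; A minimal sketch of the intended usage pattern, modelled on the opcode
; templates further down in this file (OP_EXAMPLE, OP_NEXT and j_example are
; illustrative names only; they are not defined in this file):
;
;   OP_EXAMPLE:
;           GO_ON   j_example, OP_NEXT      ; copy the bytes j_example..OP_NEXT
;     j_example:
;           mov     eax,edx                 ; native code emitted for the opcode
;           CHECKCODESIZE j_example         ; MAXCODESIZE = max(MAXCODESIZE,
;                                           ;                   $-j_example)
;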
%macro CHECKCODESIZE 1
        %if MAXCODESIZE < $-%1
        %assign MAXCODESIZE $-%1
        %endif
%endmacro

;
; Modify the argument of an x86 instruction with the Pawn opcode's parameter
; before copying the code.
;
%macro putval 1
        mov     eax,[ebx+4]
        mov     dword [%1],eax
%endmacro

;
; Add an entry to the table of addresses which have to be relocated after the
; code compilation is done.
;
%macro RELOC 1-2        ; adr, dest
        mov     ebp,[reloc_num]
        %if %0 < 2
        mov     eax,[ebx+4]
        %else
        lea     eax,[%2]
        %endif
        mov     [edx+ebp],eax   ; write absolute destination
        lea     eax,[edi+%1]
        mov     [edx+ebp+4],eax ; write address of jump operand
        add     dword [reloc_num],8
%endmacro

%macro _DROPARGS 1              ; (TR) remove function arguments from the stack
    %ifndef STDECL              ; (for __cdecl calling convention only)
        add     esp,%1
    %endif
%endmacro


%ifdef WIN32
section .data exec
%else
section .text
%endif


global  asm_runJIT, _asm_runJIT
global  amx_exec_jit, _amx_exec_jit
global  getMaxCodeSize, _getMaxCodeSize


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;                                                                             ;
; void   asm_runJIT( AMX_HEADER *amxh, JumpAddressArray *jumps, void *dest )  ;
;                                eax                     edx          ebx     ;
;                                                                             ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; asm_runJIT() assumes that the code of this module is already browsed and
; relocated for the JIT compiler. It also assumes that both the jumps array and
; the dest memory block are large enough to hold all the data it has to write
; to them, as well as that the prefix (header) has already been copied to dest.
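;
; Note on argument access (a sketch derived from the prologue that follows,
; assuming the 32-bit stack calling convention described in NOTE 6): the
; routine pushes four registers before it reads its arguments, so the
; arguments sit above 16 bytes of saved registers plus the 4-byte return
; address:
;
;   [esp+ 0] saved esi      [esp+ 4] saved edi
;   [esp+ 8] saved ebx      [esp+12] saved ebp
;   [esp+16] return address
;   [esp+20] amxh           [esp+24] jumps          [esp+28] dest
;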
asm_runJIT:
_asm_runJIT:
        push    ebp
        push    ebx
        push    edi
        push    esi

        mov     eax,[esp+20]            ; get amxh
        mov     edx,[esp+24]            ; get jumps array
        mov     ebx,[esp+28]            ; get destination

        mov     [amxhead],eax           ; save pointer to AMX_HEADER struct
        mov     ecx,[eax+_cod]          ; get offset of start of code
        mov     eax,[eax+_dat]          ; offset of start of data = end of code
        mov     edi,ecx
        add     ecx,[amxhead]           ; compute the real pointer
        add     eax,[amxhead]           ; ditto
        add     edi,ebx                 ; get write pointer into EDI
        mov     [compiled_code],ebx
        mov     [end_code],eax          ; Store end-of-code address, so JIT
                                        ; compiler knows when to stop.
        mov     dword [reloc_num],0     ; init the index into the jumps array

        mov     ebx,ecx
        jmp     dword [ecx]             ; start compiling

        ; The compiler will jump back here when code generation is complete.

code_gen_done:                          ; Now copy the data section.
        mov     ebp,[amxhead]           ; get source AMX_HEADER start address
        add     edi,3                   ; DAT follows directly after COD
        and     edi,0fffffffch          ; align it on a DWORD boundary
        push    edi                     ; save data start pointer
        mov     esi,[end_code]          ; get start of data segment
        mov     ecx,[ebp+_h_hea]
        sub     ecx,[ebp+_dat]          ; compute length of array to copy
        rep movsb                       ; copy the data

        ; Now adjust the register values in the compiled AMX_HEADER.
        ; COD stays the same since the size of AMX_HEADER doesn't change in
        ; compiled mode.
        mov     ebx,[compiled_code]     ; get compiled AMX's header address
        pop     esi                     ; recall data start pointer
        sub     esi,ebx                 ; DAT = size of code + size of prefix
        mov     [ebx+_dat],esi          ; write corrected DAT register

        ;HEA and STP are already relative to DAT, so we don't need to fix them.

        ; Now the calls/jumps in the compiled code have to be relocated.
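        ;
        ; As a rough sketch in C-like pseudocode (variable names are
        ; illustrative only), the relocation loop does the following for
        ; each 8-byte entry that RELOC or SWITCH wrote into the jumps array:
        ;
        ;   src    = entry[0];           /* address of the target P-code cell  */
        ;   patch  = entry[1];           /* address of the 32-bit operand      */
        ;   target = *(cell *)src;       /* compiled address stored by GO_ON   */
        ;   *(cell *)patch = target - (patch + 4);  /* x86 rel32 displacement, */
        ;                                /* relative to the end of the operand */
        ;
        ; When FORCERELOCATABLE is not defined, entries with bit 31 set in
        ; entry[0] are written as absolute addresses instead (the subtraction
        ; is skipped).
        ;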
        sub     ecx,ecx         ; reset offset into relocation table
        cmp     ecx,[reloc_num]
        jae     reloc_code_done ; if there's nothing to fix, skip this part
    reloc_code_loop:
        mov     eax,[edx+ecx]   ; get destination address
        mov     edi,[edx+ecx+4] ; determine where to write the relocated value
        add     ecx,8           ; set pointer to next entry in relocation table
        add     edi,4           ; base address from where the offset is taken
%ifndef FORCERELOCATABLE
        ;MP: hack to support absolute addresses for the CASETBL instruction
        test    eax,80000000h   ; check whether it is an absolute address
        pushf
        and     eax,7fffffffh   ; clear the flag bit for absolute addresses
        popf
        mov     eax,[eax]       ; translate into compiled absolute address
        jne     write_reloc     ; leave out the subtraction if absolute
%else
        mov     eax,[eax]       ; translate into compiled absolute address
%endif
        sub     eax,edi         ; make a relative offset
      write_reloc:
        mov     [edi-4],eax     ; write the relocated address
        cmp     ecx,[reloc_num]
        jb      reloc_code_loop

reloc_code_done:
        ; Relocate the addresses in the AMX_HEADER structure. (CIP and publics)
        add     ebp,[ebp+_cod]  ; make all addresses relative to COD, not base
        mov     eax,[ebx+_h_cip]
        add     eax,ebp         ; get absolute source CIP
        mov     eax,[eax]       ; translate CIP to compiled address
        sub     eax,ebx         ; make it relative to base
        sub     eax,[ebx+_cod]  ; and now relative to COD
        mov     [ebx+_h_cip],eax; store relocated CIP
        mov     edi,[ebx+_publics]
        sub     esi,esi
        mov     ecx,[ebx+_natives]
        sub     ecx,edi         ; ECX = _natives - _publics = public table size
        mov     si,[ebx+_defsize]
        or      ecx,ecx
        jz      reloc_done      ; If there are no publics, we are done.
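        ;
        ; The public function addresses are translated the same way as CIP.
        ; As a sketch in C-like pseudocode (names are illustrative only;
        ; "old" is the offset stored in the table, relative to COD):
        ;
        ;   compiled = *(cell *)(src_base + src_cod + old); /* absolute adr.  */
        ;   new      = compiled - dest_base - dest_cod;     /* relative to COD */
        ;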
reloc_publics_loop: mov eax,[ebx+edi] ; get public function offset add eax,ebp ; make it a source address mov eax,[eax] ; translate to compiled address sub eax,ebx ; make it an offset relative to base sub eax,[ebx+_cod] ; and now relative to COD mov [ebx+edi],eax ; write corrected address back add edi,esi ; step to next public function entry sub ecx,esi ja reloc_publics_loop reloc_done: mov eax,0 pop esi pop edi pop ebx pop ebp ret ; GWMV: ; The code below modifies itself to store the arguments to the Pawn opcodes ; in the compiled code. This is fine, but the .text section in an ELF executable ; is usually marked read-only, that's why this code is in the .data section. %ifndef WIN32 section .data exec %endif OP_LOAD_PRI: ;nop; putval j_load_pri+2 GO_ON j_load_pri, OP_LOAD_ALT, 8 j_load_pri: mov eax,[edi+12345678h] CHECKCODESIZE j_load_pri OP_LOAD_ALT: ;nop; putval j_load_alt+2 GO_ON j_load_alt, OP_LOAD_S_PRI, 8 j_load_alt: mov edx,[edi+12345678h] CHECKCODESIZE j_load_alt ;good OP_LOAD_S_PRI: ;nop; putval j_load_s_pri+2 GO_ON j_load_s_pri, OP_LOAD_S_ALT, 8 j_load_s_pri: mov eax,[ebx+12345678h] CHECKCODESIZE j_load_s_pri ;good OP_LOAD_S_ALT: ;nop; putval j_load_s_alt+2 GO_ON j_load_s_alt, OP_LOAD_I, 8 j_load_s_alt: mov edx,[ebx+12345678h] CHECKCODESIZE j_load_s_alt OP_LOAD_I: ;nop; GO_ON j_load_i, OP_LODB_I j_load_i: %ifdef DORUNTIMECHECKS call [verify_adr_eax] %endif mov eax,[edi+eax] CHECKCODESIZE j_load_i OP_LODB_I: ;nop; mov eax,[ebx+4] mov eax,dword [(lodb_and-4)+eax*4] mov dword [j_lodb_i_sm+1],eax ;modify AND instruction GO_ON j_lodb_i, OP_LREF_PRI, 8 j_lodb_i: %ifdef DORUNTIMECHECKS call [verify_adr_eax] %endif mov eax,[edi+eax] ;subject to misalignment stalls j_lodb_i_sm: and eax,12345678h CHECKCODESIZE j_lodb_i OP_LREF_PRI: ;nop; putval j_lref_pri+2 GO_ON j_lref_pri, OP_LREF_ALT, 8 j_lref_pri: mov eax,[edi+12345678h] mov eax,[edi+eax] CHECKCODESIZE j_lref_pri OP_LREF_ALT: ;nop; putval j_lref_alt+2 GO_ON j_lref_alt, OP_LREF_S_PRI, 8 j_lref_alt: mov 
edx,[edi+12345678h] mov edx,[edi+edx] CHECKCODESIZE j_lref_alt OP_LREF_S_PRI: ;nop; putval j_lref_s_pri+2 GO_ON j_lref_s_pri, OP_LREF_S_ALT, 8 j_lref_s_pri: mov eax,[ebx+12345678h] mov eax,[edi+eax] CHECKCODESIZE j_lref_s_pri OP_LREF_S_ALT: ;nop; putval j_lref_s_alt+2 GO_ON j_lref_s_alt, OP_CONST_PRI, 8 j_lref_s_alt: mov edx,[ebx+12345678h] mov edx,[edi+edx] CHECKCODESIZE j_lref_s_alt ;good OP_CONST_PRI: ;nop; putval j_const_pri+1 GO_ON j_const_pri, OP_CONST_ALT, 8 j_const_pri: mov eax,12345678h CHECKCODESIZE j_const_pri ;good OP_CONST_ALT: ;nop; putval j_const_alt+1 GO_ON j_const_alt, OP_ADDR_PRI, 8 j_const_alt: mov edx,12345678h CHECKCODESIZE j_const_alt ;good OP_ADDR_PRI: ;nop; putval j_addr_pri+1 GO_ON j_addr_pri, OP_ADDR_ALT, 8 j_addr_pri: mov eax,12345678h add eax,frm CHECKCODESIZE j_addr_pri ;good OP_ADDR_ALT: ;nop; putval j_addr_alt+1 GO_ON j_addr_alt, OP_STOR_PRI, 8 j_addr_alt: mov edx,12345678h add edx,frm CHECKCODESIZE j_addr_alt OP_STOR_PRI: ;nop; putval j_stor_pri+2 GO_ON j_stor_pri, OP_STOR_ALT, 8 j_stor_pri: mov [edi+12345678h],eax CHECKCODESIZE j_stor_pri OP_STOR_ALT: ;nop; putval j_stor_alt+2 GO_ON j_stor_alt, OP_STOR_S_PRI, 8 j_stor_alt: mov [edi+12345678h],edx CHECKCODESIZE j_stor_alt ;good OP_STOR_S_PRI: ;nop; putval j_stor_s_pri+2 GO_ON j_stor_s_pri, OP_STOR_S_ALT, 8 j_stor_s_pri: mov [ebx+12345678h],eax CHECKCODESIZE j_stor_s_pri ;good OP_STOR_S_ALT: ;nop; putval j_stor_s_alt+2 GO_ON j_stor_s_alt, OP_STOR_I, 8 j_stor_s_alt: mov [ebx+12345678h],edx CHECKCODESIZE j_stor_s_alt ;good OP_STOR_I: ;nop; GO_ON j_stor_i, OP_STRB_I j_stor_i: %ifdef DORUNTIMECHECKS call [verify_adr_edx] %endif mov [edi+edx],eax CHECKCODESIZE j_stor_i OP_STRB_I: ;nop; mov eax,[ebx+4] cmp eax,1 jne strb_not1byte GO_ON j_strb_i_1b, strb_not1byte, 8 j_strb_i_1b: %ifdef DORUNTIMECHECKS call [verify_adr_edx] %endif mov [edi+edx],al CHECKCODESIZE j_strb_i_1b strb_not1byte: cmp eax,4 je strb_4byte GO_ON j_strb_i_2b, strb_4byte, 8 j_strb_i_2b: %ifdef DORUNTIMECHECKS call 
[verify_adr_edx] %endif mov [edi+edx],ax CHECKCODESIZE j_strb_i_2b strb_4byte: GO_ON j_strb_i_4b, OP_SREF_PRI, 8 j_strb_i_4b: %ifdef DORUNTIMECHECKS call [verify_adr_edx] %endif mov [edi+edx],eax CHECKCODESIZE j_strb_i_4b OP_SREF_PRI: ;nop; putval j_sref_pri+2 GO_ON j_sref_pri, OP_SREF_ALT, 8 j_sref_pri: mov ebp,[edi+12345678h] mov [edi+ebp],eax CHECKCODESIZE j_sref_pri OP_SREF_ALT: ;nop; putval j_sref_alt+2 GO_ON j_sref_alt, OP_SREF_S_PRI, 8 j_sref_alt: mov ebp,[edi+12345678h] mov [edi+ebp],edx CHECKCODESIZE j_sref_alt OP_SREF_S_PRI: ;nop; putval j_sref_s_pri+2 GO_ON j_sref_s_pri, OP_SREF_S_ALT, 8 j_sref_s_pri: mov ebp,[ebx+12345678h] mov [edi+ebp],eax CHECKCODESIZE j_sref_s_pri OP_SREF_S_ALT: ;nop; putval j_sref_s_alt+2 GO_ON j_sref_s_alt, OP_LIDX, 8 j_sref_s_alt: mov ebp,[ebx+12345678h] mov [edi+ebp],edx CHECKCODESIZE j_sref_s_alt ;good OP_LIDX: ;nop; GO_ON j_lidx, OP_LIDX_B j_lidx: lea eax,[edx+4*eax] %ifdef DORUNTIMECHECKS call [verify_adr_eax] %endif mov eax,[edi+eax] CHECKCODESIZE j_lidx OP_LIDX_B: ;nop; mov al,[ebx+4] mov byte [j_lidx_b+2],al GO_ON j_lidx_b, OP_IDXADDR, 8 j_lidx_b: shl eax,12h add eax,edx %ifdef DORUNTIMECHECKS call [verify_adr_eax] %endif mov eax,[edi+eax] CHECKCODESIZE j_lidx_b ;good OP_IDXADDR: ;nop; GO_ON j_idxaddr, OP_IDXADDR_B j_idxaddr: lea eax,[edx+4*eax] CHECKCODESIZE j_idxaddr OP_IDXADDR_B: ;nop; mov al,[ebx+4] mov byte [j_idxaddr_b+2],al GO_ON j_idxaddr_b, OP_ALIGN_PRI, 8 j_idxaddr_b: shl eax,12h add eax,edx CHECKCODESIZE j_idxaddr_b OP_ALIGN_PRI: ;nop; mov eax,4 sub eax,[ebx+4] mov dword [j_align_pri+1],eax GO_ON j_align_pri, OP_ALIGN_ALT, 8 j_align_pri: xor eax,12345678h CHECKCODESIZE j_align_pri OP_ALIGN_ALT: ;nop; mov eax,4 sub eax,[ebx+4] mov dword [j_align_alt+1],eax GO_ON j_align_alt, OP_LCTRL, 8 j_align_alt: xor edx,12345678h CHECKCODESIZE j_align_alt OP_LCTRL: ;nop; mov eax,[ebx+4] cmp eax,0 jne lctrl_1 GO_ON j_lctrl_0, lctrl_1, 8 j_lctrl_0: mov eax,code ; 1=COD CHECKCODESIZE j_lctrl_0 lctrl_1: cmp eax,1 jne lctrl_2 
GO_ON j_lctrl_1, lctrl_2, 8 j_lctrl_1: mov eax,edi ; 1=DAT CHECKCODESIZE j_lctrl_1 lctrl_2: cmp eax,2 jne lctrl_3 GO_ON j_lctrl_2, lctrl_3, 8 j_lctrl_2: mov eax,hea ; 2=HEA CHECKCODESIZE j_lctrl_2 lctrl_3: cmp eax,3 jne lctrl_4 GO_ON j_lctrl_3, lctrl_4, 8 j_lctrl_3: mov ebp,amx mov eax,[ebp+_stp] CHECKCODESIZE j_lctrl_3 lctrl_4: cmp eax,4 jne lctrl_5 GO_ON j_lctrl_4, lctrl_5, 8 j_lctrl_4: mov eax,esi ; 4=STK sub eax,edi CHECKCODESIZE j_lctrl_4 lctrl_5: cmp eax,5 jne lctrl_6 GO_ON j_lctrl_5, lctrl_6, 8 j_lctrl_5: mov eax,frm ; 5=FRM CHECKCODESIZE j_lctrl_5 lctrl_6: mov dword [j_lctrl_6+1],edi GO_ON j_lctrl_6, OP_SCTRL, 8 j_lctrl_6: mov eax,12345678h ; 6=CIP CHECKCODESIZE j_lctrl_6 OP_SCTRL: ;nop; mov eax,[ebx+4] cmp eax,2 jne sctrl_4 GO_ON j_sctrl_2, sctrl_4, 8 j_sctrl_2: mov hea,eax ; 2=HEA CHECKCODESIZE j_sctrl_2 sctrl_4: cmp eax,4 jne sctrl_5 GO_ON j_sctrl_4, sctrl_5, 8 j_sctrl_4: ;mov esp,eax ; 4=STK ;add esp,edi ; relocate stack lea esi,[eax+edi] CHECKCODESIZE j_sctrl_4 sctrl_5: cmp eax,5 jne sctrl_ignore GO_ON j_sctrl_5, sctrl_ignore, 8 j_sctrl_5: mov ebx,eax ; 5=FRM mov frm,eax add ebx,edi ; relocate frame CHECKCODESIZE j_sctrl_5 sctrl_ignore: mov [ebx],edi add ebx,8 jmp dword [ebx] OP_MOVE_PRI: ;nop; GO_ON j_move_pri, OP_MOVE_ALT j_move_pri: mov eax,edx CHECKCODESIZE j_move_pri ;good OP_MOVE_ALT: ;nop; GO_ON j_move_alt, OP_XCHG j_move_alt: mov edx,eax CHECKCODESIZE j_move_alt OP_XCHG: ;nop; GO_ON j_xchg, OP_PUSH_PRI j_xchg: ;one might use pushes/pops for pre-586's xchg eax,edx CHECKCODESIZE j_xchg ;good OP_PUSH_PRI: ;nop; GO_ON j_push_pri, OP_PUSH_ALT j_push_pri: _PUSH eax CHECKCODESIZE j_push_pri ;good OP_PUSH_ALT: ;nop; GO_ON j_push_alt, OP_PUSH_R_PRI j_push_alt: _PUSH edx CHECKCODESIZE j_push_alt OP_PUSH_R_PRI: ;nop; putval j_push_r_pri+2 GO_ON j_push_r_pri, OP_PUSH_C, 8 j_push_r_pri: push ecx mov ecx,12345678h j_push_loop: _PUSH eax loop j_push_loop pop ecx ;dec ecx ;jnz j_push_loop CHECKCODESIZE j_push_r_pri ;good OP_PUSH_C: ;nop; putval j_push_c_end-4 
GO_ON j_push_c, OP_PUSH, 8 j_push_c: _PUSH 12345678h j_push_c_end: CHECKCODESIZE j_push_c OP_PUSH: ;nop; putval j_push_end-6 GO_ON j_push, OP_PUSH_S, 8 j_push: _PUSHMEM [edi+12345678h] j_push_end: CHECKCODESIZE j_push ;good OP_PUSH_S: ;nop; putval j_push_s_end-6 GO_ON j_push_s, OP_POP_PRI, 8 j_push_s: _PUSHMEM [ebx+12345678h] j_push_s_end: CHECKCODESIZE j_push_s OP_POP_PRI: ;nop; GO_ON j_pop_pri, OP_POP_ALT j_pop_pri: _POP eax CHECKCODESIZE j_pop_pri ;good OP_POP_ALT: ;nop; GO_ON j_pop_alt, OP_STACK j_pop_alt: _POP edx CHECKCODESIZE j_pop_alt ;good OP_STACK: ;nop; putval j_stack+4 GO_ON j_stack, OP_HEAP, 8 j_stack: mov edx,esi add esi,12345678h sub edx,edi %ifdef DORUNTIMECHECKS call [chk_marginstack] %endif CHECKCODESIZE j_stack ;good OP_HEAP: ;nop; putval j_heap_call-4 GO_ON j_heap, OP_PROC, 8 j_heap: mov edx,hea add dword hea,12345678h j_heap_call: %ifdef DORUNTIMECHECKS call [chk_marginheap] %endif CHECKCODESIZE j_heap ;good OP_PROC: ;nop; GO_ON j_proc, OP_RET j_proc: ;[STK] = FRM, STK = STK - cell size, FRM = STK _PUSHMEM frm ; push old frame (for RET/RETN) mov frm,esi ; get new frame mov ebx,esi ; already relocated sub frm,edi ; relocate frame CHECKCODESIZE j_proc OP_RET: ;nop; GO_ON j_ret, OP_RETN j_ret: _POP ebx ; pop frame lea esi,[esi+4] mov frm,ebx add ebx,edi ret ;call [jit_ret] CHECKCODESIZE j_ret ;good OP_RETN: ;nop; GO_ON j_retn, OP_CALL j_retn: jmp [jit_retn] CHECKCODESIZE j_retn ;good OP_CALL: ;nop; RELOC j_call_e8-j_call+1 GO_ON j_call, OP_CALL_I, 8 j_call: ;call 12345678h ; tasm chokes on this out of a sudden _PUSH 0 j_call_e8 db 0e8h, 0, 0, 0, 0 CHECKCODESIZE j_call OP_CALL_I: ;nop; GO_ON j_call_i, OP_JUMP j_call_i: _PUSH 0 call eax CHECKCODESIZE j_call_i ;good OP_JUMP: ;nop; RELOC 1 GO_ON j_jump, OP_JREL, 8 j_jump: DB 0e9h DD 12345678h CHECKCODESIZE j_jump OP_JREL: ;nop; mov eax,[ebx+4] ; create an absolute address from the relative one RELOC 1, eax+ebx+8 ; GWMV: is the next line really correct!? 
GO_ON j_jump, OP_JREL, 8 ;good OP_JZER: ;nop; RELOC 4 GO_ON j_jzer, OP_JNZ, 8 j_jzer: or eax,eax DB 0fh, 84h, 0, 0, 0, 0 ;jz NEAR 0 (tasm sucks a bit) CHECKCODESIZE j_jzer ;good OP_JNZ: ;nop; RELOC 4 GO_ON j_jnz, OP_JEQ, 8 j_jnz: or eax,eax DB 0fh, 85h, 0, 0, 0, 0 ;jnz NEAR 0 CHECKCODESIZE j_jnz ;good OP_JEQ: ;nop; RELOC 4 GO_ON j_jeq, OP_JNEQ, 8 j_jeq: cmp eax,edx DB 0fh, 84h, 0, 0, 0, 0 ;je NEAR 0 (tasm sucks a bit) CHECKCODESIZE j_jeq OP_JNEQ: ;nop; RELOC 4 GO_ON j_jneq, OP_JLESS, 8 j_jneq: cmp eax,edx DB 0fh, 85h, 0, 0, 0, 0 ;jne NEAR 0 (tasm sucks a bit) CHECKCODESIZE j_jneq OP_JLESS: ;nop; RELOC 4 GO_ON j_jless, OP_JLEQ, 8 j_jless: cmp eax,edx DB 0fh, 82h, 0, 0, 0, 0 ;jb NEAR 0 (tasm sucks a bit) CHECKCODESIZE j_jless OP_JLEQ: ;nop; RELOC 4 GO_ON j_jleq, OP_JGRTR, 8 j_jleq: cmp eax,edx DB 0fh, 86h, 0, 0, 0, 0 ;jbe NEAR 0 (tasm sucks a bit) CHECKCODESIZE j_jleq OP_JGRTR: ;nop; RELOC 4 GO_ON j_jgrtr, OP_JGEQ, 8 j_jgrtr: cmp eax,edx DB 0fh, 87h, 0, 0, 0, 0 ;ja NEAR 0 (tasm sucks a bit) CHECKCODESIZE j_jgrtr OP_JGEQ: ;nop; RELOC 4 GO_ON j_jgeq, OP_JSLESS, 8 j_jgeq: cmp eax,edx DB 0fh, 83h, 0, 0, 0, 0 ;jae NEAR 0 (unsigned comparison) CHECKCODESIZE j_jgeq OP_JSLESS: ;nop; RELOC 4 GO_ON j_jsless, OP_JSLEQ, 8 j_jsless: cmp eax,edx DB 0fh, 8ch, 0, 0, 0, 0 ;jl NEAR 0 CHECKCODESIZE j_jsless ;good OP_JSLEQ: ;nop; RELOC 4 GO_ON j_jsleq, OP_JSGRTR, 8 j_jsleq: cmp eax,edx DB 0fh, 8eh, 0, 0, 0, 0 ;jle NEAR 0 CHECKCODESIZE j_jsleq OP_JSGRTR: ;nop; RELOC 4 GO_ON j_jsgrtr, OP_JSGEQ, 8 j_jsgrtr: cmp eax,edx DB 0fh, 8Fh, 0, 0, 0, 0 ;jg NEAR 0 CHECKCODESIZE j_jsgrtr OP_JSGEQ: ;nop; RELOC 4 GO_ON j_jsgeq, OP_SHL, 8 j_jsgeq: cmp eax,edx DB 0fh, 8dh, 0, 0, 0, 0 ;jge NEAR 0 CHECKCODESIZE j_jsgeq OP_SHL: ;nop; GO_ON j_shl, OP_SHR j_shl: push ecx mov ecx,edx shl eax,cl pop ecx CHECKCODESIZE j_shl OP_SHR: ;nop; GO_ON j_shr, OP_SSHR j_shr: push ecx mov ecx,edx shr eax,cl pop ecx CHECKCODESIZE j_shr OP_SSHR: ;nop; GO_ON j_sshr, OP_SHL_C_PRI j_sshr: push ecx mov ecx,edx sar eax,cl pop ecx 
CHECKCODESIZE j_sshr OP_SHL_C_PRI: ;nop; mov al,[ebx+4] mov byte [j_shl_c_pri+2],al GO_ON j_shl_c_pri, OP_SHL_C_ALT, 8 j_shl_c_pri: shl eax,12h CHECKCODESIZE j_shl_c_pri OP_SHL_C_ALT: ;nop; mov al,[ebx+4] mov byte [j_shl_c_alt+2],al GO_ON j_shl_c_alt, OP_SHR_C_PRI, 8 j_shl_c_alt: shl edx,12h CHECKCODESIZE j_shl_c_alt OP_SHR_C_PRI: ;nop; mov al,[ebx+4] mov byte [j_shr_c_pri+2],al GO_ON j_shr_c_pri, OP_SHR_C_ALT, 8 j_shr_c_pri: shr eax,12h CHECKCODESIZE j_shr_c_pri OP_SHR_C_ALT: ;nop; mov al,[ebx+4] mov byte [j_shr_c_alt+2],al GO_ON j_shr_c_alt, OP_SMUL, 8 j_shr_c_alt: shr edx,12h CHECKCODESIZE j_shr_c_alt OP_SMUL: ;nop; GO_ON j_smul, OP_SDIV j_smul: push edx imul edx pop edx CHECKCODESIZE j_smul ;good OP_SDIV: ;nop; GO_ON j_sdiv, OP_SDIV_ALT j_sdiv: call [jit_sdiv] CHECKCODESIZE j_sdiv OP_SDIV_ALT: ;nop; GO_ON j_sdiv_alt, OP_UMUL j_sdiv_alt: xchg eax,edx call [jit_sdiv] CHECKCODESIZE j_sdiv_alt OP_UMUL: ;nop; GO_ON j_umul, OP_UDIV j_umul: push edx mul edx pop edx CHECKCODESIZE j_umul OP_UDIV: ;nop; GO_ON j_udiv, OP_UDIV_ALT j_udiv: mov ebp,edx sub edx,edx call [chk_dividezero] div ebp CHECKCODESIZE j_udiv OP_UDIV_ALT: ;nop; GO_ON j_udiv_alt, OP_ADD j_udiv_alt: mov ebp,eax mov eax,edx sub edx,edx call [chk_dividezero] div ebp CHECKCODESIZE j_udiv_alt ;good OP_ADD: ;nop; GO_ON j_add, OP_SUB j_add: add eax,edx CHECKCODESIZE j_add ;good OP_SUB: ;nop; GO_ON j_sub, OP_SUB_ALT j_sub: sub eax,edx CHECKCODESIZE j_sub ;good OP_SUB_ALT: ;nop; GO_ON j_sub_alt, OP_AND j_sub_alt: neg eax add eax,edx CHECKCODESIZE j_sub_alt OP_AND: ;nop; GO_ON j_and, OP_OR j_and: and eax,edx CHECKCODESIZE j_and OP_OR: ;nop; GO_ON j_or, OP_XOR j_or: or eax,edx CHECKCODESIZE j_or OP_XOR: ;nop; GO_ON j_xor, OP_NOT j_xor: xor eax,edx CHECKCODESIZE j_xor OP_NOT: ;nop; GO_ON j_not, OP_NEG j_not: neg eax ; sets CF iff EAX != 0 sbb eax,eax ; EAX == -1 iff CF set (zero otherwise) inc eax ; -1 => 0 and 0 => 1 CHECKCODESIZE j_not OP_NEG: ;nop; GO_ON j_neg, OP_INVERT j_neg: neg eax CHECKCODESIZE j_neg 
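;
; Worked example for the branch-free j_not template above (OP_NOT computes
; the logical negation of PRI):
;
;   EAX = 5:  neg eax -> EAX = -5, CF = 1
;             sbb eax,eax -> EAX = -1
;             inc eax -> EAX = 0
;   EAX = 0:  neg eax -> EAX = 0, CF = 0
;             sbb eax,eax -> EAX = 0
;             inc eax -> EAX = 1
;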
OP_INVERT: ;nop; GO_ON j_invert, OP_ADD_C j_invert: not eax CHECKCODESIZE j_invert ;good OP_ADD_C: ;nop; putval j_add_c+1 GO_ON j_add_c, OP_SMUL_C, 8 j_add_c: add eax,12345678h CHECKCODESIZE j_add_c ;good OP_SMUL_C: ;nop; putval j_smul_c+3 GO_ON j_smul_c, OP_ZERO_PRI, 8 j_smul_c: push edx imul eax,12345678h pop edx CHECKCODESIZE j_smul_c ;good OP_ZERO_PRI: ;nop; GO_ON j_zero_pri, OP_ZERO_ALT j_zero_pri: sub eax,eax CHECKCODESIZE j_zero_pri ;good OP_ZERO_ALT: ;nop; GO_ON j_zero_alt, OP_ZERO j_zero_alt: sub edx,edx CHECKCODESIZE j_zero_alt OP_ZERO: ;nop; putval j_zero+2 GO_ON j_zero, OP_ZERO_S, 8 j_zero: mov dword [edi+12345678h],0 CHECKCODESIZE j_zero OP_ZERO_S: ;nop; putval j_zero_s+2 GO_ON j_zero_s, OP_SIGN_PRI, 8 j_zero_s: mov dword [ebx+12345678h],0 CHECKCODESIZE j_zero_s OP_SIGN_PRI: ;nop; GO_ON j_sign_pri, OP_SIGN_ALT j_sign_pri: shl eax,24 sar eax,24 CHECKCODESIZE j_sign_pri OP_SIGN_ALT: ;nop; GO_ON j_sign_alt, OP_EQ j_sign_alt: shl edx,24 sar edx,24 CHECKCODESIZE j_sign_alt OP_EQ: ;nop; GO_ON j_eq, OP_NEQ j_eq: cmp eax,edx ; PRI == ALT ? mov eax,0 sete al CHECKCODESIZE j_eq OP_NEQ: ;nop; GO_ON j_neq, OP_LESS j_neq: cmp eax,edx ; PRI != ALT ? mov eax,0 setne al CHECKCODESIZE j_neq OP_LESS: ;nop; GO_ON j_less, OP_LEQ j_less: cmp eax,edx ; PRI < ALT ? (unsigned) mov eax,0 setb al CHECKCODESIZE j_less OP_LEQ: ;nop; GO_ON j_leq, OP_GRTR j_leq: cmp eax,edx ; PRI <= ALT ? (unsigned) mov eax,0 setbe al CHECKCODESIZE j_leq OP_GRTR: ;nop; GO_ON j_grtr, OP_GEQ j_grtr: cmp eax,edx ; PRI > ALT ? (unsigned) mov eax,0 seta al CHECKCODESIZE j_grtr OP_GEQ: ;nop; GO_ON j_geq, OP_SLESS j_geq: cmp eax,edx ; PRI >= ALT ? (unsigned) mov eax,0 setae al CHECKCODESIZE j_geq ;good OP_SLESS: ;nop; GO_ON j_sless, OP_SLEQ j_sless: cmp eax,edx ; PRI < ALT ? (signed) mov eax,0 setl al CHECKCODESIZE j_sless OP_SLEQ: ;nop; GO_ON j_sleq, OP_SGRTR j_sleq: cmp eax,edx ; PRI <= ALT ? 
(signed) mov eax,0 setle al CHECKCODESIZE j_sleq OP_SGRTR: ;nop; GO_ON j_sgrtr, OP_SGEQ j_sgrtr: cmp eax,edx ; PRI > ALT ? (signed) mov eax,0 setg al CHECKCODESIZE j_sgrtr OP_SGEQ: ;nop; GO_ON j_sgeq, OP_EQ_C_PRI j_sgeq: cmp eax,edx ; PRI >= ALT ? (signed) mov eax,0 setge al CHECKCODESIZE j_sgeq OP_EQ_C_PRI: ;nop; putval j_eq_c_pri+1 GO_ON j_eq_c_pri, OP_EQ_C_ALT, 8 j_eq_c_pri: cmp eax,12345678h ; PRI == value ? mov eax,0 sete al CHECKCODESIZE j_eq_c_pri OP_EQ_C_ALT: ;nop; putval j_eq_c_alt+4 GO_ON j_eq_c_alt, OP_INC_PRI, 8 j_eq_c_alt: sub eax,eax cmp edx,12345678h ; ALT == value ? sete al CHECKCODESIZE j_eq_c_alt OP_INC_PRI: ;nop; GO_ON j_inc_pri, OP_INC_ALT j_inc_pri: inc eax CHECKCODESIZE j_inc_pri OP_INC_ALT: ;nop; GO_ON j_inc_alt, OP_INC j_inc_alt: inc edx CHECKCODESIZE j_inc_alt OP_INC: ;nop; putval j_inc+2 GO_ON j_inc, OP_INC_S, 8 j_inc: inc dword [edi+12345678h] CHECKCODESIZE j_inc ;good OP_INC_S: ;nop; putval j_inc_s+2 GO_ON j_inc_s, OP_INC_I, 8 j_inc_s: inc dword [ebx+12345678h] CHECKCODESIZE j_inc_s OP_INC_I: ;nop; GO_ON j_inc_i, OP_DEC_PRI j_inc_i: inc dword [edi+eax] CHECKCODESIZE j_inc_i OP_DEC_PRI: ;nop; GO_ON j_dec_pri, OP_DEC_ALT j_dec_pri: dec eax CHECKCODESIZE j_dec_pri OP_DEC_ALT: ;nop; GO_ON j_dec_alt, OP_DEC j_dec_alt: dec edx CHECKCODESIZE j_dec_alt OP_DEC: ;nop; putval j_dec+2 GO_ON j_dec, OP_DEC_S, 8 j_dec: dec dword [edi+12345678h] CHECKCODESIZE j_dec OP_DEC_S: ;nop; putval j_dec_s+2 GO_ON j_dec_s, OP_DEC_I, 8 j_dec_s: dec dword [ebx+12345678h] CHECKCODESIZE j_dec_s OP_DEC_I: ;nop; GO_ON j_dec_i, OP_MOVS j_dec_i: dec dword [edi+eax] CHECKCODESIZE j_dec_i OP_MOVS: ;nop; putval j_movs+2 GO_ON j_movs, OP_CMPS, 8 j_movs: push ecx mov ecx,12345678h call [jit_movs] pop ecx CHECKCODESIZE j_movs OP_CMPS: ;nop; putval j_cmps+2 GO_ON j_cmps, OP_FILL, 8 j_cmps: push ecx mov ecx,12345678h call [jit_cmps] pop ecx CHECKCODESIZE j_cmps OP_FILL: ;nop; putval j_fill+2 GO_ON j_fill, OP_HALT, 8 j_fill: push ecx mov ecx,12345678h ;TODO: save ECX if used as 
                                ; special register
        call    [jit_fill]
        pop     ecx
        CHECKCODESIZE j_fill

;good
OP_HALT:
;nop;
        putval  j_halt_sm+1
        GO_ON   j_halt, OP_BOUNDS, 8
    j_halt:
        cmp     dword retval,0
        je      j_halt_no_value
        mov     ebp,retval
        mov     [ebp],eax
    j_halt_no_value:
    j_halt_sm:
        mov     eax,12345678h
        jmp     [jit_return]
        CHECKCODESIZE j_halt

;good
OP_BOUNDS:
;nop;
        putval  j_bounds+1
        GO_ON   j_bounds, OP_SYSREQ_C, 8
    j_bounds:
        mov     ebp,12345678h
        call    [jit_bounds]
        CHECKCODESIZE j_bounds

;good
OP_SYSREQ_C:
;nop;
        putval  j_sysreq_c+1
        GO_ON   j_sysreq_c, OP_SYSREQ_PRI, 8
    j_sysreq_c:
        mov     eax,12345678h   ; get function number
    j_sysreq:
        call    [jit_sysreq]
        CHECKCODESIZE j_sysreq_c
        ; GWMV: oh well, it may look stupid, but I don't want to miss anything
        CHECKCODESIZE j_sysreq

OP_SYSREQ_PRI:
;nop;
        GO_ON   j_sysreq, OP_SYSREQ_PRI

OP_FILE:
;opcode is simply ignored
;nop;
        mov     eax,[ebx+4]     ;get size
        mov     [ebx],edi
        lea     ebx,[ebx+eax+8] ;move on to next opcode
        cmp     ebx,dword [end_code]
        jae     code_gen_done
        jmp     dword [ebx]     ;go on with the next opcode

OP_LINE:
;nop;
        mov     [ebx],edi       ; no line number support: ignore opcode
        add     ebx,12          ; move on to next opcode
        cmp     ebx,[end_code]
        jae     code_gen_done
        jmp     dword [ebx]     ; go on with the next opcode

OP_SYMBOL:
;ignored
        mov     [ebx],edi
        mov     eax,[ebx+4]     ; get size
        lea     ebx,[ebx+eax+8] ; move on to next opcode
        cmp     ebx,[end_code]
        jae     code_gen_done
        jmp     dword [ebx]     ; go on with the next opcode

OP_SRANGE:
;ignored
        mov     [ebx],edi       ; store relocated address
        add     ebx,12          ; move on to next opcode
        cmp     ebx,[end_code]
        jae     code_gen_done
        jmp     dword [ebx]     ; go on with the next opcode

;not tested
OP_JUMP_PRI:
        GO_ON   j_jump_pri, OP_SWITCH
    j_jump_pri:                 ; MP: This opcode makes sense only in con-
        jmp     [eax]           ; junction with a possibility to get the
                                ; address of a code location...
        CHECKCODESIZE j_jump_pri

;good
OP_SWITCH:
        lea     eax,[edi+6]     ; The case table will be copied directly
        neg     eax             ; after the run-time call to [jit_switch].
        and     eax,3           ; We should align this table on a DWORD
        mov     ecx,eax         ; boundary.
        mov     al,90h          ; 90h = opcode of x86 NOP instruction
        rep     stosb           ; Write the right number of NOPs.
        mov     [ebx],edi       ; store address of SWITCH for relocation step
        mov     esi, j_switch
        mov     ecx,6
        rep     movsb           ; copy the call instruction
        mov     esi,[ebx+4]     ; get address of CASETBL instruction
        add     ebx,8           ; set instruction pointer to next opcode
        add     esi,4           ; point esi to first entry: (count, default adr)
        mov     ecx,[esi]       ; get number of cases (excluding default)
        inc     ecx
        mov     ebp,[reloc_num]
    j_case_loop:
        mov     eax,[esi]       ; get case value
        stosd                   ; write it
        mov     eax,[esi+4]     ; get destination address
%ifndef FORCERELOCATABLE
        or      eax,80000000h   ; add flag for "absolute address"
%endif
        mov     [edx+ebp],eax   ; write dest. adr. into relocation table
        mov     eax,[esi+4]     ; get destination address (again)
        add     esi,8           ; set ESI to next case
        mov     [edx+ebp+4],edi ; write adr. to patch into relocation table
        add     ebp,8           ; promote relocation pointer
        stosd                   ; write dest. adr.
        dec     ecx
        jnz     j_case_loop
        mov     dword [reloc_num],ebp   ; write back updated reloc_num
        jmp     [ebx]           ; GO_ON to next op-code

    j_switch:
        call    [jit_switch]

;good
OP_CASETBL:                     ; compiles to nothing, SWITCH does all the work
        mov     eax,[ebx+4]     ; get count of cases
        lea     ebx,[ebx+8*eax+(8+4)]   ; adjust instruction pointer
        jmp     [ebx]           ; GO_ON with next op-code

OP_SWAP_PRI:                    ; TR
        GO_ON   j_swap_pri, OP_SWAP_ALT
    j_swap_pri:
        _POP    ebp
        _PUSH   eax
        mov     eax,ebp
        CHECKCODESIZE j_swap_pri

OP_SWAP_ALT:                    ; TR
        GO_ON   j_swap_alt, OP_PUSHADDR
    j_swap_alt:
        _POP    ebp
        _PUSH   edx
        mov     edx,ebp
        CHECKCODESIZE j_swap_alt

OP_PUSHADDR:                    ; TR
        putval  j_pushaddr+1
        GO_ON   j_pushaddr, OP_NOP, 8
    j_pushaddr:
        mov     ebp,12345678h   ;get address (offset from frame)
        add     ebp,frm
        _PUSH   ebp
        CHECKCODESIZE j_pushaddr

OP_NOP:                         ; TR
        GO_ON   j_nop, OP_SYSREQ_D
    j_nop:                      ; code alignment is ignored by the JIT
        CHECKCODESIZE j_nop

OP_SYSREQ_D:
;nop;
        putval  j_sysreq_d+1
        GO_ON   j_sysreq_d, OP_SYMTAG, 8
    j_sysreq_d:
        mov     ebx,12345678h   ; get function address
        call    [jit_sysreq_d]
        CHECKCODESIZE j_sysreq_d

OP_SYMTAG:
;ignored (TR)
        mov     [ebx],edi       ; store relocated
                                ; address
        add     ebx,8           ; move on to next opcode
        cmp     ebx,[end_code]
        jae     code_gen_done
        jmp     dword [ebx]     ; go on with the next opcode

OP_BREAK:
%ifndef DEBUGSUPPORT
        mov     [ebx],edi       ; no line number support: ignore opcode
        add     ebx,4           ; move on to next opcode
        cmp     ebx,[end_code]
        jae     code_gen_done
        jmp     DWORD [ebx]     ; go on with the next opcode
%else
        GO_ON   j_break, OP_FLOAT_MUL
    j_break:
        mov     ebp,amx
        cmp     DWORD [ebp+_debug], 0
        je      $+4             ; jump around the "call" statement
        call    [jit_break]
        CHECKCODESIZE j_break
%endif

OP_FLOAT_MUL:
        GO_ON   j_float_mul, OP_FLOAT_DIV
    j_float_mul:
        fld     dword [esi+4]
        fmul    dword [esi+8]
        sub     esp, 4
        fstp    dword [esp]
        pop     eax
        CHECKCODESIZE j_float_mul

OP_FLOAT_DIV:
        GO_ON   j_float_div, OP_FLOAT_ADD
    j_float_div:
        fld     dword [esi+4]
        fdiv    dword [esi+8]
        sub     esp, 4
        fstp    dword [esp]
        pop     eax
        CHECKCODESIZE j_float_div

OP_FLOAT_ADD:
        GO_ON   j_float_add, OP_FLOAT_SUB
    j_float_add:
        fld     dword [esi+4]
        fadd    dword [esi+8]
        sub     esp, 4
        fstp    dword [esp]
        pop     eax
        CHECKCODESIZE j_float_add

OP_FLOAT_SUB:
        GO_ON   j_float_sub, OP_FLOAT_TO
    j_float_sub:
        fld     dword [esi+4]
        fsub    dword [esi+8]
        sub     esp, 4
        fstp    dword [esp]
        pop     eax
        CHECKCODESIZE j_float_sub

OP_FLOAT_TO:
        GO_ON   j_float_to, OP_FLOAT_ROUND
    j_float_to:
        fild    dword [esi+4]
        sub     esp, 4
        fstp    dword [esp]
        pop     eax
        CHECKCODESIZE j_float_to

OP_FLOAT_ROUND:
        GO_ON   j_float_round, OP_FLOAT_CMP
    j_float_round:
        ;get the float control word
        push    0
        mov     ebp,esp
        fstcw   [ebp]
        mov     eax,[ebp]
        push    eax
        ;clear the top bits
        xor     ah,ah
        ;get the control method
        push    edx
        mov     edx,[esi+8]
        and     edx,3           ;sanity check
        shl     edx,2           ;shift it to right position
        ;set the bits
        or      ah,dl           ;set bits 11,10 of FCW (rounding control) to rounding method
        or      ah,3            ;set precision to 64bit
        mov     [ebp], eax
        fldcw   [ebp]
        ;calculate
        sub     esp,4
        fld     dword [esi+4]
        test    edx,edx
        jz      .correct
        jmp     .skip_correct
    .correct:
        fadd    st0
        fadd    dword [g_round_nearest]
        fistp   dword [esp]
        pop     eax
        sar     eax,1
        jmp     .done
    .skip_correct:
        frndint
        fistp   dword [esp]
        pop     eax
    .done:
        pop     edx
        ;restore bits
        pop     ebp
        mov     [esp], ebp
        fldcw   [esp]
        pop     ebp
        CHECKCODESIZE \
                j_float_round

OP_FLOAT_CMP:
        GO_ON   j_float_cmp, OP_INVALID
    j_float_cmp:
        fld     dword [esi+8]
        fld     dword [esi+4]
        fucompp
        fnstsw  ax
        fwait
        sahf
        cmovz   eax, [g_flagsjit+4]
        cmova   eax, [g_flagsjit+8]
        cmovb   eax, [g_flagsjit+0]
        CHECKCODESIZE j_float_cmp

OP_INVALID:                     ; break from the compiler with an error code
        mov     eax,AMX_ERR_INVINSTR
        pop     esi
        pop     edi
        pop     ecx
        pop     ebp
        ret


section .text

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;                                                               ;
;cell   amx_exec( cell *regs, cell *retval, cell stp, cell hea );
;                       eax         edx          ebx       ecx  ;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

amx_exec_jit:
_amx_exec_jit:
        push    edi
        push    esi
        push    ebp

        push    ebx             ; due to __cdecl

        ; __cdecl overhead
        mov     eax, [esp+20]   ; get address of amx regs structure
        mov     edx, [esp+24]   ; get address of retval
        mov     ebx, [esp+28]   ; get stp
        mov     ecx, [esp+32]   ; get hea

        sub     esp,4*3         ; place for PRI, ALT & STK at SYSREQs

        push    dword [eax+28]  ; store pointer to code segment
        push    dword [eax+24]  ; store pointer to AMX
        push    edx             ; store address of retval
        push    ebx             ; store STP
        push    ecx             ; store HEA
        push    dword [eax+20]; store FRM

        mov     edx,[eax+4]     ; get ALT
        mov     ebp,[eax+8]     ; get CIP
        mov     edi,[eax+12]    ; get pointer to data segment
        mov     esi,[eax+16]    ; get STK !!changed, now ECX free as counter!!
        mov     ebx,[eax+20]    ; get FRM
        mov     eax,[eax]       ; get PRI
        add     ebx,edi         ; relocate frame

        add     esi,edi         ; ESI will contain DAT+STK
        add     [esp+8],edi     ; make STP absolute address for run-time checks

        mov     dword [esi], 0  ; zero this out, but we need to keep it so
                                ; the stack frame is intact
        mov     ecx,esp         ; copy stack pointer
        ; Call compiled code via CALL NEAR
        call    ebp

return_to_caller:
        cmp     dword retval,0
        je      return_to_caller_no_value
        mov     ebp,retval
        mov     [ebp],eax       ; provide return value

    return_to_caller_no_value:
        mov     eax,AMX_ERR_NONE
        jmp     _return

_return_popstack:
        mov     esp,ecx         ; get our old stack pointer
_return:
        ; store machine state
        push    ecx
        push    ecx
        mov     ebp,amx         ; get amx into EBP
        mov     ecx,esi         ; get STK into ECX
        sub     ecx,edi         ; correct STK
        mov     [ebp+_stk],ecx  ; store values in AMX structure: STK, ...
        pop     ecx             ; get orig value
        mov     ecx,hea         ; ... HEA, ...
        mov     [ebp+_hea],ecx
        mov     ecx,ebx         ; ... and FRM
        sub     ecx,edi         ; (correct FRM)
        mov     [ebp+_frm],ecx
        mov     [ebp+_pri],eax  ; also store PRI, ...
        mov     [ebp+_alt],edx  ; ... and ALT

        ; return
        pop     ecx
        sub     stp,edi         ; make STP relative to DAT again
        add     esp,4*9         ; remove temporary data

        pop     ebx             ; restore registers that have to be preserved
        pop     ebp             ; when using __cdecl convention
        pop     esi
        pop     edi

        ret


err_stack:
        mov     eax,AMX_ERR_STACKERR
        jmp     _return_popstack

err_stacklow:
        mov     eax,AMX_ERR_STACKLOW
        jmp     _return_popstack

_CHKMARGIN_STACK:               ; some run-time check routines
        cmp     esi,stp
        jg      err_stacklow
        ret

err_heaplow:
        mov     eax,AMX_ERR_HEAPLOW
        jmp     _return_popstack

_CHKMARGIN_HEAP:
        cmp     esi,stp
        jg      err_stacklow
        cmp     dword hea,0
        jl      err_heaplow
        ret

err_memaccess:
        mov     eax,AMX_ERR_MEMACCESS
        jmp     _return_popstack

_VERIFYADDRESS_eax:             ; used in load.i, store.i & lidx
        cmp     eax,stp
        jae     err_memaccess
        cmp     eax,hea
        jb      veax1
        lea     ebp,[eax+edi]
        cmp     ebp,esi
        jb      err_memaccess
    veax1:
        ret

_VERIFYADDRESS_edx:             ; used in load.i, store.i & lidx
        cmp     edx,stp
        jae     err_memaccess
        cmp     edx,hea
        jb      vedx1
        lea     ebp,[edx+edi]
        cmp     ebp,esi
        jb      err_memaccess
    vedx1:
        ret

JIT_OP_SDIV:
        mov     ebp,edx
        xor     edx,eax         ; Check signs of the operands.
        cdq
        js      sdiv_fiddle     ; If the signs of the operands are different
                                ; we'll have to fiddle around to achieve
                                ; proper rounding towards minus infinity.
        or      ebp,ebp         ; check for divide by zero
        jz      err_divide
        idiv    ebp             ; default behavior is right in the other cases
        ret

    sdiv_fiddle:
        or      ebp,ebp         ; check for divide by zero
        jz      err_divide
        idiv    ebp
        or      edx,edx
        jz      sdiv_goon       ; If there's no remainder the result is correct
        add     edx,ebp         ; else fix the result values.
        dec     eax             ; Amazing, how simple this is...
    sdiv_goon:
        ret

        ALIGN   4

JIT_OP_RETN:
        _POP    ebx             ; pop frame
        add     esi,4           ; get rid of the extra parameter from call

        mov     frm,ebx
        _POP    ebp

        add     ebx,edi
        add     esi,ebp         ; remove data from stack

        ret


JIT_OP_MOVS:                    ;length of block to copy is already in ECX
        push    edi
        push    esi
        lea     esi,[edi+eax]
        lea     edi,[edi+edx]

        push    ecx             ; I hope the blocks to copy are properly
        shr     ecx,2           ; aligned, so I don't do anything about that.
        rep     movsd
        pop     ecx
        and     ecx,3
        rep     movsb

        pop     esi
        pop     edi
        ret

JIT_OP_CMPS:                    ;length of block to compare is already in ECX
        push    edi
        push    esi
        lea     esi,[edi+edx]
        lea     edi,[edi+eax]

        xor     eax,eax         ; This is surely not the fastest way to do this
        repe    cmpsb           ; but the most simple one.
        je      cmps1
        sbb     eax,eax
        sbb     eax,0ffffffffh
    cmps1:
        pop     esi
        pop     edi
        ret


JIT_OP_FILL:                    ;length (in bytes) of block to fill is already in ECX
        push    edi
        lea     edi,[edi+edx]

        shr     ecx,2           ;length in 32-bit cells
        rep     stosd           ;the value to use is already in EAX

        pop     edi
        ret

JIT_OP_BOUNDS:
        cmp     eax,0
        jl      err_bounds
        cmp     eax,ebp
        jg      err_bounds
        ret

err_bounds:
        mov     eax,AMX_ERR_BOUNDS
        jmp     _return_popstack

_CHKDIVIDEZERO:
        or      ebp,ebp         ; check for divide by zero
        jz      err_divide
        ret

err_divide:
        mov     eax,AMX_ERR_DIVIDE
        jmp     _return_popstack

JIT_OP_SYSREQ:
        push    ecx
        push    esi
        mov     ebp,amx         ; get amx into EBP

        sub     esi,edi         ; correct STK
        mov     alt,edx         ; save ALT

        mov     [ebp+_stk],esi  ; store values in AMX structure: STK,
        mov     esi,hea         ; HEA,
        mov     ebx,frm         ; and FRM
        mov     [ebp+_hea],esi
        mov     [ebp+_frm],ebx

        lea     ebx,pri         ; 3rd param: addr. of retval

        ;Our original esi is still pushed!
        push    ebx
        push    eax             ; 2nd param: function number
        push    ebp             ; 1st param: amx
        call    [ebp+_callback]
        _DROPARGS 12            ; remove args from stack

        pop     esi
        pop     ecx
        cmp     eax,AMX_ERR_NONE
        jne     _return_popstack
    .continue:
        mov     eax,pri         ; get retval into eax (PRI)
        mov     edx,alt         ; restore ALT
        mov     ebx,frm         ; restore FRM
        add     ebx,edi         ; relocate frame
        ret


JIT_OP_SYSREQ_D:                ; (TR)
        push    ecx
        push    esi
        mov     ebp,amx         ; get amx into EBP

        sub     esi,edi         ; correct STK
        mov     alt,edx         ; save ALT

        mov     [ebp+_stk],esi  ; store values in AMX structure: STK,
        mov     esi,hea         ; HEA,
        mov     eax,frm         ; and FRM
        mov     [ebp+_hea],esi
        mov     [ebp+_frm],eax  ; eax & ecx are invalid by now

        ;esi is still pushed!
        push    ebp             ; 1st param: amx
        call    ebx             ; direct call
        _DROPARGS 8             ; remove args from stack
        pop     ecx
        mov     ebp,amx         ; get amx into EBP
        cmp     dword [ebp+_error],AMX_ERR_NONE
        jne     _return_popstack; return error code, if any

        ; return value is in eax (PRI)
        mov     edx,alt         ; restore ALT
        mov     ebx,frm         ; restore FRM
        add     ebx,edi         ; relocate frame
        ret


JIT_OP_BREAK:
%ifdef DEBUGSUPPORT
        push    ecx
        push    esi
        mov     ebp,amx         ; get amx into EBP

        sub     esi,edi         ; correct STK

        mov     [ebp+_pri],eax  ; store values in AMX structure: PRI,
        mov     [ebp+_alt],edx  ; ALT,
        mov     [ebp+_stk],esi  ; STK,
        mov     esi,hea         ; HEA,
        mov     ebx,frm         ; and FRM
        mov     [ebp+_hea],esi
        mov     [ebp+_frm],ebx  ; EBX & ECX are invalid by now
        ;???
        ; storing CIP is not very useful, because the code changed (during JIT compile)

        push    ebp             ; 1st param: amx
        call    [ebp+_debug]
        _DROPARGS 4             ; remove args from stack
        pop     esi
        pop     ecx
        cmp     eax,AMX_ERR_NONE
        jne     _return_popstack; return error code, if any

        mov     ebp,amx         ; get amx into EBP
        mov     eax,[ebp+_pri]  ; restore values
        mov     edx,[ebp+_alt]  ; ALT,
        mov     edx,alt         ; restore ALT
        mov     ebx,frm         ; restore FRM
        add     ebx,edi         ; relocate frame
%endif
        ret


JIT_OP_SWITCH:
        pop     ebp             ; pop return address = table address
        push    ecx
        mov     ecx,[ebp]       ; ECX = number of records
        lea     ebp,[ebp+ecx*8+8]       ; set pointer _after_ LAST case
        ;if there are zero cases we should just skip this -- bail
        test    ecx, ecx
        jz      op_switch_jump
    op_switch_loop:
        cmp     eax,[ebp-8]     ; PRI == case label?
        je      op_switch_jump  ; found, jump
        sub     ebp,8           ; position to preceding case
        loop    op_switch_loop  ; check next case, or fall through
    op_switch_jump:
        pop     ecx
%ifndef FORCERELOCATABLE
        jmp     [ebp-4]         ; jump to the case instructions
%else
        add     ebp,[ebp-4]     ; add offset to make absolute address
        jmp     ebp
%endif


; The caller of asm_runJIT() can determine the maximum size of the compiled
; code by multiplying the result of this function by the number of opcodes in
; the Pawn module.
;
; unsigned long getMaxCodeSize_();
;
getMaxCodeSize:
_getMaxCodeSize:
        mov     eax,MAXCODESIZE
        ret

section .data
        ALIGN   4       ; This is essential to avoid misalignment stalls.

end_code        DD  0   ; pointer to the end of the source code

compiled_code   DD  0   ; pointer to compiled code (including preamble)

amxhead         DD  0   ; pointer to the AMX_HEADER struct (arg #1 to runJIT)

reloc_num       DD  0   ; counts the addresses in the relocation table (jumps)

lodb_and        DD  0ffh, 0ffffh, 0, 0ffffffffh

;
; A list of the "run-time-library" functions that are called via indirect calls.
; So these calls don't have to be relocated. This gives also the possibility to
; replace some of these with shorter/faster non-debug or non-checking versions,
; without changing the compiled code. Instead this table could be changed...
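;
; A minimal sketch of such a table swap, given only as an illustration; it is
; not part of the original sources. It assumes a hypothetical stub label
; `_NOCHECK` placed in the .text section, and that the compiled code reaches
; the check through an indirect `call [chk_dividezero]`:
;
;       _NOCHECK:                       ; hypothetical stub, not in this file
;               ret                     ; return at once, no check performed
;
;               mov     dword [chk_dividezero], _NOCHECK
;
; After that store, every indirect call through the table slot lands on the
; stub, so code that has already been compiled needs no patching.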
;
verify_adr_eax  DD  _VERIFYADDRESS_eax
verify_adr_edx  DD  _VERIFYADDRESS_edx
chk_marginstack DD  _CHKMARGIN_STACK
chk_marginheap  DD  _CHKMARGIN_HEAP
chk_dividezero  DD  _CHKDIVIDEZERO
jit_return      DD  _return
jit_retn        DD  JIT_OP_RETN
jit_sdiv        DD  JIT_OP_SDIV
jit_movs        DD  JIT_OP_MOVS
jit_cmps        DD  JIT_OP_CMPS
jit_fill        DD  JIT_OP_FILL
jit_bounds      DD  JIT_OP_BOUNDS
jit_sysreq      DD  JIT_OP_SYSREQ
jit_sysreq_d    DD  JIT_OP_SYSREQ_D
jit_break       DD  JIT_OP_BREAK
jit_switch      DD  JIT_OP_SWITCH

;
; The table for the browser/relocator function.
;

global g_flagsjit
g_flagsjit:
        DD -1
        DD 0
        DD 1

global g_round_nearest
g_round_nearest:
        DD 0.5

global amx_opcodelist_jit, _amx_opcodelist_jit

amx_opcodelist_jit:
_amx_opcodelist_jit:
        DD      OP_INVALID
        DD      OP_LOAD_PRI
        DD      OP_LOAD_ALT
        DD      OP_LOAD_S_PRI
        DD      OP_LOAD_S_ALT
        DD      OP_LREF_PRI
        DD      OP_LREF_ALT
        DD      OP_LREF_S_PRI
        DD      OP_LREF_S_ALT
        DD      OP_LOAD_I
        DD      OP_LODB_I
        DD      OP_CONST_PRI
        DD      OP_CONST_ALT
        DD      OP_ADDR_PRI
        DD      OP_ADDR_ALT
        DD      OP_STOR_PRI
        DD      OP_STOR_ALT
        DD      OP_STOR_S_PRI
        DD      OP_STOR_S_ALT
        DD      OP_SREF_PRI
        DD      OP_SREF_ALT
        DD      OP_SREF_S_PRI
        DD      OP_SREF_S_ALT
        DD      OP_STOR_I
        DD      OP_STRB_I
        DD      OP_LIDX
        DD      OP_LIDX_B
        DD      OP_IDXADDR
        DD      OP_IDXADDR_B
        DD      OP_ALIGN_PRI
        DD      OP_ALIGN_ALT
        DD      OP_LCTRL
        DD      OP_SCTRL
        DD      OP_MOVE_PRI
        DD      OP_MOVE_ALT
        DD      OP_XCHG
        DD      OP_PUSH_PRI
        DD      OP_PUSH_ALT
        DD      OP_PUSH_R_PRI
        DD      OP_PUSH_C
        DD      OP_PUSH
        DD      OP_PUSH_S
        DD      OP_POP_PRI
        DD      OP_POP_ALT
        DD      OP_STACK
        DD      OP_HEAP
        DD      OP_PROC
        DD      OP_RET
        DD      OP_RETN
        DD      OP_CALL
        DD      OP_CALL_I
        DD      OP_JUMP
        DD      OP_JREL
        DD      OP_JZER
        DD      OP_JNZ
        DD      OP_JEQ
        DD      OP_JNEQ
        DD      OP_JLESS
        DD      OP_JLEQ
        DD      OP_JGRTR
        DD      OP_JGEQ
        DD      OP_JSLESS
        DD      OP_JSLEQ
        DD      OP_JSGRTR
        DD      OP_JSGEQ
        DD      OP_SHL
        DD      OP_SHR
        DD      OP_SSHR
        DD      OP_SHL_C_PRI
        DD      OP_SHL_C_ALT
        DD      OP_SHR_C_PRI
        DD      OP_SHR_C_ALT
        DD      OP_SMUL
        DD      OP_SDIV
        DD      OP_SDIV_ALT
        DD      OP_UMUL
        DD      OP_UDIV
        DD      OP_UDIV_ALT
        DD      OP_ADD
        DD      OP_SUB
        DD      OP_SUB_ALT
        DD      OP_AND
        DD      OP_OR
        DD      OP_XOR
        DD      OP_NOT
        DD      OP_NEG
        DD      OP_INVERT
        DD      OP_ADD_C
        DD      OP_SMUL_C
        DD      OP_ZERO_PRI
        DD      OP_ZERO_ALT
        DD      OP_ZERO
        DD      OP_ZERO_S
        DD      OP_SIGN_PRI
        DD      OP_SIGN_ALT
        DD      OP_EQ
        DD      OP_NEQ
        DD      OP_LESS
        DD      OP_LEQ
        DD      \
                OP_GRTR
        DD      OP_GEQ
        DD      OP_SLESS
        DD      OP_SLEQ
        DD      OP_SGRTR
        DD      OP_SGEQ
        DD      OP_EQ_C_PRI
        DD      OP_EQ_C_ALT
        DD      OP_INC_PRI
        DD      OP_INC_ALT
        DD      OP_INC
        DD      OP_INC_S
        DD      OP_INC_I
        DD      OP_DEC_PRI
        DD      OP_DEC_ALT
        DD      OP_DEC
        DD      OP_DEC_S
        DD      OP_DEC_I
        DD      OP_MOVS
        DD      OP_CMPS
        DD      OP_FILL
        DD      OP_HALT
        DD      OP_BOUNDS
        DD      OP_SYSREQ_PRI
        DD      OP_SYSREQ_C
        DD      OP_FILE
        DD      OP_LINE
        DD      OP_SYMBOL
        DD      OP_SRANGE
        DD      OP_JUMP_PRI
        DD      OP_SWITCH
        DD      OP_CASETBL
        DD      OP_SWAP_PRI     ; TR
        DD      OP_SWAP_ALT     ; TR
        DD      OP_PUSHADDR     ; TR
        DD      OP_NOP          ; TR
        DD      OP_SYSREQ_D     ; TR
        DD      OP_SYMTAG       ; TR
        DD      OP_BREAK        ; TR
        DD      OP_FLOAT_MUL    ; DA
        DD      OP_FLOAT_DIV    ; DA
        DD      OP_FLOAT_ADD    ; DA
        DD      OP_FLOAT_SUB    ; DA
        DD      OP_FLOAT_TO     ; DA
        DD      OP_FLOAT_ROUND  ; DA
        DD      OP_FLOAT_CMP    ; DA

END