Converting assembler from MASM/TASM

delorie.com/djgpp/faq/converting/asm.html

I have some code written in assembly which compiles under MASM and TASM, but gcc gives me a long list of error messages.

First off, do not trust gas! Check that it does what you expect. To be reasonably safe, follow these guidelines:

Use explicit sizing, i.e., use "movl", not "mov", even if you think the operands are clearly 32-bit. Do this even with byte registers: gas does not seem to take the register size into account.
Write segment overrides as byte constants (for example `.byte 0x2e' for CS), not as "%cs:". According to Charles Sandmann, gas uses the phase of the moon to decide whether to ignore your prefixes.
Make sure the operands match the instruction; do not assume you will get an error message if they don't.
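The sizing and prefix guidelines can be illustrated with GCC-style inline assembly in C, whose template string gcc hands to gas unchanged. This is a sketch assuming a GCC-compatible compiler targeting x86; the function name `load_ds` is hypothetical, and a plain-C fallback keeps it building on other hosts. It uses the DS override byte 0x3e rather than CS (0x2e) because DS is already the default segment for this load, so the prefix changes nothing at run time and the result is easy to check. The other override bytes are 0x26 (ES), 0x2e (CS), 0x36 (SS), 0x64 (FS) and 0x65 (GS).

```c
#include <stdint.h>

/* Hypothetical example: a 32-bit load written with an explicit size
   suffix ("movl", not "mov") and a segment override emitted as a raw
   prefix byte instead of "%ds:". */
uint32_t load_ds(const uint32_t *p)
{
    uint32_t v;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (".byte 0x3e\n\t"      /* DS segment-override prefix, by hand */
             "movl (%1), %0"       /* explicitly sized 32-bit load */
             : "=r" (v)
             : "r" (p)
             : "memory");
#else
    v = *p;                        /* fallback for non-x86 hosts */
#endif
    return v;
}
```

It is worth confirming once in a debugger that the 3e byte sits directly in front of the mov.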

To emphasize: at present, gas can be trusted to assemble correctly only the code that gcc itself produces. All other code -- yours -- is subject to subtle errors. Use a debugger to check the code (once). Note that even objdump does not always disassemble segment overrides correctly.

Keeping these in mind, here are some tips for converting.

The GNU Assembler (as.exe) called by gcc accepts AT&T syntax, which is different from Intel syntax. Notable differences between the two syntaxes are:

AT&T immediate operands are preceded by $; Intel immediate operands are undelimited (Intel `push 4' is AT&T `pushl $4'). AT&T register operands are preceded by %; Intel register operands are undelimited. AT&T absolute (as opposed to PC-relative) jump/call operands are prefixed by *; they are undelimited in Intel syntax.
AT&T and Intel syntax use the opposite order for source and destination operands. Intel `add eax, 4' is `addl $4, %eax'. The `source, dest' convention is maintained for compatibility with previous Unix assemblers.
In AT&T syntax the size of memory operands is determined from the last character of the opcode name. Opcode suffixes of b, w, and l specify byte (8-bit), word (16-bit), and long (32-bit) memory references. Intel syntax accomplishes this by prefixing memory operands (not the opcodes themselves) with `byte ptr', `word ptr', and `dword ptr'. Thus, Intel `mov al, byte ptr FOO' is `movb FOO, %al' in AT&T syntax.
Immediate form long jumps and calls are `lcall/ljmp $SECTION, $OFFSET' in AT&T syntax; the Intel syntax is `call/jmp far SECTION:OFFSET'. Also, the far return instruction is `lret $STACK-ADJUST' in AT&T syntax; Intel syntax is `ret far STACK-ADJUST'.
The AT&T assembler does not provide support for multiple-section programs. Unix-style systems expect all programs to be single sections.
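Several of these rules can be seen side by side in GCC inline assembly. The sketch assumes a GCC-compatible compiler on x86; the function names `add4` and `load_foo` are hypothetical, and the byte variable playing the role of FOO is passed in as a memory operand rather than referenced by name. Inside extended asm a literal register is written `%%eax`, which gas receives as `%eax`.

```c
#include <stdint.h>

/* Intel: add eax, 4   ->   AT&T: addl $4, %eax
   "$" marks the immediate, "%" the register, the source comes first,
   and the "l" suffix gives the 32-bit operand size. */
uint32_t add4(uint32_t x)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl $4, %%eax" : "+a" (x) : : "cc");
#else
    x += 4;                        /* fallback for non-x86 hosts */
#endif
    return x;
}

/* Intel: mov al, byte ptr FOO   ->   AT&T: movb FOO, %al
   The "b" suffix replaces "byte ptr"; FOO is stood in for by the
   memory operand %1, and "=a" on a uint8_t places the result in %al. */
uint8_t load_foo(const uint8_t *foo)
{
    uint8_t al;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movb %1, %%al" : "=a" (al) : "m" (*foo));
#else
    al = *foo;                     /* fallback for non-x86 hosts */
#endif
    return al;
}
```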

An Intel syntax indirect memory reference of the form

  SECTION:[BASE + INDEX*SCALE + DISP]

is translated into the AT&T syntax

  SECTION:DISP(BASE, INDEX, SCALE)

Examples:

  Intel:  [ebp - 4]         AT&T:  -4(%ebp)
  Intel:  [foo + eax*4]     AT&T:  foo(,%eax,4)
  Intel:  [foo]             AT&T:  foo(,1)
  Intel:  gs:foo            AT&T:  %gs:foo
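The DISP(BASE, INDEX, SCALE) form with all three parts present can be exercised the same way. In this sketch (again assuming GCC-style inline assembly on x86, with a hypothetical function `load_scaled`), the Intel operand `[base + idx*4 + 8]` becomes `8(base,idx,4)`, so the load reads the 32-bit element two slots past `base[idx]`.

```c
#include <stdint.h>

/* Intel:  mov r32, [base + idx*4 + 8]
   AT&T:   movl 8(base,idx,4), r32
   Reads the 32-bit word at address base + idx*4 + 8. */
uint32_t load_scaled(const uint32_t *base, intptr_t idx)
{
    uint32_t v;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl 8(%1,%2,4), %0"
             : "=r" (v)
             : "r" (base), "r" (idx)
             : "memory");
#else
    v = base[idx + 2];             /* 8 bytes = 2 uint32_t elements */
#endif
    return v;
}
```

For example, with `uint32_t a[] = {10, 20, 30, 40, 50}`, `load_scaled(a, 1)` reads byte offset 1*4 + 8 = 12, which is `a[3]`, i.e. 40.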

For a complete description of the differences, get and unzip the files named as.iN (where N is a digit) from the bnuXXXdc.zip archive, then read the chapter ``i386-Dependent'' in the GNU assembler documentation. If you use the stand-alone info reader, type at the DOS prompt:

  info as machine i386

You will see a menu of gas features specific to the x86 architecture.

Copyright © 1995	Updated Mar 1995