I have some code written in assembly which compiles under MASM and
TASM, but gcc gives me a long list of error messages.
First off, do not trust gas! Check that it does what you
expect it to do. To be reasonably safe, follow these guidelines:
- Use explicit sizing, i.e., use "movl" not "mov",
even if you think the arguments are clearly 32-bit. Using byte
registers does not seem to make gas pick the right size either.
- Write code segment overrides as byte constants (for example,
".byte 0x2e", the CS override prefix), not as "%cs:".
According to Charles Sandmann, gas uses the phase of the moon in
deciding whether to ignore your prefixes.
- Make sure the operands match the instruction; don't just assume
you will get an error message if they don't.
To emphasize: at present, gas can be trusted to assemble correctly only
the assembler code produced by gcc. All other code (yours) is subject
to the introduction of subtle errors. Use a debugger to check the code
(once). Note that even objdump doesn't always handle segment overrides
correctly.
Keeping these in mind, here are some tips for converting.
The GNU Assembler (as.exe) called by gcc accepts
AT&T syntax, which is different from Intel syntax. Notable
differences between the two syntaxes are:
- AT&T immediate operands are preceded by $; Intel
immediate operands are undelimited (Intel `push 4' is AT&T
`pushl $4'). AT&T register operands are preceded by
%; Intel register operands are undelimited. AT&T absolute
(as opposed to PC relative) jump/call operands are
prefixed by *; they are undelimited in Intel syntax.
- AT&T and Intel syntax use the opposite order for source and
destination operands. Intel `add eax, 4' is `addl $4,
%eax'. The `source, dest' convention is maintained for
compatibility with previous Unix assemblers.
- In AT&T syntax the size of memory operands is determined from the
last character of the opcode name. Opcode suffixes of b,
w, and l specify byte (8-bit), word (16-bit), and
long (32-bit) memory references. Intel syntax accomplishes this by
prefixing memory operands (not the opcodes themselves) with
`byte ptr', `word ptr', and `dword ptr'.
Thus, Intel `mov al, byte ptr FOO' is `movb FOO,
%al' in AT&T syntax.
- Immediate form long jumps and calls are `lcall/ljmp
$SECTION, $OFFSET' in AT&T syntax; the Intel syntax
is `call/jmp far SECTION:OFFSET'. Also, the
far return instruction is `lret $STACK-ADJUST' in AT&T
syntax; Intel syntax is `ret far STACK-ADJUST'.
- The AT&T assembler does not provide support for multiple section
programs. Unix style systems expect all programs to be single
sections.
- An Intel syntax indirect memory reference of the form
SECTION:[BASE + INDEX*SCALE + DISP]
is translated into the AT&T syntax
SECTION:DISP(BASE, INDEX, SCALE)
Examples:
  Intel: [ebp - 4]        AT&T: -4(%ebp)
  Intel: [foo + eax*4]    AT&T: foo(,%eax,4)
  Intel: [foo]            AT&T: foo(,1)
  Intel: gs:foo           AT&T: %gs:foo
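The memory-operand rule above is mechanical enough to sketch as a small script. What follows is only an illustration of the mapping, not a tool shipped with DJGPP: the function name intel_mem_to_att and the REGISTERS set are made up for this example, only the 32-bit general registers are recognized, and a bare symbol such as [foo] is emitted as plain `foo' (which gas also accepts) rather than `foo(,1)'.

```python
import re

# Hypothetical helper for illustration; only 32-bit general registers are known.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def intel_mem_to_att(operand):
    """Turn 'SECTION:[BASE + INDEX*SCALE + DISP]' into 'SECTION:DISP(BASE,INDEX,SCALE)'."""
    operand = operand.strip()
    section = ""
    if ":" in operand:
        # Segment override: Intel 'gs:' becomes AT&T '%gs:'.
        seg, operand = operand.split(":", 1)
        section = "%" + seg.strip() + ":"
    inner = operand.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]

    base = index = scale = ""
    disp_terms = []
    # Split the expression into (sign, term) pairs, e.g. 'ebp - 4' -> ebp, -4.
    for sign, term in re.findall(r"([+-]?)\s*([^\s+-]+)", inner):
        if "*" in term:
            left, right = term.split("*", 1)
            # Accept both 'eax*4' and '4*eax'.
            if left.lower() in REGISTERS:
                index, scale = left, right
            else:
                index, scale = right, left
        elif term.lower() in REGISTERS:
            if not base:
                base = term
            else:
                index = term  # second plain register: index with default scale
        else:
            disp_terms.append((sign, term))

    # Rebuild the displacement, keeping a leading minus but dropping a leading plus.
    disp = ""
    for i, (sign, term) in enumerate(disp_terms):
        if sign == "-":
            disp += "-" + term
        elif i > 0:
            disp += "+" + term
        else:
            disp += term

    if not base and not index:
        return section + disp  # plain symbol or constant, no parentheses needed

    inside = "%" + base if base else ""
    if index:
        inside += ",%" + index
    if scale:
        inside += "," + scale
    return section + disp + "(" + inside + ")"


if __name__ == "__main__":
    for src in ("[ebp - 4]", "[foo + eax*4]", "gs:foo", "es:[ebx + esi*2 + 8]"):
        print(src, "->", intel_mem_to_att(src))
```

Whatever a script like this produces is only a starting point for hand conversion; the advice above about checking the assembled code in a debugger still applies.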
For a complete description of the differences, get and unzip the files
named as.iN (where N is a digit) from the
bnuXXXdc.zip archive, then read the chapter
``i386-Dependent'' in the GNU assembler documentation. If you use the
stand-alone info reader, type at the DOS prompt:
info as machine i386
You will see a menu of gas features specific to the x86
architecture.