.section __DATA,__data
.p2align 2
buffer:
.zero 4096
.section __TEXT,__text
.global _main
.build_version macos, 13, 0
.p2align 2
_main:
// x9: buf ptr
// x10: file descriptor storage
// x11: file size in bytes
//init ptr to buf
adrp x9, buffer@PAGE
add x9, x9, buffer@PAGEOFF
// open file
adr x0, file_path
mov x1, #0
mov x2, #444
mov x16, #5
svc 0
// copy file descriptor to x10
mov x10, x0
.p2align 2
stream_buffer:
// make syscall read, file descriptor is in x10
mov x0, x10
mov x1, x9
mov x2, #4096
mov x16, #3
svc 0
// if x0 == 0, exit, no bytes were read
cmp x0, #0
beq exit
blt error
// store number of bytes read
mov x11, x0
// write to stdout from buffer
mov x0, #1
mov x1, x9
mov x2, x11
mov x16, #4
svc 0
b stream_buffer
.p2align 2
exit:
// exit with status code 0
mov x0, #0
mov x16, #1
svc 0
.p2align 2
error:
mov x0, #1
adr x1, file_not_found_error_string
mov x2, #20
mov x16, #4
svc 0
b exit
.p2align 2
file_path:
.asciz "/test.txt"
.p2align 2
file_not_found_error_string:
.asciz "file was not found.\n"
I am trying to learn assembly by writing a simple program that models the Linux ‘cat’ command. I am on a 2020 MacBook Air with an M1 chip. My program compiles fine, but when I run the binary, it waits for input and then echoes whatever I type. I believe that I am misusing my file descriptors. Any help appreciated.
Oh, this is hilarious.
Your code ends up reading from stdin because of several bugs in your code, combined with some unexpected OS behaviour.
Let’s look at this from a high-level perspective first:
- You open /test.txt for reading.
- You read up to 4096 bytes from it.
- You write those bytes to stdout.
But you’re on arm64 macOS, which means that unless you’ve gone to great lengths to mess with the OS, the system volume is read-only and /test.txt does not exist.
So your open syscall is failing, but you don’t detect that because you don’t do any error checking there. Bad!
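For contrast, this is what the missing check looks like in C, where the libc wrapper has already translated the kernel's result into -1 plus errno. It is a minimal sketch: `open_readonly_checked` is a hypothetical helper, and returning `-errno` is just one way to make sure "failed with errno 2" can never be confused with "got descriptor 2".

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open `path` read-only and return the descriptor.
 * On failure, report the reason on stderr and return -errno, so that a
 * failure can never be mistaken for a valid (non-negative) descriptor. */
int open_readonly_checked(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int err = errno; /* save errno before fprintf can overwrite it */
        fprintf(stderr, "open %s: %s\n", path, strerror(err));
        return -err;
    }
    return fd;
}
```

Called with a path that does not exist, it should print something like "No such file or directory" and return `-ENOENT`.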
Now, you might expect x0 to be -1 in that case, because that’s what open() returns when called from C, but that’s not the syscall ABI. If you open /usr/lib/system/libsystem_kernel.dylib in a disassembler and seek to ___open, you’ll see this:
;-- ___open:
;-- func.00002308:
0x00002308 b00080d2 mov x16, 5
0x0000230c 011000d4 svc 0x80
0x00002310 03010054 b.lo 0x2330
0x00002314 7f2303d5 pacibsp
0x00002318 fd7bbfa9 stp x29, x30, [sp, -0x10]!
0x0000231c fd030091 mov x29, sp
0x00002320 8a030094 bl sym._cerror
0x00002324 bf030091 mov sp, x29
0x00002328 fd7bc1a8 ldp x29, x30, [sp], 0x10
0x0000232c ff0f5fd6 retab
0x00002330 c0035fd6 ret
The key part here is the b.lo. Syscalls use the carry flag (the “C” in NZCV) to signal whether there was an error or not. This means that:
- b.lo (carry clear) -> x0 holds a file descriptor
- b.hs (carry set) -> x0 holds an errno value
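The same convention can be modeled in C. The struct below is a hypothetical stand-in for "carry flag plus x0 after svc", not a real API, and the function performs roughly the translation that the b.lo / bl _cerror path in ___open appears to do before handing a result back to C code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a raw XNU arm64 syscall result (not a real API). */
struct raw_syscall_result {
    bool carry; /* C flag in NZCV after svc */
    long x0;    /* return value on success, errno value on failure */
};

/* Translate the raw result into the libc convention:
 * carry clear (b.lo taken) -> return x0 unchanged;
 * carry set   (b.hs taken) -> store x0 in errno and return -1. */
long to_libc_convention(struct raw_syscall_result r) {
    if (!r.carry) {
        return r.x0;
    }
    errno = (int)r.x0;
    return -1;
}
```

A failed open with ENOENT comes out as -1 with errno set to ENOENT, while a successful open returning descriptor 3 comes out as 3.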
So your syscall fails and returns an error value in x0: specifically ENOENT, since it can’t find the file you asked for. And ENOENT happens to be 2, so when you pass that error value to your next syscall, you end up reading from file descriptor 2, which is stderr. And because you invoked your binary from the command line, file descriptors 0, 1 and 2 all refer to the same terminal, which was opened for both reading and writing, so reading from stderr in this case behaves just like reading from stdin.
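So how do you fix this? Put a b.hs error after the first svc. The same goes for the read: your blt error can never fire, because a failed read also sets the carry flag and returns a positive errno in x0, so check b.hs right after that svc too.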
And then pick a file path that actually exists.
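For comparison, here is a minimal C sketch of what the fixed program does: check the open, stop at end of file, and treat a failed read or write as an error rather than as data. The libc wrappers already turn the carry flag into -1 plus errno, so the checks compare against -1. `cat_file` and `write_all` are hypothetical names, the 4096-byte buffer simply mirrors the original, and `write_all` also retries short writes, which the assembly version does not handle.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success or an errno value on failure. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR) continue;
            return errno;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy the file at path to out_fd in 4096-byte chunks, checking every
 * system call. Returns 0 on success or an errno value on failure. */
int cat_file(const char *path, int out_fd) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return errno; /* e.g. ENOENT for a missing file */

    char buf[4096];
    int err = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0) break; /* end of file */
        if (n == -1) {
            if (errno == EINTR) continue;
            err = errno;
            break;
        }
        err = write_all(out_fd, buf, (size_t)n);
        if (err != 0) break;
    }
    close(fd);
    return err;
}
```

A `main` could call `cat_file(argv[1], STDOUT_FILENO)` and print `strerror` of any nonzero result before exiting with a failure status.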
Tracing system calls with dtrace is a good way to see which system calls your program actually makes, along with their decoded arguments and (error) return values.