System Calls
When the operating system is in kernel mode, it has access to all its
internal data and code: it may call any internal function and, for example,
open files on disks or change hardware settings. Processes, on the other hand,
cannot do the same: even though ULIX-i386 maps the kernel memory in all page
tables, processes cannot access it because the protection bits in the page
tables restrict this memory area to code running in ring 0 (kernel mode),
whereas processes run in ring 3 (user mode).
Even if a process were allowed to call kernel functions (by setting up the
page tables differently), that would not help much since privileged machine
instructions such as in and out (for talking to hardware devices)
cannot be executed in ring 3 under the usual I/O privilege settings.
All operating systems provide system calls as a way to access these needed
kernel functions: on ULIX-i386 they allow a controlled switch from user mode
to kernel mode via the int instruction which switches to ring 0 and
executes a pre-defined interrupt handler.
...
While we implement system calls, we will also create functions
for the standard library that user mode programs must link in order
to conveniently talk to the operating system via functions such as
fork, open, read, etc.
There are several ways to implement system calls. Let's first look
at how system calls are made from user space. On 32-bit Intel
CPUs, Linux uses software interrupt 0x80 with the arguments in registers:
| |
|
< example for system calls in linux > ≡ section .text
global _start ; export the entry point for the linker
_start: ; tell linker entry point
mov edx,len ; message length
mov ecx,msg ; message to write
mov ebx,1 ; file descriptor (stdout)
mov eax,4 ; system call number (sys_write)
int 0x80 ; software interrupt 0x80
mov eax,1 ; system call number (sys_exit)
int 0x80 ; software interrupt 0x80
section .data
msg db 'Hello, world!',0xa ; the string to be printed
len equ $ - msg ; length of the string
|
(This example was taken from http://asm.sourceforge.net/intro/hello.html;
the comments were modified.)
On a 32-bit Linux machine you could assemble, link, and run this file with
the following commands (on a 64-bit system, ld additionally needs the
option -m elf_i386):
\begin{Verbatim}
$ nasm -f elf test.asm
$ ld test.o -o test
$ ./test
Hello, world!
\end{Verbatim}
In this program \register{EAX} always holds the system call number; the
other registers (in this example \register{EBX}, \register{ECX}, and
\register{EDX}) are
used for arguments. System call 4 is the sys_write syscall.
Other operating systems put arguments on the stack or into
specific memory areas. We will stick with the Linux way
because passing arguments in registers is simple.
Since adding assembler code to C programs for every system call would be
tedious and error-prone, standard libraries make things simpler for the
application developer; this can be done in two steps:
- Supplying a generic syscall function (that takes an arbitrary
number of arguments) reduces the above code to executing
\begin{Verbatim}
char *msg = "Hello, world!\n";
syscall (4, 1, msg, strlen(msg));
\end{Verbatim}
- But that is still unreadable, and also it is not portable because
system call numbers are not identical across different Unix versions.
Thus, for all standard system calls, some library provides the
better known functions (such as write) which allow the above code
to be written as
\begin{Verbatim}
char *msg = "Hello, world!\n";
write (STDOUT_FILENO, msg, strlen(msg));
\end{Verbatim}
(with the constant STDOUT_FILENO set to 1).
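On a present-day Linux system both steps can be tried directly from C. The following sketch assumes glibc, which declares the generic function syscall in unistd.h and supplies symbolic system call numbers such as SYS_write in sys/syscall.h. The helper names write_generic and write_wrapped are made up for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* makes glibc declare syscall() */
#endif
#include <string.h>
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>          /* syscall(), write(), STDOUT_FILENO */

/* Step 1: generic syscall function; the caller names the system call
 * by number (SYS_write is 4 on 32-bit x86 Linux, other values elsewhere). */
long write_generic (const char *msg) {
  return syscall (SYS_write, STDOUT_FILENO, msg, strlen (msg));
}

/* Step 2: the library wrapper hides the number completely. */
long write_wrapped (const char *msg) {
  return (long) write (STDOUT_FILENO, msg, strlen (msg));
}
```

Both helpers return the number of bytes written, or -1 on failure. Using the symbolic name SYS_write instead of the literal 4 already removes part of the portability problem; the wrapper removes the rest.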
System Calls in ULIX-i386
ULIX-i386 provides functions for adding (or modifying) system calls
to the system and a generic system call handler. For this purpose,
we create a system call table syscall_table that contains
pointers to functions, so for example, syscall_table[4] should
contain the address of ULIX-i386's sys_write function. If a
system call is not defined, the table entry is a null pointer, so
we can initialize the whole table with null bytes:
| |
|
< constants > ≡ #define MAX_SYSCALLS 0x8000 // max syscall number: 0x7fff
|
| |
|
< global variables > ≡ void *syscall_table[MAX_SYSCALLS];
|
Telling ULIX-i386 what function to execute when a specific system call
is made is as simple as writing the address into the proper array
entry. Nevertheless, we provide a function
| |
|
< function prototypes > ≡ void install_syscall_handler (int syscallno, void *syscall_handler);
|
which enters the handler address:
| |
|
< function implementations > ≡ void install_syscall_handler (int syscallno, void *syscall_handler) {
if (syscallno >= 0 && syscallno < MAX_SYSCALLS) // ignore out-of-range numbers
syscall_table[syscallno] = syscall_handler;
return;
}
|
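To see the table mechanism in isolation, the following stand-alone sketch rebuilds it as ordinary user-space C code. It is not the ULIX-i386 code: the table is shrunk to 16 entries, the names demo_table, demo_install, demo_handler_t, and demo_write are invented for the example, and the install function additionally reports whether the number was accepted.

```c
#include <stddef.h>

#define DEMO_MAX_SYSCALLS 16              /* assumption: tiny table for the demo */

typedef void (*demo_handler_t) (void);    /* hypothetical handler type */

/* Objects with static storage duration start out zeroed, so every
 * entry is a null pointer until a handler is installed. */
static demo_handler_t demo_table[DEMO_MAX_SYSCALLS];

/* Returns 0 on success, -1 if the number lies outside the table. */
int demo_install (int syscallno, demo_handler_t handler) {
  if (syscallno < 0 || syscallno >= DEMO_MAX_SYSCALLS)
    return -1;
  demo_table[syscallno] = handler;
  return 0;
}

/* Placeholder handler; a real one would do the actual work. */
static void demo_write (void) { }
```

A call such as demo_install (4, demo_write) fills entry 4, while negative numbers and numbers from DEMO_MAX_SYSCALLS upward are rejected without touching the table.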
So if we have already defined a function sys_write and declared
the system call number __NR_write, we could
activate the write system call by calling
| |
|
< syscall entry example > ≡ install_syscall_handler (__NR_write, sys_write);
|
The actual system call handler simply checks if there is a handler
for the given system call number and (if so) calls it:
| |
|
< function implementations > +≡ void syscall_handler (context_t *r) {
void (*handler) (context_t*); // handler is a function pointer
unsigned int number = r->eax; // unsigned: negative values become too large
handler = 0;
if (number < MAX_SYSCALLS) // never index beyond the end of the table
handler = syscall_table[number];
if (handler != 0) {
handler (r);
} else {
printf ("Unknown syscall no. eax=0x%x; ebx=0x%x. eip=0x%x, esp=0x%x. "
"Continuing.\n", r->eax, r->ebx, r->eip, r->esp);
}
return;
}
|
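The dispatch logic can be exercised outside the kernel with a mock register context. In the sketch below, mock_context_t, mock_table, mock_dispatch, and the handler sys_double are all hypothetical; the real context_t holds many more fields. As a further assumption, unknown or out-of-range numbers put -1 into the eax field instead of printing a message.

```c
#define MOCK_MAX_SYSCALLS 8

typedef struct {                 /* hypothetical stand-in for context_t */
  unsigned int eax, ebx;
} mock_context_t;

typedef void (*mock_handler_t) (mock_context_t *);

/* Example handler: reads its argument from ebx. */
static void sys_double (mock_context_t *r) {
  r->eax = 2 * r->ebx;           /* result goes back via eax */
}

static mock_handler_t mock_table[MOCK_MAX_SYSCALLS] = {
  [3] = sys_double,              /* all other entries stay null */
};

void mock_dispatch (mock_context_t *r) {
  unsigned int number = r->eax;
  if (number < MOCK_MAX_SYSCALLS && mock_table[number] != 0)
    mock_table[number] (r);      /* known call: run its handler */
  else
    r->eax = (unsigned int) -1;  /* assumption: report failure in eax */
}
```

Because the handler receives a pointer to the saved registers, whatever it stores in the eax field is what the caller sees as the return value once the registers are restored.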
We will later add system call handlers to a special code chunk
named < syscall functions > and put their prototypes in
< syscall prototypes >:
| |
|
< function prototypes > +≡ < syscall prototypes >
|
| |
|
< function implementations > +≡ < syscall functions >
|
We add a handler for interrupt \hex{80} which looks just like
our regular interrupt handlers for hardware-generated interrupts
(and also like the fault handlers).
The difference is that in this case we call neither
irq_handler nor fault_handler, but our new C function
syscall_handler. Apart from that we perform the
same preparation as in the assembler code which you've already
seen: we store the context in the proper order on the stack so
that syscall_handler, which takes a context_t *r as
argument, can evaluate and possibly change the saved register values.
| |
|
< start.asm > ≡ [section .text]
extern syscall_handler
global isr128
isr128: push byte 0 ; push 0 and 128 so that the stack looks the same
; push byte 128 ; as it does after a hardware interrupt
push byte -128 ; (-128 avoids the nasm error for a signed byte)
< push registers onto the stack >
call syscall_handler
< pop registers from the stack >
add esp, 8 ; undo the two "push byte" commands from the start
iret
|
(In case you have forgotten it: < push registers onto the stack > pushes
the general purpose registers as well as \register{DS}, \register{ES}, \register{FS},
\register{GS}, and \register{ESP} onto the stack while
< pop registers from the stack > pops them back in reverse order. We used
this code when we introduced the interrupt handlers.)
Making System Calls
Actually making a system call works just like in the Linux example
we've shown earlier: load the system call number in \register{EAX},
load arguments for the syscall in the next registers (\register{EBX},
\register{ECX}, \dots), and execute int 0x80. The return value of
the system call can then be read from \register{EAX}.
The following functions standardize this process. We will not need them
in the kernel, but the user mode library will later use them:
| |
|
< standard functions for making system calls > ≡ // "volatile" and the "memory" clobber tell the compiler that the trap has
// side effects, so it is neither removed nor moved across memory accesses
inline int syscall1 (int eax) {
int result;
asm volatile ( "int $0x80" : "=a" (result) : "a" (eax) : "memory" );
return result;
}
inline int syscall2 (int eax, int ebx) {
int result;
asm volatile ( "int $0x80" : "=a" (result) : "a" (eax), "b" (ebx) : "memory" );
return result;
}
inline int syscall3 (int eax, int ebx, int ecx) {
int result;
asm volatile ( "int $0x80" : "=a" (result) : "a" (eax), "b" (ebx), "c" (ecx) : "memory" );
return result;
}
inline int syscall4 (int eax, int ebx, int ecx, int edx) {
int result;
asm volatile ( "int $0x80" : "=a" (result) : "a" (eax), "b" (ebx), "c" (ecx), "d" (edx) : "memory" );
return result;
}
|
As an example, look at the write function, which has the prototype
| |
|
< example: write() prototype > ≡ int write (int fd, const void *buf, int nbyte);
|
It takes three arguments; together with the system call number that makes
four values to pass, so an implementation in a user mode library would
use syscall4:
| |
|
< example: write() implementation > ≡ int write (int fd, const void *buf, int nbyte) {
return syscall4 (__NR_write, fd, (int)buf, nbyte);
}
|
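The path of the three write arguments through the registers can be traced with a user-space simulation. Everything here is a stand-in: regs_t mimics the saved registers (with uintptr_t fields so that pointers also fit on 64-bit hosts, whereas the real registers are 32 bits wide), fake_int80 replaces the int 0x80 instruction, and the kernel-side fake_sys_write merely copies into a buffer named console instead of talking to a device.

```c
#include <stdint.h>
#include <string.h>

#define FAKE_NR_WRITE 4                      /* same number as __NR_write */

typedef struct { uintptr_t eax, ebx, ecx, edx; } regs_t;  /* mock registers */

static char console[64];                     /* stands in for the screen */

/* "Kernel" side: unpack fd, buf, nbyte from ebx, ecx, edx. */
static void fake_sys_write (regs_t *r) {
  const char *buf = (const char *) r->ecx;
  size_t nbyte = (size_t) r->edx;
  if (r->ebx != 1 || nbyte >= sizeof console) {
    r->eax = (uintptr_t) -1;                 /* only fd 1, only short writes */
    return;
  }
  memcpy (console, buf, nbyte);
  console[nbyte] = '\0';
  r->eax = nbyte;                            /* return value: bytes written */
}

/* Stand-in for "int 0x80": dispatch on the number in eax. */
static void fake_int80 (regs_t *r) {
  if (r->eax == FAKE_NR_WRITE) fake_sys_write (r);
  else r->eax = (uintptr_t) -1;
}

/* "User" side: pack number and arguments, trap, read the result. */
int fake_write (int fd, const void *buf, int nbyte) {
  regs_t r = { FAKE_NR_WRITE, (uintptr_t) fd, (uintptr_t) buf, (uintptr_t) nbyte };
  fake_int80 (&r);
  return (int) r.eax;
}
```

The user side and the kernel side never share C function arguments; the only things that cross the boundary are the register values, which is why the wrapper casts the buffer pointer to an integer and the handler casts it back.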
For increased Linux compatibility we will use the same system call numbers
as Linux does, at least for those calls that ULIX-i386 also provides.
The following definitions were taken from the file
/usr/include/i386-linux-gnu/asm/unistd_32.h of a 32-bit Linux system
(Ubuntu 11.10, http://www.ubuntu.com/):
TODO: In the end: remove all numbers which are not used
{\small
| |
|
< linux system calls > ≡ #define __NR_yield 66 // not from Linux
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
#define __NR_waitpid 7
#define __NR_creat 8
#define __NR_link 9
#define __NR_unlink 10
#define __NR_execve 11
#define __NR_chdir 12
#define __NR_time 13
#define __NR_mknod 14
#define __NR_chmod 15
#define __NR_lchown 16
#define __NR_break 17
#define __NR_lseek 19
#define __NR_getpid 20
#define __NR_mount 21
#define __NR_umount 22
#define __NR_alarm 27
#define __NR_utime 30
#define __NR_access 33
#define __NR_nice 34
#define __NR_ftime 35
#define __NR_sync 36
#define __NR_kill 37
#define __NR_rename 38
#define __NR_mkdir 39
#define __NR_rmdir 40
#define __NR_dup 41
#define __NR_pipe 42
#define __NR_times 43
#define __NR_brk 45
#define __NR_signal 48
#define __NR_lock 53
#define __NR_ulimit 58
#define __NR_umask 60
#define __NR_chroot 61
#define __NR_dup2 63
#define __NR_getppid 64
#define __NR_sigaction 67
#define __NR_sigsuspend 72
#define __NR_sigpending 73
#define __NR_symlink 83
#define __NR_readlink 85
#define __NR_readdir 89
#define __NR_mmap 90
#define __NR_munmap 91
#define __NR_truncate 92
#define __NR_ftruncate 93
#define __NR_fchmod 94
#define __NR_fchown 95
#define __NR_getpriority 96
#define __NR_setpriority 97
#define __NR_stat 106
#define __NR_lstat 107
#define __NR_fstat 108
#define __NR_wait4 114
#define __NR_sigreturn 119
#define __NR_uname 122
#define __NR_sigprocmask 126
#define __NR_fchdir 133
#define __NR_getdents 141
#define __NR_nanosleep 162
#define __NR_mremap 163
#define __NR_chown 182
#define __NR_getcwd 183
#define __NR_lchown32 198
#define __NR_getuid32 199
#define __NR_getgid32 200
#define __NR_geteuid32 201
#define __NR_getegid32 202
#define __NR_setreuid32 203
#define __NR_setregid32 204
#define __NR_getgroups32 205
#define __NR_setgroups32 206
#define __NR_fchown32 207
#define __NR_setresuid32 208
#define __NR_getresuid32 209
#define __NR_setresgid32 210
#define __NR_getresgid32 211
#define __NR_chown32 212
#define __NR_setuid32 213
#define __NR_setgid32 214
#define __NR_setfsuid32 215
#define __NR_setfsgid32 216
#define __NR_waitid 284
#define __NR_openat 295
#define __NR_tee 315
#define __NR_dup3 330
#define __NR_pipe2 331
|
}
| |
|
< constants > +≡ < linux system calls >
< ulix system calls >
|
|