LiPPGen

The Literate-Programming-based Presentation Generator

Version 1.0, (c) 2013 Hans-Georg Eßer

System Calls

When the operating system is in kernel mode, it has access to all its internal data and code: it may call any internal function and, for example, open files on disks or change hardware settings. Processes, on the other hand, cannot do the same: even though ULIX-i386 maps the kernel memory in all page tables, processes cannot access it because the protection bits in the page table specify that this memory area may only be used when the system runs in ring 0, and processes run in ring 3 (user mode). Even if a process were allowed to call kernel functions (by setting up the page tables differently), that would not help much, since privileged machine instructions such as in and out (for talking to hardware devices) cannot be executed in ring 3.

All operating systems provide system calls as a way to access these needed kernel functions: on ULIX-i386 they allow a controlled switch from user mode to kernel mode via the int instruction, which switches to ring 0 and executes a pre-defined interrupt handler. ...

While we implement system calls, we will also create functions for the standard library that user mode programs must link in order to conveniently talk to the operating system via functions such as fork, open, read, etc.

There are several ways to implement system calls. Let's first look at how system calls can be made from user space. On 32-bit Intel CPUs, Linux does it via software interrupt 0x80 with arguments in registers:
< example for system calls in linux > ≡
  _start:                         ; tell linker entry point
        mov edx,len               ; message length
        mov ecx,msg               ; message to write
        mov ebx,1                 ; file descriptor (stdout)
        mov eax,4                 ; system call number (sys_write)
        int 0x80                  ; software interrupt 0x80
        mov eax,1                 ; system call number (sys_exit)
        int 0x80                  ; software interrupt 0x80

  section .data
  msg   db 'Hello, world!',0xa    ; the string to be printed
  len   equ $ - msg               ; length of the string
(This example was taken from http://asm.sourceforge.net/intro/hello.html; the comments were modified.) On a Linux machine you could assemble, link, and run this file with
  $ nasm -f elf test.asm
  $ ld test.o -o test
  $ ./test
  Hello, world!
In this program EAX always holds the system call number; the other registers (in this example EBX, ECX, and EDX) are used for arguments. System call 4 is the sys_write syscall. Other operating systems put arguments on the stack or into specific memory areas. We will stick with the Linux way because it is simple to use registers. Since adding assembler code to C programs for every system call is cumbersome, standard libraries make things simpler for the application developer; this can be done in two steps:
  • Supplying a generic syscall function (that takes an arbitrary number of arguments) reduces the above code to executing
      char *msg = "Hello, world!\n";
      syscall (4, 1, msg, strlen(msg));
  • But that is still hard to read, and it is not portable because system call numbers are not identical across different Unix versions. Thus, for all standard system calls, a library provides the better-known functions (such as write), which allow the above code to be written as
      char *msg = "Hello, world!\n";
      write (STDOUT_FILENO, msg, strlen(msg));
    (with the constant STDOUT_FILENO set to 1).
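The two steps can be tried out on a Linux host, where the C library already offers both interfaces. The following is only a sketch under that assumption: it relies on glibc's syscall() and the SYS_write constant from <sys/syscall.h>, not on anything ULIX-i386 provides, and the function names hello_via_syscall and hello_via_write are made up for this illustration.

```c
#define _GNU_SOURCE           /* exposes syscall() in <unistd.h> on glibc */
#include <string.h>
#include <sys/syscall.h>      /* SYS_write */
#include <unistd.h>           /* syscall(), write(), STDOUT_FILENO */

/* Step 1: generic interface, the caller names the system call by number. */
long hello_via_syscall (void) {
  const char *msg = "Hello, world!\n";
  return syscall (SYS_write, STDOUT_FILENO, msg, strlen (msg));
}

/* Step 2: named wrapper, the library hides the system call number. */
long hello_via_write (void) {
  const char *msg = "Hello, world!\n";
  return (long) write (STDOUT_FILENO, msg, strlen (msg));
}
```

Both functions print the same line and return the number of bytes written; only the second one stays valid when the system call number differs on another platform.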

System Calls in ULIX-i386

ULIX-i386 provides functions for adding (or modifying) system calls and a generic system call handler. For this purpose, we create a system call table syscall_table that contains pointers to functions, so, for example, syscall_table[4] should contain the address of ULIX-i386's sys_write function. If a system call is not defined, the table entry is a null pointer, so we can initialize the whole table with null bytes:
< constants > ≡
  #define MAX_SYSCALLS 0x8000         // max syscall number: 0x7fff
< global variables > ≡
  void *syscall_table[MAX_SYSCALLS];
Telling ULIX-i386 what function to execute when a specific system call is made is as simple as writing the address into the proper array entry. Nevertheless, we provide a function
< function prototypes > ≡
  void install_syscall_handler (int syscallno, void *syscall_handler);
which enters the handler address:
< function implementations > ≡
  void install_syscall_handler (int syscallno, void *syscall_handler) {
    // ignore numbers outside the table (negative ones included)
    if (syscallno >= 0 && syscallno < MAX_SYSCALLS)
      syscall_table[syscallno] = syscall_handler;
    return;
  }
So if we have already defined a function sys_write and declared the system call number __NR_write, we could activate the write system call by calling
< syscall entry example > ≡
  install_syscall_handler (__NR_write, sys_write);
The actual system call handler simply checks if there is a handler for the given system call number and (if so) calls it:
< function implementations > +≡
  void syscall_handler (context_t *r) {
    void (*handler) (context_t*) = 0;   // handler is a function pointer
    unsigned int number = r->eax;
    if (number < MAX_SYSCALLS)          // never read past the end of the table
      handler = syscall_table[number];
    if (handler != 0) {
      handler (r);
    } else {
      printf ("Unknown syscall no. eax=0x%x; ebx=0x%x. eip=0x%x, esp=0x%x. "
              "Continuing.\n", r->eax, r->ebx, r->eip, r->esp);
    }
    return;
  }
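The interplay of table, installer, and dispatcher can be tried out in an ordinary user program. The sketch below is a simplified model, not the ULIX-i386 code: demo_context_t is a hypothetical stand-in for context_t with only two registers, the table is kept small, and all demo_ names are invented for this illustration. It stores typed function pointers instead of void * so that no pointer conversion is needed.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_MAX_SYSCALLS 16u            /* hypothetical, small for the demo */

typedef struct {                         /* hypothetical stand-in for context_t */
  unsigned int eax, ebx;
} demo_context_t;

typedef void (*demo_handler_t) (demo_context_t *);

demo_handler_t demo_table[DEMO_MAX_SYSCALLS];   /* zero-initialized: no handlers */

/* Enter a handler; numbers outside the table are silently ignored. */
void demo_install (unsigned int no, demo_handler_t h) {
  if (no < DEMO_MAX_SYSCALLS)
    demo_table[no] = h;
}

/* Returns 1 if a handler ran, 0 if the number was unknown or out of range. */
int demo_dispatch (demo_context_t *r) {
  demo_handler_t h = NULL;
  if (r->eax < DEMO_MAX_SYSCALLS)
    h = demo_table[r->eax];
  if (h == NULL) {
    printf ("Unknown syscall no. eax=0x%x\n", r->eax);
    return 0;
  }
  h (r);
  return 1;
}

/* Example handler: doubles EBX and returns the result in EAX. */
void demo_sys_double (demo_context_t *r) {
  r->eax = 2 * r->ebx;
}

/* Convenience for experiments: install, dispatch, return the EAX value. */
unsigned int demo_call (unsigned int no, unsigned int arg) {
  demo_context_t r = { no, arg };
  demo_install (4, demo_sys_double);
  demo_dispatch (&r);
  return r.eax;
}
```

Calling demo_call (4, 21) runs the installed handler, while any other number only prints the warning and leaves EAX unchanged.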
We will later add system call handlers to a special code chunk named < syscall functions > and put their prototypes in < syscall prototypes >:
< function prototypes > +≡
  < syscall prototypes >
< function implementations > +≡
  < syscall functions >
We add a handler for interrupt 0x80 which looks just like our regular interrupt handlers for hardware-generated interrupts (and also like the fault handlers). The difference is that in this case we call neither irq_handler nor fault_handler, but our new C function syscall_handler. Apart from that, we perform the same preparation as in the assembler code which you've already seen: we store the context in the proper order on the stack so that syscall_handler, which takes a context_t *r as argument, can evaluate and possibly change the saved register values.
< start.asm > ≡
  [section .text]
           extern syscall_handler
           global isr128

  isr128:  push byte 0       ; put 128 on the stack so it looks the same
           ; push byte 128   ; as it does after a hardware interrupt
           push byte -128    ; (getting rid of nasm error for signed byte)
           
           < push registers onto the stack >
           call syscall_handler
           
           < pop registers from the stack >
           add  esp, 8       ; undo the two "push byte" commands from the start
           iret
(In case you have forgotten it: < push registers onto the stack > pushes the general purpose registers as well as DS, ES, FS, GS, and ESP onto the stack, while < pop registers from the stack > pops them back in reverse order. We used this code when we introduced the interrupt handlers.)

Making System Calls

Actually making a system call works just like in the Linux example we've shown earlier: load the system call number into EAX, load arguments for the syscall into the next registers (EBX, ECX, ...), and execute int 0x80. The return value of the system call can then be read from EAX. The following functions standardize this process. We will not need them in the kernel, but the user mode library will later use them:
< standard functions for making system calls > ≡
  // "volatile" keeps the compiler from dropping a call whose result is unused;
  // the "memory" clobber tells it that the kernel may read or write user memory.
  inline int syscall1 (int eax) {
    int result;
    asm volatile ( "int $0x80" : "=a" (result) : "a" (eax) : "memory" );
    return result ;
  }

  inline int syscall2 (int eax, int ebx) {
    int result;
    asm volatile ( "int $0x80" : "=a" (result) : "a" (eax), "b" (ebx) : "memory" );
    return result ;
  }

  inline int syscall3 (int eax, int ebx, int ecx) {
    int result;
    asm volatile ( "int $0x80" : "=a" (result)
                   : "a" (eax), "b" (ebx), "c" (ecx) : "memory" );
    return result ;
  }

  inline int syscall4 (int eax, int ebx, int ecx, int edx) {
    int result;
    asm volatile ( "int $0x80" : "=a" (result)
                   : "a" (eax), "b" (ebx), "c" (ecx), "d" (edx) : "memory" );
    return result ;
  }
As an example, look at the write function, which has the prototype
< example: write() prototype > ≡
  int write (int fd, const void *buf, int nbyte);
It takes three arguments; together with the system call number that makes four values to pass, so an implementation in a user mode library would use syscall4 and look like this:
< example: write() implementation > ≡
  int write (int fd, const void *buf, int nbyte) {
    return syscall4 (__NR_write, fd, (int)buf, nbyte);
  }
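On the kernel side, the handler for this call has to unpack the same registers again and place its result in EAX. The following sketch models that round trip in a plain user program. Everything in it is an assumption made for illustration: demo_regs_t is a hypothetical stand-in for context_t (with pointer-sized fields so that the sketch also runs on 64-bit hosts), demo_console replaces a real output device, and the direct function call stands in for int 0x80 and the dispatch through syscall_table.

```c
#include <stdint.h>
#include <string.h>

typedef struct {                 /* hypothetical stand-in for context_t */
  uintptr_t eax, ebx, ecx, edx;  /* pointer-sized for 64-bit hosts */
} demo_regs_t;

char demo_console[64];           /* pretend output device */
size_t demo_console_len;

/* EBX = fd, ECX = buffer address, EDX = byte count; result goes back in EAX. */
void demo_sys_write (demo_regs_t *r) {
  int fd = (int) r->ebx;
  const char *buf = (const char *) r->ecx;
  size_t n = (size_t) r->edx;
  size_t room = sizeof demo_console - demo_console_len;

  if (fd != 1) {                 /* only "stdout" exists in this sketch */
    r->eax = (uintptr_t) -1;
    return;
  }
  if (n > room) n = room;        /* short write when the device is full */
  memcpy (demo_console + demo_console_len, buf, n);
  demo_console_len += n;
  r->eax = n;                    /* number of bytes written */
}

/* User-side view: pack the arguments, "trap", read the result from EAX. */
long demo_write (int fd, const void *buf, size_t nbyte) {
  demo_regs_t r = { 4, (uintptr_t) fd, (uintptr_t) buf, nbyte };
  demo_sys_write (&r);           /* stands in for int 0x80 plus dispatch */
  return (long) (intptr_t) r.eax;
}
```

The handler never returns a value in the C sense; it communicates only by changing the saved register set, which is why syscall_handler passes a pointer to the context.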
For increased Linux compatibility we will use the same system call numbers as Linux does, at least for those calls that ULIX-i386 also provides. The following definitions were taken from the 32-bit Linux (Ubuntu 11.10, http://www.ubuntu.com/) file /usr/include/i386-linux-gnu/asm/unistd_32.h:

TODO: In the end: remove all numbers which are not used

{\small

< linux system calls > ≡
  #define __NR_yield               66   // not from Linux
  #define __NR_exit                 1
  #define __NR_fork                 2
  #define __NR_read                 3
  #define __NR_write                4
  #define __NR_open                 5
  #define __NR_close                6
  #define __NR_waitpid              7
  #define __NR_creat                8
  #define __NR_link                 9
  #define __NR_unlink              10
  #define __NR_execve              11
  #define __NR_chdir               12
  #define __NR_time                13
  #define __NR_mknod               14
  #define __NR_chmod               15
  #define __NR_lchown              16
  #define __NR_break               17
  #define __NR_lseek               19
  #define __NR_getpid              20
  #define __NR_mount               21
  #define __NR_umount              22
  #define __NR_alarm               27
  #define __NR_utime               30
  #define __NR_access              33
  #define __NR_nice                34
  #define __NR_ftime               35
  #define __NR_sync                36
  #define __NR_kill                37
  #define __NR_rename              38
  #define __NR_mkdir               39
  #define __NR_rmdir               40
  #define __NR_dup                 41
  #define __NR_pipe                42
  #define __NR_times               43
  #define __NR_brk                 45
  #define __NR_signal              48
  #define __NR_lock                53
  #define __NR_ulimit              58
  #define __NR_umask               60
  #define __NR_chroot              61
  #define __NR_dup2                63
  #define __NR_getppid             64
  #define __NR_sigaction           67
  #define __NR_sigsuspend          72
  #define __NR_sigpending          73
  #define __NR_symlink             83
  #define __NR_readlink            85
  #define __NR_readdir             89
  #define __NR_mmap                90
  #define __NR_munmap              91
  #define __NR_truncate            92
  #define __NR_ftruncate           93
  #define __NR_fchmod              94
  #define __NR_fchown              95
  #define __NR_getpriority         96
  #define __NR_setpriority         97
  #define __NR_stat               106
  #define __NR_lstat              107
  #define __NR_fstat              108
  #define __NR_wait4              114
  #define __NR_sigreturn          119
  #define __NR_uname              122
  #define __NR_sigprocmask        126
  #define __NR_fchdir             133
  #define __NR_getdents           141
  #define __NR_nanosleep          162
  #define __NR_mremap             163
  #define __NR_chown              182
  #define __NR_getcwd             183
  #define __NR_lchown32           198
  #define __NR_getuid32           199
  #define __NR_getgid32           200
  #define __NR_geteuid32          201
  #define __NR_getegid32          202
  #define __NR_setreuid32         203
  #define __NR_setregid32         204
  #define __NR_getgroups32        205
  #define __NR_setgroups32        206
  #define __NR_fchown32           207
  #define __NR_setresuid32        208
  #define __NR_getresuid32        209
  #define __NR_setresgid32        210
  #define __NR_getresgid32        211
  #define __NR_chown32            212
  #define __NR_setuid32           213
  #define __NR_setgid32           214
  #define __NR_setfsuid32         215
  #define __NR_setfsgid32         216
  #define __NR_waitid             284
  #define __NR_openat             295
  #define __NR_tee                315
  #define __NR_dup3               330
  #define __NR_pipe2              331
}
< constants > +≡
  < linux system calls >
  < ulix system calls >