Synopsis: Linux kernel i386 SMP page fault handler privilege escalation Product: Linux kernel Version: 2.4 up to and including 2.4.29-rc1, 2.6 up to and including 2.6.10 Vendor: http://www.kernel.org/ URL: http://isec.pl/vulnerabilities/isec-0022-pagefault.txt CVE: CAN-2005-0001 Author: Paul Starzetz Date: Jan 12, 2005 Issue: ====== Locally exploitable flaw has been found in the Linux page fault handler code that allows users to gain root privileges if running on multiprocessor machine. Details: ======== The Linux kernel is the core software component of a Linux environment and is responsible for handling of machine resources. One of the functions of an operating system kernel is handling of virtual memory. On Linux virtual memory is provided on demand if an application accesses virtual memory areas. One of the core components of the Linux VM subsystem is the page fault handler that is called if applications try to access virtual memory currently not physically mapped or not available in their address space. The page fault handler has the function to properly identify the type of the requested virtual memory access and take the appropriate action to allow or deny application's VM request. Actions taken may also include a stack expansion if the access goes just below application's actual stack limit. An exploitable race condition exists in the page fault handler if two concurrent threads sharing the same virtual memory space request stack expansion at the same time. It is only exploitable on multiprocessor machines (that also includes systems with hyperthreading). Discussion: =========== The vulnerable code resides for the i386 architecture in arch/i386/mm/fault.c in your kernel source code tree: [186] down_read(&mm->mmap_sem); vma = find_vma(mm, address); if (!vma) goto bad_area; if (vma->vm_start <= address) goto good_area; if (!(vma->vm_flags & VM_GROWSDOWN)) goto bad_area; if (error_code & 4) { /* * accessing the stack below %esp is always a bug. * The "+ 32" is there due to some instructions (like * pusha) doing post-decrement on the stack and that * doesn't show up until later.. */ [*] if (address + 32 < regs->esp) goto bad_area; } if (expand_stack(vma, address)) goto bad_area; where the line number has been given for the kernel 2.4.28 version. Since the page fault handler is executed with the mmap_sem semaphore held for reading only, two concurrent threads may enter the section after the line 186. The checks following line 186 ensure that the VM request is valid and in case it goes just below the actual stack limit [*], that the stack is expanded accordingly. On Linux the notion of stack includes any VM_GROWSDOWN virtual memory area, that is, it need not to be the actual process's stack. The exploitable race condition scenario looks as follows: A. thread_1 accesses a VM_GROWSDOWN area just below its actual starting address, lets call it fault_1, B. thread_2 accesses the same area at address fault_2 where fault_2 + PAGE_SIZE <= fault_1, that is: [ NOPAGE ] [fault_1 ] [ VMA ] ---> higher addresses [fault_2 ] [ NOPAGE ] [ VMA ] where one [] bracket pair stands for a page frame in the application's page table. C. if thread_2 is slightly faster than thread_1 following happens: [ PAGE2 ] [PAGE1 VMA ] that is, the stack is first expanded inside the expand_stack() function to cover fault_2, however it is right after 'expanded' to cover only fault_1 since the necessary checks have already been passed. In other words, the process's page table includes now two page references (PTEs) but only one is covered by the virtual memory area descriptor (namely only page1). The race window is very small but it is exploitable. Once the reference to page2 is available in the page table, it can be freely read or written by both threads. It will also not be released to the virtual memory management on process termination. Similar techniques like in http://www.isec.pl/vulnerabilities/isec-0014-mremap-unmap.txt may be further used to inject these lost page frames into a setuid application in order to gain elevated privileges (due to kmod this is also possible without any executable setuid binaries). Impact: ======= Unprivileged local users can gain elevated (root) privileges on SMP machines. Credits: ======== Paul Starzetz has identified the vulnerability and performed further research. RedHat reported that a customer also pointed out some problems with the page fault handler on SMP about 20.12.2004 and they already included a patch for this vulnerability in the kernel-2.4.21-27.EL release, however the bug did not make it to the security division. COPYING, DISTRIBUTION, AND MODIFICATION OF INFORMATION PRESENTED HERE IS ALLOWED ONLY WITH EXPRESS PERMISSION OF ONE OF THE AUTHORS. Disclaimer: =========== This document and all the information it contains are provided "as is", for educational purposes only, without warranty of any kind, whether express or implied. The authors reserve the right not to be responsible for the topicality, correctness, completeness or quality of the information provided in this document. Liability claims regarding damage caused by the use of any information provided, including any kind of information which is incomplete or incorrect, will therefore be rejected. Appendix: ========= A proof of concept code won't be disclosed now. Special thanks goes to OSDL and Marcelo Tosatti for providing a SMP testbed.