从ucore来总结操作系统(3)----页面置换与Page Fault

页面置换与Page Fault

这篇是lab3的内容，主要就是实操一下页面置换算法和当缺页发生时的处理。

0X00 环境准备

lab3也是基于lab2和lab1的，所以我们先用meld工具把lab2(lab2是包含lab1的)合并到lab3中已有的文件上。

在lab2中，我们已经能够管理物理内存了(组织成块，并提供first fit分配算法、实现虚拟地址映射到PTE上)，有关内存的数据结构和相关操作都是直接针对实际存在的资源--物理内存空间的管理，没有从一般应用程序对内存的“需求”考虑，即需要有相关的数据结构和操作来体现一般应用程序对虚拟内存的“需求”。一般应用程序的对虚拟内存的“需求”与物理内存空间的“供给”没有直接的对应关系，ucore是通过page fault异常处理来间接完成这二者之间的衔接。因此，在lab3中，将进一步实现虚拟内存的缺页中断和页面置换。

比如程序可以预先申请出一些内存，但是操作系统并不分配，直到程序用到这些内存所在的页面时，操作系统才通过page fault异常来分配(linux下的写时分配)。

但是page_fault函数不知道哪些是“合法”的虚拟页，原因是ucore还缺少一定的数据结构来描述这种不在物理内存中的“合法”虚拟页。为此ucore通过建立mm_struct和vma_struct数据结构，描述了ucore模拟应用程序运行所需的合法内存空间。当访问内存产生page fault异常时，可获得访问的内存的方式（读或写）以及具体的虚拟内存地址，这样ucore就可以查询此地址，看是否属于vma_struct数据结构中描述的合法地址范围中，如果在，则可根据具体情况进行请求调页/页换入换出处理（这就是练习2涉及的部分）；如果不在，则报错。mm_struct和vma_struct数据结构结合页表表示虚拟地址空间和物理地址空间的示意图如下所示：


// 描述一个进程的虚拟地址空间 每个进程的 pcb 中 会有一个指针指向本结构体
struct mm_struct
{
    // 链接同一页目录表的虚拟内存空间 的 双向链表的 头节点
    list_entry_t mmap_list;
    // 当前正在使用的虚拟内存空间
    struct vma_struct *mmap_cache;
    // mm_struct 所维护的页表地址(拿来找 PTE)
    // 通过访问pgdir可以查找某虚拟地址对应的页表项是否存在以及页表项的属性等
    pde_t *pgdir;
    // 虚拟内存块的数目
    int map_count;
    // 记录访问情况链表头地址(用于置换算法)
    void *sm_priv;
};

// 虚拟内存空间
struct vma_struct
{
    // 虚拟内存空间属于的进程
    struct mm_struct *vm_mm;
    // 连续地址的虚拟内存空间的起始位置和结束位置
    uintptr_t vm_start;
    uintptr_t vm_end;
    // 虚拟内存空间的属性 (读/写/执行)
    uint32_t vm_flags;
    // 双向链表 从小到大将虚拟内存空间链接起来
    list_entry_t list_link;
};

mm_struct应该是整个进程的虚拟内存描述表，记录了整个进程所拥有的虚拟地址

vma_struct是描述了一个有效的虚拟内存空间的信息，比如虚拟内存空间的起始地址、结束地址、属性等。

由于进程的虚拟空间是固定的，所以只有一个mm_struct；但是进程的占用的虚拟空间是可以是不连续的、分块的，所以一个mm_struct可以包含多个vma_struct。

对于vma_struct，vm_start和vm_end描述了一个连续地址的虚拟内存空间的起始位置和结束位置，这两个值都应该是PGSIZE 对齐的，而且描述的是一个合理的地址空间范围（即严格确保 vm_start < vm_end的关系）；list_link是一个双向链表，按照从小到大的顺序把一系列用vma_struct表示的虚拟内存空间链接起来，并且还要求这些链起来的vma_struct应该是不相交的，即vma之间的地址空间无交集；vm_flags表示了这个虚拟内存空间的属性，有只读、可读写、可执行。vm_mm是一个指针，指向一个比vma_struct更高的抽象层次的数据结构mm_struct，这里把一个mm_struct结构的变量简称为mm变量。

对于mm_struct，这个数据结构表示了包含所有虚拟内存空间的共同属性。其中，mmap_list是双向链表头，链接了所有属于同一页目录表的虚拟内存空间(目前只有一个目录表PDT)，mmap_cache是指向当前正在使用的虚拟内存空间，由于操作系统执行的“局部性”原理，当前正在用到的虚拟内存空间在接下来的操作中可能还会用到，这时就不需要查链表，而是直接使用此指针就可找到下一次要用到的虚拟内存空间。由于mmap_cache 的引入，可使得 mm_struct 数据结构的查询加速 30% 以上。pgdir 所指向的就是 mm_struct数据结构所维护的页表。通过访问pgdir可以查找某虚拟地址对应的页表项是否存在以及页表项的属性等。map_count记录mmap_list 里面链接的 vma_struct的个数。sm_priv指向用来链接记录页访问情况的链表头，这建立了mm_struct和后续要讲到的swap_manager之间的联系。

总而言之就是 mm_struct 描述了整个进程的虚拟地址空间而 vma_struct 描述了进程中的一小部分虚拟内存空间

除此之外，page页面结构体页新增了两个成员


struct Page {
    // 引用计数
    int ref;                        // page frame's reference counter
    // 用两个bit表示两种属性
    // 1. PG_reserved位 此位为1时表示该页是内核保留的，不可分配
    // 2. PG_property位 此位为1时表示可分配，为0表示已分配
    uint32_t flags;                 // array of flags that describe the status of the page frame
    // 空闲块大小(以物理页位大小)，仅在当前页是空闲块的第一个物理页时有效
    unsigned int property;          // the num of free block, used in first fit pm manager
    list_entry_t page_link;         // free list link

    // 下面是新增的成员
    list_entry_t pra_page_link;     // 用于连接上一个和下一个*可交换已分配*的物理页
    uintptr_t pra_vaddr;            // 用于保存该物理页所对应的虚拟地址。
};

0X01 缺页中断、写时创建和页面置换算法

此时我们需要完成kern/mm/vmm.c中的，do_pgfault函数，这个函数的作用是处理缺页中断。当缺页发生的时候，CPU会先把产生缺页中断的虚拟地址存放进CR2寄存器，然后把错误码等压入堆栈，并通过__alltraps->trap->trap_dispatch->pgfault_handler->do_pgfault这样调用栈来最终由do_pgfault函数处理缺页中断。

缺页中断实质上还是需要硬件的支持。在保护模式下，CPU引用一个段需要查看段描述符，段描述符存在段描述符表中。但有的时候，对应的段并不存在于内存中(CPU允许段描述符表中注册的段不再内存中，这是它提供给软件的策略)，此时硬件产生缺页中断。

不过，不同体系的处理器产生缺页中断的过程不一致，所以系统如何检测到缺页发生可以当作是硬件黑盒

同样的，这个函数中同样存在大量的注释和已经写好的代码。


/* do_pgfault - interrupt handler to process the page fault execption
 * @mm         : the control struct for a set of vma using the same PDT
 * @error_code : the error code recorded in trapframe->tf_err which is setted by x86 hardware
 * @addr       : the addr which causes a memory access exception, (the contents of the CR2 register)
 *
 * CALL GRAPH: trap--> trap_dispatch-->pgfault_handler-->do_pgfault
 * The processor provides ucore's do_pgfault function with two items of information to aid in diagnosing
 * the exception and recovering from it.
 *   (1) The contents of the CR2 register. The processor loads the CR2 register with the
 *       32-bit linear address that generated the exception. The do_pgfault fun can
 *       use this address to locate the corresponding page directory and page-table
 *       entries.
 *   (2) An error code on the kernel stack. The error code for a page fault has a format different from
 *       that for other exceptions. The error code tells the exception handler three things:
 *         -- The P flag   (bit 0) indicates whether the exception was due to a not-present page (0)
 *            or to either an access rights violation or the use of a reserved bit (1).
 *         -- The W/R flag (bit 1) indicates whether the memory access that caused the exception
 *            was a read (0) or write (1).
 *         -- The U/S flag (bit 2) indicates whether the processor was executing at user mode (1)
 *            or supervisor mode (0) at the time of the exception.
 */
int do_pgfault(struct mm_struct *mm, uint32_t error_code, uintptr_t addr)
{
    int ret = -E_INVAL;
    // try to find a vma which include addr
    // 尝试找到一个包含addr的vma
    // 也就是看一下这个地址是不是在进程某个vma的范围内
    // vma管理的是所有有效地址，所以如果不在vma中，就是不合法的
    struct vma_struct *vma = find_vma(mm, addr);

    pgfault_num++;
    // If the addr is in the range of a mm's vma?
    // 没有vma包含这个地址，不合法
    if (vma == NULL || vma->vm_start > addr)
    {
        cprintf("not valid addr %x, and  can not find it in vma\n", addr);
        goto failed;
    }
    // check the error_code
    // 下面执行权限检查和判断访问页面是否存在物理页面中
    switch (error_code & 3)
    {
    default:
        // 进到此处，说明对页面是写请求，同时页面存在于物理内存中
        // 但是还没有检查权限，权限的检查是case 2中的代码
        // 如果权限也满足，就可以直接写入了，不需要置换
        /* error code flag : default is 3 ( W/R=1, P=1): write, present */
    case 2: /* error code flag : (W/R=1, P=0): write, not present */
        // 写页面，但是页面不存在
        // 检查下权限，没有写入权限就fail
        if (!(vma->vm_flags & VM_WRITE))
        {
            cprintf("do_pgfault failed: error code flag = write AND not present, but the addr's vma cannot write\n");
            goto failed;
        }
        break;
    case 1: /* error code flag : (W/R=0, P=1): read, present */
        // 读内存，且内存也存在于物理内存中
        // 肯定是可以直接读的，不应该进入缺页中断中，直接错误
        cprintf("do_pgfault failed: error code flag = read AND present\n");
        goto failed;
    case 0: /* error code flag : (W/R=0, P=0): read, not present */
        // 读内存，但是内存不存在于物理内存中
        // 检查下权限，没有读权限就fail
        if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
        {
            cprintf("do_pgfault failed: error code flag = read AND not present, but the addr's vma cannot read or exec\n");
            goto failed;
        }
    }
    /* IF (write an existed addr ) OR
     *    (write an non_existed addr && addr is writable) OR
     *    (read  an non_existed addr && addr is readable)
     * THEN
     *    continue process
     */
    // 检查完权限，下面正式处理

    // 设置页表条目所对应的权限
    uint32_t perm = PTE_U;
    if (vma->vm_flags & VM_WRITE)
    {
        perm |= PTE_W;
    }
    // 内存对齐
    addr = ROUNDDOWN(addr, PGSIZE);

    ret = -E_NO_MEM;
    pte_t *ptep = NULL;
    /*LAB3 EXERCISE 1: YOUR CODE
     * Maybe you want help comment, BELOW comments can help you finish the code
     *
     * Some Useful MACROs and DEFINEs, you can use them in below implementation.
     * MACROs or Functions:
     *   get_pte : get an pte and return the kernel virtual address of this pte for la
     *             if the PT contians this pte didn't exist, alloc a page for PT (notice the 3th parameter '1')
     *   pgdir_alloc_page : call alloc_page & page_insert functions to allocate a page size memory & setup
     *             an addr map pa<--->la with linear address la and the PDT pgdir
     * DEFINES:
     *   VM_WRITE  : If vma->vm_flags & VM_WRITE == 1/0, then the vma is writable/non writable
     *   PTE_W           0x002                   // page table/directory entry flags bit : Writeable
     *   PTE_U           0x004                   // page table/directory entry flags bit : User can access
     * VARIABLES:
     *   mm->pgdir : the PDT of these vma
     *
     */
#if 1

    /*LAB3 EXERCISE 1: YOUR CODE*/
    // 先尝试获取pte，如果不存在，get_pte会自动创建一个pte表项
    ptep = get_pte(mm->pgdir, addr, 1); //(1) try to find a pte, if pte's PT(Page Table) isn't existed, then create a PT.
    if (ptep == NULL)
    {
        goto failed;
    }
    // 如果这个表项对应的物理地址不存在，那么就需要分配一个物理页
    if (*ptep == 0)
    {
        // 分配一个物理页，并让虚拟地址addr映射到这个物理页
        // 就是让之前分配的pte表项指向这个物理页
        if (pgdir_alloc_page(mm->pgdir, addr, perm) == NULL)
            goto failed;
        //(2) if the phy addr isn't exist, then alloc a page & map the phy addr with logical addr
    }
    else
    // 如果PTE存在，说明该页被置换到外存，需要换回来
    {
        /*LAB3 EXERCISE 2: YOUR CODE
         * Now we think this pte is a  swap entry, we should load data from disk to a page with phy addr,
         * and map the phy addr with logical addr, trigger swap manager to record the access situation of this page.
         *
         *  Some Useful MACROs and DEFINEs, you can use them in below implementation.
         *  MACROs or Functions:
         *    swap_in(mm, addr, &page) : alloc a memory page, then according to the swap entry in PTE for addr,
         *                               find the addr of disk page, read the content of disk page into this memroy page
         *    page_insert ： build the map of phy addr of an Page with the linear addr la
         *    swap_map_swappable ： set the page swappable
         */
        if (swap_init_ok)
        {
            struct Page *page = NULL;
            // 先把目标地址加载到内存page页上
            ret = swap_in(mm, addr, &page);
            if (ret != 0)
            {
                goto failed;
            }

            // 然后把page页映射到虚拟地址addr上，权限为perm
            ret = page_insert(mm->pgdir, page, addr, perm);

            // 最后把page页设置为可交换的
            swap_map_swappable(mm, addr, page, 1);

            // 然后设置一下page页的虚拟地址
            page->pra_vaddr = addr;
            //(1）According to the mm AND addr, try to load the content of right disk page
            //    into the memory which page managed.
            //(2) According to the mm, addr AND page, setup the map of phy addr <---> logical addr
            //(3) make the page swappable.
        }
        else

        {
            cprintf("no swap_init_ok but ptep is %x, failed\n", *ptep);
            goto failed;
        }
    }
#endif
    ret = 0;
failed:
    return ret;
}

这个函数就是处理缺页中断的函数，它做的事情很简单：

检查访问是否合法(vma、权限)
访问PTE检查物理页面是否存在，说明物理页面本身不存在，不存在则创建一个物理页面
如果PTE存在，那么说明物理页面之前被换出去了，那么再调用页面置换算法换回来

然后实现一下FIFO页面置换算法，主要是实现kern/mm/swap_fifo.c中完成map_swappable和swap_out_victim函数。


static int _fifo_map_swappable(struct mm_struct *mm, uintptr_t addr, struct Page *page, int swap_in)
{
    list_entry_t *head = (list_entry_t *)mm->sm_priv;
    list_entry_t *entry = &(page->pra_page_link);

    assert(entry != NULL && head != NULL);
    // record the page access situlation
    /*LAB3 EXERCISE 2: YOUR CODE*/
    // 就是将这一页加入到链表头中(最近访问过的放前面) 使其可以被置换算法使用到
    list_add(head, entry);
    //(1)link the most recent arrival page at the back of the pra_list_head qeueue.
    return 0;
}
/*
 *  (4)_fifo_swap_out_victim: According FIFO PRA, we should unlink the  earliest arrival page in front of pra_list_head qeueue,
 *                            then set the addr of addr of this page to ptr_page.
 */
// 选择页面置换出去
static int _fifo_swap_out_victim(struct mm_struct *mm, struct Page **ptr_page, int in_tick)
{
    list_entry_t *head = (list_entry_t *)mm->sm_priv;
    assert(head != NULL);
    assert(in_tick == 0);
    /* Select the victim */
    /*LAB3 EXERCISE 2: YOUR CODE*/
    // 选择最早的页面作为置换出去的页面
    list_entry_t *victim = head->prev;
    // 查找到该页面的链表项
    *ptr_page = le2page(victim, pra_page_link);
    // 将该页面从链表中删除
    list_del(victim);
    //(1)  unlink the  earliest arrival page in front of pra_list_head qeueue
    //(2)  set the addr of addr of this page to ptr_page
    return 0;
}

由于mm_struct中维护了访问的页面的顺序，所以通过其可以轻松的找出需要置换的页面并删除。

0X02 总结

通过这个lab3，我们学习了以下内容:

缺页中断的处理：检查权限、检查页面位置、置换或分配
使用FIFO算法实现页面置换