Writing the Simplest Possible File System with Ravi Kiran UVS: A Detailed Overview

Linux閱碼場 · Source: unknown · Author: 易水寒 · 2018-04-29 16:15

Writing a Simple File System

Author: Ravi Kiran UVS

1. Objective

We will write a file system with very basic functionality, in order to understand how certain kernel code paths of the Virtual File System (VFS) work. This file system has only one file, 'hello.txt', which we can read and write.

2. Introduction

The Linux VFS supports multiple file systems. The kernel does most of the work, while file-system-specific tasks are delegated to the individual file systems through handlers. Instead of calling the functions directly, the kernel uses operation tables: structures of function pointers, one pointer per handler (callback). The kernel calls the handler present in the table for each operation. This lets different file systems register different handlers, and it lets the kernel perform the common tasks before calling them, so each handler can focus on the work specific to its file system.

File systems are identified by their names. The supported file systems can be listed with 'cat /proc/filesystems'. The first step is to register the file system with the kernel. Since we are using a kernel module, registration is done during module initialization. Registration supplies a handler that is called to fill the super block structure at mount time and a handler that cleans up at unmount time. There are other handlers, but these two are essential.

The super block operations are set at the time of mounting. The operation tables for inodes and files are set when the inode is opened. The first step before opening an inode is lookup: the inode of a file is looked up by calling the lookup handler of the parent inode. But what about the root-most inode of the new file system?
This has to be allocated at the time of mounting, i.e., during super block initialization.

Once the operation tables are set on the data structures, the kernel calls the handlers depending on the operation.

3. Data Structures

This is a brief description of the data structures used in implementing our file system.

a. File System Type (struct file_system_type)

Definition found in include/linux/fs.h

This structure is used to register the file system with the kernel, and the kernel uses it when mounting a file system. We have to fill the 'name' field with the name of our file system (for example "rkfs") and set the handlers get_sb and kill_sb, which allocate and release the super block objects.

b. Super Block (struct super_block)

Definition found in include/linux/fs.h

This stores the information about the mounted file system. The important fields to be filled are the operation table (s_op field) and the root dentry (s_root). At the time of mounting a file system, the kernel identifies the correct file_system_type object from the file system name and calls its get_sb field to get a super block object.

c. Super Block Operations (struct super_operations)

Definition found in include/linux/fs.h

Super block operations table.

d. Inode (struct inode)

Definition found in include/linux/fs.h

The inode object is the kernel representation of the low-level file. When filling the super block we return the dentry of the root of our file system, and a proper inode has to be attached to that dentry.

This structure has two operation tables, i_op and i_fop, i.e., inode operations and file operations respectively. We will implement one operation in inode_operations: lookup.

lookup is called when the kernel is resolving a path. The kernel starts from the ancestor (the current working directory for relative paths, or the root-most directory for absolute paths) and gets the dentry (and with it the inode) of each name component of the path from its parent.
This is achieved by calling inode_operations.lookup on the inode of the parent entry.

For example, when the kernel is resolving /parentdir/subdir, the lookup operation reaches the root-most inode of the file system. This was already allocated during super block initialization and stored in the s_root field. To resolve 'parentdir' under the root-most inode, the kernel creates a new dentry object, sets its name to 'parentdir' and calls the lookup handler of the root-most inode. The handler is supposed to attach the inode to the dentry using d_add and return NULL if it was successful, or an error code otherwise. Similarly, the lookup for 'subdir' is done by the parentdir inode. The dentry cache and the inode cache save repeated lookups and improve performance.

It is important for us to implement the lookup callback. It will be called by the open system call.

e. Inode Operations (struct inode_operations)

Definition found in include/linux/fs.h

This is the inode operations table, with each field corresponding to a function pointer that handles one task. It has fields like mkdir, lookup etc. We are interested in lookup.

f. Address Space Operations (struct address_space_operations)

Definition found in include/linux/fs.h

Address space operations table.

g. DEntry (struct dentry)

Definition found in include/linux/dcache.h

The kernel uses dentries to represent the file system structure. Dentries point to inode objects and hold the pointers that store the parent-child relationship of the files. Inodes and files do not store any information about the hierarchy.

h. File (struct file)

Definition found in include/linux/fs.h

The file object stores a process's information about an open file. We don't have to fill any fields of the file object directly. The kernel takes care of filling the proper fields, but we have to implement the file operation callbacks. We register the file operation table when we return the inode object during lookup.
The file operations are copied from the i_fop field of the inode object to the file object by the kernel.

We will implement readdir for directories and read/write for regular files, so while returning the inode we have to set the file operation table based on the type of the file. We will have two file operation tables, one for directories and the other for regular files.

The relationship between files, dentries and inodes is like this:

[Figure not reproduced: relationship between files, dentries and inodes]

The following figure shows the important relationships between various VFS data structures. It does not show all of them. Note that the super block structure has a list of all open file objects, a list of dirty inodes and another list of locked inodes of the file system. The lists used for the caches (LRU and free lists) are not shown either.

[Figure not reproduced: relationships between VFS data structures]
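The idea of operation tables, and of the file object inheriting its table from the inode, can be illustrated outside the kernel. What follows is only a userspace sketch: every model_* name is hypothetical, the structs do not match the real kernel layouts, and the string returned by the read handler is made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Userspace model only: these structs mimic the idea of the kernel's
 * operation tables, not their real layout. */
struct model_file;

struct model_file_operations {
    /* Returns the number of bytes produced, like a read handler. */
    long (*read)(struct model_file *f, char *buf, size_t len);
    /* Returns the number of directory entries, like a readdir handler. */
    int (*readdir)(struct model_file *f);
};

struct model_inode {
    int is_dir;                                 /* directory or regular file */
    const struct model_file_operations *i_fop;  /* table chosen per file type */
};

struct model_file {
    struct model_inode *inode;
    const struct model_file_operations *f_op;   /* copied from the inode at open */
};

static long reg_read(struct model_file *f, char *buf, size_t len)
{
    static const char data[] = "Hello World!\n"; /* made-up file contents */
    size_t n = sizeof(data) - 1;

    (void)f;
    if (len < n)
        n = len;
    memcpy(buf, data, n);
    return (long)n;
}

static int dir_readdir(struct model_file *f)
{
    (void)f;
    return 3; /* ".", ".." and "hello.txt" */
}

/* Two tables: one for regular files, one for directories. */
static const struct model_file_operations model_file_fops = { .read = reg_read };
static const struct model_file_operations model_dir_fops = { .readdir = dir_readdir };

/* What the file system does when it fills an inode: pick the table by type. */
void model_fill_inode(struct model_inode *inode, int is_dir)
{
    inode->is_dir = is_dir;
    inode->i_fop = is_dir ? &model_dir_fops : &model_file_fops;
}

/* What the kernel does at open: copy the table pointer into the file object. */
void model_open(struct model_file *f, struct model_inode *inode)
{
    f->inode = inode;
    f->f_op = inode->i_fop;
}

/* Common code runs first, then the handler from the table, if there is one. */
long model_vfs_read(struct model_file *f, char *buf, size_t len)
{
    if (f->f_op == NULL || f->f_op->read == NULL)
        return -1; /* operation not supported by this file type */
    return f->f_op->read(f, buf, len);
}
```

Because the directory table leaves read unset, a read on a directory fails in the common code before any handler runs, which is the kind of shared checking the operation-table design makes possible.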

i. File Operations (struct file_operations)

Definition found in include/linux/fs.h

This is the file operations table, with each field corresponding to a function pointer that handles one task. It has fields like read, write, readdir, llseek etc.

All these structures have fields used by the kernel to maintain internal data structures such as lists and hash tables. So we cannot use local or global objects. The kernel allocates the object and passes it to our functions so that we can fill in the required fields. If we have to allocate the objects ourselves, we need to use the corresponding allocator functions.

4. Implementation

File System Type

The file system is registered with the kernel during module initialization. During this step, two handlers are registered: get_sb and kill_sb. get_sb is called when the file system is mounted, and kill_sb when it is unmounted. The get_sb handler uses the kernel helper function get_sb_single to allocate the super block. We pass it a callback, rkfs_fill_super, which will be called to fill the super block structure. Since we don't have any specific task to do in the kill_sb handler, we can use the kernel helper function kill_anon_super.

Super Block Operations

We'll register the read_inode and write_inode callbacks in the super operations table. read_inode will be called when the inode object is newly allocated. Inodes are identified by their inode numbers. Based on the inode number, the file system has to locate the inode on the storage and fill the fields of the inode object with the contents found there. In our case this is simple, as we are implementing a memory-based file system. File systems like ext2 have to read the inode from the disk using the inode number. Depending on the file type, different handlers can be registered for the file operations table.

The write_inode handler will be called to sync the inode contents to the storage.
In our case, we will update the file size variable in memory.

Inode Operations

We'll register the lookup callback in the inode operations table. A dentry object will be allocated by the kernel and passed to the handler, with the name component already set on it. In the handler, we have to check whether an entry by that name exists under the parent inode. If an entry exists, the inode object is obtained by passing the super block object and the inode number to the function iget. This uses the inode cache and allocates a new inode if it is not available in the cache. (Note that when the inode is newly allocated, the read_inode handler of the super block is called to fill the inode.) This inode is added to the dentry object using d_add. The return value should be NULL on successful lookup, or an error code otherwise.

File Operations

We'll register read, write and readdir handlers in the file operations table. The read and write handlers are called to read data from and write data to the file. readdir is called to read the contents of a directory.

The following table shows the fields we need to fill in the above data structures.

[Table not reproduced: fields to fill in each data structure]

The following table shows the operation tables and the handlers used.

[Table not reproduced: operation tables and the handlers used]
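The interplay between iget, the inode cache and read_inode described above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the kernel's implementation: the fixed-size array standing in for the inode cache, the rule that inode N has size N * 100, and all model_* names are invented for illustration.

```c
#include <stddef.h>

#define MODEL_CACHE_SLOTS 8

struct model_inode {
    unsigned long i_ino;   /* inode number, the lookup key */
    long i_size;           /* filled in by read_inode */
    int in_use;
};

struct model_super_block {
    /* Called once for each newly allocated inode, in the spirit of
     * super_operations.read_inode. */
    void (*read_inode)(struct model_inode *inode);
    struct model_inode cache[MODEL_CACHE_SLOTS];
    int read_inode_calls;  /* instrumentation for the example only */
};

/* Hypothetical backend: the size of inode N is N * 100 bytes. */
static void model_read_inode(struct model_inode *inode)
{
    inode->i_size = (long)inode->i_ino * 100;
}

void model_sb_init(struct model_super_block *sb)
{
    int i;

    sb->read_inode = model_read_inode;
    sb->read_inode_calls = 0;
    for (i = 0; i < MODEL_CACHE_SLOTS; i++)
        sb->cache[i].in_use = 0;
}

/* Return the cached inode for ino, or allocate and fill a new one. */
struct model_inode *model_iget(struct model_super_block *sb, unsigned long ino)
{
    int i;
    struct model_inode *free_slot = NULL;

    for (i = 0; i < MODEL_CACHE_SLOTS; i++) {
        struct model_inode *in = &sb->cache[i];

        if (in->in_use && in->i_ino == ino)
            return in;                 /* cache hit: read_inode is not called */
        if (!in->in_use && free_slot == NULL)
            free_slot = in;
    }
    if (free_slot == NULL)
        return NULL;                   /* the model cache is full */

    free_slot->in_use = 1;
    free_slot->i_ino = ino;
    free_slot->i_size = 0;
    if (sb->read_inode) {
        sb->read_inode(free_slot);     /* cache miss: fill from the backend */
        sb->read_inode_calls++;
    }
    return free_slot;
}
```

Asking for the same inode number twice returns the same object and triggers read_inode only once, which is the behaviour the text attributes to the inode cache.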

5. Entry points

a. init_module

This is called when the module is loaded. We have to register our file system here. Fill the file_system_type structure with the name, get_sb and kill_sb fields and call register_filesystem with the structure. For example,

[Code listing not reproduced: init_module registering the file system]
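Since file systems are identified by name, registration amounts to adding a named entry to a list the kernel can search at mount time. The following userspace sketch models that bookkeeping. It is not the kernel's register_filesystem: the model_* names, the singly linked list and the -1 error value are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

struct model_fs_type {
    const char *name;              /* e.g. "rkfs" */
    struct model_fs_type *next;    /* link in the registry list */
};

static struct model_fs_type *model_fs_list;

/* Find a registered file system by name, as mount has to do. */
struct model_fs_type *model_find_filesystem(const char *name)
{
    struct model_fs_type *p;

    for (p = model_fs_list; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Returns 0 on success, -1 if the name is already taken. */
int model_register_filesystem(struct model_fs_type *fs)
{
    if (model_find_filesystem(fs->name) != NULL)
        return -1;
    fs->next = model_fs_list;
    model_fs_list = fs;
    return 0;
}

/* Returns 0 on success, -1 if fs was not registered. */
int model_unregister_filesystem(struct model_fs_type *fs)
{
    struct model_fs_type **pp;

    for (pp = &model_fs_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == fs) {
            *pp = fs->next;        /* unlink the entry */
            fs->next = NULL;
            return 0;
        }
    }
    return -1;
}
```

In the module itself, the register call belongs in init_module and the unregister call in cleanup_module.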

b. file_system_type.get_sb

This will be called when the file system is mounted. We have to return a super block. We use the helper function get_sb_single to do the super block allocation, passing it the rkfs_fill_super callback to fill the super block object. The s_op field is set to the address of the super block operations table rkfs_sops. The root-most inode of the file system has to be allocated at this stage, and the dentry for it should be set in the s_root field of the super block. As mentioned earlier, this is the entry point of lookup operations into the file system.

The inode object is allocated using the function iget. After initializing the inode, the dentry is allocated using the function d_alloc_root. This dentry is stored in the s_root field of the super block.

[Code listing not reproduced: get_sb and rkfs_fill_super]
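The work done while filling the super block can be summarised as: set the operations table, create the root inode, wrap it in a dentry, and store that dentry in s_root. Here is a userspace sketch of those steps. The structs are simplified stand-ins, and the root inode number 1 and the model_* names are assumptions, not values taken from rkfs.c.

```c
#include <stddef.h>

struct model_inode {
    unsigned long i_ino;
    int is_dir;
};

struct model_dentry {
    const char *d_name;
    struct model_inode *d_inode;
    struct model_dentry *d_parent;
};

struct model_super_operations {
    int unused;   /* the handlers are left out of this sketch */
};

struct model_super_block {
    const struct model_super_operations *s_op;
    struct model_dentry *s_root;
};

#define MODEL_ROOT_INO 1UL   /* hypothetical inode number for the root */

static const struct model_super_operations model_sops = { 0 };
static struct model_inode model_root_inode;
static struct model_dentry model_root_dentry;

/* Fill a freshly allocated super block; returns 0 on success. */
int model_fill_super(struct model_super_block *sb)
{
    /* 1. Attach the super block operations table. */
    sb->s_op = &model_sops;

    /* 2. Set up the root-most inode; it is a directory. */
    model_root_inode.i_ino = MODEL_ROOT_INO;
    model_root_inode.is_dir = 1;

    /* 3. Wrap it in a dentry; the root dentry is its own parent. */
    model_root_dentry.d_name = "/";
    model_root_dentry.d_inode = &model_root_inode;
    model_root_dentry.d_parent = &model_root_dentry;

    /* 4. Publish it as the entry point for later lookups. */
    sb->s_root = &model_root_dentry;
    return 0;
}
```

In the real handler, steps 2 and 3 correspond to the iget and d_alloc_root calls named in the text.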

Let us assume that our file system is mounted under /mnt/rkfs.

c. file_system_type.kill_sb

This is called at the time of unmounting the file system. We use the helper function kill_anon_super for this.

d. super_operations.read_inode

Inode objects should be allocated using the iget function. This uses the inode cache, adjusts the relationships of the inode with various data structures, updates the count etc. If the inode is not cached, it allocates a new inode and calls the read_inode handler of the super block, if present. Inodes are identified by their inode numbers. The handler is expected to initialize the inode with the contents of the inode on the backend.

e. super_operations.write_inode

This will be called when the dirty inodes are flushed. The handler has to sync the inode contents to the backend.

f. inode_operations.lookup

This will be called when the kernel is resolving a path. The lookup handler of the inode operation table of the parent inode is called to resolve a child. Remember that the dentry for the root-most inode is already available in the s_root field of the super block.

For example, after mounting the file system under '/mnt/rkfs', if we want to see the contents using 'ls /mnt/rkfs', the kernel has to create a file object for the inode of '/mnt/rkfs'. The kernel will create a file object with the dentry of the root-most inode of the file system. For the command 'ls -l /mnt/rkfs/hello.txt', the kernel name lookup reaches the root-most inode and the lookup handler will be called to set the inode of 'hello.txt'. The kernel allocates the dentry object and passes it to the handler. If an inode exists for the name component, the inode has to be added to the dentry using d_add and NULL should be returned. If there is some problem, a suitable error code has to be returned.

[Code listing not reproduced: lookup handler]
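For a file system with a single hardcoded file, the lookup logic reduces to a name comparison followed by attaching the inode. The sketch below models this in userspace. The integer return value (0 for success, a negative code for failure) stands in for the kernel handler's NULL-or-error convention, and the inode number 2, the error value and the model_* names are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

struct model_inode {
    unsigned long i_ino;
};

struct model_dentry {
    const char *d_name;            /* name component set by the caller */
    struct model_inode *d_inode;   /* attached by the lookup handler */
};

#define MODEL_FILE_INO 2UL   /* hypothetical inode number for hello.txt */
#define MODEL_ENOENT   2     /* stand-in for the kernel's ENOENT */

static struct model_inode model_hello_inode = { MODEL_FILE_INO };

/* Stand-in for d_add: attach the inode to the dentry. */
static void model_d_add(struct model_dentry *dentry, struct model_inode *inode)
{
    dentry->d_inode = inode;
}

/* Resolve dentry->d_name under parent.
 * Returns 0 on success (where the kernel handler would return NULL)
 * and a negative error code otherwise. */
int model_lookup(struct model_inode *parent, struct model_dentry *dentry)
{
    (void)parent;   /* only one directory exists in this model */

    if (strcmp(dentry->d_name, "hello.txt") != 0)
        return -MODEL_ENOENT;

    model_d_add(dentry, &model_hello_inode);
    return 0;
}
```

A lookup of 'hello.txt' leaves the dentry pointing at the file's inode, while any other name leaves the dentry untouched and reports an error.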

g. file_operations.readdir

This will be called when the kernel wants to read the contents of a directory. The readdir handler of the file operations table of the file will be called to list the contents of the directory. The kernel passes the file object, the dirent structure and a callback that fills the dirent structure with the entries of the directory. The entries are added to the dirent structure using this 'filldir' callback. Since we support one hardcoded file 'hello.txt', we just have to return the values '.', '..' and 'hello.txt'. File systems like ext2 have to fetch the contents from the disk.

[Code listing not reproduced: readdir handler]
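The callback-driven shape of readdir is easier to see in a small model: the handler walks its entries from the file position onward and hands each one to the callback until the callback reports that its buffer is full. This is a userspace sketch, not the kernel's filldir signature. The callback type, the listing buffer, the inode numbers and the model_* names are all assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Callback type: returns nonzero when the caller's buffer is full. */
typedef int (*model_filldir_t)(void *buf, const char *name, unsigned long ino);

struct model_file {
    long f_pos;   /* index of the next directory entry to emit */
};

/* Emit the hardcoded directory contents, resuming at f->f_pos. */
int model_readdir(struct model_file *f, void *buf, model_filldir_t filldir)
{
    static const struct {
        const char *name;
        unsigned long ino;   /* hypothetical inode numbers */
    } entries[] = {
        { ".", 1 }, { "..", 1 }, { "hello.txt", 2 },
    };
    const long count = (long)(sizeof(entries) / sizeof(entries[0]));

    if (f->f_pos < 0)
        return -1;

    while (f->f_pos < count) {
        if (filldir(buf, entries[f->f_pos].name, entries[f->f_pos].ino))
            break;           /* buffer full: resume here on the next call */
        f->f_pos++;
    }
    return 0;
}

/* Example consumer: collects names into a space-separated string. */
struct model_listing {
    char text[64];
    int room;                /* how many more entries we accept */
};

int model_collect(void *buf, const char *name, unsigned long ino)
{
    struct model_listing *l = buf;

    (void)ino;
    if (l->room == 0)
        return 1;
    if (l->text[0] != '\0')
        strcat(l->text, " ");
    strcat(l->text, name);
    l->room--;
    return 0;
}
```

Keeping the position in the file object is what lets a second call continue where the first one stopped, and return nothing once all three entries have been emitted.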

In our file system, we are supporting only one file, i.e., hello.txt. So the result of 'ls /mnt/rkfs' will be:

. .. hello.txt

h. file_operations.read

This will be called when the kernel gets a read request for a file in our file system. The kernel passes the file object of the file to be read, the user-space buffer address, the maximum size of the buffer and the address of the offset (which contains the current offset and has to be updated after a successful read). The contents of the file have to be written to the buffer. Note that the buffer is in user space, so the data is copied to it using the function copy_to_user.

There are two ways of supporting this operation. One way is to provide a read handler which writes the data to the buffer. The drawback is that we cannot take advantage of the page cache. Files can also be read and written through mmap (memory mapping), and with the first approach we cannot support mmap access to the file (or it is very difficult to provide transparency, i.e., to have data written through a mapping show up in a later 'read' system call).

The second way is to provide a unified way to read and write the file for both kinds of access, i.e., calling the system calls directly or mapping the file and reading/writing the contents in memory. This also takes advantage of the page cache, and it applies to the write operation as well.

Let us take the second approach (the code for the first approach is also provided later). The approach is slightly different in this case. The contents of an inode are seen as chunks of pages and represented by an address_space object. This 'mapping' between the inode and the address space object is stored in the i_mapping field of the inode. To read some data, the corresponding chunks/pages which hold the data are loaded into memory.

The Address Space Operations table (a_ops field) is used to perform different operations on the address space object.
The readpage handler of the table is used to read the contents of a page of the inode into memory. For example, if the page size is 4096, the data from bytes 5000 to 6000 is present in the 2nd page of the inode (similarly, the data from 4000 to 5000 spans pages 1 and 2).

Since the actual work of reading the data is moved to the address_space_operations.readpage handler, we can use the generic_file_read helper function as the read handler. This function gets the pages holding the data and copies them to the user-space buffer. If the pages are not in the cache, it waits until the pages are loaded with the data using the 'readpage' handler.

i. address_space_operations.readpage

The readpage handler has to fill the page with the contents of the inode. The index of the page is obtained from the 'index' field of the page structure.

[Code listing not reproduced: readpage handler]
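The page arithmetic and the division of labour between a generic read routine and readpage can be sketched in userspace. Note that the page 'index' is zero-based, so the "2nd page" in the example above has index 1. This is only a model: the single scratch page standing in for the page cache, the in-memory data pointer and all model_* names are assumptions, and memcpy takes the place of copy_to_user.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)   /* 4096, as in the text */

struct model_inode {
    const char *data;     /* in-memory contents of the file */
    size_t i_size;        /* file size in bytes */
};

struct model_page {
    unsigned long index;              /* which page of the inode this is */
    char bytes[MODEL_PAGE_SIZE];
};

/* Zero-based index of the page that holds byte offset pos. */
unsigned long model_page_index(size_t pos)
{
    return (unsigned long)(pos >> MODEL_PAGE_SHIFT);
}

/* Fill one page from the inode; bytes past the end of file are zeroed. */
void model_readpage(const struct model_inode *inode, struct model_page *page)
{
    size_t start = (size_t)page->index << MODEL_PAGE_SHIFT;
    size_t n = 0;

    if (start < inode->i_size) {
        n = inode->i_size - start;
        if (n > MODEL_PAGE_SIZE)
            n = MODEL_PAGE_SIZE;
        memcpy(page->bytes, inode->data + start, n);
    }
    memset(page->bytes + n, 0, MODEL_PAGE_SIZE - n);
}

/* Page-by-page read: copy up to len bytes starting at *ppos and advance
 * *ppos by the number of bytes copied, which is also the return value. */
long model_file_read(const struct model_inode *inode, char *buf,
                     size_t len, size_t *ppos)
{
    static struct model_page page;   /* one scratch page, no real cache */
    size_t done = 0;

    while (done < len && *ppos < inode->i_size) {
        size_t off = *ppos & (MODEL_PAGE_SIZE - 1);  /* offset inside the page */
        size_t n = MODEL_PAGE_SIZE - off;            /* bytes left in the page */

        if (n > len - done)
            n = len - done;
        if (n > inode->i_size - *ppos)
            n = inode->i_size - *ppos;

        page.index = model_page_index(*ppos);
        model_readpage(inode, &page);
        memcpy(buf + done, page.bytes + off, n);
        done += n;
        *ppos += n;
    }
    return (long)done;
}
```

A read of 1000 bytes at offset 4000 touches two pages here, matching the example in the text, and the offset is advanced so that the next read continues where this one stopped.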

j. file_operations.write

We register generic_file_write as the write handler. This allocates pages and calls the prepare_write handler on the address space object so that buffer head objects can be allocated for the page to perform the I/O to the device later. It copies the data from user space to the pages and calls commit_write on the address space object.

File systems like ext2 generally have a specific implementation of prepare_write. The allocation of buffer head objects for the write operation is the same for most file systems, except for the location of the buffer heads (a buffer head is the kernel's copy of a disk block). In this case, they use the helper/wrapper function block_prepare_write and pass a callback (in the case of ext2, it is ext2_get_block) which gives the block number for the file offset.

Since writing the buffers associated with the pages to the device is similar for most file systems, they normally use the generic_commit_write helper function. This marks the buffer as dirty so that it will be flushed later to the device by the block device layer.

In short, the write handler copies the data from user space to the pages, and the pages are synced later. Note that reads and writes happen on the cached pages.

k. address_space_operations.commit_write

The generic_commit_write helper function sets up the buffers to be written to the disk. Since we are not using any device, our version has been modified to write into the memory buffer of the file.

[Code listing not reproduced: commit_write handler]
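For a memory-backed file, committing a write means copying the freshly written byte range of the page into the file's buffer and growing the file size if the write went past the old end. The following userspace sketch shows that step. The two-page capacity, the from/to range convention and the model_* names are assumptions for illustration and are not taken from rkfs.c.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_FILE_MAX   (2 * MODEL_PAGE_SIZE)   /* hypothetical capacity */

struct model_inode {
    char data[MODEL_FILE_MAX];   /* memory buffer backing the file */
    size_t i_size;               /* current file size in bytes */
};

struct model_page {
    unsigned long index;         /* which page of the inode this is */
    char bytes[MODEL_PAGE_SIZE];
};

/* Copy bytes [from, to) of the page into the file's memory buffer.
 * Returns 0 on success, -1 if the range is invalid or exceeds capacity. */
int model_commit_write(struct model_inode *inode, const struct model_page *page,
                       size_t from, size_t to)
{
    size_t start = (size_t)page->index << MODEL_PAGE_SHIFT;

    if (from > to || to > MODEL_PAGE_SIZE || start + to > MODEL_FILE_MAX)
        return -1;

    memcpy(inode->data + start + from, page->bytes + from, to - from);

    if (start + to > inode->i_size)
        inode->i_size = start + to;   /* the write extended the file */
    return 0;
}
```

Only the range that was actually written is copied, so data elsewhere in the same page of the file is left as it was.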

l. address_space_operations.writepage

This will be called when the dirty pages are flushed.

[Code listing not reproduced: writepage handler]

m. cleanup_module

This will be called when the module is removed. We have to unregister our file system at this point. The module use count is incremented and decremented by the file system calls, so the module will not be removed while the use count is not zero. The kernel takes care of this, so we need not do anything to check whether our file system is in use.

6. Code

Note: this has some debugging messages; you can remove all the 'printk's.

rkfs.c

7. Instructions to use the code

This code works on the 2.6 kernels. I haven't tested it on 2.4 kernels.

Compile using the 2.6 build system. Create a Makefile like this (or you can download the code here: http://www2.comp.ufscar.br/~helio/fs/src.zip):

[Makefile listing not reproduced]

make

This generates the module file rkfs.ko (the 2.6 build system produces .ko files). Load the module as root using

insmod rkfs.ko

Mount the file system using

mount -t rkfs none /mnt/rkfs

Unmount using

umount /mnt/rkfs

Unload the module using

rmmod rkfs

Original title: Ravi Kiran UVS: Writing the Simplest File System

Article source: WeChat ID LinuxDev, WeChat public account Linux閱碼場. Please credit the source when reposting.
