2020-12-02 How a Hard Disk Stores Files

Everything in the system exists as a file (a directory is just a special kind of file), and a file has two parts: attributes (metadata) and content. The operating system virtualizes one part of the disk into blocks that store data, and another part into inodes that store file attributes, so the disk is divided into a block area and an inode area.

Sector: the smallest physical unit of disk storage. Each sector is small, about 512 bytes.

Reading data: to read data from the disk, the OS first seeks the head radially to the target track (the slowest step), then lets the platter rotate (faster) until the head reaches the target sector, and then starts reading.

Disk block: in the OS's day-to-day work, the 512 bytes of a single sector are too little for most workloads, so single sectors would have to be read very frequently, and disk reads are far too slow relative to the CPU. So when reading the disk, the OS grabs several sectors at once (adjacent ones, which cost no extra time), and at the OS level this gives rise to the logical concept of the disk block (cluster): one block usually corresponds to 8 consecutive sectors (4, 16, etc. are also possible; the OS decides), and the OS then uses the block as its smallest unit of data storage.

The benefit, of course, is efficiency; the drawback is wasted space: a file always occupies a whole number of blocks, so even a one-byte file consumes a full block.
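
To make the trade-off concrete, here is a tiny Python sketch (assuming the 512-byte sectors and 8-sector blocks described above) of how block-granularity allocation wastes the tail of a file's last block:

```python
# Sketch: block-granularity allocation, assuming 512-byte sectors and
# 8 sectors per block (4 KiB blocks), as described above.
import math

SECTOR_SIZE = 512
SECTORS_PER_BLOCK = 8
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK  # 4096 bytes

def blocks_needed(file_size: int) -> int:
    """A file always occupies a whole number of blocks."""
    return max(1, math.ceil(file_size / BLOCK_SIZE))

for size in (1, 4096, 4097, 100_000):
    used = blocks_needed(size) * BLOCK_SIZE
    print(f"{size:>7} bytes -> {blocks_needed(size)} block(s), {used - size} bytes wasted")
```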

inode: stores a file's metadata (all attributes except the name; the name is stored in the content of the directory that contains the file).

The inode number is also known as the index number. An inode number is assigned to each file or directory when it is created, and it is unique within the filesystem.

Disk inodes contain the following information (a toy model follows the list):

Owner identifier

Type of file (regular, directory, character or block device)

Access permissions

Times and dates

· file creation time

· last file access time

· last inode modification time

Number of links to the file

Array of pointers to data blocks on disk

File size (in bytes, sometimes also in blocks)
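
Purely as an illustration (the field names below are invented for this sketch and are not any real filesystem's on-disk layout), the list above maps onto a record like:

```python
# Toy inode record mirroring the field list above. Field names are
# invented for the sketch, not taken from a real filesystem.
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class Inode:
    owner_uid: int           # owner identifier
    file_type: str           # "regular", "directory", "chardev", or "blockdev"
    permissions: int         # access permissions, e.g. 0o644
    created_at: float        # file creation time
    accessed_at: float       # last file access time
    inode_changed_at: float  # last inode modification time
    link_count: int = 1      # number of links (directory entries) to the file
    block_pointers: List[int] = field(default_factory=list)  # data block numbers on disk
    size_bytes: int = 0      # file size in bytes

# Note what is absent: the file's name, which lives in directory content.
now = time.time()
ino = Inode(1000, "regular", 0o644, now, now, now, block_pointers=[120, 121])
print(ino)
```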

File:

As mentioned above, a file's attributes are stored in a numbered inode in the inode area, while its content is stored in blocks in the block area.

Directory:

A directory is a special file that organizes files and subdirectories. Its attributes likewise live in an inode, and its content is a list of {filename: inode number} entries; that content is stored in blocks in the block area just like any other file's.

With this layout, reading the content of a file such as /etc/fstab goes roughly like this:

The OS starts from the root directory's inode number > reads its attributes in the inode area (one field points to the content blocks) > reads the root directory's content from the block area > finds in that content the entry named etc and its inode number > reads /etc's attributes in the inode area > reads /etc's content from its blocks (which contains the inode number for fstab) > takes /etc/fstab's inode number > reads /etc/fstab's attributes in the inode area > reads /etc/fstab's data blocks.

This may contain errors; corrections are welcome.
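
A minimal sketch of that walk in Python, modeling a directory's content as a {name: inode number} mapping; every inode number and content value below is invented for illustration:

```python
# Toy model of the /etc/fstab lookup described above.
# inode_table maps inode number -> attributes; a directory's content is a
# {name: inode number} mapping, a file's content is just bytes.
inode_table = {
    2:  {"type": "dir",  "content": {"etc": 11}},            # root directory
    11: {"type": "dir",  "content": {"fstab": 42}},          # /etc
    42: {"type": "file", "content": b"/dev/sda1 / ext4\n"},  # /etc/fstab
}

ROOT_INODE = 2  # the root inode number is known to the OS in advance

def lookup(path: str) -> bytes:
    inode = inode_table[ROOT_INODE]
    for name in path.strip("/").split("/"):
        if inode["type"] != "dir":
            raise NotADirectoryError(name)
        inode = inode_table[inode["content"][name]]  # name -> inode -> attributes
    return inode["content"]

print(lookup("/etc/fstab"))
```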

Within each file system, the mapping from names to blocks is handled through a structure called an i-node. There's a pool of these things near the "bottom" (lowest-numbered blocks) of each file system (the very lowest ones are used for housekeeping and labeling purposes we won't describe here). Each i-node describes one file. File data blocks (including directories) live above the i-nodes (in higher-numbered blocks).

Every i-node contains a list of the disk block numbers in the file it describes. (Actually this is a half-truth, only correct for small files, but the rest of the details aren't important here.) Note that the i-node does not contain the name of the file.

Names of files live in directory structures. A directory structure just maps names to i-node numbers. This is why, in Unix, a file can have multiple true names (or hard links); they're just multiple directory entries that happen to point to the same i-node.

refer: https://unix.stackexchange.com/questions/432655/why-does-using-indirect-pointers-in-inodes-not-incur-the-same-amount-of-space

Small files: data block numbers are listed directly in the inode.

Large files: one- or two-level indirect blocks.

Larger files: multi-level indirect blocks.

The original hierarchy of inode levels works roughly like this:

You can store one or a few block numbers directly in the inode. This means you use a few bytes more for the inode, but for small files, you don't have to allocate a complete block, which is mostly empty.

The next level is one indirection: You allocate a block to store the block pointers. Only the address of this indirect block is stored in the inode. This doesn't use somehow "less space", and most filesystems, even early ones, worked like that (have a pointer near the inode/filename which points to a block, which stores the block numbers of the file).

But what do you do when the space in this block runs out? You have to allocate another block, but where do you store the reference to this block? You could just add those references to the inode, but to store larger files, the inode would get large. And you want small inodes, so that as many inodes as possible fit into a single block (fewer disk accesses to read more inodes).

So you use a two-level indirect block: You just add one pointer to the inode, then you have a whole block to store pointers to indirect blocks, and the indirect blocks store the block address of the file itself.

And so on, you can add higher-level indirect blocks, or stop at some stage, until you reach the maximal size of a file possible with the structure you want.

So the point is not "use up less space in total", but "use a scheme that uses blocks efficiently for the expected distribution of files with respect to size, i.e. many small files, some larger files, and very few huge files".
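
To make the hierarchy concrete, here is a small sketch that assumes 10 direct pointers in the inode and 256 pointers per indirect block (the same numbers used in the excerpt below) and reports which pointer chain serves a given block index of a file:

```python
# Sketch: which pointer chain serves block index n of a file, assuming
# 10 direct pointers in the inode and 256 pointers per indirect block.
NDIRECT = 10
PER_BLOCK = 256

def chain_for_block(n: int) -> str:
    if n < NDIRECT:
        return f"direct[{n}]"
    n -= NDIRECT
    if n < PER_BLOCK:
        return f"single-indirect -> entry {n}"
    n -= PER_BLOCK
    if n < PER_BLOCK ** 2:
        return f"double-indirect -> entry {n // PER_BLOCK} -> entry {n % PER_BLOCK}"
    n -= PER_BLOCK ** 2
    if n < PER_BLOCK ** 3:
        i, rest = divmod(n, PER_BLOCK ** 2)
        j, k = divmod(rest, PER_BLOCK)
        return f"triple-indirect -> entry {i} -> entry {j} -> entry {k}"
    raise ValueError("offset beyond the maximal file size")

for n in (0, 9, 10, 265, 266, 70_000):
    print(n, "->", chain_for_block(n))
```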

Page tables on the other hand work very differently.

Edit

To answer the questions in the comment:

Data blocks are of fixed sizes (originally 512 bytes, IIRC), which is a multiple of the block size of the underlying harddisks. So data block size can't "decrease".

As I tried to describe above, the whole point of having the inodes not use up too much space is to make inode access faster (or, alternatively, make caching inodes use up less memory - back then when the unix file system with inodes was invented, computers had a lot less memory than today). It's not about somehow saving space in total. As you say yourself, everything has to be stored somewhere, and if it doesn't use up space at location X, it will use up space at location Y.

Just adding a variable number of block pointers to the inode is not practical, because the inode must take up a fixed amount of space - you want to use the inode number to calculate the block address and the offset inside the block where the inode information is stored. You can't do that if every inode has a different size. So there must be some form of indirection.

Page tables work differently because hardware implements them differently - that's just how it is. The hierarchy has a fixed depth, always the same (though sometimes configurable). And while reading a block from disk is slow, that doesn't matter for page tables. So the design issues are completely different.

http://www.cems.uwe.ac.uk/~irjohnso/coursenotes/lrc/internals/filestore/fs3.htm

Assuming, for the purposes of illustration, that each disk data block is 1024 bytes in size, then these ten data block pointers will allow files to be created that are up to 10 Kb in size. As you can see, for the large majority of files it should be possible to access the data with nothing more than a direct lookup required to find the data block that contains any particular data byte.

With this scheme, once a file has grown to 10 Kb, there are only three block pointers in the inode left to use, whatever the eventual size of the file. Obviously, some new arrangement must be found so that the three remaining block pointers will suffice for any realistic file size, while at the same time not degrading the data access time too much.

This goal is achieved by using the idea of indirect block pointers. Specifically, when an 11th data block needs to be allocated to the file, the 11th inode block pointer is used, but instead of pointing to the block which will contain the data, the 11th pointer is a single indirect pointer which points to a data block filled with a list of direct block pointers. In our example, if we assume that a data block number is a 32-bit value, then a list of 256 of them will fit into the single indirect block. This list will point directly to the data blocks for the next 256 Kb of our file. This means that with 11 block pointers in the inode, files of up to 266 Kb (10 + 256) can be created. True, it takes a little longer to access the data beyond the first 10 Kb in the file, but it takes only one extra disk block read to find the position on the disk of the required data.

For files bigger than 266 Kb the double indirect (12th) inode block pointer is used. This is the same idea as the previous inode pointer except that the double indirect pointer points to a list of pointers in a data block, each of which is itself a single indirect block pointer which points to a list of 256 direct block pointers. This means that the 12th inode block pointer gives access to the next 65536 Kb (256x256) of data in our file.

By now, you should be able to spot the pattern and see that when the file grows bigger than 64 Mb (actually 65802 Kb), the inode's 13th data block pointer will be used, but this time as a triple indirect pointer, which will give access to a staggering 16 Gb (256x256x256 Kb) of extra file space. A single file bigger than 16Gb sounds huge. However, even though the calculation we have just done suggests that this file size is possible with the inode layout as given, in fact there are other factors which limit the maximum size of a file to a smaller value than this. For example, the size of a file, in bytes, is stored separately in its inode in a field of type unsigned long. This is a 32-bit number which limits the size of a file to 4 Gb, so that 13 data block pointers in an inode really are enough.
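
The excerpt's arithmetic can be reproduced in a few lines, assuming 1 KiB blocks and 4-byte block numbers as it does:

```python
# Reproduce the size limits from the excerpt above:
# 1 KiB blocks, 4-byte block numbers -> 256 pointers per indirect block.
BLOCK = 1024
PTRS = BLOCK // 4         # 256 pointers fit in one indirect block

direct = 10 * BLOCK       # 10 direct pointers -> 10 KB
single = PTRS * BLOCK     # single indirect    -> +256 KB
double = PTRS**2 * BLOCK  # double indirect    -> +65536 KB (64 MB)
triple = PTRS**3 * BLOCK  # triple indirect    -> +16 GB

print("direct only:      ", direct // 1024, "KB")                            # 10
print("+ single indirect:", (direct + single) // 1024, "KB")                 # 266
print("+ double indirect:", (direct + single + double) // 1024, "KB")        # 65802
print("+ triple indirect:", (direct + single + double + triple) // 2**30, "GB")
```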

10.4. How a file gets looked up

Now we can look at the file system from the top down. When you open a file (such as, say, /home/esr/WWW/ldp/fundamentals.xml) here is what happens:

Your kernel starts at the root of your Unix file system (in the root partition). It looks for a directory there called ‘home’. Usually ‘home’ is a mount point to a large user partition elsewhere, so it will go there. In the top-level directory structure of that user partition, it will look for an entry called ‘esr’ and extract an i-node number. It will go to that i-node, notice that its associated file data blocks are a directory structure, and look up ‘WWW’. Extracting that i-node, it will go to the corresponding subdirectory and look up ‘ldp’. That will take it to yet another directory i-node. Opening that one, it will find an i-node number for ‘fundamentals.xml’. That i-node is not a directory, but instead holds the list of disk blocks associated with the file.

The surface area of your disk, where it stores data, is divided up something like a dartboard: into circular tracks which are then pie-sliced into sectors. Because tracks near the outer edge have more area than those close to the spindle at the center of the disk, the outer tracks have more sector slices in them than the inner ones. Each sector (or disk block) has the same size, which under modern Unixes is generally 1 binary K (1024 8-bit bytes). Each disk block has a unique address or disk block number.

Unix divides the disk into disk partitions. Each partition is a continuous span of blocks that's used separately from any other partition, either as a file system or as swap space. The original reasons for partitions had to do with crash recovery in a world of much slower and more error-prone disks: the boundaries between them reduce the fraction of your disk likely to become inaccessible or corrupted by a random bad spot on the disk. Nowadays, it's more important that partitions can be declared read-only (preventing an intruder from modifying critical system files) or shared over a network through various means we won't discuss here. The lowest-numbered partition on a disk is often treated specially, as a boot partition where you can put a kernel to be booted.

Each partition is either swap space (used to implement virtual memory) or a file system used to hold files. Swap-space partitions are just treated as a linear sequence of blocks. File systems, on the other hand, need a way to map file names to sequences of disk blocks. Because files grow, shrink, and change over time, a file's data blocks will not be a linear sequence but may be scattered all over its partition (from wherever the operating system can find a free block when it needs one). This scattering effect is called fragmentation.

In the simplest case, your entire Unix file system lives in just one disk partition. While you'll see this arrangement on some small personal Unix systems, it's unusual. More typical is for it to be spread across several disk partitions, possibly on different physical disks. So, for example, your system may have one small partition where the kernel lives, a slightly larger one where OS utilities live, and a much bigger one where user home directories live.

The only partition you'll have access to immediately after system boot is your root partition, which is (almost always) the one you booted from. It holds the root directory of the file system, the top node from which everything else hangs.

The other partitions in the system have to be attached to this root in order for your entire, multiple-partition file system to be accessible. About midway through the boot process, your Unix will make these non-root partitions accessible. It will mount each one onto a directory on the root partition.

For example, if you have a Unix directory called /usr, it is probably a mount point to a partition that contains many programs installed with your Unix but not required during initial boot.

1. Introduction

A distributed file system (DFS) manages physical storage resources that are not necessarily attached directly to the local node, but are instead connected to it over a computer network. Its design follows the client/server model. A typical network includes multiple servers accessed by many users, and peer-to-peer features allow some systems to play both roles: a user can "publish" a directory for other clients to access, and once accessed, that directory behaves for the client just like a local drive.

We live in an information society where the Internet is growing at great speed; driven by massive numbers of concurrent connections, the volume of data produced every day inevitably grows geometrically, and as the ways of connecting diversify, the structure of stored data changes with them. Under this pressure, the challenges of storing huge amounts of data have to be re-examined: data collection, data storage, data search, data sharing, data transfer, data analysis, data visualization, and a whole series of related problems.

It is no secret that traditional storage struggles in the face of massive data: vertical scaling is limited by array capacity, horizontal scaling by the switching equipment, and individual nodes by their file systems.

The emergence of distributed storage has alleviated this problem to a degree. "Alleviated" because distributed storage is not perfect or pressure-free for massive data either: node-to-node communication, data placement, balancing of storage space, fault tolerance, filesystem support, and a series of other hard problems are still being explored and refined.

2. Some distributed file system solutions

Google Filesystem (GFS): suited to storing massive numbers of large files; metadata is kept in memory.

HDFS (Hadoop Filesystem): a GFS clone, suited to storing massive numbers of large files.

TFS (Taobao Filesystem): Taobao's file system; the name node stores metadata in a relational database, so the number of files is no longer limited by the name node's memory, and it can hold massive numbers of small files.

Lustre: an enterprise-grade distributed system developed by Oracle; relatively heavyweight.

MooseFS: FUSE-based, so it can be mounted and used like a local filesystem.

MogileFS: excels at storing massive numbers of small files; metadata is stored in a relational database.

1. Introduction

MogileFS is an open-source distributed file system used to build distributed file clusters. It was developed by Danga Interactive, the company behind LiveJournal, whose team also produced a number of well-known open-source projects including Memcached, MogileFS, and Perlbal (Perlbal is a powerful reverse-proxy server written in Perl).

Many companies use MogileFS today, including several of the top-ranked companies in Japan.

Chinese sites known to use MogileFS include the photo-hosting site Yupoo, digg, Tudou, Douban, Yihaodian, Dianping, Sogou, Anjuke, and others; many of these sites store more than 30 TB of content, images included.

2. MogileFS features

1) Serves at the application layer; no special kernel components are required.

2) No single point of failure; it consists of three components: tracker (tracking/scheduling nodes), mogstored (storage nodes), and database (database node).

3) Automatic file replication; the minimum unit of replication policy is not the individual file but the class.

4) Transport-neutral, with no special protocol; communication can go over NFS or HTTP.

5) A simple flat namespace: there are no directories; files live directly in the storage space, organized by domain.

6) Shared-nothing; nodes do not need to share any data.

3. MogileFS components

1) Tracker (the scheduler)

The tracker is the core of MogileFS; the mogilefsd process is the tracker. Its main responsibilities are deleting data, replicating data, monitoring, and answering queries. It is an event-based parent process/message bus that manages all interaction with client applications (requesting operations to be performed), including load-balancing requests across multiple "query workers", which are mogilefsd child processes that do the actual work.

Every mogadm and mogtool operation goes through the trackers, and clients must be configured with tracker addresses too, so it is best to run several trackers for load balancing. Trackers can also run on a single machine; to balance load across several, a simple solution such as haproxy, LVS, or nginx will do.

The tracker's configuration file is /etc/mogilefs/mogilefsd.conf, and it listens on TCP port 7001.
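
As a rough sketch, a minimal mogilefsd.conf along the lines of the stock sample might look like this (database host and credentials are placeholders from this article's IP plan; verify the keys against your installed version):

```ini
# Sketch of /etc/mogilefs/mogilefsd.conf; DB host and credentials are
# placeholders for this article's setup.
daemonize = 1
pidfile = /var/run/mogilefsd/mogilefsd.pid
db_dsn = DBI:mysql:mogilefs:host=192.168.1.204
db_user = moguser
db_pass = mogpass
listen = 0.0.0.0:7001
query_jobs = 10
delete_jobs = 1
replicate_jobs = 5
reaper_jobs = 1
```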

2) Database

The database stores all of MogileFS's metadata, which makes it critical: if the database goes down, none of the data can be located for access. The database should therefore be made highly available.

3) mogstored (the storage node)

This is where the data lives: usually an HTTP (WebDAV) server used to create, delete, and fetch data. Any WebDAV server will do, though mogstored is recommended; mogilefsd and mogstored can run on the same machines on different ports. mogstored handles all the DAV operations, traffic, and I/O monitoring, while an HTTP server of your choice (perlbal by default) serves GET requests for files to clients.

A typical deployment is one large-capacity SATA disk per mount point. Once the configuration file is written, starting the mogstored program turns the machine into a storage node; the mogadm tool is then used to add the machine to the cluster.

The configuration file is /etc/mogilefs/mogstored.conf, and it listens on TCP port 7500.
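
Likewise, a minimal mogstored.conf sketch (the docroot is a placeholder; verify the keys against your version):

```ini
# Sketch of /etc/mogilefs/mogstored.conf; docroot is a placeholder path.
maxconns = 10000
httplisten = 0.0.0.0:7500
mgmtlisten = 0.0.0.0:7501
docroot = /var/mogdata
```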

4. Basic workflow

The application asks to open a file (an RPC to the tracker finds an available machine) by issuing a "create_open" request.

The tracker does some load balancing, decides where the file should go, and returns a few candidate locations to the application.

The application writes to one of those locations (if the write fails, it retries and writes to another location).

The application (client) tells the tracker where the file was written via "create_close".

The tracker associates the name with its domain's namespace (this is done through the database).

In the background, the tracker starts replicating the file until it satisfies the replication policy set for the file's class.

Later, the application requests the file by domain + key (key == "filename") via "get_paths"; the tracker replies, based on how busy each location's I/O is (after internal decisions involving the database/memcache/etc.), with a list of complete URLs where the file is available.

The application then tries those URLs in order. (The tracker continuously monitors host and device health, so it will not return dead links; by default it double-checks the first element of the returned list, unless you tell it not to.) A protocol-level sketch follows.
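
For a feel of the wire protocol, here is a rough Python sketch of the tracker's line-oriented request/response exchange; the tracker address, domain, and key are placeholders, and the exact response fields should be treated as an approximation:

```python
# Rough sketch of the tracker's line-oriented protocol: send one
# "command arg1=v1&arg2=v2" line, read back "OK ..." or "ERR ...".
# Tracker address, domain, and key are placeholder values.
import socket
from urllib.parse import urlencode, parse_qsl

def tracker_request(command: str, **args) -> dict:
    with socket.create_connection(("192.168.1.202", 7001), timeout=5) as s:
        s.sendall(f"{command} {urlencode(args)}\r\n".encode())
        line = s.makefile().readline().strip()
    status, _, payload = line.partition(" ")
    if status != "OK":
        raise RuntimeError(line)
    return dict(parse_qsl(payload))

# e.g. ask where the file stored under key "photo.jpg" lives:
paths = tracker_request("get_paths", domain="testdomain", key="photo.jpg")
print(paths)  # expect something like path1=http://192.168.1.202:7500/dev1/...
```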

1. Topology

How a request flows:

1. The user accesses the front-end nginx via a URL.

2. nginx picks a back-end tracker according to its configured selection algorithm and forwards the request to it.

3. The tracker looks up the requested URL's value in the database and returns it to nginx.

4. Using the returned value and some selection algorithm, nginx sends a request to one of the mogstored nodes.

5. mogstored returns the result to nginx.

6. nginx builds the response and returns it to the client.

2. IP plan

Role / software / IP address:

Reverse proxy: nginx, 192.168.1.201

Storage and tracker node 1: mogilefs, 192.168.1.202

Storage and tracker node 2: mogilefs, 192.168.1.203

Database node: MariaDB, 192.168.1.204

3. Install the database and grant privileges

For compiling and installing the database, see my related post at http://wangfeng7399.blog.51cto.com/3518031/1393146; it will not be repeated here. Here MySQL is installed from the yum repository.

4. Install mogilefs. It can be installed via yum or compiled from source; here yum is used.

5. Initialize the database (an example command is sketched below).
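
A sketch of the initialization step using the mogdbsetup tool (host and credentials are placeholders for this setup):

```sh
# Sketch: create the MogileFS schema on the MariaDB node.
# Host and credentials are this article's placeholder values.
mogdbsetup --dbhost=192.168.1.204 --dbname=mogilefs \
           --dbuser=moguser --dbpass=mogpass --yes
```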

You can see that a number of tables have been created in the database.

6. Edit the configuration files and start the services.

7. Configure mogilefs (example commands follow this list):

Add the storage hosts

Add the storage devices

Add a domain

Add a class
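
A sketch of those four steps with mogadm (host names, addresses, domain, and class are placeholder values taken from this article's IP plan):

```sh
# Sketch: register hosts, devices, a domain, and a class with the tracker.
mogadm --trackers=192.168.1.202:7001 host add node1 --ip=192.168.1.202 --status=alive
mogadm --trackers=192.168.1.202:7001 host add node2 --ip=192.168.1.203 --status=alive
mogadm --trackers=192.168.1.202:7001 device add node1 1
mogadm --trackers=192.168.1.202:7001 device add node2 2
mogadm --trackers=192.168.1.202:7001 domain add testdomain
mogadm --trackers=192.168.1.202:7001 class add testdomain testclass --mindevcount=2
mogadm --trackers=192.168.1.202:7001 check   # sanity-check the cluster
```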

8. Configure mogilefs on 192.168.1.203. Do not initialize the database again; the rest of the configuration should match 192.168.1.202.

9. Try uploading data, fetching data, and reading it from a client (example commands follow this list).

Upload data; any node will do.

Fetch the data.

Read the data from a client.

The data is visible through any node.
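
A sketch of the upload/fetch round trip using the MogileFS::Utils command-line tools (tracker address, domain, key, and file paths are placeholders):

```sh
# Upload a local file under a key, then fetch it back.
mogupload --trackers=192.168.1.202:7001 --domain=testdomain \
          --key=photo.jpg --file=/tmp/photo.jpg
mogfetch  --trackers=192.168.1.202:7001 --domain=testdomain \
          --key=photo.jpg --file=/tmp/photo.copy.jpg
# List where the replicas live:
mogfileinfo --trackers=192.168.1.202:7001 --domain=testdomain --key=photo.jpg
```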

For nginx to reverse-proxy the back-end trackers, it must be built with a third-party module.

1. Compile and install nginx

2. Prepare the startup script

3. Wire nginx to MogileFS (a sample configuration sketch follows)
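
A hypothetical configuration fragment for the third-party mogilefs module (tracker addresses and domain follow this article's values; directive names follow the module's documentation, so verify them against the version you build):

```nginx
# Hypothetical fragment for nginx_mogilefs_module; verify directive
# support against the module version you compile in.
upstream mogilefs_trackers {
    server 192.168.1.202:7001;
    server 192.168.1.203:7001;
}

server {
    listen 80;

    location /images/ {
        mogilefs_tracker mogilefs_trackers;    # the tracker cluster
        mogilefs_domain  testdomain;           # domain created with mogadm
        mogilefs_pass {
            proxy_pass        $mogilefs_path;  # set by the module per request
            proxy_hide_header Content-Type;
            proxy_buffering   off;
        }
    }
}
```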

Check the result.

4. Configure the back-end tracker cluster (the upstream group in the sketch above).

Check the result.

And that's it. As a follow-up thought, the front-end nginx and the database are both single points of failure; they could be made into high-availability clusters.
