HDFS Overview, HDFS Shell Operations, and API Operations


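I. HDFS Overview

1. HDFS background

2. HDFS definition

3. Advantages and disadvantages of HDFS

4. HDFS architecture

5. File block size

II. HDFS Shell Operations

1. Basic syntax

2. Command list

3. Common commands

III. HDFS API Operations

1. Client environment setup

2. HDFS API examples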

I. HDFS Overview

1. HDFS background

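As data volumes keep growing, a single operating system can no longer hold all the data, so the data is spread across disks managed by additional operating systems. That arrangement is hard to manage and maintain, which creates an urgent need for a system that manages files across many machines: a distributed file management system. HDFS is just one kind of distributed file management system.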

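2. HDFS definition

HDFS (Hadoop Distributed File System) is, first, a file system: it stores files and locates them through a directory tree. Second, it is distributed: many servers work together to provide its functionality, and each server in the cluster has its own role.

Use case: HDFS suits write-once, read-many workloads. Once a file has been created, written, and closed, it does not need to change.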

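3. Advantages and disadvantages of HDFS

Advantages:

a. High fault tolerance

Data is automatically stored as multiple replicas; adding replicas increases fault tolerance.

If a replica is lost, it is recovered automatically.

b. Suited to big data

Data scale: it can handle data at the GB, TB, or even PB level.

File scale: it can handle file counts in the millions and beyond.

c. It can be built on inexpensive machines, with the multi-replica mechanism providing reliability.

Disadvantages:

a. Not suited to low-latency data access, such as millisecond-level reads and writes.

b. It cannot store large numbers of small files efficiently.

Storing many small files consumes a large amount of NameNode memory for file directory and block information, and NameNode memory is limited.

With too many small files, the seek time exceeds the read time, which goes against HDFS's design goals.

c. No concurrent writes and no random file modification.

A file can have only one writer at a time; multiple threads may not write to it simultaneously.

Only appending data is supported; random modification of a file is not.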
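To make the small-file drawback concrete, the sketch below estimates how much NameNode memory the same amount of data needs when stored as tiny files versus block-sized files. It assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per file or block object, and a 128 MB block size; neither figure comes from this post, and the class and method names are hypothetical.

```java
public class NameNodeMemoryEstimate {
    // Assumption: rule-of-thumb heap cost per namespace object (file entry or block).
    static final long BYTES_PER_OBJECT = 150L;

    /** Approximate NameNode heap, in bytes, for fileCount files of fileSize bytes each. */
    static long estimateBytes(long fileCount, long fileSize, long blockSize) {
        // Every file occupies at least one block, even if it is tiny.
        long blocksPerFile = Math.max(1L, (fileSize + blockSize - 1) / blockSize);
        // One object for the file entry plus one object per block.
        long objects = fileCount * (1 + blocksPerFile);
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;       // assumed block size: 128 MB
        long totalData = 128L * 1024 * 1024 * 1024; // 128 GB of data in both cases

        // Same data volume, stored as 1 KB files versus 128 MB files.
        long smallFiles = estimateBytes(totalData / 1024, 1024, blockSize);
        long largeFiles = estimateBytes(totalData / blockSize, blockSize, blockSize);

        System.out.println("1 KB files:   " + smallFiles + " bytes of NameNode heap");
        System.out.println("128 MB files: " + largeFiles + " bytes of NameNode heap");
    }
}
```

Under these assumptions the metadata cost scales with the number of files rather than with the amount of data, which is why many small files exhaust NameNode memory long before the DataNode disks fill up.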

4. HDFS architecture

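5. File block size

Question: why can the block size be neither too small nor too large?

(1) If the HDFS block size is too small, seek time increases: the program spends its time locating the start of blocks.

(2) If the block size is too large, the time to transfer the data from disk becomes much longer than the time needed to locate the start of the block, so the program processes that block very slowly.

Summary: the HDFS block size depends mainly on the disk transfer rate.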
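The trade-off between seek time and transfer time can be turned into a rough calculation. The sketch below assumes a seek time of about 10 ms, a target where seeking costs about 1% of the transfer time, and a disk transfer rate of about 100 MB/s. These three figures are illustrative assumptions, not values from this post, and the class and method names are hypothetical.

```java
public class BlockSizeEstimate {
    /**
     * Suggests a block size in MB: the smallest power of two that is at least
     * (seek time / seek ratio) * disk transfer rate.
     */
    static long suggestBlockSizeMb(double seekMillis, double seekRatio, double diskMbPerSec) {
        // Transfer time that keeps seeking at the desired fraction of the total.
        double transferSeconds = (seekMillis / 1000.0) / seekRatio;
        // Amount of data the disk can deliver in that time.
        double idealMb = transferSeconds * diskMbPerSec;
        // Round up to a power of two, the usual shape of a block size.
        long sizeMb = 1;
        while (sizeMb < idealMb) {
            sizeMb *= 2;
        }
        return sizeMb;
    }

    public static void main(String[] args) {
        System.out.println("100 MB/s disk: " + suggestBlockSizeMb(10, 0.01, 100) + " MB");
        System.out.println("200 MB/s disk: " + suggestBlockSizeMb(10, 0.01, 200) + " MB");
    }
}
```

With these inputs a faster disk yields a larger suggested block, which matches the summary that block size is driven mainly by the disk transfer rate.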

II. HDFS Shell Operations

1. Basic syntax

hadoop fs <command>

hdfs dfs <command>

2. Command list

[atguigu@hadoop102 hadoop-3.1.3]$ bin/hadoop fs

[-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] <path> ...]
        [-cp [-f] [-p] <src> ... <dst>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] <path> ...]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-d] [-h] [-R] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] <localsrc> ... <dst>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]

3. Common commands

1. Start the Hadoop cluster

[atguigu@hadoop102 hadoop-3.1.3]$ sbin/start-dfs.sh
[atguigu@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh

2. -help: print the help text for a command

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -help rm

3. Create a directory

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -mkdir /sanguo

4. Upload

a. -moveFromLocal: move (cut) a file from the local file system to HDFS

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs  -moveFromLocal  ./shuguo.txt  /sanguo

b. -copyFromLocal: copy a file from the local file system to HDFS

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -copyFromLocal weiguo.txt /sanguo

c. -put: same effect as -copyFromLocal

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -put ./wuguo.txt /sanguo

d. -appendToFile: append a local file to the end of an existing HDFS file

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -appendToFile liubei.txt /sanguo/shuguo.txt

5. Download

a. -copyToLocal: copy a file from HDFS to the local file system

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -copyToLocal /sanguo/shuguo.txt ./

b. -get: same effect as -copyToLocal

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -get /sanguo/shuguo.txt ./shuguo2.txt

6. Operations performed directly in HDFS

a. -ls: list directory information

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -ls /sanguo

b. -cat: print the contents of a file

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -cat /sanguo/shuguo.txt

c. -chgrp: change the group; -chmod: change permissions; -chown: change the owner and group

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs  -chmod 666  /sanguo/shuguo.txt
[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs  -chown  atguigu:atguigu   /sanguo/shuguo.txt

d. -mkdir: create a directory

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -mkdir /jinguo

e. -cp: copy from one HDFS path to another HDFS path

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -cp /sanguo/shuguo.txt /jinguo

f. -mv: move a file within HDFS

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -mv /sanguo/wuguo.txt /jinguo

g. -tail / -head: show the last / first 1 KB of a file

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -tail /jinguo/shuguo.txt

h. -rm: delete a file or directory; -r deletes recursively

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -rm /sanguo/shuguo.txt
[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -rm -r /sanguo

i. -du: show directory size information

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -du -s -h /jinguo
27  81  /jinguo

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -du  -h /jinguo
14  42  /jinguo/shuguo.txt
7   21   /jinguo/weiguo.txt
6   18   /jinguo/wuguo.txt

Note: 27 is the file size; 81 is 27 * 3 replicas; /jinguo is the directory being inspected.

j. -setrep: set the replication factor of a file in HDFS

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop fs -setrep 10 /jinguo/shuguo.txt

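Note: the replication factor set here is only recorded in the NameNode metadata. Whether that many replicas actually exist depends on the number of DataNodes. With only 3 machines at present there can be at most 3 replicas; the replica count can reach 10 only once the cluster grows to 10 nodes.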
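The arithmetic behind the -du output and the -setrep note can be sketched in a few lines. This is only an illustration of the two rules stated above (raw space is file size times replica count, and the replica count cannot exceed the number of DataNodes); the class and method names are hypothetical and nothing here calls HDFS.

```java
public class ReplicationMath {
    /** Replicas that can actually exist: at most one per available DataNode. */
    static int effectiveReplicas(int requestedReplicas, int dataNodes) {
        return Math.min(requestedReplicas, dataNodes);
    }

    /** Raw disk space consumed across the cluster (second column of -du). */
    static long rawBytes(long fileBytes, int replicas) {
        return fileBytes * replicas;
    }

    public static void main(String[] args) {
        // /jinguo holds 27 bytes of data stored with 3 replicas.
        System.out.println("raw bytes for /jinguo: " + rawBytes(27, 3));

        // -setrep 10 on a 3-node cluster versus a 10-node cluster.
        System.out.println("replicas on 3 nodes:  " + effectiveReplicas(10, 3));
        System.out.println("replicas on 10 nodes: " + effectiveReplicas(10, 10));
    }
}
```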

III. HDFS API Operations

1. Client environment setup

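1. Download Hadoop 3.1.0

2. Configure the environment variables

3. In IDEA, create a Maven project named HdfsClientDemo and add the following dependency coordinates, including logging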


    
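<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>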

4. In the project's src/main/resources directory, create a file named "log4j.properties" with the following contents

log4j.rootLogger=INFO, stdout  
log4j.appender.stdout=org.apache.log4j.ConsoleAppender  
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout  
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n  
log4j.appender.logfile=org.apache.log4j.FileAppender  
log4j.appender.logfile.File=target/spring.log  
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout  
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

5. Create the package and class

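import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class hdfs_demo01 {
    FileSystem fs;
    Configuration configuration;

    // Runs before each test: obtain the file system handle
    @Before
    public void test_hdfs() throws URISyntaxException, IOException, InterruptedException {
        configuration = new Configuration();
        configuration.set("dfs.replication", "3");
        String user = "atguigu";
        // Connect to the NameNode as the given user
        fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, user);
    }

    // Runs after each test: release the file system handle
    @After
    public void close() throws IOException {
        fs.close();
    }

    // Create a directory
    @Test
    public void mkdir() throws IOException {
        fs.mkdirs(new Path("/sanguo/shuguo"));
    }
}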

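Note: a client always operates on HDFS under some user identity. By default the HDFS client API accesses HDFS as the current Windows user, which causes a permission exception. Always configure the user when accessing HDFS.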

2. HDFS API examples

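1. Upload a file

a. Code

    @Test
    public void put() throws IOException {
        // Arguments: delSrc (delete the local source), overwrite, source path, destination path
        fs.copyFromLocalFile(true, true, new Path("F:/lb.txt"), new Path("/sanguo"));
    }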

b. Copy hdfs-site.xml into the project's resources directory





    
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

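c. Parameter precedence: (1) values set in client code > (2) user-defined configuration files on the classpath > (3) custom server configuration (xxx-site.xml) > (4) default server configuration (xxx-default.xml)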
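The precedence order above behaves like a lookup through layers, where the first layer that defines a key wins. The following sketch models that idea with plain maps; it does not use Hadoop's Configuration class. The class name, the method name, and the server-side value 2 are hypothetical, while the values 3 (client code) and 1 (classpath hdfs-site.xml) mirror the examples in this post.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ConfigPrecedence {
    /** Returns the value from the first layer that defines the key, or null if none does. */
    static String resolve(String key, List<Map<String, String>> layersHighToLow) {
        for (Map<String, String> layer : layersHighToLow) {
            String value = layer.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String key = "dfs.replication";
        Map<String, String> clientCode = Collections.singletonMap(key, "3");    // set in code
        Map<String, String> classpathSite = Collections.singletonMap(key, "1"); // resources/hdfs-site.xml
        Map<String, String> serverSite = Collections.singletonMap(key, "2");    // hypothetical server value
        Map<String, String> serverDefault = Collections.singletonMap(key, "3"); // xxx-default.xml

        // With the client-code layer present, its value wins.
        System.out.println(resolve(key,
                Arrays.asList(clientCode, classpathSite, serverSite, serverDefault)));

        // Without it, the classpath file is the highest remaining layer.
        System.out.println(resolve(key,
                Arrays.asList(classpathSite, serverSite, serverDefault)));
    }
}
```

Removing the configuration.set call from the client code is therefore a quick way to check that the hdfs-site.xml in the resources directory is being picked up.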

2. Download a file from HDFS

    // The .sha1 and .crc files are checksum files produced during download
    @Test
    public void load() throws IOException {
        fs.copyToLocalFile(new Path("/sanguo/lb.txt"), new Path("/F:"));
    }

3. Rename and move files in HDFS

    @Test
    public void testRename() throws IOException {
        // Rename a file
        //fs.rename(new Path("/sanguo/lb.txt"),new Path("/sanguo/dc.txt"));
        // Move a file
        //fs.rename(new Path("/sanguo/dc.txt"),new Path("/ds.txt"));
        // Rename and move a file in one call
        fs.rename(new Path("/ds.txt"), new Path("/sanguo/lb.txt"));
    }

4. Delete files and directories in HDFS

    // Delete a file or directory; the boolean says whether to delete recursively
    @Test
    public void testDelete() throws IOException {
        fs.delete(new Path("/xiyou"), false);
    }

5. View file details in HDFS

    // List files and print their details
    @Test
    public void testListFile() throws IOException {
        // true = list files recursively
        RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path("/"), true);

        while (files.hasNext()) {
            LocatedFileStatus next = files.next();
            System.out.println("===========" + next.getPath() + "============");
            System.out.println(next.getPermission()); // permissions
            System.out.println(next.getOwner());      // owner
            System.out.println(next.getGroup());      // group
            System.out.println(next.getLen());        // size
            SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            System.out.println(simpleDateFormat.format(next.getModificationTime())); // modification time
            System.out.println(next.getModificationTime()); // timestamp
            System.out.println(next.getReplication());      // replication factor
            System.out.println(next.getBlockSize());        // block size
            System.out.println(next.getPath().getName());   // file name
            BlockLocation[] blockLocations = next.getBlockLocations();
            System.out.println(Arrays.toString(blockLocations)); // block locations
        }
    }

6. Distinguish files from directories

    // Distinguish files from directories
    @Test
    public void testListStatus() throws IOException {
        FileStatus[] fileStatuses = fs.listStatus(new Path("/"));
        for (FileStatus fileStatus : fileStatuses) {
            if (fileStatus.isFile()) {
                System.out.println("File: " + fileStatus.getPath().getName());
            } else {
                System.out.println("Directory: " + fileStatus.getPath().getName());
            }
        }
    }

    // Recursive version: descend into each directory
    public void testListStatus01(Path path) throws IOException {
        FileStatus[] fileStatuses = fs.listStatus(path);
        for (FileStatus fileStatus : fileStatuses) {
            if (fileStatus.isFile()) {
                System.out.println("File: " + fileStatus.getPath().getName());
            } else {
                System.out.println("Directory: " + fileStatus.getPath().getName());
                testListStatus01(fileStatus.getPath());
            }
        }
    }

    @Test
    public void test() throws IOException {
        testListStatus01(new Path("/"));
    }

7. Manual upload and download with I/O streams

    // Manual upload
    @Test
    public void testPutIO() throws IOException {
        // Open the input stream on the local file
        FileInputStream fis = new FileInputStream("F:/lb.txt");
        // Open the output stream on the HDFS file
        FSDataOutputStream fos = fs.create(new Path("/input/lb.txt"));
        // Copy from one stream to the other
        IOUtils.copyBytes(fis, fos, configuration);
        // Close the streams
        IOUtils.closeStreams(fos, fis);
    }

    // Manual download
    @Test
    public void testLoadIO() throws IOException {
        FSDataInputStream fdis = fs.open(new Path("/input/word.txt"));
        FileOutputStream fos = new FileOutputStream("F:/word.txt");
        IOUtils.copyBytes(fdis, fos, configuration);
        IOUtils.closeStreams(fdis, fos);
    }
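The stream-to-stream copy step in the manual upload and download is, conceptually, a buffered read/write loop. The sketch below shows such a loop using only the JDK, with in-memory streams standing in for the local file and the HDFS stream. It is an illustration of the idea, not Hadoop's actual implementation; the class and method names are hypothetical, and the 4096-byte buffer is an assumed size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamCopySketch {
    /** Copies everything from in to out through a fixed-size buffer; returns bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // read() returns -1 once the input is exhausted.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    /** Helper for the demo: pushes a byte array through copy() and returns the result. */
    static byte[] roundTrip(byte[] data, int bufferSize) {
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            copy(in, out, bufferSize);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] source = "contents of lb.txt".getBytes(StandardCharsets.UTF_8);
        byte[] copied = roundTrip(source, 4096); // assumed buffer size
        System.out.println(new String(copied, StandardCharsets.UTF_8));
    }
}
```

In the tests above, swapping the in-memory streams for a FileInputStream and the stream returned by fs.create gives the same shape as testPutIO.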


Original URL: http://outofmemory.cn/zaji/5655964.html
