hive核心组件及流程(一)

hive核心组件及流程(一),第1张

依赖第三方组件: Meta store(mysql),hdfs,MapReduce

hive:

Client客户端 CLI、JDBC

Driver连接客户端与服务端的桥梁

SQL Pareser解析器,将SQL转换为抽象语法树AST

1.将HQL语句转换为Token

2.对Token进行解析,生成AST

Physical Plan编译器将AST编译生成逻辑执行计划

Query Optimizer优化器,对逻辑执行计划进行优化

1.将AST转换为QueryBlock

2.将QueryBlock转换为OperatorTree

3.OperatorTree进行逻辑优化

4.生成TaskTree

5.TaskTree执行物理优化

Execution执行器把逻辑执行计划转换成可以运行的物理计划

1.获取MR临时工作目录

3.定义Mapper和Reducer

2.定义Partitioner

4.实例化Job

5.提交Job

1.以Antlr定义的语法规则,对SQL完成词法解析,将SQL转换为AST

2.遍历AST,抽象出查询基本组成单元QueryBlock。

3.遍历QueryBlock,将其转换为OperatorTree,逻辑执行单元

4.利用逻辑优化器对OperatorTree进行逻辑优化。

5.遍历OperatorTree转换为TaskTree,将逻辑执行计划转化为物理执行计划

6.使用物理优化器对TaskTree进行物理优化

7.生成最终的执行计划,提交执行

$HIVE_HOME/bin/hive可以进入客户端

$HIVE_HOME/bin/hive -e "{SQL语句}"可以执行SQL语句

$HIVE_HOME/bin/hive -f {SQL文件名.sql}可以执行sql文件

开启hiveserver2服务,可以通过JDBC提交SQL

创建Driver

创建OptionsProcessor

初始化log4j

标准输入输出以及错误输出流的定义,后续需要输入 SQL 以及打印控制台信息

解析输入的参数,包含"-e -f -v -database"

读取输入的sql

按照""分割的方式解析

解析单行SQL

遇到为"quit"或者"exit"退出

遇到为"source"开头,执行 SQL 文件,读取文件并解析

如果命令以"!"开头,则表示用户需要执行 shell命令

以上三种都不是的情况下执行SQL,进行SQL解析

获取当前系统时间

获取系统结束时间

编译SQL语句

SQL生成AST,构建词法解析器,将关键词替换为TOKEN,进行语法解析,生成最终AST

处理AST,转换为QueryBlock然后转换为OperatorTree,对Operator进行逻辑优化,然后转换为任务树,然后进行物理优化。

根据任务树构建MrJob

添加启动任务,根据是否可以并行来决定是否并行启动Task

设置MR任务的InputFormat、OutputFormat 等等这些 MRJob 的执行类

构建执行MR任务的命令

向yarn提交任务

打印头信息

获取结果集并获取抓取到的条数

打印SQL执行时间及数据条数

mysql耗内存吗?很多人都说MySQL占用了很大的虚拟内存,那么这个问题应该怎么解决呢?下面是我收集整理的一些方法,现在分享给大家!

解决mysql耗内存的具体方法一:

在分析的过程中发现最耗内存的是MySQL,其中近1GB的内存被它吞了,而且不在任务管理器体现出来。这个数据库软件是EMS要用到了,所以必须要运行。这个软件在安装的时候会根据机器的实际内存自动进行配置,PC机物理内存越多,它默认占有的内存就越多,难怪3GB的内存被它给吞了近1GB。

优化方法:

1. 退出EMS client&server

2. 在CMD里运行:net stop mysql

3. 找到MySQL\MySQL Server的安装目录,里面有个my.ini文件,参考附件的配置对参数query_cache_size tmp_table_size myisam_sort_buffer_size key_buffer_size innodb_buffer_pool_size进行修改,注意不要改动innodb_log_file_size,修改前备份my.ini

4. 在CMD里运行:net start mysql,如果提示成功,则说明修改的参数没有什么问题,如果失败,重新调整一下上面的参数

5. 找到EMS 安装目录runGUI.bat runServer.bat脚本,找到-Xmx700m,改为-Xmx256m,注意修改前备份这两个文件,感谢Liping Sun提供帮助

6. 重新运行EMS

前后对比,对于3GB的PC,发现可以节省近1GB的内存。对于2GB的PC,也可以节省600-800MB。优化后发现EMS启动稍微慢一些,但是其它的软件运行速度提高了很多,不在经常出现卡机现象了。如果在运行过程中发现EMS特别慢的话,自己也可以适当放大上面提到的一些参数。

my.ini

# MySQL Server Instance Configuration File

# ----------------------------------------------------------------------

# Generated by the MySQL Server Instance Configuration Wizard

#

#

# Installation Instructions

# ----------------------------------------------------------------------

#

# On Linux you can copy this file to /etc/my.cnf to set global options,

# mysql-data-dir/my.cnf to set server-specific options

# (@localstatedir@ for this installation) or to

# ~/.my.cnf to set user-specific options.

#

# On Windows you should keep this file in the installation directory

# of your server (e.g. C:\Program Files\MySQL\MySQL Server X.Y). To

# make sure the server reads the config file use the startup option

# "--defaults-file".

#

# To run run the server from the command line, execute this in a

# command line shell, e.g.

# mysqld --defaults-file="C:\Program Files\MySQL\MySQL Server X.Y\my.ini"

#

# To install the server as a Windows service manually, execute this in a

# command line shell, e.g.

# mysqld --install MySQLXY --defaults-file="C:\Program Files\MySQL\MySQL Server X.Y\my.ini"

#

# And then execute this in a command line shell to start the server, e.g.

# net start MySQLXY

#

#

# Guildlines for editing this file

# ----------------------------------------------------------------------

#

# In this file, you can use all long options that the program supports.

# If you want to know the options a program supports, start the program

# with the "--help" option.

#

# More detailed information about the individual options can also be

# found in the manual.

#

#

# CLIENT SECTION

# ----------------------------------------------------------------------

#

# The following options will be read by MySQL client applications.

# Note that only client applications shipped by MySQL are guaranteed

# to read this section. If you want your own MySQL client program to

# honor these values, you need to specify it as an option during the

# MySQL client library initialization.

#

[client]

port=3306

[mysql]

default-character-set=utf8

# SERVER SECTION

# ----------------------------------------------------------------------

#

# The following options will be read by the MySQL Server. Make sure that

# you have installed the server correctly (see above) so it reads this

# file.

#

[mysqld]

# The TCP/IP Port the MySQL Server will listen on

port=3306

#Path to installation directory. All paths are usually resolved relative to this.

basedir="D:/Program Files/MySQL/MySQL Server 5.1/"

#Path to the database root

datadir="C:/Documents and Settings/All Users/Application Data/MySQL/MySQL Server 5.1/Data/"

# The default character set that will be used when a new schema or table is

# created and no character set is defined

character-set-server=utf8

# The default storage engine that will be used when create new tables when

default-storage-engine=INNODB

# Set the SQL mode to strict

sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"

# The maximum amount of concurrent sessions the MySQL server will

# allow. One of these connections will be reserved for a user with

# SUPER privileges to allow the administrator to login even if the

# connection limit has been reached.

max_connections=1510

# Query cache is used to cache SELECT results and later return them

# without actual executing the same query once again. Having the query

# cache enabled may result in significant speed improvements, if your

# have a lot of identical queries and rarely changing tables. See the

# "Qcache_lowmem_prunes" status variable to check if the current value

# is high enough for your load.

# Note: In case your tables change very often or if your queries are

# textually different every time, the query cache may result in a

# slowdown instead of a performance improvement.

query_cache_size=16M

# The number of open tables for all threads. Increasing this value

# increases the number of file descriptors that mysqld requires.

# Therefore you have to make sure to set the amount of open files

# allowed to at least 4096 in the variable "open-files-limit" in

# section [mysqld_safe]

table_cache=3020

# Maximum size for internal (in-memory) temporary tables. If a table

# grows larger than this value, it is automatically converted to disk

# based table This limitation is for a single table. There can be many

# of them.

tmp_table_size=4M

# How many threads we should keep in a cache for reuse. When a client

# disconnects, the client's threads are put in the cache if there aren't

# more than thread_cache_size threads from before. This greatly reduces

# the amount of thread creations needed if you have a lot of new

# connections. (Normally this doesn't give a notable performance

# improvement if you have a good thread implementation.)

thread_cache_size=64

#*** MyISAM Specific options

# The maximum size of the temporary file MySQL is allowed to use while

# recreating the index (during REPAIR, ALTER TABLE or LOAD DATA INFILE.

# If the file-size would be bigger than this, the index will be created

# through the key cache (which is slower).

myisam_max_sort_file_size=100G

# If the temporary file used for fast index creation would be bigger

# than using the key cache by the amount specified here, then prefer the

# key cache method. This is mainly used to force long character keys in

# large tables to use the slower key cache method to create the index.

myisam_sort_buffer_size=4M

# Size of the Key Buffer, used to cache index blocks for MyISAM tables.

# Do not set it larger than 30% of your available memory, as some memory

# is also required by the OS to cache rows. Even if you're not using

# MyISAM tables, you should still set it to 8-64M as it will also be

# used for internal temporary disk tables.

key_buffer_size=16M

# Size of the buffer used for doing full table scans of MyISAM tables.

# Allocated per thread, if a full scan is needed.

read_buffer_size=64K

read_rnd_buffer_size=256K

# This buffer is allocated when MySQL needs to rebuild the index in

# REPAIR, OPTIMZE, ALTER table statements as well as in LOAD DATA INFILE

# into an empty table. It is allocated per thread so be careful with

# large settings.

sort_buffer_size=256K

#*** INNODB Specific options ***

# Use this option if you have a MySQL server with InnoDB support enabled

# but you do not plan to use it. This will save memory and disk space

# and speed up some things.

#skip-innodb

# Additional memory pool that is used by InnoDB to store metadata

# information. If InnoDB requires more memory for this purpose it will

# start to allocate it from the OS. As this is fast enough on most

# recent operating systems, you normally do not need to change this

# value. SHOW INNODB STATUS will display the current amount used.

innodb_additional_mem_pool_size=9M

# If set to 1, InnoDB will flush (fsync) the transaction logs to the

# disk at each commit, which offers full ACID behavior. If you are

# willing to compromise this safety, and you are running small

# transactions, you may set this to 0 or 2 to reduce disk I/O to the

# logs. Value 0 means that the log is only written to the log file and

# the log file flushed to disk approximately once per second. Value 2

# means the log is written to the log file at each commit, but the log

# file is only flushed to disk approximately once per second.

innodb_flush_log_at_trx_commit=1

# The size of the buffer InnoDB uses for buffering log data. As soon as

# it is full, InnoDB will have to flush it to disk. As it is flushed

# once per second anyway, it does not make sense to have it very large

# (even with long transactions).

innodb_log_buffer_size=5M

# InnoDB, unlike MyISAM, uses a buffer pool to cache both indexes and

# row data. The bigger you set this the less disk I/O is needed to

# access data in tables. On a dedicated database server you may set this

# parameter up to 80% of the machine physical memory size. Do not set it

# too large, though, because competition of the physical memory may

# cause paging in the operating system. Note that on 32bit systems you

# might be limited to 2-3.5G of user level memory per process, so do not

# set it too high.

innodb_buffer_pool_size=32M

# Size of each log file in a log group. You should set the combined size

# of log files to about 25%-100% of your buffer pool size to avoid

# unneeded buffer pool flush activity on log file overwrite. However,

# note that a larger logfile size will increase the time needed for the

# recovery process.

innodb_log_file_size=88M

# Number of threads allowed inside the InnoDB kernel. The optimal value

# depends highly on the application, hardware as well as the OS

# scheduler properties. A too high value may lead to thread thrashing.

innodb_thread_concurrency=8

   解决mysql耗内存的具体方法二:

更改后如下:

innodb_buffer_pool_size=576M ->256M InnoDB引擎缓冲区占了大头,首要就是拿它开刀

query_cache_size=100M ->16M 查询缓存

tmp_table_size=102M ->64M 临时表大小

key_buffer_size=256m ->32M

重启mysql服务后,虚拟内存降到200以下.

另外mysql安装目录下有几个文件:my-huge.ini 、my-large.ini、my-medium.ini...这几个是根据内存大小作的建议配置,新手在设置的时候也可以参考一下。

2G内存的MYSQL数据库服务器 my.ini优化 (my.ini)

2G内存,针对站少,优质型的设置,试验特:

table_cache=1024 物理内存越大,设置就越大.默认为2402,调到512-1024最佳

innodb_additional_mem_pool_size=8M 默认为2M

innodb_flush_log_at_trx_commit=0 等到innodb_log_buffer_size列队满后再统一储存,默认为1

innodb_log_buffer_size=4M 默认为1M

innodb_thread_concurrency=8 你的服务器CPU有几个就设置为几,默认为8

key_buffer_size=256M 默认为218 调到128最佳

tmp_table_size=64M 默认为16M 调到64-256最挂

read_buffer_size=4M 默认为64K

read_rnd_buffer_size=16M 默认为256K

sort_buffer_size=32M 默认为256K

max_connections=1024 默认为1210

试验一:

table_cache=512或1024

innodb_additional_mem_pool_size=2M

innodb_flush_log_at_trx_commit=0

innodb_log_buffer_size=1M

innodb_thread_concurrency=8 你的服务器CPU有几个就设置为几,默认为8

key_buffer_size=128M

tmp_table_size=128M

read_buffer_size=64K或128K

read_rnd_buffer_size=256K

sort_buffer_size=512K

max_connections=1024

试验二:

table_cache=512或1024

innodb_additional_mem_pool_size=8M

innodb_flush_log_at_trx_commit=0

innodb_log_buffer_size=4M

innodb_thread_concurrency=8

key_buffer_size=128M

tmp_table_size=128M

read_buffer_size=4M

read_rnd_buffer_size=16M

sort_buffer_size=32M

max_connections=1024

一般:

table_cache=512

innodb_additional_mem_pool_size=8M

innodb_flush_log_at_trx_commit=0

innodb_log_buffer_size=4M

innodb_thread_concurrency=8

key_buffer_size=128M

tmp_table_size=128M

read_buffer_size=4M

read_rnd_buffer_size=16M

sort_buffer_size=32M

max_connections=1024

经过测试.没有特殊情况,最好还是用默认的.

2G内存,针对站多,抗压型的设置,最佳:

table_cache=1024 物理内存越大,设置就越大.默认为2402,调到512-1024最佳

innodb_additional_mem_pool_size=4M 默认为2M

innodb_flush_log_at_trx_commit=1

(设置为0就是等到innodb_log_buffer_size列队满后再统一储存,默认为1)

innodb_log_buffer_size=2M 默认为1M

innodb_thread_concurrency=8 你的服务器CPU有几个就设置为几,建议用默认一般为8

key_buffer_size=256M 默认为218 调到128最佳

tmp_table_size=64M 默认为16M 调到64-256最挂

read_buffer_size=4M 默认为64K

read_rnd_buffer_size=16M 默认为256K

sort_buffer_size=32M 默认为256K

max_connections=1024 默认为1210

thread_cache_size=120 默认为60

query_cache_size=64M

优化mysql数据库性能的十个参数

(1)、max_connections:

允许的同时客户的数量。增加该值增加 mysqld 要求的文件描述符的数量。这个数字应该增加,否则,你将经常看到 too many connections 错误。 默认数值是100,我把它改为1024 。

(2)、record_buffer:

每个进行一个顺序扫描的线程为其扫描的每张表分配这个大小的一个缓冲区。如果你做很多顺序扫描,你可能想要增加该值。默认数值是131072(128k),我把它改为16773120 (16m)

(3)、key_buffer_size:

索引块是缓冲的并且被所有的线程共享。key_buffer_size是用于索引块的缓冲区大小,增加它可得到更好处理的索引(对所有读和多重写),到你能负担得起那样多。如果你使它太大,系统将开始换页并且真的变慢了。默认数值是8388600(8m),我的mysql主机有2gb内存,所以我把它改为 402649088(400mb)。

4)、back_log:

要求 mysql 能有的连接数量。当主要mysql线程在一个很短时间内得到非常多的连接请求,这就起作用,然后主线程花些时间(尽管很短)检查连接并且启动一个新线程。

back_log 值指出在mysql暂时停止回答新请求之前的短时间内多少个请求可以被存在堆栈中。只有如果期望在一个短时间内有很多连接,你需要增加它,换句话说,这值对到来的tcp/ip连接的侦听队列的大小。你的 *** 作系统在这个队列大小上有它自己的限制。试图设定back_log高于你的 *** 作系统的限制将是无效的。

当你观察你的主机进程列表,发现大量 264084 | unauthenticated user | xxx.xxx.xxx.xxx | null | connect | null | login | null 的待连接进程时,就要加大 back_log 的值了。默认数值是50,我把它改为500。

(5)、interactive_timeout:

服务器在关闭它前在一个交互连接上等待行动的秒数。一个交互的客户被定义为对 mysql_real_connect()使用 client_interactive 选项的客户。 默认数值是28800,我把它改为7200。

(6)、sort_buffer:

每个需要进行排序的线程分配该大小的一个缓冲区。增加这值加速order by或group by *** 作。默认数值是2097144(2m),我把它改为 16777208 (16m)。

(7)、table_cache:

为所有线程打开表的数量。增加该值能增加mysqld要求的文件描述符的数量。mysql对每个唯一打开的表需要2个文件描述符。默认数值是64,我把它改为512。

(8)、thread_cache_size:

可以复用的保存在中的线程的数量。如果有,新的线程从缓存中取得,当断开连接的时候如果有空间,客户的线置在缓存中。如果有很多新的线程,为了提高性能可以这个变量值。通过比较 connections 和 threads_created 状态的变量,可以看到这个变量的作用。我把它设置为 80。

(9)mysql的搜索功能

用mysql进行搜索,目的是能不分大小写,又能用中文进行搜索

只需起动mysqld时指定 --default-character-set=gb2312

(10)、wait_timeout:

服务器在关闭它之前在一个连接上等待行动的秒数。 默认数值是28800,我把它改为7200。

注:参数的调整可以通过修改 /etc/my.cnf 文件并重启 mysql 实现。这是一个比较谨慎的工作,上面的结果也仅仅是我的一些看法,你可以根据你自己主机的硬件情况(特别是内存大小)进一步修改。


欢迎分享,转载请注明来源:内存溢出

原文地址: http://outofmemory.cn/zaji/7408735.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2023-04-05
下一篇 2023-04-05

发表评论

登录后才能评论

评论列表(0条)

保存