Notes on a Major Production Incident


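I came to work in a good mood, only to hear that jobs were failing en masse. After some digging, it turned out that a user account had disappeared from the server.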

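That was bad news: every job on one of our Azkaban instances switches to this user to run, and the user also owns a large number of crontab jobs. None of them could run, including the email that goes to the big boss.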


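My first instinct was to re-add the user right away, but my manager said to just wait for LDAP to sync it automatically. I checked the LDAP service and found nothing wrong with it, but the LDAP server itself was unreachable because of heavy packet loss on the network, so I waited for the network team to deal with it.
In the meantime I tried running the scripts as a different user, but those also failed to connect because of the network problem.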

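I waited, and waited…
Eventually the ops engineer ran out of patience and added the user by hand, but the scripts still would not run. They failed with the following error: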
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Permission denied
Caused by: java.io.IOException: Permission denied
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createTempFile(File.java:2024)
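I checked the directory configured by the Hive parameter hive.exec.scratchdir, which is /tmp/hive.
Its permissions were fine.
Then I looked at the Hive log, /tmp/xx/hive.log (xx being the user), and saw that Hive also needs to create a local directory.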
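
The check described above can be sketched in Python. This is only an illustration, not the commands actually run during the incident: it prints the owner ids and mode of a directory and then tries to create a temporary file in it, which is roughly what java.io.File.createTempFile in the stack trace attempts. The path /hive/local/xx is the local directory that appears in the log excerpt at the end of this post; whether it comes from hive.exec.local.scratchdir on this cluster is my assumption.

```python
import os
import stat
import tempfile


def describe_dir(path):
    """Return owner ids, permission string and writability of a directory."""
    st = os.stat(path)
    return {
        "path": path,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),
        # Creating a file needs write + search (execute) permission on the directory.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


def can_create_temp_file(path):
    """Try to create (and immediately remove) a temporary file inside `path`."""
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        # Covers PermissionError as well as a missing directory.
        return False


if __name__ == "__main__":
    # Directory taken from the log excerpt; replace xx with the real user name.
    local_dir = "/hive/local/xx"
    if os.path.isdir(local_dir):
        print(describe_dir(local_dir))
        print("temp file creation ok:", can_create_temp_file(local_dir))
```

Run it as the affected user: a numeric uid that does not match `id -u` together with `writable: False` points at the ownership problem described next.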

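The owner and group of that local directory were both shown as numbers: the uid and gid of the deleted user. The newly added user has the same name as the deleted one, but its ids are different, so it does not own the directory.
Solution:
Change the owner and group of the directory and its files to the new user.
Of course, the error above does not always have this cause; check the logs for the actual reason.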
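
As a rough sketch of that fix (not what ops actually ran, which was presumably a plain chown), the Python below walks a directory tree, lists every entry whose owner uid no longer resolves to an account, and optionally hands those entries to a named user. The function names are made up for illustration, and using the user's primary group as the new group is an assumption; adjust it if your directories use a different group.

```python
import os
import pwd


def uid_has_account(uid):
    """True if the uid resolves to a user name in the account database."""
    try:
        pwd.getpwuid(uid)
        return True
    except KeyError:
        return False


def find_orphaned(root, has_account=uid_has_account):
    """Yield (path, uid, gid) for root and everything below it whose owner uid has no account."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    for path in paths:
        st = os.lstat(path)  # lstat: inspect a symlink itself, not its target
        if not has_account(st.st_uid):
            yield path, st.st_uid, st.st_gid


def reassign(root, user, dry_run=True, has_account=uid_has_account):
    """Give orphaned entries under root to `user`; with dry_run=True only list them."""
    entry = pwd.getpwnam(user)  # raises KeyError if the user does not exist
    touched = []
    for path, old_uid, old_gid in find_orphaned(root, has_account):
        if not dry_run:
            # Requires root. Assumption: the user's primary group is the right group.
            os.chown(path, entry.pw_uid, entry.pw_gid, follow_symlinks=False)
        touched.append((path, old_uid, old_gid))
    return touched
```

Calling `reassign("/hive/local/xx", "xx")` first with the default `dry_run=True` shows what would change; pass `dry_run=False` as root to apply it. Note that this treats every unresolvable uid as belonging to the deleted user, which is only safe if just one account went missing.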


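As for why the user disappeared, the conclusion from ops was a network anomaly: the LDAP server could not be reached. I don't quite follow that. Even with the network down, that shouldn't delete a user, should it? If the user had already been deleted and then could not be synced back, that would be somewhat more plausible.
Anyway, as long as someone takes the blame, whether or not it was actually caused by a person doesn't matter that much.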

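PS:
I ran a test: as soon as a user is deleted, files that belonged to that user are listed with the numeric id instead of the name.

[Screenshots: file listing before the user was deleted (above) and after (below)]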
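
The behaviour in that test follows from how ownership is stored: a file records only a numeric uid and gid, and the name is looked up at display time. A minimal Python sketch of that lookup (the function name is hypothetical) is:

```python
import grp
import os
import pwd


def owner_label(path):
    """Return (user, group) for display: names when the ids resolve, numeric strings otherwise."""
    st = os.lstat(path)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)  # no account with this uid, e.g. the user was deleted
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # no group with this gid
    return user, group
```

Once the account is gone the lookup fails and only the number is left to show. Re-creating a user with the same name but a new uid does not change what is stored on the file.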

Detailed log:
INFO [main] session.SessionState: Created HDFS directory: /tmp/hive/xx/28651db7-6f1b-4162-9ea4-cafe969c335b
INFO [main] session.SessionState: Created local directory: /hive/local/xx/28651db7-6f1b-4162-9ea4-cafe969c335b
INFO [main] session.SessionState: Created HDFS directory: /tmp/hive/xx/28651db7-6f1b-4162-9ea4-cafe969c335b/_tmp_space.db
INFO [main] conf.HiveConf: Using the default value passed in for log id: 28651db7-6f1b-4162-9ea4-cafe969c335b
INFO [main] session.SessionState: Updating thread name to 28651db7-6f1b-4162-9ea4-cafe969c335b main
INFO [28651db7-6f1b-4162-9ea4-cafe969c335b main] CliDriver: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

2022-04-20T21:26:21,302 INFO [21ec89d4-93b5-44e3-a8ba-0fc7ecfbbb46 main] CliDriver: Time taken: 4.602 seconds, Fetched: 1 row(s)
2022-04-20T21:26:21,302 INFO [21ec89d4-93b5-44e3-a8ba-0fc7ecfbbb46 main] conf.HiveConf: Using the default value passed in for log id: 21ec89d4-93b5-44e3-a8ba-0fc7ecfbbb46
2022-04-20T21:26:21,302 INFO [21ec89d4-93b5-44e3-a8ba-0fc7ecfbbb46 main] session.SessionState: Resetting thread name to main
2022-04-20T21:26:21,302 INFO [main] conf.HiveConf: Using the default value passed in for log id: 21ec89d4-93b5-44e3-a8ba-0fc7ecfbbb46
2022-04-20T21:26:21,306 INFO [main] session.SessionState: Deleted directory: /tmp/hive/xx/21ec89d4-93b5-44e3-a8ba-0fc7ecfbbb46 on fs with scheme hdfs
2022-04-20T21:26:21,311 INFO [main] session.SessionState: Deleted directory: /hive/local/xx/21ec89d4-93b5-44e3-a8ba-0fc7ecfbbb46 on fs with scheme file
2022-04-20T21:26:21,312 INFO [main] metastore.HiveMetaStore: 0: Cleaning up thread local RawStore…
2022-04-20T21:26:21,312 INFO [main] HiveMetaStore.audit: ugi=xx ip=unknown-ip-addr cmd=Cleaning up thread local RawStore…
2022-04-20T21:26:21,312 INFO [main] metastore.HiveMetaStore: 0: Done cleaning up thread local RawStore
2022-04-20T21:26:21,312 INFO [main] HiveMetaStore.audit: ugi=xx ip=unknown-ip-addr cmd=Done cleaning up thread local RawStore
2022-04-21T00:00:14,722 INFO [main] metadata.Hive: Registering function row_seq com.bigdata.hive.udf.impl.HLSequenceGenerator
2022-04-21T00:00:14,723 WARN [main] metadata.Hive: Failed to register persistent function row_seq:com.bigdata.hive.udf.impl.HLSequenceGenerator. Ignore and continue.
2022-04-21T00:00:14,723 INFO [main] metadata.Hive: Registering function decrypt_hyd com.credithc.udf.DecryptHydDataFunction
2022-04-21T00:00:14,724 INFO [main] metadata.Hive: Registering function udfsharesc org.yjy.udfshare.udfsharesc
2022-04-21T00:00:14,725 INFO [main] metadata.Hive: Registering function fieldcrc32 com.cn.HiveUDF
2022-04-21T00:00:14,726 WARN [main] metadata.Hive: Failed to register persistent function fieldcrc32:com.cn.HiveUDF. Ignore and continue.
2022-04-21T00:00:14,726 INFO [main] metadata.Hive: Registering function decrypt_mobile com.credithc.udf.DecryptBdpDataFunction
2022-04-21T00:00:14,726 INFO [main] metadata.Hive: Registering function decrypt_mobile com.credithc.udf.DecryptBdpDataFunction
2022-04-21T00:00:14,727 INFO [main] metadata.Hive: Registering function parse_json com.hc.udf.JsonPar
2022-04-21T00:00:14,727 WARN [main] metadata.Hive: Failed to register persistent function parse_json:com.hc.udf.JsonPar. Ignore and continue.
2022-04-21T00:00:14,727 INFO [main] metadata.Hive: Registering function parse_json com.hc.udf.JsonPar
2022-04-21T00:00:14,728 WARN [main] metadata.Hive: Failed to register persistent function parse_json:com.hc.udf.JsonPar. Ignore and continue.
2022-04-21T00:00:14,728 INFO [main] metadata.Hive: Registering function getduration com.hc.HUGetDuration
2022-04-21T00:00:14,734 INFO [main] metadata.Hive: Registering function encrypt_bdp com.credithc.udf.EncryptBdpDataFunction
2022-04-21T00:00:14,734 INFO [main] metadata.Hive: Registering function addfive com.atguigu.myudf.MyUdf
2022-04-21T00:00:14,735 WARN [main] metadata.Hive: Failed to register persistent function addfive:com.atguigu.myudf.MyUdf. Ignore and continue.
2022-04-21T00:00:14,735 INFO [main] metadata.Hive: Registering function splitstr com.atguigu.myudf.MyUdf
2022-04-21T00:00:14,736 WARN [main] metadata.Hive: Failed to register persistent function splitstr:com.atguigu.myudf.MyUdf. Ignore and continue.
2022-04-21T00:00:14,736 INFO [main] metadata.Hive: Registering function splitstr2 com.atguigu.myudtf.MyUdtf
2022-04-21T00:00:14,736 WARN [main] metadata.Hive: Failed to register persistent function splitstr2:com.atguigu.myudtf.MyUdtf. Ignore and continue.
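
To find out which local directories Hive is trying to use without reading the whole log, the "Created local directory" lines can be extracted. The sketch below assumes the message format shown in the excerpt above; other Hive versions may word it differently, and the function name is made up for illustration.

```python
import re

# Matches lines like:
#   INFO [main] session.SessionState: Created local directory: /hive/local/xx/<session-id>
LOCAL_DIR = re.compile(r"session\.SessionState: Created local directory: (\S+)")


def local_session_dirs(lines):
    """Return the local directories that SessionState reports creating, in log order."""
    found = []
    for line in lines:
        match = LOCAL_DIR.search(line)
        if match:
            found.append(match.group(1))
    return found
```

For example, `with open("/tmp/xx/hive.log") as f: print(local_session_dirs(f))` lists the session directories. Their parent directory is the one whose owner and group need checking.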

Feel free to share; when reposting, please credit the source: 内存溢出 (outofmemory.cn)

Original article: https://outofmemory.cn/langs/719590.html
