How to get the encoding of an HDFS file in Python


Hi, on Python 3 you can use python3-magic (the libmagic bindings) to detect a file's encoding. The code looks like this:

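import magic

# Read the file as bytes: libmagic inspects raw bytes, and text mode would try to decode them first
with open('unknown-file', 'rb') as f:
    blob = f.read()

m = magic.open(magic.MAGIC_MIME_ENCODING)
m.load()
encoding = m.buffer(blob)  # e.g. "utf-8", "us-ascii"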
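To apply this to a file stored on HDFS, one option is to read only the first chunk of the file and hand that to the detector. The following is a minimal sketch, assuming the file is available as a binary file-like object (pyhdfs's client.open(path) returns one). The names guess_stream_encoding and fallback_detect are hypothetical helpers, and fallback_detect is only a standard-library stand-in so the example runs without libmagic; it recognizes just ASCII and UTF-8.

```python
import codecs
import io
from typing import BinaryIO, Callable


def fallback_detect(sample: bytes) -> str:
    """Stdlib-only stand-in for m.buffer(): try strict decoders in order."""
    for name in ("ascii", "utf-8"):
        try:
            # final=False tolerates a multi-byte character cut off at the
            # end of the sample instead of reporting it as an error.
            codecs.getincrementaldecoder(name)().decode(sample, final=False)
            return name
        except UnicodeDecodeError:
            continue
    return "unknown"


def guess_stream_encoding(
    stream: BinaryIO,
    detect: Callable[[bytes], str] = fallback_detect,
    sample_size: int = 64 * 1024,
) -> str:
    """Read at most sample_size bytes from stream and run detect on them."""
    sample = stream.read(sample_size)
    return detect(sample)


# Local demonstration with an in-memory stream instead of a real HDFS file.
print(guess_stream_encoding(io.BytesIO("编码".encode("utf-8"))))
```

With libmagic installed, passing detect=m.buffer and a stream obtained from client.open(hdfs_path) should return the same kind of result as the snippet above without downloading the whole file.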

Using the HDFS distributed file system from Python 3

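from pyhdfs import HdfsClient

# Create a connection; hosts is a comma-separated list of "host:port" entries
client = HdfsClient(hosts="testhdfs.org:50070", user_name="web_crawler")

client.get_home_directory()    # Get the user's home directory on HDFS
client.listdir(PATH)    # List the entries under the given HDFS path
client.copy_from_local(file_path, hdfs_path, overwrite=True)    # Copy a local file to HDFS; directories are not supported; overwrite=True replaces an existing file
client.delete(PATH, recursive=True)    # Delete the given path; recursive=True also removes a directory's contents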

hdfs_path must include the file name and its extension, otherwise the copy will not succeed.
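One way to avoid passing a bare directory as hdfs_path is to build the destination from the local file name. This is a small sketch using only the standard library; hdfs_target_path is a hypothetical helper name, and the example paths are made up for illustration.

```python
import ntpath
import posixpath


def hdfs_target_path(local_path: str, hdfs_dir: str) -> str:
    """Return hdfs_dir joined with the file name taken from local_path."""
    # ntpath.basename splits on both "\\" and "/", so Windows and
    # POSIX-style local paths are handled the same way.
    filename = ntpath.basename(local_path)
    if not filename:
        raise ValueError("local_path must point to a file, not a directory")
    # HDFS paths always use forward slashes, hence posixpath.
    return posixpath.join(hdfs_dir, filename)


print(hdfs_target_path(r"C:\data\pages.csv", "/user/web_crawler"))
```

The result can then be passed as hdfs_path to client.copy_from_local(file_path, hdfs_path, overwrite=True).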

If connecting with HdfsClient fails with an error like the following:

Traceback (most recent call last):

  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code

    exec(code_obj, self.user_global_ns, self.user_ns)

  File "

    client.get_home_directory()

  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 565, in get_home_directory

    return _json(self._get('/', 'GETHOMEDIRECTORY', **kwargs))['Path']

  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 391, in _get

    return self._request('get', *args, **kwargs)

  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 377, in _request

    _check_response(response, expected_status)

  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 799, in _check_response

    remote_exception = _json(response)['RemoteException']

  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 793, in _json

    "Expected JSON. Is WebHDFS enabled? Got {!r}".format(response.text))

pyhdfs.HdfsException: Expected JSON. Is WebHDFS enabled? Got '\n\n\n\n

502 Server dropped connection

\n

The following error occurred while trying to access http://%2050070:50070/webhdfs/v1/?user.name=web_crawler&op=GETHOMEDIRECTORY :

\n 502 Server dropped connection

\n

Generated Fri, 21 Dec 2018 02:03:18 GMT by Polipo on .\n\r\n'

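then it is usually an access or authentication problem. Possible causes are an incorrect account or password, missing permissions, or the local network not being on the list of allowed clients. Note also that the request URL in the trace above is http://%2050070:50070/..., so it is worth checking that each entry in hosts is written as host:port.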
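A malformed hosts value can be caught before any request is sent. The sketch below assumes hosts is a comma-separated string of host or host:port entries with 50070 as the default port, as in the HdfsClient constructor; check_hosts is a hypothetical helper, not part of pyhdfs.

```python
from typing import List


def check_hosts(hosts: str, default_port: int = 50070) -> List[str]:
    """Validate a comma-separated hosts string and return host:port entries."""
    checked = []
    for entry in hosts.split(","):
        entry = entry.strip()
        host, sep, port = entry.partition(":")
        # An empty or purely numeric "host" usually means a port was
        # separated from its host by a comma instead of a colon.
        if not host or host.isdigit():
            raise ValueError("invalid host entry: %r" % entry)
        if sep and not port.isdigit():
            raise ValueError("invalid port in entry: %r" % entry)
        checked.append("%s:%s" % (host, port if sep else default_port))
    return checked


print(check_hosts("testhdfs.org:50070"))
```

Calling check_hosts("testhdfs.org, 50070") raises ValueError, because the second entry is a bare port number rather than a host.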


Feel free to share. When reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/tougao/8060644.html
