How to run the built-in WordCount example

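Step 1: Locate the examples JAR

First find where the example lives. Locate your Hadoop installation directory (/usr/hadoop in the commands below) and go to the share/hadoop/mapreduce directory inside it:

/usr/hadoop/share/hadoop/mapreduce

That directory contains the examples JAR: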

hadoop-mapreduce-examples-2.2.0.jar

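Step 2: Prepare the input data and run the job

Before running the example we need some preparation, such as choosing the input and output paths and uploading the input file.

1. Create the data directories in HDFS: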

hadoop fs -mkdir -p /data/wordcount

hadoop fs -mkdir -p /output/

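2. The /data/wordcount directory holds the input files for Hadoop's built-in WordCount example, and the MapReduce job writes its results to /output/wordcount. The job creates /output/wordcount itself and fails if that directory already exists.

Create a local file named inputWord: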

vi /usr/inputWord

After saving the file, check its contents:

cat /usr/inputWord

Upload the local file to HDFS:

hadoop fs -put /usr/inputWord /data/wordcount/

To check the uploaded file, run:

hadoop fs -ls /data/wordcount

The listing shows the file that was uploaded to HDFS. To print its contents, run:

hadoop fs -text /data/wordcount/inputWord

Now run the WordCount example:

hadoop jar /usr/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount

The console shows the job's progress as it runs.

To view the result, run:

hadoop fs -text /output/wordcount/part-r-00000

Each line of the result contains a word and the number of times it occurs.

The job is also recorded in the web console at http://master:8088/.
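
To make clear what the job computes, here is a minimal plain-Java sketch of the same counting logic, with no Hadoop involved. The class name WordCountSketch and the sample lines are made up for illustration. It assumes whitespace tokenization and tab-separated output sorted by word, which is how the stock example behaves with default text output and ASCII words.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: counts words the way the example job does.
class WordCountSketch {

    // "Map" step: split each line on whitespace. "Reduce" step: sum the ones per word.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // keeps words sorted
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input, invented for this sketch.
        List<String> lines = Arrays.asList("hello hadoop", "hello world");
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            // One word per line, then a tab, then its count.
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running the sketch on a copy of your input file is a quick way to predict what part-r-00000 should contain before comparing it with the cluster output.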

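I. Purpose

Before a MapReduce program is submitted to a Hadoop cluster, it is useful to test it for bugs by running its main method locally, which makes the program easier to check and fix. This document covers MapReduce development on Windows.

II. Environment

OS: Windows 7

IDE: Eclipse

Hadoop version: 2.6.0

Preparation: download hadoop-2.6.0.tar.gz and extract it to a directory on disk, then create an environment variable HADOOP_HOME that points to that directory.

Then add to Path: %HADOOP_HOME%\bin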

III. Running WordCount step by step, and the problems encountered

1. Write the WordCount program

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Emits (word, 1) for every whitespace-separated token in a line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts of each word; also used as the combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // the default configuration is enough here
        // The paths can be on HDFS ...
        String[] otherArgs = {"hdfs://imageHandler1:9000/tmp/log/test.log", "hdfs://imageHandler1:9000/tmp/testout111"};
        // ... or on the local file system:
        // String[] otherArgs = {"D:/test.log", "D:/test/wordcountout"};
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Every argument except the last is an input path; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

2. Run WordCount

(1) Running the program now with Run As -> Java Application fails with errors similar to the following:

2015-01-22 15:31:47,782 [main] WARN org.apache.hadoop.util.NativeCodeLoader (NativeCodeLoader.java:62) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

2015-01-22 15:31:47,793 [main] ERROR org.apache.hadoop.util.Shell (Shell.java:373) - Failed to locate the winutils binary in the hadoop binary path

java.io.IOException: Could not locate executable D:\hbl_study\hadoop2\hadoop-2.6.0\bin\winutils.exe in the Hadoop binaries.

at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)

at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)

......

This error means winutils.exe cannot be found. Copy winutils.exe into the hadoop2.6.0/bin directory.

(2) Running again produces an error similar to:

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)

at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)

at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)

......

This error means hadoop.dll (a build compiled for Hadoop 2.6.0) is missing. Copy hadoop.dll into the hadoop2.6.0/bin directory.

After that, the program runs without errors.
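
Both errors above come down to a file missing from the bin directory, so a small preflight check can save a failed run. The following is a sketch only: the class NativeFilesCheck and its method are hypothetical names, and it assumes HADOOP_HOME points at the extracted Hadoop directory as set up in section II.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which Windows native files are missing from <hadoopHome>/bin.
class NativeFilesCheck {

    static List<String> missingNativeFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!Files.isRegularFile(hadoopHome.resolve("bin").resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        System.out.println("Missing from " + home + "\\bin: " + missingNativeFiles(Paths.get(home)));
    }
}
```

The check only looks at whether the files exist. It cannot tell whether hadoop.dll was built for the Hadoop version in use.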

Note: many hadoop.dll files are available online. The first one I downloaded and placed in hadoop2.6.0/bin produced this error:

java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V

at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native Method)

at org.apache.hadoop.util.NativeCrc32.calculateChunkedSumsByteArray(NativeCrc32.java:86)

at org.apache.hadoop.util.DataChecksum.calculateChunkedSums(DataChecksum.java:430)

......

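The cause was that the hadoop.dll I downloaded had been compiled from Hadoop 2.2.0, as most hadoop.dll files found online are. Running a program against Hadoop 2.6.0 with it fails, presumably because the versions do not match or the corresponding classes have changed, so a hadoop.dll built for the older version no longer works. I therefore compiled a hadoop.dll for Hadoop 2.6.0 myself, which solved the problem. If Hadoop is upgraded again, my hadoop.dll will stop working as well, so below I describe how I built it, so that you can build the file yourself when the version changes.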

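IV. Building hadoop.dll from the Hadoop 2.6.0 source on Windows 7

Note: I did not manage to build the entire source tree on Windows 7; I only succeeded in producing hadoop.dll. I have not yet found a way to build all of the Hadoop source on Windows 7.

1. Prerequisites:

(1) Download hadoop-2.6.0-src.tar.gz

(2) Microsoft Windows SDK v7.1 or Visual Studio 2010

(3) Maven 3.0 or later (I used 3.1.1). After installing it, create an environment variable maven_home that points to the Maven installation directory.

Add to Path: %maven_home%\bin

Run mvn -version to verify.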

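(4) Protocol Buffers 2.5.0, which I could no longer find for download when writing this. You need protobuf-2.5.0.tar.gz and protoc-2.5.0-win32.zip.

Installation: extract protobuf-2.5.0.tar.gz to a directory, for example D:\protobuf-2.5.0. Extract protoc-2.5.0-win32.zip to get protoc.exe, put protoc.exe into D:\protobuf-2.5.0, and add D:\protobuf-2.5.0 to the Path environment variable. Open a command prompt and run "protoc --version" to verify; if it prints libprotoc 2.5.0, the installation succeeded.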

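(5) Cygwin

(6) JDK 1.6 or later (I used JDK 1.7.0_60)

(7) CMake 2.6 or later (I used 3.1.0, cmake-3.1.0-win32-x86.zip)

After extracting it, create an environment variable CMAKE_HOME that points to the extracted directory.

Add to Path: %CMAKE_HOME%\bin

(8) A working internet connection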
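
Since several of the prerequisites above depend on Path entries, it can help to confirm that each tool is actually reachable before starting a long build. The sketch below is an illustration under assumptions: the class ToolLookup is a hypothetical name, and the launcher file names (mvn.cmd, mvn.bat, protoc.exe, cmake.exe) are my guesses at what the Windows distributions ship, so adjust them to what you find in each bin directory.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper: looks for a tool's launcher file in the directories of a PATH value.
class ToolLookup {

    // Returns the first file found whose name is one of the candidates.
    static Optional<Path> findOnPath(String pathValue, List<String> candidates) {
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : candidates) {
                try {
                    Path p = Paths.get(dir, name);
                    if (Files.isRegularFile(p)) {
                        return Optional.of(p);
                    }
                } catch (InvalidPathException e) {
                    // Skip malformed PATH entries (for example entries wrapped in quotes).
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        if (path == null) {
            path = "";
        }
        // Candidate file names are assumptions, not taken from the article.
        String[][] tools = {
            {"mvn.cmd", "mvn.bat", "mvn"},
            {"protoc.exe", "protoc"},
            {"cmake.exe", "cmake"},
        };
        for (String[] names : tools) {
            Optional<Path> hit = findOnPath(path, Arrays.asList(names));
            System.out.println(names[0] + ": " + hit.map(Path::toString).orElse("NOT FOUND on PATH"));
        }
    }
}
```

A tool reported as not found usually means the corresponding Path entry is missing or the process was started before the variable was changed.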

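2. Build

If you use Microsoft Windows SDK v7.1, open Start > All Programs > Microsoft Windows SDK v7.1 > Windows SDK 7.1 Command Prompt to get the VC++ command-line environment. You must start from this prompt for the Hadoop build to work, and remember to run it as administrator.

Change to the root of the source tree and run the build command:

mvn package -Pdist,native-win -DskipTests -Dtar

After a while, an error similar to the following appears: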

[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (compile-ms-winutils) on project hadoop-common: Command execution failed. Process exited with an error: 1(Exit value: 1) -> [Help 1]

[ERROR]

[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.

[ERROR] Re-run Maven using the -X switch to enable full debug logging.

[ERROR]

[ERROR] For more information about the errors and possible solutions, please read the following articles:

[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

[ERROR]

[ERROR] After correcting the problems, you can resume the build with the command

[ERROR] mvn <goals> -rf :hadoop-common

I have not found a good fix so far. The workaround is to edit pom.xml in hadoop2.6.0\hadoop-common-project\hadoop-common: search for "${basedir}/src/main/winutils/winutils.sln" and comment out the <execution> element that contains it.

<!--
<execution>
  <id>compile-ms-winutils</id>
  <phase>compile</phase>
  <goals>
    <goal>exec</goal>
  </goals>
  <configuration>
    <executable>msbuild</executable>
    <arguments>
      <argument>${basedir}/src/main/winutils/winutils.sln</argument>
      <argument>/nologo</argument>
      <argument>/p:Configuration=Release</argument>
      <argument>/p:OutDir=${project.build.directory}/bin/</argument>
      <argument>/p:IntermediateOutputPath=${project.build.directory}/winutils/</argument>
      <argument>/p:WsceConfigDir=${wsce.config.dir}</argument>
      <argument>/p:WsceConfigFile=${wsce.config.file}</argument>
    </arguments>
  </configuration>
</execution>
-->

Build again. The build still fails at a later point, but by then hadoop.dll has been generated in hadoop2.6.0\hadoop-common-project\hadoop-common\target\hadoop-common-2.6.0\bin, which is all we need.
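
The generated hadoop.dll still has to be placed in the bin directory of the Hadoop installation used for local runs. A minimal sketch of that copy step follows; the class InstallHadoopDll is a hypothetical name, and both directories are passed in as arguments rather than assumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a freshly built hadoop.dll into <hadoopHome>/bin.
class InstallHadoopDll {

    static Path install(Path buildBinDir, Path hadoopHome) throws IOException {
        Path source = buildBinDir.resolve("hadoop.dll");
        if (!Files.isRegularFile(source)) {
            throw new IOException("hadoop.dll not found in " + buildBinDir);
        }
        Path targetDir = hadoopHome.resolve("bin");
        Files.createDirectories(targetDir);
        // Overwrite any older hadoop.dll, such as one built for a different Hadoop version.
        return Files.copy(source, targetDir.resolve("hadoop.dll"), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: InstallHadoopDll <build bin directory> <hadoop home>");
            System.exit(2);
        }
        System.out.println("Copied to " + install(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Pass the build output directory named above as the first argument and the directory HADOOP_HOME points to as the second.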

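If you use VS2010, add "C:\Windows\Microsoft.NET\Framework64\v4.0.30319" to Path. Then open a command prompt, change to the root of the source tree, and run the build command.

I have tested both approaches myself.

Changes to environment variables only apply to programs started afterwards, and restarting the computer is the surest way to make them take effect. It is therefore easiest to install and configure all the required software first and then restart once.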

Source: 内存溢出. Original article: http://outofmemory.cn/yw/11826542.html
