Compiling, Packaging, and Running Your Own MapReduce Program from the Command Line (2.4.1) 2015-02-13 20:00

Notes

This guide is based on Hadoop 2.4.1.

Compilation

  • Required JAR dependencies

Running a MapReduce program depends on the following JARs:

  • $HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.1.jar
  • $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar
  • $HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
  • The CLASSPATH environment variable

Add the JARs above, plus the current directory, to CLASSPATH:

export HADOOP_HOME=/opt/hadoop/client/hadoop-2.4.1
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.1.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:.
  • Compile
javac WordCount.java
  • Package into a JAR
jar -cvf WordCount.jar ./WordCount*.class
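The WordCount.java compiled above is not listed in this post. Its core logic, a map step that emits (word, 1) pairs for each line and a reduce step that sums the counts per word, can be sketched in plain Java without any Hadoop dependencies (the class and method names below are illustrative, not Hadoop's API):

```java
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free sketch of WordCount's map/reduce logic. The real program
// uses Hadoop's Mapper/Reducer classes and Text/IntWritable types;
// this class name and method are illustrative only.
public class WordCountSketch {

    public static Map<String, Integer> countWords(String[] lines) {
        // TreeMap keeps keys sorted, mirroring the sorted reducer output
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // "map" step: split the line into words, emitting (word, 1)
            for (String word : line.split("\\s+")) {
                if (word.isEmpty()) continue;
                // "reduce" step: sum the 1s emitted for each word
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new String[] {"hello world", "hello"}));
        // prints {hello=2, world=1}
    }
}
```

In the real job the map and reduce steps run on different nodes, and Hadoop groups and sorts the intermediate keys between them; the sketch collapses all of that into one in-memory map.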

Running

  • Prepare the HDFS files

As the Hadoop system administrator account, grant permissions on the HDFS /tmp directory so that cheyo, the user who will run the MapReduce job, can read and write /tmp:

su - hadoop
hdfs dfs -mkdir -p /tmp
hdfs dfs -chmod -R 777 /tmp

Create a few text files and upload them to HDFS:

su - cheyo
echo "echo of the rainbow" > file1
echo "the waiting game" > file2
hdfs dfs -mkdir /tmp/input
hdfs dfs -put file* /tmp/input
# Do not pre-create /tmp/output: the job creates the output directory
# itself and fails with FileAlreadyExistsException if it already exists.
  • Run the job
hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount /tmp/input /tmp/output
  • Check the results
[cheyo@cheyo ~]$ hdfs dfs -cat /tmp/output/*
echo    1
game    1
of      1
rainbow 1
the     2
waiting 1
[cheyo@cheyo ~]$
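Note that the words come back in alphabetical order: Hadoop sorts the intermediate keys during the shuffle before they reach the reducer. The counts themselves can be checked with a quick plain-Java simulation (not Hadoop code):

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java check of the expected WordCount output for the two
// input files created above; TreeMap mimics Hadoop's sorted keys.
public class CheckCounts {
    public static void main(String[] args) {
        String[] files = {"echo of the rainbow", "the waiting game"};
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : files)
            for (String word : line.split(" "))
                counts.merge(word, 1, Integer::sum);
        counts.forEach((w, c) -> System.out.println(w + "\t" + c));
        // echo 1, game 1, of 1, rainbow 1, the 2, waiting 1
    }
}
```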

Tags: #MapReduce    Post on Hadoop