MapReduce Basics
Source: cnblogs  Author: zhouhb  Date: 2019/2/12 9:27:33

1. The WordCount program

1.1 WordCount source code

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Consume generic Hadoop options; what remains are the input and output paths.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // Summing is associative and commutative, so the reducer also serves as the combiner.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Every argument except the last is an input path; the last one is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Emits (word, 1) for every whitespace-separated token of each input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts received for each word and emits (word, total).
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
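To make the data flow of the job easier to follow, here is a minimal plain-Java sketch of the same map, combine, and reduce steps. It does not use any Hadoop classes; the class name WordCountFlowSketch and the methods map, sumByKey, and run are hypothetical names chosen for illustration, and each inner list stands in for one input split. The sample lines are assumed to be the two-line files listed in section 2.1.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustration only: simulates the WordCount data flow in memory, without Hadoop.
class WordCountFlowSketch {

    // "Map" step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // "Combine" and "reduce" step: sum the values per key (keys kept sorted).
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Map and combine each split separately, then reduce over the partial sums.
    static Map<String, Integer> run(List<List<String>> splits) {
        List<Map.Entry<String, Integer>> partials = new ArrayList<>();
        for (List<String> split : splits) {
            List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
            for (String line : split) {
                mapped.addAll(map(line));
            }
            partials.addAll(sumByKey(mapped).entrySet()); // combiner output for this split
        }
        return sumByKey(partials); // reducer output
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("Spark Hadoop", "Big Data"),
                Arrays.asList("Spark Hadoop", "Big Cloud"));
        // prints {Big=2, Cloud=1, Data=1, Hadoop=2, Spark=2}
        System.out.println(run(splits));
    }
}
```

Because addition is associative and commutative, summing per split first and then summing the partial results gives the same totals as summing everything at once, which is why the same class can be registered as both combiner and reducer.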

 

1.2 Run the program: Run As -> Java Application

1.3 Compile and package the program to produce a JAR file

 

2. Running the program

2.1 Create the text files whose word frequencies will be counted

wordfile1.txt

Spark Hadoop

Big Data

wordfile2.txt

Spark Hadoop

Big Cloud

2.2 Start HDFS, create the input directory, and upload the text files

cd /usr/local/hadoop/

./sbin/start-dfs.sh 

./bin/hadoop fs -mkdir input

./bin/hadoop fs -put /home/hadoop/wordfile1.txt input

./bin/hadoop fs -put /home/hadoop/wordfile2.txt input

2.3 List the uploaded text files:

hadoop@dblab-VirtualBox:/usr/local/hadoop$ ./bin/hadoop fs -ls .
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2019-02-11 15:40 input
-rw-r--r-- 1 hadoop supergroup 5 2019-02-10 20:22 test.txt
hadoop@dblab-VirtualBox:/usr/local/hadoop$ ./bin/hadoop fs -ls ./input
Found 2 items
-rw-r--r-- 1 hadoop supergroup 27 2019-02-11 15:40 input/wordfile1.txt
-rw-r--r-- 1 hadoop supergroup 29 2019-02-11 15:40 input/wordfile2.txt

2.4 Run WordCount

./bin/hadoop jar /home/hadoop/WordCount.jar input output
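The two trailing arguments, input and output, are interpreted by main() as described in its usage string: every argument except the last is an input path, and the last is the output path. The following plain-Java sketch shows only that splitting rule; ArgsSketch, inputs, and output are hypothetical names, and generic-option handling by GenericOptionsParser is left out.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: mirrors how WordCount.main splits its remaining arguments.
class ArgsSketch {

    // All arguments except the last one are treated as input paths.
    static List<String> inputs(String[] args) {
        return Arrays.asList(Arrays.copyOfRange(args, 0, args.length - 1));
    }

    // The last argument is treated as the output path.
    static String output(String[] args) {
        return args[args.length - 1];
    }

    public static void main(String[] args) {
        String[] example = {"input", "output"};
        // prints inputs=[input] output=output
        System.out.println("inputs=" + inputs(example) + " output=" + output(example));
    }
}
```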

Running the job prints a large amount of log output to the screen.

You can then view the results:

hadoop@dblab-VirtualBox:/usr/local/hadoop$ ./bin/hadoop fs -cat output/*
Hadoop 2
Spark 2
---

 

Original article: http://www.cnblogs.com/zhouhb/p/10362327.html
