Common Hadoop Test Programs and Benchmarks (2.7.1) 2016-03-20 21:30

Introduction

This post collects the example and benchmark jobs bundled with Hadoop 2.7.1 that are handy for smoke-testing a cluster and for measuring HDFS I/O.

Pi Estimation

Commonly used as a quick smoke test to verify that the cluster can run a MapReduce job at all.

# The first argument is the number of mappers; the second is the number of samples per mapper
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 16 1000
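The example estimates π by sampling points in the unit square and counting how many land inside the quarter circle. The Hadoop implementation actually uses a Halton quasi-random sequence split across mappers; the plain pseudo-random Monte Carlo version below is only a simplified sketch of the same idea:

```python
import random

def estimate_pi(num_maps: int, samples_per_map: int, seed: int = 42) -> float:
    """Estimate pi by sampling uniform points in the unit square and
    counting how many fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    total = num_maps * samples_per_map
    inside = 0
    for _ in range(total):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area ratio: (quarter circle) / (unit square) = pi / 4
    return 4.0 * inside / total

print(estimate_pi(16, 1000))
```

With 16 × 1000 = 16000 samples the estimate is only accurate to roughly two decimal places, which matches the 3.1425 reported by the job below.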

Result:

[hadoop@ctrl ~]$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 16 1000
Number of Maps  = 16
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Starting Job
16/03/20 21:04:55 INFO input.FileInputFormat: Total input paths to process : 16
16/03/20 21:04:55 INFO mapreduce.JobSubmitter: number of splits:16
16/03/20 21:04:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1458479071758_0001
16/03/20 21:04:56 INFO impl.YarnClientImpl: Submitted application application_1458479071758_0001
16/03/20 21:04:56 INFO mapreduce.Job: The url to track the job: http://0.0.0.0:8888/proxy/application_1458479071758_0001/
16/03/20 21:04:56 INFO mapreduce.Job: Running job: job_1458479071758_0001
16/03/20 21:05:13 INFO mapreduce.Job: Job job_1458479071758_0001 running in uber mode : false
16/03/20 21:05:13 INFO mapreduce.Job:  map 0% reduce 0%
16/03/20 21:05:32 INFO mapreduce.Job:  map 6% reduce 0%
16/03/20 21:05:43 INFO mapreduce.Job:  map 13% reduce 0%
16/03/20 21:05:52 INFO mapreduce.Job:  map 19% reduce 0%
16/03/20 21:06:00 INFO mapreduce.Job:  map 25% reduce 0%
16/03/20 21:06:10 INFO mapreduce.Job:  map 31% reduce 0%
16/03/20 21:06:18 INFO mapreduce.Job:  map 38% reduce 0%
16/03/20 21:06:27 INFO mapreduce.Job:  map 44% reduce 0%
16/03/20 21:06:36 INFO mapreduce.Job:  map 50% reduce 0%
16/03/20 21:06:45 INFO mapreduce.Job:  map 56% reduce 0%
16/03/20 21:06:53 INFO mapreduce.Job:  map 63% reduce 0%
16/03/20 21:07:02 INFO mapreduce.Job:  map 69% reduce 0%
16/03/20 21:07:11 INFO mapreduce.Job:  map 75% reduce 0%
16/03/20 21:07:21 INFO mapreduce.Job:  map 81% reduce 0%
16/03/20 21:07:29 INFO mapreduce.Job:  map 88% reduce 0%
16/03/20 21:07:37 INFO mapreduce.Job:  map 94% reduce 0%
16/03/20 21:07:54 INFO mapreduce.Job:  map 100% reduce 0%
16/03/20 21:07:57 INFO mapreduce.Job:  map 100% reduce 100%
16/03/20 21:08:06 INFO mapreduce.Job: Job job_1458479071758_0001 completed successfully
16/03/20 21:08:07 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=358
        FILE: Number of bytes written=2132064
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=4102
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=67
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=16
        Launched reduce tasks=1
        Data-local map tasks=8
        Rack-local map tasks=8
        Total time spent by all maps in occupied slots (ms)=545376
        Total time spent by all reduces in occupied slots (ms)=131712
        Total time spent by all map tasks (ms)=136344
        Total time spent by all reduce tasks (ms)=16464
        Total vcore-seconds taken by all map tasks=136344
        Total vcore-seconds taken by all reduce tasks=16464
        Total megabyte-seconds taken by all map tasks=139616256
        Total megabyte-seconds taken by all reduce tasks=33718272
    Map-Reduce Framework
        Map input records=16
        Map output records=32
        Map output bytes=288
        Map output materialized bytes=448
        Input split bytes=2214
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=448
        Reduce input records=32
        Reduce output records=0
        Spilled Records=64
        Shuffled Maps =16
        Failed Shuffles=0
        Merged Map outputs=16
        GC time elapsed (ms)=5715
        CPU time spent (ms)=22850
        Physical memory (bytes) snapshot=4568145920
        Virtual memory (bytes) snapshot=26378506240
        Total committed heap usage (bytes)=3193962496
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1888
    File Output Format Counters
        Bytes Written=97
Job Finished in 193.403 seconds
Estimated value of Pi is 3.14250000000000000000

DFSIO

A benchmark bundled with Hadoop for measuring HDFS read/write throughput.

The test data and results are stored in HDFS under /benchmarks/TestDFSIO.

Write test:

hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar TestDFSIO -write -nrFiles 16 -fileSize 1GB -resFile /tmp/$USER-dfsio-write.txt

Read test (run this after the write test, since it reads back the files the write test produced):

hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar TestDFSIO -read -nrFiles 16 -fileSize 1GB -resFile /tmp/$USER-dfsio-read.txt

Result (here with -fileSize 32MB for a quicker run):

[hadoop@ctrl ~]$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar TestDFSIO -write -nrFiles 16 -fileSize 32MB -resFile /tmp/$USER-dfsio-write.txt
16/03/20 22:11:41 INFO fs.TestDFSIO: TestDFSIO.1.8
16/03/20 22:11:41 INFO fs.TestDFSIO: nrFiles = 16
16/03/20 22:11:41 INFO fs.TestDFSIO: nrBytes (MB) = 32.0
16/03/20 22:11:41 INFO fs.TestDFSIO: bufferSize = 1000000
16/03/20 22:11:41 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
16/03/20 22:11:43 INFO fs.TestDFSIO: creating control file: 33554432 bytes, 16 files
16/03/20 22:11:45 INFO fs.TestDFSIO: created control files for: 16 files
16/03/20 22:11:46 INFO mapred.FileInputFormat: Total input paths to process : 16
16/03/20 22:11:46 INFO mapreduce.JobSubmitter: number of splits:16
16/03/20 22:11:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1458479071758_0003
16/03/20 22:11:47 INFO impl.YarnClientImpl: Submitted application application_1458479071758_0003
16/03/20 22:11:47 INFO mapreduce.Job: The url to track the job: http://0.0.0.0:8888/proxy/application_1458479071758_0003/
16/03/20 22:11:47 INFO mapreduce.Job: Running job: job_1458479071758_0003
16/03/20 22:17:04 INFO mapreduce.Job: Job job_1458479071758_0003 running in uber mode : false
16/03/20 22:17:04 INFO mapreduce.Job:  map 0% reduce 0%
16/03/20 22:17:17 INFO mapreduce.Job:  map 6% reduce 0%
16/03/20 22:17:30 INFO mapreduce.Job:  map 13% reduce 0%
16/03/20 22:17:45 INFO mapreduce.Job:  map 19% reduce 0%
16/03/20 22:17:56 INFO mapreduce.Job:  map 25% reduce 0%
16/03/20 22:18:09 INFO mapreduce.Job:  map 31% reduce 0%
16/03/20 22:18:26 INFO mapreduce.Job:  map 38% reduce 0%
16/03/20 22:18:38 INFO mapreduce.Job:  map 44% reduce 0%
16/03/20 22:18:49 INFO mapreduce.Job:  map 50% reduce 0%
16/03/20 22:18:59 INFO mapreduce.Job:  map 56% reduce 0%
16/03/20 22:19:13 INFO mapreduce.Job:  map 60% reduce 0%
16/03/20 22:19:16 INFO mapreduce.Job:  map 63% reduce 0%
16/03/20 22:19:26 INFO mapreduce.Job:  map 69% reduce 0%
16/03/20 22:19:39 INFO mapreduce.Job:  map 75% reduce 0%
16/03/20 22:19:57 INFO mapreduce.Job:  map 81% reduce 0%
16/03/20 22:20:08 INFO mapreduce.Job:  map 88% reduce 0%
16/03/20 22:20:23 INFO mapreduce.Job:  map 94% reduce 0%
16/03/20 22:20:40 INFO mapreduce.Job:  map 98% reduce 0%
16/03/20 22:20:43 INFO mapreduce.Job:  map 100% reduce 31%
16/03/20 22:20:46 INFO mapreduce.Job:  map 100% reduce 100%
16/03/20 22:20:47 INFO mapreduce.Job: Job job_1458479071758_0003 completed successfully
16/03/20 22:20:48 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=1338
        FILE: Number of bytes written=2135928
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=3628
        HDFS: Number of bytes written=536870989
        HDFS: Number of read operations=67
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=18
    Job Counters
        Launched map tasks=16
        Launched reduce tasks=1
        Data-local map tasks=12
        Rack-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=729984
        Total time spent by all reduces in occupied slots (ms)=157040
        Total time spent by all map tasks (ms)=182496
        Total time spent by all reduce tasks (ms)=19630
        Total vcore-seconds taken by all map tasks=182496
        Total vcore-seconds taken by all reduce tasks=19630
        Total megabyte-seconds taken by all map tasks=186875904
        Total megabyte-seconds taken by all reduce tasks=40202240
    Map-Reduce Framework
        Map input records=16
        Map output records=80
        Map output bytes=1172
        Map output materialized bytes=1428
        Input split bytes=1830
        Combine input records=0
        Combine output records=0
        Reduce input groups=5
        Reduce shuffle bytes=1428
        Reduce input records=80
        Reduce output records=5
        Spilled Records=160
        Shuffled Maps =16
        Failed Shuffles=0
        Merged Map outputs=16
        GC time elapsed (ms)=4364
        CPU time spent (ms)=43120
        Physical memory (bytes) snapshot=4851724288
        Virtual memory (bytes) snapshot=26460102656
        Total committed heap usage (bytes)=3534225408
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1798
    File Output Format Counters
        Bytes Written=77
16/03/20 22:20:48 WARN hdfs.DFSClient: DFSInputStream has been closed already
16/03/20 22:20:48 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
16/03/20 22:20:48 INFO fs.TestDFSIO:            Date & time: Sun Mar 20 22:20:48 CST 2016
16/03/20 22:20:48 INFO fs.TestDFSIO:        Number of files: 16
16/03/20 22:20:48 INFO fs.TestDFSIO: Total MBytes processed: 512.0
16/03/20 22:20:48 INFO fs.TestDFSIO:      Throughput mb/sec: 9.298102242803958
16/03/20 22:20:48 INFO fs.TestDFSIO: Average IO rate mb/sec: 11.259085655212402
16/03/20 22:20:48 INFO fs.TestDFSIO:  IO rate std deviation: 5.537918118173138
16/03/20 22:20:48 INFO fs.TestDFSIO:     Test exec time sec: 542.623
16/03/20 22:20:48 INFO fs.TestDFSIO:
[hadoop@ctrl ~]$
[hadoop@ctrl ~]$ hdfs dfs -cat /benchmarks/TestDFSIO/io_write/part-*
f:rate    180145.38
f:sqrate    2518968.8
l:size    536870912
l:tasks    16
l:time    55065
[hadoop@ctrl ~]$

After the tests finish, clean up the generated data; otherwise it keeps occupying a large amount of HDFS space.

Cleanup:

hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar TestDFSIO -clean
Tags: #HDFS