YARN Label-based Scheduling (2.7.1) 2015-10-28 21:02

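Overview

Example: a cluster with three data nodes. The first node has no label, the second carries the fastcpu label, and the third carries the highmem label.

Note: a node cannot yet carry multiple labels; for now each node can have at most one label.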

Usage

  • Every label must first be added to the cluster with the following command before it can be used:
yarn rmadmin -addToClusterNodeLabels fastcpu,highmem
  • Label each node
# data01 stays unlabeled, so no command is needed for it
yarn rmadmin -replaceLabelsOnNode "data02,fastcpu"
yarn rmadmin -replaceLabelsOnNode "data03,highmem"
  • Query the labels
yarn cluster --list-node-labels
yarn node -status data01:45454
  • Query the queues
hadoop queue -list

The label of each node can also be checked on the Node Labels page of the YARN web UI.
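
The labeling steps above can be sketched as a dry run that only prints the rmadmin commands for a given node-to-label mapping. The function name `label_commands` and the `node=label` argument format are assumptions made for this illustration; they are not part of YARN.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the rmadmin commands for a node=label list.
# Each argument has the form "node=label"; an empty label means "leave unlabeled".
label_commands() {
  local labels="" pair node label
  # Collect the distinct, non-empty labels in first-seen order.
  for pair in "$@"; do
    label="${pair#*=}"
    [ -z "$label" ] && continue
    case ",$labels," in
      *",$label,"*) ;;                          # already collected
      *) labels="${labels:+$labels,}$label" ;;
    esac
  done
  # Labels must be added to the cluster before they can be attached to nodes.
  [ -n "$labels" ] && echo "yarn rmadmin -addToClusterNodeLabels $labels"
  for pair in "$@"; do
    node="${pair%%=*}"
    label="${pair#*=}"
    [ -z "$label" ] && continue                 # unlabeled nodes need no command
    echo "yarn rmadmin -replaceLabelsOnNode \"$node,$label\""
  done
}

# The three-node example from the overview.
label_commands data01= data02=fastcpu data03=highmem
```

Printing the commands first makes it easy to review them before running anything against the ResourceManager.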

  • Add the related settings to yarn-site.xml
<property>
    <name>yarn.node-labels.enabled</name>
    <value>true</value>
</property>
<property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
    <description>Sets the Node ID of the node</description>
</property>
<property>
    <name>yarn.node-labels.manager-class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager</value>
</property>
<property>
    <name>yarn.node-labels.fs-store.root-dir</name>
    <value>hdfs://myns1/yarn/node-labels</value>
    <description>Where the label data is stored on HDFS</description>
</property>
  • Configure the YARN Capacity Scheduler

Set the label information on the queues.

Example:

yarn.scheduler.capacity.root.queues=default,guangzhou,shenzhen,zhuhai
yarn.scheduler.capacity.root.default.capacity=10
yarn.scheduler.capacity.root.guangzhou.capacity=30
yarn.scheduler.capacity.root.shenzhen.capacity=30
yarn.scheduler.capacity.root.zhuhai.capacity=30

yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.guangzhou.maximum-capacity=100
yarn.scheduler.capacity.root.shenzhen.maximum-capacity=100
yarn.scheduler.capacity.root.zhuhai.maximum-capacity=100

# * means all labels
yarn.scheduler.capacity.root.accessible-node-labels=*
yarn.scheduler.capacity.root.shenzhen.accessible-node-labels=fastcpu
yarn.scheduler.capacity.root.zhuhai.accessible-node-labels=highmem

yarn.scheduler.capacity.root.accessible-node-labels.fastcpu.capacity=60
yarn.scheduler.capacity.root.accessible-node-labels.highmem.capacity=40
yarn.scheduler.capacity.root.shenzhen.accessible-node-labels.fastcpu.capacity=100
yarn.scheduler.capacity.root.zhuhai.accessible-node-labels.highmem.capacity=100

# Note: the space before the first comma is required; it stands for "no label"
yarn.scheduler.capacity.root.default-node-label-expression= ,fastcpu,highmem
# Note: the value is a single space
yarn.scheduler.capacity.root.default.default-node-label-expression= 
# Note: the value is a single space
yarn.scheduler.capacity.root.guangzhou.default-node-label-expression= 
yarn.scheduler.capacity.root.shenzhen.default-node-label-expression=fastcpu
yarn.scheduler.capacity.root.zhuhai.default-node-label-expression=highmem
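
The Capacity Scheduler expects the capacities of the direct children of a parent queue to add up to 100 (here 10 + 30 + 30 + 30). A minimal sketch of that check over flat key=value lines follows; the function name `sum_child_capacity` is an assumption for this illustration, and it deliberately ignores maximum-capacity and per-label capacities.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the ".capacity" values of the direct children of a
# parent queue, reading "key=value" lines on stdin. Keys for maximum-capacity,
# per-label capacities, and deeper descendants are skipped.
sum_child_capacity() {
  awk -F= -v parent="$1" '
    {
      prefix = "yarn.scheduler.capacity." parent "."
      suffix = ".capacity"
      key = $1
      if (index(key, prefix) != 1) next
      if (length(key) <= length(prefix) + length(suffix)) next
      if (substr(key, length(key) - length(suffix) + 1) != suffix) next
      child = substr(key, length(prefix) + 1,
                     length(key) - length(prefix) - length(suffix))
      if (index(child, ".") > 0) next   # label keys and grandchildren
      total += $2
    }
    END { print total + 0 }
  '
}

# Sum the four child queues of root from the example.
printf '%s\n' \
  'yarn.scheduler.capacity.root.default.capacity=10' \
  'yarn.scheduler.capacity.root.guangzhou.capacity=30' \
  'yarn.scheduler.capacity.root.shenzhen.capacity=30' \
  'yarn.scheduler.capacity.root.zhuhai.capacity=30' \
  | sum_child_capacity root
```

Running the same function over the full listing gives the same total, because the other keys do not match the direct-child pattern.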

The same queue settings written as XML properties:

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,guangzhou,shenzhen,zhuhai</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.guangzhou.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.shenzhen.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.zhuhai.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.guangzhou.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.shenzhen.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.zhuhai.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
  <value>*</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.shenzhen.accessible-node-labels</name>
  <value>fastcpu</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.zhuhai.accessible-node-labels</name>
  <value>highmem</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.fastcpu.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.highmem.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.shenzhen.accessible-node-labels.fastcpu.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.zhuhai.accessible-node-labels.highmem.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default-node-label-expression</name>
  <value> ,fastcpu,highmem</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.default-node-label-expression</name>
  <value> </value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.guangzhou.default-node-label-expression</name>
  <value> </value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.shenzhen.default-node-label-expression</name>
  <value>fastcpu</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.zhuhai.default-node-label-expression</name>
  <value>highmem</value>
</property>
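
The XML above is a mechanical translation of the flat key=value listing, so it can be generated rather than typed by hand. A minimal sketch, assuming the values need no XML escaping; the function name `props_to_xml` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "key=value" lines on stdin into <property> elements.
# Blank lines, lines starting with "#", and lines without "=" are skipped.
# Values are copied verbatim (no XML escaping), so a single-space value survives.
props_to_xml() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;
      *=*) ;;
      *) continue ;;
    esac
    # Split at the first "=": everything before is the name, the rest the value.
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' \
      "${line%%=*}" "${line#*=}"
  done
}

# Two lines from the example, including a value that is a single space.
printf '%s\n' \
  'yarn.scheduler.capacity.root.accessible-node-labels=*' \
  'yarn.scheduler.capacity.root.default.default-node-label-expression= ' \
  | props_to_xml
```

Reading with `IFS=` and `read -r` keeps the leading and trailing spaces that the label expressions depend on.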
  • Submit jobs to a specific queue
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi -Dmapreduce.job.queuename=shenzhen 16 1000
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount -Dmapreduce.job.queuename=default /tmp/data/ /tmp/output1

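Note: specifying a label when submitting an application is not yet supported. The only way to get the same effect is to submit to a specific queue and set that queue's default label.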
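
Since the label is chosen indirectly through the queue, a small lookup from label to queue can make the intent explicit when building the submit command. The sketch below hard-codes the mapping from the example configuration; the function name `queue_for_label` is an assumption for illustration, and the script only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a desired node label to the queue whose
# default-node-label-expression is that label (mapping from the example above).
queue_for_label() {
  case "$1" in
    fastcpu) echo shenzhen ;;
    highmem) echo zhuhai ;;
    '')      echo default ;;    # no label: a queue with an empty label expression
    *)       echo "unknown label: $1" >&2; return 1 ;;
  esac
}

# Print (not run) the submit command for a job that should land on fastcpu nodes.
queue="$(queue_for_label fastcpu)"
echo "hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi -Dmapreduce.job.queuename=$queue 16 1000"
```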

References

  1. YARN Node Labels