Monitoring Hadoop with Ganglia 2015-05-22 16:00

Introduction

Ganglia is an open-source monitoring system that can watch a Hadoop cluster and give real-time visibility into CPU, memory, and other runtime metrics. Ganglia uses a master/slave architecture: the master is the Gmetad process and the slaves are Gmond processes.

  • Gmond: an agent process deployed on every Hadoop node; it collects metrics on its host and makes them available for Gmetad to fetch.
  • Gmetad: pulls the collected data from each Gmond (see the sketch after this list).
  • Ganglia-Web: deployed on the same node as Gmetad; provides the web UI.
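
Gmond serves its current metrics as an XML dump on a plain TCP port (8649 by default, the same port configured below), and Gmetad simply polls that port. You can imitate the pull by hand; a minimal check, assuming nc is available and a Gmond is already running on data03:

# Dump the XML that Gmetad would pull from this Gmond
nc data03 8649 | head -n 20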

The node layout for this example is as follows:

  • ctrl (192.168.1.10): Gmond
  • data01 (192.168.1.11): Gmond
  • data02 (192.168.1.12): Gmond
  • data03 (192.168.1.13): Gmond, Gmetad, Ganglia-Web

In what follows, the node running all three processes (Gmond, Gmetad, and Ganglia-Web) is called the Gmetad node; the other three nodes are called Gmond nodes.

Installation

On all nodes:

rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
yum install -y ganglia*
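
This pulls the Ganglia packages from the EPEL 6 repository. A quick way to confirm what was installed (exact package names may vary with the EPEL version):

rpm -qa | grep -i ganglia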

Configuration

Configuring Ganglia

  • Gmetad node configuration:

vi /etc/httpd/conf.d/ganglia.conf

<Location /ganglia>
  Order deny,allow
  # Deny from all
  Allow from all           # add this line
  # Allow from 127.0.0.1   # comment out this line
  # Allow from ::1         # comment out this line
  # Allow from .example.com
</Location>

vi /etc/ganglia/gmetad.conf

data_source "QingHadoop" ctrl data01 data02 data03

vi /etc/ganglia/gmond.conf

cluster {
  name = "QingHadoop"          # 修改此行
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname.  Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  #mcast_join = 239.2.11.71    # comment out this line
  host = data03                # add this line
  port = 8649
  ttl = 1
}

udp_recv_channel {
  #mcast_join = 239.2.11.71    # comment out this line
  port = 8649
  bind = data03                # change this line
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}
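
These two channels switch Gmond from its default multicast mode to unicast: every node sends to data03, and only data03 listens. To catch configuration mistakes before starting the service, you can run Gmond in the foreground with debug output; a quick check, assuming the EPEL package's default config path:

# A debug level greater than zero keeps gmond in the foreground; Ctrl-C to stop
gmond -d 2 -c /etc/ganglia/gmond.conf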
  • Configuration on the other nodes (ctrl, data01, data02):

vi /etc/ganglia/gmond.conf

cluster {
  name = "QingHaoop"              #修改此行与gmetad中的配置对应
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname.  Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  #mcast_join = 239.2.11.71    # comment out this line
  host = data03                # add this line
  port = 8649
  ttl = 1
}

udp_recv_channel {
  #mcast_join = 239.2.11.71    # comment out this line
  port = 8649
  #bind = data03                # comment out this line
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}

Configuring Hadoop

Configure Hadoop on each node to send its metrics to the local Gmond process via the metrics2 Ganglia sink. The NameNode/ResourceManager node and the DataNode/NodeManager nodes report different metric sets, so their sink lists differ. The three data-node files differ only in the local hostname; a sketch for generating them follows the per-node listings below.

  • ctrl node (NameNode and ResourceManager):

vi /opt/hadoop/etc/hadoop/hadoop-metrics2.properties

# Route each daemon's metrics to the Gmond on this host
namenode.sink.ganglia.servers=ctrl:8649
resourcemanager.sink.ganglia.servers=ctrl:8649
mrappmaster.sink.ganglia.servers=ctrl:8649
jobhistoryserver.sink.ganglia.servers=ctrl:8649
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
# Report every 10 seconds
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
# Per-metric slope overrides, and dmax (seconds before an idle metric expires)
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
  • data01 node:
datanode.sink.ganglia.servers=data01:8649
nodemanager.sink.ganglia.servers=data01:8649
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
  • data02 node:
datanode.sink.ganglia.servers=data02:8649
nodemanager.sink.ganglia.servers=data02:8649
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
  • data03 node:
datanode.sink.ganglia.servers=data03:8649
nodemanager.sink.ganglia.servers=data03:8649
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
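
The three data-node files above differ only in the hostname. A minimal sketch for pushing them out in one pass, assuming passwordless root ssh and that /opt/hadoop/etc/hadoop is the Hadoop config path on every node (as it is in this article):

# $h expands locally before the file is written on each remote node
for h in data01 data02 data03; do
  ssh "$h" "cat > /opt/hadoop/etc/hadoop/hadoop-metrics2.properties" <<EOF
datanode.sink.ganglia.servers=$h:8649
nodemanager.sink.ganglia.servers=$h:8649
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
EOF
done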

Startup

  • Restart Hadoop:
[root@ctrl ~]# stop-yarn.sh
[root@ctrl ~]# stop-dfs.sh
[root@ctrl ~]# start-dfs.sh
[root@ctrl ~]# start-yarn.sh
  • Start Ganglia:

On the Gmetad node:

service gmetad start
service gmond start
service httpd start

On the Gmond nodes:

service gmond start
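
To bring the daemons back automatically after a reboot (assuming the SysV init scripts shipped by the EPEL 6 packages installed above):

# Gmetad node
chkconfig gmetad on
chkconfig gmond on
chkconfig httpd on

# Gmond nodes
chkconfig gmond on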

Usage

Open the web UI in a browser, where ip is the Gmetad node's address (192.168.1.13 in this example):

http://ip:80/ganglia/
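
If the page stays empty, check the pipeline stage by stage. By default Gmetad re-exports its aggregated XML on TCP port 8651, and the EPEL packages write RRD files under /var/lib/ganglia/rrds; both are defaults and may differ if your gmetad.conf was changed:

# Aggregated cluster XML as served by gmetad
nc data03 8651 | head -n 20

# One directory per monitored host should appear here
ls /var/lib/ganglia/rrds/QingHadoop/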

An Alternative Deployment

Hadoop's metrics2 sinks send metrics over the network, so the target does not have to be a local Gmond. You can therefore deploy Gmond only on data03 and point every node's Hadoop sinks directly at it; ctrl, data01, and data02 then need no Gmond process at all. On ctrl:

namenode.sink.ganglia.servers=data03:8649
resourcemanager.sink.ganglia.servers=data03:8649
mrappmaster.sink.ganglia.servers=data03:8649
jobhistoryserver.sink.ganglia.servers=data03:8649

Note: every node (ctrl, data01, data02, and data03) is configured here to send to data03:8649.
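
On the data nodes, the analogous lines would be (the same sink properties as in the per-node configuration earlier, only with the target changed):

datanode.sink.ganglia.servers=data03:8649
nodemanager.sink.ganglia.servers=data03:8649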

Correspondingly, the data_source line in /etc/ganglia/gmetad.conf on data03 only needs to keep data03:

data_source "QingHadoop" data03
Tags: #Ganglia #Hadoop