Installing and Using Sqoop2 (1.99.3) 2015-07-20 21:00

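Installation

Overview

This document installs the Sqoop2 server on node data03 and the client on node data01.

The node that hosts the Sqoop2 server must be able to connect to the Hadoop cluster. You can check this by listing HDFS from that node: hadoop dfs -ls (on Hadoop 2.x this form is deprecated in favor of hdfs dfs -ls).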

Download

cd /opt
wget http://archive.apache.org/dist/sqoop/1.99.3/sqoop-1.99.3-bin-hadoop200.tar.gz
tar -xvf sqoop-1.99.3-bin-hadoop200.tar.gz
ln -s sqoop-1.99.3-bin-hadoop200 sqoop

Edit the Sqoop configuration

# From /opt/sqoop, open the server configuration and set the two entries below
vi server/conf/sqoop.properties
BASEDIR=/opt/sqoop/base
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop/etc/hadoop/

Change the path from which Sqoop loads the Hadoop JARs

Open /opt/sqoop/server/conf/catalina.properties with vi and, on the common.loader line, replace /usr/lib/hadoop/lib/*.jar with the paths to your own Hadoop JARs:

  • /opt/hadoop/share/hadoop/common/*.jar,
  • /opt/hadoop/share/hadoop/common/lib/*.jar,
  • /opt/hadoop/share/hadoop/hdfs/*.jar,
  • /opt/hadoop/share/hadoop/hdfs/lib/*.jar,
  • /opt/hadoop/share/hadoop/mapreduce/*.jar,
  • /opt/hadoop/share/hadoop/mapreduce/lib/*.jar,
  • /opt/hadoop/share/hadoop/tools/*.jar,
  • /opt/hadoop/share/hadoop/tools/lib/*.jar,
  • /opt/hadoop/share/hadoop/yarn/*.jar,
  • /opt/hadoop/share/hadoop/yarn/lib/*.jar

Note: separate the JAR entries with commas, and do not put any spaces between them.
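
Typing ten paths by hand makes it easy to slip in a space or drop a comma. As a minimal sketch, the Hadoop part of the common.loader value can be generated instead. The function name build_hadoop_loader is hypothetical, and the layout under share/hadoop is assumed to match the list above. The output is only the fragment that replaces /usr/lib/hadoop/lib/*.jar, not the whole common.loader line.

```shell
#!/bin/sh
# build_hadoop_loader ROOT
# Print a comma-separated, space-free list of JAR globs for a Hadoop
# install rooted at ROOT (assumed layout: ROOT/share/hadoop/<module>[/lib]).
build_hadoop_loader() {
    root="$1"
    out=""
    for sub in common hdfs mapreduce tools yarn; do
        # The globs are quoted on purpose: Tomcat expands them, not the shell.
        for entry in "$root/share/hadoop/$sub/*.jar" "$root/share/hadoop/$sub/lib/*.jar"; do
            if [ -z "$out" ]; then
                out="$entry"
            else
                out="$out,$entry"
            fi
        done
    done
    printf '%s\n' "$out"
}

# Example: the install location used in this article
build_hadoop_loader /opt/hadoop
```

Paste the printed value into the common.loader line in place of /usr/lib/hadoop/lib/*.jar.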

Prepare MySQL

Copy mysql-connector-java-5.1.22-bin.jar into /opt/sqoop/server/lib.
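
Before starting the server it can be worth confirming that the driver really landed in the lib directory. The check below is a small sketch; the function name has_mysql_connector and the file-name pattern mysql-connector-java-*.jar are assumptions chosen to match the JAR named above.

```shell
#!/bin/sh
# has_mysql_connector DIR
# Succeed if DIR contains at least one MySQL Connector/J JAR.
has_mysql_connector() {
    for jar in "$1"/mysql-connector-java-*.jar; do
        # If nothing matches, the glob stays literal and -f is false.
        if [ -f "$jar" ]; then
            return 0
        fi
    done
    return 1
}

# Example: check the directory used in this article
if has_mysql_connector /opt/sqoop/server/lib; then
    echo "MySQL connector found"
else
    echo "MySQL connector missing"
fi
```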

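Usage

Server side

  • Start Sqoop
# bin/sqoop.sh server start

Once started, the server listens on ports 12000 and 12001 by default (12000 for HTTP requests, 12001 for the Tomcat admin port).

  • Stop Sqoop
# bin/sqoop.sh server stop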

Verify the installation

[root@data03 sqoop]# curl http://data03:12000/sqoop/version | python -mjson.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
119   239    0   239    0     0  95561      0 --:--:-- --:--:-- --:--:--     0
{
    "date": "Fri Oct 18 14:15:53 EDT 2013",
    "protocols": [
        "1"
    ],
    "revision": "2404393160301df16a94716a3034e31b03e27b0b",
    "url": "git://unix12.andrew.cmu.edu/afs/andrew.cmu.edu/usr20/mengweid/sqoop/common",
    "user": "mengweid",
    "version": "1.99.3"
}
[root@data03 sqoop]#
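
For a scripted check, only the "version" field of that response matters. The sketch below pulls it out with sed. The function name extract_version is hypothetical, and the pattern is a simple text match rather than a real JSON parser, so it assumes the response keeps the shape shown above.

```shell
#!/bin/sh
# extract_version
# Read the JSON returned by /sqoop/version on stdin and print the
# value of its "version" field.
extract_version() {
    sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example with a trimmed copy of the response shown above
printf '%s\n' '{"protocols":["1"],"user":"mengweid","version":"1.99.3"}' | extract_version
```

Against a running server the same function can be fed from curl, with -s to suppress the progress meter: curl -s http://data03:12000/sqoop/version | extract_version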

Connecting from the client

# bin/sqoop.sh client
sqoop:000> set server --host data03 --port 12000
sqoop:000> show version --all

  • Resource file

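The commands that connect to the server can be saved in a resource file. The resource file is named ".sqoop2rc" and lives in the user's home directory. The commands in it are run in both interactive mode and batch mode.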

# Configure our Sqoop 2 server automatically
set server --host data03 --port 12000

# Run in verbose mode by default
set option --name verbose --value true
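
When several client machines need the same settings, the resource file can be generated rather than edited by hand. This is a minimal sketch: the function name write_sqoop2rc is hypothetical, and the file contents are exactly the two commands shown above with the host and port passed as parameters.

```shell
#!/bin/sh
# write_sqoop2rc FILE HOST PORT
# Write a Sqoop2 client resource file that points at HOST:PORT
# and turns on verbose mode.
write_sqoop2rc() {
    cat > "$1" <<EOF
# Configure our Sqoop 2 server automatically
set server --host $2 --port $3

# Run in verbose mode by default
set option --name verbose --value true
EOF
}

# Example: write to a scratch file and inspect it before installing it
rc_file="$(mktemp)"
write_sqoop2rc "$rc_file" data03 12000
cat "$rc_file"
```

To install it for the current user, call write_sqoop2rc "$HOME/.sqoop2rc" data03 12000 on the client node.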

References

  1. Apache Sqoop documentation - Installation