[特性]Hive分区功能使用 2015-11-14 15:30

说明

划分不同的分区,并将同一个分区的记录保存在同一个目录下。在查询时,指定分区,那么只需要在指定的目录下检索数据。这样可以提高查询效率。

使用分区(内部表)

创建带分区的内部表:

1
2
3
4
CREATE TABLE student(id STRING, name STRING) 
PARTITIONED BY (year INT, major INT)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',';

按分区导入数据:

1
2
3
4
5
LOAD DATA LOCAL INPATH '/root/stu1.txt' INTO TABLE student PARTITION(year=2001,major=810);
LOAD DATA LOCAL INPATH '/root/stu5.txt' INTO TABLE student PARTITION(year=2001,major=810);
LOAD DATA LOCAL INPATH '/root/stu2.txt' INTO TABLE student PARTITION(year=2001,major=820);
LOAD DATA LOCAL INPATH '/root/stu3.txt' INTO TABLE student PARTITION(year=2002,major=810);
LOAD DATA LOCAL INPATH '/root/stu4.txt' INTO TABLE student PARTITION(year=2002,major=820);

通过HDFS可以看到按照分区结构创建了相应的目录,并且对应的数据存入相应的分区目录下:

hdfs dfs -ls -R /user/hive/warehouse/student
drwxr-xr-x   - root 2015-11-14 15:25 /user/hive/warehouse/student/year=2001
drwxr-xr-x   - root 2015-11-14 15:26 /user/hive/warehouse/student/year=2001/major=810
-rw-r--r--   2 root 2015-11-14 15:26 /user/hive/warehouse/student/year=2001/major=810/stu1.txt
-rw-r--r--   2 root 2015-11-14 15:26 /user/hive/warehouse/student/year=2001/major=810/stu5.txt
drwxr-xr-x   - root 2015-11-14 15:26 /user/hive/warehouse/student/year=2001/major=820
-rw-r--r--   2 root 2015-11-14 15:26 /user/hive/warehouse/student/year=2001/major=820/stu2.txt
drwxr-xr-x   - root 2015-11-14 15:25 /user/hive/warehouse/student/year=2002
drwxr-xr-x   - root 2015-11-14 15:26 /user/hive/warehouse/student/year=2002/major=810
-rw-r--r--   2 root 2015-11-14 15:26 /user/hive/warehouse/student/year=2002/major=810/stu3.txt
drwxr-xr-x   - root 2015-11-14 15:26 /user/hive/warehouse/student/year=2002/major=820
-rw-r--r--   2 root 2015-11-14 15:26 /user/hive/warehouse/student/year=2002/major=820/stu4.txt

使用分区(外部表)

创建带分区的外部表:

1
2
3
4
5
CREATE EXTERNAL TABLE student2(id STRING, name STRING) 
PARTITIONED BY (year INT, major INT)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
LOCATION '/data/student2/';

创建相应的分区(这个步骤容易遗漏.如果遗漏,SHOW PARTITIONS student2会显示无分区.):

1
2
3
4
ALTER TABLE student2 ADD PARTITION (year=2001, major=810) LOCATION 'year=2001/major=810';
ALTER TABLE student2 ADD PARTITION (year=2001, major=820) LOCATION 'year=2001/major=820';
ALTER TABLE student2 ADD PARTITION (year=2002, major=810) LOCATION 'year=2002/major=810';
ALTER TABLE student2 ADD PARTITION (year=2002, major=820) LOCATION 'year=2002/major=820';

在HDFS上创建表数据的相关表目录,并上传数据文件,形成如下文件:

hdfs dfs -ls -R /data/student2/
drwxr-xr-x   2015-11-14 15:37 /data/student2/year=2001
drwxr-xr-x   2015-11-14 15:44 /data/student2/year=2001/major=810
-rw-r--r--   2015-11-14 15:41 /data/student2/year=2001/major=810/stu1.txt
-rw-r--r--   2015-11-14 15:44 /data/student2/year=2001/major=810/stu5.txt
drwxr-xr-x   2015-11-14 15:43 /data/student2/year=2001/major=820
-rw-r--r--   2015-11-14 15:43 /data/student2/year=2001/major=820/stu2.txt
drwxr-xr-x   2015-11-14 15:37 /data/student2/year=2002
drwxr-xr-x   2015-11-14 15:43 /data/student2/year=2002/major=810
-rw-r--r--   2015-11-14 15:43 /data/student2/year=2002/major=810/stu3.txt
drwxr-xr-x   2015-11-14 15:43 /data/student2/year=2002/major=820
-rw-r--r--   2015-11-14 15:43 /data/student2/year=2002/major=820/stu4.txt

维护命令

查看指定的表目录有哪一些分区:

SHOW PARTITIONS student;

0: jdbc:hive2://ctrl:10000> SHOW PARTITIONS student;
+----------------------+--+
|      partition       |
+----------------------+--+
| year=2001/major=810  |
| year=2001/major=820  |
| year=2002/major=810  |
| year=2002/major=820  |
+----------------------+--+
4 rows selected (0.39 seconds)
0: jdbc:hive2://ctrl:10000>

基于分区进行查询,加快查询速度:

1
SELECT * FROM student WHERE year=2002 AND major=820;
Tags: #Hive    Post on Hadoop