Hive Performance Optimization (1.2.1) (Unfinished) 2016-02-02 22:00

Data Preparation

Table 1 has the following structure:

SELECT * FROM csdn LIMIT 5;
+----------------+----------------+--------------------------+--+
| csdn.username  | csdn.password  |        csdn.email        |
+----------------+----------------+--------------------------+--+
| zdg            | 12344321       | zdg@csdn.net             |
| LaoZheng       | 670203313747   | chengming_zheng@163.com  |
| fstao          | 730413         | fstao@tom.com            |
| huwolf         | 2535263        | hujiye@263.net           |
| cadcjl         | KIC43dk6!      | ccedcjl@21cn.com         |
+----------------+----------------+--------------------------+

Record count: 6428632

Table creation script:

CREATE EXTERNAL TABLE csdn(
    username STRING,
    password STRING,
    email STRING
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
LOCATION '/data/csdn/';
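
Once the external table is defined, a quick sanity check can confirm that Hive reads the files under /data/csdn/ as expected. The following is only a sketch: the column aliases (record_count, missing_email) are made up for illustration, and the last query assumes that lines with fewer than three comma-separated fields are the ones that surface as NULL in the email column.

```sql
-- Inspect the table definition (location, field delimiter, table type).
DESCRIBE FORMATTED csdn;

-- Count the rows; this post reports 6428632 for this data set.
SELECT COUNT(*) AS record_count FROM csdn;

-- Count rows where the third field could not be read (assumed to be short lines).
SELECT COUNT(*) AS missing_email FROM csdn WHERE email IS NULL;
```

Because the fields are split on every comma, a password that itself contains a comma would shift the remaining fields, so a non-zero result or odd-looking email values may be worth a closer look.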

Table 2 has the following structure:

SELECT * FROM score;
+---------------------------+----------------+--+
|        score.email        |   score.year   |
+---------------------------+----------------+--+
| hexi_delphi@chinaren.com  | 6118198.5      |
| xiecy@mail.huptt.zj.cn    | 8800899.0      |
| singwolf@sina.com         | 3064531.25     |
| lyonsxu@263.net           | 1889652.0      |
| jack_tanlei@263.net       | 161175.546875  |
| childisheep@263.net       | 2034223.875    |
| zhehui@wx88.net           | 2868698.25     |
| legend03awd@fm.365.com    | 2048440.125    |
| xiaoy2000@sina.com.cn     | 6697944.5      |
| zhangweijunfjj@sohu       | 8044697.5      |
| joyfm@chinaren.com        | 1733621.75     |
| wanxg@runway.cn.net       | 2302380.0      |
| FengTianYa@263.net        | 4495792.5      |
| kexu000@21cn.com          | 9388435.0      |
| zygapi@ccidnet.com        | 5686911.0      |
| joyfm@chinaren.com        | 3997199.75     |
| wanxg@runway.cn.net       | 5412279.0      |
| FengTianYa@263.net        | 703157.9375    |
| kexu000@21cn.com          | 4711111.5      |
+---------------------------+----------------+--+

Record count: 19.

Table creation script:

CREATE EXTERNAL TABLE score(
    email STRING,
    year FLOAT
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
LOCATION '/data/score/';

Generate some data:

INSERT OVERWRITE TABLE score
SELECT email,RAND() FROM csdn TABLESAMPLE (1000 ROWS);
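
After the insert, it may be worth checking what actually landed in score. Two details of the statement affect the result: RAND() returns a value in the range [0, 1), and TABLESAMPLE (n ROWS) applies the row limit to each input split, so the total number of rows depends on how many splits the csdn data is read in. The query below is a minimal sketch for inspecting the outcome; the aliases are illustrative only, and the backticks around year are a precaution rather than a requirement.

```sql
-- Row count and value range of the generated data.
SELECT COUNT(*)    AS record_count,
       MIN(`year`) AS min_value,
       MAX(`year`) AS max_value
FROM score;
```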