Sqoop
The Sqoop version used in this article is 1.4.6.
1. Official Site
http://sqoop.apache.org
2. What It Does
A: Export data from Hadoop into a relational database (e.g. Hive -> MySQL)
B: Import data from a relational database into Hadoop (e.g. MySQL -> Hive)
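For a quick preview, the two directions look like this (a sketch only; the full walkthroughs, including the sqoop_db database and t_user table they use, are in section 9 below; -P prompts for the password):
- --A: export HDFS data into a MySQL table
- sqoop export --connect jdbc:mysql://node1:3306/sqoop_db --username root -P --table t_user --export-dir /sqoop_t_user
- --B: import a MySQL table into HDFS
- sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root -P --table t_user --target-dir /sqoop_t_user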
3. Download
http://archive.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
4. Installation
- --upload the tarball to the home directory on node1 (my Hive installation from earlier is also on node1), then extract it
- cd
- tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
- --create a symlink
- ln -sf /root/sqoop-1.4.6.bin__hadoop-2.0.4-alpha /home/sqoop-1.4.6
5. Configure Environment Variables
- --configure the environment variables (SQOOP_HOME must also be set, since PATH references it)
- vi /etc/profile
- export SQOOP_HOME=/home/sqoop-1.4.6
- export HADOOP_PREFIX=$HADOOP_HOME
- export PATH=$PATH:$SQOOP_HOME/bin
- :wq
- source /etc/profile
6. Modify the Configuration File
- --edit the configuration file (see the sketch below for the variables to set)
- cd /home/sqoop-1.4.6/conf/
- cp sqoop-env-template.sh sqoop-env.sh
- vi sqoop-env.sh
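The template ships with all of its variables commented out; at minimum, point Sqoop at your Hadoop (and, if used, Hive) installation. A sketch using the paths that appear elsewhere in this article (/home/hadoop-2.5 and /home/hive; adjust to your own layout):
- export HADOOP_COMMON_HOME=/home/hadoop-2.5
- export HADOOP_MAPRED_HOME=/home/hadoop-2.5
- export HIVE_HOME=/home/hive
- :wq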
7. Add the Driver Jar
- --copy the MySQL driver jar into Sqoop's lib directory
- cd
- cp mysql-connector-java-5.1.23-bin.jar /home/sqoop-1.4.6/lib/
8. Test
- sqoop version
- [root@node1 ~]# sqoop version
- Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
- Please set $HBASE_HOME to the root of your HBase installation.
- Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
- Please set $HCAT_HOME to the root of your HCatalog installation.
- Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
- Please set $ACCUMULO_HOME to the root of your Accumulo installation.
- Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
- Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
- 19/01/04 23:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
- Sqoop 1.4.6
- git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
- Compiled by root on Mon Apr 27 14:38:36 CST 2015
- sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'
-
- [root@node1 ~]# sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'
- Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
- Please set $HBASE_HOME to the root of your HBase installation.
- Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
- Please set $HCAT_HOME to the root of your HCatalog installation.
- Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
- Please set $ACCUMULO_HOME to the root of your Accumulo installation.
- Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
- Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
- 19/01/04 23:17:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
- 19/01/04 23:17:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 19/01/04 23:17:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- information_schema
- hive
- mysql
- performance_schema
- result_db
- spark
- sys
You will notice a lot of warnings in the output.
We can get rid of these warnings as follows:
- --remove the warnings
- cd /home/sqoop-1.4.6/bin/
- vi configure-sqoop
- --comment out all of the following lines by adding '#' at the start of each line
- ## Moved to be a runtime check in sqoop.
- #if [ ! -d "${HBASE_HOME}" ]; then
- # echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
- # echo 'Please set $HBASE_HOME to the root of your HBase installation.'
- #fi
- ## Moved to be a runtime check in sqoop.
- #if [ ! -d "${HCAT_HOME}" ]; then
- # echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
- # echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
- #fi
- #if [ ! -d "${ACCUMULO_HOME}" ]; then
- # echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
- # echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
- #fi
- #if [ ! -d "${ZOOKEEPER_HOME}" ]; then
- # echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
- # echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
- #fi
- :wq
- --test again
- [root@node1 bin]# sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'
- 19/01/04 23:34:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
- 19/01/04 23:34:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 19/01/04 23:34:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- information_schema
- hive
- mysql
- performance_schema
- result_db
- spark
- sys
9. Usage
9.1. Import from MySQL to HDFS
Preparation:
- --first create a table in the database
- --(the commands below use the sqoop_db database; create it first if it does not exist)
- CREATE DATABASE IF NOT EXISTS sqoop_db;
- USE sqoop_db;
- CREATE TABLE t_user (id INT, name VARCHAR(20), age INT);
- --insert the test data
- insert into t_user values(1, 'Tom', 20);
- insert into t_user values(2, 'John', 18);
- insert into t_user values(3, 'Div', 25);
- insert into t_user values(4, 'Susan', 31);
- insert into t_user values(5, 'Tiran', 40);
- insert into t_user values(6, 'Shasita', 13);
Query result:
- mysql> select * from t_user;
- +------+---------+------+
- | id | name | age |
- +------+---------+------+
- | 1 | Tom | 20 |
- | 2 | John | 18 |
- | 3 | Div | 25 |
- | 4 | Susan | 31 |
- | 5 | Tiran | 40 |
- | 6 | Shasita | 13 |
- +------+---------+------+
- 6 rows in set (0.00 sec)
- --import data from the MySQL database into HDFS
- sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root --password '!QAZ2wsx3edc' --table t_user --columns id,name,age -m 1 --target-dir /sqoop_t_user
-
-
- [root@node1 bin]# sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root --password '!QAZ2wsx3edc' --table t_user --columns id,name,age -m 1 --target-dir /sqoop_t_user
- 19/01/04 23:54:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
- 19/01/04 23:54:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 19/01/04 23:54:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- 19/01/04 23:54:30 INFO tool.CodeGenTool: Beginning code generation
- 19/01/04 23:54:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_user` AS t LIMIT 1
- 19/01/04 23:54:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_user` AS t LIMIT 1
- 19/01/04 23:54:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop-2.5
- Note: /tmp/sqoop-root/compile/84e97965496cc61c73c17151375a419b/t_user.java uses or overrides a deprecated API.
- Note: Recompile with -Xlint:deprecation for details.
- 19/01/04 23:54:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/84e97965496cc61c73c17151375a419b/t_user.jar
- 19/01/04 23:54:33 WARN manager.MySQLManager: It looks like you are importing from mysql.
- 19/01/04 23:54:33 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
- 19/01/04 23:54:33 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
- 19/01/04 23:54:33 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
- 19/01/04 23:54:33 INFO mapreduce.ImportJobBase: Beginning import of t_user
- 19/01/04 23:54:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
- 19/01/04 23:54:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
- 19/01/04 23:54:34 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.79.138:8032
- 19/01/04 23:54:48 INFO db.DBInputFormat: Using read commited transaction isolation
- 19/01/04 23:54:48 INFO mapreduce.JobSubmitter: number of splits:1
- 19/01/04 23:54:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1546674829746_0001
- 19/01/04 23:54:49 INFO impl.YarnClientImpl: Submitted application application_1546674829746_0001
- 19/01/04 23:54:49 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1546674829746_0001/
- 19/01/04 23:54:49 INFO mapreduce.Job: Running job: job_1546674829746_0001
- 19/01/04 23:54:59 INFO mapreduce.Job: Job job_1546674829746_0001 running in uber mode : false
- 19/01/04 23:54:59 INFO mapreduce.Job: map 0% reduce 0%
- 19/01/04 23:55:06 INFO mapreduce.Job: map 100% reduce 0%
- 19/01/04 23:55:06 INFO mapreduce.Job: Job job_1546674829746_0001 completed successfully
- 19/01/04 23:55:06 INFO mapreduce.Job: Counters: 30
- File System Counters
- FILE: Number of bytes read=0
- FILE: Number of bytes written=116299
- FILE: Number of read operations=0
- FILE: Number of large read operations=0
- FILE: Number of write operations=0
- HDFS: Number of bytes read=87
- HDFS: Number of bytes written=63
- HDFS: Number of read operations=4
- HDFS: Number of large read operations=0
- HDFS: Number of write operations=2
- Job Counters
- Launched map tasks=1
- Other local map tasks=1
- Total time spent by all maps in occupied slots (ms)=4153
- Total time spent by all reduces in occupied slots (ms)=0
- Total time spent by all map tasks (ms)=4153
- Total vcore-seconds taken by all map tasks=4153
- Total megabyte-seconds taken by all map tasks=4252672
- Map-Reduce Framework
- Map input records=6
- Map output records=6
- Input split bytes=87
- Spilled Records=0
- Failed Shuffles=0
- Merged Map outputs=0
- GC time elapsed (ms)=69
- CPU time spent (ms)=1170
- Physical memory (bytes) snapshot=175808512
- Virtual memory (bytes) snapshot=893071360
- Total committed heap usage (bytes)=84934656
- File Input Format Counters
- Bytes Read=0
- File Output Format Counters
- Bytes Written=63
- 19/01/04 23:55:06 INFO mapreduce.ImportJobBase: Transferred 63 bytes in 32.3608 seconds (1.9468 bytes/sec)
- 19/01/04 23:55:06 INFO mapreduce.ImportJobBase: Retrieved 6 records.
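To see what was actually written, you can inspect the target directory directly (a quick check; the part-m-00000 name assumes Sqoop's default output naming with a single map task):
- hdfs dfs -ls /sqoop_t_user
- hdfs dfs -cat /sqoop_t_user/part-m-00000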
The official site also provides another way to do the same import: reading the options from a file.
- cd
- mkdir mysqoopdir
- cd mysqoopdir
- vi mysql_to_hdfs
- import
- --connect
- jdbc:mysql://node1:3306/sqoop_db
- --username
- root
- --password
- '!QAZ2wsx3edc'
- --table
- t_user
- --columns
- id,name,age
- -m
- 1
- --target-dir
- /sqoop_t_user
- --delete-target-dir
- :wq
- sqoop --options-file mysql_to_hdfs
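Two notes on the options file: each option and its value must sit on separate lines (blank lines and lines starting with # are ignored), and --delete-target-dir removes /sqoop_t_user before the import if it already exists, so the job can be re-run; without it, a second run would fail because the target directory already exists.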
We can verify the import result through Hive:
- [root@node1 bin]# ./hive
- 19/01/05 00:03:29 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
- Logging initialized using configuration in jar:file:/root/apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar!/hive-log4j.properties
- hive> dfs -cat /sqoop_t_user/*;
- 1,Tom,20
- 2,John,18
- 3,Div,25
- 4,Susan,31
- 5,Tiran,40
- 6,Shasita,13
The result we see matches the data in MySQL exactly.
Use case: when the data we need to process or analyze sits in a MySQL database, the volume is fairly large, and we want to analyze it offline, we can use Sqoop to import the data from MySQL into HDFS and process it there.
Importing a query result:
- --import the result of a query
- cd mysqoopdir
- vi mysql_query_to_hdfs
- import
- --connect
- jdbc:mysql://node1:3306/sqoop_db
- --username
- root
- --password
- '!QAZ2wsx3edc'
- -e
- select id, name from t_user where id >= 1 and $CONDITIONS
- -m
- 1
- --target-dir
- /sqoop_t_user
- --delete-target-dir
- :wq
- sqoop --options-file mysql_query_to_hdfs
-
- --verify
- hive> dfs -cat /sqoop_t_user/*;
- 1,Tom
- 2,John
- 3,Div
- 4,Susan
- 5,Tiran
- 6,Shasita
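A note on the $CONDITIONS token in the query above: Sqoop requires it in every free-form query (-e/--query) and substitutes, at runtime, the boundary condition for each mapper's split. With -m 1 everything goes to a single mapper; with more than one mapper you must also supply --split-by. Illustratively (not actual Sqoop output), with -m 2 --split-by id the two mappers might each run something like:
- SELECT id, name FROM t_user WHERE id >= 1 AND ( id >= 1 AND id < 4 )
- SELECT id, name FROM t_user WHERE id >= 1 AND ( id >= 4 AND id <= 6 )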
9.2. Import from MySQL into Hive
- --import from MySQL into Hive
- cd mysqoopdir
- vi mysql_to_hive
- import
- --connect
- jdbc:mysql://node1:3306/sqoop_db
- --username
- root
- --password
- '!QAZ2wsx3edc'
- --table
- t_user
- -m
- 1
- --target-dir
- /sqoop_mysql_to_hive/
- --hive-home
- /home/hive/
- --hive-import
- --hive-table
- t_sqoop_mysql_t_user_to_hive
- --create-hive-table
- :wq
- sqoop --options-file mysql_to_hive
-
- --verify
- hive> select * from t_sqoop_mysql_t_user_to_hive;
- OK
- 1 Tom 20
- 2 John 18
- 3 Div 25
- 4 Susan 31
- 5 Tiran 40
- 6 Shasita 13
- Time taken: 0.577 seconds, Fetched: 6 row(s)
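Note: with --create-hive-table set, the job fails if the target Hive table already exists, so re-running this import requires dropping the table first (alternatively, omit the flag and use --hive-overwrite to replace the existing data).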
9.3. Import from MySQL into HBase
- --import from MySQL into HBase
- cd mysqoopdir
- vi mysql_to_hbase
- import
- --connect
- jdbc:mysql://node1:3306/sqoop_db
- --username
- root
- --password
- '!QAZ2wsx3edc'
- --table
- t_user
- -m
- 1
- --hbase-table
- t_sqoop_mysql_t_user_to_hbase
- --hbase-row-key
- id
- --hbase-create-table
- --column-family
- cf
- :wq
- sqoop --options-file mysql_to_hbase
-
- --verify
- hbase(main):004:0> scan 't_sqoop_mysql_t_user_to_hbase'
- ROW COLUMN+CELL
- 1 column=cf:age, timestamp=1546680684317, value=20
- 1 column=cf:name, timestamp=1546680684317, value=Tom
- 2 column=cf:age, timestamp=1546680684317, value=18
- 2 column=cf:name, timestamp=1546680684317, value=John
- 3 column=cf:age, timestamp=1546680684317, value=25
- 3 column=cf:name, timestamp=1546680684317, value=Div
- 4 column=cf:age, timestamp=1546680684317, value=31
- 4 column=cf:name, timestamp=1546680684317, value=Susan
- 5 column=cf:age, timestamp=1546680684317, value=40
- 5 column=cf:name, timestamp=1546680684317, value=Tiran
- 6 column=cf:age, timestamp=1546680684317, value=13
- 6 column=cf:name, timestamp=1546680684317, value=Shasita
- 6 row(s) in 0.1730 seconds
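Note: --hbase-create-table lets Sqoop create the HBase table and column family if they do not exist yet, and --hbase-row-key id makes the id column the row key, which is why id shows up under ROW rather than as a cf: cell in the scan above.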
9.4. Export from HDFS to MySQL
- --export from HDFS to MySQL
- cd mysqoopdir
- vi hdfs_to_mysql
- export
- --connect
- jdbc:mysql://node1:3306/sqoop_db
- --username
- root
- --password
- '!QAZ2wsx3edc'
- --table
- t_hdfs_user_to_mysql
- -m
- 1
- --columns
- id,name,age
- --export-dir
- /sqoop_t_user/
- :wq
- --on node1, log in to MySQL
- mysql -u root -p
- !QAZ2wsx3edc
- show databases;
- use sqoop_db;
- show tables;
- --create the t_hdfs_user_to_mysql table
- CREATE TABLE t_hdfs_user_to_mysql (id INT, name VARCHAR(20), age INT);
- cd mysqoopdir
- sqoop --options-file hdfs_to_mysql
-
- --verify
- --before the export
- mysql> select * from t_hdfs_user_to_mysql;
- Empty set (0.00 sec)
- --after the export
- mysql> select * from t_hdfs_user_to_mysql;
- +------+---------+------+
- | id | name | age |
- +------+---------+------+
- | 1 | Tom | 20 |
- | 2 | John | 18 |
- | 3 | Div | 25 |
- | 4 | Susan | 31 |
- | 5 | Tiran | 40 |
- | 6 | Shasita | 13 |
- +------+---------+------+
- 6 rows in set (0.00 sec)
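Note: export parses the HDFS files with Sqoop's default field delimiter, a comma, which happens to match what the earlier import wrote. If the data had been imported with a custom --fields-terminated-by, the export would need a matching --input-fields-terminated-by.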