Sqoop - a Tool for Converting Data Between Relational Databases and Hadoop
Source: cnblogs  Author: Hongten  Date: 2019/1/7 9:37:56

Sqoop

The Sqoop version used in this article is 1.4.6.

1. Official Website

  http://sqoop.apache.org

2. Purpose

  A: Move Hadoop data into a relational database (e.g. Hive -> MySQL)

  B: Move relational database data into Hadoop (e.g. MySQL -> Hive)
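
These two directions correspond to Sqoop's export and import tools respectively. A minimal sketch of the two command forms (the host, database, user, table and directory below are placeholders, not the values used later in this article):

  --RDBMS -> Hadoop: sqoop import (placeholders, adjust to your environment)
  sqoop import --connect jdbc:mysql://<host>:3306/<database> --username <user> -P --table <table> --target-dir /<hdfs_dir>
  --Hadoop -> RDBMS: sqoop export (placeholders, adjust to your environment)
  sqoop export --connect jdbc:mysql://<host>:3306/<database> --username <user> -P --table <table> --export-dir /<hdfs_dir>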

3. Download

  http://archive.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

4. Installation

  --Upload the tarball to node1 (Hive was installed on node1 earlier), into the home directory, and extract it
  cd
  tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
  --Create a symbolic link
  ln -sf /root/sqoop-1.4.6.bin__hadoop-2.0.4-alpha /home/sqoop-1.4.6

 

5. Environment Variables

  --Configure environment variables
  vi /etc/profile
  --SQOOP_HOME must point to the symlink created above, otherwise the PATH entry below will not resolve
  export SQOOP_HOME=/home/sqoop-1.4.6
  export HADOOP_PREFIX=$HADOOP_HOME
  export PATH=$PATH:$SQOOP_HOME/bin
  :wq
  source /etc/profile
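
A quick sanity check (not in the original article) to confirm the variables took effect after sourcing the profile:

  --Both should resolve to the /home/sqoop-1.4.6 installation
  echo $SQOOP_HOME
  which sqoop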

 

6. Edit the Configuration File

  --Edit the configuration file
  cd /home/sqoop-1.4.6/conf/
  cp sqoop-env-template.sh sqoop-env.sh
  vi sqoop-env.sh
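
The template ships with the relevant variables commented out. Below is a minimal sketch of what is typically uncommented and set in sqoop-env.sh; the paths follow the layout used elsewhere in this article (/home/hadoop-2.5 and /home/hive) and should be adjusted to your own installation:

  # Hadoop installation directory (common libraries and MapReduce)
  export HADOOP_COMMON_HOME=/home/hadoop-2.5
  export HADOOP_MAPRED_HOME=/home/hadoop-2.5
  # Hive installation directory, needed for --hive-import in section 9.2
  export HIVE_HOME=/home/hive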

 

7. Add the JDBC Driver

  --Copy the MySQL JDBC driver into Sqoop's lib directory
  cd
  scp mysql-connector-java-5.1.23-bin.jar /home/sqoop-1.4.6/lib/

 

8. Test

  sqoop version

  [root@node1 ~]# sqoop version
  Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
  Please set $HBASE_HOME to the root of your HBase installation.
  Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
  Please set $HCAT_HOME to the root of your HCatalog installation.
  Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
  Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
  Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
  19/01/04 23:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  Sqoop 1.4.6
  git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
  Compiled by root on Mon Apr 27 14:38:36 CST 2015

  sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'

  [root@node1 ~]# sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'
  Warning: /home/sqoop-1.4.6/../hbase does not exist! HBase imports will fail.
  Please set $HBASE_HOME to the root of your HBase installation.
  Warning: /home/sqoop-1.4.6/../hcatalog does not exist! HCatalog jobs will fail.
  Please set $HCAT_HOME to the root of your HCatalog installation.
  Warning: /home/sqoop-1.4.6/../accumulo does not exist! Accumulo imports will fail.
  Please set $ACCUMULO_HOME to the root of your Accumulo installation.
  Warning: /home/sqoop-1.4.6/../zookeeper does not exist! Accumulo imports will fail.
  Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
  19/01/04 23:17:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  19/01/04 23:17:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  19/01/04 23:17:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  information_schema
  hive
  mysql
  performance_schema
  result_db
  spark
  sys

You will notice that the output contains a lot of warnings.

These warnings can be removed as follows:

  --Remove the warnings
  cd /home/sqoop-1.4.6/bin/
  vi configure-sqoop
  --Comment out all of the following lines by adding '#' at the start of each line
  ## Moved to be a runtime check in sqoop.
  #if [ ! -d "${HBASE_HOME}" ]; then
  # echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  # echo 'Please set $HBASE_HOME to the root of your HBase installation.'
  #fi
  ## Moved to be a runtime check in sqoop.
  #if [ ! -d "${HCAT_HOME}" ]; then
  # echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
  # echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
  #fi
  #if [ ! -d "${ACCUMULO_HOME}" ]; then
  # echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
  # echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
  #fi
  #if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  # echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  # echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
  #fi
  :wq
  --Test again
  [root@node1 bin]# sqoop list-databases -connect jdbc:mysql://node1:3306/ -username root -password '!QAZ2wsx3edc'
  19/01/04 23:34:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  19/01/04 23:34:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  19/01/04 23:34:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  information_schema
  hive
  mysql
  performance_schema
  result_db
  spark
  sys
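
One WARN line remains: BaseSqoopTool points out that passing --password on the command line is insecure. Sqoop can instead prompt for the password with -P, or read it from a protected file with --password-file; a brief sketch of both alternatives (the password file path is illustrative, not from the original article):

  --Prompt interactively instead of putting the password on the command line
  sqoop list-databases --connect jdbc:mysql://node1:3306/ --username root -P
  --Or read the password from a protected local file (note: written without a trailing newline)
  echo -n '!QAZ2wsx3edc' > /root/.mysql.pwd
  chmod 400 /root/.mysql.pwd
  sqoop list-databases --connect jdbc:mysql://node1:3306/ --username root --password-file file:///root/.mysql.pwd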

 

9. Usage

9.1 Importing from MySQL to HDFS

Preparation:

  --Create the table in the database first
  CREATE TABLE t_user (id INT, name VARCHAR(20), age INT);
  --Insert test data
  insert into t_user values(1, 'Tom', 20);
  insert into t_user values(2, 'John', 18);
  insert into t_user values(3, 'Div', 25);
  insert into t_user values(4, 'Susan', 31);
  insert into t_user values(5, 'Tiran', 40);
  insert into t_user values(6, 'Shasita', 13);

 

Query result:

  mysql> select * from t_user;
  +------+---------+------+
  | id   | name    | age  |
  +------+---------+------+
  |    1 | Tom     |   20 |
  |    2 | John    |   18 |
  |    3 | Div     |   25 |
  |    4 | Susan   |   31 |
  |    5 | Tiran   |   40 |
  |    6 | Shasita |   13 |
  +------+---------+------+
  6 rows in set (0.00 sec)

 

  --Import the MySQL data into HDFS
  sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root --password '!QAZ2wsx3edc' --table t_user --columns id,name,age -m 1 --target-dir /sqoop_t_user

  [root@node1 bin]# sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root --password '!QAZ2wsx3edc' --table t_user --columns id,name,age -m 1 --target-dir /sqoop_t_user
  19/01/04 23:54:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
  19/01/04 23:54:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  19/01/04 23:54:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  19/01/04 23:54:30 INFO tool.CodeGenTool: Beginning code generation
  19/01/04 23:54:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_user` AS t LIMIT 1
  19/01/04 23:54:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_user` AS t LIMIT 1
  19/01/04 23:54:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop-2.5
  Note: /tmp/sqoop-root/compile/84e97965496cc61c73c17151375a419b/t_user.java uses or overrides a deprecated API.
  Note: Recompile with -Xlint:deprecation for details.
  19/01/04 23:54:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/84e97965496cc61c73c17151375a419b/t_user.jar
  19/01/04 23:54:33 WARN manager.MySQLManager: It looks like you are importing from mysql.
  19/01/04 23:54:33 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
  19/01/04 23:54:33 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
  19/01/04 23:54:33 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
  19/01/04 23:54:33 INFO mapreduce.ImportJobBase: Beginning import of t_user
  19/01/04 23:54:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  19/01/04 23:54:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  19/01/04 23:54:34 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.79.138:8032
  19/01/04 23:54:48 INFO db.DBInputFormat: Using read commited transaction isolation
  19/01/04 23:54:48 INFO mapreduce.JobSubmitter: number of splits:1
  19/01/04 23:54:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1546674829746_0001
  19/01/04 23:54:49 INFO impl.YarnClientImpl: Submitted application application_1546674829746_0001
  19/01/04 23:54:49 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1546674829746_0001/
  19/01/04 23:54:49 INFO mapreduce.Job: Running job: job_1546674829746_0001
  19/01/04 23:54:59 INFO mapreduce.Job: Job job_1546674829746_0001 running in uber mode : false
  19/01/04 23:54:59 INFO mapreduce.Job: map 0% reduce 0%
  19/01/04 23:55:06 INFO mapreduce.Job: map 100% reduce 0%
  19/01/04 23:55:06 INFO mapreduce.Job: Job job_1546674829746_0001 completed successfully
  19/01/04 23:55:06 INFO mapreduce.Job: Counters: 30
          File System Counters
                  FILE: Number of bytes read=0
                  FILE: Number of bytes written=116299
                  FILE: Number of read operations=0
                  FILE: Number of large read operations=0
                  FILE: Number of write operations=0
                  HDFS: Number of bytes read=87
                  HDFS: Number of bytes written=63
                  HDFS: Number of read operations=4
                  HDFS: Number of large read operations=0
                  HDFS: Number of write operations=2
          Job Counters
                  Launched map tasks=1
                  Other local map tasks=1
                  Total time spent by all maps in occupied slots (ms)=4153
                  Total time spent by all reduces in occupied slots (ms)=0
                  Total time spent by all map tasks (ms)=4153
                  Total vcore-seconds taken by all map tasks=4153
                  Total megabyte-seconds taken by all map tasks=4252672
          Map-Reduce Framework
                  Map input records=6
                  Map output records=6
                  Input split bytes=87
                  Spilled Records=0
                  Failed Shuffles=0
                  Merged Map outputs=0
                  GC time elapsed (ms)=69
                  CPU time spent (ms)=1170
                  Physical memory (bytes) snapshot=175808512
                  Virtual memory (bytes) snapshot=893071360
                  Total committed heap usage (bytes)=84934656
          File Input Format Counters
                  Bytes Read=0
          File Output Format Counters
                  Bytes Written=63
  19/01/04 23:55:06 INFO mapreduce.ImportJobBase: Transferred 63 bytes in 32.3608 seconds (1.9468 bytes/sec)
  19/01/04 23:55:06 INFO mapreduce.ImportJobBase: Retrieved 6 records.
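
You can also inspect the result directly from the HDFS command line before moving on (a quick check; part-m-00000 is the standard output file name for a single map task):

  --List the target directory and print the map output file
  hdfs dfs -ls /sqoop_t_user
  hdfs dfs -cat /sqoop_t_user/part-m-00000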

 

Execution result:

 

The official documentation also offers another way to run the same import: reading the options from a file.

  cd
  mkdir mysqoopdir
  cd mysqoopdir
  vi mysql_to_hdfs
  import
  --connect
  jdbc:mysql://node1:3306/sqoop_db
  --username
  root
  --password
  '!QAZ2wsx3edc'
  --table
  t_user
  --columns
  id,name,age
  -m
  1
  --target-dir
  /sqoop_t_user
  --delete-target-dir
  :wq
  sqoop --options-file mysql_to_hdfs
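
A note on the options-file format, per the Sqoop user guide: every option and every value must sit on its own line, leading and trailing whitespace is ignored, and lines beginning with # are treated as comments, so the file above could also be annotated, for example:

  # mysql_to_hdfs: import t_user from MySQL into /sqoop_t_user
  import
  --connect
  jdbc:mysql://node1:3306/sqoop_db
  --username
  root
  # -P on the sqoop command line would prompt for the password instead
  --password
  '!QAZ2wsx3edc'
  --table
  t_user
  --columns
  id,name,age
  -m
  1
  --target-dir
  /sqoop_t_user
  --delete-target-dir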

 

We can verify the imported data through Hive:

  [root@node1 bin]# ./hive
  19/01/05 00:03:29 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
  Logging initialized using configuration in jar:file:/root/apache-hive-0.13.1-bin/lib/hive-common-0.13.1.jar!/hive-log4j.properties
  hive> dfs -cat /sqoop_t_user/*;
  1,Tom,20
  2,John,18
  3,Div,25
  4,Susan,31
  5,Tiran,40
  6,Shasita,13

 

The result is identical to the data in MySQL.

Use case: suppose all of the data we need to process and analyze lives in MySQL, the volume is fairly large, and we want to analyze it offline. In that situation we can use Sqoop to import the MySQL data into HDFS and do the processing and analysis there.

 

Importing the result of a query:

  --Import the result of a query
  cd mysqoopdir
  vi mysql_query_to_hdfs
  import
  --connect
  jdbc:mysql://node1:3306/sqoop_db
  --username
  root
  --password
  '!QAZ2wsx3edc'
  -e
  select id, name from t_user where id >= 1 and $CONDITIONS
  -m
  1
  --target-dir
  /sqoop_t_user
  --delete-target-dir
  :wq
  sqoop --options-file mysql_query_to_hdfs

  --Verify
  hive> dfs -cat /sqoop_t_user/*;
  1,Tom
  2,John
  3,Div
  4,Susan
  5,Tiran
  6,Shasita
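
The $CONDITIONS token in the free-form query is mandatory for -e/--query imports: Sqoop substitutes range predicates for it when the work is split across map tasks, and with more than one mapper a --split-by column must also be supplied. A sketch of the parallel variant (the mapper count and split column here are illustrative, not from the original article):

  --Same query import, split across two map tasks on the id column
  sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root -P -e "select id, name from t_user where id >= 1 and \$CONDITIONS" --split-by id -m 2 --target-dir /sqoop_t_user --delete-target-dir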

 

9.2 Importing from MySQL to Hive

  --Import from MySQL into Hive
  cd mysqoopdir
  vi mysql_to_hive
  import
  --connect
  jdbc:mysql://node1:3306/sqoop_db
  --username
  root
  --password
  '!QAZ2wsx3edc'
  --table
  t_user
  -m
  1
  --target-dir
  /sqoop_mysql_to_hive/
  --hive-home
  /home/hive/
  --hive-import
  --hive-table
  t_sqoop_mysql_t_user_to_hive
  --create-hive-table
  :wq
  sqoop --options-file mysql_to_hive

  --Verify
  hive> select * from t_sqoop_mysql_t_user_to_hive;
  OK
  1 Tom 20
  2 John 18
  3 Div 25
  4 Susan 31
  5 Tiran 40
  6 Shasita 13
  Time taken: 0.577 seconds, Fetched: 6 row(s)
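
Note that --create-hive-table makes the job fail if the target Hive table already exists, so the options file above is not directly re-runnable. A hedged sketch of a re-runnable variant that drops the flag and overwrites the table's data instead (not part of the original article):

  --Re-runnable variant: overwrite the existing Hive table instead of failing
  sqoop import --connect jdbc:mysql://node1:3306/sqoop_db --username root -P --table t_user -m 1 --hive-import --hive-table t_sqoop_mysql_t_user_to_hive --hive-overwrite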

 

9.3 Importing from MySQL to HBase

  --Import from MySQL into HBase
  cd mysqoopdir
  vi mysql_to_hbase
  import
  --connect
  jdbc:mysql://node1:3306/sqoop_db
  --username
  root
  --password
  '!QAZ2wsx3edc'
  --table
  t_user
  -m
  1
  --hbase-table
  t_sqoop_mysql_t_user_to_hbase
  --hbase-row-key
  id
  --hbase-create-table
  --column-family
  cf
  :wq
  sqoop --options-file mysql_to_hbase

  --Verify
  hbase(main):004:0> scan 't_sqoop_mysql_t_user_to_hbase'
  ROW                  COLUMN+CELL
  1                    column=cf:age, timestamp=1546680684317, value=20
  1                    column=cf:name, timestamp=1546680684317, value=Tom
  2                    column=cf:age, timestamp=1546680684317, value=18
  2                    column=cf:name, timestamp=1546680684317, value=John
  3                    column=cf:age, timestamp=1546680684317, value=25
  3                    column=cf:name, timestamp=1546680684317, value=Div
  4                    column=cf:age, timestamp=1546680684317, value=31
  4                    column=cf:name, timestamp=1546680684317, value=Susan
  5                    column=cf:age, timestamp=1546680684317, value=40
  5                    column=cf:name, timestamp=1546680684317, value=Tiran
  6                    column=cf:age, timestamp=1546680684317, value=13
  6                    column=cf:name, timestamp=1546680684317, value=Shasita
  6 row(s) in 0.1730 seconds
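
Here --hbase-create-table lets Sqoop create the HBase table and its column family when they are missing; without that flag the table has to exist before the import runs. A minimal sketch of creating it by hand from the HBase shell (equivalent to what the flag does in this example):

  --Pre-create the target table with column family 'cf' from the HBase shell
  echo "create 't_sqoop_mysql_t_user_to_hbase', 'cf'" | hbase shell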

 

9.4 Exporting from HDFS to MySQL

  --Export from HDFS to MySQL
  cd mysqoopdir
  vi hdfs_to_mysql
  export
  --connect
  jdbc:mysql://node1:3306/sqoop_db
  --username
  root
  --password
  '!QAZ2wsx3edc'
  --table
  t_hdfs_user_to_mysql
  -m
  1
  --columns
  id,name,age
  --export-dir
  /sqoop_t_user/
  :wq
  --Log in to MySQL on node1
  mysql -u root -p
  !QAZ2wsx3edc
  show databases;
  use sqoop_db;
  show tables;
  --Create the t_hdfs_user_to_mysql table
  CREATE TABLE t_hdfs_user_to_mysql (id INT, name VARCHAR(20), age INT);
  cd mysqoopdir
  sqoop --options-file hdfs_to_mysql

  --Verify
  --Before the export
  mysql> select * from t_hdfs_user_to_mysql;
  Empty set (0.00 sec)
  --After the export
  mysql> select * from t_hdfs_user_to_mysql;
  +------+---------+------+
  | id   | name    | age  |
  +------+---------+------+
  |    1 | Tom     |   20 |
  |    2 | John    |   18 |
  |    3 | Div     |   25 |
  |    4 | Susan   |   31 |
  |    5 | Tiran   |   40 |
  |    6 | Shasita |   13 |
  +------+---------+------+
  6 rows in set (0.00 sec)
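
By default sqoop export parses the HDFS files as comma-separated fields with newline-terminated records (which matches what the import in 9.1 produced) and issues INSERT statements for each record. If the target rows may already exist, the export can update them instead; a hedged sketch using --update-key (not part of the original article):

  --Update existing rows on id instead of inserting duplicates; insert rows that are missing
  sqoop export --connect jdbc:mysql://node1:3306/sqoop_db --username root -P --table t_hdfs_user_to_mysql --columns id,name,age --export-dir /sqoop_t_user/ --update-key id --update-mode allowinsert -m 1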

 

========================================================

More reading, and English is important.

I'm Hongten

 


E | hongtenzone@foxmail.com  B | http://www.cnblogs.com/hongten

========================================================

 
