Sqoop 1.4.7 + Hadoop 2.6.0: Moving Data from MySQL to HDFS

The so-called Sqoop2 actually carries the version number 1.99.7. Architecturally, Sqoop2 does bring real improvements in security and other areas, but it is still not recommended for production use: many of its features are missing or immature. For small teams or companies running Hadoop at modest scale, though, Sqoop 1.4.x is more than enough, and the command line is a pleasure to use!

At the time of writing, the latest stable Sqoop release is 1.4.7.

Sqoop official site: http://sqoop.apache.org

Sqoop 1.4.7 download: http://mirrors.hust.edu.cn/apache/sqoop/1.4.7/ . If that mirror does not work, go to the official site, click Download, and pick a faster mirror.

[root@hd-node1 downloads]# wget http://mirrors.hust.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@hd-node1 downloads]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt
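
Note that the tarball unpacks to /opt/sqoop-1.4.7.bin__hadoop-2.6.0, while the rest of this article uses the shorter path /opt/sqoop-1.4.7, so rename the directory (or symlink it, if you prefer) before continuing:

[root@hd-node1 downloads]# mv /opt/sqoop-1.4.7.bin__hadoop-2.6.0 /opt/sqoop-1.4.7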

As I explained in CentOS 7 Hadoop 2.6.0 Cluster Setup, Hadoop 2.6.0 is the Hadoop 2 release with the best compatibility guarantees, since the project ran fairly thorough compatibility and stability testing against it. Of course, if you are not afraid of awkward problems and can handle whatever emergencies come up, the latest stable release is also fine.

Environment

  • Operating system: CentOS 7
  • JDK version: 1.8.0_112
  • Hadoop version: Hadoop 2.6.0
  • Sqoop version: sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
  • Hadoop install directory: /opt/hadoop-2.6.0-cdh5.10.0
  • Sqoop install directory: /opt/sqoop-1.4.7

Our Sqoop installation sits on the Hadoop 2.6.0 cluster. For the JDK and the cluster setup itself, see CentOS 7 Hadoop 2.6.0 Cluster Setup; to keep this post concise and readable, those steps are not repeated here.

If you also need Hive, see Manually Installing hive-1.1.0-cdh5.10.0 on CDH 5.10.0.

Modify the Sqoop Configuration File

Configuring Sqoop for the cluster is very simple: you only need to tell it where Hadoop and MapReduce live. We configure Hive here as well, because a later requirement is importing database tables into Hive. Three environment variables are involved in total: HADOOP_COMMON_HOME, HADOOP_MAPRED_HOME, and HIVE_HOME; the names speak for themselves.

If you run a standalone ZooKeeper cluster, you can also set ZOOCFGDIR, and if you use HBase as the data store, set HBASE_HOME. The sqoop-env-template.sh template gives detailed guidance for all of these; see the sketch after the main configuration below.

[root@hd-node1 opt]# cd sqoop-1.4.7/conf/
[root@hd-node1 conf]# cp sqoop-env-template.sh  sqoop-env.sh
[root@hd-node1 conf]# vim sqoop-env.sh

# Append the following environment variable settings to the end of sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/hadoop-2.6.0-cdh5.10.0
export HADOOP_MAPRED_HOME=/opt/hadoop-2.6.0-cdh5.10.0/share/hadoop/mapreduce
export HIVE_HOME=/opt/hive-1.1.0-cdh5.10.0
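
If you do use HBase or a standalone ZooKeeper, the same file carries those variables too; a minimal sketch, with example paths you would replace with your own installs:

# Optional: only if the corresponding services are installed (paths below are examples)
# export HBASE_HOME=/opt/hbase
# export ZOOCFGDIR=/opt/zookeeper/conf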

That completes the sqoop-env.sh configuration.

Additionally, if you need to run the Sqoop service, you can configure sqoop-site.xml, copied from the sqoop-site-template.xml template, which likewise comes with fairly detailed configuration notes. I only use the command line here, so I leave it unconfigured.

Configure Environment Variables

Set SQOOP_HOME and PATH so that sqoop and the other commands are available in the environment.

[root@hd-node1 opt]# vim /etc/profile

# Add the Sqoop environment variables
export SQOOP_HOME=/opt/sqoop-1.4.7
export PATH=$PATH:$SQOOP_HOME/bin
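
After saving, reload the profile and run a quick smoke test; both commands are standard, and sqoop version should report 1.4.7:

[root@hd-node1 opt]# source /etc/profile
[root@hd-node1 opt]# sqoop version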

MySQL Driver

Since we first need to import data from MySQL into HDFS, Sqoop jobs require a MySQL JDBC driver. Download a driver jar compatible with the target MySQL server and put it in Sqoop's lib directory.

[root@hd-node1 downloads]# cp mysql-connector-java-5.1.47.jar /opt/sqoop-1.4.7/lib/
[root@hd-node1 downloads]# cd /opt/sqoop-1.4.7/lib/
[root@hd-node1 lib]# ll mysql*
-rw-r--r-- 1 root root 1004838 Sep 23 15:34 mysql-connector-java-5.1.47.jar
[root@hd-node1 lib]#

Using Sqoop

List the databases on 192.168.0.1 with Sqoop:

[root@hd-node1 bin]# sqoop list-databases --connect jdbc:mysql://192.168.0.1:3306/mydb?characterEncoding=UTF-8 --username test --password 'test'

Import the book table into HDFS:

[root@hd-node1 bin]# sqoop import --connect jdbc:mysql://192.168.0.1:3306/mydb?characterEncoding=UTF-8 --username test --password 'test' --target-dir '/user/hive/warehouse/book' --table book
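
In practice you will usually tune the import a little. A minimal sketch with a few common Sqoop 1.4.x options (the id split column and the values below are assumptions; adjust them to your table): -m sets the number of parallel map tasks, --split-by names the column used to partition the work between them, and --fields-terminated-by sets the field delimiter in the output files.

[root@hd-node1 bin]# sqoop import --connect jdbc:mysql://192.168.0.1:3306/mydb?characterEncoding=UTF-8 --username test --password 'test' --table book --split-by id -m 4 --fields-terminated-by '\t' --target-dir '/user/hive/warehouse/book'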

Sqoop Configuration Problems

ERROR manager.SqlManager: Error reading from database

A driver error raised while pulling data out of MySQL.

15/03/15 22:30:33 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@54b0a583 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.  
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@54b0a583 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.  
     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:930)  
     at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2694)  
     at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1868)  
     at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2109)  
     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2642)  
     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2571)  
     at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1464)  
     at com.mysql.jdbc.ConnectionImpl.getMaxBytesPerChar(ConnectionImpl.java:3030)  
     at com.mysql.jdbc.Field.getMaxBytesPerCharacter(Field.java:592)  
     at com.mysql.jdbc.ResultSetMetaData.getPrecision(ResultSetMetaData.java:444)  
     at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:285)  
     at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:240)  
     at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:226)  
     at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:295)  
     at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1773)  
     at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1578)  
     at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)  
     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)  
     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)  
     at org.apache.sqoop.Sqoop.run(Sqoop.java:143)  
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)  
     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)  
     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)  
     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)  
     at org.apache.sqoop.Sqoop.main(Sqoop.java:236)  
15/03/15 22:30:33 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter  
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1584)  
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)  
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)  
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)  
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)  
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)  
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)  
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)  
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)  
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

I have not run into this problem myself. It is caused by a driver that is too old or mismatched with the database version: before importing, Sqoop analyzes the metadata of the database and its tables, and if the driver and the server versions diverge too much, that analysis can fail.

So if you are on a recent database release, just update the driver and the problem is gone.
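
The fix is the same jar swap as in the driver section above; a sketch with placeholder version numbers that you would substitute with the real file names:

[root@hd-node1 lib]# rm /opt/sqoop-1.4.7/lib/mysql-connector-java-<old-version>.jar
[root@hd-node1 lib]# cp mysql-connector-java-<new-version>.jar /opt/sqoop-1.4.7/lib/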

java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/InputFormat

[root@hd-node1 bin]# sqoop import --connect jdbc:mysql://192.168.0.1:3306/mydb?characterEncoding=UTF-8 --username test --password 'test' --target-dir '/user/hive/warehouse/book' --table book
Warning: /opt/sqoop-1.4.7/../hbase does not exist! HBase imports will fail.  
Please set $HBASE_HOME to the root of your HBase installation.  
Warning: /opt/sqoop-1.4.7/../hcatalog does not exist! HCatalog jobs will fail.  
Please set $HCAT_HOME to the root of your HCatalog installation.  
Warning: /opt/sqoop-1.4.7/../accumulo does not exist! Accumulo imports will fail.  
Please set $ACCUMULO_HOME to the root of your Accumulo installation.  
Warning: /opt/sqoop-1.4.7/../zookeeper does not exist! Accumulo imports will fail.  
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.  
18/09/28 23:10:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/09/28 23:10:55 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.  
18/09/28 23:10:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.  
18/09/28 23:10:56 INFO tool.CodeGenTool: Beginning code generation  
18/09/28 23:10:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `book` AS t LIMIT 1  
18/09/28 23:10:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `book` AS t LIMIT 1  
18/09/28 23:10:56 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.6.0-cdh5.10.0/share/hadoop/mapreduce
Note: /tmp/sqoop-hadoop/compile/c798c2a151fc7c3baed090b15aa6e2cb/book.java uses or overrides a deprecated API.  
Note: Recompile with -Xlint:deprecation for details.  
18/09/28 23:10:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/c798c2a151fc7c3baed090b15aa6e2cb/book.jar  
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/InputFormat  
    at java.lang.ClassLoader.defineClass1(Native Method)  
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)  
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)  
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)  
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)  
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)  
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)  
    at java.security.AccessController.doPrivileged(Native Method)  
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)  
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)  
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)  
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)  
    at java.lang.ClassLoader.defineClass1(Native Method)  
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)  
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)  
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)  
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)  
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)  
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)  
    at java.security.AccessController.doPrivileged(Native Method)  
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)  
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)  
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)  
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)  
    at java.lang.ClassLoader.defineClass1(Native Method)  
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)  
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)  
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)  
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)  
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)  
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)  
    at java.security.AccessController.doPrivileged(Native Method)  
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)  
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)  
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)  
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)  
    at java.lang.ClassLoader.defineClass1(Native Method)  
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)  
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)  
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)  
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)  
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)  
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)  
    at java.security.AccessController.doPrivileged(Native Method)  
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)  
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)  
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)  
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)  
    at org.apache.sqoop.manager.ImportJobContext.<init>(ImportJobContext.java:51)  
    at com.cloudera.sqoop.manager.ImportJobContext.<init>(ImportJobContext.java:33)  
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:483)  
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)  
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)  
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)  
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)  
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)  
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)  
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)  
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.InputFormat  
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)  
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)  
    at java.security.AccessController.doPrivileged(Native Method)  
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)  
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)  
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)  
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)  
    ... 58 more

This error means Sqoop cannot find the Hadoop MapReduce jars when it runs the job. A simple workaround is to copy the MapReduce jars into $SQOOP_HOME/lib:

[root@hd-node1 opt]# cp /opt/hadoop-2.6.0-cdh5.10.0/share/hadoop/mapreduce/*.jar /opt/sqoop-1.4.7/lib

Only at this point was the problem truly solved. Running the MySQL-to-HDFS import once more, the output data files finally appeared under /user/hive/warehouse/book in HDFS.
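
You can confirm the result with standard HDFS commands; the part-m-* files are what Sqoop's map tasks normally write:

[root@hd-node1 bin]# hdfs dfs -ls /user/hive/warehouse/book
[root@hd-node1 bin]# hdfs dfs -cat /user/hive/warehouse/book/part-m-00000 | head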

java.sql.SQLException: Access denied for user 'test'@'192.168.0.2' (using password: YES)

This kind of database access-denied error also comes up, but it is easy to fix: either the database does not allow remote access, or the user was never granted privileges, or a firewall is blocking the database port. Pin down which environmental issue you have and the fix follows; see the sketch after the log below.

15/03/16 13:07:12 INFO mapreduce.Job: Task Id : attempt_1426431271248_0007_m_000003_0, Status : FAILED  
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.sql.SQLException: Access denied for user 'test'@'192.168.0.2' (using password: YES)  
    at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)  
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)  
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)  
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)  
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)  
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)  
    at java.security.AccessController.doPrivileged(Native Method)  
    at javax.security.auth.Subject.doAs(Subject.java:415)  
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)    
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)  
Caused by: java.lang.RuntimeException: java.sql.SQLException: Access denied for user 'test'@'192.168.0.2' (using password: YES)  
    at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:220)  
    at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:165)  
        ... 9 more  
Caused by: java.sql.SQLException: Access denied for user 'test'@'192.168.0.2' (using password: YES)  
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1094)  
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4208)  
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4140)  
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:925)  
    at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1747)  
    at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1287)  
    at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2494)  
    at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2527)  
    at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2309)  
    at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:834)  
    at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:46)  
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)  
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)  
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)  
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)  
    at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)  
    at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:419)  
    at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:344)  
    at java.sql.DriverManager.getConnection(DriverManager.java:571)  
    at java.sql.DriverManager.getConnection(DriverManager.java:215)  
    at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:302)  
    at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:213)  
    ... 10 more
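
For the missing-grant case, a minimal sketch on the MySQL side (user name, host pattern, and password are assumptions; substitute your own, and note that GRANT ... IDENTIFIED BY is MySQL 5.x syntax):

[root@hd-node1 ~]# mysql -u root -p
mysql> GRANT ALL PRIVILEGES ON mydb.* TO 'test'@'192.168.0.%' IDENTIFIED BY 'test';
mysql> FLUSH PRIVILEGES;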