sqoop-howto

Sqoop主要用于在Hadoop与传统关系型数据库之间进行数据传递

##帮助命令
$SQOOP_HOME/bin/sqoop help <command>

##从Mysql导入数据到HDFS

首先先把mysql的jdbc驱动copy到$SQOOP_HOME/lib目录

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
[hduser@hadoop2 lib]$ pwd
/usr/local/share/applications/sqoop-1.4.5.bin__hadoop-2.0.4-alpha/lib
[hduser@hadoop2 lib]$ l
total 5144
-rw-rw-r--. 1 root root 224277 Aug 2 2014 ant-contrib-1.0b3.jar
-rw-rw-r--. 1 root root 36455 Aug 2 2014 ant-eclipse-1.0-jvm1.2.jar
-rw-rw-r--. 1 root root 400680 Aug 2 2014 avro-1.7.5.jar
-rw-rw-r--. 1 root root 170570 Aug 2 2014 avro-mapred-1.7.5-hadoop2.jar
-rw-rw-r--. 1 root root 241367 Aug 2 2014 commons-compress-1.4.1.jar
-rw-rw-r--. 1 root root 109043 Aug 2 2014 commons-io-1.4.jar
-rw-rw-r--. 1 root root 706710 Aug 2 2014 hsqldb-1.8.0.10.jar
-rw-rw-r--. 1 root root 232248 Aug 2 2014 jackson-core-asl-1.9.13.jar
-rw-rw-r--. 1 root root 780664 Aug 2 2014 jackson-mapper-asl-1.9.13.jar
-rw-r--r--. 1 root root 969020 Jul 8 18:06 mysql-connector-java-5.1.32-bin.jar
-rw-rw-r--. 1 root root 29555 Aug 2 2014 paranamer-2.3.jar
-rw-rw-r--. 1 root root 1251514 Aug 2 2014 snappy-java-1.0.5.jar
-rw-rw-r--. 1 root root 94672 Aug 2 2014 xz-1.0.jar

执行命令查看当前Mysql有哪些数据库

1
bin/sqoop list-databases --connect jdbc:mysql://localhost/test --username mysql

会看到类似的输出,最后两行就是当前Mysql中的数据库列表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
[hduser@hadoop2 sqoop-1.4.5.bin__hadoop-2.0.4-alpha]$ bin/sqoop list-databases --connect jdbc:mysql://localhost/test --username mysql
Warning: /usr/local/share/applications/sqoop-1.4.5.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/share/applications/sqoop-1.4.5.bin__hadoop-2.0.4-alpha/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/local/share/applications/sqoop-1.4.5.bin__hadoop-2.0.4-alpha/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
14/07/08 18:23:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/07/08 18:23:15 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/share/applications/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/share/applications/hbase-0.98.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
information_schema
test

执行如下命令可以将users表的数据导入到HDFS的/sqoop/import/users目录
注意:若users表没有主键,则必须指定--split-by参数或-m 1参数

1
bin/sqoop import --connect jdbc:mysql://localhost/test --username mysql --table users --split-by id

##从Mysql导入数据到HBase

##从HDFS导出数据到Mysql