Hadoop Configuration Steps

The following assumes the Hadoop user on the server is hduser.

The Hadoop version is 2.2.0.

### 1. JDK configuration
Download and unpack the JDK package. Assuming it unpacks to /usr/local/share/applications/jdk1.7.0_55, add a JAVA_HOME environment variable to hduser's user profile:

```
JAVA_HOME=/usr/local/share/applications/jdk1.7.0_55
```
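A minimal sketch of the profile change, assuming ~/.bashrc is the profile file and also putting the JDK tools on the PATH:

```bash
# assumed location: /home/hduser/.bashrc
export JAVA_HOME=/usr/local/share/applications/jdk1.7.0_55
export PATH=$JAVA_HOME/bin:$PATH
```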

### 2. SSH configuration
As hduser, run the following command:

```
ssh-keygen -t rsa
```

You will see output similar to the following; accepting the default for every prompt is fine:

```
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
76:c6:77:0d:1a:0a:81:65:69:d9:b7:e7:7b:d8:a6:fc hduser@hadoop2
The key's randomart image is:
+--[ RSA 2048]----+
| o++ |
| ..+.. . |
| .. .... |
| o ..o.o |
| S = oo. .|
| . o . .. |
| + |
| .o +|
| o=E|
+-----------------+
```

Two files, id_rsa and id_rsa.pub, are generated under /home/hduser/.ssh. For a real multi-node cluster, id_rsa.pub would have to be distributed to every node; since I am running pseudo-distributed here, it is enough to put its contents into a file named authorized_keys in /home/hduser/.ssh, which can be done with:

```
cp id_rsa.pub authorized_keys  # note: cat id_rsa.pub > authorized_keys did not work; I did not dig into why
```

Then test it with ssh:

```
ssh localhost
```

If no password prompt appears, the setup succeeded.
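If ssh localhost still asks for a password, the usual culprit is overly open permissions on ~/.ssh, which sshd rejects by default; a quick fix sketch:

```bash
chmod 700 /home/hduser/.ssh
chmod 600 /home/hduser/.ssh/authorized_keys
```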

### 3. Hadoop configuration files
Add a HADOOP_HOME environment variable to hduser's user profile:

```
HADOOP_HOME=/usr/local/share/applications/hadoop-2.2.0
```
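A sketch of the profile change, again assuming ~/.bashrc and also adding Hadoop's bin and sbin directories to the PATH for convenience:

```bash
export HADOOP_HOME=/usr/local/share/applications/hadoop-2.2.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```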

Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add the line:

```
JAVA_HOME=/usr/local/share/applications/jdk1.7.0_55
```
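core-site.xml also has to point at the NameNode address the client uses later (hdfs://hadoop2:9000 in the error trace of step 7); a minimal sketch for a pseudo-distributed setup, with the replication factor in hdfs-site.xml assumed to be 1:

```xml
<!-- $HADOOP_HOME/etc/hadoop/core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop2:9000</value>
  </property>
</configuration>

<!-- $HADOOP_HOME/etc/hadoop/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```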

### 4. Edit the hosts file
Edit /etc/hosts and add a line:

```
127.0.0.1 hadoop2  # hadoop2 is my machine's hostname
```
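To check that the mapping is picked up (step 7 later changes this entry to the machine's real IP), something like:

```bash
getent hosts hadoop2   # should print the mapped address
ping -c 1 hadoop2
```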

### 5. Format the NameNode
Run:

```
$HADOOP_HOME/bin/hdfs namenode -format
```

### 6. Start Hadoop

```
cd $HADOOP_HOME/sbin
./start-all.sh
```

After it finishes normally, running the jps command should show the Hadoop processes:

```
[hduser@hadoop2 bin]$ jps
1761 NodeManager
1667 ResourceManager
1293 NameNode
2041 Jps
1538 SecondaryNameNode
1378 DataNode
[hduser@hadoop2 bin]$
```
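Beyond jps, a couple of quick sanity checks that HDFS is actually serving (port 50070 is the NameNode web UI mentioned in step 7):

```bash
$HADOOP_HOME/bin/hdfs dfsadmin -report                         # should report one live datanode
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop2:50070  # NameNode web UI, expect 200
```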

### 7. Accessing Hadoop from Eclipse on a Windows client
Editing Java files in Eclipse on Windows, packaging them into a jar, and uploading it to the Hadoop dev machine for every test is far too tedious. There is no official Eclipse plugin for Hadoop 2.2.0 yet; the plugin source is on GitHub and could be compiled, but I was too lazy to build it, so I tried simply writing a class in Eclipse and running it as a Java application. The code is very simple:

```java
package com.louz.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileListAction {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS must point at the cluster (e.g. via core-site.xml on the
        // classpath), otherwise FileSystem.get() returns the local filesystem
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);

        // list everything directly under the HDFS root directory
        Path listf = new Path("/");
        FileStatus stats[] = hdfs.listStatus(listf);
        for (int i = 0; i < stats.length; ++i) {
            System.out.println(stats[i].getPath().toString());
        }
        hdfs.close();
    }
}
```

All it does is list the files under /, yet I still ran into all kinds of problems along the way.

Since the pseudo-distributed Hadoop is deployed in a virtual machine, I started with VirtualBox, but no matter how I configured it I could never properly reach the Hadoop services inside the VM. I then switched to VMware Player with host-only networking and turned off the VM's firewall, after which ports 22, 50070, and 50075 were easily reachable. Running the program above, however, produced the following error:

```
Exception in thread "main" java.net.ConnectException: Call From Louz-HP/192.168.154.1 to hadoop2:9000 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at $Proxy9.getListing(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at $Proxy9.getListing(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:482)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1660)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1643)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:640)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:92)
    at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:702)
    at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:698)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:698)
    at com.louz.hdfs.FileListAction.main(FileListAction.java:15)
Caused by: java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
    at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
    at org.apache.hadoop.ipc.Client.call(Client.java:1318)
    ... 20 more
```

No matter what I tried, I could not telnet to port 9000 from the host machine. After struggling for days and being about to give up, I saw someone online saying that the hosts file on the NameNode machine must map the hostname to a concrete IP, not 127.0.0.1. Figuring I had nothing to lose, I changed /etc/hosts to map it to the machine's real IP:

```
192.168.154.129 hadoop2
```

After restarting Hadoop and running the program again, it finally worked:

```
hdfs://hadoop2:9000/abc
hdfs://hadoop2:9000/test
```
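The likely explanation: with hadoop2 mapped to 127.0.0.1, the NameNode bound its RPC port 9000 to the loopback interface only, which is why it was unreachable from the host. A quick way to confirm after the fix, assuming net-tools is installed on the server:

```bash
netstat -tlnp | grep 9000   # should now show 192.168.154.129:9000 rather than 127.0.0.1:9000
```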