Setting Up a Pseudo-Distributed Hadoop Environment

This pseudo-distributed cluster setup is based on CentOS 7, Hadoop 2.8.2, and Server JRE 8.

Configuring firewall rules

The goal here is a single-node cluster, so for simplicity the firewall is simply turned off. Note that stopping the service only lasts until the next reboot; disabling it keeps it off permanently.

[root@localhost storage]# systemctl stop firewalld
[root@localhost storage]# systemctl disable firewalld

If you would rather open specific ports with firewalld on CentOS 7, see CentOS 7 为firewalld添加开放端口 and related resources.

Installing the JDK

JDK package: server-jre-8u151-linux-x64.tar.gz

    Extract the archive

# tar -zxvf server-jre-8u151-linux-x64.tar.gz

    Configure environment variables

# vi ~/.bash_profile
Add the following:
export JAVA_HOME=/storage/jre/jdk1.8.0_151
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export PATH

Verify that the configuration is correct:

[root@localhost jdk1.8.0_151]# java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

Creating a user

Create a dedicated system user for Hadoop; this makes security and permission management easier. The step is optional -- you can also deploy directly under the root user.

# groupadd hadoop
# useradd hduser -g hadoop

Configuring SSH

Switch to the hduser user with su - hduser, then generate an SSH key for it.

[hduser@localhost ~]$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
55:06:d8:e8:20:75:1c:d0:f9:7e:83:16:9f:89:4d:8b hduser@localhost.localdomain
The key's randomart image is:
+--[ RSA 2048]----+
| .o+o*..o |
| . ..* .o |
| . o .. |
| ..o . |
| S. O + |
| E O |
| . . . |
| |
| |
+-----------------+

The command above generates an RSA key pair with an empty passphrase (the empty passphrase avoids having to type a password on every login).

[hduser@localhost .ssh]$ ls
id_rsa id_rsa.pub

Next, enable the newly created key for SSH access to your server.

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
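With sshd's default StrictModes setting, key authentication is silently rejected when these files are group- or world-accessible, so it is worth tightening the permissions as well (a common extra step, not shown in every guide; the mkdir/touch lines only make the snippet safe to run standalone):

```shell
# sshd (with the default "StrictModes yes") ignores keys whose files are
# too permissive; restrict both the directory and the key file to the owner.
mkdir -p ~/.ssh                   # no-op if the directory already exists
touch ~/.ssh/authorized_keys      # no-op if the file already exists
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```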

Finally, test whether SSH is set up correctly:

[hduser@localhost .ssh]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is a6:35:b9:b9:43:4f:b2:8b:29:e2:89:35:b5:ae:18:c9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Tue Oct 31 02:21:24 2017

If the SSH connection fails, check the sshd configuration in /etc/ssh/sshd_config and make sure PubkeyAuthentication is set to yes.

If the AllowUsers directive is enabled, add the hduser user to its list.

If you changed the SSH configuration, reload it with systemctl reload sshd (CentOS 7 uses systemd, so the SysV-style /etc/init.d/ssh reload does not apply here).

Downloading and extracting Hadoop

Download the package from the Apache Download Mirrors. Both a source release (which you would have to build yourself) and a prebuilt binary release are available; we download the prebuilt binary directly.

[hduser@localhost hadoop_install]$ tar -zxvf hadoop-2.8.2.tar.gz
[hduser@localhost hadoop_install]$ mv hadoop-2.8.2 /storage/

Configuring environment variables for hduser

Add the Hadoop-related environment variables:

[hduser@localhost hadoop-2.8.2]$ vi ~/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
export HADOOP_HOME=/storage/hadoop-2.8.2
export JAVA_HOME=/storage/jre/jdk1.8.0_151
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin
export PATH

Reload the environment variables with source ~/.bash_profile.
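A quick sanity check that the new PATH entry took effect can be sketched like this (the /storage/hadoop-2.8.2 path is the one used in this article; adjust it to your own layout):

```shell
# Reproduce the relevant lines from ~/.bash_profile, then verify that
# $HADOOP_HOME/bin actually ended up on PATH.
export HADOOP_HOME=/storage/hadoop-2.8.2
PATH=$PATH:$HADOOP_HOME/bin
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "HADOOP_HOME/bin is on PATH" ;;
  *)                      echo "HADOOP_HOME/bin is missing from PATH" ;;
esac
```

If the check passes, the hadoop command will resolve from any directory once the binaries are in place.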

Configuring Hadoop

hadoop-env.sh

These files live in $HADOOP_HOME/etc/hadoop. The only parameter that must be changed here is JAVA_HOME; everything else can keep its default.

......
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use.
export JAVA_HOME=/storage/jre/jdk1.8.0_151
......

core-site.xml

Here we mainly configure hadoop.tmp.dir and fs.defaultFS. See core-default.xml for the full list of core-site.xml options.

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/storage/hadoop-2.8.2/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

hdfs-site.xml

Since this is a single-node pseudo-distributed setup, set the dfs replication factor to 1. See hdfs-default.xml for the other default values.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml

Configuration related to mapred-site.xml.
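The binary release ships only a mapred-site.xml.template, so copy it first with cp mapred-site.xml.template mapred-site.xml. A minimal configuration for this pseudo-distributed setup -- the standard Hadoop 2.x pattern of running MapReduce on YARN, rather than anything specific to this article -- looks like:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

When MapReduce runs on YARN this way, yarn-site.xml usually also needs yarn.nodemanager.aux-services set to mapreduce_shuffle so that the shuffle service is available to jobs.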

Formatting the HDFS filesystem

Format the Hadoop filesystem on the NameNode node. Be careful: running the format command against a running cluster destroys all of its data.

[hduser@localhost hadoop]$ hdfs namenode -format

Starting and stopping the cluster

The following command starts the cluster's NodeManager, NameNode, SecondaryNameNode, ResourceManager, and DataNode processes. (In Hadoop 2.x, start-all.sh is deprecated; it simply invokes start-dfs.sh and start-yarn.sh, which you can also run separately.)

[hduser@localhost sbin]$ ./start-all.sh

Once startup completes, run the jps command to check whether each process started successfully.

[hduser@localhost sbin]$ jps
9265 NodeManager
9314 Jps
8711 NameNode
9003 SecondaryNameNode
9164 ResourceManager
8813 DataNode

Hadoop provides several web interfaces for viewing cluster information. At http://localhost:50070/ you can see an overview of the NameNode as well as the cluster's startup and shutdown logs.

Run sbin/stop-all.sh to stop all of the cluster's processes.