There is not much material online about running Hadoop on Kubernetes, so I tried building a cluster myself. This post records the process and the problems I ran into.
I. Choosing an image
Start by picking a popular image from the official Docker Hub. I chose the bde2020 image series, because the documentation on its GitHub is fairly complete: https://github.com/big-data-europe/docker-hadoop
II. Testing with docker-compose
The repository documents how to run these Hadoop images with docker-compose; just follow those instructions.
docker-compose is Docker's companion container-orchestration tool (installed separately from the Docker engine) and is simple to use: download docker-compose.yml and hadoop.env locally and run docker-compose up to start the cluster; docker-compose down stops it again.
III. Writing Kubernetes YAML for each component
The docker-compose example above is simple, but it offers little beyond the basics and everything runs on a single machine. Our task is to translate the docker-compose YAML into Kubernetes YAML.
1. Creating the ConfigMap
Configuration can be injected through a ConfigMap. Based on hadoop.env, write configmap.yaml as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-config
data:
  CORE_CONF_fs_defaultFS: "hdfs://namenode:8020"
  CORE_CONF_hadoop_http_staticuser_user: "root"
  CORE_CONF_hadoop_proxyuser_hue_hosts: "*"
  CORE_CONF_hadoop_proxyuser_hue_groups: "*"
  HDFS_CONF_dfs_webhdfs_enabled: "true"
  HDFS_CONF_dfs_permissions_enabled: "false"
  YARN_CONF_yarn_log___aggregation___enable: "true"
  YARN_CONF_yarn_resourcemanager_recovery_enabled: "true"
  YARN_CONF_yarn_resourcemanager_store_class: "org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore"
  YARN_CONF_yarn_resourcemanager_fs_state___store_uri: "/rmstate"
  YARN_CONF_yarn_nodemanager_remote___app___log___dir: "/app-logs"
  YARN_CONF_yarn_log_server_url: "http://historyserver:8188/applicationhistory/logs/"
  YARN_CONF_yarn_timeline___service_enabled: "true"
  YARN_CONF_yarn_timeline___service_generic___application___history_enabled: "true"
  YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled: "true"
  YARN_CONF_yarn_resourcemanager_hostname: "resourcemanager"
  YARN_CONF_yarn_timeline___service_hostname: "historyserver"
  YARN_CONF_yarn_resourcemanager_address: "resourcemanager:8032"
  YARN_CONF_yarn_resourcemanager_scheduler_address: "resourcemanager:8030"
  YARN_CONF_yarn_resourcemanager_resource___tracker_address: "resourcemanager:8031"
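The odd-looking keys follow the bde2020 images' env-to-XML naming scheme: the prefix selects the target config file (CORE_CONF_ → core-site.xml, HDFS_CONF_ → hdfs-site.xml, YARN_CONF_ → yarn-site.xml), a single underscore in the remainder becomes a dot, and a triple underscore becomes a dash. A small sketch of that mapping (my own helper for illustration, not part of the images):

```python
# Sketch of the bde2020 env-var naming scheme: the prefix picks the target
# XML file; in the remainder, "___" becomes "-" and "_" becomes ".".
PREFIXES = {
    "CORE_CONF_": "core-site.xml",
    "HDFS_CONF_": "hdfs-site.xml",
    "YARN_CONF_": "yarn-site.xml",
}

def env_to_property(env_name):
    """Map a hadoop.env variable name to (config file, property name)."""
    for prefix, conf_file in PREFIXES.items():
        if env_name.startswith(prefix):
            rest = env_name[len(prefix):]
            # Protect "___" (dash) before turning single "_" into dots.
            prop = rest.replace("___", "\0").replace("_", ".").replace("\0", "-")
            return conf_file, prop
    raise ValueError(f"unknown prefix: {env_name}")

print(env_to_property("YARN_CONF_yarn_log___aggregation___enable"))
```

So YARN_CONF_yarn_log___aggregation___enable becomes yarn.log-aggregation-enable in yarn-site.xml, which is why the ConfigMap above can drive all three Hadoop config files through plain environment variables.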
2. Creating the namenode
Hadoop nodes communicate with each other by hostname, but when a pod is created the system assigns it a random hostname and writes that into the pod's own /etc/hosts, which breaks inter-node communication and produces errors such as UnresolvedAddressException. This tripped me up for a long time; it took a lot of digging to find the cause.
The fix is to set clusterIP to None in the Service and to set the pod's hostname in the Deployment to the same value as the Service name. To avoid confusion, the Service name, container name, and hostname below are all set to the same value.
The namenode needs to mount a volume, so first write pvc.yaml (a StorageClass must be created beforehand; see my earlier post https://www.cnblogs.com/00986014w/p/9406962.html):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hadoop-namenode-pvc
spec:
  storageClassName: nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
Then write the namenode's Service and Deployment in namenode.yaml:
apiVersion: v1
kind: Service
metadata:
  name: namenode
  labels:
    name: namenode
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: namenode
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: namenode
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: namenode
    spec:
      hostname: namenode
      containers:
        - name: namenode
          image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          env:
            - name: CLUSTER_NAME
              value: test
          envFrom:
            - configMapRef:
                name: hadoop-config
          volumeMounts:
            - name: hadoop-namenode
              mountPath: /hadoop/dfs/name
      volumes:
        - name: hadoop-namenode
          persistentVolumeClaim:
            claimName: hadoop-namenode-pvc
3. The datanodes
Create three datanodes. Taking datanode1 as an example, write datanode.yaml as follows (its PVC is analogous to the namenode's, so it is omitted here):
apiVersion: v1
kind: Service
metadata:
  name: datanode1
  labels:
    name: datanode1
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: datanode1
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: datanode1
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: datanode1
    spec:
      hostname: datanode1
      containers:
        - name: datanode1
          image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          envFrom:
            - configMapRef:
                name: hadoop-config
          volumeMounts:
            - name: hadoop-datanode1
              mountPath: /hadoop/dfs/data
      volumes:
        - name: hadoop-datanode1
          persistentVolumeClaim:
            claimName: hadoop-datanode1-pvc
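The datanode1, datanode2, and datanode3 manifests differ only in the resource name (and the namenode/resourcemanager/nodemanager/historyserver manifests follow the same shape with a different image), so rather than copy-pasting, you can stamp them out from a template. A minimal sketch of that idea (my own script; the port list is abbreviated here for brevity, and the names are assumed to match the manifests above):

```python
# Render a headless Service + Deployment pair, varying only name and image.
# Only one port is shown; in practice the full port list from the article
# would go into the template.
MANIFEST_TMPL = """\
apiVersion: v1
kind: Service
metadata:
  name: {name}
  labels:
    name: {name}
spec:
  ports:
    - port: 50075
      name: hdfs1
  selector:
    name: {name}
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: {name}
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: {name}
    spec:
      hostname: {name}
      containers:
        - name: {name}
          image: {image}
          imagePullPolicy: IfNotPresent
"""

def render(name, image):
    return MANIFEST_TMPL.format(name=name, image=image)

# Write one manifest file per datanode.
for i in (1, 2, 3):
    with open(f"datanode{i}.yaml", "w") as f:
        f.write(render(f"datanode{i}", "bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8"))
```

This keeps the critical invariant of the whole setup in one place: the Service name, the pod hostname, and the selector label are always identical.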
After creating these, be sure to check the logs with kubectl logs and confirm there are no errors before moving on.
4. The resourcemanager
Write resourcemanager.yaml as follows:
apiVersion: v1
kind: Service
metadata:
  name: resourcemanager
  labels:
    name: resourcemanager
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: resourcemanager
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: resourcemanager
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: resourcemanager
    spec:
      hostname: resourcemanager
      containers:
        - name: resourcemanager
          image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          envFrom:
            - configMapRef:
                name: hadoop-config
5. The nodemanager
Write nodemanager.yaml as follows:
apiVersion: v1
kind: Service
metadata:
  name: nodemanager1
  labels:
    name: nodemanager1
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: nodemanager1
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nodemanager1
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: nodemanager1
    spec:
      hostname: nodemanager1
      containers:
        - name: nodemanager1
          image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          envFrom:
            - configMapRef:
                name: hadoop-config
6. The historyserver
The PVC is analogous to the earlier ones. Write historyserver.yaml as follows:
apiVersion: v1
kind: Service
metadata:
  name: historyserver
  labels:
    name: historyserver
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: historyserver
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: historyserver
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: historyserver
    spec:
      hostname: historyserver
      containers:
        - name: historyserver
          image: bde2020/hadoop-historyserver:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          envFrom:
            - configMapRef:
                name: hadoop-config
          volumeMounts:
            - name: hadoop-historyserver
              mountPath: /hadoop/yarn/timeline
      volumes:
        - name: hadoop-historyserver
          persistentVolumeClaim:
            claimName: hadoop-historyserver-pvc
After creating all of the above with kubectl create, refer to the GitHub README and open each of the five components' endpoints, with the corresponding port, in a browser (this must be done from a machine inside the cluster). If the Hadoop pages render correctly, the deployment succeeded!
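For reference, these are the default Hadoop 2.x web UI ports I would expect to check for each component (note they changed for HDFS in Hadoop 3.x):

```python
# Default web UI ports in Hadoop 2.x, one per component in this cluster.
WEB_UI_PORTS = {
    "namenode": 50070,        # HDFS overview page
    "datanode1": 50075,       # per-datanode status
    "resourcemanager": 8088,  # YARN applications
    "nodemanager1": 8042,     # per-nodemanager status
    "historyserver": 8188,    # application history / timeline service
}

for component, port in WEB_UI_PORTS.items():
    print(f"http://{component}:{port}")
```

The hostnames here are the headless Service names defined above, so these URLs only resolve from inside the cluster.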
7. Testing
Run a quick check that the nodes can communicate with each other.
Use kubectl exec -it namenode /bin/bash to get a shell inside the namenode, then run hdfs dfs -put /etc/issue / and see whether the upload succeeds.
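Since the ConfigMap enables WebHDFS (dfs.webhdfs.enabled), the same check can also be driven over plain HTTP from any pod that resolves the namenode, with no Hadoop client installed. A sketch of building the WebHDFS REST URL (the standard /webhdfs/v1 endpoint; host and port match this cluster's namenode Service):

```python
from urllib.parse import quote

# Build a WebHDFS REST URL. For op=CREATE the namenode replies with a
# 307 redirect to the datanode that should receive the file data.
def webhdfs_url(host, port, path, op, **params):
    query = "&".join([f"op={op}"] + [f"{k}={v}" for k, v in sorted(params.items())])
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"

print(webhdfs_url("namenode", 50070, "/issue", "CREATE", overwrite="true"))
```

Fetching that URL (e.g. with curl -i -X PUT) and following the redirect performs the same upload as the hdfs dfs -put command above.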
Original post (in Chinese): https://www.cnblogs.com/00986014w/p/9732796.html