如何將Hadoop部署在低廉的硬件上

馬哥Linux運(yùn)維 ? 來(lái)源:馬哥Linux運(yùn)維 ? 作者:馬哥Linux運(yùn)維 ? 2022-09-27 09:40 ? 次閱讀

I. Overview

Hadoop is an open-source distributed computing platform under the Apache Software Foundation. Built around HDFS (Hadoop Distributed File System) and MapReduce (Hadoop 2.0 added YARN, a resource-scheduling framework that manages and schedules tasks at a fine granularity and can also host other computing frameworks such as Spark), Hadoop gives users a distributed infrastructure that hides the low-level details of the system. The high fault tolerance, high scalability, and efficiency of HDFS let users deploy Hadoop on inexpensive hardware to form a distributed system.

[Figure: HDFS architecture] [Figure: YARN architecture]

二、開(kāi)始部署

1) Add the Helm repository

Address:

https://artifacthub.io/packages/helm/apache-hadoop-helm/hadoop

helm repo add apache-hadoop-helm https://pfisterer.github.io/apache-hadoop-helm/
helm pull apache-hadoop-helm/hadoop --version 1.2.0
tar -xf hadoop-1.2.0.tgz
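Before editing anything, it can help to glance at the chart's default configuration; helm can print it straight from the repo (a quick inspection step, not required for the deployment):

# Show the chart's default values before we start overriding them
helm show values apache-hadoop-helm/hadoop --version 1.2.0
# Confirm the repo registration
helm repo list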

2)構(gòu)建鏡像 Dockerfile

FROM myharbor.com/bigdata/centos:7.9.2009

RUN rm -f /etc/localtime && ln -sv /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo "Asia/Shanghai" > /etc/timezone

# Set the locale via ENV (a RUN export would not persist across layers)
ENV LANG=zh_CN.UTF-8

# Create the user and group; the UID must match spec.template.spec.containers.securityContext.runAsUser: 9999 in the YAML manifests
RUN groupadd --system --gid=9999 admin && useradd --system --home-dir /home/admin --uid=9999 --gid=admin admin

# Install sudo
RUN yum -y install sudo ; chmod 640 /etc/sudoers

# Grant admin passwordless sudo
RUN echo "admin ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

RUN yum -y install net-tools telnet wget

RUN mkdir /opt/apache/

ADD jdk-8u212-linux-x64.tar.gz /opt/apache/

ENV JAVA_HOME=/opt/apache/jdk1.8.0_212
ENV PATH=$JAVA_HOME/bin:$PATH

ENV HADOOP_VERSION 3.3.2
ENV HADOOP_HOME=/opt/apache/hadoop

ENV HADOOP_COMMON_HOME=${HADOOP_HOME} \
    HADOOP_HDFS_HOME=${HADOOP_HOME} \
    HADOOP_MAPRED_HOME=${HADOOP_HOME} \
    HADOOP_YARN_HOME=${HADOOP_HOME} \
    HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop \
    PATH=${PATH}:${HADOOP_HOME}/bin

# RUN curl --silent --output /tmp/hadoop.tgz https://ftp-stud.hs-esslingen.de/pub/Mirrors/ftp.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz && tar --directory /opt/apache -xzf /tmp/hadoop.tgz && rm /tmp/hadoop.tgz
ADD hadoop-${HADOOP_VERSION}.tar.gz /opt/apache
RUN ln -s /opt/apache/hadoop-${HADOOP_VERSION} ${HADOOP_HOME}

RUN chown -R admin:admin /opt/apache

WORKDIR $HADOOP_HOME

# HDFS ports
EXPOSE 50010 50020 50070 50075 50090 8020 9000

# MapReduce ports
EXPOSE 19888

# YARN ports
EXPOSE 8030 8031 8032 8033 8040 8042 8088

# Other ports
EXPOSE 49707 2122

開(kāi)始構(gòu)建鏡像

docker build -t myharbor.com/bigdata/hadoop:3.3.2 . --no-cache

###參數(shù)解釋
#-t:指定鏡像名稱(chēng)
# . :當(dāng)前目錄Dockerfile
#-f:指定Dockerfile路徑
#--no-cache:不緩存
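Before pushing, a quick smoke test of the freshly built image is worthwhile; the commands below assume the build above succeeded and simply check that the JDK and Hadoop binaries are on the PATH:

# Print the bundled JDK version
docker run --rm myharbor.com/bigdata/hadoop:3.3.2 java -version
# Print the Hadoop version baked into the image
docker run --rm myharbor.com/bigdata/hadoop:3.3.2 hadoop version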

推送到鏡像倉(cāng)庫(kù)

docker push myharbor.com/bigdata/hadoop:3.3.2

調(diào)整目錄結(jié)構(gòu)

mkdir hadoop/templates/hdfs hadoop/templates/yarn
mv hadoop/templates/hdfs-* hadoop/templates/hdfs/
mv hadoop/templates/yarn-* hadoop/templates/yarn/
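After the move, the chart should look roughly like this (an illustrative listing; the exact template file names depend on the chart version):

hadoop/
├── Chart.yaml
├── values.yaml
└── templates/
    ├── hadoop-configmap.yaml
    ├── hdfs/          # hdfs-* templates moved here
    └── yarn/          # yarn-* templates moved here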

3) Modify the configuration

hadoop/values.yaml

image:
  repository: myharbor.com/bigdata/hadoop
  tag: 3.3.2
  pullPolicy: IfNotPresent

...

persistence:
  nameNode:
    enabled: true
    storageClass: "hadoop-nn-local-storage"
    accessMode: ReadWriteOnce
    size: 10Gi
    local:
      - name: hadoop-nn-0
        host: "local-168-182-110"
        path: "/opt/bigdata/servers/hadoop/nn/data/data1"

  dataNode:
    enabled: true
    storageClass: "hadoop-dn-local-storage"
    accessMode: ReadWriteOnce
    size: 20Gi
    local:
      - name: hadoop-dn-0
        host: "local-168-182-110"
        path: "/opt/bigdata/servers/hadoop/dn/data/data1"
      - name: hadoop-dn-1
        host: "local-168-182-110"
        path: "/opt/bigdata/servers/hadoop/dn/data/data2"
      - name: hadoop-dn-2
        host: "local-168-182-110"
        path: "/opt/bigdata/servers/hadoop/dn/data/data3"
      - name: hadoop-dn-3
        host: "local-168-182-111"
        path: "/opt/bigdata/servers/hadoop/dn/data/data1"
      - name: hadoop-dn-4
        host: "local-168-182-111"
        path: "/opt/bigdata/servers/hadoop/dn/data/data2"
      - name: hadoop-dn-5
        host: "local-168-182-111"
        path: "/opt/bigdata/servers/hadoop/dn/data/data3"
      - name: hadoop-dn-6
        host: "local-168-182-112"
        path: "/opt/bigdata/servers/hadoop/dn/data/data1"
      - name: hadoop-dn-7
        host: "local-168-182-112"
        path: "/opt/bigdata/servers/hadoop/dn/data/data2"
      - name: hadoop-dn-8
        host: "local-168-182-112"
        path: "/opt/bigdata/servers/hadoop/dn/data/data3"

...

service:
  nameNode:
    type: NodePort
    ports:
      dfs: 9000
      webhdfs: 9870
    nodePorts:
      dfs: 30900
      webhdfs: 30870
  dataNode:
    type: NodePort
    ports:
      dfs: 9000
      webhdfs: 9864
    nodePorts:
      dfs: 30901
      webhdfs: 30864
  resourceManager:
    type: NodePort
    ports:
      web: 8088
    nodePorts:
      web: 30088

...

securityContext:
  runAsUser: 9999
  privileged: true
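To catch indentation or typing mistakes in values.yaml before installing, the chart can be rendered locally and dry-run validated (nothing is applied to the cluster):

# Render all manifests and validate them client-side
helm template hadoop ./hadoop | kubectl apply --dry-run=client -f -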

hadoop/templates/hdfs/hdfs-nn-pv.yaml

{{- range .Values.persistence.nameNode.local }}
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: {{ .name }}
  labels:
    name: {{ .name }}
spec:
  storageClassName: {{ $.Values.persistence.nameNode.storageClass }}
  capacity:
    storage: {{ $.Values.persistence.nameNode.size }}
  accessModes:
    - ReadWriteOnce
  local:
    path: {{ .path }}
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - {{ .host }}
---
{{- end }}

hadoop/templates/hdfs/hdfs-dn-pv.yaml

{{- range .Values.persistence.dataNode.local }}
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: {{ .name }}
  labels:
    name: {{ .name }}
spec:
  storageClassName: {{ $.Values.persistence.dataNode.storageClass }}
  capacity:
    storage: {{ $.Values.persistence.dataNode.size }}
  accessModes:
    - ReadWriteOnce
  local:
    path: {{ .path }}
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - {{ .host }}
---
{{- end }}
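To confirm that the range loop expands into one PersistentVolume per local entry in values.yaml, render just these two templates with helm's --show-only flag (a sanity check only):

helm template hadoop ./hadoop -s templates/hdfs/hdfs-nn-pv.yaml
helm template hadoop ./hadoop -s templates/hdfs/hdfs-dn-pv.yaml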

Modify the HDFS services

mv hadoop/templates/hdfs/hdfs-nn-svc.yaml hadoop/templates/hdfs/hdfs-nn-svc-headless.yaml
mv hadoop/templates/hdfs/hdfs-dn-svc.yaml hadoop/templates/hdfs/hdfs-dn-svc-headless.yaml
# Remember to change the service names inside the renamed files so they don't clash with the new NodePort services below (an example follows)

hadoop/templates/hdfs/hdfs-nn-svc.yaml

# A NodePort service to expose the HDFS NameNode
apiVersion: v1
kind: Service
metadata:
  name: {{ include "hadoop.fullname" . }}-hdfs-nn
  labels:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    helm.sh/chart: {{ include "hadoop.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-nn
spec:
  ports:
    - name: dfs
      port: {{ .Values.service.nameNode.ports.dfs }}
      protocol: TCP
      nodePort: {{ .Values.service.nameNode.nodePorts.dfs }}
    - name: webhdfs
      port: {{ .Values.service.nameNode.ports.webhdfs }}
      nodePort: {{ .Values.service.nameNode.nodePorts.webhdfs }}
  type: {{ .Values.service.nameNode.type }}
  selector:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-nn

hadoop/templates/hdfs/hdfs-dn-svc.yaml

# A NodePort service to expose the HDFS DataNodes
apiVersion: v1
kind: Service
metadata:
  name: {{ include "hadoop.fullname" . }}-hdfs-dn
  labels:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    helm.sh/chart: {{ include "hadoop.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-dn
spec:
  ports:
    - name: dfs
      port: {{ .Values.service.dataNode.ports.dfs }}
      protocol: TCP
      nodePort: {{ .Values.service.dataNode.nodePorts.dfs }}
    - name: webhdfs
      port: {{ .Values.service.dataNode.ports.webhdfs }}
      nodePort: {{ .Values.service.dataNode.nodePorts.webhdfs }}
  type: {{ .Values.service.dataNode.type }}
  selector:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: hdfs-dn

Modify the YARN services

mv hadoop/templates/yarn/yarn-nm-svc.yaml hadoop/templates/yarn/yarn-nm-svc-headless.yaml
mv hadoop/templates/yarn/yarn-rm-svc.yaml hadoop/templates/yarn/yarn-rm-svc-headless.yaml
mv hadoop/templates/yarn/yarn-ui-svc.yaml hadoop/templates/yarn/yarn-rm-svc.yaml
# Remember to change the service names inside the renamed files so they don't clash

hadoop/templates/yarn/yarn-rm-svc.yaml

# Service to access the yarn web ui
apiVersion: v1
kind: Service
metadata:
  name: {{ include "hadoop.fullname" . }}-yarn-rm
  labels:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    helm.sh/chart: {{ include "hadoop.chart" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: yarn-rm
spec:
  ports:
    - port: {{ .Values.service.resourceManager.ports.web }}
      name: web
      nodePort: {{ .Values.service.resourceManager.nodePorts.web }}
  type: {{ .Values.service.resourceManager.type }}
  selector:
    app.kubernetes.io/name: {{ include "hadoop.name" . }}
    app.kubernetes.io/instance: {{ .Release.Name }}
    app.kubernetes.io/component: yarn-rm

Modify the controllers

Add the following to the container spec of every controller, matching the securityContext set in values.yaml:

containers:
  ...
  securityContext:
    runAsUser: {{ .Values.securityContext.runAsUser }}
    privileged: {{ .Values.securityContext.privileged }}
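A quick way to confirm the block ended up in every rendered workload is to grep the templated output (assuming the chart renders cleanly at this point):

# Each container spec should show the runAsUser setting
helm template hadoop ./hadoop | grep -B3 'runAsUser'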

hadoop/templates/hadoop-configmap.yaml

###1、將/root換成/opt/apache
###2、TMP_URL="http://{{include"hadoop.fullname".}}-yarn-rm-headless:8088/ws/v1/cluster/info"
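Both edits can be scripted with sed (a sketch under the assumption that the strings appear exactly as above; review the file afterwards):

# 1. Swap the install prefix
sed -i 's#/root#/opt/apache#g' hadoop/templates/hadoop-configmap.yaml
# 2. Point TMP_URL at the renamed headless RM service
sed -i 's#-yarn-rm:8088#-yarn-rm-headless:8088#g' hadoop/templates/hadoop-configmap.yaml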

3)開(kāi)始安裝

#創(chuàng)建存儲(chǔ)目錄
mkdir-p/opt/bigdata/servers/hadoop/{nn,dn}/data/data{1..3}

helminstallhadoop./hadoop-nhadoop--create-namespace
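The pods take a little while to start; you can watch the rollout until everything reports Running:

kubectl get pods -n hadoop -w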

NOTES

NAME: hadoop
LAST DEPLOYED: Sat Sep 24 17:55 2022
NAMESPACE: hadoop
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. You can check the status of HDFS by running this command:
   kubectl exec -n hadoop -it hadoop-hadoop-hdfs-nn-0 -- /opt/hadoop/bin/hdfs dfsadmin -report

2. You can list the yarn nodes by running this command:
   kubectl exec -n hadoop -it hadoop-hadoop-yarn-rm-0 -- /opt/hadoop/bin/yarn node -list

3. Create a port-forward to the yarn resource manager UI:
   kubectl port-forward -n hadoop hadoop-hadoop-yarn-rm-0 8088:8088

   Then open the ui in your browser:

   open http://localhost:8088

4. You can run included hadoop tests like this:
   kubectl exec -n hadoop -it hadoop-hadoop-yarn-nm-0 -- /opt/hadoop/bin/hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.2-tests.jar TestDFSIO -write -nrFiles 5 -fileSize 128MB -resFile /tmp/TestDFSIOwrite.txt

5. You can list the mapreduce jobs like this:
   kubectl exec -n hadoop -it hadoop-hadoop-yarn-rm-0 -- /opt/hadoop/bin/mapred job -list

6. This chart can also be used with the zeppelin chart
   helm install --namespace hadoop --set hadoop.useConfigMap=true,hadoop.configMapName=hadoop-hadoop stable/zeppelin

7. You can scale the number of yarn nodes like this:
   helm upgrade hadoop --set yarn.nodeManager.replicas=4 stable/hadoop

Make sure to update the values.yaml if you want to make this permanent.

Check the resources:

kubectl get pods,svc -n hadoop -o wide

HDFS web UI:

http://192.168.182.110:30870/


YARN web UI:

http://192.168.182.110:30088/


5)測(cè)試驗(yàn)證

HDFS 測(cè)試驗(yàn)證

[root@local-168-182-110 hadoop]# kubectl exec -it hadoop-hadoop-hdfs-nn-0 -n hadoop -- bash
bash-4.2$ hdfs dfs -mkdir /tmp
bash-4.2$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - admin supergroup          0 2022-09-24 17:56 /tmp
bash-4.2$ echo "test hadoop" > test.txt
bash-4.2$ hdfs dfs -put test.txt /tmp/
bash-4.2$ hdfs dfs -ls /tmp/
Found 1 items
-rw-r--r--   3 admin supergroup         12 2022-09-24 17:57 /tmp/test.txt
bash-4.2$ hdfs dfs -cat /tmp/
cat: `/tmp': Is a directory
bash-4.2$ hdfs dfs -cat /tmp/test.txt
test hadoop
bash-4.2$
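Because the NameNode web port is exposed on NodePort 30870, the same checks also work from outside the cluster through the WebHDFS REST API (using the node IP from this article; note that the OPEN redirect may point at an in-cluster DataNode hostname that is not resolvable externally):

# List /tmp via WebHDFS
curl "http://192.168.182.110:30870/webhdfs/v1/tmp?op=LISTSTATUS"
# Read the uploaded file, following the redirect to a DataNode
curl -L "http://192.168.182.110:30870/webhdfs/v1/tmp/test.txt?op=OPEN"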

Yarn 的測(cè)試驗(yàn)證等后面講到 hive on k8s 再來(lái)測(cè)試驗(yàn)證。

6)卸載

helm uninstall hadoop -n hadoop

kubectl delete pod -n hadoop `kubectl get pod -n hadoop | awk 'NR>1{print $1}'` --force
kubectl patch ns hadoop -p '{"metadata":{"finalizers":null}}'
kubectl delete ns hadoop --force
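helm does not touch the local PV directories created earlier; if the data is no longer needed, clean them up on each node (destructive, double-check the path first):

# Run on every node that hosted a PV
rm -rf /opt/bigdata/servers/hadoop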

這里也提供 git 下載地址,有需要的小伙伴可以下載部署玩玩:

https://gitee.com/hadoop-bigdata/hadoop-on-k8s

Within a Kubernetes cluster, YARN will gradually be de-emphasized: resources can be scheduled directly by Kubernetes instead of by YARN. What we deployed here is a single-point (non-HA) setup, suitable only for test environments. The next article will cover highly available Hadoop on K8s; if you have any questions, feel free to leave a comment.

審核編輯:彭靜
聲明:本文內(nèi)容及配圖由入駐作者撰寫(xiě)或者入駐合作網(wǎng)站授權(quán)轉(zhuǎn)載。文章觀點(diǎn)僅代表作者本人,不代表電子發(fā)燒友網(wǎng)立場(chǎng)。文章及其配圖僅供工程師學(xué)習(xí)之用,如有內(nèi)容侵權(quán)或者其他違規(guī)問(wèn)題,請(qǐng)聯(lián)系本站處理。 舉報(bào)投訴
  • 硬件
    +關(guān)注

    關(guān)注

    11

    文章

    3113

    瀏覽量

    65848
  • Hadoop
    +關(guān)注

    關(guān)注

    1

    文章

    90

    瀏覽量

    15914
  • 計(jì)算框架
    +關(guān)注

    關(guān)注

    0

    文章

    4

    瀏覽量

    1928

原文標(biāo)題:7 張圖入門(mén) Hadoop 在 K8S 環(huán)境中部署

文章出處:【微信號(hào):magedu-Linux,微信公眾號(hào):馬哥Linux運(yùn)維】歡迎添加關(guān)注!文章轉(zhuǎn)載請(qǐng)注明出處。
