Self Hosted Infrastructure: Automated Kubernetes Operations as an Example


Published by kernal

Published 2017-04-19 (timestamp 1492610261002) · 8256 views · Keyword: virtualization


Slide 1

Self driving infrastructure

Xiang Li

xiang.li@coreos.com | Head of Distributed Systems

Slide 2

Topics

● Cluster management systems
● Today’s problems with operating cluster management systems
● A self-driving approach

Slide 3

Motivation: microservices

● Increased operational cost
  ○ a lot of components
  ○ dynamic dependencies
  ○ fast deployment iteration
● Solution: automation

Slide 4

Cluster management system

● Automation
  ○ Scheduling
  ○ Deployment
  ○ Healing
  ○ Discovery/load balancing
  ○ Scaling

Slides 5-7

Scheduling

[Diagram in three build steps; surviving label: Scheduler]

Slide 8

Discovery

[Diagram label: color=yellow]

Slide 9

Discovery

[Diagram labels: color=yellow; Select color = yellow]

Slide 10

Load balancing

[Diagram labels: yellow.mycluster; Select color = yellow]
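
The discovery and load-balancing slides above can be sketched roughly as follows: a selector picks every pod labelled color=yellow, and requests sent to a single name (yellow.mycluster on the slide) rotate across the matches. This is a minimal Python illustration, not Kubernetes code. The Pod type, the pod names, and the addresses are hypothetical, and the round-robin loop is an assumed stand-in for whatever the cluster actually uses.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict = field(default_factory=dict)


def select(pods, **selector):
    """Return the pods whose labels contain every key=value pair in selector."""
    return [p for p in pods if all(p.labels.get(k) == v for k, v in selector.items())]


# Hypothetical pods; only the color label comes from the slides.
pods = [
    Pod("web-1", "10.0.0.1", {"color": "yellow"}),
    Pod("web-2", "10.0.0.2", {"color": "blue"}),
    Pod("web-3", "10.0.0.3", {"color": "yellow"}),
]

# Discovery: "Select color = yellow".
yellow = select(pods, color="yellow")

# Load balancing: requests to one name rotate over the matching pods.
backends = cycle(yellow)
routed = [next(backends).name for _ in range(4)]
print(routed)
```

Because the selector is evaluated against labels rather than a fixed list of addresses, pods can come and go without the caller changing the name it talks to.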

Slides 11-13

Healing

[Diagram in three build steps; surviving label: Controller manager]

Slide 14

People love automation!

Slide 16

I hate Kubernetes!

Slide 17

I hate to OPERATE Kubernetes!

Slide 18

Kubernetes Architecture

Slide 19

Operating Kubernetes

● Installation
● Upgrade
● Healing
● Scaling
● Security
● Monitoring
● ...

Slide 20

Installation

- SSH
- Install kubelet
  - $pkgmanager install kubelet
- Install container runtime
  - $pkgmanager install [docker|rkt]
- Start kubelet
  - systemctl start kubelet

Slide 21

Installation - master

- SSH
- Install scheduler
- Install controller manager
- Install API server
- Configure them correctly
- Start them

Slide 22

Installation - etcd

- SSH
- Install etcd
- Configure it correctly
- Start it

Slide 23

Installation

[Diagram labels: kops, kube-up.sh, kube-aws, ...; AWS, GCP API; node1, node2, node3]

Slide 24

Upgrade

- SSH
- Upgrade container runtime
- Upgrade kubelet

Slide 25

Upgrade - master

- SSH
- Upgrade master components

Slide 26

Upgrade - etcd

- SSH
- Upgrade etcd

Slide 27

Upgrade

[Diagram labels: kops; AWS, GCP API; node1, node2, node3]

Slide 28

Rollback

[Diagram labels: ???; AWS, GCP API; node1, node2, node3]

Slide 29

Healing

[Diagram labels: AWS, GCP API; node2, node3]

Slide 30

Healing

[Diagram labels: Create node; AWS, GCP API; node1’, node2, node3]

Slide 31

Healing

[Diagram labels: Install, Config; AWS, GCP API; node1’, node2, node3]

Slide 32

Problems

A lot of manual/semi-manual work

No standard way to approach all the problems

Do it wrong, lose the cluster!

Slide 33

Self hosting

    // gcc source code
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        compile_c(argv[1]);
    }

[Diagram labels: gcc, gcc]

Slide 34

Self hosting

    // golang source code
    package main

    import "os"

    func main() {
        compile_go(os.Args[1:])
    }

[Diagram labels: go, go]

Slide 35

Self hosting

Slides 36-37

Self hosting

    $ uname -s
    minix
    $ gcc linux.c

Slide 38

Self hosting

Slides 39-40

Self hosting

    $ uname -s
    linux
    $ gcc linux.c

Slide 41

Self-hosted Kubernetes?

Slide 42

What is self-hosted Kubernetes?

● Kubernetes manages its own core components
● Core components are deployed as native API objects

Slide 43

Self-hosted k8s Architecture

Slide 44

Why Self-host Kubernetes?

● Operational expertise around app management in k8s extends to k8s itself
  ○ E.g. scaling
● Bootstrapping simplified
● Simplified cluster lifecycle management
  ○ E.g. updates
● Upstream improvements in Kubernetes directly translate to improvements in managing Kubernetes

Slide 45

Simplify Node Bootstrap

On-host requirements become:
● Kubelet
● Container Runtime (docker, rkt, …)

Slide 46

Any Distro Node Bootstrap

● Install kubelet
  ○ $pkgmanager install kubelet
● Install container runtime
  ○ $pkgmanager install [docker|rkt]
● Write kubeconfig
  ○ scp kubeconfig user@host:/etc/kubernetes/kubeconfig
● Start kubelet
  ○ systemctl start kubelet

Slide 47

Simplify k8s lifecycle management

Manage your cluster with only kubectl

Upgrading a self-hosted Kubernetes cluster:

    $ kubectl apply -f kube-apiserver.yaml
    $ kubectl apply -f kube-scheduler.yaml
    $ kubectl apply -f kube-controller-manager.yaml
    $ kubectl apply -f kube-proxy.yaml
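
Because the core components are ordinary API objects, an upgrade amounts to editing a manifest and applying it. The sketch below illustrates that idea in Python: it builds a hypothetical API server DaemonSet manifest, retags its image, and prints JSON. Everything in the manifest is an assumption made for illustration (the `apps/v1` API group, the `k8s-app` label, the `registry.example` image path); the deck does not show the real manifests. The versions v1.4.3 and v1.4.5 are borrowed from the Version Operator slide.

```python
import copy
import json

# Hypothetical manifest: names, labels, and the image registry are illustrative.
apiserver = {
    "apiVersion": "apps/v1",
    "kind": "DaemonSet",
    "metadata": {"name": "kube-apiserver", "namespace": "kube-system"},
    "spec": {
        "selector": {"matchLabels": {"k8s-app": "kube-apiserver"}},
        "template": {
            "metadata": {"labels": {"k8s-app": "kube-apiserver"}},
            "spec": {
                "containers": [
                    {
                        "name": "kube-apiserver",
                        "image": "registry.example/kube-apiserver:v1.4.3",
                    }
                ]
            },
        },
    },
}


def with_version(manifest, version):
    """Return a copy of manifest with every container image retagged to version."""
    updated = copy.deepcopy(manifest)
    for container in updated["spec"]["template"]["spec"]["containers"]:
        repo = container["image"].rsplit(":", 1)[0]
        container["image"] = f"{repo}:{version}"
    return updated


upgraded = with_version(apiserver, "v1.4.5")
print(json.dumps(upgraded, indent=2))
```

kubectl accepts JSON as well as YAML, so the printed output could be saved to a file and passed to `kubectl apply -f` in the same way as the YAML files on the slide.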

Slide 48

Launching a self-hosted cluster

An initial control plane is needed to bootstrap a self-hosted cluster

Bootkube:

● Acts as a temporary control plane just long enough to be replaced by a self-hosted control plane.
● Runs only on the very first node, then is not needed again.

github.com/kubernetes-incubator/bootkube

Slide 49

How Bootkube Works

Slide 50

[Diagram labels: etcd, Kubelet]

Slides 51-53

[Diagram labels: Bootkube: API Server, Scheduler, Controller Manager; etcd, Kubelet]

Slide 54

[Diagram labels: Bootkube: API Server, Scheduler, Controller Manager; Create: Deployment, DaemonSet, Service, Secret; etcd, Kubelet]

Slides 55-56

[Diagram labels: Bootkube: API Server, Scheduler, Controller Manager; etcd, Kubelet; Pods: API Server, Scheduler, Controller Manager]

Slides 57-58

[Diagram labels: etcd, Kubelet; Pods: API Server, Scheduler, Controller Manager]
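
The sequence in the Bootkube diagrams can be read as a hand-off: a temporary control plane comes up, creates the self-hosted objects, and disappears once the self-hosted pods are running. The following Python sketch only simulates that ordering with sets and strings. It does not call Bootkube or Kubernetes, and the `bootkube/` and `pod/` prefixes are hypothetical naming chosen for the example.

```python
def bootstrap():
    """Simulate the hand-off from Bootkube's temporary control plane."""
    components = ["API Server", "Scheduler", "Controller Manager"]
    events = []

    # Present on the first node before Bootkube starts, per the slides.
    running = {"etcd", "Kubelet"}

    # 1. Bootkube brings up a temporary control plane.
    temporary = {f"bootkube/{c}" for c in components}
    running |= temporary
    events.append("temporary control plane up")

    # 2. Through the temporary API server, the self-hosted objects are created.
    for kind in ["Deployment", "DaemonSet", "Service", "Secret"]:
        events.append(f"created {kind}")

    # 3. The self-hosted components come up as pods.
    self_hosted = {f"pod/{c}" for c in components}
    running |= self_hosted
    events.append("self-hosted control plane up")

    # 4. Bootkube exits; only the self-hosted control plane remains.
    running -= temporary
    events.append("bootkube exited")
    return running, events


running, events = bootstrap()
print(sorted(running))
for event in events:
    print(event)
```

The point of the ordering is that the temporary components are removed only after their self-hosted replacements exist, so the cluster is never without a control plane.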

Slide 59

But wait! There’s more!

You can even self-host etcd!

https://coreos.com/blog/introducing-the-etcd-operator.html

https://github.com/coreos/etcd-operator

Slide 60

How to bootstrap self-hosted etcd

Slide 61

[Diagram labels: Bootkube; API Server, Scheduler, Controller Manager; etcd; Kubelet]

Slide 62

[Diagram labels: Bootkube; API Server, Scheduler, Controller Manager; etcd; Kubelet; Pods: API Server, Scheduler, Controller Manager, etcd operator]

Slide 63

[Diagram labels: Bootkube; API Server, Scheduler, Controller Manager; etcd; Kubelet; Seed node; Pods: API Server, Scheduler, Controller Manager, etcd operator]

Slide 64

[Diagram labels: Bootkube; API Server, Scheduler, Controller Manager; etcd; Kubelet; Pods: API Server, Scheduler, Controller Manager, etcd operator, etcd; Add Member]

Slide 65

[Diagram labels: Bootkube; API Server, Scheduler, Controller Manager; etcd; Kubelet; Remove member; Pods: API Server, Scheduler, Controller Manager, etcd operator, etcd]

Slide 66

[Diagram labels: Kubelet; Pods: API Server, Scheduler, Controller Manager, etcd operator, etcd]
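
The "Seed node", "Add Member", and "Remove member" labels in the diagrams above suggest a membership pivot: the self-hosted etcd joins the seed member first, and the seed is removed only afterwards. The Python sketch below simulates just that ordering. The member names `seed` and `self-hosted-0` are hypothetical, and the real etcd operator involves more than this (for instance, a new member has to receive the data before the seed can go away).

```python
def pivot_etcd():
    """Simulate moving etcd membership from the seed member to a self-hosted pod."""
    members = ["seed"]                # etcd running on the seed node
    history = [list(members)]

    members.append("self-hosted-0")   # "Add Member": the self-hosted etcd pod joins
    history.append(list(members))

    members.remove("seed")            # "Remove member": the seed etcd leaves
    history.append(list(members))
    return history


history = pivot_etcd()
for step, members in enumerate(history):
    print(step, members)
```

Adding before removing means the membership list is never empty at any step, which is the property the pivot depends on.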

Slide 67

Disaster Recovery

● Node failure in HA deployments (Kubernetes)
● Partial loss of control plane components (Kubernetes)
● Power cycling the entire control plane (Kubernetes)
● Permanent loss of control plane (External tool)

Slide 68

Disaster Recovery

Permanent loss of control plane

● Similar situation to initial node bootstrap, but utilizing existing etcd state or etcd backup.
● Need to start a temporary replacement api-server
  ○ Could be binary, static pod, new tool, bootkube, etc.
● Recovery once etcd+api is available can be done via kubectl (as seen previously)

Slide 69

Self-Driving Kubernetes

Slide 70

Self driving

- A self-hosted cluster launched via Bootkube
- Upgraded via Kubernetes APIs and an Operator
- Automation is single-button or fully automatic

Slide 71

Kubernetes Version Operator

Cluster is running v1.4.3 and configured to run v1.4.5
● API Server is v1.4.3
● Scheduler is v1.4.3

Differences from desired config
● API Server should be v1.4.5
● Scheduler should be v1.4.5

How to get there
● Upgrade all API server daemons to v1.4.5 safely, one by one
● Upgrade all Scheduler Deployments to v1.4.5
● Update status to v1.4.5
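
The three parts of this slide (observed state, differences from the desired config, how to get there) follow the usual shape of a reconcile step. A minimal Python sketch of that comparison is shown below. The function name `plan_upgrade` and the step strings are hypothetical, and the sketch only produces a plan; it does not perform the safe one-by-one rollout the slide describes.

```python
def plan_upgrade(current, desired):
    """Return the ordered steps needed to move every component to desired."""
    steps = []
    for component in ("API Server", "Scheduler"):
        if current[component] != desired:
            steps.append(f"upgrade {component} {current[component]} -> {desired}")
    # The status is updated only when something actually had to change.
    if steps:
        steps.append(f"update status to {desired}")
    return steps


# Observed state and desired version, taken from the slide.
current = {"API Server": "v1.4.3", "Scheduler": "v1.4.3"}
steps = plan_upgrade(current, "v1.4.5")
for step in steps:
    print(step)
```

When the observed versions already match the desired one, the plan is empty, so running the same comparison repeatedly is harmless.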

Slide 72

The infrastructure

● Workload driven
● Automation driven
● Easy to manage: self driving approach (Today’s topic)
● Security focused

Slide 73

Xiang Li

xiang.li@coreos.com

Thank you!