
A Detailed Look at How Kubernetes Services Are Implemented

  • 2019-12-03


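In the previous article we looked at how Pods are implemented in Kubernetes; a Pod is a very lightweight object in Kubernetes.

Every Pod in a cluster can be reached directly through its podIP. As we have seen, however, Pods in Kubernetes are objects with a lifecycle. Pods managed by a ReplicaSet, Deployment, or similar object in particular can be destroyed and recreated at any time as the state of the cluster changes.

This raises an interesting problem: when some Pods in a Kubernetes cluster need to serve other Pods, how do we build an abstraction over the group of Pods that provide the same functionality, and how do we track the health of the members of that group?

In Kubernetes that abstraction is the Service. Every Kubernetes Service is an abstraction over a logical set of Pods and the way to access them, and we can also call a Service together with the Pods behind it a microservice.

This article explains how Service is implemented in two parts: the first covers how Kubernetes handles the creation of a Service, and the second covers how it forwards traffic that originates inside and outside a node.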

Creating a Service

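Creating a new Service object in Kubernetes requires two modules to cooperate. The first is the controller: every time a client creates a new Service object, it generates the other Kubernetes object used to expose a group of Pods, namely the Endpoint object. The second is kube-proxy, which runs on every node of the cluster and updates the rules stored in iptables or ipvs on that node as Service and Endpoint objects change.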

Controllers

Two parts of the controller module listen for Service change events: the ServiceController and the EndpointController. We look at how each of them reacts to Service changes in turn.

Service

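Let us first see what happens in the ServiceController when a Service object changes. Whenever a Service is created or destroyed, the Informer notifies the ServiceController, which puts the task on a work queue that is consumed by Worker goroutines the controller itself starts: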


[Sequence diagram. Participants: Informer, ServiceController, WorkQueue, Balancer. Messages: Add/Update/Delete Service; Add; return; then, in a loop for each Worker: Get; key; syncService; EnsureLoadBalancer; LoadBalancerStatus.]
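The event-to-queue-to-worker flow described above can be illustrated with a small standard-library sketch. This is not the client-go work queue; `keyQueue`, `drain`, and the keys used in `main` are hypothetical names chosen for illustration, and the only behavior modeled is that repeated events for the same Service collapse into one pending key.

```go
package main

import "fmt"

// keyQueue is a toy stand-in for a controller work queue: it stores
// "namespace/name" keys and drops duplicates that are still pending.
type keyQueue struct {
	pending map[string]bool
	order   []string
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already waiting to be processed.
func (q *keyQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// drain plays the role of a worker: it pops keys until the queue is
// empty, calls sync for each one, and returns how many keys it handled.
func drain(q *keyQueue, sync func(key string)) int {
	n := 0
	for {
		key, ok := q.Get()
		if !ok {
			return n
		}
		sync(key)
		n++
	}
}

func main() {
	q := newKeyQueue()
	// Three events arrive, two of them for the same Service.
	q.Add("default/web")
	q.Add("default/db")
	q.Add("default/web")

	n := drain(q, func(key string) { fmt.Println("sync", key) })
	fmt.Println("synced", n, "keys")
}
```

Queuing keys rather than whole objects means a worker always syncs against the latest state it can look up, which is why duplicate events can safely be merged.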


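The ServiceController, however, only handles Service objects of the load balancer type. It calls the cloud provider's API, and each cloud provider implements its own adapter to create LoadBalancer resources.

Taking GCE as an example, here is a brief look at how Google Cloud implements a Service of the load balancer type: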


```go
// Abridged excerpt; error handling is elided.
func (g *Cloud) EnsureLoadBalancer(ctx context.Context, clusterName string, svc *v1.Service, nodes []*v1.Node) (*v1.LoadBalancerStatus, error) {
	loadBalancerName := g.GetLoadBalancerName(ctx, clusterName, svc)
	desiredScheme := getSvcScheme(svc)
	clusterID, _ := g.ClusterID.GetID()

	existingFwdRule, _ := g.GetRegionForwardingRule(loadBalancerName, g.region)

	// If a forwarding rule already exists with a different scheme,
	// delete the old load balancer before creating the new one.
	if existingFwdRule != nil {
		existingScheme := cloud.LbScheme(strings.ToUpper(existingFwdRule.LoadBalancingScheme))
		if existingScheme != desiredScheme {
			switch existingScheme {
			case cloud.SchemeInternal:
				g.ensureInternalLoadBalancerDeleted(clusterName, clusterID, svc)
			default:
				g.ensureExternalLoadBalancerDeleted(clusterName, clusterID, svc)
			}
			existingFwdRule = nil
		}
	}

	var (
		status *v1.LoadBalancerStatus
		err    error
	)
	switch desiredScheme {
	case cloud.SchemeInternal:
		status, err = g.ensureInternalLoadBalancer(clusterName, clusterID, svc, existingFwdRule, nodes)
	default:
		status, err = g.ensureExternalLoadBalancer(clusterName, clusterID, svc, existingFwdRule, nodes)
	}
	return status, err
}
```


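The code above first decides whether an existing load balancer resource has to be deleted, and then calls an internal method, ensureExternalLoadBalancer (or ensureInternalLoadBalancer for the internal scheme), to create a new resource on Google Cloud. That method goes through a fairly involved sequence of steps:


  1. Check whether the forwarding rule exists and get its IP address;

  2. Determine the IP address the LoadBalancer currently uses;

  3. Create and update the firewall rules;

  4. Create and delete the specified health checks;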


To learn how GCE supports LoadBalancer, read the code in the gce package in Kubernetes, which contains GCE's implementation of the cloud-provider-specific resources.
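The delete-then-recreate decision made in EnsureLoadBalancer can be reduced to a few lines. The sketch below is only an illustration under stated assumptions: `fakeCloud` is a hypothetical in-memory provider that records the operations a real adapter would send to the cloud API, and none of its names come from the gce package.

```go
package main

import "fmt"

// scheme mirrors the internal/external distinction used by the excerpt.
type scheme string

const (
	schemeInternal scheme = "INTERNAL"
	schemeExternal scheme = "EXTERNAL"
)

// fakeCloud is a hypothetical provider that records calls instead of
// talking to a real cloud API.
type fakeCloud struct {
	existing scheme   // scheme of the current load balancer, "" if none
	calls    []string // operations in the order they were issued
}

// ensureLoadBalancer deletes a load balancer of the wrong scheme, if one
// exists, and then creates or updates one with the desired scheme.
func (c *fakeCloud) ensureLoadBalancer(desired scheme) {
	if c.existing != "" && c.existing != desired {
		c.calls = append(c.calls, "delete "+string(c.existing))
		c.existing = ""
	}
	c.calls = append(c.calls, "ensure "+string(desired))
	c.existing = desired
}

func main() {
	c := &fakeCloud{existing: schemeInternal}
	c.ensureLoadBalancer(schemeExternal) // scheme changed: delete, then ensure
	c.ensureLoadBalancer(schemeExternal) // scheme unchanged: ensure only
	for _, call := range c.calls {
		fmt.Println(call)
	}
}
```

Because the function only describes the desired end state, calling it repeatedly with the same scheme is harmless, which suits a controller that may sync the same Service many times.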

Endpoint

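The ServiceController mainly deals with LoadBalancer-related logic, but the EndpointController's job is not that simple. Although we rarely work with Endpoint resources directly when using Kubernetes, they are a very important part of it.


The EndpointController does not use an Informer to watch the Endpoint resources themselves, but it subscribes to add and delete events for both Service and Pod resources. For Service resources, the EndpointController handles events as follows: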


[Sequence diagram. Participants: Informer, EndpointController, WorkQueue, PodLister, Client. Messages: Add/Update/Delete Service; Add; return; then, in a loop for each Worker: Get; key; syncService; ListPod(service.Spec.Selector); Pods; in a loop for every Pod: addEndpointSubset; then Create/Update Endpoint; result.]


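The syncService method of the EndpointController is the most important method for creating and deleting Endpoint resources. In this method we use the Selector in the Service spec to fetch all matching Pods in the cluster, and map the ports on the Service to the ports on each Pod to produce an EndpointPort struct: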


```go
// Abridged excerpt; error handling is elided.
func (e *EndpointController) syncService(key string) error {
	namespace, name, _ := cache.SplitMetaNamespaceKey(key)
	service, _ := e.serviceLister.Services(namespace).Get(name)
	pods, _ := e.podLister.Pods(service.Namespace).List(labels.Set(service.Spec.Selector).AsSelectorPreValidated())

	subsets := []v1.EndpointSubset{}
	for _, pod := range pods {
		epa := *podToEndpointAddress(pod)

		for i := range service.Spec.Ports {
			servicePort := &service.Spec.Ports[i]

			portName := servicePort.Name
			portProto := servicePort.Protocol
			portNum, _ := podutil.FindPort(pod, servicePort)

			epp := &v1.EndpointPort{Name: portName, Port: int32(portNum), Protocol: portProto}
			// tolerateUnreadyEndpoints is not defined in this excerpt.
			subsets, _, _ = addEndpointSubset(subsets, pod, epa, epp, tolerateUnreadyEndpoints)
		}
	}
	subsets = endpoints.RepackSubsets(subsets)

	currentEndpoints := &v1.Endpoints{
		ObjectMeta: metav1.ObjectMeta{
			Name:   service.Name,
			Labels: service.Labels,
		},
	}

	newEndpoints := currentEndpoints.DeepCopy()
	newEndpoints.Subsets = subsets
	newEndpoints.Labels = service.Labels
	e.client.CoreV1().Endpoints(service.Namespace).Create(newEndpoints)

	return nil
}
```


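For every Pod a new EndpointSubset is generated, containing the Pod's IP address and port together with the incoming port and target port specified in the Service spec. Finally the EndpointSubset data is repacked and a new Endpoint resource is created through the client.


As mentioned above, besides Service changes, adding or deleting Pod objects also triggers callbacks in the EndpointController.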
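To make the selector matching and port mapping concrete, here is a minimal standard-library sketch. The `pod`, `servicePort`, and `endpoint` types are hypothetical, pared-down stand-ins for the real API types, and the sketch assumes the target port is always referenced by container port name.

```go
package main

import "fmt"

// pod is a pared-down, hypothetical stand-in for v1.Pod.
type pod struct {
	ip     string
	labels map[string]string
	ports  map[string]int // container port name -> port number
}

// servicePort is a pared-down stand-in for v1.ServicePort.
type servicePort struct {
	name       string
	targetPort string // name of the container port to send traffic to
}

// endpoint pairs one Pod address with one resolved port.
type endpoint struct {
	ip       string
	portName string
	port     int
}

// selects reports whether labels carries every key/value in selector.
func selects(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// buildEndpoints resolves every Service port against every selected Pod.
func buildEndpoints(selector map[string]string, ports []servicePort, pods []pod) []endpoint {
	var eps []endpoint
	for _, p := range pods {
		if !selects(selector, p.labels) {
			continue
		}
		for _, sp := range ports {
			num, ok := p.ports[sp.targetPort]
			if !ok {
				continue // this Pod does not expose the target port
			}
			eps = append(eps, endpoint{ip: p.ip, portName: sp.name, port: num})
		}
	}
	return eps
}

func main() {
	pods := []pod{
		{ip: "10.0.0.1", labels: map[string]string{"app": "web"}, ports: map[string]int{"http": 8080}},
		{ip: "10.0.0.2", labels: map[string]string{"app": "web"}, ports: map[string]int{"http": 9090}},
		{ip: "10.0.0.3", labels: map[string]string{"app": "db"}, ports: map[string]int{"sql": 5432}},
	}
	eps := buildEndpoints(
		map[string]string{"app": "web"},
		[]servicePort{{name: "http", targetPort: "http"}},
		pods,
	)
	for _, ep := range eps {
		fmt.Printf("%s:%d (%s)\n", ep.ip, ep.port, ep.portName)
	}
}
```

Note that the two selected Pods resolve the same named port to different numbers, which is why the port has to be looked up per Pod rather than once per Service.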


[Sequence diagram. Participants: Informer, EndpointController, WorkQueue, ServiceLister. Messages: Add/Update/Delete Pod; GetPodServices; []Service; Add; return.]


getPodServiceMemberships finds the Service objects related to the current Pod and converts each of them into a string of the form <namespace>/<name>:


```go
func (e *EndpointController) getPodServiceMemberships(pod *v1.Pod) (sets.String, error) {
	set := sets.String{}
	services, _ := e.serviceLister.GetPodServices(pod)

	for i := range services {
		key, _ := controller.KeyFunc(services[i])
		set.Insert(key)
	}
	return set, nil
}
```


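These Services are then added to the EndpointController's queue, where they wait for one of its Workers to sync them.


That is the whole job of the EndpointController: subscribe to changes to Pod and Service objects, and generate Endpoint objects from the objects currently in the cluster to tie the two together.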
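The Pod-to-Service direction of this lookup can be sketched as follows. The `service` type and `podServiceKeys` function are hypothetical; the sketch assumes that only Services in the Pod's own namespace are considered and that a Service without a selector matches no Pods.

```go
package main

import (
	"fmt"
	"sort"
)

// service is a pared-down, hypothetical stand-in for v1.Service.
type service struct {
	namespace string
	name      string
	selector  map[string]string
}

// podServiceKeys returns the sorted "namespace/name" keys of the Services
// in the Pod's namespace whose selector matches the Pod's labels.
func podServiceKeys(podNamespace string, podLabels map[string]string, services []service) []string {
	seen := map[string]bool{}
	for _, svc := range services {
		// Assumption: other namespaces and empty selectors never match.
		if svc.namespace != podNamespace || len(svc.selector) == 0 {
			continue
		}
		matched := true
		for k, want := range svc.selector {
			if got, ok := podLabels[k]; !ok || got != want {
				matched = false
				break
			}
		}
		if matched {
			seen[svc.namespace+"/"+svc.name] = true
		}
	}
	keys := make([]string, 0, len(seen))
	for k := range seen {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	services := []service{
		{namespace: "default", name: "web", selector: map[string]string{"app": "web"}},
		{namespace: "default", name: "front", selector: map[string]string{"app": "web", "tier": "front"}},
		{namespace: "prod", name: "web", selector: map[string]string{"app": "web"}},
		{namespace: "default", name: "manual"}, // no selector
	}
	labels := map[string]string{"app": "web", "tier": "front"}
	// Each key printed here would be added to the controller's queue.
	for _, key := range podServiceKeys("default", labels, services) {
		fmt.Println(key)
	}
}
```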

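Proxy

The other component in the cluster that subscribes to Service changes is kube-proxy. Whenever kube-proxy starts on a new node it initializes a ServiceConfig object; as described in the introduction to the iptables proxy mode, this object receives Service change events: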


[Sequence diagram. Participants: ServiceChangeTracker, ServiceConfig, Proxier, EndpointConfig, EndpointChangeTracker, SyncRunner. Messages: OnServiceAdd/Update/Delete/Synced; Update; Return ServiceMap; OnEndpointsAdd/Update/Delete/Synced; Return EndpointMap; Update; then, in a loop every minSyncPeriod ~ syncPeriod: syncProxyRules.]


The ServiceConfig and EndpointConfig objects, which subscribe to object changes in the cluster, push all of these change events to the running Proxier instance:


```go
func (c *ServiceConfig) handleAddService(obj interface{}) {
	service, ok := obj.(*v1.Service)
	if !ok {
		return
	}
	for i := range c.eventHandlers {
		c.eventHandlers[i].OnServiceAdd(service)
	}
}
```


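After receiving an event, the Proxier instance updates the rules in iptables or ipvs according to the configuration it was started with. These tools are ultimately responsible for forwarding inbound and outbound traffic and for handling load-balancing tasks.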
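The fan-out from a config object to its registered handlers can be sketched with a plain Go interface. All names below (`serviceHandler`, `serviceConfig`, `recordingProxier`) are hypothetical and only mirror the shape of the pattern; the real kube-proxy types carry many more methods.

```go
package main

import "fmt"

// service is a pared-down, hypothetical stand-in for v1.Service.
type service struct {
	namespace, name string
}

// serviceHandler is implemented by anything that wants Service events.
type serviceHandler interface {
	OnServiceAdd(svc *service)
}

// serviceConfig forwards each event to every registered handler.
type serviceConfig struct {
	handlers []serviceHandler
}

func (c *serviceConfig) register(h serviceHandler) {
	c.handlers = append(c.handlers, h)
}

// handleAddService ignores objects of an unexpected type and passes
// Services on to the handlers in registration order.
func (c *serviceConfig) handleAddService(obj interface{}) {
	svc, ok := obj.(*service)
	if !ok {
		return
	}
	for _, h := range c.handlers {
		h.OnServiceAdd(svc)
	}
}

// recordingProxier stands in for a proxier; it only remembers what it saw.
type recordingProxier struct {
	added []string
}

func (p *recordingProxier) OnServiceAdd(svc *service) {
	p.added = append(p.added, svc.namespace+"/"+svc.name)
}

func main() {
	cfg := &serviceConfig{}
	proxier := &recordingProxier{}
	cfg.register(proxier)

	cfg.handleAddService(&service{namespace: "default", name: "web"})
	cfg.handleAddService("not a service") // silently ignored
	fmt.Println(proxier.added)
}
```

Keeping the handler behind an interface is what lets the same event source drive a userspace, iptables, or ipvs proxier without knowing which one is running.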

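Proxy modes

Every node in a Kubernetes cluster runs a kube-proxy process. It watches the Kubernetes master for Services being added and removed and updates the configuration of the running proxy, providing traffic forwarding and load balancing for clients on the node. kube-proxy currently has three proxy modes: userspace, iptables, and ipvs.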



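The first of the three, userspace, is a proxy that runs in user space: all traffic is ultimately forwarded to other services by kube-proxy itself. The other two, iptables and ipvs, run in kernel space and give a Kubernetes cluster much better performance.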

userspace

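As a proxy running in user space, kube-proxy opens a port on the current node for every Service. Every request that connects to this proxy port is forwarded to the group of Pods behind the Service. Under the hood it adds iptables rules on the node, and iptables redirects the traffic to kube-proxy for handling.


If kube-proxy on the current node was started in userspace mode, then each time a new Service is created kube-proxy adds an iptables record and starts a goroutine. The former redirects traffic sent by services on the node to kube-proxy, and the latter, through the goroutines it holds, forwards the traffic to the target Pods.



Most of this work is done when OnServiceAdd is triggered. As described above, the method calls mergeService, which turns each port of the incoming Service into an iptables command that adds a rule to the current node, and starts a TCP or UDP socket in the addServiceOnPort method: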


Go


func (proxier *Proxier) mergeService(service *v1.Service) sets.String {
	svcName := types.NamespacedName{Namespace: service.Namespace, Name: service.Name}
	existingPorts := sets.NewString()
	for i := range service.Spec.Ports {
		servicePort := &service.Spec.Ports[i]
		serviceName := proxy.ServicePortName{NamespacedName: svcName, Port: servicePort.Name}
		existingPorts.Insert(servicePort.Name)

		// Tear down any proxy that already exists for this Service port.
		info, exists := proxier.getServiceInfo(serviceName)
		if exists {
			proxier.closePortal(serviceName, info)
			proxier.stopProxy(serviceName, info)
		}

		// Allocate a local port and start a proxy socket on it
		// (error handling is omitted in this excerpt).
		proxyPort, _ := proxier.proxyPorts.AllocateNext()
		serviceIP := net.ParseIP(service.Spec.ClusterIP)
		info, _ = proxier.addServiceOnPort(serviceName, servicePort.Protocol, proxyPort, proxier.udpIdleTimeout)
		info.portal.ip = serviceIP
		info.portal.port = int(servicePort.Port)
		info.externalIPs = service.Spec.ExternalIPs
		info.loadBalancerStatus = *service.Status.LoadBalancer.DeepCopy()
		info.nodePort = int(servicePort.NodePort)
		info.sessionAffinityType = service.Spec.SessionAffinity

		// Install the iptables rules that redirect traffic to the proxy port.
		proxier.openPortal(serviceName, info)
		proxier.loadBalancer.NewService(serviceName, info.sessionAffinityType, info.stickyMaxAgeSeconds)
	}
	return existingPorts
}


The proxy started this way listens on the node for TCP and UDP requests redirected from any process on that node and sends the packets on to the target Pods.


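In userspace mode, if a connection is refused by the target backend, the proxy can retry against another backend. Apart from that, userspace mode has few advantages.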

iptables

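The other common proxy mode uses iptables directly to forward all traffic on the current node. Moving forwarding out of user space and into kernel space greatly improves the efficiency of the proxy and raises kube-proxy's throughput.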



As a proxy mode, iptables likewise implements methods such as OnServiceUpdate and OnEndpointsUpdate, each of which calls the corresponding change tracker.


[Sequence diagram: ServiceConfig calls OnServiceAdd and OnServiceUpdate on the Proxier, which calls Update on the ServiceChangeTracker and gets back a ServiceMap; every minSyncPeriod ~ syncPeriod the SyncRunner runs syncProxyRules, which calls UpdateChain, writeLine x N, and RestoreAll against iptables.]


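The change trackers, such as ServiceChangeTracker, update their own state from the before and after versions of Service and Endpoint objects. These changes are synchronized periodically by syncProxyRules, a huge method of roughly 700 lines. We will not go through its implementation here; its main job is to generate iptables rules one by one from the changes to Service and Endpoint objects. Interested readers can look at the code at proxier.go#L640-1379.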
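
The idea of accumulating changes between two syncs can be sketched as follows. This is a simplified model under stated assumptions, not the Kubernetes implementation: the `changeTracker` type, its `apply` method, and the use of a plain string in place of real Service data are all illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// change keeps the state of one Service before the first pending event and
// after the latest one. A nil pointer means "does not exist".
type change struct {
	previous, current *string // the string stands in for real Service data
}

// changeTracker (illustrative) accumulates changes between two syncs.
type changeTracker struct {
	mu    sync.Mutex
	items map[string]*change
}

func newChangeTracker() *changeTracker {
	return &changeTracker{items: map[string]*change{}}
}

func equal(a, b *string) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// Update records a transition and reports whether any change is pending.
func (t *changeTracker) Update(name string, previous, current *string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	c, ok := t.items[name]
	if !ok {
		c = &change{previous: previous} // remember the oldest known state
		t.items[name] = c
	}
	c.current = current
	if equal(c.previous, c.current) {
		delete(t.items, name) // the events cancelled each other out
	}
	return len(t.items) > 0
}

// apply merges all pending changes into the service map and resets the
// tracker; a periodic sync would call it before regenerating rules.
func (t *changeTracker) apply(services map[string]string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for name, c := range t.items {
		if c.current == nil {
			delete(services, name)
		} else {
			services[name] = *c.current
		}
	}
	t.items = map[string]*change{}
}

func strPtr(s string) *string { return &s }

func main() {
	tracker := newChangeTracker()
	services := map[string]string{}

	tracker.Update("default/web", nil, strPtr("10.0.0.1:80"))
	tracker.Update("default/web", strPtr("10.0.0.1:80"), strPtr("10.0.0.1:8080"))
	tracker.Update("default/db", nil, strPtr("10.0.0.2:5432"))
	tracker.Update("default/db", strPtr("10.0.0.2:5432"), nil) // created, then deleted

	tracker.apply(services)
	fmt.Println(services)
}
```

Batching events this way means that a burst of updates between two syncs costs a single rule rewrite instead of one rewrite per event.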


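When the node's proxy is started in iptables mode, all traffic first passes through the PREROUTING or OUTPUT chain, then enters KUBE-SERVICES, the entry point of the Kubernetes custom chains, followed by the per-Service chain KUBE-SVC-XXXX and the per-Pod chain KUBE-SEP-XXXX. Only after being processed by these chains does it reach the real IP address of a backend.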
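
To spread traffic evenly from a per-Service chain across its per-Pod chains, kube-proxy's generated rules rely on the iptables `statistic` match in random mode. The sketch below assumes that scheme and shows why rule i needs probability 1/(n-i). The chain names (`KUBE-SVC-EXAMPLE`, `KUBE-SEP-A`, and so on) and the exact rule formatting are made up for illustration.

```go
package main

import "fmt"

// endpointProbabilities returns the probability to attach to each rule so
// that every one of n endpoints is chosen with overall probability 1/n.
// Rule i is only evaluated when rules 0..i-1 did not match, hence 1/(n-i).
func endpointProbabilities(n int) []float64 {
	probs := make([]float64, n)
	for i := range probs {
		probs[i] = 1.0 / float64(n-i)
	}
	return probs
}

// serviceRules renders simplified, illustrative rules for one Service chain.
func serviceRules(svcChain string, sepChains []string) []string {
	probs := endpointProbabilities(len(sepChains))
	rules := make([]string, 0, len(sepChains))
	for i, sep := range sepChains {
		if i == len(sepChains)-1 {
			// The last rule always matches, so it carries no probability.
			rules = append(rules, fmt.Sprintf("-A %s -j %s", svcChain, sep))
			continue
		}
		rules = append(rules, fmt.Sprintf(
			"-A %s -m statistic --mode random --probability %.5f -j %s",
			svcChain, probs[i], sep))
	}
	return rules
}

func main() {
	seps := []string{"KUBE-SEP-A", "KUBE-SEP-B", "KUBE-SEP-C"} // hypothetical names
	for _, r := range serviceRules("KUBE-SVC-EXAMPLE", seps) {
		fmt.Println(r)
	}
}
```

With three endpoints the first rule matches a third of the packets, the second matches half of the remainder, and the last takes whatever is left, so each endpoint receives one third overall.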


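Although iptables, running in kernel mode, raises the proxy's throughput compared with user space, it cannot meet production-level availability requirements when the cluster becomes very large, because every rule match traverses all of the Service chains in iptables.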


Rule updates are not incremental either. With 5,000 Services in a cluster, adding a single rule takes 11 minutes; with 20,000 Services it takes 5 hours. In other words, iptables is effectively unusable as the proxy mode in large clusters.

ipvs

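ipvs exists to solve the performance problem of iptables rule synchronization becoming unusable with a large number of Services. Like iptables, the ipvs implementation is built on netfilter hook functions, but it uses hash tables as its underlying data structure and works in kernel mode, so ipvs performs better both when redirecting traffic and when synchronizing proxy rules.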
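
The difference between walking a rule list and looking up a hash table can be illustrated with a toy model. Everything here is an assumption made for the example: the `rule` type, the address layout, and the two match functions only mimic the access pattern and have nothing to do with the kernel data structures themselves.

```go
package main

import "fmt"

// rule is a toy stand-in for one Service rule.
type rule struct {
	vip     string // "ClusterIP:port" the rule matches
	backend string
}

// matchLinear walks the rules in order, the way a packet traverses a chain,
// and also reports how many rules were inspected.
func matchLinear(rules []rule, vip string) (string, int) {
	for i, r := range rules {
		if r.vip == vip {
			return r.backend, i + 1
		}
	}
	return "", len(rules)
}

// matchHash finds the same entry with a single hash-table lookup.
func matchHash(table map[string]string, vip string) (string, bool) {
	backend, ok := table[vip]
	return backend, ok
}

func main() {
	const n = 5000
	rules := make([]rule, 0, n)
	table := make(map[string]string, n)
	for i := 0; i < n; i++ {
		vip := fmt.Sprintf("10.96.%d.%d:80", i/256, i%256)
		backend := fmt.Sprintf("backend-%d", i)
		rules = append(rules, rule{vip: vip, backend: backend})
		table[vip] = backend
	}

	target := rules[n-1].vip // worst case for the linear scan
	fromList, inspected := matchLinear(rules, target)
	fromTable, _ := matchHash(table, target)
	fmt.Println("same backend:", fromList == fromTable, "rules inspected:", inspected)
}
```

The linear scan inspects a number of rules proportional to the number of Services, while the hash lookup cost stays roughly constant as the table grows.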



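When handling Service changes, the ipvs package is very similar to iptables: both use a ServiceChangeTracker to track changes, and they differ only in how syncProxyRules, the method that synchronizes changes, is implemented.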


[Sequence diagram: every minSyncPeriod ~ syncPeriod the SyncRunner triggers the Proxier's syncProxyRules, which calls writeLine on iptables, Add/UpdateVirtualServer (syncService) and AddRealServer (syncEndpoint) on ipvs, and finally RestoreAll on iptables.]


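From the ipvs source code and the sequence diagram above, we can see that the Kubernetes ipvs implementation actually depends on iptables, which helps it with some of its work. Compared with iptables mode, ipvs mode reduces the number of iptables rules on a node, because ipvs takes over rules that were previously stored in iptables.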


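Besides better performance, ipvs offers several load-balancing algorithms. In addition to the most common one, round-robin, it supports least connection, destination hashing, shortest expected delay, and others, which can noticeably improve load-balancing efficiency.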
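
Two of these strategies are simple enough to sketch directly. The code below is a simplified model and not the kernel schedulers: it ignores server weights, assumes a non-empty server list, and uses the illustrative names `realServer`, `roundRobin`, and `leastConn`.

```go
package main

import "fmt"

// realServer is a toy model of a backend behind a virtual server.
type realServer struct {
	addr        string
	activeConns int
}

// roundRobin cycles through the servers in order.
type roundRobin struct{ next int }

// pick returns the next server in the cycle; servers must not be empty.
func (rr *roundRobin) pick(servers []*realServer) *realServer {
	s := servers[rr.next%len(servers)]
	rr.next++
	return s
}

// leastConn returns the server with the fewest active connections,
// preferring the earliest one on ties; servers must not be empty.
func leastConn(servers []*realServer) *realServer {
	best := servers[0]
	for _, s := range servers[1:] {
		if s.activeConns < best.activeConns {
			best = s
		}
	}
	return best
}

func main() {
	servers := []*realServer{
		{addr: "10.244.1.5:8080", activeConns: 4},
		{addr: "10.244.2.7:8080", activeConns: 1},
		{addr: "10.244.3.9:8080", activeConns: 3},
	}

	rr := &roundRobin{}
	for i := 0; i < 4; i++ {
		fmt.Println("round-robin:", rr.pick(servers).addr)
	}
	fmt.Println("least-connection:", leastConn(servers).addr)
}
```

Round-robin ignores how busy each backend is, whereas least connection steers new connections toward the backend with the lightest current load, which matters when connections are long-lived or uneven.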

Recap

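The three proxy modes reflect a gradual evolution: from the original userspace mode, which runs in user space and has to listen on ports and forward packets "by hand", to the iptables mode running in kernel space, and then to the ipvs mode, which reached beta in Kubernetes 1.9 and became generally available in 1.11. With a large number of Services the modes differ in efficiency by orders of magnitude. Note that iptables remains kube-proxy's default mode on Linux, so ipvs has to be selected explicitly.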

Summary

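A Kubernetes Service exposes a group of Pods as a single service in a uniform way, and uses iptables or ipvs in kernel space to forward traffic from inside and outside the node efficiently. Beyond that, as a very important Kubernetes object, Service not only provides the logical notion of a microservice but also introduces the LoadBalancer type, which integrates seamlessly with the complex resources offered by cloud providers.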


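Understanding the Service object helps us map out the network topology inside a cluster and see more clearly how service discovery and load balancing are implemented there. In a later article we will look at the role and implementation of kube-proxy in more detail.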

This article is reproduced from the Draveness technical blog.


Original link: https://draveness.me/kubernetes-service


2019-12-03 15:22
