Velero Series (Part 4): A Hands-On Production Migration with Velero
Source: cnblogs  Author: 东风微鸣  Date: 2022/12/12 9:51:40

Overview

Goal

Use the velero tool to achieve the following overall goal:

  • Migrate a specific namespace between the two clusters, from B to A.

The specific objectives are:

  1. Install velero (including restic) on clusters B and A.
  2. Back up the specific namespace caseycui2020 on cluster B:
    1. Back up resources such as Deployments, ConfigMaps, etc.;
      1. Before backing up, exclude the YAML of specific Secrets.
    2. Back up volume data (implemented via restic);
      1. Use the "opt-in" approach to back up only specific pod volumes.
  3. Migrate the specific namespace caseycui2020 to cluster A:
    1. Migrate resources, using "include" to migrate only specific resources;
    2. Migrate volume data (implemented via restic).

Installation

  1. Create a Velero-specific credentials file (credentials-velero) in your local directory:

    XSKY object storage is used here (the company's NetApp object storage is not compatible):

    [default]
    aws_access_key_id = xxxxxxxxxxxxxxxxxxxxxxxx
    aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

  2. (OpenShift) First create the velero namespace: oc new-project velero

  3. By default, user-created OpenShift namespaces do not schedule pods on all nodes in the cluster.

    To schedule the namespace's pods on all nodes, an annotation is needed:

    oc annotate namespace velero openshift.io/node-selector=""

    This should be done before installing velero.

  4. Start the server and the storage services. In the Velero directory, run:

    velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket velero --secret-file ./credentials-velero --use-restic --use-volume-snapshots=true --backup-location-config region="default",s3ForcePathStyle="true",s3Url="http://glacier.ewhisper.cn",insecureSkipTLSVerify="true",signatureVersion="4" --snapshot-location-config region="default"

    The resources created include:

    CustomResourceDefinition/backups.velero.io: attempting to create resource
    CustomResourceDefinition/backups.velero.io: created
    CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
    CustomResourceDefinition/backupstoragelocations.velero.io: created
    CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
    CustomResourceDefinition/deletebackuprequests.velero.io: created
    CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
    CustomResourceDefinition/downloadrequests.velero.io: created
    CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
    CustomResourceDefinition/podvolumebackups.velero.io: created
    CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
    CustomResourceDefinition/podvolumerestores.velero.io: created
    CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource
    CustomResourceDefinition/resticrepositories.velero.io: created
    CustomResourceDefinition/restores.velero.io: attempting to create resource
    CustomResourceDefinition/restores.velero.io: created
    CustomResourceDefinition/schedules.velero.io: attempting to create resource
    CustomResourceDefinition/schedules.velero.io: created
    CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
    CustomResourceDefinition/serverstatusrequests.velero.io: created
    CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
    CustomResourceDefinition/volumesnapshotlocations.velero.io: created
    Waiting for resources to be ready in cluster...
    Namespace/velero: attempting to create resource
    Namespace/velero: created
    ClusterRoleBinding/velero: attempting to create resource
    ClusterRoleBinding/velero: created
    ServiceAccount/velero: attempting to create resource
    ServiceAccount/velero: created
    Secret/cloud-credentials: attempting to create resource
    Secret/cloud-credentials: created
    BackupStorageLocation/default: attempting to create resource
    BackupStorageLocation/default: created
    VolumeSnapshotLocation/default: attempting to create resource
    VolumeSnapshotLocation/default: created
    Deployment/velero: attempting to create resource
    Deployment/velero: created
    DaemonSet/restic: attempting to create resource
    DaemonSet/restic: created
    Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.
  5. (OpenShift) Add the velero ServiceAccount to the privileged SCC:

    $ oc adm policy add-scc-to-user privileged -z velero -n velero

  6. (OpenShift) For OpenShift version >= 4.1, modify the restic DaemonSet YAML to request privileged mode:

    @@ -67,3 +67,5 @@ spec:
                value: /credentials/cloud
              - name: VELERO_SCRATCH_DIR
                value: /scratch
    +          securityContext:
    +            privileged: true

    Or:

    oc patch ds/restic --namespace velero --type json -p '[{"op":"add","path":"/spec/template/spec/containers/0/securityContext","value": { "privileged": true}}]'
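
Before moving on to backups, it can help to sanity-check the installation. The following is a minimal sketch; it only assumes the velero namespace and the objects created by the velero install command above:

    # Confirm the server Deployment and the restic DaemonSet came up.
    oc -n velero get deployment/velero ds/restic
    oc -n velero get pods
    # Check the server log for errors connecting to the object storage.
    oc -n velero logs deployment/velero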

Backup - Cluster B

Back up specific cluster-level resources

  velero backup create <backup-name> --include-cluster-resources=true --include-resources deployments,configmaps

View the backup

  velero backup describe YOUR_BACKUP_NAME

Back up the specific namespace caseycui2020

Exclude specific resources

Resources labeled velero.io/exclude-from-backup=true are not included in the backup, even if they otherwise match the backup's label selector.

In this way, Secrets and other resources that do not need to be backed up are excluded via the velero.io/exclude-from-backup=true label.

Some of the Secrets excluded this way are, for example:

  builder-dockercfg-jbnzr
  default-token-lshh8
  pipeline-token-xt645
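
As a minimal sketch, the label could be applied to the Secrets listed above like this (the secret names are just the examples from the list; adjust them to your own namespace):

  # Mark secrets that should not be included in the backup.
  oc -n caseycui2020 label secret builder-dockercfg-jbnzr velero.io/exclude-from-backup=true
  oc -n caseycui2020 label secret default-token-lshh8 velero.io/exclude-from-backup=true
  oc -n caseycui2020 label secret pipeline-token-xt645 velero.io/exclude-from-backup=true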

Back up pod volumes with restic

Note:

In this namespace, the following two pod volumes also need to be backed up, although they are not yet formally in use:

  • mycoreapphttptask-callback
  • mycoreapphttptaskservice-callback

Back them up selectively using the "opt-in" approach.

  1. Run the following command for every pod that contains a volume to be backed up:

    oc -n caseycui2020 annotate pod/<mybackendapp-pod-name> backup.velero.io/backup-volumes=jmx-exporter-agent,pinpoint-agent,my-mybackendapp-claim
    oc -n caseycui2020 annotate pod/<elitegetrecservice-pod-name> backup.velero.io/backup-volumes=uploadfile

    Here, the volume names are the names of the volumes in the pod spec.

    For example, for the following pod:

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample
      namespace: foo
    spec:
      containers:
      - image: k8s.gcr.io/test-webserver
        name: test-webserver
        volumeMounts:
        - name: pvc-volume
          mountPath: /volume-1
        - name: emptydir-volume
          mountPath: /volume-2
      volumes:
      - name: pvc-volume
        persistentVolumeClaim:
          claimName: test-volume-claim
      - name: emptydir-volume
        emptyDir: {}

    you should run:

    kubectl -n foo annotate pod/sample backup.velero.io/backup-volumes=pvc-volume,emptydir-volume

    If you use a controller to manage your pods, you can also provide this annotation in the pod template spec, as sketched below.
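
    A sketch of what that could look like for a controller-managed workload (the Deployment name mybackendapp and the volume list here are placeholders, not taken from the original setup):

      # Hypothetical example: patch the pod template so every pod created by the
      # controller carries the restic annotation (a JSON merge patch leaves other
      # annotations untouched).
      oc -n caseycui2020 patch deployment/mybackendapp --type merge -p \
        '{"spec":{"template":{"metadata":{"annotations":{"backup.velero.io/backup-volumes":"jmx-exporter-agent,pinpoint-agent"}}}}}'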

Back up and verify

Back up the namespace and its objects, along with the pod volumes that carry the annotation above:

  # production namespace
  velero backup create caseycui2020 --include-namespaces caseycui2020

View the backup

  velero backup describe YOUR_BACKUP_NAME
  velero backup logs caseycui2020
  oc -n velero get podvolumebackups -l velero.io/backup-name=caseycui2020 -o yaml

The output of describe looks like this:

  Name:         caseycui2020
  Namespace:    velero
  Labels:       velero.io/storage-location=default
  Annotations:  velero.io/source-cluster-k8s-gitversion=v1.18.3+2cf11e2
                velero.io/source-cluster-k8s-major-version=1
                velero.io/source-cluster-k8s-minor-version=18+
  Phase:  Completed
  Errors:    0
  Warnings:  0
  Namespaces:
    Included:  caseycui2020
    Excluded:  <none>
  Resources:
    Included:        *
    Excluded:        <none>
    Cluster-scoped:  auto
  Label selector:  <none>
  Storage Location:  default
  Velero-Native Snapshot PVs:  auto
  TTL:  720h0m0s
  Hooks:  <none>
  Backup Format Version:  1.1.0
  Started:    2020-10-21 09:28:16 +0800 CST
  Completed:  2020-10-21 09:29:17 +0800 CST
  Expiration:  2020-11-20 09:28:16 +0800 CST
  Total items to be backed up:  591
  Items backed up:              591
  Velero-Native Snapshots: <none included>
  Restic Backups (specify --details for more information):
    Completed:  3

Scheduled backups

Create periodically scheduled backups based on a cron expression:

  velero schedule create caseycui2020-b-daily --schedule="0 3 * * *" --include-namespaces caseycui2020

Alternatively, you can use some non-standard shorthand cron expressions:

  velero schedule create test-daily --schedule="@every 24h" --include-namespaces caseycui2020

For more usage examples, see the documentation of the cron package.
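
A small sketch for verifying that the schedule exists and is producing backups (Velero names schedule-created backups <schedule-name>-<timestamp>):

  # List the schedules and the backups they have produced so far.
  velero schedule get
  velero backup get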

Cluster migration - to cluster A

Using backups and restores

As long as you point each Velero instance at the same cloud object storage location, Velero can help you port resources from one cluster to another. This scenario assumes that your clusters are hosted by the same cloud provider. Note that Velero itself does not support migrating persistent volume snapshots across cloud providers. If you want to migrate volume data between cloud platforms, enable restic, which backs up volume contents at the filesystem level.

  1. (Cluster B) Assuming you have not already checkpointed your data with a Velero schedule operation, first back up the entire cluster (replacing <BACKUP-NAME> as appropriate):

    velero backup create <BACKUP-NAME>

    The default backup retention period, expressed as a TTL, is 30 days (720 hours); you can change this as needed with the --ttl <DURATION> flag. For more information on backup expiration, see "How Velero Works".

  2. (Cluster A) Configure BackupStorageLocations and VolumeSnapshotLocations pointing to the location used by cluster B, using velero backup-location create and velero snapshot-location create. Make sure to configure the BackupStorageLocations as read-only by passing the --access-mode=ReadOnly flag to velero backup-location create (since I only have one bucket, I did not configure the read-only option; a sketch is shown after this list). Below is the installation on cluster A, where BackupStorageLocations and VolumeSnapshotLocations were already configured at install time.

    velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket velero --secret-file ./credentials-velero --use-restic --use-volume-snapshots=true --backup-location-config region="default",s3ForcePathStyle="true",s3Url="http://glacier.ewhisper.cn",insecureSkipTLSVerify="true",signatureVersion="4" --snapshot-location-config region="default"
  3. (Cluster A) Make sure the Velero Backup object has been created. Velero resources are synchronized with the backup files in cloud storage.

    velero backup describe <BACKUP-NAME>

    Note: the default sync interval is 1 minute, so be sure to wait before checking. You can configure this interval with the Velero server's --backup-sync-period flag.

  4. (Cluster A) Once you have confirmed that the right backup (<BACKUP-NAME>) exists, you can restore everything with the following command (because the backup contains only the caseycui2020 namespace, there is no need to filter with --include-namespaces caseycui2020 when restoring):

    velero restore create --from-backup caseycui2020 --include-resources buildconfigs.build.openshift.io,configmaps,deploymentconfigs.apps.openshift.io,imagestreams.image.openshift.io,imagestreamtags.image.openshift.io,imagetags.image.openshift.io,limitranges,namespaces,networkpolicies.networking.k8s.io,persistentvolumeclaims,prometheusrules.monitoring.coreos.com,resourcequotas,rolebindings.authorization.openshift.io,rolebindings.rbac.authorization.k8s.io,routes.route.openshift.io,secrets,servicemonitors.monitoring.coreos.com,services,templateinstances.template.openshift.io

    Because later verification showed problems restoring persistentvolumeclaims, in subsequent runs the PVCs are dropped from the list first, to be sorted out later:

    velero restore create --from-backup caseycui2020 --include-resources buildconfigs.build.openshift.io,configmaps,deploymentconfigs.apps.openshift.io,imagestreams.image.openshift.io,imagestreamtags.image.openshift.io,imagetags.image.openshift.io,limitranges,namespaces,networkpolicies.networking.k8s.io,prometheusrules.monitoring.coreos.com,resourcequotas,rolebindings.authorization.openshift.io,rolebindings.rbac.authorization.k8s.io,routes.route.openshift.io,secrets,servicemonitors.monitoring.coreos.com,services,templateinstances.template.openshift.io
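
For reference, the read-only location mentioned in step 2 could be registered on cluster A roughly like this (a sketch only; the location name cluster-b-ro is hypothetical, and in this setup the location was instead configured read-write at install time):

  # Hypothetical example: expose the shared bucket as a read-only location so
  # restores on cluster A cannot modify cluster B's backup data.
  velero backup-location create cluster-b-ro \
    --provider aws \
    --bucket velero \
    --config region="default",s3ForcePathStyle="true",s3Url="http://glacier.ewhisper.cn" \
    --access-mode=ReadOnly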

Verify the two clusters

Check that the second cluster is working as expected:

  1. (Cluster A) Run:

    velero restore get

    The result is as follows:

    NAME                          BACKUP         STATUS            STARTED   COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
    caseycui2020-20201021102342   caseycui2020   Failed            <nil>     <nil>       0        0          2020-10-21 10:24:14 +0800 CST   <none>
    caseycui2020-20201021103040   caseycui2020   PartiallyFailed   <nil>     <nil>       46       34         2020-10-21 10:31:12 +0800 CST   <none>
    caseycui2020-20201021105848   caseycui2020   InProgress        <nil>     <nil>       0        0          2020-10-21 10:59:20 +0800 CST   <none>
  2. Then run:

    velero restore describe <RESTORE-NAME-FROM-GET-COMMAND>
    oc -n velero get podvolumerestores -l velero.io/restore-name=YOUR_RESTORE_NAME -o yaml

    The result is as follows:

    Name:         caseycui2020-20201021102342
    Namespace:    velero
    Labels:       <none>
    Annotations:  <none>
    Phase:  InProgress
    Started:    <n/a>
    Completed:  <n/a>
    Backup:  caseycui2020
    Namespaces:
      Included:  all namespaces found in the backup
      Excluded:  <none>
    Resources:
      Included:        buildconfigs.build.openshift.io, configmaps, deploymentconfigs.apps.openshift.io, imagestreams.image.openshift.io, imagestreamtags.image.openshift.io, imagetags.image.openshift.io, limitranges, namespaces, networkpolicies.networking.k8s.io, persistentvolumeclaims, prometheusrules.monitoring.coreos.com, resourcequotas, rolebindings.authorization.openshift.io, rolebindings.rbac.authorization.k8s.io, routes.route.openshift.io, secrets, servicemonitors.monitoring.coreos.com, services, templateinstances.template.openshift.io
      Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
      Cluster-scoped:  auto
    Namespace mappings:  <none>
    Label selector:  <none>
    Restore PVs:  auto

If you run into problems, make sure Velero is running in the same namespace in both clusters.

The problem I ran into is that OpenShift uses ImageStreams and ImageTags; the corresponding images could not be pulled over, so the containers did not start.

Because the containers did not start, the pod volumes were not restored successfully either.

  Name:         caseycui2020-20201021110424
  Namespace:    velero
  Labels:       <none>
  Annotations:  <none>
  Phase:  PartiallyFailed (run 'velero restore logs caseycui2020-20201021110424' for more information)
  Started:    <n/a>
  Completed:  <n/a>
  Warnings:
    Velero:     <none>
    Cluster:    <none>
    Namespaces:
      caseycui2020:  could not restore, imagetags.image.openshift.io "mybackendapp:1.0.0" already exists. Warning: the in-cluster version is different than the backed-up version.
                     could not restore, imagetags.image.openshift.io "mybackendappno:1.0.0" already exists. Warning: the in-cluster version is different than the backed-up version.
                     ...
  Errors:
    Velero:     <none>
    Cluster:    <none>
    Namespaces:
      caseycui2020:  error restoring imagestreams.image.openshift.io/caseycui2020/mybackendapp: ImageStream.image.openshift.io "mybackendapp" is invalid: []: Internal error: imagestreams "mybackendapp" is invalid: spec.tags[latest].from.name: Invalid value: "mybackendapp@sha256:6c5ab553a97c74ad602d2427a326124621c163676df91f7040b035fa64b533c7": error generating tag event: imagestreamimage.image.openshift.io ......
  Backup:  caseycui2020
  Namespaces:
    Included:  all namespaces found in the backup
    Excluded:  <none>
  Resources:
    Included:        buildconfigs.build.openshift.io, configmaps, deploymentconfigs.apps.openshift.io, imagestreams.image.openshift.io, imagestreamtags.image.openshift.io, imagetags.image.openshift.io, limitranges, namespaces, networkpolicies.networking.k8s.io, persistentvolumeclaims, prometheusrules.monitoring.coreos.com, resourcequotas, rolebindings.authorization.openshift.io, rolebindings.rbac.authorization.k8s.io, routes.route.openshift.io, secrets, servicemonitors.monitoring.coreos.com, services, templateinstances.template.openshift.io
    Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
    Cluster-scoped:  auto
  Namespace mappings:  <none>
  Label selector:  <none>
  Restore PVs:  auto

Summary of migration issues

The issues found so far are summarized below:

  1. The images in imagestreams.image.openshift.io, imagestreamtags.image.openshift.io, and imagetags.image.openshift.io were not imported successfully; more precisely, the latest tag was not imported. It also takes time for imagestreamtags.image.openshift.io to take effect.

  2. The migrated persistentvolumeclaims report an error, as follows:

    phase: Lost

    The reason: the StorageClass configuration differs between clusters A and B, so a PVC from cluster B cannot bind directly on cluster A. Moreover, a PVC cannot be modified in place after creation; it has to be deleted and recreated.

  3. Route hostnames: some hostnames are specific to cluster A or B. For example, jenkins-caseycui2020.b.caas.ewhisper.cn has to be adjusted to jenkins-caseycui2020.a.caas.ewhisper.cn after migration to cluster A (see the sketch after this list).

  4. The podVolume data was not migrated.
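
For issue 3, a quick way to find the Routes whose hosts still point at cluster B's wildcard domain might look like this (a sketch; the output columns are just a convenience):

  # List route names and hosts, then filter for hosts still on cluster B's domain.
  oc -n caseycui2020 get routes -o custom-columns=NAME:.metadata.name,HOST:.spec.host | grep 'b.caas.ewhisper.cn'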

The latest tag was not imported

Import it manually with the following command (1.0.1 is the most recent version in the ImageStream):

  oc tag xxl-job-admin:1.0.1 xxl-job-admin:latest

The PVC phase Lost issue

If you create the PVC manually, its YAML needs to be adjusted. The PVC before and after adjustment is shown below.

Original YAML from cluster B:

  kind: PersistentVolumeClaim
  apiVersion: v1
  metadata:
    annotations:
      pv.kubernetes.io/bind-completed: 'yes'
      pv.kubernetes.io/bound-by-controller: 'yes'
      volume.beta.kubernetes.io/storage-provisioner: csi.trident.netapp.io
    selfLink: /api/v1/namespaces/caseycui2020/persistentvolumeclaims/jenkins
    resourceVersion: '77304786'
    name: jenkins
    uid: ffcabc42-845d-4cdf-8c7c-56e97cb5ea82
    creationTimestamp: '2020-10-21T03:05:46Z'
    managedFields:
      - manager: kube-controller-manager
        operation: Update
        apiVersion: v1
        time: '2020-10-21T03:05:46Z'
        fieldsType: FieldsV1
        fieldsV1:
          'f:status':
            'f:phase': {}
      - manager: velero-server
        operation: Update
        apiVersion: v1
        time: '2020-10-21T03:05:46Z'
        fieldsType: FieldsV1
        fieldsV1:
          'f:metadata':
            'f:annotations':
              .: {}
              'f:pv.kubernetes.io/bind-completed': {}
              'f:pv.kubernetes.io/bound-by-controller': {}
              'f:volume.beta.kubernetes.io/storage-provisioner': {}
            'f:labels':
              .: {}
              'f:app': {}
              'f:template': {}
              'f:template.openshift.io/template-instance-owner': {}
              'f:velero.io/backup-name': {}
              'f:velero.io/restore-name': {}
          'f:spec':
            'f:accessModes': {}
            'f:resources':
              'f:requests':
                .: {}
                'f:storage': {}
            'f:storageClassName': {}
            'f:volumeMode': {}
            'f:volumeName': {}
    namespace: caseycui2020
    finalizers:
      - kubernetes.io/pvc-protection
    labels:
      app: jenkins-persistent
      template: jenkins-persistent-monitored
      template.openshift.io/template-instance-owner: 5a0b28c3-c760-451b-b92f-a781406d9e91
      velero.io/backup-name: caseycui2020
      velero.io/restore-name: caseycui2020-20201021110424
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
    volumeName: pvc-414efafd-8b22-48da-8c20-6025a8e671ca
    storageClassName: nas-data
    volumeMode: Filesystem
  status:
    phase: Lost

After adjustment:

  kind: PersistentVolumeClaim
  apiVersion: v1
  metadata:
    name: jenkins
    namespace: caseycui2020
    labels:
      app: jenkins-persistent
      template: jenkins-persistent-monitored
      template.openshift.io/template-instance-owner: 5a0b28c3-c760-451b-b92f-a781406d9e91
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
    storageClassName: nas-data
    volumeMode: Filesystem
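
A minimal sketch of recreating the PVC from the adjusted YAML (assuming it is saved as jenkins-pvc.yaml, a hypothetical file name):

  # Remove the Lost PVC restored by velero, recreate it from the trimmed YAML,
  # then check that it binds against cluster A's StorageClass.
  oc -n caseycui2020 delete pvc jenkins
  oc -n caseycui2020 apply -f jenkins-pvc.yaml
  oc -n caseycui2020 get pvc jenkins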

The podVolume data was not migrated

It can be migrated manually with the following commands:

  # Log in to cluster B
  # First copy the /opt/prometheus data out of cluster B into the current directory
  oc rsync xxl-job-admin-5-9sgf7:/opt/prometheus .
  # The rsync command above creates a prometheus directory
  cd prometheus
  # Log in to cluster A
  # Then copy the data back in (make sure the pod is up first) (you can remove `JAVA_OPTS` first)
  oc rsync ./ xxl-job-admin-2-6k8df:/opt/prometheus/

Conclusion

This article was written fairly early on; OpenShift has since released its own migration tooling built on top of Velero, and migrations can be done directly with the tools it provides.

In addition, OpenShift clusters impose many restrictions and have many OpenShift-specific resources, so in practice they differ considerably from standard Kubernetes and require careful attention.

Although this attempt failed, the approach is still worth borrowing from.

Series Articles

Reference Documents

"Among any three people walking together, one can be my teacher; knowledge is meant to be shared." This article was written by the 东风微鸣 technical blog, EWhisper.cn.

Original link: https://www.cnblogs.com/east4ming/p/16975290.html
