扫二维码与项目经理沟通
我们在微信上24小时期待你的声音
解答本文疑问/技术咨询/运营咨询/技术建议/互联网交流
参考文章:
创新互联公司专注于企业营销型网站建设、网站重做改版、云安网站定制设计、自适应品牌网站建设、HTML5建站、商城系统网站开发、集团公司官网建设、外贸网站建设、高端网站制作、响应式网页设计等建站业务,价格优惠性价比高,为云安等各大城市提供网站开发制作服务。
https://ieevee.com/tech/2018/05/16/k8s-rbd.html
https://zhangchenchen.github.io/2017/11/17/kubernetes-integrate-with-ceph/
https://docs.openshift.com/container-platform/3.5/install_config/storage_examples/ceph_rbd_dynamic_example.html
https://jimmysong.io/kubernetes-handbook/practice/using-ceph-for-persistent-storage.html
感谢以上作者提供的技术参考,这里我加以整理,分别实现了多主数据库集群和主从数据库结合Ceph RDB的实现方式。以下配置只为测试使用,不能做为生产配置。
在K8S的持久化存储中主要有以下几种分类:
volume: 就是直接挂载在pod上的组件,k8s中所有的其他存储组件都是通过volume来跟pod直接联系的。volume有个type属性,type决定了挂载的存储是什么,常见的比如:emptyDir,hostPath,nfs,rbd,以及下文要说的persistentVolumeClaim等。跟docker里面的volume概念不同的是,docker里的volume的生命周期是跟docker紧紧绑在一起的。这里根据type的不同,生命周期也不同,比如emptyDir类型的就是跟docker一样,pod挂掉,对应的volume也就消失了,而其他类型的都是永久存储。详细介绍可以参考Volumes
Persistent Volumes:顾名思义,这个组件就是用来支持永久存储的,Persistent Volumes组件会抽象后端存储的提供者(也就是上文中volume中的type)和消费者(即具体哪个pod使用)。该组件提供了PersistentVolume和PersistentVolumeClaim两个概念来抽象上述两者。一个PersistentVolume(简称PV)就是后端存储提供的一块存储空间,具体到ceph rbd中就是一个image,一个PersistentVolumeClaim(简称PVC)可以看做是用户对PV的请求,PVC会跟某个PV绑定,然后某个具体pod会在volume 中挂载PVC,就挂载了对应的PV。
添加ceph的yum源:
[Ceph]
name=Ceph packages for $basearch
baseurl=https://mirrors.aliyun.com/ceph/rpm-mimic/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[Ceph-noarch]
name=Ceph noarch packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-mimic/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
[ceph-source]
name=Ceph source packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-mimic/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
安装ceph-common:
yum install ceph-common -y
如果安装过程出现依赖报错,可以通过如下方式解决:
yum install -y yum-utils && \
yum-config-manager --add-repo https://dl.fedoraproject.org/pub/epel/7/x86_64/ && \
yum install --nogpgcheck -y epel-release && \
rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7 && \
rm -f /etc/yum.repos.d/dl.fedoraproject.org*
yum -y install ceph-common
将ceph配置文件拷贝到各个k8s的node节点
[root@ceph-1 ~]# scp /etc/ceph k8s-node:/etc/
通过使用一个简单的volume,测试集群环境是否正常,在实际的应用中,需要永久保存的数据不能使用volume的方式。
创建新的镜像时,需要禁用某些不支持的属性:
rbd create foobar -s 1024 -p k8s
rbd feature disable k8s/foobar object-map fast-diff deep-flatten
查看镜像信息:
# rbd info k8s/foobar
rbd image 'foobar':
size 1 GiB in 256 objects
order 22 (4 MiB objects)
id: ad9b6b8b4567
block_name_prefix: rbd_data.ad9b6b8b4567
format: 2
features: layering, exclusive-lock
op_features:
flags:
create_timestamp: Tue Apr 23 17:37:39 2019
这里指定了ceph的 admin.keyring文件作为认证密钥:
# cat test.yaml
apiVersion: v1
kind: Pod
metadata:
name: rbd
spec:
containers:
- image: nginx
name: rbd-rw
volumeMounts:
- name: rbdpd
mountPath: /mnt
volumes:
- name: rbdpd
rbd:
monitors:
- '192.168.20.41:6789'
pool: k8s
image: foobar
fsType: xfs
readOnly: false
user: admin
keyring: /etc/ceph/ceph.client.admin.keyring
如果需要永久保存数据(当pod删除后数据不会丢失),我们需要使用PV(PersistentVolume),和PVC(PersistentVolumeClaim)的方式。
rbd create -s 1024 k8s/pv
rbd feature disable k8s/pv object-map fast-diff deep-flatten
查看镜像信息:
# rbd info k8s/pv
rbd image 'pv':
size 1 GiB in 256 objects
order 22 (4 MiB objects)
id: adaa6b8b4567
block_name_prefix: rbd_data.adaa6b8b4567
format: 2
features: layering, exclusive-lock
op_features:
flags:
create_timestamp: Tue Apr 23 19:09:58 2019
grep key /etc/ceph/ceph.client.admin.keyring |awk '{printf "%s", $NF}'|base64
apiVersion: v1
kind: Secret
metadata:
name: ceph-secret
type: "kubernetes.io/rbd"
data:
key: QVFBbk1MaGNBV2laSGhBQUVOQThRWGZyQ3haRkJDNlJaWTNJY1E9PQ==
---
# cat ceph-rbd-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: ceph-rbd-pv
spec:
capacity:
storage: 1Gi
accessModes:
- ReadWriteOnce
rbd:
monitors:
- '192.168.20.41:6789'
pool: k8s
image: pv
user: admin
secretRef:
name: ceph-secret
fsType: xfs
readOnly: false
persistentVolumeReclaimPolicy: Recycle
# cat ceph-rbd-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: ceph-rbd-pv-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
# cat test3-pvc.yaml
apiVersion: v1
kind: Pod
metadata:
name: rbd-nginx
spec:
containers:
- image: nginx
name: rbd-rw
volumeMounts:
- name: rbd-pvc
mountPath: /mnt
volumes:
- name: rbd-pvc
persistentVolumeClaim:
claimName: ceph-rbd-pv-claim
简单来说,storage配置了要访问ceph RBD的IP/Port、用户名、keyring、pool,等信息,我们不需要提前创建image;当用户创建一个PVC时,k8s查找是否有符合PVC请求的storage class类型,如果有,则依次执行如下操作:
通过这种方式管理员只要创建好storage class就行了,后面的事情用户自己就可以搞定了。如果想要防止资源被耗尽,可以设置一下Resource Quota。
当pod需要一个卷时,直接通过PVC声明,就可以根据需求创建符合要求的持久卷。
# cat storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/rbd
parameters:
monitors: 192.168.20.41:6789
adminId: admin
adminSecretName: ceph-secret
pool: k8s
userId: admin
userSecretName: ceph-secret
fsType: xfs
imageFormat: "2"
imageFeatures: "layering"
RBD只支持 ReadWriteOnce 和 ReadOnlyAll,不支持ReadWriteAll。注意这两者的区别点是,不同nodes之间是否可以同时挂载。同一个node上,即使是ReadWriteOnce,也可以同时挂载到2个容器上的。
创建应用的时候,需要同时创建 pv和pod,二者通过storageClassName关联。pvc中需要指定其storageClassName为上面创建的sc的name(即fast)。
# cat pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: rbd-pvc-pod-pvc
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 1Gi
storageClassName: fast
创建pod
# cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
labels:
test: rbd-pvc-pod
name: ceph-rbd-sc-pod1
spec:
containers:
- name: ceph-rbd-sc-nginx
image: nginx
volumeMounts:
- name: ceph-rbd-vol1
mountPath: /mnt
readOnly: false
volumes:
- name: ceph-rbd-vol1
persistentVolumeClaim:
claimName: rbd-pvc-pod-pvc
在使用Storage Class时,除了使用PVC的方式声明要使用的持久卷,还可通过创建一个volumeClaimTemplates进行声明创建(StatefulSets中的存储设置),如果涉及到多个副本,可以使用StatefulSets配置:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: nginx
spec:
selector:
matchLabels:
app: nginx
serviceName: "nginx"
replicas: 3
template:
metadata:
labels:
app: nginx
spec:
terminationGracePeriodSeconds: 10
containers:
- name: nginx
image: nginx
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "fast"
resources:
requests:
storage: 1Gi
但注意不要用Deployment。因为,如果Deployment的副本数是1,那么还是可以用的,跟Pod一致;但如果副本数 >1 ,此时创建deployment后会发现,只启动了1个Pod,其他Pod都在ContainerCreating状态。过一段时间describe pod可以看到,等volume等很久都没等到。
官方文档:https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/
statefulset(1.5之前叫做petset),statefulset与deployment,replicasets是一个级别的。不过Deployments和ReplicaSets是为无状态服务而设计。statefulset则是为了解决有状态服务的问题。它的应用场景如下:
由应用场景可知,statefuleset特别适合mqsql,redis等数据库集群。相应的,一个statefuleset有以下三个部分:
如果k8s集群中已经创建了ceph 的secret可以跳过此步
生成一个加密的key
grep key /etc/ceph/ceph.client.admin.keyring |awk '{printf "%s", $NF}'|base64
将生成的key创建一个Secret
apiVersion: v1
kind: Secret
metadata:
name: ceph-secret
namespace: galera
type: "kubernetes.io/rbd"
data:
key: QVFBbk1MaGNBV2laSGhBQUVOQThRWGZyQ3haRkJDNlJaWTNJY1E9PQ==
---
# cat storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/rbd
parameters:
monitors: 192.168.20.41:6789,192.168.20.42:6789,192.168.20.43:6789
adminId: admin
adminSecretName: ceph-secret
pool: k8s
userId: admin
userSecretName: ceph-secret
fsType: xfs
imageFormat: "2"
imageFeatures: "layering"
galera-service.yaml
apiVersion: v1
kind: Service
metadata:
annotations:
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
name: galera
namespace: galera
labels:
app: mysql
spec:
ports:
- port: 3306
name: mysql
# *.galear.default.svc.cluster.local
clusterIP: None
selector:
app: mysql
这里使用V1版本的StatefulSet,和之前的版本相比,v1版本是当前的稳定版本,同时与之前的beta版的区别是v1版本需要添加spec.selector.matchLabels的参数,此参数需要与spec.template.metadata.labels保持一致。
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
namespace: galera
spec:
selector:
matchLabels:
app: mysql
serviceName: "galera"
replicas: 3
template:
metadata:
labels:
app: mysql
spec:
initContainers:
- name: install
image: mirrorgooglecontainers/galera-install:0.1
imagePullPolicy: Always
args:
- "--work-dir=/work-dir"
volumeMounts:
- name: workdir
mountPath: "/work-dir"
- name: config
mountPath: "/etc/mysql"
- name: bootstrap
image: debian:jessie
command:
- "/work-dir/peer-finder"
args:
- -on-start="/work-dir/on-start.sh"
- "-service=galera"
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
volumeMounts:
- name: workdir
mountPath: "/work-dir"
- name: config
mountPath: "/etc/mysql"
containers:
- name: mysql
image: mirrorgooglecontainers/mysql-galera:e2e
ports:
- containerPort: 3306
name: mysql
- containerPort: 4444
name: sst
- containerPort: 4567
name: replication
- containerPort: 4568
name: ist
args:
- --defaults-file=/etc/mysql/my-galera.cnf
- --user=root
readinessProbe:
# TODO: If docker exec is buggy just use gcr.io/google_containers/mysql-healthz:1.0
exec:
command:
- sh
- -c
- "mysql -u root -e 'show databases;'"
initialDelaySeconds: 15
timeoutSeconds: 5
successThreshold: 2
volumeMounts:
- name: datadir
mountPath: /var/lib/
- name: config
mountPath: /etc/mysql
volumes:
- name: config
emptyDir: {}
- name: workdir
emptyDir: {}
volumeClaimTemplates:
- metadata:
name: datadir
annotations:
volume.beta.kubernetes.io/storage-class: "fast"
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
查看pod状态已经正常
[root@master-1 ~]# kubectl get pod -n galera
NAME READY STATUS RESTARTS AGE
mysql-0 1/1 Running 0 48m
mysql-1 1/1 Running 0 43m
mysql-2 1/1 Running 0 38m
数据库集群建立:
[root@master-1 ~]# kubectl exec mysql-1 -n galera -- mysql -uroot -e 'show status like "wsrep_cluster_size";'
Variable_name Value
wsrep_cluster_size 3
查看pv绑定:
[root@master-1 mysql-cluster]# kubectl get pvc -l app=mysql -n galera
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
datadir-mysql-0 Bound pvc-6e5a1c45-666b-11e9-ad20-000c29016590 1Gi RWO fast 3d20h
datadir-mysql-1 Bound pvc-25683cfd-666c-11e9-ad20-000c29016590 1Gi RWO fast 3d20h
datadir-mysql-2 Bound pvc-c024b422-666c-11e9-ad20-000c29016590 1Gi RWO fast 3d20h
测试数据库:
kubectl exec mysql-2 -n galera -- mysql -uroot -e <
查看数据:
# kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- mysql -h 10.2.58.7 -e "SELECT * FROM demo.messages"
If you don't see a command prompt, try pressing enter.
+---------+
| message |
+---------+
| hello |
+---------+
pod "mysql-client" deleted
如果pod之间互相访问,查询数据库就需要定义一个svc, 这里定义一个连接mysql的svc:
apiVersion: v1
kind: Service
metadata:
name: mysql-read
namespace: galera
labels:
app: mysql
spec:
ports:
- name: mysql
port: 3306
selector:
app: mysql
通过使用Pod来访问数据库:
# kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- mysql -h mysql-read.galera -e "SELECT * FROM demo.messages"
+---------+
| message |
+---------+
| hello |
+---------+
pod "mysql-client" deleted
官方参考文档
在ceph 集群中创建一个kube的pool,用于数据库的存储池:
[root@ceph-1 ~]# ceph osd pool create kube 128
pool 'kube' created
新定义一个storageclass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: mysql
provisioner: kubernetes.io/rbd
parameters:
monitors: 192.168.20.41:6789,192.168.20.42:6789,192.168.20.43:6789
adminId: admin
adminSecretName: ceph-secret
pool: kube
userId: admin
userSecretName: ceph-secret
fsType: xfs
imageFormat: "2"
imageFeatures: "layering"
由于要使用statefulSet进行主从数据库的部署,这里需要创建一个headless的service,和一个用于读库的service:
# Headless service for stable DNS entries of StatefulSet members.
apiVersion: v1
kind: Service
metadata:
name: mysql
labels:
app: mysql
spec:
ports:
- name: mysql
port: 3306
clusterIP: None
selector:
app: mysql
---
# Client service for connecting to any MySQL instance for reads.
# For writes, you must instead connect to the master: mysql-0.mysql.
apiVersion: v1
kind: Service
metadata:
name: mysql-read
labels:
app: mysql
spec:
ports:
- name: mysql
port: 3306
selector:
app: mysql
由于要进行主从同步,所以必须主库和从库必须要有相应的配置:
apiVersion: v1
kind: ConfigMap
metadata:
name: mysql
labels:
app: mysql
data:
master.cnf: |
# Apply this config only on the master.
[mysqld]
log-bin
slave.cnf: |
# Apply this config only on slaves.
[mysqld]
super-read-only
这里指定了使用StorageClass,使用RBD存储,同时需要使用一个xtrabackup的镜像进行数据同步:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
spec:
selector:
matchLabels:
app: mysql
serviceName: mysql
replicas: 3
template:
metadata:
labels:
app: mysql
spec:
initContainers:
- name: init-mysql
image: mysql:5.7
command:
- bash
- "-c"
- |
set -ex
# Generate mysql server-id from pod ordinal index.
[[ `hostname` =~ -([0-9]+)$ ]] || exit 1
ordinal=${BASH_REMATCH[1]}
echo [mysqld] > /mnt/conf.d/server-id.cnf
# Add an offset to avoid reserved server-id=0 value.
echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
# Copy appropriate conf.d files from config-map to emptyDir.
if [[ $ordinal -eq 0 ]]; then
cp /mnt/config-map/master.cnf /mnt/conf.d/
else
cp /mnt/config-map/slave.cnf /mnt/conf.d/
fi
volumeMounts:
- name: conf
mountPath: /mnt/conf.d
- name: config-map
mountPath: /mnt/config-map
- name: clone-mysql
image: tangup/xtrabackup:1.0
command:
- bash
- "-c"
- |
set -ex
# Skip the clone if data already exists.
[[ -d /var/lib/mysql/mysql ]] && exit 0
# Skip the clone on master (ordinal index 0).
[[ `hostname` =~ -([0-9]+)$ ]] || exit 1
ordinal=${BASH_REMATCH[1]}
[[ $ordinal -eq 0 ]] && exit 0
# Clone data from previous peer.
ncat --recv-only mysql-$(($ordinal-1)).mysql 3307 | xbstream -x -C /var/lib/mysql
# Prepare the backup.
xtrabackup --prepare --target-dir=/var/lib/mysql
volumeMounts:
- name: data
mountPath: /var/lib/mysql
subPath: mysql
- name: conf
mountPath: /etc/mysql/conf.d
containers:
- name: mysql
image: mysql:5.7
env:
- name: MYSQL_ALLOW_EMPTY_PASSWORD
value: "1"
ports:
- name: mysql
containerPort: 3306
volumeMounts:
- name: data
mountPath: /var/lib/mysql
subPath: mysql
- name: conf
mountPath: /etc/mysql/conf.d
resources:
requests:
cpu: 500m
memory: 1Gi
livenessProbe:
exec:
command: ["mysqladmin", "ping"]
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
readinessProbe:
exec:
# Check we can execute queries over TCP (skip-networking is off).
command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
initialDelaySeconds: 5
periodSeconds: 2
timeoutSeconds: 1
- name: xtrabackup
image: tangup/xtrabackup:1.0
ports:
- name: xtrabackup
containerPort: 3307
command:
- bash
- "-c"
- |
set -ex
cd /var/lib/mysql
# Determine binlog position of cloned data, if any.
if [[ -f xtrabackup_slave_info ]]; then
# XtraBackup already generated a partial "CHANGE MASTER TO" query
# because we're cloning from an existing slave.
mv xtrabackup_slave_info change_master_to.sql.in
# Ignore xtrabackup_binlog_info in this case (it's useless).
rm -f xtrabackup_binlog_info
elif [[ -f xtrabackup_binlog_info ]]; then
# We're cloning directly from master. Parse binlog position.
[[ `cat xtrabackup_binlog_info` =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
rm xtrabackup_binlog_info
echo "CHANGE MASTER TO MASTER_LOG_FILE='${BASH_REMATCH[1]}',\
MASTER_LOG_POS=${BASH_REMATCH[2]}" > change_master_to.sql.in
fi
# Check if we need to complete a clone by starting replication.
if [[ -f change_master_to.sql.in ]]; then
echo "Waiting for mysqld to be ready (accepting connections)"
until mysql -h 127.0.0.1 -e "SELECT 1"; do sleep 1; done
echo "Initializing replication from clone position"
# In case of container restart, attempt this at-most-once.
mv change_master_to.sql.in change_master_to.sql.orig
mysql -h 127.0.0.1 <
查看pod:
[root@master-1 ~]# kubectl get po
NAME READY STATUS RESTARTS AGE
mysql-0 2/2 Running 2 110m
mysql-1 2/2 Running 0 109m
mysql-2 2/2 Running 0 16m
pvc:
[root@master-1 ~]# kubectl get pvc |grep mysql|grep -v fast
data-mysql-0 Bound pvc-3737108a-6a2a-11e9-ac56-000c296b46ac 1Gi RWO mysql 5h53m
data-mysql-1 Bound pvc-279bdca0-6a4a-11e9-ac56-000c296b46ac 1Gi RWO mysql 114m
data-mysql-2 Bound pvc-fbe153bc-6a52-11e9-ac56-000c296b46ac 1Gi RWO mysql 51m
Ceph集群上自动创建的镜像:
[root@ceph-1 ~]# rbd list kube
kubernetes-dynamic-pvc-2ee47370-6a4a-11e9-bb82-000c296b46ac
kubernetes-dynamic-pvc-39a42869-6a2a-11e9-bb82-000c296b46ac
kubernetes-dynamic-pvc-fbead120-6a52-11e9-bb82-000c296b46ac
向主库写入数据,使用headless server所提供的 podname.headlessname 的形式就可以直接访问POD, 这在DNS解析中是固定的。这里访问mysql-0就使用mysql-0.mysql:
kubectl run mysql-client --image=mysql:5.7 -i --rm --restart=Never --\
mysql -h mysql-0.mysql <
使用mysql-read去访问数据库数据:
# kubectl run mysql-client --image=mysql:5.7 -i -t --rm --restart=Never -- mysql -h mysql-read -e "SELECT * FROM test.messages"
+---------+
| message |
+---------+
| hello |
+---------+
可以使用如下命令去循环的查看当前是mysql-read连接的数据库:
kubectl run mysql-client-loop --image=mysql:5.7 -i -t --rm --restart=Never --\
bash -ic "while sleep 1; do mysql -h mysql-read -e 'SELECT @@server_id,NOW()'; done"
+-------------+---------------------+
| @@server_id | NOW() |
+-------------+---------------------+
| 102 | 2019-04-28 20:24:11 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW() |
+-------------+---------------------+
| 101 | 2019-04-28 20:27:35 |
+-------------+---------------------+
+-------------+---------------------+
| @@server_id | NOW() |
+-------------+---------------------+
| 100 | 2019-04-28 20:18:38 |
+-------------+---------------------+
我们在微信上24小时期待你的声音
解答本文疑问/技术咨询/运营咨询/技术建议/互联网交流