To achieve higher availability, important business services should run in a two-node hot-standby (active/standby) configuration.
https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/index
https://blog.csdn.net/m0_51277041/article/details/124147404
https://www.cnblogs.com/chimeiwangliang/p/7975911.html
| Parameter | Description |
|---|---|
| 10.0.0.11 | Primary node |
| 10.0.0.12 | Standby node |
| 10.0.0.10 | Virtual IP (VIP) |
cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.11 node1
10.0.0.12 node2
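A quick sanity check that the host entries resolve from each node; this is only an optional verification sketch, nothing cluster-specific is assumed:
ping -c 2 node1
ping -c 2 node2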
Configure SSH key-based access between node1 and node2 so that each node can log in to the other without a password.
# Run the following on both node1 and node2
ssh-keygen 
ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
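To confirm the key exchange worked, a quick check run from either node; printing the remote hostname is just one way to verify, assumed here for illustration:
ssh node1 hostname
ssh node2 hostname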
yum install -y fence-agents-all corosync pacemaker pcs
Pacemaker uses the hacluster user, which is created automatically when the packages are installed; set a password for it (on both nodes):
passwd hacluster
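If you prefer to script this step on both nodes, the password can also be set non-interactively; the password string below is only a placeholder:
echo 'YourClusterPassword' | passwd --stdin hacluster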
Start the pcsd service and set up authentication between the nodes so they can communicate with each other.
The pcsd service must be started on both nodes.
systemctl start pcsd.service 
systemctl enable pcsd.service
Once pcsd is running it listens on port 2224, and the web management interface is reachable at:
https://10.0.0.11:2224
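Before opening the web UI you can confirm that pcsd is actually listening on port 2224; a small check using ss from iproute (assumed to be installed, as it is by default on RHEL/CentOS 7):
ss -lntp | grep 2224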
The following command only needs to be run on node1:
[root@node1 ~]# pcs cluster auth node1 node2
Username: hacluster
Password: 
node1: Authorized
node2: Authorized
Configure nginx on node1 and node2:
yum install nginx -y
echo "welcome to node1" > /usr/share/nginx/html/index.html
systemctl start nginx.service
curl http://10.0.0.11
Pacemaker will control starting and stopping of the nginx service, so after configuring and testing nginx on node1 and node2, stop it again, as shown below.
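A minimal sketch of this cleanup step, run on both nodes; disabling the unit is an assumption added here so that systemd does not start nginx outside of Pacemaker's control:
systemctl stop nginx.service
systemctl disable nginx.service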
With that done, create a cluster on node1 and start it.
Create a cluster named mycluster
whose members are node1 and node2:
[root@node1 ~]# pcs cluster setup --name mycluster node1 node2
Destroying cluster on nodes: node1, node2...
node1: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node1: Successfully destroyed cluster
node2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node1', 'node2'
node1: successful distribution of the file 'pacemaker_remote authkey'
node2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node1: Succeeded
node2: Succeeded

Synchronizing pcsd certificates on nodes node1, node2...
node1: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node2: Success
Start the cluster and enable it to start automatically on boot:
[root@node1 ~]# pcs cluster start --all
node1: Starting Cluster (corosync)...
node2: Starting Cluster (corosync)...
node2: Starting Cluster (pacemaker)...
node1: Starting Cluster (pacemaker)...
[root@node1 ~]# pcs cluster enable --all
node1: Cluster Enabled
node2: Cluster Enabled
Check the cluster status:
[root@node1 ~]# pcs status 
Cluster name: mycluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Stack: corosync
Current DC: node1 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Tue Mar 21 16:52:39 2023
Last change: Tue Mar 21 16:51:33 2023 by hacluster via crmd on node1

2 nodes configured
0 resource instances configured

Online: [ node1 node2 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
After the cluster is created on node1, all of the settings are synchronized to node2. The status output shows that node1 and node2 are both online and that the cluster daemons are active and enabled.
The "No resources" line shows that the cluster does not have any resources yet; the next step is to add the VIP and the web service as cluster resources.
Add an IP address resource named VIP:
 it uses the ocf:heartbeat:IPaddr2 resource agent (the "heartbeat" here is the OCF provider name, not a separate heartbeat daemon),
 and the cluster monitors the resource every 30 seconds.
[root@node1 ~]# pcs resource create VIP ocf:heartbeat:IPaddr2 ip=10.0.0.10 cidr_netmask=24 nic=eth0 op monitor interval=30s
Add a resource named web that manages the nginx systemd service:
[root@node1 ~]# pcs resource create web systemd:nginx op monitor interval=30s
To see which nginx-related resource agents are available:
pcs resource list |grep nginx
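To view the resources that have actually been configured in the cluster (as opposed to the agents that are merely available), a quick check using the pcs 0.9 syntax shipped with RHEL/CentOS 7:
pcs resource show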
To remove the resource, run:
[root@node1 ~]# pcs resource delete web
Check the cluster status again:
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node2 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Wed Mar 22 08:44:59 2023
Last change: Wed Mar 22 08:33:45 2023 by root via cibadmin on node1

2 nodes configured
2 resource instances configured

Online: [ node1 node2 ]

Full list of resources:

 VIP	(ocf::heartbeat:IPaddr2):	Started node1
 web	(systemd:nginx):	Started node1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
After adding the resources they still need to be constrained: bind VIP and web together so that the VIP does not end up on node1 while nginx runs on node2.
The other issue is that the cluster might start nginx first and only then bring up the VIP, which is the wrong order.
Colocation:
[root@node1 ~]# pcs constraint colocation add web VIP INFINITY
To remove the colocation constraint, run:
[root@node1 ~]# pcs constraint colocation remove web VIP
Set the start/stop order of the resources:
start VIP first, then start web.
[root@node1 ~]# pcs constraint order start VIP then start web
If node1 and node2 have different hardware specifications, the node priorities should be adjusted so that resources run on the better-equipped server and only move to the lower-spec server after it fails. In Pacemaker this is configured with location constraints.
Adjust the location constraints.
A higher score means a higher priority.
[root@node1 ~]# pcs constraint location web prefers node1=10
[root@node1 ~]# pcs constraint location web prefers node2=5
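To review the colocation, ordering, and location constraints that are now in place, a quick check using the RHEL 7 pcs 0.9 syntax:
pcs constraint show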
[root@node1 ~]# pcs property set stonith-enabled=false
[root@node1 ~]# crm_simulate -sL

Current cluster status:
Online: [ node1 node2 ]

 VIP	(ocf::heartbeat:IPaddr2):	Started node1
 web	(systemd:nginx):	Started node1

Allocation scores:
pcmk__native_allocate: VIP allocation score on node1: 10
pcmk__native_allocate: VIP allocation score on node2: 5
pcmk__native_allocate: web allocation score on node1: 10
pcmk__native_allocate: web allocation score on node2: -INFINITY

Transition Summary:
Note: no fence (STONITH) device is configured in this walkthrough, so the cluster may report errors at startup; fencing can be disabled with pcs property set stonith-enabled=false.
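To double-check that the property took effect, a small verification sketch (pcs 0.9 syntax assumed):
pcs property show stonith-enabled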
At this point the Pacemaker cluster is fully configured; restart the cluster so that all of the settings take effect.
[root@node1 ~]# pcs cluster stop --all
node2: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (corosync)...
node1: Stopping Cluster (corosync)...
[root@node1 ~]# pcs cluster start --all
node1: Starting Cluster (corosync)...
node2: Starting Cluster (corosync)...
node1: Starting Cluster (pacemaker)...
node2: Starting Cluster (pacemaker)...
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node2 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Wed Mar 22 08:44:59 2023
Last change: Wed Mar 22 08:33:45 2023 by root via cibadmin on node1

2 nodes configured
2 resource instances configured

Online: [ node1 node2 ]

Full list of resources:

 VIP	(ocf::heartbeat:IPaddr2):	Started node1
 web	(systemd:nginx):	Started node1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@node1 ~]# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:1a:a4:00 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.11/24 brd 10.0.0.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.0.0.10/24 brd 10.0.0.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::16dc:d558:23b:696d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: docker0: mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:32:bd:68:14 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
[root@node1 ~]# systemctl status nginx.service
● nginx.service - Cluster Controlled nginx
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/nginx.service.d
           └─50-pacemaker.conf
   Active: active (running) since Wed 2023-03-22 08:38:33 CST; 10min ago
     Docs: http://nginx.org/en/docs/
  Process: 1467 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=0/SUCCESS)
 Main PID: 1468 (nginx)
    Tasks: 2
   Memory: 3.3M
   CGroup: /system.slice/nginx.service
           ├─1468 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
           └─1469 nginx: worker process

Mar 22 08:38:33 node1 systemd[1]: Starting Cluster Controlled nginx...
Mar 22 08:38:33 node1 systemd[1]: Can't open PID file /var/run/nginx.pid (yet?) after start: No such file or directory
Mar 22 08:38:33 node1 systemd[1]: Started Cluster Controlled nginx.

[root@node1 ~]# curl http://10.0.0.10
welcome to node1
After startup, the VIP normally sits on the primary node, 10.0.0.11. If the primary node fails, node2 takes over the service automatically. To test this, simply reboot node1 and watch whether the standby node takes over the primary node's resources.
Reboot node1 and observe from node2:
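A gentler way to exercise failover without a full reboot is to put node1 into standby and then bring it back; a sketch using the pcs 0.9 standby commands (newer pcs releases use pcs node standby instead):
pcs cluster standby node1     # resources should move to node2
pcs status
pcs cluster unstandby node1   # allow node1 to host resources again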
[root@node2 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: node2 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Wed Mar 22 08:44:59 2023
Last change: Wed Mar 22 08:33:45 2023 by root via cibadmin on node1

2 nodes configured
2 resource instances configured

Online: [ node1 node2 ]

Full list of resources:

 VIP	(ocf::heartbeat:IPaddr2):	Started node1
 web	(systemd:nginx):	Started node1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@node2 ~]# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:c2:52:cb brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.12/24 brd 10.0.0.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.0.0.10/24 brd 10.0.0.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::16dc:d558:23b:696d/64 scope link tentative noprefixroute dadfailed
       valid_lft forever preferred_lft forever
    inet6 fe80::959b:b4bc:f1b2:41f3/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

[root@node2 ~]# systemctl status nginx.service
● nginx.service - Cluster Controlled nginx
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/nginx.service.d
           └─50-pacemaker.conf
   Active: active (running) since Wed 2023-03-22 08:51:59 CST; 8s ago
     Docs: http://nginx.org/en/docs/
  Process: 2546 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=0/SUCCESS)
 Main PID: 2548 (nginx)
   CGroup: /system.slice/nginx.service
           ├─2548 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
           └─2550 nginx: worker process

Mar 22 08:51:59 node2 systemd[1]: Starting Cluster Controlled nginx...
Mar 22 08:51:59 node2 systemd[1]: Can't open PID file /var/run/nginx.pid (yet?) after start: No such file or directory
Mar 22 08:51:59 node2 systemd[1]: Started Cluster Controlled nginx.

[root@node2 ~]# curl http://10.0.0.10
welcome to node2
Because node1 has the higher priority, once it recovers the VIP and web resources are taken back by node1.
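If this automatic failback is not desirable (for example, to avoid moving the VIP twice), a resource-stickiness default can be set so that resources stay where they are running; the value 100 below is only an illustrative assumption:
pcs resource defaults resource-stickiness=100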