In my living room at home, three Linux hosts run virtual machines and one Windows host runs WSL2 (Ubuntu 20.04).
They are:
Hardware | Network connection | OS | IP | VMs |
---|---|---|---|---|
EUC i5 7250U 16G | Wi-Fi | Win10 | 10.0.1.223 | WSL2 - dynamic IP |
MoreFine S500 R7 5800H 64G | Ethernet | Zorin OS 16.2 (Ubuntu 20.04 LTS) | 10.0.1.198 | vm1 - 10.0.1.156, vm2 - 10.0.1.157, vm3 - 10.0.1.158 |
新创云 i7 5500U 8G | Ethernet | Ubuntu Server 22.04 LTS | 10.0.1.107 | vm0 - 10.0.1.151, vm1 - 10.0.1.152 |
ThinkPad X230 i5 3210M 16G | Ethernet | Ubuntu Server 22.04 LTS | 10.0.1.122 | vm0 - 10.0.1.154 |
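Keeping four servers on around the clock is not economical. Most services are configured on WSL2, that is, on the EUC i5; the other three Linux servers exist only to run K8S projects and should normally stay powered off.
Electricity here costs 0.61 yuan per kWh. Assuming each Linux server idles at 40 W, the daily cost is 0.04 kW * 24 h * 3 hosts * 0.61 yuan/kWh = 1.76 yuan, which is about 641 yuan a year!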
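The arithmetic can be rechecked with a small shell sketch. The function names are made up for illustration, and the 40 W idle draw is the same assumption as in the text. Without rounding the daily figure first, the yearly total comes to about 641 yuan.

```shell
#!/bin/bash
# daily_cost WATTS_PER_HOST HOST_COUNT PRICE_PER_KWH -> yuan per day
daily_cost() {
  awk -v w="$1" -v n="$2" -v p="$3" \
    'BEGIN { printf "%.2f\n", w / 1000 * 24 * n * p }'
}

# yearly_cost takes the same arguments and multiplies by 365 days
yearly_cost() {
  awk -v w="$1" -v n="$2" -v p="$3" \
    'BEGIN { printf "%.2f\n", w / 1000 * 24 * n * p * 365 }'
}

daily_cost 40 3 0.61    # prints 1.76
yearly_cost 40 3 0.61   # prints 641.23
```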
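So the requirement is:
Create two jobs in the Jenkins instance on WSL2, one to power on and one to power off the other three Linux hosts. Before shutdown, all running VMs must be shut down first; after power-on, all VMs are started.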
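Remote wake-up (power-on) has two prerequisites:
The host must be connected by Ethernet cable; wake-up over Wi-Fi is not supported.
The host that sends the wake-up and the host being woken must be on the same subnet. This causes a problem: my Jenkins jobs run under WSL2, and the WSL2 IP is on a different subnet from the Win10 host, so normal access already requires port forwarding. This means we cannot wake the other hosts directly from WSL2 and must have the Win10 system run the wake command.
That is: client -> WSL2 Jenkins -> Win10 system -> wake the other Linux hosts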
First, find out which wired interface the Linux host is using, and its MAC address:
ifconfig
cat /etc/netplan/00-installer-config.yaml
Then check whether Wake-on-LAN is enabled on that wired NIC:
gateman@MoreFine-S500:~$ sudo ethtool eno1
Settings for eno1:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full 2500baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full 2500baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: pumbg
        Wake-on: d
        Link detected: yes
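First, the sudo ethtool xxx command must be run as root; otherwise the Wake-on settings are not shown.
Supports Wake-on: pumbg --> if the string contains the letter g, this NIC supports Wake-on-LAN; otherwise Wake-on-LAN has to be enabled in the motherboard BIOS first.
Wake-on: d --> d means disabled, g means enabled.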
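The two rules above can be turned into a small check. This is only a sketch: the function name wol_state is made up here, and it assumes the ethtool output has the layout shown above.

```shell
#!/bin/bash
# Read `ethtool <iface>` output on stdin and print one of:
#   unsupported | disabled | enabled
wol_state() {
  awk '
    /Supports Wake-on:/ { supports = $NF }   # e.g. "pumbg"
    /^[ \t]*Wake-on:/   { current  = $NF }   # e.g. "d" or "g"
    END {
      if (supports !~ /g/)     print "unsupported"
      else if (current ~ /g/)  print "enabled"
      else                     print "disabled"
    }'
}

# Usage on a real host (needs root, as noted above):
#   sudo ethtool eno1 | wol_state
```

For the output shown earlier (Supports Wake-on: pumbg, Wake-on: d) this reports disabled.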
gateman@MoreFine-S500:~$ sudo ethtool -s eno1 wol g
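Note that the command above enables wol g only temporarily; after a reboot it will most likely revert to disabled.
So we have to change the NIC configuration so that wol g is set at every boot.
Add wakeonlan: true to the netplan configuration file: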
gateman@MoreFine-S500:~$ sudo vi /etc/netplan/00-installer-config.yaml
# it is the network config written by 'subiquity'
network:
  ethernets:
    eno1:
      dhcp4: false
      wakeonlan: true
    enp2s0:
      dhcp4: false
    wlp3s0:
      dhcp4: false
  bridges:
    br0:
      interfaces: [eno1]
      dhcp4: no
      addresses: [10.0.1.198/24]
      routes:
        - to: default
          via: 10.0.1.1
      nameservers:
        addresses: [119.29.29.29, 8.8.8.8]
  version: 2
Remember to apply the settings:
sudo netplan apply
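With the configuration above in place, any other Linux host on the same subnet should be able to wake the machine with the command below:
wakeonlan xx:xx:xx:xx:xx:xx # MAC address of the NIC
Strangely, though, this command could not wake my X230; it probably needs some additional parameter.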
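As background on what wake-up tools transmit: a Wake-on-LAN magic packet is 6 bytes of 0xFF followed by the target MAC address repeated 16 times, normally sent as a UDP broadcast on the local subnet. That is why the sender has to sit on the same subnet. The sketch below only builds that payload as hex text. The function name is made up for illustration, and this is not a description of how wakeonlan or Fing work internally.

```shell
#!/bin/bash
# Print the magic packet for MAC address $1 as a lowercase hex string.
magic_packet_hex() {
  local mac payload i
  # Drop the colons and lowercase: 70:70:FC:00:85:5B -> 7070fc00855b
  mac=$(printf '%s' "${1//:/}" | tr 'A-F' 'a-f')
  payload="ffffffffffff"          # 6 bytes of 0xFF
  for i in {1..16}; do            # followed by 16 copies of the MAC
    payload+=$mac
  done
  printf '%s\n' "$payload"
}

# Usage: magic_packet_hex 70:70:fc:00:85:5b
```

The result is 204 hex characters, that is, 102 bytes. Turning the hex into raw bytes (for example with xxd -r -p) and broadcasting it is what the wake-up tool does for you.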
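Fortunately there is a very handy network tool, Fing (basic features are free).
After installing it, the X230 could be woken.
More importantly, Fing ships command-line tools (CLI) for both Windows and Linux, which makes integration easy.
Website: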
https://www.fing.com/products/development-toolkit
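Because the wake-up actually has to be sent from the Windows host (WSL2 is not on the same subnet), download the Windows version.
After installation, test waking a machine with the command below:
fing --wol 70:70:fc:00:85:5b@10.0.1.198/24 # both the MAC address and the IP are required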
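In fact, at this point remote power control already works end to end.
The remaining steps only set up one power-on job and one power-off job in Jenkins.
My Jenkins runs on WSL2. First, put the IPs of the three physical Linux hosts into one Ansible group: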
vi /etc/ansible/hosts
[physical_servers]
10.0.1.107
10.0.1.122
10.0.1.198
Of course, don't forget to install the WSL2 SSH key on all three servers:
ssh-copy-id -i ~/.ssh/id_rsa.pub gateman@x230
Now Ansible can work on the whole group, with no need to handle each host separately.
Shell script:
gateman@DESKTOP-UIU9RFJ:/opt/apps/playbooks/remoteserver$ cat shutdown_all_vms.sh
#!/bin/bash
echo "=== shutting down all kvm vms ==="
for i in $(virsh list | grep running | awk '{print $2}'); do
    echo "shutting down $i"
    virsh shutdown "$i"
done
sleep 10 # sleep 10 seconds
virsh list --all
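One caveat with the fixed sleep 10: virsh shutdown only asks the guest to power off and returns immediately, so a slow guest may still be running when the host itself goes down. A possible refinement, sketched below, is to poll until no VM is running or a timeout expires. The function name and the default timeout are assumptions, not part of my actual script.

```shell
#!/bin/bash
# Wait until `virsh` reports no running VM.
#   $1 = timeout in seconds (default 120), $2 = poll interval (default 5)
# Returns 0 when everything is off, 1 on timeout.
wait_for_vms_off() {
  local timeout=${1:-120} interval=${2:-5} waited=0
  while [ -n "$(virsh list --name --state-running)" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out, still running:" >&2
      virsh list --name --state-running >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Usage, in place of the fixed sleep:
#   wait_for_vms_off 120 5 && virsh list --all
```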
Playbook:
It has just two steps: 1. upload the script, 2. run the script.
gateman@DESKTOP-UIU9RFJ:/opt/apps/playbooks/remoteserver$ cat shutdown_all_vms.yml
---
- hosts: "{{servers}}"
  remote_user: "{{ansible_user}}"
  gather_facts: false
  tasks:
    - name: print debug msg for parameters
      debug:
        msg:
          - "servers is: {{servers}}"
          - "ansible_user is: {{ansible_user}}"
    - name: copy shutdown vm shell script to remote server
      copy:
        src: ./shutdown_all_vms.sh
        dest: /tmp
        backup: no
        mode: "0775"
    - name: execute the shutdown vm script
      script: ./shutdown_all_vms.sh
      args:
        chdir: /tmp
      environment:
        # https://stackoverflow.com/questions/59522902/vms-are-not-visible-to-virsh-command-executed-using-ansible-shell-task
        LIBVIRT_DEFAULT_URI: qemu:///system
      register: cmdresult
    - name: show stdout cmdresult
      debug:
        msg: "{{ cmdresult.stdout }}"
    - name: show stderr cmdresult
      debug:
        msg: "{{ cmdresult.stderr }}"
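Because the shutdown command requires root, there are two options.
One is to run Ansible as root, which means installing the SSH key under the root account of all three hosts. That is too dangerous.
The other is to let a normal user run sudo without a password (provided the user is in the sudo group).
I chose the second option.
sudo visudo # edit the line below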
# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) NOPASSWD:ALL
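Members of the sudo group can now shut the machine down with sudo shutdown -h now, without a password prompt.
The playbook simply runs sudo shutdown -h now. Remember to add ignore_unreachable: true; otherwise Ansible treats the job as failed, because the host drops the connection while it is shutting down.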
gateman@DESKTOP-UIU9RFJ:/opt/apps/playbooks/remoteserver$ cat shutdown_server.yml
---
- hosts: "{{servers}}"
  remote_user: "{{ansible_user}}"
  gather_facts: false
  tasks:
    - name: print debug msg for parameters
      debug:
        msg:
          - "servers is: {{servers}}"
          - "ansible_user is: {{ansible_user}}"
    - name: shut down the physical server
      shell: sudo shutdown -h now
      ignore_errors: true
      ignore_unreachable: true # this will work for shutdown cases
First, write a common job that runs a given Ansible playbook:
pipeline {
    agent { node { label 'master' } }
    stages {
        stage('display parameters') {
            steps {
                echo "servers is ${servers}"
                echo "ansible_user is ${ansible_user}"
                echo "playbook_path is ${playbook_path}"
            }
        }
        stage('run playbook') {
            steps {
                script {
                    sh "ansible-playbook -e \"servers=${servers} ansible_user=${ansible_user}\" -vv ${playbook_path}"
                }
            }
        }
    }
    post {
        failure {
            emailext (
                subject: "FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
                body: """FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':
Check console output at ${env.BUILD_URL}
""",
                to: "nvd11@163.com",
                from: "nvd11@163.com"
            )
        }
    }
}
Next, write the shutdown job.
It has just two steps: first run the first playbook to shut down all VMs, then shut down the physical machines.
def shut_vm_plybk = "/opt/apps/playbooks/remoteserver/shutdown_all_vms.yml"
def shut_server_plybk = "/opt/apps/playbooks/remoteserver/shutdown_server.yml"

pipeline {
    agent { node { label 'master' } }
    stages {
        stage ('display parameters') {
            steps {
                echo "servers is ${servers}"
                echo "ansible_user is ${ansible_user}"
            }
        }
        stage ('shutdown all vms in the servers') {
            steps {
                build job: 'common_run_playbook', parameters: [
                    [$class: 'StringParameterValue', name: 'servers', value: "${servers}"],
                    [$class: 'StringParameterValue', name: 'ansible_user', value: "${ansible_user}"],
                    [$class: 'StringParameterValue', name: 'playbook_path', value: "${shut_vm_plybk}"]
                ]
            }
        }
        stage ('shutdown physical server') {
            steps {
                build job: 'common_run_playbook', parameters: [
                    [$class: 'StringParameterValue', name: 'servers', value: "${servers}"],
                    [$class: 'StringParameterValue', name: 'ansible_user', value: "${ansible_user}"],
                    [$class: 'StringParameterValue', name: 'playbook_path', value: "${shut_server_plybk}"]
                ]
            }
        }
    }
    post {
        success {
            emailext (
                subject: "SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
                body: """SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':
Check console output at ${env.BUILD_URL}
""",
                to: "nvd11@163.com",
                from: "nvd11@163.com"
            )
        }
        failure {
            emailext (
                subject: "FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
                body: """FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':
Check console output at ${env.BUILD_URL}
""",
                to: "nvd11@163.com",
                from: "nvd11@163.com"
            )
        }
    }
}
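Next comes the power-on job.
Before the wake-up the servers are offline, so Ansible cannot reach them.
The job therefore does three things: get the IP list from the Ansible inventory, wake each host through Fing on Win10, and then start the VMs. The first two are shown here: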
def playbook_path="/opt/apps/playbooks"
def ip_list_str = ""
def ip_list = []
def mac_map = ["10.0.1.107":"00:e0:0a:f2:12:26","10.0.1.122":"3c:97:0e:59:14:87", ]
def start_vm_plybk = "/opt/apps/playbooks/remoteserver/startup_all_vms.yml"

pipeline {
    agent { node { label 'master' } }
    stages {
        stage('display parameters') {
            steps {
                echo "servers is ${servers}"
                echo "ansible_user is ${ansible_user}"
            }
        }
        stage('use ansible --list host command to get the ip list') {
            steps {
                script {
                    ip_list_str = sh(returnStdout: true, script: 'ansible physical_servers --list-host | grep -v hosts')
                    ip_list = ip_list_str.split('\n') as List
                }
                echo "ip_list_str = ${ip_list_str}"
                println ip_list
            }
        }
        stage('loop the ip list to start them') {
            steps {
                script {
                    for (ip in ip_list) {
                        ip = ip.trim()
                        echo "get a ip --> ${ip}"
                        def mac_addr = mac_map[ip]
                        echo "mac address --> ${mac_addr}"
                        if (!mac_addr) {
                            // no entry in mac_map: nothing to send to fing
                            echo "no MAC address configured for ${ip}, skipping"
                            continue
                        }
                        sh " \"/mnt/d/Program Files (x86)/Fing/bin/fing.exe\" --wol ${mac_addr}@${ip}/24"
                    }
                    sleep(90) // the X230 boots slowly; wait a minute and a half before starting the VMs
                }
            }
        }
        ...
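The fixed 90-second sleep works for my hardware, but it is a guess. An alternative, sketched here under assumptions, is to poll each host until its SSH port answers. The function names and the default limits are hypothetical, and the /dev/tcp redirection is a bash feature, so the script would have to run under bash rather than plain sh.

```shell
#!/bin/bash
# Succeeds if a TCP connection to host $1, port $2 opens within 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Poll until SSH (port 22) answers on host $1.
#   $2 = overall limit in seconds (default 180), $3 = poll interval (default 5)
wait_for_ssh() {
  local host=$1 limit=${2:-180} interval=${3:-5} waited=0
  until port_open "$host" 22; do
    if [ "$waited" -ge "$limit" ]; then
      echo "$host: SSH still unreachable after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$host: SSH reachable after about ${waited}s"
}

# Usage: wait_for_ssh 10.0.1.122 180 5
```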
Once the hosts are up, Ansible can be used again.
This step is very similar to the VM shutdown step above.
Shell script:
gateman@DESKTOP-UIU9RFJ:/opt/apps/playbooks/remoteserver$ cat startup_all_vms.sh
#!/bin/bash
echo "=== starting all kvm vms that contain the word vm ==="
for i in $(virsh list --all | grep vm | awk '{print $2}'); do
    virsh start "$i"
done
sleep 10
virsh list --all
playbook:
gateman@DESKTOP-UIU9RFJ:/opt/apps/playbooks/remoteserver$ cat startup_all_vms.yml
---
- hosts: "{{servers}}"
  remote_user: "{{ansible_user}}"
  gather_facts: false
  tasks:
    - name: print debug msg for parameters
      debug:
        msg:
          - "servers is: {{servers}}"
          - "ansible_user is: {{ansible_user}}"
    - name: copy startup vm shell script to remote server
      copy:
        src: ./startup_all_vms.sh
        dest: /tmp
        backup: no
        mode: "0775"
    - name: execute the startup vm script
      script: ./startup_all_vms.sh
      args:
        chdir: /tmp
      environment:
        # https://stackoverflow.com/questions/59522902/vms-are-not-visible-to-virsh-command-executed-using-ansible-shell-task
        LIBVIRT_DEFAULT_URI: qemu:///system
      register: cmdresult
    - name: show stdout cmdresult
      debug:
        msg: "{{ cmdresult.stdout }}"
    - name: show stderr cmdresult
      debug:
        msg: "{{ cmdresult.stderr }}"
This just completes the power-on job started above:
def playbook_path="/opt/apps/playbooks"
def ip_list_str = ""
def ip_list = []
def mac_map = ["10.0.1.107":"00:e0:0a:f2:12:26","10.0.1.122":"3c:97:0e:59:14:87", ]
def start_vm_plybk = "/opt/apps/playbooks/remoteserver/startup_all_vms.yml"

pipeline {
    agent { node { label 'master' } }
    stages {
        stage('display parameters') {
            steps {
                echo "servers is ${servers}"
                echo "ansible_user is ${ansible_user}"
            }
        }
        stage('use ansible --list host command to get the ip list') {
            steps {
                script {
                    ip_list_str = sh(returnStdout: true, script: 'ansible physical_servers --list-host | grep -v hosts')
                    ip_list = ip_list_str.split('\n') as List
                }
                echo "ip_list_str = ${ip_list_str}"
                println ip_list
            }
        }
        stage('loop the ip list to start them') {
            steps {
                script {
                    for (ip in ip_list) {
                        ip = ip.trim()
                        echo "get a ip --> ${ip}"
                        def mac_addr = mac_map[ip]
                        echo "mac address --> ${mac_addr}"
                        if (!mac_addr) {
                            // no entry in mac_map: nothing to send to fing
                            echo "no MAC address configured for ${ip}, skipping"
                            continue
                        }
                        sh " \"/mnt/d/Program Files (x86)/Fing/bin/fing.exe\" --wol ${mac_addr}@${ip}/24"
                    }
                    sleep(90)
                }
            }
        }
        stage ('startup all vms in the servers') {
            steps {
                build job: 'common_run_playbook', parameters: [
                    [$class: 'StringParameterValue', name: 'servers', value: "${servers}"],
                    [$class: 'StringParameterValue', name: 'ansible_user', value: "${ansible_user}"],
                    [$class: 'StringParameterValue', name: 'playbook_path', value: "${start_vm_plybk}"]
                ]
            }
        }
        stage('completed') {
            steps {
                println 'build is completed'
            }
        }
    }
    post {
        success {
            emailext (
                subject: "SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
                body: """SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':
Check console output at ${env.BUILD_URL}
""",
                to: "nvd11@163.com",
                from: "nvd11@163.com"
            )
        }
        failure {
            emailext (
                subject: "FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
                body: """FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':
Check console output at ${env.BUILD_URL}
""",
                to: "nvd11@163.com",
                from: "nvd11@163.com"
            )
        }
    }
}
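That completes everything we set out to do.
We now have two jobs for power-on and power-off! To switch the machines on or off, open the Jenkins web page from the computer in my room and run the corresponding job; no more walking over to the rack to press power buttons by hand.
How can we quickly verify that the physical machines and VMs were actually shut down or started?
Two options:
1. Use the KVM Virtual Machine Manager (virt-manager). However, this KVM client has no Windows version, so that is out.
2. A better approach? Grafana + Prometheus, of course.
Reference: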
https://blog.csdn.net/nvd11/article/details/128030197
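For a cruder spot check without any dashboard, a ping sweep from a machine on the 10.0.1.x network is enough. This is only a sketch: the function name is made up, the addresses are the ones used in this post, and a host that blocks ICMP would show as down even when it is running.

```shell
#!/bin/bash
# Physical hosts and VMs as used in this post.
HOSTS="10.0.1.107 10.0.1.122 10.0.1.198"
VMS="10.0.1.151 10.0.1.152 10.0.1.154 10.0.1.156 10.0.1.157 10.0.1.158"

# Print "<ip> up" or "<ip> down" for every address given as an argument.
report_status() {
  local ip
  for ip in "$@"; do
    # one echo request, wait at most 1 second for the reply
    if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
      echo "$ip up"
    else
      echo "$ip down"
    fi
  done
}

# Usage: report_status $HOSTS $VMS
```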