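Step one: obtain the login cookies, so that selenium-wire does not have to log in again on every run.
Step two: on the album page, type a page number and click the jump button to load the next part of the track list.
Step three: loop over each loaded track list and open each of its 30 audio tracks in a new page. In the new page,
reuse the cookies to skip the login, click play through selenium-wire, capture the response, and extract the m4a audio link from it.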
Python module imports used throughout:
# -*- coding:utf-8 -*-
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import title_contains
from seleniumwire import webdriver
from win32com.client import Dispatch

Headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
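First, selenium-wire opens the xima page, which is normally in a logged-out state. The script finds the login button, opens the login frame,
and fills in the user name and password. Note that I leave time here for manual verification. The captcha could also be solved automatically with Chaojiying, but that takes a long time to debug.
Step one only has to be carried out once, so dragging the captcha by hand does not waste time. After a successful login, the script prints the cookies.
Cookies differ from person to person, so everyone has to run step one once.
For url, enter the actual xima address yourself.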
def get_sound_cookie(url):
    wb = webdriver.Chrome()
    wb.get(url)
    # Click the login button in the page header to open the login frame.
    wb.find_element(By.XPATH, '//*[@id="rootHeader"]/div/div[2]/div/div/img').click()
    wb.implicitly_wait(2)
    time.sleep(10)
    # Replace the placeholders with your own account name and password.
    wb.find_element(By.XPATH, '//input[@id="accountName"]').send_keys('YOUR_ACCOUNT')
    wb.find_element(By.XPATH, '//input[@id="accountPWD"]').send_keys('YOUR_PASSWORD')
    wb.find_element(By.XPATH, '//button[@class="login-btn"]').click()
    wb.implicitly_wait(2)
    # Time reserved for dragging the captcha by hand.
    time.sleep(10)
    wb.find_element(By.XPATH, '//xm-player/div[@class="play-btn U_s"]/i').click()
    wb.implicitly_wait(2)
    time.sleep(3)
    cookies = wb.get_cookies()
    print(cookies)
    time.sleep(10)
Once the cookies have been printed, copy them into the step two code.
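As an alternative to copying the printed cookies by hand, they could be written to a file and read back later. The following is only a minimal sketch: the helper names `save_cookies` and `load_cookies` and the file name are hypothetical and not part of the original script. It assumes the cookies are the list of dicts returned by `wb.get_cookies()`.

```python
import json
from pathlib import Path


def save_cookies(cookies, path):
    """Write the list of cookie dicts (as returned by get_cookies()) to a JSON file."""
    text = json.dumps(cookies, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load_cookies(path):
    """Read the cookie list back, dropping the 'expiry' key from every cookie."""
    cookies = json.loads(Path(path).read_text(encoding="utf-8"))
    return [{k: v for k, v in c.items() if k != "expiry"} for c in cookies]
```

With these helpers, step one could end with `save_cookies(wb.get_cookies(), "cookies.json")`, and the later code could start from `load_cookies("cookies.json")` instead of a pasted literal.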
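Step two consists of two functions. The first handles a single audio page: it uses the cookies to skip the login, clicks play, captures the responses, and extracts the m4a address.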
def get_sound_response(url):
    # Paste the cookies printed in step one here.
    cookies = [{'domain':......your cookie pasted here......]
    wb = webdriver.Chrome()
    wb.get(url)
    wb.implicitly_wait(3)
    # Replace the browser's cookies with the saved login cookies.
    wb.delete_all_cookies()
    for cookie in cookies:
        if 'expiry' in cookie:
            del cookie['expiry']
        wb.add_cookie(cookie)
    wb.refresh()
    time.sleep(3)
    wb.get(url)
    wb.implicitly_wait(3)
    # Click play so that the browser requests the audio file.
    wb.find_element(By.XPATH, '//xm-player/div[@class="play-btn U_s"]/i').click()
    wb.implicitly_wait(5)
    time.sleep(5)
    m4a = None
    # Scan the requests captured by selenium-wire for an audio response.
    for request in wb.requests:
        if request.response:
            if request.response.headers['Content-Type'] == "audio/mp4":
                # print("m4a : ", request.url)
                m4a = request.url
            # print("#########")
            # print(request.response.headers['Content-Type'])
            # print(request.url)
    time.sleep(5)
    # Close this browser; a new one is opened for every track.
    wb.quit()
    if m4a is not None:
        return m4a
    else:
        print("Not Valid: " + url)
        return ""
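The scan over the captured requests can be pulled out into a small function that is easy to try without a browser. This is a sketch under assumptions: `find_audio_url` is a hypothetical name, and the stand-in objects below only mimic the attributes the loop reads (`.url`, `.response`, `.response.headers`). Unlike the exact comparison above, it also accepts a `Content-Type` that carries extra parameters after a semicolon.

```python
from types import SimpleNamespace


def find_audio_url(requests, wanted="audio/mp4"):
    """Return the URL of the last request whose response has the wanted media type, else None."""
    found = None
    for request in requests:
        response = request.response
        if response is None:
            continue  # the request never received a response
        content_type = response.headers.get("Content-Type") or ""
        # Compare only the media type, ignoring parameters such as "; charset=...".
        if content_type.split(";")[0].strip().lower() == wanted:
            found = request.url
    return found


# Stand-in data for illustration only; the URLs are made up.
captured = [
    SimpleNamespace(url="https://example.invalid/page", response=None),
    SimpleNamespace(
        url="https://example.invalid/track.m4a",
        response=SimpleNamespace(headers={"Content-Type": "audio/mp4; charset=binary"}),
    ),
]
print(find_audio_url(captured))
```

Inside the function above, the call would presumably be `find_audio_url(wb.requests)`.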
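The second function sits one level above the first and is responsible for walking through the whole track list. It does not need the cookie-based login; if you want to be logged in anyway, uncomment the commented-out part. For flexibility, the function takes the first and the last page to process. At this level it also calls Xunlei (Thunder) to download each m4a file and names the file after its entry in the track list.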
# iterate over all page links
def get_m4a_lists(url, album_id, pagelow, pagehigh):
    # COM interface to the Xunlei (Thunder) download client.
    o = Dispatch("ThunderAgent.Agent64.1")
    wb = webdriver.Chrome()
    wb.get(url + album_id)
    # Uncomment to browse the album while logged in
    # (the cookies list must then be defined in this function as well):
    # wb.implicitly_wait(3)
    # wb.delete_all_cookies()
    # for cookie in cookies:
    #     if 'expiry' in cookie:
    #         del cookie['expiry']
    #     wb.add_cookie(cookie)
    # wb.refresh()
    # time.sleep(3)
    # wb.get(url + album_id)
    wb.implicitly_wait(3)
    for pagen in range(pagelow, pagehigh + 1):
        # Type the page number into the quick-jump box and submit it.
        wb.find_element(By.XPATH, '//div[@class="quick-jump N_t"]/form/input[@type="number"]').send_keys(str(pagen))
        wb.find_element(By.XPATH, '//form/button[@type="submit"]').click()
        wb.implicitly_wait(5)
        time.sleep(5)
        # Every link that starts with /sound/ is one track on the current page.
        m4a_lists = wb.find_elements(By.XPATH, '//a[starts-with(@href,"/sound/")]')
        for page in m4a_lists:
            name = page.text
            m4a_page = page.get_attribute('href')
            m4a = get_sound_response(m4a_page)
            print(m4a)
            if m4a:  # skip tracks for which no audio link was found
                o.AddTask(m4a, name)
                o.CommitTasks()
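Because the link text is passed on as the download name, a title that contains characters Windows does not allow in file names may cause trouble. The sketch below shows one possible way to clean the title first. `safe_filename` is a hypothetical helper, and appending `.m4a` is an assumption about what the download client expects, not something the original code does.

```python
import re


def safe_filename(title, ext=".m4a"):
    """Turn a track title into a Windows-safe file name that ends in ext."""
    # Replace characters that Windows forbids in file names, plus line breaks and tabs.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", title).strip(" .")
    if not cleaned:
        cleaned = "untitled"
    # Add the extension only when the title does not already carry it.
    return cleaned if cleaned.lower().endswith(ext) else cleaned + ext
```

In the loop above this would amount to something like `o.AddTask(m4a, safe_filename(name))`.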
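Finally, add a main entry point. There are three ways to call the code: for step one, use get_sound_cookie;
for step two, use get_sound_response to debug or download a single track, or get_m4a_lists to download in batches.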
if __name__ == '__main__':
    url = "https://www.xima.com/album/"
    album_id = "68117318"
    get_m4a_lists(url, album_id, 1, 2)
    # get_sound_cookie('https://www.xima.com/sound/539598850')
    # get_sound_response('https://www.xima.com')
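Instead of commenting lines in and out, the three ways of calling the script could be selected on the command line. The following is a rough sketch with `argparse`. The subcommand names `cookie`, `sound` and `album` and the options `--first` and `--last` are made up for illustration. The block only parses arguments and leaves the dispatch to the three functions as a comment.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per way of calling the script."""
    parser = argparse.ArgumentParser(description="Album audio downloader (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    p_cookie = sub.add_parser("cookie", help="log in once and print the cookies")
    p_cookie.add_argument("url")

    p_sound = sub.add_parser("sound", help="resolve the m4a link of one track page")
    p_sound.add_argument("url")

    p_album = sub.add_parser("album", help="batch-download a range of album pages")
    p_album.add_argument("album_id")
    p_album.add_argument("--first", type=int, default=1)
    p_album.add_argument("--last", type=int, default=1)
    return parser


# Parse an example argument list; a real script would call parse_args() without arguments.
args = build_parser().parse_args(["album", "68117318", "--first", "1", "--last", "2"])
print(args.command, args.album_id, args.first, args.last)
# Dispatch would then be, for example:
#   if args.command == "album":
#       get_m4a_lists("https://www.xima.com/album/", args.album_id, args.first, args.last)
```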