Crawler returns only a table header, or hits a verification page, when scraping data

Original post · latest recommended article published 2026-01-16 22:02:25 · 1.7k reads

Tags: #python #crawler

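This post covers two failures encountered when scraping Maoyan movie data with a Python crawler: the output contains only the table header, or the site demands verification. Both likely stem from Maoyan's anti-crawling measures. Sending only a 'User-Agent' header can get the request redirected to a verification page, and requesting too frequently can leave the output with an empty header row and no data. The fixes are to add a cookie to the headers, switch the URL from http to https, and change the IP address the requests come from so the crawler is not flagged. Related blog posts were consulted while tracking down and fixing the problem.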
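The three fixes named above can be sketched as a small helper that prepares the arguments for `requests.get`. This is a minimal sketch, not the code from the original post: the function names `force_https` and `build_request_kwargs`, the 10-second timeout, and the proxy address format are all assumptions made for illustration. The cookie string has to be copied from a real browser session on the target site.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Rewrite an http:// URL to https://; leave any other URL unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def build_request_kwargs(url, user_agent, cookie, proxy=None):
    """Build keyword arguments for requests.get (hypothetical helper).

    Combines the three fixes: an https URL, a Cookie header next to the
    User-Agent, and an optional proxy so requests come from another IP.
    """
    kwargs = {
        "url": force_https(url),
        "headers": {"User-Agent": user_agent, "Cookie": cookie},
        "timeout": 10,  # assumed value; avoids hanging on a stalled server
    }
    if proxy is not None:
        # requests expects a mapping from URL scheme to proxy address
        kwargs["proxies"] = {"http": proxy, "https": proxy}
    return kwargs
```

The resulting dictionary can be passed straight through, for example `requests.get(**build_request_kwargs(url, user_agent, cookie))`. Keeping the preparation separate from the network call makes it easy to check what is actually being sent before blaming the parser for an empty result.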

Problem description:

As a beginner learning to scrape Maoyan movies, I ended up with output that contained nothing but an empty table header.

The code I was learning from is:

import requests
import bs4
from requests.exceptions import RequestException
import openpyxl


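def get_one_page(url, headers):
    """Fetch one page and return its HTML text, or None on any failure."""
    try:
        # A timeout keeps the crawler from hanging on an unresponsive server.
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None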


def parse_one_page(html):
    soup = bs4.BeautifulSoup(html, 'lxml')
    # Collect the movie titles
    movies = []
    targets = soup.find_all(class_='name')
    for each in targets:
        movies.append(each.get_text())
    # Collect the ratings
    scores =