I would consider using a custom Retry Middleware, similar to the built-in one.
import logging

logger = logging.getLogger(__name__)


class RetryMiddleware(object):

    def process_response(self, request, response, spider):
        # response.body is bytes, so the marker is matched as a bytes literal
        if b'var PageIsLoaded = false;' in response.body:
            logger.warning('parse_page encountered an incomplete rendering of {}'.format(response.url))
            return self._retry(request) or response
        return response

    def _retry(self, request):
        logger.debug("Retrying %(request)s", {'request': request})
        # Copy the request and bypass the duplicate filter so it gets scheduled again
        retryreq = request.copy()
        retryreq.dont_filter = True
        return retryreq
And don't forget to activate it.
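Activating a downloader middleware in Scrapy is done through the `DOWNLOADER_MIDDLEWARES` setting. Below is a minimal sketch of what that could look like in `settings.py`. The module path `myproject.middlewares` is hypothetical (use wherever the class actually lives), and the priority `551` is only an assumption, chosen to sit next to the built-in `RetryMiddleware`, which Scrapy registers at 550 by default.

```python
# settings.py (sketch; the module path and priority are assumptions)
DOWNLOADER_MIDDLEWARES = {
    # Hypothetical import path of the custom middleware class
    "myproject.middlewares.RetryMiddleware": 551,
}
```

Scrapy merges this dict with its default middlewares, so the built-in retry middleware stays active unless it is explicitly set to `None` here.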