Try this. It may be overkill for something this simple, but it works:
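# Written for Python 2 and BeautifulSoup 3.
from BeautifulSoup import BeautifulSoup

def match_class(target):
    # Build a matcher for tags that carry every class named in `target`.
    target = target.split()
    def do_match(tag):
        try:
            # In BeautifulSoup 3, tag.attrs is a list of (name, value) pairs.
            classes = dict(tag.attrs)["class"]
        except KeyError:
            classes = ""
        classes = classes.split()
        return all(c in classes for c in target)
    return do_match

# The class attributes are missing from the sample markup as posted; the two
# shown here are assumed from the queries below.
html = """<div class="feeditemcontent cxfeeditemcontent"><div><div class="feeditembody"><span>The actual data is some where here</span></div></div></div>"""

soup = BeautifulSoup(html)

# Tags that have both classes, in any order.
matches = soup.findAll(match_class("feeditemcontent cxfeeditemcontent"))
for m in matches:
    print m
    print "-"*10

# Tags that have a single class.
matches = soup.findAll(match_class("feeditembody"))
for m in matches:
    print m
    print "-"*10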
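If you are on Python 3, the same closure idea can be sketched roughly as below. This is only a sketch under assumptions: it assumes a newer BeautifulSoup (bs4) hands you `tag.attrs` as a dict with `class` already split into a list, while still accepting the older list-of-pairs form used above. The `SimpleNamespace` objects are hypothetical stand-ins for real tags, used so the snippet runs without the library installed.

```python
from types import SimpleNamespace

def match_class(target):
    # Build a matcher for tags that carry every class named in `target`.
    wanted = target.split()

    def do_match(tag):
        # dict() accepts both a dict and a list of (name, value) pairs.
        attrs = dict(tag.attrs)
        classes = attrs.get("class", "")
        # Older versions give one space-separated string, newer ones a list.
        if isinstance(classes, str):
            classes = classes.split()
        return all(c in classes for c in wanted)

    return do_match

# Hypothetical stand-ins for parsed tags (not real BeautifulSoup objects).
outer = SimpleNamespace(attrs={"class": ["feeditemcontent", "cxfeeditemcontent"]})
inner = SimpleNamespace(attrs=[("class", "feeditembody")])
plain = SimpleNamespace(attrs={})

both = match_class("feeditemcontent cxfeeditemcontent")
body = match_class("feeditembody")

print(both(outer))  # True
print(both(inner))  # False
print(body(inner))  # True
print(body(plain))  # False
```

With a real parsed document you would pass the returned function to the library's search call, just as `soup.findAll(match_class(...))` does above.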