You should use an HTML parsing library, for example lxml:
from lxml import etree

s = """<table>
  <tr><th>Event</th><th>Start Date</th><th>End Date</th></tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>d</td><td>e</td><td>f</td></tr>
  <tr><td>g</td><td>h</td><td>i</td></tr>
</table>"""

# etree.HTML wraps the fragment in <html><body>, so the table sits under body
table = etree.HTML(s).find("body/table")
rows = iter(table)

# The first row holds the column headers
headers = [col.text for col in next(rows)]

# Pair each remaining row's cells with the headers
for row in rows:
    values = [col.text for col in row]
    print(dict(zip(headers, values)))

Output (Python 3.7+, where dicts keep insertion order):

{'Event': 'a', 'Start Date': 'b', 'End Date': 'c'}
{'Event': 'd', 'Start Date': 'e', 'End Date': 'f'}
{'Event': 'g', 'Start Date': 'h', 'End Date': 'i'}
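If installing lxml is not an option, the same header-to-row mapping can be sketched with Python's standard-library `html.parser` module. This is a minimal sketch rather than a replacement for a real parsing library: the names `TableRowParser` and `table_to_dicts` are hypothetical, and it assumes a single, reasonably well-formed table with no nested tables and no `rowspan`/`colspan` attributes.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left unclosed in this row


def table_to_dicts(html):
    """Return one dict per data row, keyed by the first row's cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    headers, *body = parser.rows
    return [dict(zip(headers, row)) for row in body]


s = """<table>
  <tr><th>Event</th><th>Start Date</th><th>End Date</th></tr>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr><td>d</td><td>e</td><td>f</td></tr>
  <tr><td>g</td><td>h</td><td>i</td></tr>
</table>"""

records = table_to_dicts(s)
for record in records:
    print(record)
```

Returning a list of dicts instead of printing inside the loop makes the result easier to reuse, for example to feed into `csv.DictWriter` or `json.dumps`. For messy real-world HTML, lxml remains the more forgiving choice.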