I'm trying to parse HTML, specifically:
<html>
  <head>
    <title>Insert Clever Title</title>
  </head>
  <body>
    What don't you like?
    <select id="some stuff">
      <option name="first" font="green">boilerplate</option>
      <option selected name="second" font="blue">parsing HTML with regexes</option>
      <option name="third" font="red">closing tags for option elements
    </select>
    That was short.
  </body>
</html>
My code is:
{-# LANGUAGE FlexibleContexts, RankNTypes #-}
module Main where

import System.Environment (getArgs)
import Data.Map hiding (null)
import Text.Parsec hiding ((<|>), label, many, optional)
import Text.Parsec.Token
import Control.Applicative

data HTML = Element { tag :: String, attributes :: Map String (Maybe String), children :: [HTML] }
          | Text { contents :: String }
  deriving (Show, Eq)

type HTMLParser a = forall s u m. Stream s m Char => ParsecT s u m a

htmlDoc :: HTMLParser HTML
htmlDoc = do
  spaces
  doc <- html
  spaces >> eof
  return doc

html :: HTMLParser HTML
html = text <|> element

text :: HTMLParser HTML
text = Text <$> (many1 $ noneOf "<")

label :: HTMLParser String
label = many1 . oneOf $ ['a' .. 'z'] ++ ['A' .. 'Z']

value :: HTMLParser String
value = between (char '"') (char '"') (many anyChar) <|> label

attribute :: HTMLParser (String, Maybe String)
attribute = (,) <$> label <*> (optionMaybe $ spaces >> char '=' >> spaces >> value)

element :: HTMLParser HTML
element = do
  char '<' >> spaces
  tag <- label
  -- at least one space between each attribute and what was before
  attributes <- fromList <$> many (space >> spaces >> attribute)
  spaces >> char '>'
  -- nested html
  children <- many html
  optional $ string "</" >> spaces >> string tag >> spaces >> char '>'
  return $ Element tag attributes children

main = do
  source : _ <- getArgs
  result <- parse htmlDoc source <$> readFile source
  print result
The problem seems to be that my parser doesn't like closing tags: as far as I can tell, it greedily assumes that < always starts an opening tag:
% HTMLParser temp.html
Left "temp.html" (line 3, column 32):
unexpected "/"
expecting white space
I've been playing with it for a while, and I can't figure out why it isn't backtracking past the char '<' match.
Solution

As ehird said, I needed to use try:

attribute = (,) <$> label <*> (optionMaybe . try $ spaces >> char '=' >> spaces >> value)
-- ...
attributes <- fromList <$> many (try $ space >> spaces >> attribute)
-- ...
children <- many $ try html
optional . try $ string "</" >> spaces >> string tag >> spaces >> char '>'
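Why try helps: in Parsec, both <|> and many give up as soon as a parser fails after it has already consumed input. In the question, element consumes the '<' of a closing tag before failing on '/', so many never gets the chance to stop cleanly. The following is a minimal sketch of that behaviour, not the original parser; openTag, closeTag and the other names are hypothetical helpers introduced only for illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: an opening tag such as "<b>", returning its name.
openTag :: Parser String
openTag = char '<' *> many1 letter <* char '>'

-- Hypothetical helper: a closing tag such as "</b>", returning its name.
closeTag :: Parser String
closeTag = string "</" *> many1 letter <* char '>'

-- Without try: on "</b>", openTag consumes '<' and then fails on '/'.
-- Because input was consumed, <|> does not run closeTag.
withoutTry :: Parser String
withoutTry = openTag <|> closeTag

-- With try: a failure inside openTag rewinds the input, so closeTag runs.
withTry :: Parser String
withTry = try openTag <|> closeTag

-- The same rule applies to many: it only stops cleanly when its argument
-- fails without consuming input.
tagsWithoutTry, tagsWithTry :: Parser [String]
tagsWithoutTry = many openTag <* closeTag
tagsWithTry    = many (try openTag) <* closeTag

main :: IO ()
main = do
  print (parse withoutTry "" "</b>")             -- a Left with a parse error
  print (parse withTry "" "</b>")                -- Right "b"
  print (parse tagsWithoutTry "" "<a><b></a>")   -- a Left with a parse error
  print (parse tagsWithTry "" "<a><b></a>")      -- Right ["a","b"]
```

Wrapping a parser in try costs extra backtracking, so it is usually placed around the smallest parser that can fail after consuming input, as in the fix above.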