html – Haskell: Why is my parser not backtracking correctly?

I decided to teach myself how to use Parsec, and I have hit a roadblock with the toy project I set for myself.

I am trying to parse HTML, specifically:

<html>
  <head>
    <title>Insert Clever Title</title>
  </head>
  <body>
    What don't you like?
    <select id="some stuff">
      <option name="first" font="green">boilerplate</option>
      <option selected name="second" font="blue">parsing HTML with regexes</option>
      <option name="third" font="red">closing tags for option elements
    </select>
    That was short.
  </body>
</html>

My code is:

{-# LANGUAGE FlexibleContexts, RankNTypes #-}

module Main where

import System.Environment (getArgs)
import Data.Map hiding (null)
import Text.Parsec hiding ((<|>), label, many, optional)
import Text.Parsec.Token
import Control.Applicative

data HTML = Element { tag :: String, attributes :: Map String (Maybe String), children :: [HTML] }
          | Text { contents :: String }
  deriving (Show, Eq)

type HTMLParser a = forall s u m. Stream s m Char => ParsecT s u m a

htmlDoc :: HTMLParser HTML
htmlDoc = do
  spaces
  doc <- html
  spaces >> eof
  return doc

html :: HTMLParser HTML
html = text <|> element

text :: HTMLParser HTML
text = Text <$> (many1 $ noneOf "<")

label :: HTMLParser String
label = many1 . oneOf $ ['a' .. 'z'] ++ ['A' .. 'Z']

value :: HTMLParser String
value = between (char '"') (char '"') (many anyChar) <|> label

attribute :: HTMLParser (String, Maybe String)
attribute = (,) <$> label <*> (optionMaybe $ spaces >> char '=' >> spaces >> value)

element :: HTMLParser HTML
element = do
  char '<' >> spaces
  tag <- label
  -- at least one space between each attribute and what was before
  attributes <- fromList <$> many (space >> spaces >> attribute)
  spaces >> char '>'
  -- nested html
  children <- many html
  optional $ string "</" >> spaces >> string tag >> spaces >> char '>'
  return $ Element tag attributes children

main = do
  source : _ <- getArgs
  result <- parse htmlDoc source <$> readFile source
  print result

The problem seems to be that my parser does not like closing tags. As far as I can tell, it greedily assumes that < always means an opening tag:

% HTMLParser temp.html
Left "temp.html" (line 3, column 32):
unexpected "/"
expecting white space

I have been playing with it for a while, and I cannot figure out why it is not backtracking past the char '<' match.
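The error above is consistent with how Parsec treats failure: combinators such as `many` and `<|>` only recover from a parser that fails without consuming any input. The following is a minimal sketch of that rule in isolation, not the code from the question. The names `openTag`, `failsAfterConsuming` and `stopsCleanly` are hypothetical, and the concrete `Parser` type from `Text.Parsec.String` stands in for the polymorphic `HTMLParser`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for an opening tag: '<', one or more letters, '>'.
openTag :: Parser String
openTag = char '<' *> many1 letter <* char '>'

-- On "</a>", openTag consumes '<' and only then fails on '/'.
-- Because input was consumed, `many` propagates the error
-- instead of stopping and returning what it has collected.
failsAfterConsuming :: Either ParseError [String]
failsAfterConsuming = parse (many openTag) "demo" "<a></a>"

-- On "text", openTag fails on its very first character, so nothing
-- was consumed and `many` simply stops with the tags parsed so far.
stopsCleanly :: Either ParseError [String]
stopsCleanly = parse (many openTag) "demo" "<a>text"
```

Loaded into GHCi, `failsAfterConsuming` should evaluate to a `Left` complaining about an unexpected "/", while `stopsCleanly` should be a `Right` holding the single tag name. The same thing happens in `element`: by the time `label` rejects the "/", the `char '<'` has already been consumed.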

Solution: As ehird said, I needed to use try:

attribute = (,) <$> label <*> (optionMaybe . try $ spaces >> char '=' >> spaces >> value)
--...
attributes <- fromList <$> many (try $ space >> spaces >> attribute)
--...
children <- many $ try html
optional . try $ string "</" >> spaces >> string tag >> spaces >> char '>'
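To see the four `try` placements working together, here is a reduced, self-contained sketch of the parser rather than the author's full program. Several things in it are assumptions made for illustration: the concrete `Parser` type from `Text.Parsec.String`, the name `ident` in place of `label`, positional constructor fields, the `sample` input, and `many (noneOf "\"")` for quoted values. That last change is a deliberate swap, since `many anyChar` would read past the closing quote to the end of the input.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Data.Map as Map

data HTML
  = Element String (Map.Map String (Maybe String)) [HTML]
  | Text String
  deriving (Show, Eq)

-- Tag and attribute names (hypothetical replacement for `label`).
ident :: Parser String
ident = many1 letter

-- Quoted or bare attribute value. Assumption: stop at the closing quote.
value :: Parser String
value = between (char '"') (char '"') (many (noneOf "\"")) <|> ident

-- try: `spaces` may consume a blank before '=' turns out to be missing.
attribute :: Parser (String, Maybe String)
attribute = (,) <$> ident <*> optionMaybe (try (spaces >> char '=' >> spaces >> value))

element :: Parser HTML
element = do
  char '<' >> spaces
  tag <- ident
  -- try: `space` may consume a blank that is followed by '>' rather than an attribute
  attrs <- Map.fromList <$> many (try (space >> spaces >> attribute))
  _ <- spaces >> char '>'
  -- try: a child attempt consumes '<' before discovering the '/' of a closing tag
  children <- many (try html)
  -- try: "</" may belong to an enclosing element when this tag is left unclosed
  optional (try (string "</" >> spaces >> string tag >> spaces >> char '>'))
  return (Element tag attrs children)

html :: Parser HTML
html = (Text <$> many1 (noneOf "<")) <|> element

-- Hypothetical input with a valueless attribute and an unclosed option.
sample :: String
sample = "<select id=\"x\"><option selected name=\"a\">one</option><option name=\"b\">two</select>"

result :: Either ParseError HTML
result = parse (html <* eof) "demo" sample
```

Evaluating `result` in GHCi should give a `Right` containing a `select` element with two `option` children. One trade-off worth keeping in mind: wrapping a whole sub-parser in `try`, as in `many (try html)`, also discards genuine errors inside a child, so error messages tend to point at the enclosing element instead of the real problem.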

Original article: https://outofmemory.cn/web/1058960.html
