How to Strip HTML Tags with Regular Expressions

The code below strips HTML tags with regular expressions. It can be copied and used as-is.

Code:

using System.Text.RegularExpressions;

// The enclosing class name is a placeholder; the original snippet does not show it.
public static class HtmlUtil
{
    // Found by googling "StripHTML".
    public static string StripHTML(string HTML)
    {
        // Patterns are applied in order; Regexs[i] is replaced by Replaces[i].
        string[] Regexs =
        {
            @"<script[^>]*?>.*?</script>",
            @"<(\/\s*)?!?((\w+:)?\w+)(\w+(\s*=?\s*(([""'])(\\[""'tbnr]|[^\7])*?\7|\w+)|.{0})|\s)*?(\/\s*)?>",
            @"([\r\n])[\s]+",
            @"&(quot|#34);",
            @"&(amp|#38);",
            @"&(lt|#60);",
            @"&(gt|#62);",
            @"&(nbsp|#160);",
            @"&(iexcl|#161);",
            @"&(cent|#162);",
            @"&(pound|#163);",
            @"&(copy|#169);",
            @"&#(\d+);",
            @"-->",
            @"<!--.*\n"
        };

        string[] Replaces =
        {
            "", "", "", "\"", "&", "<", ">", " ",
            "\xa1", //chr(161),
            "\xa2", //chr(162),
            "\xa3", //chr(163),
            "\xa9", //chr(169),
            "", "\r\n", ""
        };

        string s = HTML;
        for (int i = 0; i < Regexs.Length; i++)
        {
            s = new Regex(Regexs[i], RegexOptions.Multiline | RegexOptions.IgnoreCase).Replace(s, Replaces[i]);
        }

        // Strings are immutable, so the result of Replace must be assigned back.
        s = s.Replace("<", "");
        s = s.Replace(">", "");
        s = s.Replace("\r\n", "");
        return s;
    }
}
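The same table-driven idea, an ordered list of patterns each paired with a replacement, can be sketched in JavaScript. This is a minimal sketch rather than a port: the rule set is a shortened assumption, and the names RULES and stripHtml are hypothetical.

```javascript
// Ordered [pattern, replacement] pairs, applied top to bottom.
// Assumption: a trimmed-down rule set, not the full table from the C# version.
const RULES = [
  [/<script[^>]*>[\s\S]*?<\/script>/gi, ""], // script blocks, including multi-line bodies
  [/<!--[\s\S]*?-->/g, ""],                  // comments
  [/<\/?[a-zA-Z][^<>]*>/g, ""],              // remaining tags
  [/&(quot|#34);/gi, '"'],
  [/&(lt|#60);/gi, "<"],
  [/&(gt|#62);/gi, ">"],
  [/&(nbsp|#160);/gi, " "],
  [/&(amp|#38);/gi, "&"],                    // decoded last to avoid double-decoding
];

function stripHtml(html) {
  // Feed the output of each rule into the next one.
  return RULES.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    html
  );
}

console.log(stripHtml('<p>Tom &amp; Jerry</p><script>\nalert(1)\n</script>'));
```

Rule order matters. In this sketch the ampersand entity is decoded last, so the input &amp;lt; ends up as the literal text &lt; rather than being decoded twice into <.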

</?font[^><]*> removes only font tags and keeps every other tag, such as <img> and <p>. To remove a different tag instead, replace font with that tag's name.

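</?(?!(?:img|p)\b)[a-zA-Z][^><]*> keeps the tags you name (here img and p) and removes all others, font included. It relies on a negative lookahead, because a character class such as [^/?(img)|(p)] excludes single characters rather than whole tag names. To keep more tags, add |xxx inside the group.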

</?[a-zA-Z]+[^><]*> removes all HTML tags, font included.
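A keep-list pattern can also be built from an array of tag names. The sketch below assumes the names are plain alphanumeric tag names with no regex metacharacters; stripTagsExcept is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: remove every tag except those named in `keep`.
function stripTagsExcept(html, keep) {
  // (?!...) is a negative lookahead: the match fails when the tag name is in `keep`.
  // \b stops "p" from also protecting "pre" or "param".
  const pattern = new RegExp(
    "</?(?!(?:" + keep.join("|") + ")\\b)[a-zA-Z][^<>]*>",
    "gi"
  );
  return html.replace(pattern, "");
}

const sample = '<p>Hi <font color="red">there</font><img src="a.png"></p><pre>x</pre>';
console.log(stripTagsExcept(sample, ["img", "p"]));
// <p>Hi there<img src="a.png"></p>x
```

The opening and closing p tags and the img tag survive, while font and pre are removed.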

In Java, the pattern that removes all tags can be used like this:

public static String delTagsFContent(String content) {
    // Matches any opening or closing tag.
    String patternTag = "</?[a-zA-Z]+[^><]*>";
    // Matches leading or trailing whitespace.
    String patternBlank = "(^\\s*)|(\\s*$)";
    return content.replaceAll(patternTag, "").replaceAll(patternBlank, "");
}
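To see what this pattern does and does not catch, here is a small JavaScript sketch of the same two steps. The function name delTags and the sample inputs are made up for illustration, and trim() stands in for the leading and trailing whitespace pattern.

```javascript
const TAG = /<\/?[a-zA-Z]+[^><]*>/g;

// Remove tags, then trim leading and trailing whitespace.
function delTags(content) {
  return content.replace(TAG, "").trim();
}

console.log(delTags("  <b>bold</b> text "));     // "bold text"
console.log(delTags("1 < 2 and 3 > 2"));         // unchanged: "<" is not followed by a letter
console.log(delTags("<!-- note -->visible"));    // unchanged: comments do not start with a letter
console.log(delTags('<a title="x>y">link</a>')); // 'y">link': the match stops at the first ">"
```

The last two cases show the limits of the pattern: comments are left in place, and a ">" inside an attribute value ends the match early.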

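1. Replacement in PHP

// Original content
$content = "<table><tr><td>Content of the first td</td><td>Content of the second td</td></tr></table>";
// Match pattern
$preg = '/<.*?>/is';
// Every HTML tag is replaced with an empty string
$content = preg_replace($preg, '', $content);
// Output to the page
echo $content;

Result:

Content of the first tdContent of the second td

2. Replacing HTML tags and whitespace in JavaScript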

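<html>
<head>
<script>
window.onload = function () {
    // Get the body's content, including its HTML tags
    var content = document.body.innerHTML;
    // Match pattern: find every HTML tag and every run of whitespace
    var reg = /<.+?>|\s+/ig;
    // Replace every match with an empty string, which deletes it
    content = content.replace(reg, '');
    // Pop up the result
    alert(content);
};
</script>
</head>
<body>
<table>
<tr>
<td>Content of the first td</td>
<td>Content of the second td</td>
<td>Content of the third td</td>
</tr>
</table>
</body>
</html>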
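Deleting every run of whitespace also joins words in space-separated text. One possible variation, sketched here outside the browser, replaces each tag with a space and then collapses whitespace. The helper name htmlToText and the sample string are assumptions for illustration.

```javascript
// Hypothetical helper: strip tags, then collapse whitespace instead of deleting it.
function htmlToText(html) {
  return html
    .replace(/<[^>]+>/g, " ") // each tag becomes a space so adjacent cells stay separated
    .replace(/\s+/g, " ")     // runs of whitespace collapse to a single space
    .trim();
}

const table = "<table>\n<tr>\n<td>first cell</td>\n<td>second cell</td>\n</tr>\n</table>";

console.log(table.replace(/<.+?>|\s+/g, "")); // "firstcellsecondcell"
console.log(htmlToText(table));               // "first cell second cell"
```

Which behavior is preferable depends on the text: for content written without spaces between words, deleting all whitespace may be exactly what is wanted.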

Feel free to share; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/zaji/7196534.html
