Regular Expressions for Matching HTML Start and End Tags (Notes)

// Filtering scripts out of HTML with regular expressions
// Requires: using System.Text.RegularExpressions; at the top of the file

public string wipeScript(string html)
{
    // Lazy *? stops each match at the FIRST closing tag instead of the last one in the input
    Regex regex1 = new Regex(@"<script[\s\S]*?</script\s*>", RegexOptions.IgnoreCase);
    // href=javascript: (or vbscript:); [^>] keeps the match inside a single tag
    Regex regex2 = new Regex(@"\shref\s*=\s*[^>]*?script\s*:", RegexOptions.IgnoreCase);
    // on... event-handler attributes such as onclick= or onload=
    Regex regex3 = new Regex(@"\son\w+\s*=", RegexOptions.IgnoreCase);
    Regex regex4 = new Regex(@"<iframe[\s\S]*?</iframe\s*>", RegexOptions.IgnoreCase);
    Regex regex5 = new Regex(@"<frameset[\s\S]*?</frameset\s*>", RegexOptions.IgnoreCase);

    html = regex1.Replace(html, "");                  // remove <script></script> blocks
    html = regex2.Replace(html, "");                  // remove href=javascript: attributes (<A>)
    html = regex3.Replace(html, " _disabledevent=");  // disable on... events on other elements
    html = regex4.Replace(html, "");                  // remove iframes
    html = regex5.Replace(html, "");                  // remove framesets
    return html;
}

// Checks whether the input string contains a <script></script> block
// (returns true if it does)
public bool IsValidScript(string html)
{
    return Regex.IsMatch(html, @"<script[\s\S]*?</script\s*>", RegexOptions.IgnoreCase);
}

1. Regular expressions (PHP) that strip all HTML tags and decode common HTML entities:

$search = array("'<script[^>]*?>.*?</script>'si", // remove JavaScript
                "'<[\/\!]*?[^<>]*?>'si",          // remove HTML tags
                "'([\r\n])[\s]+'",                // collapse whitespace after a line break
                "'&(quot|#34);'i",                // replace HTML entities
                "'&(amp|#38);'i",
                "'&(lt|#60);'i",
                "'&(gt|#62);'i",
                "'&(nbsp|#160);'i");

$replace = array("", "", "\\1", "\"", "&", "<", ">", " ");

$html = preg_replace($search, $replace, $html);
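A rough JavaScript counterpart of the PHP replacement list, as a sketch only: stripHtml is a hypothetical name, the PHP `s` flag is emulated with [\s\S], and &amp; is deliberately decoded last (unlike the PHP order) so that text such as "&amp;lt;" becomes "&lt;" rather than "<".

```javascript
// Apply [pattern, replacement] pairs in order, like preg_replace with arrays.
function stripHtml(html) {
  const rules = [
    [/<script[^>]*?>[\s\S]*?<\/script>/gi, ""], // remove JavaScript blocks
    [/<[\/!]*?[^<>]*?>/g, ""],                  // remove remaining tags
    [/([\r\n])\s+/g, "$1"],                     // collapse whitespace after a line break
    [/&(quot|#34);/gi, '"'],
    [/&(lt|#60);/gi, "<"],
    [/&(gt|#62);/gi, ">"],
    [/&(nbsp|#160);/gi, " "],
    [/&(amp|#38);/gi, "&"],                     // last, to avoid decoding twice
  ];
  return rules.reduce((text, [re, rep]) => text.replace(re, rep), html);
}

console.log(stripHtml("<p>5 &lt; 6 &amp;&amp; 7 &gt; 3</p>")); // 5 < 6 && 7 > 3
```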


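To match only the text between tags (without the tags themselves), use a lookbehind and a lookahead: (?<=>)[^<>]+(?=<)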
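A quick JavaScript check of this pattern. The sample HTML string is made up for illustration, and lookbehind ((?<=...)) needs an ES2018+ engine such as current Node.js or modern browsers.

```javascript
// (?<=>) requires a ">" just before the match and (?=<) a "<" just after it;
// neither character becomes part of the match itself.
const html = "<p>Hello</p><b>world</b>";
const texts = html.match(/(?<=>)[^<>]+(?=<)/g);
console.log(texts.join(",")); // Hello,world
```

Empty gaps such as the one between </p> and <b> produce no match, because [^<>]+ needs at least one character.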

Suppose the HTML contains the following fragment (written here as a Java string):

String a = "<style type=\"text/css\">div \n" +
        "{ margin: 0; padding: 0; outline: 0; }</style>";

How can I extract this whole fragment, including the tags?

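Use the regular expression <style([\\s\\S]*?)</style>. This is the Java string-literal form; the pattern itself is <style([\s\S]*?)</style>. [\s\S] matches any character including a line break (which . does not match by default), and the lazy *? stops at the first </style>, so two separate style blocks are not merged into one match.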
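The same idea in JavaScript, comparing a greedy, a lazy and a dot-based version. The page string is a hypothetical input: the fragment from the question followed by a second style block.

```javascript
const page =
  '<style type="text/css">div \n{ margin: 0; padding: 0; outline: 0; }</style>' +
  "<p>text</p><style>p { color: red; }</style>";

// Greedy: [\s\S]* runs to the LAST </style>, so both blocks come back as one match.
const greedy = page.match(/<style[\s\S]*<\/style>/g);
// Lazy: [\s\S]*? stops at the FIRST </style> after each <style.
const lazy = page.match(/<style[\s\S]*?<\/style>/g);
// "." does not match "\n", so the first (multi-line) block is skipped entirely.
const dotOnly = page.match(/<style.*?<\/style>/g);

console.log(greedy.length);  // 1
console.log(lazy.length);    // 2
console.log(dotOnly.length); // 1
console.log(lazy[0]);        // the whole first block, tags included
```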

Further notes:

Matching HTML tags with regular expressions

Method 1:

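var str = '<p class="odd" id="odd">123</p>';
// Tag names may contain digits (h1); "[^"]*" keeps each attribute value inside its own quotes,
// whereas ".*" could run on to a quote in a later tag on the same line
var pattern = /<\/?[a-zA-Z][a-zA-Z0-9]*(\s+[a-zA-Z]+="[^"]*")*>/g;
console.log(str.match(pattern)); // matches <p class="odd" id="odd"> and </p>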

Method 2:

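var str = '<p class="odd" id="odd">123</p>';
// Everything from "<" up to the first ">": simple, but it breaks if an attribute value contains ">"
var pattern = /<[^>]+>/g;
console.log(str.match(pattern)); // matches <p class="odd" id="odd"> and </p>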

Method 3:

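var str = '<input type="text" value=">" name="username" />';
// The tag body is any run of: a character that is not a quote or ">", a "..." string, or a '...' string,
// so a ">" inside a quoted value does not end the tag
var pattern = /<(?:[^"'>]|"[^"]*"|'[^']*')*>/g;
console.log(str.match(pattern)); // matches the whole <input ... /> tag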
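The input used in Method 3 shows why the simpler Method 2 pattern is not enough. The sketch below runs both patterns on that input and on a second, hypothetical tag with a single-quoted value containing ">".

```javascript
const tags = [
  '<input type="text" value=">" name="username" />',
  "<img alt='a>b' src=\"x.png\">",
];
const simple = /<[^>]+>/g;                           // Method 2
const quoteAware = /<(?:[^"'>]|"[^"]*"|'[^']*')*>/g; // Method 3

for (const tag of tags) {
  // Method 2 stops at the first ">" even inside quotes; Method 3 skips quoted values.
  console.log(tag.match(simple)[0]);
  console.log(tag.match(quoteAware)[0]);
}
// <input type="text" value=">
// <input type="text" value=">" name="username" />
// <img alt='a>
// <img alt='a>b' src="x.png">
```

String.prototype.match with a /g pattern resets lastIndex on each call, so reusing the same regex objects inside the loop is safe here.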

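Notes: (...) is a capturing group. The text matched by each group is saved and can be referenced as $n in a replacement string, where n is the group number (inside the pattern itself the reference is written \n, as Method 4 does).

(?:...) is a non-capturing group. The only difference from a capturing group is that the matched text is not saved.

If you never need to refer back to a group, use a non-capturing group; the pattern stays simpler.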
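A small JavaScript illustration of the difference. The date and the "abab-cd" string are arbitrary sample values.

```javascript
// Capturing groups are numbered 1, 2, 3 from left to right;
// $n in the replacement string inserts the text of group n.
const date = "2024-05-17";
const reordered = date.replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");
console.log(reordered); // 17/05/2024

// (?:ab)+ groups "ab" for the + quantifier but gets no number,
// so the only capturing group, (\w+), is group 1.
const m = "abab-cd".match(/(?:ab)+-(\w+)/);
console.log(m[1]);     // cd
console.log(m.length); // 2 (whole match + one group)
```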

Method 4:

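var str = '<input type="text" value=">" name="username" />';
// (["']) captures the opening quote and \1 requires the same quote to close the value;
// (?:(?!\1)[\s\S])* accepts any character except that quote, so a value may contain the other kind of quote
var pattern = /<(?:[^"'>]|(["'])(?:(?!\1)[\s\S])*\1)*>/g;
console.log(str.match(pattern)); // matches the whole <input ... /> tag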
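A sketch of why the body of the quoted value matters when using a backreference. The input is hypothetical: a single-quoted title that itself contains double quotes. It compares the pattern /<(?:[^"'>]|(["'])[^"']*\1)*>/g, where the value body is [^"']*, with a variant whose value body excludes only the captured quote.

```javascript
const html = `<a title='say "hi"' href="/x">link</a>`;

// Value body [^"']*: it cannot step over the inner ", so the opening tag is not matched.
const excludesBothQuotes = /<(?:[^"'>]|(["'])[^"']*\1)*>/g;
// Variant: (?:(?!\1)[\s\S])* accepts any character except the quote captured in group 1.
const excludesOpeningQuote = /<(?:[^"'>]|(["'])(?:(?!\1)[\s\S])*\1)*>/g;

console.log(html.match(excludesBothQuotes).length);   // 1 (only </a>)
console.log(html.match(excludesOpeningQuote).length); // 2
console.log(html.match(excludesOpeningQuote)[0]);     // <a title='say "hi"' href="/x">
```

Method 3 handles the same input correctly as well, because its '[^']*' alternative only stops at a single quote.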


Source: 内存溢出 (outofmemory.cn), original article: http://outofmemory.cn/zaji/7207077.html
