Recently, while processing data, I needed regular-expression matching. For Column-typed data I used Spark SQL's regexp_extract, which is defined as follows:
def regexp_extract(e: Column, exp: String, groupIdx: Int): Column = withExpr {
  RegExpExtract(e.expr, lit(exp).expr, lit(groupIdx).expr)
}
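To make the three parameters concrete, here is a minimal plain-Scala sketch of the same idea: take a string, a pattern, and a capture-group index, and return that group from the first match. The object `RegexpExtractSketch` and the helper `extractGroup` are hypothetical names invented for illustration; this is not Spark's implementation. The empty-string fallback follows the behavior documented for regexp_extract when the pattern or the group does not match.

```scala
object RegexpExtractSketch {
  // Hypothetical helper (not part of Spark): returns the text of capture
  // group `groupIdx` from the first match of `exp` in `input`, or "" when
  // there is no match or the group did not take part in the match.
  def extractGroup(input: String, exp: String, groupIdx: Int): String =
    exp.r
      .findFirstMatchIn(input)
      .flatMap(m => Option(m.group(groupIdx))) // group(i) is null if the group was skipped
      .getOrElse("")

  def main(args: Array[String]): Unit = {
    val line = "industry=42Ztest(DatetimeCCCD)"
    println(extractGroup(line, """industry=(\d+)""", 1))
    println(extractGroup(line, """test\((\w+)\)""", 1))
    println(extractGroup(line, """missing=(\d+)""", 1).isEmpty)
  }
}
```

Group index 0 returns the whole match, and indexes from 1 upward address the parenthesized groups in order, which is the same convention `Regex.Match.group` uses.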
Plain Scala, however, handles regular expressions through the Regex class:
scala.util.matching.Regex
Here is the test code, for the record:
package com.qihoo.icebase.apollo.test

import scala.util.matching.Regex

object TestRegex {
  def main(args: Array[String]): Unit = {
    val logInfo = "requestURI:/c?app=2&p=3&did=14 test(Datetime) 0042334&industry=42Ztest(DatetimeCCCD)"
    // Capture whatever sits between the parentheses that follow "test".
    val regSameTokenProc: Regex = """test\(([\w:.><\-\s/]*)\)""".r

    // findFirstIn returns the first match as an Option[String].
    println("findFirstIn:------" + regSameTokenProc.findFirstIn(logInfo).getOrElse(""))
    // findFirstMatchIn returns an Option[Regex.Match], which also exposes the groups.
    println("findFirstMatchIn.get.group:------" + regSameTokenProc.findFirstMatchIn(logInfo).map(_.matched).getOrElse(""))

    regSameTokenProc.findFirstMatchIn(logInfo) match {
      case Some(m) => println(("match", m.group(1)))
      case None    => println("match null")
    }

    // All matches as strings.
    println("\nfindAllIn:")
    regSameTokenProc.findAllIn(logInfo).toList.foreach(println(_))

    // All matches as Regex.Match objects; print only capture group 1.
    println("\nfindAllMatchIn:")
    regSameTokenProc.findAllMatchIn(logInfo).foreach(item => println(item.group(1)))
    println("\n")

    // A Regex can also be used as an extractor in pattern matching.
    val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
    "2015-05-23" match { case date(year, month, day) => println((year, month, day)) }
    "2014-05-23" match { case date(year, month, _*) => println("The year of the date is " + year) }
    "2014-05-23" match { case date(_*) => println("It is a date") }
  }
}
Test output:
findFirstIn:------test(Datetime)
findFirstMatchIn.get.group:------test(Datetime)
(match,Datetime)

findAllIn:
test(Datetime)
test(DatetimeCCCD)

findAllMatchIn:
Datetime
DatetimeCCCD


(2015,05,23)
The year of the date is 2014
It is a date
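One detail the date examples rely on: when a Regex is used as an extractor in a `match`, the pattern has to match the entire input, and a `match` with no fallback case throws a MatchError on anything else. The sketch below shows one possible way to handle both points, with a default case and with `unanchored` for dates embedded in longer text. The object `RegexExtractorSketch` and the helpers `yearOf` and `yearIn` are hypothetical names used only for illustration.

```scala
import scala.util.matching.Regex

object RegexExtractorSketch {
  val date: Regex = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // Hypothetical helper: the year if the WHOLE input is a date, otherwise None.
  def yearOf(input: String): Option[String] = input match {
    case date(year, _, _) => Some(year)
    case _                => None // fallback case avoids a MatchError
  }

  // Hypothetical helper: the year of the first date found ANYWHERE in the input.
  def yearIn(input: String): Option[String] = {
    val anywhere = date.unanchored // drops the whole-input requirement
    input match {
      case anywhere(year, _, _) => Some(year)
      case _                    => None
    }
  }

  def main(args: Array[String]): Unit = {
    println(yearOf("2015-05-23"))
    println(yearOf("logged on 2015-05-23"))
    println(yearIn("logged on 2015-05-23"))
  }
}
```

Returning an Option keeps the no-match case visible to the caller instead of surfacing it as an exception at runtime.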