gsub( "[^[:alnum:]']"," ",db$text )
or how to keep intra-word dashes using the tm package:
removePunctuation(db$text,preserve_intra_word_dashes = TRUE)
But I can't find a way to do both at once. For example, if my original sentence is:
"Interested in energy/the environment/etc.? Congrats to our new e-board! Ben,Nathan,Jenny,and Adam,y'all are sure to lead the club in a great direction next year! #obama #swag"
I would like it to become:
"Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"
Of course, this leaves extra whitespace, but I can remove that afterwards.
I would greatly appreciate your help.
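Solution

Use a character class that keeps the apostrophe and the hyphen in addition to alphanumerics, and pass a space as the replacement:
gsub("[^[:alnum:]'-]", " ", db$text)
For the example sentence, once the extra spaces are collapsed, this gives:
## "Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"
Note that this pattern keeps every hyphen, not only those inside a word. The hyphen is placed last in the class so that it is read literally rather than as a range.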
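If stray hyphens (those not sitting between two alphanumeric characters) should also go, one possible approach is a second pass with lookarounds. The following is a minimal sketch, not part of the original answer: `strip_punct` is a hypothetical helper name, and the lookaround pattern assumes `perl = TRUE` so that R uses the PCRE engine.

```r
# Example input taken from the question.
text <- "Interested in energy/the environment/etc.? Congrats to our new e-board! Ben,Nathan,Jenny,and Adam,y'all are sure to lead the club in a great direction next year! #obama #swag"

# Hypothetical helper (not from base R or tm): keeps letters, digits,
# apostrophes and intra-word hyphens, then tidies the whitespace.
strip_punct <- function(x) {
  # Replace anything that is not alphanumeric, an apostrophe, or a hyphen
  # with a space. The hyphen comes last in the class so it is literal.
  x <- gsub("[^[:alnum:]'-]", " ", x)
  # Replace hyphens that are not both preceded and followed by an
  # alphanumeric character (lookarounds require perl = TRUE).
  x <- gsub("(?<![[:alnum:]])-|-(?![[:alnum:]])", " ", x, perl = TRUE)
  # Collapse runs of whitespace and trim both ends.
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

cleaned <- strip_punct(text)
print(cleaned)
```

On the example sentence this reproduces the target string from the question, with "e-board" and "y'all" intact. Applied to a data frame column, it would be called as `strip_punct(db$text)`, since `gsub` is vectorised over its input.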