A First Look at Apache OpenNLP


https://blog.csdn.net/Richard_vi/article/details/78909939?utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7Edefault-5.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7Edefault-5.control

 

 

Environment: IDEA + JDK 8 + Maven 3.5.2. Create a new Maven project and add the OpenNLP Maven dependency:

<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-tools</artifactId>
    <version>1.8.4</version>
</dependency>

With that in place, the OpenNLP developer tools are ready to use. Let's look at some examples:

// Split a paragraph into sentences
public static void SentenceDetect() throws IOException {
    String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
    // Load the pre-trained English sentence model; try-with-resources closes the stream even on failure
    try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin")) {
        SentenceModel model = new SentenceModel(is);
        SentenceDetectorME sdetector = new SentenceDetectorME(model);
        String[] sentences = sdetector.sentDetect(paragraph);
        // Print one detected sentence per line
        for (String single : sentences) {
            System.out.println(single);
        }
    }
}

  

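This is an English sentence-detection example. We first need to download the English sentence model (en-sent.bin); here I put it in the E:\NLP_Practics\models\ directory. More models can be downloaded from http://maven.tamingtext.com/opennlp-models/models-1.5/. Here is the corresponding output: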

Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me.
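For comparison, here is a minimal sketch of a model-free baseline that splits sentences with the JDK's rule-based java.text.BreakIterator instead of a trained OpenNLP model. The class name RuleBasedSentences and the method split are hypothetical names chosen for illustration; nothing here is part of OpenNLP. Running it on the same paragraph is one way to see how much the trained model actually contributes.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class RuleBasedSentences {

    // Splits text at the sentence boundaries reported by the JDK's BreakIterator.
    public static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Each segment includes trailing whitespace, so trim it
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Same paragraph as in the OpenNLP example
        String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
        for (String sentence : split(paragraph)) {
            System.out.println(sentence);
        }
    }
}
```

Both approaches return plain strings, so their outputs can be compared line by line.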

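Neat, though nothing magical: this demo just uses an existing simple model. A model is distilled from a large amount of training data, so the quality of the analysis depends on the model you use. Next, an English tokenization example: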

// Split a sentence into tokens
public static void Tokenize() throws IOException {
    // Load the pre-trained English tokenizer model; try-with-resources closes the stream even on failure
    try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin")) {
        TokenizerModel model = new TokenizerModel(is);
        Tokenizer tokenizer = new TokenizerME(model);
        String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
        // Print one token per line
        for (String a : tokens) {
            System.out.println(a);
        }
    }
}

Output:

Hi . How are you ? This is Richard . Richard is still single . please help him find his girl
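The output above shows the trained tokenizer separating punctuation from words ("Hi" and "." come out as two tokens). As a contrast, the following is a minimal sketch of a naive baseline that splits on whitespace only, using just the JDK. The class name WhitespaceBaseline and the method tokenize are hypothetical names for illustration, not part of OpenNLP.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical baseline for illustration only; not part of OpenNLP.
final class WhitespaceBaseline {

    // Splits on runs of whitespace; punctuation stays attached to the neighbouring word.
    public static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // Prints: Hi. / How / are / you? / This / is / Richard. (one token per line)
        for (String token : tokenize("Hi. How are you? This is Richard.")) {
            System.out.println(token);
        }
    }
}
```

Here "Hi." and "you?" remain single tokens, which is the gap the model-based TokenizerME closes.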

  

 

Complete test code:

package package01;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Test01 {

    // Split a paragraph into sentences
    public static void SentenceDetect() throws IOException {
        String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
        // try-with-resources closes the model stream even if loading fails
        try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin")) {
            SentenceModel model = new SentenceModel(is);
            SentenceDetectorME sdetector = new SentenceDetectorME(model);
            String[] sentences = sdetector.sentDetect(paragraph);
            for (String single : sentences) {
                System.out.println(single);
            }
        }
    }

    // Split a sentence into tokens
    public static void Tokenize() throws IOException {
        try (InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin")) {
            TokenizerModel model = new TokenizerModel(is);
            Tokenizer tokenizer = new TokenizerME(model);
            String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
            for (String a : tokens) {
                System.out.println(a);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Test01.SentenceDetect();
        Test01.Tokenize();
    }
}
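The test code hard-codes the model paths under E:\NLP_Practics\models\. One possible way to make that configurable is sketched below, using only the JDK: the model directory is read from a system property, and a missing file fails early with a clear message. The class ModelFiles, its methods, and the property name nlp.models.dir are all hypothetical names for illustration; OpenNLP itself does not read this property.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class ModelFiles {

    // Assumed property name, set e.g. with -Dnlp.models.dir=E:\NLP_Practics\models
    static final String MODEL_DIR_PROPERTY = "nlp.models.dir";

    // Returns the path of a model file, or throws if it has not been downloaded yet.
    public static Path resolve(String modelDir, String fileName) throws NoSuchFileException {
        Path path = Paths.get(modelDir, fileName);
        if (!Files.isRegularFile(path)) {
            throw new NoSuchFileException(path.toString(), null,
                    "model file not found; download it into the model directory first");
        }
        return path;
    }

    // Opens a model file from the configured directory (defaults to ./models).
    public static InputStream open(String fileName) throws IOException {
        String modelDir = System.getProperty(MODEL_DIR_PROPERTY, "models");
        return Files.newInputStream(resolve(modelDir, fileName));
    }
}
```

With this in place, the loading step could read try (InputStream is = ModelFiles.open("en-sent.bin")) { SentenceModel model = new SentenceModel(is); ... }, leaving the rest of the example unchanged.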

  


Sharing is welcome; when reposting, please credit the source: 内存溢出

Original URL: https://outofmemory.cn/zaji/1006605.html
