These are reading notes on Chapter 10 of Natural Language Processing with Python, "Analyzing the Meaning of Sentences". In this chapter the book digs deeper into English grammar and the corresponding computational concepts, so there is considerably more to understand and remember. The whole chapter is a demanding read, but a rough overall understanding is achievable.
Now that we have parsers and feature-based grammars, can we do something equally useful, such as analyzing the meaning of sentences?
This chapter aims to answer the following questions: How can we represent natural language meaning so that a computer can process these representations? How can we associate meaning representations with an unlimited set of sentences? How can we use programs that connect the meaning representations of sentences to stores of knowledge?

import nltk
1 Natural Language Understanding

1.1 Querying a Database

The grammar sql0.fcfg illustrates how to assemble the meaning representation of a sentence in tandem with parsing the sentence. Each phrase-structure rule is supplemented with a recipe for constructing a value for the feature SEM. You can see that these recipes are very simple: in each case we use the string-concatenation operation + to splice the SEM values of the child constituents together to make the SEM value of the parent.

import nltk
nltk.data.show_cfg('grammars/book_grammars/sql0.fcfg')

% start S
S[SEM=(?np + WHERE + ?vp)] -> NP[SEM=?np] VP[SEM=?vp]
VP[SEM=(?v + ?pp)] -> IV[SEM=?v] PP[SEM=?pp]
VP[SEM=(?v + ?ap)] -> IV[SEM=?v] AP[SEM=?ap]
NP[SEM=(?det + ?n)] -> Det[SEM=?det] N[SEM=?n]
PP[SEM=(?p + ?np)] -> P[SEM=?p] NP[SEM=?np]
AP[SEM=?pp] -> A[SEM=?a] PP[SEM=?pp]
NP[SEM='Country="greece"'] -> 'Greece'
NP[SEM='Country="china"'] -> 'China'
Det[SEM='SELECT'] -> 'Which' | 'What'
N[SEM='City FROM city_table'] -> 'cities'
IV[SEM=''] -> 'are'
A -> 'located'
P[SEM=''] -> 'in'
# This lets us parse a query into SQL:
from nltk import load_parser
cp = load_parser('grammars/book_grammars/sql0.fcfg')
query = 'What cities are located in China'
trees = list(cp.parse(query.split()))
answer = trees[0].label()['SEM']
answer = [s for s in answer if s]  # drop the empty strings contributed by 'are' and 'in'
q = ' '.join(answer)
print(q)
# SELECT City FROM city_table WHERE Country="china"
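Under the hood the composition is nothing but string concatenation. The derivation for this query can be hand-simulated in a few lines of plain Python (a sketch mirroring the grammar above, not the parser itself; the variable names are our own):

```python
# Leaf SEM values, straight from the lexical rules of sql0.fcfg:
det = 'SELECT'                  # Det -> 'What'
n = 'City FROM city_table'      # N  -> 'cities'
np_obj = 'Country="china"'      # NP -> 'China'
p, iv = '', ''                  # 'in' and 'are' contribute empty strings

# Phrase-structure rules: join child SEM values, dropping empty pieces,
# just as the parser's + operation effectively does.
pp = ' '.join(filter(None, [p, np_obj]))   # PP -> P NP
ap = pp                                    # AP -> A PP
vp = ' '.join(filter(None, [iv, ap]))      # VP -> IV AP
np = ' '.join([det, n])                    # NP -> Det N
s = ' '.join([np, 'WHERE', vp])            # S  -> NP WHERE VP

print(s)  # SELECT City FROM city_table WHERE Country="china"
```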
# Finally, we execute the query over the database city.db and retrieve some results:
from nltk.sem import chat80
rows = chat80.sql_query('corpora/city_database/city.db', q)
for r in rows:
    print(r[0], end=" ")  # each row r is a one-element tuple, so we print the tuple's member, not the tuple itself
# canton chungking dairen harbin kowloon mukden peking shanghai sian tientsin
Later in this chapter we will use models to help evaluate the truth or falsity of English sentences, and in this way illustrate some methods of representing meaning. Before going into more detail, though, let's take the discussion to a more general level, returning to a topic we touched on briefly in section 1.5: can a computer understand the meaning of a sentence, and how could we tell if it did? This is similar to asking "Can a computer think?"

2 Propositional Logic

Propositional logic lets us represent only those parts of linguistic structure that correspond to particular connectives between sentences. The basic expressions of propositional logic are proposition symbols, usually written P, Q, R, and so on.
# NLTK's Expression object can parse logical expressions into various subclasses of Expression
import nltk
read_expr = nltk.sem.Expression.fromstring
print(read_expr('-(P & Q)'))
print(read_expr('P & Q'))
print(read_expr('P <-> --P'))
From a computational perspective, logics give us an important tool for performing inference.
Suppose you state that Freedonia is not to the north of Sylvania, and you give as your reason that Sylvania is to the north of Freedonia. In this case, you have produced an argument. The sentence Sylvania is to the north of Freedonia is the assumption of the argument, while Freedonia is not to the north of Sylvania is the conclusion. The step of moving from one or more assumptions to a conclusion is called inference. Informally, it is common to write arguments in a format where the conclusion is preceded by therefore.

Arguments can be tested for "syntactic validity" by using a proof system; we will say a little more about this in section 10.3. The inference module in NLTK can carry out logical proofs via an interface to a third-party theorem prover, Prover9. The inputs to the inference mechanism must first be parsed into logical expressions:

read_expr = nltk.sem.Expression.fromstring
SnF = read_expr('SnF')
NotFnS = read_expr('-FnS')
R = read_expr('SnF -> -FnS')
prover = nltk.Prover9()
prover.prove(NotFnS, [SnF, R])
# True
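Prover9 establishes validity syntactically, but for purely propositional arguments the same question can be settled semantically by enumerating truth assignments: an argument is valid iff no assignment makes every premise true while making the conclusion false. A minimal pure-Python sketch (independent of Prover9; the helper names are our own):

```python
from itertools import product

def valid(premises, conclusion, symbols):
    # Check every truth assignment over the given proposition symbols;
    # a counterexample is an assignment with true premises and a false conclusion.
    for values in product([True, False], repeat=len(symbols)):
        v = dict(zip(symbols, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False
    return True

SnF = lambda v: v['SnF']                           # SnF
rule = lambda v: (not v['SnF']) or (not v['FnS'])  # SnF -> -FnS
not_FnS = lambda v: not v['FnS']                   # -FnS

print(valid([SnF, rule], not_FnS, ['SnF', 'FnS']))  # True
print(valid([SnF], not_FnS, ['SnF', 'FnS']))        # False: the rule is needed
```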
A model for propositional logic needs to assign the value True or False to every possible formula.
val = nltk.Valuation([('P', True), ('Q', True), ('R', False)])
# We initialize a Valuation with a list of pairs, each consisting of a semantic symbol
# and a semantic value. The resulting object is essentially just a dictionary that maps
# logical symbols (treated as strings) to appropriate values.
val['P']
# True
dom = set()
g = nltk.Assignment(dom)
# Now let's initialize a model m with val:
m = nltk.Model(dom, val)
# Every model has an evaluate() method, which can determine the semantic value of a
# logical expression, such as a formula of propositional logic; of course, these values
# depend on the truth values we originally assigned to the proposition symbols P, Q and R.
print(m.evaluate('(P & Q)', g))  # True
print(m.evaluate('(P & R)', g))  # False
print(m.evaluate('(P | R)', g))  # True
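For propositional formulas, evaluate() is doing nothing mysterious: it looks symbols up in the valuation and applies the boolean operations. A pure-Python sketch of the same three evaluations (no NLTK required; all names here are illustrative):

```python
# Represent each formula as a function from a valuation (a dict mapping
# proposition symbols to truth values) to a bool.
def sym(name): return lambda v: v[name]
def neg(p): return lambda v: not p(v)
def conj(p, q): return lambda v: p(v) and q(v)
def disj(p, q): return lambda v: p(v) or q(v)

P, Q, R = sym('P'), sym('Q'), sym('R')
v = {'P': True, 'Q': True, 'R': False}

print(conj(P, Q)(v))  # True
print(conj(P, R)(v))  # False
print(disj(P, R)(v))  # True
```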
3 First-Order Logic

In the rest of this chapter, we will represent the meaning of natural language expressions by translating them into first-order logic.
Our next step will be to describe how formulas of first-order logic are constructed, and then how such formulas can be evaluated in a model.

3.1 Syntax
read_expr = nltk.sem.Expression.fromstring
expr = read_expr('walk(angus)', type_check=True)
print(expr.argument)       # angus
print(expr.argument.type)  # e
print(expr.function)       # walk
print(expr.function.type)  # <e,?>
Why do we see <e,?> at the end of this example? Although the type-checker will try to infer as many types as possible, in this case it has not managed to fully determine the type of walk, so walk's result type is unknown. Although we expect walk to be of type <e, t>, as far as the type-checker knows, it might in this context be some other type, such as <e,e> or <e,<e,t>>. To help the type-checker, we need to specify a signature, implemented as a dictionary that explicitly associates types with non-logical constants:
sig = {'walk': '<e, t>'}
expr = read_expr('walk(angus)', signature=sig)
expr.function.type
# <e,t>
>>> read_expr = nltk.sem.Expression.fromstring
>>> read_expr('dog(cyril)').free()
set()
>>> read_expr('dog(x)').free()
{Variable('x')}
>>> read_expr('own(angus, cyril)').free()
set()
>>> read_expr('exists x.dog(x)').free()
set()
>>> read_expr('((some x. walk(x)) -> sing(x))').free()
{Variable('x')}
>>> read_expr('exists x.own(y, x)').free()
{Variable('y')}
3.2 First-Order Theorem Proving

>>> NotFnS = read_expr('-north_of(f, s)')
>>> SnF = read_expr('north_of(s, f)')
>>> R = read_expr('all x. all y. (north_of(x, y) -> -north_of(y, x))')
>>> prover = nltk.Prover9()
>>> prover.prove(NotFnS, [SnF, R])
True
3.3 Summarizing the Language of First-Order Logic

3.4 Truth in a Model

>>> v = """
... bertie => b
... olive => o
... cyril => c
... boy => {b}
... girl => {o}
... dog => {c}
... walk => {o, c}
... see => {(b, o), (c, b), (o, c)}
... """
>>> val = nltk.Valuation.fromstring(v)
>>> print(val)
{'bertie': 'b', 'boy': {('b',)}, 'cyril': 'c', 'dog': {('c',)}, 'girl': {('o',)}, 'olive': 'o', 'see': {('o', 'c'), ('c', 'b'), ('b', 'o')}, 'walk': {('c',), ('o',)}}
3.5 Individual Variables and Assignments

In a model, an assignment maps individual variables onto entities in the domain. To evaluate formulas against the valuation above, we first build the corresponding model and an (empty) assignment:

>>> dom = val.domain
>>> m = nltk.Model(dom, val)
>>> g = nltk.Assignment(dom)

3.6 Quantification
>>> fmla1 = read_expr('girl(x) | boy(x)')
>>> m.satisfiers(fmla1, 'x', g)
{'b', 'o'}
>>> fmla2 = read_expr('girl(x) -> walk(x)')
>>> m.satisfiers(fmla2, 'x', g)
{'c', 'b', 'o'}
>>> fmla3 = read_expr('walk(x) -> girl(x)')
>>> m.satisfiers(fmla3, 'x', g)
{'b', 'o'}
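satisfiers() simply collects the domain members that make the open formula true when assigned to the free variable. A pure-Python sketch over the same toy model, with predicate extensions as Python sets (names are illustrative):

```python
dom = {'b', 'o', 'c'}
girl, boy, walk = {'o'}, {'b'}, {'o', 'c'}

def satisfiers(dom, formula):
    # Collect every individual that makes the open formula true.
    return {d for d in dom if formula(d)}

# girl(x) | boy(x)
print(satisfiers(dom, lambda x: x in girl or x in boy))           # {'b', 'o'}
# girl(x) -> walk(x), i.e. -girl(x) | walk(x)
print(satisfiers(dom, lambda x: (x not in girl) or (x in walk)))  # all of {'b', 'o', 'c'}
# walk(x) -> girl(x)
print(satisfiers(dom, lambda x: (x not in walk) or (x in girl)))  # {'b', 'o'}
```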
3.7 Quantifier Scope Ambiguity

>>> v2 = """
... bruce => b
... elspeth => e
... julia => j
... matthew => m
... person => {b, e, j, m}
... admire => {(j, b), (b, b), (m, e), (e, m)}
... """
>>> val2 = nltk.Valuation.fromstring(v2)
>>> dom2 = val2.domain
>>> m2 = nltk.Model(dom2, val2)
>>> g2 = nltk.Assignment(dom2)
>>> fmla4 = read_expr('(person(x) -> exists y.(person(y) & admire(x, y)))')
>>> m2.satisfiers(fmla4, 'x', g2)
{'e', 'b', 'm', 'j'}
>>> fmla5 = read_expr('(person(y) & all x.(person(x) -> admire(x, y)))')
>>> m2.satisfiers(fmla5, 'y', g2)
set()
>>> fmla6 = read_expr('(person(y) & all x.((x = bruce | x = julia) -> admire(x, y)))')
>>> m2.satisfiers(fmla6, 'y', g2)
{'b'}
3.8 Model Building

>>> a3 = read_expr('exists x.(man(x) & walks(x))')
>>> c1 = read_expr('mortal(socrates)')
>>> c2 = read_expr('-mortal(socrates)')
>>> mb = nltk.Mace(5)
>>> print(mb.build_model(None, [a3, c1]))
True
>>> print(mb.build_model(None, [a3, c2]))
True
>>> print(mb.build_model(None, [c1, c2]))
False
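On a tiny scale, what Mace does can be imitated by brute force: enumerate candidate interpretations over a small domain and stop at the first one satisfying all constraints. A sketch of that idea (this is not Mace's actual algorithm, and all helper names are our own):

```python
from itertools import combinations, product

dom = [0, 1]  # a two-element domain is enough for these examples

def powerset(s):
    # All possible extensions of a unary predicate over the domain.
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def build_model(constraints):
    # Try every choice of extensions for man/walks/mortal and every
    # denotation for the constant socrates; return the first model found.
    for man, walks, mortal in product(powerset(dom), repeat=3):
        for socrates in dom:
            m = {'man': man, 'walks': walks, 'mortal': mortal, 'socrates': socrates}
            if all(c(m) for c in constraints):
                return m
    return None

a3 = lambda m: any(x in m['man'] and x in m['walks'] for x in dom)  # exists x.(man(x) & walks(x))
c1 = lambda m: m['socrates'] in m['mortal']                         # mortal(socrates)
c2 = lambda m: m['socrates'] not in m['mortal']                     # -mortal(socrates)

print(build_model([a3, c1]) is not None)  # True
print(build_model([a3, c2]) is not None)  # True
print(build_model([c1, c2]) is not None)  # False: c1 and c2 are jointly unsatisfiable
```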
>>> a4 = read_expr('exists y. (woman(y) & all x. (man(x) -> love(x,y)))')
>>> a5 = read_expr('man(adam)')
>>> a6 = read_expr('woman(eve)')
>>> g = read_expr('love(adam,eve)')
>>> mc = nltk.MaceCommand(g, assumptions=[a4, a5, a6])
>>> mc.build_model()
True
>>> print(mc.valuation)
{'C1': 'b', 'adam': 'a', 'eve': 'a', 'love': {('a', 'b')}, 'man': {('a',)}, 'woman': {('a',), ('b',)}}
>>> a7 = read_expr('all x. (man(x) -> -woman(x))')
>>> g = read_expr('love(adam,eve)')
>>> mc = nltk.MaceCommand(g, assumptions=[a4, a5, a6, a7])
>>> mc.build_model()
True
>>> print(mc.valuation)
{'C1': 'c', 'adam': 'a', 'eve': 'b', 'love': {('a', 'c')}, 'man': {('a',)}, 'woman': {('c',), ('b',)}}
On reflection, we can see that there is nothing in our premises which says that Eve is the only woman in the domain of discourse, so the countermodel in fact is acceptable. If we wanted to rule it out, we would have to add a further assumption such as exists y. all x. (woman(x) -> (x = y)) to ensure that there is only one woman in the model.
4 The Semantics of English Sentences

4.1 Compositional Semantics in Feature-Based Grammar
4.2 The λ-Calculus

>>> read_expr = nltk.sem.Expression.fromstring
>>> expr = read_expr(r'\x.(walk(x) & chew_gum(x))')
>>> expr
<LambdaExpression \x.(walk(x) & chew_gum(x))>
>>> expr.free()
set()
>>> print(read_expr(r'\x.(walk(x) & chew_gum(y))'))
\x.(walk(x) & chew_gum(y))
>>> expr = read_expr(r'\x.(walk(x) & chew_gum(x))(gerald)')
>>> print(expr)
\x.(walk(x) & chew_gum(x))(gerald)
>>> print(expr.simplify())
(walk(gerald) & chew_gum(gerald))
>>> print(read_expr(r'\x.\y.(dog(x) & own(y, x))(cyril)').simplify())
\y.(dog(cyril) & own(y,cyril))
>>> print(read_expr(r'\x y.(dog(x) & own(y, x))(cyril, angus)').simplify())
(dog(cyril) & own(angus,cyril))
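β-reduction has a direct analogue in Python: a λ-abstraction is a lambda, and applying it substitutes the argument for the bound variable. A sketch of the two examples above evaluated against toy extensions (the sets below are our own illustration, not part of NLTK):

```python
# Toy one-place and two-place predicate extensions.
walk, chew_gum = {'gerald'}, {'gerald'}
dog = {'cyril'}
own = {('angus', 'cyril')}   # own(y, x) pairs as (y, x)

# \x.(walk(x) & chew_gum(x)) applied to gerald
vp = lambda x: x in walk and x in chew_gum
print(vp('gerald'))          # True

# \x.\y.(dog(x) & own(y, x)) applied first to cyril, then to angus
tv = lambda x: lambda y: x in dog and (y, x) in own
print(tv('cyril')('angus'))  # True
```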
>>> expr1 = read_expr('exists x.P(x)')
>>> print(expr1)
exists x.P(x)
>>> expr2 = expr1.alpha_convert(nltk.sem.Variable('z'))
>>> print(expr2)
exists z.P(z)
>>> expr1 == expr2
True
>>> expr3 = read_expr('\P.(exists x.P(x))(\y.see(y, x))')
>>> print(expr3)
(\P.exists x.P(x))(\y.see(y,x))
>>> print(expr3.simplify())
exists z1.see(z1,x)
4.3 Quantified NPs

4.4 Transitive Verbs
>>> read_expr = nltk.sem.Expression.fromstring
>>> tvp = read_expr(r'\X x.X(\y.chase(x,y))')
>>> np = read_expr(r'(\P.exists x.(dog(x) & P(x)))')
>>> vp = nltk.sem.ApplicationExpression(tvp, np)
>>> print(vp)
(\X x.X(\y.chase(x,y)))(\P.exists x.(dog(x) & P(x)))
>>> print(vp.simplify())
\x.exists z2.(dog(z2) & chase(x,z2))
>>> from nltk import load_parser
>>> parser = load_parser('grammars/book_grammars/simple-sem.fcfg', trace=0)
>>> sentence = 'Angus gives a bone to every dog'
>>> tokens = sentence.split()
>>> for tree in parser.parse(tokens):
...     print(tree.label()['SEM'])
all z2.(dog(z2) -> exists z1.(bone(z1) & give(angus,z1,z2)))
>>> sents = ['Irene walks', 'Cyril bites an ankle']
>>> grammar_file = 'grammars/book_grammars/simple-sem.fcfg'
>>> for results in nltk.interpret_sents(sents, grammar_file):
...     for (synrep, semrep) in results:
...         print(synrep)
(S[SEM=<walk(irene)>]
  (NP[-LOC, NUM='sg', SEM=<\P.P(irene)>]
    (PropN[-LOC, NUM='sg', SEM=<\P.P(irene)>] Irene))
  (VP[NUM='sg', SEM=<\x.walk(x)>]
    (IV[NUM='sg', SEM=<\x.walk(x)>, TNS='pres'] walks)))
(S[SEM=<exists z3.(ankle(z3) & bite(cyril,z3))>]
  (NP[-LOC, NUM='sg', SEM=<\P.P(cyril)>]
    (PropN[-LOC, NUM='sg', SEM=<\P.P(cyril)>] Cyril))
  (VP[NUM='sg', SEM=<\x.exists z3.(ankle(z3) & bite(x,z3))>]
    (TV[NUM='sg', SEM=<\X x.X(\y.bite(x,y))>, TNS='pres'] bites)
    (NP[NUM='sg', SEM=<\Q.exists x.(ankle(x) & Q(x))>]
      (Det[NUM='sg', SEM=<\P Q.exists x.(P(x) & Q(x))>] an)
      (Nom[NUM='sg', SEM=<\x.ankle(x)>]
        (N[NUM='sg', SEM=<\x.ankle(x)>] ankle)))))
>>> v = """
... bertie => b
... olive => o
... cyril => c
... boy => {b}
... girl => {o}
... dog => {c}
... walk => {o, c}
... see => {(b, o), (c, b), (o, c)}
... """
>>> val = nltk.Valuation.fromstring(v)
>>> g = nltk.Assignment(val.domain)
>>> m = nltk.Model(val.domain, val)
>>> sent = 'Cyril sees every boy'
>>> grammar_file = 'grammars/book_grammars/simple-sem.fcfg'
>>> results = nltk.evaluate_sents([sent], grammar_file, m, g)[0]
>>> for (syntree, semrep, value) in results:
...     print(semrep)
...     print(value)
all z4.(boy(z4) -> see(cyril,z4))
True
4.5 Quantifier Ambiguity Revisited

>>> from nltk.sem import cooper_storage as cs
>>> sentence = 'every girl chases a dog'
>>> trees = cs.parse_with_bindops(sentence, grammar='grammars/book_grammars/storage.fcfg')
>>> semrep = trees[0].label()['SEM']
>>> cs_semrep = cs.CooperStore(semrep)
>>> print(cs_semrep.core)
chase(z2,z4)
>>> for bo in cs_semrep.store:
...     print(bo)
bo(\P.all x.(girl(x) -> P(x)),z2)
bo(\P.exists x.(dog(x) & P(x)),z4)
>>> cs_semrep.s_retrieve(trace=True)
Permutation 1
   (\P.all x.(girl(x) -> P(x)))(\z2.chase(z2,z4))
   (\P.exists x.(dog(x) & P(x)))(\z4.all x.(girl(x) -> chase(x,z4)))
Permutation 2
   (\P.exists x.(dog(x) & P(x)))(\z4.chase(z2,z4))
   (\P.all x.(girl(x) -> P(x)))(\z2.exists x.(dog(x) & chase(z2,x)))
>>> for reading in cs_semrep.readings:
...     print(reading)
exists x.(dog(x) & all z3.(girl(z3) -> chase(z3,x)))
all x.(girl(x) -> exists z4.(dog(z4) & chase(x,z4)))
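The two retrieved readings really are different claims, which is easy to confirm by evaluating them directly in a small model where each girl chases a different dog (a pure-Python sketch; the model and names are our own illustration):

```python
girls = {'g1', 'g2'}
dogs = {'d1', 'd2'}
chase = {('g1', 'd1'), ('g2', 'd2')}  # each girl chases a *different* dog

# exists x.(dog(x) & all z.(girl(z) -> chase(z,x))): one dog chased by every girl
wide_exists = any(all((g, d) in chase for g in girls) for d in dogs)

# all x.(girl(x) -> exists z.(dog(z) & chase(x,z))): each girl chases some dog
wide_all = all(any((g, d) in chase for d in dogs) for g in girls)

print(wide_exists)  # False: no single dog is chased by both girls
print(wide_all)     # True: every girl chases some dog or other
```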
5 Discourse Semantics

The interpretation of a sentence in a discourse depends on the sentences that precede it.
5.1 Discourse Representation Theory

To parse with the grammar drt.fcfg, we specify in the call to load_parser() that SEM values in feature structures are to be parsed with DrtParser rather than the default LogicParser:

>>> from nltk import load_parser
>>> parser = load_parser('grammars/book_grammars/drt.fcfg', logic_parser=nltk.sem.drt.DrtParser())
>>> trees = list(parser.parse('Angus owns a dog'.split()))
>>> print(trees[0].label()['SEM'].simplify())
([x,z2],[Angus(x), dog(z2), own(x,z2)])
5.2 Discourse Processing

When we interpret a sentence, we use a rich context, determined partly by the preceding text and partly by our background assumptions. DRT provides a theory of how the meaning of a sentence is integrated into a representation of the preceding discourse, but two things were conspicuously absent from the discussion so far. First, there has been no attempt to incorporate any kind of inference; second, we have only processed individual sentences. These omissions are remedied by the module nltk.inference.discourse.
>>> dt = nltk.DiscourseTester(['A student dances', 'Every student is a person'])
>>> dt.readings()
s0 readings:
s0-r0: exists x.(student(x) & dance(x))
s1 readings:
s1-r0: all x.(student(x) -> person(x))
>>> dt.add_sentence('No person dances', consistchk=True)
Inconsistent discourse: d0 ['s0-r0', 's1-r0', 's2-r0']:
    s0-r0: exists x.(student(x) & dance(x))
    s1-r0: all x.(student(x) -> person(x))
    s2-r0: -exists x.(person(x) & dance(x))
>>> dt.retract_sentence('No person dances', verbose=True)
Current sentences are
s0: A student dances
s1: Every student is a person
>>> dt.add_sentence('A person dances', informchk=True)
Sentence 'A person dances' under reading 'exists x.(person(x) & dance(x))':
Not informative relative to thread 'd0'
The discourse module can accommodate semantic ambiguity and filter out readings that are not admissible. The following example invokes both Glue Semantics and DRT. Since the Glue Semantics module is configured to use the wide-coverage Malt dependency parser, the input (Every dog chases a boy. He runs.) needs to be tagged as well as tokenized.
>>> from nltk.tag import RegexpTagger
>>> tagger = RegexpTagger(
...     [('^(chases|runs)$', 'VB'),
...      ('^(a)$', 'ex_quant'),
...      ('^(every)$', 'univ_quant'),
...      ('^(dog|boy)$', 'NN'),
...      ('^(He)$', 'PRP')
... ])
>>> rc = nltk.DrtGlueReadingCommand(depparser=nltk.MaltParser(tagger=tagger))
>>> dt = nltk.DiscourseTester(['Every dog chases a boy', 'He runs'], rc)
>>> dt.readings()
s0 readings:
s0-r0: ([],[(([x],[dog(x)]) -> ([z3],[boy(z3), chases(x,z3)]))])
s0-r1: ([z4],[boy(z4), (([x],[dog(x)]) -> ([],[chases(x,z4)]))])
s1 readings:
s1-r0: ([x],[PRO(x), runs(x)])
>>> dt.readings(show_thread_readings=True)
d0: ['s0-r0', 's1-r0'] : INVALID: AnaphoraResolutionException
d1: ['s0-r1', 's1-r0'] : ([z6,z10],[boy(z6), (([x],[dog(x)]) -> ([],[chases(x,z6)])), (z10 = z6), runs(z10)])
>>> dt.readings(show_thread_readings=True, filter=True)
d1: ['s0-r1', 's1-r0'] : ([z12,z15],[boy(z12), (([x],[dog(x)]) -> ([],[chases(x,z12)])), (z17 = z12), runs(z15)])
6 Summary