I want to modify the script below so that it creates paragraphs out of a random number of the sentences it generates. In other words, concatenate a random number (say 1-5) of sentences before adding a newline.
The script works fine as is, but the output is short sentences separated by newlines. I'd like to collect some of those sentences into paragraphs.
Any ideas on best practices? Thanks.

    """
    from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
    """

    import random
    import sys

    stopword = "\n"  # Since we split on whitespace, this can never be a word
    stopsentence = (".", "!", "?",)  # Cause a "new sentence" if found at the end of a word
    sentencesep = "\n"  # String used to separate sentences

    # GENERATE TABLE
    w1 = stopword
    w2 = stopword
    table = {}

    for line in sys.stdin:
        for word in line.split():
            if word[-1] in stopsentence:
                table.setdefault((w1, w2), []).append(word[0:-1])
                w1, w2 = w2, word[0:-1]
                word = word[-1]
            table.setdefault((w1, w2), []).append(word)
            w1, w2 = w2, word

    # Mark the end of the file
    table.setdefault((w1, w2), []).append(stopword)

    # GENERATE SENTENCE OUTPUT
    maxsentences = 20

    w1 = stopword
    w2 = stopword
    sentencecount = 0
    sentence = []

    while sentencecount < maxsentences:
        newword = random.choice(table[(w1, w2)])
        if newword == stopword:
            sys.exit()
        if newword in stopsentence:
            print("%s%s%s" % (" ".join(sentence), newword, sentencesep))
            sentence = []
            sentencecount += 1
        else:
            sentence.append(newword)
        w1, w2 = w2, newword
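Edit 01:
Okay, I've cobbled together a simple "paragraph wrapper" that does a good job of gathering sentences into paragraphs, but it interferes with the output of the sentence generator: I'm getting excessive repetition of the first words, for example, among other issues.
But the premise is sound; I just need to figure out why the behavior of the sentence loop is affected by the addition of the paragraph loop. If you can see the problem, please advise: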

    ###
    #
    # usage: $ python markov_sentences.py < input.txt > output.txt
    #
    # from: http://code.activestate.com/recipes/194364-the-markov-chain-algorithm/?in=lang-python
    #
    ###

    import random
    import sys

    stopword = "\n"  # Since we split on whitespace, this can never be a word
    stopsentence = (".", "!", "?",)  # Cause a "new sentence" if found at the end of a word
    paragraphsep = "\n\n"  # String used to separate paragraphs

    # GENERATE TABLE
    w1 = stopword
    w2 = stopword
    table = {}

    for line in sys.stdin:
        for word in line.split():
            if word[-1] in stopsentence:
                table.setdefault((w1, w2), []).append(word[0:-1])
                w1, w2 = w2, word[0:-1]
                word = word[-1]
            table.setdefault((w1, w2), []).append(word)
            w1, w2 = w2, word

    # Mark the end of the file
    table.setdefault((w1, w2), []).append(stopword)

    # GENERATE PARAGRAPH OUTPUT
    maxparagraphs = 10
    paragraphs = 0  # reset the outer 'while' loop counter to zero

    while paragraphs < maxparagraphs:  # start outer loop, until maxparagraphs is reached
        w1 = stopword
        w2 = stopword
        stopsentence = (".", "!", "?",)
        sentence = []
        sentencecount = 0  # reset the inner 'while' loop counter to zero
        maxsentences = random.randrange(1, 5)  # random sentences per paragraph

        while sentencecount < maxsentences:  # start inner loop, until maxsentences is reached
            newword = random.choice(table[(w1, w2)])  # random word from word table
            if newword == stopword:
                sys.exit()
            elif newword in stopsentence:
                print("%s%s" % (" ".join(sentence), newword), end=" ")
                sentencecount += 1  # increment the sentence counter
            else:
                sentence.append(newword)
            w1, w2 = w2, newword
        print(paragraphsep)  # newline space
        paragraphs = paragraphs + 1  # increment the paragraph counter

    # EOF
Edit 02:
Added sentence = [] back into the elif clause, as in the snippet below. To wit:

    elif newword in stopsentence:
        print("%s%s" % (" ".join(sentence), newword), end=" ")
        sentence = []  # I have to be here to make the new sentence start as an empty list!!!
        sentencecount += 1  # increment the sentence counter
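
To see why that one line matters, here is a small stand-alone illustration of the buffer behavior. It is not part of the original script; the helper name join_sentences and the sample words are made up for the demo. Without the reset, each printed "sentence" still carries every word from the sentences before it, which is where the repeated opening words come from.

```python
def join_sentences(chunks, reset):
    """Build one output string per chunk of words.

    `chunks` stands in for the words the generator would pick for each
    sentence; `reset` toggles the `sentence = []` line from Edit 02.
    """
    sentence = []
    printed = []
    for words in chunks:
        sentence.extend(words)
        printed.append(" ".join(sentence) + ".")
        if reset:
            sentence = []  # start the next sentence from an empty list
    return printed


chunks = [["the", "cat"], ["a", "dog"]]
print(join_sentences(chunks, reset=False))  # second entry repeats "the cat"
print(join_sentences(chunks, reset=True))   # each entry holds only its own words
```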
Edit 03:
This is the final iteration of this script. Thanks to grieve for helping sort this out. I hope others can have some fun with it; I know I will.
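
For anyone who wants a starting point, the following is a minimal sketch of how the pieces discussed above could fit together. It is not the author's final script. The function names build_table and generate_paragraphs, the rng parameter, and the choice to return a list of strings instead of printing are all assumptions made for illustration.

```python
import random

STOPWORD = "\n"                 # never a real word, since input is split on whitespace
STOPSENTENCE = (".", "!", "?")  # sentence terminators


def build_table(lines):
    """Map each pair of consecutive words to the words seen right after it."""
    table = {}
    w1 = w2 = STOPWORD
    for line in lines:
        for word in line.split():
            if word[-1] in STOPSENTENCE:
                # Store the word and its terminator as two separate tokens.
                table.setdefault((w1, w2), []).append(word[:-1])
                w1, w2 = w2, word[:-1]
                word = word[-1]
            table.setdefault((w1, w2), []).append(word)
            w1, w2 = w2, word
    table.setdefault((w1, w2), []).append(STOPWORD)  # mark the end of the input
    return table


def generate_paragraphs(table, max_paragraphs=10, rng=random):
    """Return up to max_paragraphs strings, each holding 1-5 sentences."""
    paragraphs = []
    w1 = w2 = STOPWORD
    for _ in range(max_paragraphs):
        target = rng.randint(1, 5)  # inclusive on both ends
        sentences, sentence = [], []
        while len(sentences) < target:
            newword = rng.choice(table[(w1, w2)])
            if newword == STOPWORD:
                # End of the chain: keep finished sentences, drop a partial one.
                if sentences:
                    paragraphs.append(" ".join(sentences))
                return paragraphs
            if newword in STOPSENTENCE:
                sentences.append(" ".join(sentence) + newword)
                sentence = []  # the reset from Edit 02
            else:
                sentence.append(newword)
            w1, w2 = w2, newword
        paragraphs.append(" ".join(sentences))
    return paragraphs
```

To mirror the command-line usage shown earlier, one could pass sys.stdin to build_table and print the result joined with "\n\n". Two choices here differ from the Edit 01 script and are worth weighing: random.randint(1, 5) includes 5, whereas random.randrange(1, 5) stops at 4; and the word pair (w1, w2) is carried across paragraphs instead of being reset to the stop word, so paragraphs do not all restart from the opening words of the input.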