ruby – 打印文件中最常用的n个单词(字符串)

ruby – 打印文件中最常用的n个单词(字符串),第1张

概述目标: Write a function that takes two parameters: (1) a String representing a text document and (2) an integer providing the number of items to return. Implement the function such that it returns a list 目标:

Write a function that takes two parameters: (1) a String representing a text document and (2) an integer provIDing the number of items to return. Implement the function such that it returns a List of Strings ordered by word frequency,the most frequently occurring word first. Use your best judgement to decIDe how words are separated. Your solution should run in O(n) time where n is the number of characters in the document.

我的想法是,在最坏的情况下,函数的输入可以是文档中的单词总数,从而减少了按照频率对单词进行排序的问题.这使我认为如果我使用比较排序方法,时间复杂度的下限将是O(n log n).所以,我的想法是,最好的方法是实现计数排序.这是我的代码.

我想告诉我,我的分析是否正确,我已经用我对时间复杂度的概念注释了代码,但它肯定是不正确的.这段代码的实际时间和空间复杂度是多少?如果有任何替代方法可以在实践中使用,我还想听听这是否是一个好方法.

### n is number of characters in string,k is number of words ###def word_frequencIEs(string,n)  words = string.split(/\s/)  # O(n)  max = 0  min = float::INFINITY  frequencIEs = words.inject(Hash.new(0)) do |hash,word|  # O(k)    occurrences = hash[word] += 1                     # O(1)    max = occurrences if occurrences > max            # O(1)    min = occurrences if occurrences < min            # O(1)    hash;                                             # O(1)    end  ### perform a counting sort ###  sorted = Array.new(max + words.length)  delta = 0  frequencIEs.each do |word,frequency|   #O(k)    p word + "--" + frequency.to_s    index = frequency    if sorted[index]      sorted[index] = sorted[index].push(word)  # ??? I think O(1).    else      sorted[index] = [word]                    # O(1)    end  end  return sorted.compact.flatten[-n..-1].reverse     ### Compact is O(k).  Flatten is O(k).  Reverse is O(k). So O(3k)end### Total --- O(n + 5k) = O(n).  Correct? ### And the space complexity is O(n) for the hash + O(2k) for the sorted array.  ### So total O(n).text = "hi hello hi my name is what what hi hello hi this is a test test test test hi hi hi what hello these are some words these these"p word_frequencIEs(text,4)
解决方法 两种方式:

def word_counter(string,max)  string.split(/\s+/)        .group_by{|x|x}        .map{|x,y|[x,y.size]}         .sort_by{|_,size| size} # Have to sort =/        .last(max)enddef word_counter(string,max)  # Create a Hash and a List to store values in.  word_counter,max_storage = Hash.new(0),[]  #Split the string an and add each word to the hash:  string.split(/\s+/).each{|word| word_counter[word] += 1}  # Take each word and add it to the List (so that the List_index = word_count)  # I also add the count,but that is not really needed  word_counter.each{|key,val| max_storage[val] = [*max_storage[val]] << [key,val]}  # Higher count will always be at the end,remove nils and get the last "max" elements.  max_storage.compact.flatten(1).last(max)end
总结

以上是内存溢出为你收集整理的ruby – 打印文件中最常用的n个单词(字符串)全部内容,希望文章能够帮你解决ruby – 打印文件中最常用的n个单词(字符串)所遇到的程序开发问题。

如果觉得内存溢出网站内容还不错,欢迎将内存溢出网站推荐给程序员好友。

欢迎分享,转载请注明来源:内存溢出

原文地址: http://outofmemory.cn/langs/1290988.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-06-09
下一篇 2022-06-09

发表评论

登录后才能评论

评论列表(0条)

保存