【NLP】Representation Learning for Natural Language Processing


  • To build an effective machine learning system, we first transform the useful information in raw data into internal representations such as feature vectors.
  • Conventional machine learning systems adopt careful feature engineering as preprocessing to build feature representations from raw data.
  • The distributional hypothesis, which states that linguistic objects with similar distributions have similar meanings, is the basis of distributed word representation learning.
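
The sketch below illustrates the distributional hypothesis on a toy corpus (the corpus, the window size of 2, and all variable names are assumptions made for this example, not part of the original text): words that occur in similar contexts, such as cat and dog, end up with more similar co-occurrence vectors than words that do not, such as cat and bed.

    import numpy as np

    corpus = [
        "the cat chased the mouse",
        "the dog chased the mouse",
        "the cat slept on the bed",
        "the dog slept on the bed",
    ]
    window = 2  # context window on each side of the center word

    vocab = sorted({w for sent in corpus for w in sent.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))

    # Count how often each word co-occurs with its neighbors.
    for sent in corpus:
        words = sent.split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if i != j:
                    counts[index[w], index[words[j]]] += 1

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(cosine(counts[index["cat"]], counts[index["dog"]]))  # high similarity
    print(cosine(counts[index["cat"]], counts[index["bed"]]))  # lower similarity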

One-hot representation
  • assigns a unique index to each word → a high-dimensional sparse representation
  • cannot capture the semantic relatedness among words (in one-hot space, cat is no closer to dog than it is to bed)
  • inflexible when dealing with new words in real-world scenarios
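
As a minimal illustration of these limitations, the sketch below builds one-hot vectors over a small, assumed vocabulary: every pair of distinct words is orthogonal, so cat is exactly as far from dog as it is from bed, and a word outside the vocabulary has no representation at all.

    import numpy as np

    vocab = ["cat", "dog", "bed", "runs", "sleeps"]
    identity = np.eye(len(vocab))
    one_hot = {word: identity[i] for i, word in enumerate(vocab)}

    print(one_hot["cat"])                    # [1. 0. 0. 0. 0.]
    print(one_hot["cat"] @ one_hot["dog"])   # 0.0: no measurable relatedness
    print(one_hot["cat"] @ one_hot["bed"])   # 0.0: identical to the cat/dog case
    # A word outside the vocabulary has no index at all, hence the
    # inflexibility with new words.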

Representation learning
  • Representation learning aims to learn informative representations of objects from raw data automatically. Distributed representations have proven more efficient because their low dimensionality helps avoid the sparsity issue.
  • Deep learning is a typical approach for representation learning.
Development of representation learning in NLP

Representative approaches and their major characteristics:
  • N-gram model: predicts the next item in a sequence based on its previous n-1 items; a probabilistic language model.
  • Bag-of-Words (BoW): disregards the order of the words in a document: ① each word that appears in the document corresponds to a unique, nonzero dimension; ② a score is computed for each word (e.g., its number of occurrences) to indicate its weight.
  • TF-IDF: extends BoW by taking the importance of different words into consideration rather than treating all words equally.
  • Neural Probabilistic Language Model (NPLM): first assigns a distributed vector to each word, then uses a neural network to predict the next word. Examples include feed-forward, recurrent, and LSTM-based neural network language models.
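
As a minimal sketch of the Bag-of-Words and TF-IDF entries above (assuming scikit-learn ≥ 1.0 is available; the toy documents are illustrative), the snippet below builds a count matrix that ignores word order and then reweights it by inverse document frequency:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    docs = [
        "representation learning for natural language processing",
        "natural language processing with neural networks",
        "neural networks learn distributed representations",
    ]

    bow = CountVectorizer()
    X_bow = bow.fit_transform(docs)       # document-term count matrix (order ignored)
    print(bow.get_feature_names_out())
    print(X_bow.toarray())

    tfidf = TfidfVectorizer()
    X_tfidf = tfidf.fit_transform(docs)   # counts reweighted by inverse document frequency
    print(X_tfidf.toarray().round(2))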

Word embeddings: 

Word2Vec, GloVe, fastText

Inspired by NPLM, many methods emerged that embed words into distributed representations. In the NLP pipeline, word embeddings map discrete words into informative low-dimensional vectors.
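
A minimal sketch of learning word embeddings with gensim's Word2Vec (assuming gensim ≥ 4.0 is installed; the toy corpus and hyperparameters are illustrative, not a recommended configuration):

    from gensim.models import Word2Vec

    sentences = [
        ["the", "cat", "chased", "the", "mouse"],
        ["the", "dog", "chased", "the", "mouse"],
        ["the", "cat", "slept", "on", "the", "bed"],
        ["the", "dog", "slept", "on", "the", "bed"],
    ]

    # sg=1 selects the skip-gram variant, which predicts context words
    # from the center word.
    model = Word2Vec(sentences, vector_size=50, window=2,
                     min_count=1, sg=1, epochs=200)

    print(model.wv["cat"].shape)              # (50,): dense, low-dimensional
    print(model.wv.similarity("cat", "dog"))  # typically higher than the cat/bed score
    print(model.wv.similarity("cat", "bed"))

Unlike the one-hot case, distributionally similar words such as cat and dog are expected to receive nearby vectors.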

Pre-trained Language Models (PLM):

ELMo, BERT

PLMs take the complicated context of a text into consideration: they compute dynamic representations for words based on their context, which is especially useful for words with multiple meanings, and they are used in a pre-training-then-fine-tuning pipeline.
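
A minimal sketch of contextual representations with the Hugging Face Transformers library and the public bert-base-uncased checkpoint (assuming the transformers and torch packages are installed; the sentences and the helper word_vector are illustrative assumptions): the same surface word "bank" receives different vectors depending on its context, which a static word embedding cannot provide.

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def word_vector(sentence, word):
        # Return the contextual vector of `word` in `sentence`.
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
        return hidden[tokens.index(word)]

    v1 = word_vector("he sat on the river bank", "bank")
    v2 = word_vector("she deposited cash at the bank", "bank")
    print(torch.cosine_similarity(v1, v2, dim=0))  # below 1: the two "bank"s differ by context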

(Figure) The pre-trained language model family


Applications: Neural Relation Extraction
  • Sentence-level NRE: a basic form of sentence-level NRE consists of three components: (a) an input encoder that gives a representation for each input word (word embeddings, position embeddings, part-of-speech (POS) tag embeddings, WordNet hypernym embeddings); (b) a sentence encoder that computes either a single vector or a sequence of vectors to represent the original sentence; (c) a relation classifier that calculates the conditional probability distribution over all relations (a minimal sketch of these components follows after these bullets).

  • Bag-level NRE: uses information from multiple sentences (a bag) rather than a single sentence to decide whether a relation holds between two entities. A basic form of bag-level NRE consists of four components: (a) an input encoder similar to sentence-level NRE, (b) a sentence encoder similar to sentence-level NRE, (c) a bag encoder which computes a vector representing all related sentences in a bag, and (d) a relation classifier similar to sentence-level NRE which takes bag vectors as input instead of sentence vectors.
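
Below is a minimal PyTorch sketch of the sentence-level setup, with the three components marked in comments; all dimensions, vocabulary sizes, and the class name SentenceNRE are illustrative assumptions rather than a specific published architecture.

    import torch
    import torch.nn as nn

    class SentenceNRE(nn.Module):
        def __init__(self, vocab_size=10000, num_relations=10,
                     word_dim=50, pos_dim=5, max_len=100, hidden=230):
            super().__init__()
            # (a) input encoder: word embedding plus two position embeddings,
            #     one for the distance to each entity (shifted to be non-negative)
            self.word_emb = nn.Embedding(vocab_size, word_dim)
            self.pos1_emb = nn.Embedding(2 * max_len, pos_dim)
            self.pos2_emb = nn.Embedding(2 * max_len, pos_dim)
            # (b) sentence encoder: 1-D convolution over the sequence + max pooling
            self.conv = nn.Conv1d(word_dim + 2 * pos_dim, hidden,
                                  kernel_size=3, padding=1)
            # (c) relation classifier: linear layer producing relation logits
            self.classifier = nn.Linear(hidden, num_relations)

        def forward(self, words, pos1, pos2):
            x = torch.cat([self.word_emb(words),
                           self.pos1_emb(pos1),
                           self.pos2_emb(pos2)], dim=-1)   # (batch, len, word+2*pos)
            h = torch.relu(self.conv(x.transpose(1, 2)))    # (batch, hidden, len)
            s = h.max(dim=2).values                         # max pooling -> sentence vector
            return self.classifier(s)                       # relation logits

    # Usage on random ids, just to show the expected shapes.
    model = SentenceNRE()
    words = torch.randint(0, 10000, (2, 20))
    pos1 = torch.randint(0, 200, (2, 20))   # distances to the head entity, shifted
    pos2 = torch.randint(0, 200, (2, 20))   # distances to the tail entity, shifted
    print(model(words, pos1, pos2).shape)   # torch.Size([2, 10])

For bag-level NRE, the sentence vectors produced by such an encoder would additionally be aggregated by a bag encoder (e.g., averaging or attention over the sentences in a bag) before being passed to the relation classifier.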

Topic Model
  • Topic modeling algorithms do not require any prior annotations or labeling of the documents. 

Topic models are generative models: every word in a document is assumed to be produced by first choosing a topic with a certain probability, and then choosing a word from that topic with a certain probability.

For each document in the collection, we generate the words in a two-stage process:

1. Randomly choose a distribution over topics.

2. For each word in the document,

    • Randomly choose a topic from the distribution over topics in step #1.

    • Randomly choose a word from the corresponding distribution over the vocabulary.
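
A minimal sketch of fitting a topic model with scikit-learn's LatentDirichletAllocation (assuming scikit-learn ≥ 1.0 is installed; the toy corpus and the choice of two topics are illustrative). Note that fitting needs only the raw documents, with no labels or annotations:

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [
        "the cat and the dog play in the garden",
        "dogs and cats are popular pets",
        "the stock market rose as investors bought shares",
        "shares fell and the market closed lower",
    ]

    counts = CountVectorizer(stop_words="english")
    X = counts.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topics = lda.fit_transform(X)     # per-document distribution over topics
    print(doc_topics.round(2))

    # Top words of each topic: the per-topic distribution over the vocabulary.
    terms = counts.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top = topic.argsort()[-3:][::-1]
        print(f"topic {k}:", [terms[i] for i in top])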
