- To build an effective machine learning system, we first transform useful information from raw data into internal representations such as feature vectors.
- Conventional machine learning systems adopt careful feature engineering as preprocessing to build feature representations from raw data.
- The distributional hypothesis that linguistic objects with similar distributions have similar meanings is the basis for distributed word representation learning.
One-hot representation
- assigns a unique index to each word → a high-dimensional sparse representation
- cannot capture the semantic relatedness among words (in one-hot representation, the difference between cat and dog is as large as the difference between cat and bed; see the sketch after this list)
- cannot flexibly handle new words in real-world scenarios
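A minimal sketch of this issue, assuming a toy three-word vocabulary (an illustrative assumption): every pair of distinct one-hot vectors is orthogonal, so cosine similarity cannot tell cat-dog apart from cat-bed.

```python
# One-hot word representation over a toy vocabulary (illustrative assumption).
import numpy as np

vocab = ["cat", "dog", "bed"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(one_hot("cat"), one_hot("dog")))  # 0.0
print(cosine(one_hot("cat"), one_hot("bed")))  # 0.0, same as cat/dog
```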
Representation learning
- Representation learning aims to learn informative representations of objects from raw data automatically. /// Distributed representations have proved to be more efficient because their low dimensionality helps prevent the sparsity issue.
- Deep learning is a typical approach for representation learning.
Representation Learning | Major characteristics |
---|---|
N-gram Model | A probabilistic language model that predicts the next item in a sequence based on its previous n-1 items |
Bag-of-words | Represents a document while disregarding the order of its words: ① each word that appears in the document corresponds to a unique, nonzero dimension; ② a score (e.g., the number of occurrences) is computed for each word to indicate its weight |
TF-IDF | Extends BoW by taking the importance of different words into account rather than treating all words equally: term frequency is weighted by inverse document frequency (see the sketch after this table) |
Neural Probabilistic Language Model (NPLM) | NPLM first assigns a distributed vector to each word, then uses a neural network to predict the next word (sketched after this table). Examples include feed-forward, recurrent, and LSTM-based recurrent neural network language models. |
Word embeddings: Word2Vec, GloVe, fastText | Inspired by NPLM, many methods emerged that embed words into distributed representations. /// Word embeddings in the NLP pipeline map discrete words into informative low-dimensional vectors. |
Pre-trained Language Models (PLM): ELMo, BERT | Take complicated context in text into consideration /// compute dynamic representations for words based on their context, which is especially useful for words with multiple meanings /// follow a pre-training then fine-tuning pipeline |
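A plain-Python sketch of BoW counts and TF-IDF weighting over a toy corpus (the documents and the exact weighting formula are illustrative assumptions; many variants exist). Note how "the", which appears in every document, gets an IDF of zero and therefore a TF-IDF weight of zero.

```python
# Bag-of-words counts and TF-IDF weights over a toy corpus (assumed documents).
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

vocab = sorted({w for d in docs for w in d})
df = {w: sum(1 for d in docs if w in d) for w in vocab}  # document frequency
n_docs = len(docs)

def bow(doc):
    # BoW: number of occurrences of each vocabulary word in the document.
    counts = Counter(doc)
    return [counts[w] for w in vocab]

def tf_idf(doc):
    # TF-IDF: term frequency weighted by inverse document frequency.
    counts = Counter(doc)
    return [(counts[w] / len(doc)) * math.log(n_docs / df[w]) for w in vocab]

print(vocab)
print(bow(docs[0]))
print([round(x, 3) for x in tf_idf(docs[0])])  # "the" gets weight 0.0
```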
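And a minimal feed-forward NPLM sketch in PyTorch (hyperparameters, vocabulary size, and context length are illustrative assumptions): each of the previous n-1 words is mapped to a distributed vector via an embedding lookup, the vectors are concatenated, and a neural network predicts the next word.

```python
# A feed-forward neural probabilistic language model sketch.
import torch
import torch.nn as nn

class NPLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=32, context_size=2, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)        # distributed word vectors
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)            # scores over the next word

    def forward(self, context):                  # context: (batch, n-1) word indices
        e = self.embed(context)                  # (batch, n-1, embed_dim)
        h = torch.tanh(self.hidden(e.flatten(1)))
        return self.out(h)                       # (batch, vocab_size) logits

model = NPLM(vocab_size=100)
logits = model(torch.tensor([[3, 7]]))           # ids of the previous two words
next_word = logits.argmax(dim=-1)                # predicted next-word id
```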
Applications: Neural Relation Extraction
- Sentence-Level NRE: A basic form of sentence-level NRE consists of three components: (a) an input encoder that gives a representation for each input word (word embeddings, position embeddings, part-of-speech (POS) tag embeddings, WordNet hypernym embeddings); (b) a sentence encoder that computes either a single vector or a sequence of vectors to represent the original sentence; (c) a relation classifier that calculates the conditional probability distribution over all relations.
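A hedged PyTorch sketch of these three components, using a CNN sentence encoder with max-pooling; the encoder choice, the dimensions, and the restriction to word and position embeddings are illustrative assumptions rather than a specific published model.

```python
# Sentence-level NRE sketch: input encoder, sentence encoder, relation classifier.
import torch
import torch.nn as nn

class SentenceNRE(nn.Module):
    def __init__(self, vocab_size, n_relations, max_len=100,
                 word_dim=50, pos_dim=5, hidden_dim=230):
        super().__init__()
        # (a) input encoder: word embeddings + position embeddings relative
        #     to the two entities (POS-tag / hypernym embeddings omitted here).
        self.word_embed = nn.Embedding(vocab_size, word_dim)
        self.pos1_embed = nn.Embedding(2 * max_len, pos_dim)
        self.pos2_embed = nn.Embedding(2 * max_len, pos_dim)
        in_dim = word_dim + 2 * pos_dim
        # (b) sentence encoder: 1-D convolution + max-pooling -> one sentence vector.
        self.conv = nn.Conv1d(in_dim, hidden_dim, kernel_size=3, padding=1)
        # (c) relation classifier: conditional distribution over all relations.
        self.classifier = nn.Linear(hidden_dim, n_relations)

    def forward(self, words, pos1, pos2):                 # each: (batch, seq_len)
        x = torch.cat([self.word_embed(words),
                       self.pos1_embed(pos1),
                       self.pos2_embed(pos2)], dim=-1)    # (batch, seq, in_dim)
        h = torch.relu(self.conv(x.transpose(1, 2)))      # (batch, hidden, seq)
        s = h.max(dim=-1).values                          # max-pool -> sentence vector
        return self.classifier(s)                         # relation logits

model = SentenceNRE(vocab_size=10000, n_relations=53)
words = torch.randint(0, 10000, (2, 40))
pos1 = torch.randint(0, 200, (2, 40))
pos2 = torch.randint(0, 200, (2, 40))
probs = model(words, pos1, pos2).softmax(dim=-1)          # P(relation | sentence)
```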
- Bag-Level NRE: Uses information from multiple sentences (bag-level) rather than a single sentence (sentence-level) to decide whether a relation holds between two entities. A basic form of bag-level NRE consists of four components: (a) an input encoder similar to sentence-level NRE; (b) a sentence encoder similar to sentence-level NRE; (c) a bag encoder that computes a vector representing all related sentences in a bag; and (d) a relation classifier similar to sentence-level NRE that takes bag vectors as input instead of sentence vectors.
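A sketch of the bag encoder (c) and relation classifier (d), assuming the sentence vectors have already been produced by a sentence encoder; selective attention over the sentences is one common way to aggregate a bag into a single vector, and it, along with the dimensions below, is an illustrative assumption.

```python
# Bag encoder sketch: attention-weighted aggregation of sentence vectors.
import torch
import torch.nn as nn

class BagEncoder(nn.Module):
    def __init__(self, hidden_dim=230, n_relations=53):
        super().__init__()
        self.query = nn.Embedding(n_relations, hidden_dim)   # one query vector per relation
        self.classifier = nn.Linear(hidden_dim, n_relations)

    def forward(self, sent_vecs, relation_id):
        # sent_vecs: (n_sentences, hidden_dim) encoded sentences of one bag
        q = self.query(relation_id)                           # (hidden_dim,)
        scores = sent_vecs @ q                                # attention score per sentence
        alpha = scores.softmax(dim=0)                         # attention weights
        bag_vec = (alpha.unsqueeze(-1) * sent_vecs).sum(dim=0)  # (c) bag vector
        return self.classifier(bag_vec)                       # (d) relation logits

bag = torch.randn(5, 230)                                     # 5 sentences in the bag
logits = BagEncoder()(bag, torch.tensor(7))                   # query with relation id 7
```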
- Topic modeling algorithms do not require any prior annotations or labeling of the documents.
- A topic model is a type of generative model: every word in a document is generated by first choosing a topic with some probability, then choosing a word from that topic with some probability.
For each document in the collection, we generate the words in a two-stage process:
1. Randomly choose a distribution over topics.
2. For each word in the document,
• Randomly choose a topic from the distribution over topics in step #1.
• Randomly choose a word from the corresponding distribution over the vocabulary.
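A minimal numpy sketch of this two-stage generative process; the vocabulary, the two fixed topic-word distributions, and the Dirichlet prior over topics are illustrative assumptions.

```python
# Generate one document with the two-stage topic-model process.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["gene", "dna", "cell", "ball", "team", "score"]
n_topics = 2
doc_length = 8

# Topic-word distributions (rows sum to 1); assumed fixed and known here.
topic_word = np.array([
    [0.30, 0.30, 0.30, 0.03, 0.04, 0.03],   # "biology"-like topic
    [0.03, 0.04, 0.03, 0.30, 0.30, 0.30],   # "sports"-like topic
])

# Stage 1: randomly choose a distribution over topics for this document.
doc_topic = rng.dirichlet(alpha=[0.5] * n_topics)

# Stage 2: for each word position, choose a topic, then a word from that topic.
words = []
for _ in range(doc_length):
    z = rng.choice(n_topics, p=doc_topic)         # topic assignment
    w = rng.choice(len(vocab), p=topic_word[z])   # word from that topic
    words.append(vocab[w])

print("doc-topic distribution:", np.round(doc_topic, 2))
print("generated document:", " ".join(words))
```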