Attentive Neural Network for Named Entity Recognition in Vietnamese


Abstract

The paper proposes an attentive neural network for the task of named entity recognition in Vietnamese. The proposed model makes use of character-based language models and word embeddings to encode words as vector representations. A neural network architecture of encoder, attention, and decoder layers is then utilized to encode the knowledge of input sentences and to label entity tags. The experimental results show that the proposed attentive neural network achieves state-of-the-art results on benchmark Vietnamese named entity recognition datasets, in comparison with both models based on hand-crafted features and neural models.

Index Terms—named entity recognition, neural network, conditional random fields.

Introduction

Named entity recognition (NER) is one of the fundamental sequence labeling tasks, alongside tasks such as word segmentation, part-of-speech (POS) tagging, and noun phrase chunking. The NER task aims to identify named entities in a given text and to assign them to particular entity types such as location, organization, or person name. NER plays a crucial role in natural language understanding and in downstream applications such as relation extraction, entity linking, question answering, and machine translation.

In previous studies, NER approaches made use of linear statistical models such as hidden Markov models (HMM), maximum entropy models (ME), or conditional random fields (CRF) to label entity tags ([1]). However, most of those models rely heavily on hand-crafted features and task-specific resources, which makes them difficult to adapt to new tasks or to shift to new domains. For example, in English, orthographic features and external gazetteer resources are commonly used in the NER task. For Vietnamese, the approach in [2], [3] used words, word shapes, part-of-speech tags, and chunking tags as hand-crafted features for a CRF to label entity tags.
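To make the notion of hand-crafted features concrete, the following is a minimal sketch of the kind of feature extraction such CRF taggers rely on. The feature names, the word-shape mapping, and the (word, POS tag, chunk tag) token format are illustrative assumptions, not the exact feature set of [2], [3].

def word_shape(word):
    """Map characters to shape classes, e.g. 'Hà-Nội' -> 'Xx-Xxx'."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append('X')
        elif ch.islower():
            shape.append('x')
        elif ch.isdigit():
            shape.append('d')
        else:
            shape.append(ch)
    return ''.join(shape)

def token_features(sent, i):
    """Feature dict for token i of sent, where sent is a list of
    (word, pos_tag, chunk_tag) triples."""
    word, pos, chunk = sent[i]
    feats = {
        'word.lower': word.lower(),
        'word.shape': word_shape(word),
        'word.isupper': word.isupper(),
        'pos': pos,
        'chunk': chunk,
    }
    if i > 0:
        prev_word, prev_pos, _ = sent[i - 1]
        feats.update({'-1:word.lower': prev_word.lower(), '-1:pos': prev_pos})
    else:
        feats['BOS'] = True  # beginning-of-sentence marker
    if i < len(sent) - 1:
        next_word, next_pos, _ = sent[i + 1]
        feats.update({'+1:word.lower': next_word.lower(), '+1:pos': next_pos})
    else:
        feats['EOS'] = True  # end-of-sentence marker
    return feats

Feature dictionaries of this form, computed for every token in a sentence, are exactly the kind of input that linear-chain CRF toolkits consume; the point of the paper is that constructing and tuning such features by hand is the costly step that neural models avoid.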

In the past few years, neural networks for NER have been proposed to address the drawbacks of statistical NER models by extracting features automatically instead of relying on heavily hand-crafted features. Neural architectures for NER often combine either a recurrent neural network (RNN) or a convolutional neural network (CNN) with a CRF to automatically extract information from the input and detect NER labels. Reference [4], among others, proposed a neural architecture using a recurrent neural network with long short-term memory units (LSTM) ([5]) and a CRF to label NER tags. Moreover, the combination of a bidirectional LSTM, a CNN, and a CRF was introduced in [6] to automatically obtain the benefits of both word- and character-level representations for detecting NER labels. Recently, as in [7], a combination of a language model (LM), an LSTM, and a CRF has been used to extract knowledge from raw text and empower sequence labeling tasks, including NER. For Vietnamese, a model without hand-crafted features, combining an LSTM, a CNN, and a CRF, was applied to the Vietnamese NER task in [8]. Moreover, the ZA-NER model ([9]), based on a combination of a bidirectional LSTM and a CRF, was proposed to extract named entities.

This paper introduces an attentive neural network (VNER) for the Vietnamese NER task that uses no hand-crafted features or task-specific resources. In the proposed network, the authors incorporate a neural language model to encode character-based words. Similar to [7], the prediction of the next character in the language model is adapted to predict the next word. Moreover, pre-trained word embeddings are utilized to extract knowledge at the word level. The concatenation of the character-based word representation and the pre-trained word embedding is then used as the vector representation at the word (token) layer. A bidirectional LSTM is applied as an encoder layer to encode the knowledge of the input sentence, and an LSTM decoder together with an attention mechanism decodes the outputs of the encoder layer. Finally, a CRF layer is used to model context dependencies and entity labels. A sketch of this pipeline is given below.
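The following is a minimal PyTorch sketch of the described pipeline. The layer dimensions, the use of a plain character-level bidirectional LSTM in place of the pre-trained character language model, and the additive attention formulation are illustrative assumptions; the final CRF layer is represented only by its per-tag emission scores, on top of which a standard linear-chain CRF would be trained.

import torch
import torch.nn as nn

class VNERSketch(nn.Module):
    def __init__(self, n_chars, n_words, n_tags,
                 char_dim=25, char_hidden=50, word_dim=100, enc_hidden=200):
        super().__init__()
        self.enc_hidden = enc_hidden
        # Character-based word representation (a stand-in for the paper's
        # pre-trained character language model).
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 batch_first=True, bidirectional=True)
        # Pre-trained word embeddings would be loaded into this table.
        self.word_emb = nn.Embedding(n_words, word_dim)
        # Encoder: bidirectional LSTM over the concatenated representations.
        self.encoder = nn.LSTM(2 * char_hidden + word_dim, enc_hidden,
                               batch_first=True, bidirectional=True)
        # Decoder: unidirectional LSTM fed with attention context vectors.
        self.decoder = nn.LSTM(2 * enc_hidden, enc_hidden, batch_first=True)
        self.attn = nn.Linear(3 * enc_hidden, 1)  # additive attention score
        # Per-tag emission scores; a linear-chain CRF would consume these.
        self.emit = nn.Linear(3 * enc_hidden, n_tags)

    def forward(self, char_ids, word_ids):
        # char_ids: (batch, seq_len, word_len); word_ids: (batch, seq_len)
        b, t, c = char_ids.size()
        chars = self.char_emb(char_ids.view(b * t, c))
        _, (h, _) = self.char_lstm(chars)              # h: (2, b*t, char_hidden)
        char_repr = h.transpose(0, 1).reshape(b, t, -1)
        tokens = torch.cat([char_repr, self.word_emb(word_ids)], dim=-1)
        enc_out, _ = self.encoder(tokens)              # (b, t, 2*enc_hidden)

        # Decode step by step, attending over all encoder states.
        state = (enc_out.new_zeros(1, b, self.enc_hidden),
                 enc_out.new_zeros(1, b, self.enc_hidden))
        scores = []
        for _ in range(t):
            q = state[0][-1].unsqueeze(1).expand(-1, t, -1)
            e = self.attn(torch.cat([enc_out, q], dim=-1))   # (b, t, 1)
            a = torch.softmax(e, dim=1)                      # attention weights
            ctx = (a * enc_out).sum(dim=1, keepdim=True)     # (b, 1, 2*enc_hidden)
            out, state = self.decoder(ctx, state)
            scores.append(self.emit(torch.cat([out, ctx], dim=-1)))
        return torch.cat(scores, dim=1)                      # (b, t, n_tags)

As a usage illustration, for a batch of 2 sentences of 7 words, each padded to 12 characters, VNERSketch(n_chars=100, n_words=20000, n_tags=9) applied to torch.randint(0, 100, (2, 7, 12)) and torch.randint(0, 20000, (2, 7)) returns a (2, 7, 9) tensor of emission scores, which the CRF layer would turn into a globally normalized tag sequence.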

For the experiments, the authors evaluate the VNER model on two benchmark datasets for the Vietnamese NER task, the VLSP-2016 ([10]) and VLSP-2018 NER datasets. The experimental results show that the VNER model achieves state-of-the-art results compared with both hand-crafted feature based models and neural models.
