In online systems with large numbers of users, the demand for automated chatbots to serve users is increasing. Chatbot systems can be used to support or replace customer care officers in several tasks that can be automated.

For example, a question-answering chatbot can automatically answer questions about the services that a company provides. A hospital can use a chatbot on its website to collect patients’ information, to give patients initial information about their symptoms, or to guide them through the procedures for registering for medical examination and treatment.

Chatbot systems communicate with humans by voice (like Siri) or by text (like chatbots developed on the Facebook Messenger platform). Whatever the means of communication, a chatbot needs to understand the input text so that it can provide the right answers for customers. The component responsible for this work in a chatbot system is called NLU (Natural Language Understanding), which incorporates a number of natural language processing (NLP) techniques.

In this article, we introduce three basic NLP problems that arise when developing a chatbot system, along with some typical approaches to them. We focus on chatbot systems that are used in a closed domain and apply a retrieval-based model. An information retrieval-based model is one in which the chatbot gives responses that are prepared in advance or that follow certain patterns. This is different from a generative model, in which the chatbot’s responses are generated automatically by learning from a dialogue dataset (read more in reference [1]). Most chatbot systems deployed in practice follow the information retrieval-based model and are applied in specific application domains.

The three NLP problems covered in this article are: 1) Intent classification or intent detection; 2) Information extraction; and 3) Dialogue management. Finally, we also point out the challenges of developing a chatbot system and the limitations of current technology.

User intent detection

Users usually come to a chatbot system wanting it to take some action to help them with a particular issue. For example, users of a chatbot that supports booking air tickets may state their booking request at the beginning of the conversation. To provide accurate support, the chatbot needs to determine the user’s intent. The detected intent determines how the rest of the conversation between the user and the chatbot will proceed. Therefore, if the user’s intent is misinterpreted, the chatbot will give incorrect responses. The user may then become frustrated and have no intention of using the system again. Intent detection is therefore a very important problem in a chatbot system.

For closed domains, we can limit the user’s intentions to a finite set of predefined intents, which are related to the business operations that the chatbot can support. With this limitation, the problem of detecting user intent can be formalized as a text categorization problem. Given a user’s utterance as input, the classification system determines which intent in the predefined set corresponds to that utterance.

To build an intent classification model, we need a training dataset that includes different expressions for each intent. For example, with the same question about the weather in Hanoi today, users can use the following expressions:

– What is the weather today in Hanoi?

– Does Hanoi rain today?

– What is the temperature in Hanoi today?

– Excuse me, should we bring a raincoat when going out today?

Preparing the training data for the intent classification problem is one of the most important tasks in developing a chatbot system, and it has a huge impact on the quality of the system. This work also requires considerable time and effort from chatbot developers.

Machine learning model for the problem of categorizing user intent

Once the training data for the intent categorization problem is available, we model the problem as a text categorization problem. Text categorization is a classic problem in NLP and text mining. The text categorization model for the intent classification problem is expressed in the following form:

We are given a training set of (message, intent) pairs, D = {(x(1), y(1)), …, (x(n), y(n))}, where x(i) is a message and y(i) is the corresponding intent for x(i). Each intent y(i) belongs to a finite set K of predefined intents. From this training data, we need to learn a classification model Θ that can classify a new message into one of the intents in the set K. The architecture of the intent categorization system is illustrated in Figure 1.

Figure 1. Architecture of the intent categorization system

The intent categorization system has some basic components:

  • Data pre-processing
  • Feature extraction
  • Model training
  • Categorizing

In the data pre-processing stage, we “clean” the data, for example by removing redundant information, correcting misspelled words, and standardizing abbreviations. Pre-processing plays an important role in a chatbot system because of the specific characteristics of chat and conversational language: abbreviations, misspellings, and “teencode”.
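The cleaning step described above can be sketched as follows. This is only a minimal illustration: the abbreviation table and the function name `preprocess` are hypothetical, and a real system would curate such a table from its own chat logs.

```python
import re

# Hypothetical normalization table (an assumption for this example);
# a real system would build it from its own conversation data.
ABBREVIATIONS = {
    "pls": "please",
    "u": "you",
    "tmr": "tomorrow",
    "thx": "thanks",
}


def preprocess(message: str) -> list[str]:
    """Lowercase, squeeze repeated characters, drop punctuation, expand abbreviations."""
    text = message.lower()
    # Collapse any character repeated three or more times ("sooooo" -> "so").
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Replace everything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    return [ABBREVIATIONS.get(tok, tok) for tok in tokens]


print(preprocess("Pls book a flight 4 me tmr!!!"))
```

Spelling correction is left out here; it would typically need a dictionary or a language model rather than a lookup table.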

After pre-processing, we extract features from the cleaned data. In machine learning, this step is called feature extraction or feature engineering. In traditional machine learning models (before deep learning was widely applied), the feature extraction step significantly affects the accuracy of the classification model. To extract good features, we need to analyze the data carefully and also use expert knowledge of each specific application domain.

The training step takes the extracted features as input and applies a machine learning algorithm to learn a classification model. The learned model may be a set of classification rules (if using decision trees) or a weight vector over the extracted features (as in logistic regression, SVMs, or neural networks).

Once we have an intent classification model, we can use it to classify a new message. The input message goes through the same pre-processing and feature extraction steps; the classification model then computes a “score” for each intent in the set and outputs the intent with the highest score.
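The components above (feature extraction, training, and scoring each intent) can be sketched in plain Python. This is a minimal illustration rather than the article’s implementation: it uses a multinomial Naive Bayes classifier with add-one smoothing as a simple stand-in for the models named above, and the toy training set and the intent names `ask_weather` and `book_flight` are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Toy training set D of (message, intent) pairs; intent names are hypothetical.
TRAINING_DATA = [
    ("what is the weather today in hanoi", "ask_weather"),
    ("does hanoi rain today", "ask_weather"),
    ("what is the temperature in hanoi today", "ask_weather"),
    ("book me a flight to london", "book_flight"),
    ("i want to book a plane ticket", "book_flight"),
    ("reserve a flight ticket for tomorrow", "book_flight"),
]


def extract_features(message: str) -> list[str]:
    """Feature extraction: here simply lowercase word tokens (bag of words)."""
    return message.lower().split()


class NaiveBayesIntentClassifier:
    def fit(self, data):
        """Model training: count intents and per-intent word occurrences."""
        self.intent_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for message, intent in data:
            self.intent_counts[intent] += 1
            for word in extract_features(message):
                self.word_counts[intent][word] += 1
                self.vocabulary.add(word)
        return self

    def scores(self, message: str) -> dict:
        """Return a log-probability score for every intent in the set K."""
        total = sum(self.intent_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for intent, count in self.intent_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[intent].values()) + vocab_size
            for word in extract_features(message):
                # Add-one (Laplace) smoothing handles words unseen for this intent.
                score += math.log((self.word_counts[intent][word] + 1) / denom)
            result[intent] = score
        return result

    def predict(self, message: str) -> str:
        """Categorizing: output the intent with the highest score."""
        scores = self.scores(message)
        return max(scores, key=scores.get)


classifier = NaiveBayesIntentClassifier().fit(TRAINING_DATA)
print(classifier.predict("will it rain in hanoi today"))
print(classifier.predict("please book a ticket to london"))
```

With only six training messages the scores are not meaningful as probabilities; the sketch is meant to show where each stage of the pipeline sits.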

Model based on content matching

An intent classification model based on statistical machine learning requires training data that includes different expressions for each intent. This training data is usually prepared manually. The data preparation step takes considerable time and effort, especially in applications where the number of intents is relatively large.

One approach that can reduce the effort required to prepare training data is content matching. In this approach we still need to prepare data, in which each intent has at least one corresponding question. Given a message, we apply a content matching algorithm to compare the message with each question in the dataset. The answer to the question whose content is closest to the input is returned. In practice, we can present a list of the most appropriate answers (e.g., the top 3) for the user to choose from.
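A minimal sketch of this idea is shown below. It assumes cosine similarity over bag-of-words counts as the matching algorithm, which is only one of many possible choices, and the FAQ entries are invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ data: one question and one prepared answer per intent.
FAQ = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("What are your opening hours?", "We are open from 8:00 to 17:00 on weekdays."),
    ("How can I check my data balance?", "Open the app and choose 'My account'."),
    ("How do I cancel my subscription?", "Go to Settings and choose 'Cancel plan'."),
]


def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens, ignoring punctuation."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_answers(message: str, k: int = 3):
    """Rank FAQ entries by similarity to the message and return the best k."""
    query = bag_of_words(message)
    ranked = sorted(
        FAQ,
        key=lambda qa: cosine_similarity(query, bag_of_words(qa[0])),
        reverse=True,
    )
    return ranked[:k]


for question, answer in top_answers("I forgot my password, how to reset it?"):
    print(question, "->", answer)
```

Returning the top k entries instead of a single answer lets the user pick the right one when the best match is not clear-cut.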

The content matching method is quite suitable for chatbot systems that answer frequently asked questions (FAQ). We can take advantage of existing FAQ data to create an FAQ chatbot by content matching, without creating training data as in the statistical machine learning approach.

One of the challenges of the content-matching model is that handling different expressions of the same question requires hand-crafted rules. Since the number of samples for each intent is small, the matching model has to use rules or semantic resources to handle variations in how a word, phrase, or sentence is expressed. Sentences 1) and 2) in the example below are different expressions of the same question, from a customer of a telecom company, about a slow network.

  1. Ad, why is my home network so slow recently?
  2. My network lags many times, so frustrated.

In the example above, if we use the content-matching model, the system needs to recognize that the words “slow” and “lag” (Internet slang) have the same meaning.
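One simple way to encode such a rule is a table that maps variants to a canonical word before matching. The sketch below is an illustration under assumptions: the table contents are invented, and Jaccard overlap is used only to show the effect of the rule on the two example sentences.

```python
# Hypothetical hand-crafted rule table mapping variants to one canonical word.
CANONICAL = {
    "lag": "slow",
    "lags": "slow",
    "laggy": "slow",
    "sluggish": "slow",
}


def normalize(tokens):
    """Replace each token by its canonical form when a rule exists."""
    return [CANONICAL.get(tok, tok) for tok in tokens]


def jaccard(a, b):
    """Share of distinct tokens that two token lists have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


s1 = "ad why is my home network so slow recently".split()
s2 = "my network lags many times so frustrated".split()

before = jaccard(s1, s2)                        # overlap on raw tokens
after = jaccard(normalize(s1), normalize(s2))   # overlap after applying the rules
print(before, after)
```

The overlap between the two sentences rises once “lags” is mapped to “slow”, which is the effect the rule is meant to have.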

Currently, semantic resources for Vietnamese language processing are not sufficient, so an approach based on statistical machine learning, or a hybrid model that combines statistical machine learning with content matching, may be more appropriate for Vietnamese chatbots.

Information Extraction

Besides detecting the intent in a user’s message, we need to extract the necessary information from it. The information to be extracted from a message usually consists of entities of certain types. For example, when a customer wants to book an airplane ticket, the system needs to know the departure and destination locations, the date and time the customer wants to travel, and so on. The NLU components of chatbot systems usually support the following entity types (read more in reference [2]):

  • Location
  • Datetime
  • Number
  • Contact
  • Distance
  • Duration

Could you please book me a flight to London on 25th this month ?

Figure 2: Assigning word labels according to the B-I-O model in information extraction

The input of an information extraction module is a message. The module needs to locate the entities in the message (the start and end position of each entity). The following example illustrates a message and the entities extracted from it.

  • Input message: Could you please book me a flight to London on 25th this month ?
  • The message with identified entities: Could you please book me a flight to [London]LOCATION on [25th this month]TIME ?

In the sentence above there are two entities (enclosed in [ ], each followed by its entity type).

The common approach to information extraction is to formalize it as a sequence labeling problem. The input of a sequence labeling problem is a sequence of words, and the output is a sequence of labels corresponding to the words in the input. We use machine learning to learn a labeling model from a dataset of pairs (x1…xn, y1…yn), where x1…xn is the sequence of words and y1…yn is the sequence of labels. The length of the sequences in the dataset may vary.

In the information extraction problem, the label set for the words in the input sentence usually follows the BIO scheme, in which B stands for “Beginning”, I for “Inside”, and O for “Outside”. Once we know the position of the first word of an entity and of the words inside that entity, we can determine the position of the entity in the sentence. For the example above, the sequence of labels corresponding to the sequence of words in the input message is illustrated in Figure 2.
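To make the scheme concrete, the sketch below decodes a BIO label sequence back into entities for the example message. The label names `B-LOCATION`, `B-TIME`, and `I-TIME` follow a common naming convention and are an assumption here; the labels are written by hand rather than predicted by a model.

```python
def decode_bio(tokens, labels):
    """Turn parallel token / BIO-label sequences into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new entity begins; close the previous one if it is still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            # The token continues the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent I- label) closes any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


tokens = "Could you please book me a flight to London on 25th this month ?".split()
labels = ["O"] * 8 + ["B-LOCATION", "O", "B-TIME", "I-TIME", "I-TIME", "O"]
print(decode_bio(tokens, labels))
```

In a full system, the `labels` list would come from a trained sequence labeling model instead of being hard-coded.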

Popular sequence labeling algorithms include Hidden Markov Models (HMM) [3] and Conditional Random Fields (CRF) [4]. With textual data, CRF models usually outperform HMMs. Several open-source tools implement CRFs for sequence labeling, such as CRF++ [5], CRFsuite [6], and Mallet [7].

Recently, Recurrent Neural Networks (RNNs) have been widely used for sequence labeling. RNN models have proved effective on textual data because they model the dependencies between words in a sentence. For example, RNNs have been applied to part-of-speech (POS) tagging and named entity recognition [8].

Dialogue Management

In a long conversation between a person and a chatbot, the chatbot needs to remember the context, that is, to manage dialogue states. Dialogue management is important for ensuring that the communication between people and machines is smooth.

The function of the dialogue management component is to receive input from the NLU component, to manage dialogue states and dialogue contexts, and to pass output to the Natural Language Generation (NLG) component. For example, the dialogue management module in an air ticket booking chatbot needs to know when the user has provided enough information to create a booking in the system, and when it needs to reconfirm the information entered by the user. Currently, chatbot products typically use the Finite State Automata (FSA) model, the frame-based model (slot filling), or a combination of the two.


Figure 3: Illustration of Dialogue Management using Finite State Automata (FSA) model

FSA is the simplest dialogue management model. For example, imagine the customer care system of a telecom company, serving customers who complain about a slow network. The task of the chatbot is to ask for the customer’s name, phone number, the name of the Internet package he or she is using, and the actual Internet speed. Figure 3 illustrates a dialogue management model for this customer care chatbot. The FSA states correspond to the questions that the dialogue manager asks the user. The links between the states correspond to the actions that the chatbot will take. These actions depend on the user’s responses to the questions. In the FSA model, the chatbot is the side that directs the user through the conversation.
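A minimal sketch of such a dialogue manager is given below. The state names and the strictly linear transitions are assumptions made for the example; in a fuller FSA, the next state would depend on what the user actually replied.

```python
# Each state maps to (question to ask, next state). State names are hypothetical.
STATES = {
    "ask_name": ("What is your name?", "ask_phone"),
    "ask_phone": ("What is your phone number?", "ask_package"),
    "ask_package": ("What is the name of your Internet package?", "ask_speed"),
    "ask_speed": ("What is your current Internet speed?", "done"),
}


class FsaDialogueManager:
    def __init__(self):
        self.state = "ask_name"
        self.answers = {}

    def next_question(self):
        """Return the question for the current state, or None when finished."""
        if self.state == "done":
            return None
        return STATES[self.state][0]

    def receive(self, user_reply: str):
        """Store the reply under the current state and follow the transition."""
        if self.state == "done":
            return
        self.answers[self.state] = user_reply
        self.state = STATES[self.state][1]


manager = FsaDialogueManager()
for reply in ["An", "0123456789", "Fiber 50", "5 Mbps"]:
    print("BOT:", manager.next_question())
    print("USER:", reply)
    manager.receive(reply)
print(manager.answers)
```

Because each reply is stored under whichever state is current, a message that answers two questions at once would still be followed by the next scripted question.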

The advantages of the FSA model are that it is simple and that the chatbot determines in advance the kind of response it expects from the user. However, the FSA model is not really suitable for complex chatbot systems, or for cases where users give several pieces of information in the same message. In the example above, if a user provides both name and phone number at once and the chatbot still asks for the phone number, the user may feel annoyed.

A frame-based model (also called form-based) can solve the problem that the FSA model faces. It uses predefined frames to guide the conversation. Each frame contains the required slots and the corresponding questions that the dialogue manager asks the user. This model allows the user to fill in several slots of the frame in one message. Figure 4 is an example of a frame for the chatbot above.

Slot                  | Question
Full name             | What is your name?
Phone number          | What is your phone number?
Internet package name | What is the name of your Internet package?
Actual Internet speed | What is your current Internet speed?

Figure 4: Frame for chatbot to ask for information (in slow Internet connection example)

A dialogue manager using the frame-based model asks the customer questions and fills in the slots with the information the customer provides, until it has collected all the necessary information. When the user answers several questions in one message, the system has to fill in all the corresponding slots and remember not to ask questions that have already been answered.
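The slot-filling behavior can be sketched as follows. The slot names and questions follow the frame in Figure 4, while the regular-expression extractors are hypothetical stand-ins for the NLU component and would be far too brittle for real use.

```python
import re

# Slots and questions follow the frame in Figure 4; slot keys are hypothetical.
FRAME = {
    "full_name": "What is your name?",
    "phone_number": "What is your phone number?",
    "package_name": "What is the name of your Internet package?",
    "actual_speed": "What is your current Internet speed?",
}

# Hypothetical extractors standing in for the NLU component.
EXTRACTORS = {
    "full_name": re.compile(r"my name is (\w+)", re.IGNORECASE),
    "phone_number": re.compile(r"\b(\d{9,11})\b"),
    "package_name": re.compile(r"package (?:is )?(\w+)", re.IGNORECASE),
    "actual_speed": re.compile(r"(\d+(?:\.\d+)?\s*mbps)", re.IGNORECASE),
}


class FrameDialogueManager:
    def __init__(self):
        self.slots = {name: None for name in FRAME}

    def receive(self, message: str):
        """Fill every still-empty slot that the message provides a value for."""
        for name, pattern in EXTRACTORS.items():
            if self.slots[name] is None:
                match = pattern.search(message)
                if match:
                    self.slots[name] = match.group(1)

    def next_question(self):
        """Ask about the first unfilled slot; None means the frame is complete."""
        for name, question in FRAME.items():
            if self.slots[name] is None:
                return question
        return None


manager = FrameDialogueManager()
manager.receive("My name is Lan and my phone number is 0912345678")
print(manager.next_question())
manager.receive("My package is Fiber50, but I only get 5 Mbps")
print(manager.next_question())
print(manager.slots)
```

Since the next question is chosen from the unfilled slots rather than from a fixed state sequence, a message that supplies two values at once simply skips the corresponding questions.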

In complex application domains, a dialogue can involve many different frames. The problem for chatbot developers is how to know when to switch between frames. A commonly used approach to managing the change of control between frames is to define production rules. These rules are based on a number of elements, such as the most recent message or the latest question the user has asked.


Although the fields of NLP and machine learning have advanced considerably, there are still many challenges in chatbot development that researchers need to overcome. We list two of them below.

The first problem is coreference. In speaking and writing, we often use short forms to refer to objects we mentioned earlier. For example, in written or spoken English, people use pronouns such as “it”, “they”, and “he”. Without contextual information and a coreference analyzer, it is very difficult for a chatbot to know what or whom these words refer to. Failure to identify the correct object to which such a word refers may cause the chatbot to misunderstand the user. This challenge is especially apparent in long conversations.

The second problem is how to reduce the effort of annotating data when developing a chatbot. With the approaches above, the developer of a chatbot application needs to label training data for the intent classifier and the named entity recognizer. In complex application domains (such as medicine and health care), creating such datasets is quite expensive. It is therefore necessary to develop methods that use the data sources already available in an enterprise to reduce the amount of data that must be labeled, while still ensuring the accuracy of the natural language processing models.


In this article, we have introduced three basic NLP problems in developing chatbot systems that are used in a closed domain and follow the retrieval-based model. Within the scope of a single article, we cannot provide more detailed information about the models mentioned or about newer approaches in chatbot development (e.g., the sequence-to-sequence Generative Hierarchical Neural Network approach [9]). Interested readers can read more in the references.


  1. Stefan Kojouharov. “Ultimate Guide to Leveraging NLP & Machine Learning for your Chatbot”. On Chatbotlife.
  2. Pavlo Bashmakov. Advanced Natural Language Processing Tools for Bot Makers – LUIS,, and others (UPDATED).
  3. Michael Collins. Hidden Markov models and tagging (sequence labeling) problems.
  4. Michael Collins. Log-Linear Models, MEMMs, and CRFs.
  5. Taku Kudo. CRF++: Yet Another CRF toolkit.
  6. Naoaki Okazaki. CRFsuite: A fast implementation of Conditional Random Fields (CRFs).
  7. Mallet toolkit.
  8. Zhiheng Huang, Wei Xu, Kai Yu. 2015. Bidirectional LSTM-CRF Models for Sequence Tagging. On arXiv.
  9. Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models. On arXiv.
  10. Jurafsky, D., & Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Chapter 24. “Dialogue and Conversational Agents”.

Pham Quang Nhat Minh – FPT Technology Research Institute (FTRI)
