How to build an AI chatbot using NLTK and Deep Learning

In this article, we are going to build a simple but efficient AI Chatbot using Python, NLTK, TensorFlow, and Neural networks.

Chatbots are really helpful these days. Combined with Artificial Intelligence and Machine Learning, chatbots can interact with humans much the way humans interact with each other. Chatbots are useful in many situations, from customer support to personal assistants, so building your own chatbot for personal use or for business makes sense. The chatbot we build here is highly customizable, and you can change it however you want.

Concept:

The chatbot works based on a DNN (Deep Neural Network) that identifies the patterns in the sentences the user types and picks a random response related to that query. The NLTK library in Python has functions that help to figure out the most relevant words in a sentence or paragraph and reduce those words to their root form (for instance, the root, or stem, of the word 'going' is 'go'). This process is known as stemming. The words are then converted into corresponding numerical values, since neural networks only understand numbers. The process of converting text into numerical values is known as one-hot encoding. Once the data preprocessing is complete, we'll create a neural network using TFlearn and fit the training data to it. After successful training, the model is able to predict the tags related to the user's query.
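To make stemming concrete, here is a minimal sketch using NLTK's tokenizer and the Lancaster stemmer we rely on later (it assumes NLTK and the 'punkt' data, which are installed in the next section; the exact stems vary between stemming algorithms):

import nltk
from nltk.stem.lancaster import LancasterStemmer

stemmer = LancasterStemmer()
tokens = nltk.word_tokenize("Thanks for helping me")   # split the sentence into word tokens
print([stemmer.stem(t.lower()) for t in tokens])       # print each token reduced to its stem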

Installing Libraries using pip

First, you need to install NLTK, TFlearn, and Tensorflow.

$ pip install nltk tensorflow tflearn

After the installation, you also need to download the 'punkt' tokenizer model from the NLTK data, which word_tokenize depends on.

>>> import nltk

>>> nltk.download("punkt")

Then we need a file 'intents.json' which is the data used to train our Neural Network.

intents.json

 {"intents": [
        {"tag": "greeting",
         "patterns": ["Hi there", "How are you", "Is anyone there?","Hey","Hola", "Hello", "Good day"],
         "responses": ["Hello", "Good to see you again", "Hi there, how can I help?"],
         "context": [""]
        },
        {"tag": "goodbye",
         "patterns": ["Bye", "See you later", "Goodbye", "Nice chatting to you, bye", "Till next time"],
         "responses": ["See you!", "Have a nice day", "Bye! Come back again soon."],
         "context": [""]
        },
        {"tag": "thanks",
         "patterns": ["Thanks", "Thank you", "That's helpful", "Awesome, thanks", "Thanks for helping me"],
         "responses": ["My pleasure", "You're Welcome"],
         "context": [""]
        },
        {"tag": "query",
         "patterns": ["What is big bang?"],
         "responses": ["The Big Bang theory is the prevailing cosmological model explaining the existence of the observable universe from the earliest known periods through its subsequent large-scale evolution."],
         "context": [""]
        }
    ]
 }

Here each intent contains a tag, patterns, responses, and context. Patterns are the inputs the user is most likely to type, and responses are the replies the chatbot gives back. The data file above contains only a small amount of data, so to tailor this chatbot to your needs, add more tags, patterns, and responses covering the behaviour you want.

Importing modules

>>> import nltk
>>> from nltk.stem.lancaster import LancasterStemmer
>>> import numpy as np
>>> import tflearn
>>> import tensorflow as tf
>>> import json
>>> import pickle
>>> import random
>>>

Implementation

Now we can load the 'intents.json' file and start the process.

#Loading intents.json
with open('intents.json') as intents:
  data = json.load(intents)

stemmer = LancasterStemmer()

# Getting information from intents.json
words = []
labels = []
x_docs = []
y_docs = []

for intent in data['intents']:
  for pattern in intent['patterns']:
    wrds = nltk.word_tokenize(pattern)
    words.extend(wrds)
    x_docs.append(wrds)
    y_docs.append(intent['tag'])

  if intent['tag'] not in labels:
    labels.append(intent['tag'])

Here we loaded the 'intents.json' file and retrieved some data. Now it's time to start the data preprocessing.

# Stemming the words and removing duplicate elements.
words = [stemmer.stem(w.lower()) for w in words if w != "?"]
words = sorted(list(set(words)))
labels = sorted(labels)

We stemmed the words and removed duplicates from the list. Here the Lancaster stemming algorithm is used to reduce each word to its stem.

One-Hot Encoding and preparing training data-

training = []
output = []
out_empty = [0 for _ in range(len(labels))]

# One hot encoding, Converting the words to numerals
for x, doc in enumerate(x_docs):
    bag = []
    wrds = [stemmer.stem(w) for w in doc]
    for w in words:
        if w in wrds:
            bag.append(1)
        else:
            bag.append(0)


    output_row = out_empty[:]
    output_row[labels.index(y_docs[x])] = 1

    training.append(bag)
    output.append(output_row)


training = np.array(training)
output = np.array(output)

The resulting training and output data are one-hot encoded: each sentence becomes a list of ones and zeros with one position per known word, which is appended to the training list, and the matching tag becomes a one-hot row appended to the output list. Both lists are then converted to NumPy arrays.
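For intuition, here is a toy illustration of that encoding, standalone and with a made-up vocabulary (it is not part of the chatbot code):

# Toy example of the bag-of-words / one-hot idea used above
vocab = ['good', 'hello', 'thank', 'you']         # the known (stemmed) words
sentence = ['thank', 'you']                       # a tokenized, stemmed input sentence
bag = [1 if w in sentence else 0 for w in vocab]  # 1 where the word occurs, 0 elsewhere
print(bag)  # [0, 0, 1, 1]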

Training the Neural Network

First, we need to create our model using Neural Networks.

net = tflearn.input_data(shape=[None, len(training[0])])
net = tflearn.fully_connected(net, 10)
net = tflearn.fully_connected(net, 10)
net = tflearn.fully_connected(net, 10)
net = tflearn.fully_connected(net, len(output[0]), activation='softmax')
net = tflearn.regression(net)

model = tflearn.DNN(net)
model.fit(training, output, n_epoch=500, batch_size=8, show_metric=True)
model.save('model.tflearn')

The first layer is the input layer, whose size matches the length of one training example. The middle three are hidden layers responsible for processing the input. The output layer gives the probabilities of the different tags present in the training data.

The training data is then fitted to the model with the number of epochs set to 500, so training runs for 500 passes over the data. Lastly, we save the model in TFlearn's format.
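TFlearn targets older TensorFlow releases and may not install cleanly on recent setups. If that is the case for you, a roughly equivalent network can be sketched with tf.keras; note this is an alternative, not the code used in this article, and it uses ReLU hidden layers as a common choice:

import tensorflow as tf

# Same overall architecture as the TFlearn model above, expressed with tf.keras
keras_model = tf.keras.Sequential([
    tf.keras.Input(shape=(len(training[0]),)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(len(output[0]), activation='softmax'),
])
keras_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
keras_model.fit(training, output, epochs=500, batch_size=8)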

Making predictions

Remember, we trained the model on bags of words, so to make predictions we need to convert the user's input into the same representation. Let's create a function that builds a bag of words for the model to predict on.

def bag_of_words(s, words):
    bag = [0 for _ in range(len(words))]
    s_words = nltk.word_tokenize(s)
    s_words = [stemmer.stem(word.lower()) for word in s_words]

    for s_word in s_words:
        for i, w in enumerate(words):
            if w == s_word:
                bag[i] = 1

    return np.array(bag)
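As a quick sanity check, you can call the function on a sample sentence (the exact vector depends on your vocabulary):

print(bag_of_words("Hello, is anyone there?", words))
# prints a 0/1 vector with a 1 at the position of every known word found in the sentence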

This function creates a bag of words for our model. Now let's write a chat function that ties all of this together.

def chat():

    while True:
        inp = input("\n\nYou: ")
        if inp.lower() == 'quit':
            break

        # Probability of correct response
        results = model.predict([bag_of_words(inp, words)])

        # Picking the greatest number from probability
        results_index = np.argmax(results)

        tag = labels[results_index]

        for tg in data['intents']:

            if tg['tag'] == tag:
                responses = tg['responses']
                print("Bot:\t" + random.choice(responses))

In this chat function, there are only a few things to notice. First, the model predicts on the bag of words built from the user's input and returns a list of probabilities, one per tag. The highest probability marks the tag the input most likely belongs to, so we take the index of the highest probability, look up the tag at that index, and pick a random response from that tag's list of responses.
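One optional refinement, not part of the original code, is to fall back to a default reply when even the highest probability is low. A minimal sketch of the idea (the respond() helper and the 0.7 threshold are illustrative choices):

def respond(inp, threshold=0.7):
    # Predict a probability for every tag and take the most likely one
    results = model.predict([bag_of_words(inp, words)])[0]
    results_index = np.argmax(results)
    if results[results_index] < threshold:
        return "Sorry, I didn't understand that. Could you rephrase?"
    tag = labels[results_index]
    for tg in data['intents']:
        if tg['tag'] == tag:
            return random.choice(tg['responses'])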

Output:
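The original output screenshot is not reproduced here; a typical exchange, using the responses defined in intents.json above, looks something like this:

You: Hi there
Bot:    Hi there, how can I help?

You: What is big bang?
Bot:    The Big Bang theory is the prevailing cosmological model explaining the existence of the observable universe from the earliest known periods through its subsequent large-scale evolution.

You: quit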


Here the chatbot can identify the pattern of the user's input and respond accordingly. You can add more intents, with more tags, patterns, and responses, to make the bot more useful.

The final version of the bot

Great! We have created our own AI chatbot. Now let's make a few changes to the code, because as it stands the model is retrained from scratch every time the script runs. We don't want that, so with some try/except blocks we can save the trained model and the preprocessed training data (the latter in a pickle file) and load them on later runs instead of training again. This avoids retraining our neural network every time.

import nltk
from nltk.stem.lancaster import LancasterStemmer
import numpy as np
import tflearn
import tensorflow as tf
import json
import pickle
import random

#Loading intents.json
with open('intents.json') as intents:
  data = json.load(intents)

stemmer = LancasterStemmer()

try:
    with open('data.pickle','rb') as f:
        words, labels, training, output = pickle.load(f)
except:
# Fetching and preparing information from intents.json
    words = []
    labels = []
    x_docs = []
    y_docs = []

    for intent in data['intents']:
        for pattern in intent['patterns']:
            wrds = nltk.word_tokenize(pattern)
            words.extend(wrds)
            x_docs.append(wrds)
            y_docs.append(intent['tag'])

            if intent['tag'] not in labels:
                labels.append(intent['tag'])

    words = [stemmer.stem(w.lower()) for w in words if w != "?"]
    words = sorted(list(set(words)))
    labels = sorted(labels)

    training = []
    output = []

    out_empty = [0 for _ in range(len(labels))]

    # One hot encoding, Converting the words to numerals
    for x, doc in enumerate(x_docs):
        bag = []
        wrds = [stemmer.stem(w) for w in doc]
        for w in words:
            if w in wrds:
                bag.append(1)
            else:
                bag.append(0)


        output_row = out_empty[:]
        output_row[labels.index(y_docs[x])] = 1

        training.append(bag)
        output.append(output_row)


    training = np.array(training)
    output = np.array(output)

    with open('data.pickle','wb') as f:
        pickle.dump((words, labels, training, output), f)


net = tflearn.input_data(shape=[None, len(training[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(output[0]), activation='softmax')
net = tflearn.regression(net)

model = tflearn.DNN(net)

try:
    model.load("model.tflearn")
except:

    model.fit(training, output, n_epoch=100, batch_size=8, show_metric=True)
    model.save('model.tflearn')


def bag_of_words(s, words):
    bag = [0 for _ in range(len(words))]
    s_words = nltk.word_tokenize(s)
    s_words = [stemmer.stem(word.lower()) for word in s_words]

    for se in s_words:
        for i, w in enumerate(words):
            if w == se:
                bag[i] = 1

    return np.array(bag)


def chat():
    print("The bot is ready to talk!! (Type 'quit' to exit)")
    while True:
        inp = input("\nYou: ")
        if inp.lower() == 'quit':
            break

        # Probability of correct response
        results = model.predict([bag_of_words(inp, words)])

        # Picking the greatest number from probability
        results_index = np.argmax(results)

        tag = labels[results_index]

        for tg in data['intents']:

            if tg['tag'] == tag:
                responses = tg['responses']
                print("Bot:" + random.choice(responses))


chat()