To bring my therapist bot to another level, I decided to learn more about how to make a bot more intelligent. For that we have to take a look at artificial neural networks, deep learning, natural language processing and so on. Clearly, you don’t need to understand all of that to use such technology, but I don’t like doing something without understanding how it works.

Our first stop is artificial neural networks. I find this book pretty good for learning about the topic:

Basically, an artificial neural network consists of a large number of interconnected artificial neurones. Two important types of them are perceptrons and sigmoid neurones.

The first type is relatively simple. Perceptrons were developed in the 1950s and 1960s and are not commonly used nowadays, but knowing how they work is essential for understanding the modern technology.


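A perceptron has several binary inputs and one binary output. The output depends on the input values, the weight of each input and the bias of the neurone itself: it is 1 if the weighted sum of the inputs plus the bias is positive, and 0 otherwise.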
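To make the rule concrete, here is a minimal sketch of a single perceptron in Python. The function name `perceptron` and the example weights and bias are my own illustrative choices, not something taken from the book.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum + bias > 0 else 0


# Hypothetical example: three binary inputs, the first one weighted most heavily.
weights = [6, 2, 2]
bias = -5
fires = perceptron([1, 0, 1], weights, bias)       # weighted sum 8, plus bias gives 3
stays_off = perceptron([0, 1, 1], weights, bias)   # weighted sum 4, plus bias gives -1
```

A larger weight means the input matters more for the decision, and the bias controls how easy it is to get the neurone to output 1.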


Using perceptrons you can build a network that computes any logical function. This is easy to show: a perceptron can play the role of a NAND gate, and NAND gates are universal for computation.
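The NAND claim can be checked with a few lines of Python. This is a sketch under my own naming (`perceptron`, `nand`, `xor`); the weights of -2 and the bias of 3 are one choice that works, and others are possible.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum + bias > 0 else 0


def nand(a, b):
    """A perceptron with weights -2, -2 and bias 3 behaves like a NAND gate."""
    return perceptron([a, b], [-2, -2], 3)


def xor(a, b):
    """XOR built only from NAND perceptrons, to show that they can be combined."""
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))


# Truth tables over all four input combinations.
nand_table = {(a, b): nand(a, b) for a in (0, 1) for b in (0, 1)}
xor_table = {(a, b): xor(a, b) for a in (0, 1) for b in (0, 1)}
```

Since any logic circuit can be assembled from NAND gates, the same trick extends from XOR to any other logical function.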

Sigmoid neurone

It may seem that perceptrons are merely a new type of NAND gate, which is hardly a breakthrough. But we can devise learning algorithms that automatically tune the weights and biases of a network of artificial neurones, so the network can be taught by external stimuli without a programmer’s direct intervention. To make that possible, a small change in a weight (or bias) should cause only a small corresponding change in the output.

This is where we introduce the sigmoid neurone. It is basically the same as the perceptron, but its output is not just 0 or 1; it is defined by the sigmoid function (also known as the logistic function):

𝜎(z) = 1 / (1 + e^(−z)), where z = w⋅x+b

That means that if z = w⋅x+b is large and positive, then 𝜎(z) ≈ 1, and if z is large and negative, then 𝜎(z) ≈ 0, the same as in a perceptron. Only when w⋅x+b is of modest size is there much deviation from the perceptron model.
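Here is a small Python sketch comparing the two neurone types when one weight is nudged slightly. All names and numbers are illustrative assumptions of mine; the sigmoid is the standard logistic function 1 / (1 + e^(−z)), written in a form that avoids overflow for very negative z.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_input(inputs, weights, bias):
    """Compute z = w.x + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def perceptron(inputs, weights, bias):
    return 1 if weighted_input(inputs, weights, bias) > 0 else 0


def sigmoid_neurone(inputs, weights, bias):
    return sigmoid(weighted_input(inputs, weights, bias))


# Hypothetical setup: nudge a single weight from -0.06 to -0.04.
inputs = [1.0, 1.0]
bias = 0.05
before, after = [-0.06, 0.0], [-0.04, 0.0]

perceptron_jump = (perceptron(inputs, before, bias), perceptron(inputs, after, bias))
sigmoid_shift = sigmoid_neurone(inputs, after, bias) - sigmoid_neurone(inputs, before, bias)
```

With these numbers the perceptron flips from 0 to 1, while the sigmoid output moves by less than 0.01. That smoothness is what lets a learning algorithm adjust weights gradually.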


Basically, most neural nets nowadays have a feedforward structure: the neurones are arranged in layers, and the output of each layer is used as the input to the next one, with no loops.
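A forward pass through such a network can be sketched in plain Python as follows. The `feedforward` function, the layer representation and the weights of the small 2-3-1 network are all hypothetical choices of mine, picked only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def feedforward(layers, inputs):
    """Propagate inputs through the layers, one after another.

    Each layer is a (weights, biases) pair, where weights holds one row of
    input weights per neurone in that layer.
    """
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Hypothetical network: 2 inputs, 3 hidden sigmoid neurones, 1 output neurone.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]
output = feedforward(layers, [1.0, 0.0])
```

Information only flows forward here: each layer sees the activations of the previous layer and nothing else.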

There are also recurrent neural networks, in which feedback loops are possible, but I am not looking at them for now: they have been less widely used, in part because their learning algorithms are (at least for now) less powerful.

For further information I suggest you read the article, because the topic is explained very well there.

The important thing is that many of the trendy words, like deep learning, are largely new names for the old, well-known artificial neural networks at a different level of complexity, and machine learning is the broader field they belong to.