Deep learning is currently the hottest machine learning technology, both in academia and in industry. Accordingly, leading software and internet companies such as Google, Microsoft, and Facebook have all built up large-scale deep learning capabilities.

I was lucky enough to witness the deep learning revolution surge at Google. This is no small matter: the tech world is often slow to absorb advances from the research world, since they tend to be complex and in need of further testing. Academia, meanwhile, is abuzz with deep learning. In fact, established machine learning conferences can no longer handle the tremendous stream of deep learning papers, which has given birth to new, specialized conferences such as ICLR, devoted solely to deep learning.

So what is deep learning, and how is it turning our world upside down? In this article, I will first provide an answer from the academic perspective, since, in the end, machine learning is about teaching and learning.

History in a nutshell

Deep learning is often closely tied to neural networks, which are by no means new. In fact, they were among the first machine learning models, first appearing in the 1950s, when machine learning itself was still taking shape. There was little data and little computing power back then, so these models could not really compete with simpler ones like logistic regression or support vector machines. Yann LeCun, now Facebook's Director of AI Research, struggled for quite a while as his deep learning papers were repeatedly rejected from major conferences. The situation got so bad that he once sent a letter to the organizers of one of the largest conferences in computer vision, stating that he would never submit another paper unless they let go of their prejudice against deep learning.

To grow, deep learning needs the following three elements:

  1. An abundance of data.
  2. Faster computing hardware.
  3. New, more effective techniques for training neural networks.

With time, we achieved the first two. It was only in 2006, when Geoff Hinton's paper in Science began to attract attention in the scientific community, and later in 2012, when AlexNet, a convolutional neural network, won the ImageNet challenge spectacularly, leaving traditional models far behind in accuracy, that people recognized the potential of neural networks and began to improve techniques for training them. Once all three elements were in place, the age of deep learning arrived.

Deep learning: From an academic perspective

So why did the world abandon older models like logistic regression and support vector machines for deep learning? For questions of the form "why is A better than B", the usual answers run: A is good at this, while B is bad at that…

I myself, however, see no need to separate deep learning from the older models: deep learning inherits from them in a natural, almost inevitable way.

To make this easy to understand, let's think of machine learning as an education system. In the beginning, the system had only a single class, and upon graduating from it, people went straight to work. Policymakers at the time cared only about which subjects to add or remove, so any development was horizontal. I will call this type of model shallow learning.

Slowly, people began to realize that learning everything at once like this is problematic. First, the subjects differ in difficulty; sometimes one subject must be learned first to provide the basics for another. Second, not all subjects are useful. Third, some subjects are useless on their own but valuable in combination with others. For example, we cannot get rich on mathematical knowledge alone, yet we still cannot drop mathematics, for it underpins many practical subjects. Finally, if all we do is add and remove subjects, finding the optimal curriculum will take forever (with N subjects there are 2^N possible combinations), leaving generations of students to be experimented on and wasting tremendous resources and money.
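The combinatorial explosion above is easy to see in a few lines of code. This is a minimal sketch (the subject names are made up purely for illustration) that enumerates every possible subset of N subjects:

```python
from itertools import combinations

# Hypothetical list of "subjects" (or features); names are illustrative only.
subjects = ["math", "physics", "chemistry", "literature"]

# Enumerate every possible subset of subjects, from the empty set
# up to the full set.
all_subsets = []
for k in range(len(subjects) + 1):
    all_subsets.extend(combinations(subjects, k))

# With N subjects there are 2^N subsets to try exhaustively.
print(len(all_subsets))  # 2**4 = 16
```

Even at N = 20 there are over a million combinations, which is why testing curricula (or feature sets) by brute force is hopeless.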

Then a thought occurred: why not develop in depth instead of in breadth? Instead of cramming everything into a single grade, why not spread it across 12 grades, each with fewer subjects, with the content growing more advanced grade after grade? This leveled system brings several benefits. Above all, it helps us retain knowledge and lets subjects support one another tightly: we begin chemistry in secondary school already equipped with the mathematical basics from elementary school. This model is the very foundation of deep learning. In fact, the "deep" in deep learning refers precisely to this layering of information from basic to advanced.

Later, an even more sophisticated step was taken. Instead of following a fixed curriculum, students are now allowed to choose their own subjects in each grade. Each student is given the following mandate:

  1. You have to achieve the following goals;
  2. You have to study per the leveled system that we have issued;
  3. In each grade, you need to study a pre-determined number of subjects;
  4. Which subjects to study is entirely up to you.

Since everyone is different and each goal calls for a different strategy, this new model harnesses every student's creativity and flexibility, as long as they stay focused on achieving their goals. This is much like the way learning works at university, and it is, of course, how deep learning works.

We can see that from Confucian schooling to K-12 education to the university credit system, each step was a necessary educational revolution. The same holds for the path from shallow learning to deep learning.

In later parts, we will return to the usual mathematical perspective; I will explain the evolution from shallow learning to deep learning and spell out how each piece of the analogy maps onto a deep learning model. For example, "you have to achieve the following goals" corresponds to choosing a loss function, "you have to study per the leveled system that we have issued" corresponds to choosing the number of layers in a neural network, and "in each grade, you need to study a pre-determined number of subjects" corresponds to choosing the number of neurons in each layer…
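As a rough preview, this mapping can be sketched in code. The following is a minimal NumPy sketch, not a trained model: the layer sizes and the mean-squared-error loss are illustrative assumptions, chosen only to show where the "goals", the "leveled system", and the "subjects per grade" appear in a network definition:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Study per the leveled system"            -> number of layers
# "Pre-determined number of subjects/grade" -> neurons per layer
layer_sizes = [4, 8, 8, 1]   # illustrative: input, two hidden layers, output

# One (weights, bias) pair per layer, randomly initialized.
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Pass the input through the layered ('deep') system."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:      # hidden layers apply a nonlinearity
            x = np.maximum(x, 0.0)   # ReLU
    return x

# "You have to achieve the following goals" -> the loss function;
# mean squared error is just one illustrative choice.
def loss(pred, target):
    return float(np.mean((pred - target) ** 2))

x = rng.normal(size=(5, 4))   # 5 toy examples with 4 features each
y = np.zeros((5, 1))          # toy targets
print(loss(forward(x), y))    # the quantity training would minimize
```

Training would then adjust `params` to drive this loss down, which is the subject of the later parts.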


Ha Xuan Tung – VnCoder
