Today I will discuss the challenges I faced when tackling deep learning, along with some of my personal experiences in studying machine learning in general. It may seem like I am beating around the bush without focusing on the specifics, but not everyone is prepared to work in deep learning. So, I am going to brief you on the following conditions, so that you can get yourself ready beforehand.

1. Changing your perspective

It was not until last summer that I actually approached deep learning. I was kind of rigid back then, thinking that whenever I learn something, I need to do it properly. But then I realized that I can learn deep learning as I work, and also outside of work. In fact, most people use deep learning models not for deep learning research but for research in other fields. And few people actually understand deep learning, in contrast to the overwhelming number of people training neural networks every day. I was also reluctant to start in such a hot field, as I was afraid the hype around it would dwindle as time passed.

Last year, however, I had the honor to “chat” with Mr. Dan Jurafsky for half an hour. Practically everyone who works in NLP knows about this extremely esteemed Stanford professor and, of course, the NLP textbook that he co-wrote with James H. Martin. Professors as popular as he is often travel across schools to give presentations and meet Ph.D. students. At the end of the chat, I asked him what topic I should work on, and he quickly exclaimed: “Deep learning! If you don’t research it, at least write a paper that uses deep learning.” When I asked why, he replied: “Deep learning is quite the achievement in this day and age. If you don’t know about it, it feels like you never lived in this era.”

So, I started to learn deep learning. While it sounds like circumstances pushed me to the decision, it was actually my own feeling that deep learning is the future.

2. Preparation

I know 2 ways to actually do deep learning: using GPUs or using distributed systems. GPUs are more suitable for individual research or small teams, because their small size allows for easy setup, while distributed systems are often used in large firms. A distributed system at Google can easily call on hundreds of computers to run simultaneously, while university labs doing deep learning research are often equipped with GPUs.

My lab at the time had no GPU, so I decided to buy one for myself. It cost me around 1,700 USD to buy a PC set with a high-quality GPU, just for deep learning. One GTX 1080 8GB alone costs 700 USD (while the Titan X is even higher at 1,000 USD). Please know that my family is in no way rich. This was really quite the investment, and I don’t recommend that you follow my example (no other Ph.D. student that I know has bought a GPU for themselves). This is just so you can get a general idea of how much a full computer set for deep learning would cost you.

To tackle deep learning, you also need to be prepared mentally. Training deep learning models takes a lot of time, so you need to be extremely patient and work scientifically. As I recall, 2 years ago, training a deep learning model for translation took around a whole week, so I have mad respect for those who worked in deep learning at the time. We Ph.D. students even joked that what matters for us is coming up with something to do while we watch the model train.

So, you will need the following 4 factors to work in deep learning: good academics, investment, patience, and organization. Good academics get you into a neat lab or a well-equipped company. Investment means money to buy a GPU. If you cannot afford one, you can work on a CPU, but only with extreme patience. A CPU may work for studying purposes, but a GPU will be needed in the long term. Even with a GPU, patience is still very much needed: you cannot just press a button and expect a good model, and you need to repeat experiments over and over to refine whatever results you get. Therefore, it is necessary to have nerves of steel and the ability to multitask.

One small (but important) detail is that you should get used to using Ubuntu and learn Python. Leave Windows behind!

3. Set your goal

To save time, you first need to determine your purpose for using deep learning. Here I will discuss 2 popular use cases.


The first goal is to use deep learning to get results for other research or for products. Most people with this goal work with libraries, packages, and tools, treat them like black boxes, and program them via readily available APIs. However, don’t think of them only as black boxes, but rather think about what you can do with them.

First, you can tune the hyperparameters (learning rate, the number of hidden layers, dropout…) and pre-process the input. You may find these simple, but they are what determine whether your model works well. Therefore, it is necessary to have a sound theoretical foundation so you can alter hyperparameters according to model behavior. For example, you need to know whether the model you are using tends to overfit, whether it is actually overfitting, and what to do when it overfits.
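As a rough illustration of "knowing whether the model actually overfits", here is a minimal sketch in plain Python. It assumes you have recorded one training loss and one validation loss per epoch. The function names, the `window` and `patience` parameters, and the loss values are all hypothetical and made up for this example; they do not come from any particular library.

```python
def looks_overfit(train_losses, val_losses, window=3):
    """Heuristic: over the last `window` epochs, training loss fell
    while validation loss rose."""
    if len(train_losses) <= window or len(val_losses) <= window:
        return False
    train_falling = train_losses[-1] < train_losses[-1 - window]
    val_rising = val_losses[-1] > val_losses[-1 - window]
    return train_falling and val_rising


def should_stop_early(val_losses, patience=3, min_delta=0.0):
    """Return True when the best validation loss of the last `patience`
    epochs is no better than the best loss seen before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta


# Illustrative numbers only: training loss keeps dropping,
# validation loss bottoms out at epoch 4 and then climbs.
train = [1.00, 0.80, 0.65, 0.52, 0.41, 0.33, 0.26]
val = [1.05, 0.90, 0.82, 0.80, 0.83, 0.87, 0.92]

print(looks_overfit(train, val))
print(should_stop_early(val))
```

When both checks fire, the usual reactions are the hyperparameters mentioned above: stop training earlier, increase dropout, or shrink the model. Which one helps depends on the model and the data.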

Second, you have the right to choose between models. This requires basic knowledge about each model so you can choose those most suitable for the data and problems you are working on. You need to know which model is good for what.

Overall, your knowledge is mostly for tweaking rather than building. So, if your purpose is application, you should focus on simple principles that are widely applicable, instead of mathematics. However, don’t put too much faith in theory alone; get some hands-on practice too.


The second goal is deep learning research itself, where your aim is often to improve one kind of model. So, you obviously need to know what your model is doing, and know it in depth. To do this, you will need both theory and practice, and I am going to highlight practice because most people are already aware that they should read books and papers and know the model equations by heart. However, it is also necessary to work with the actual code to understand your model accurately. Papers are often limited to the perspective of their writers, and their experiments are often limited to specific datasets with promising figures. Sometimes, people only realize that a model is not suitable for the case they are solving when they test it. So, I highly recommend that you try coding your deep learning models yourself at least once, just so you can get a feel for them. I have learned a lot through this, but of course, this is only if you have time to spare. Don’t spend all of your time debugging.
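To give a feel for what "coding a model yourself" involves, here is a minimal sketch that trains the simplest possible model, a single sigmoid neuron (logistic regression), with hand-written gradient descent in plain Python. It is not a deep model; the toy AND dataset, the learning rate, and the number of epochs are assumptions picked only to keep the example short.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(data, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the mean cross-entropy loss
    of a single sigmoid neuron with two inputs."""
    w1, w2, b = 0.0, 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in data:
            p = sigmoid(w1 * x1 + w2 * x2 + b)
            err = p - y  # d(loss)/d(pre-activation) for cross-entropy
            g1 += err * x1
            g2 += err * x2
            gb += err
        # Average the gradients over the batch, then take one step.
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


# Toy dataset (an assumption for illustration): the logical AND function.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w1, w2, b = train_neuron(AND_DATA)
preds = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for (x1, x2), _ in AND_DATA]
print(preds)
```

Even at this size you have to make the decisions that papers tend to skip over: how to initialize the weights, whether to average or sum gradients, and how many steps are enough.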

You can see that I hold practice in high regard for both of the above goals. The basics of deep learning theory are not too difficult to grasp, but getting your models to work the way you want them to requires experience. And you can only get experience by practicing.

4. How to read papers

It is of utmost importance that you know how to read papers effectively, especially if you have research purposes. And my ultimate goal in writing these tutorials is not to teach all there is to know about machine learning, but mostly to enable you to read papers in the long term.

Papers are often confusing, and there are way too many of them. So, how do you read them effectively?

It is actually quite similar to reading books, doing exercises, and reviewing for exams. In fact, those who once studied on a national team will probably feel like they are back in the past. There were too many exercises back then, and you would never know which to do, which is not that different from the present. I once discussed this issue, and one of the principles I mentioned was “doing 1 exercise that is as effective as 10”, which is about generalizing knowledge. Reading papers is the same: you just need to get the gist. The idea is the soul of the paper, like an exercise whose lesson you can apply yourself, and the more general and applicable the idea, the more valuable the paper. So, when reading, look for the idea, and avoid details unless necessary.

If the only thing that you remember after reading is not the big picture but detailed mathematical equations, then your reading has failed. If you are wondering whether it is necessary to read the derivative calculations to check for errors when reading deep learning papers, then the answer is no. Most authors only add those for decoration or to illustrate some behaviors of their models. If it is the latter, then you may read the explanation first, then look at the equation to check it. Current tools allow these calculations to be done automatically, so if an equation is wrong, that is not your concern. People pay attention to the idea, the explanation, and the result when they assess a paper, not some random calculation errors.
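To show why hand-derived derivatives matter less than they used to, here is a toy sketch of automatic differentiation using dual numbers (forward mode) in plain Python. This is an illustration of the general idea only: it is not how any specific deep learning library is implemented (they typically use reverse mode), and the `Dual` class, the `derivative` helper, and the function `f` are all hypothetical names made up for this example.

```python
import math


class Dual:
    """A number value + deriv * eps, where eps * eps == 0.
    The `deriv` part carries the derivative through each operation."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def tanh(x):
    t = math.tanh(x.value)
    # Chain rule: d/dx tanh(u) = (1 - tanh(u)^2) * u'
    return Dual(t, (1.0 - t * t) * x.deriv)


def derivative(f, x):
    """Evaluate f'(x) by seeding the input with derivative 1."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return tanh(3.0 * x * x + 2.0 * x)


x0 = 0.1
auto = derivative(f, x0)
# The same derivative worked out by hand, for comparison.
by_hand = (1.0 - math.tanh(3.0 * x0 ** 2 + 2.0 * x0) ** 2) * (6.0 * x0 + 2.0)
print(math.isclose(auto, by_hand, rel_tol=1e-9))
```

The point is that the derivative falls out mechanically from the definition of `f`, so a slip in a paper's printed derivation does not necessarily mean the experiments were wrong.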

The main idea of a paper is often stated in the abstract and introduction. When starting a paper, you should immediately check the abstract to see which problem is being tackled and with what result. Below is an example from the scheduled sampling paper (Bengio et al., 2015).

Skimming through the abstract, we can see that this paper is about how to train recurrent neural networks. The problem that the paper tackles is related to the “discrepancy between training and inference”, and the result is that the approach was used in a winning entry to an image captioning challenge. So, if you are interested in recurrent neural networks or image captioning, then you should move on to the introduction.

If you find the terminology used in the paper confusing, then you may be lacking background knowledge. In this case, you may look to the citations listed in the introduction (names or bracketed numbers). In the example we are using, citations are listed using square brackets, but one can also write something like “Our work is motivated by previous works on textual entailment (Le et al., 2015)”. It is necessary to cite other papers as evidence for claims, instead of making assumptions or basing them on experience. For example, the paper above states that “recurrent neural networks… hard to train…”, and cites a paper that demonstrates this phenomenon. Similarly, every definition needs to be cited with the original paper that first introduced it, or one with widespread influence (like the original paper about LSTM).

So, in the example above, if you don’t understand what “curriculum learning” is, you can look up paper no. 7 in “references” to get more background knowledge about this term.

Sometimes you still cannot understand a concept even after reading the cited paper, or the cited paper is an old one. For example, paper no. 2 is the original paper about LSTM, but since it was written in 1997, the language used in the field has changed since then. In these cases, you will have to Google. In fact, for popular concepts like LSTM, it is much easier to find and read tutorials than to read papers.

Another way to improve your background knowledge is to go to the “Related work” or “Previous work” section, which really means “Related ideas”. These sections are extremely useful for readers and showcase the knowledge-sharing culture among academics. They list the main ideas of the paper you are reading and of other relevant papers, and compare and contrast them. This is the “related work” section of our scheduled sampling paper.

It can be seen that each paragraph represents a major point of the paper. We can easily recognize big ideas like “discrepancy between the training and inference distributions” or “online algorithm… that adapts the targets”. Following this flow of ideas will help you picture the larger problem that the paper is tackling, and come up with your own solutions.

Above are my experiences in preparing to study deep learning. I do hope that they may be useful to you.

Ha Xuan Tung – VnCoder
