Free and open-source software has never seen such a boom. For readers interested in machine learning and data science (mostly with R or Python), this article introduces 10 books for self-study and development in these fields.


Data science is a mix of mathematics, statistics, computer science, and information technology, together with specialized domain and business knowledge (such as economics or marketing). To become a data scientist, you will therefore need knowledge and experience in these fields.

Data Structures and Algorithms

1. Automate the Boring Stuff with Python (Al Sweigart)

This book is simple and easy to understand, which makes it suitable for beginners in Python programming.

In the book, you will learn the basics, such as functions and methods, variables, and flow control, as well as how to work with file formats like CSV, Excel, PDF, and JSON.
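As a rough illustration of the kind of file handling described above, here is a minimal sketch that parses CSV text and re-emits it as JSON using only Python's standard library. It is not taken from the book; the sample data, the column names, and the `csv_to_records` helper are made up for this example.

```python
import csv
import io
import json

# Hypothetical sample data; in practice this would usually be read from a file.
CSV_TEXT = "name,score\nAda,91\nLinus,78\n"


def csv_to_records(text):
    """Parse CSV text into a list of dicts, converting the score column to int."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": row["name"], "score": int(row["score"])} for row in reader]


records = csv_to_records(CSV_TEXT)

# Serialize the parsed rows as pretty-printed JSON.
json_text = json.dumps(records, indent=2)
print(json_text)
```

The same `csv.DictReader` call works on an open file object, so swapping the in-memory string for `open("some_file.csv", newline="")` should be a small change.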

2. Problem Solving with Algorithms and Data Structures using Python (Brad Miller and David Ranum)

This book focuses more on algorithms than on examples, covering topics such as recursion, sorting and searching, trees and tree algorithms, and graphs and graph algorithms.

These topics are by no means unimportant, as data science itself is a process of acquiring knowledge and solving problems, e.g. optimization problems or finding relationships between variables for prediction. Studying algorithms also helps improve reasoning and logical thinking.
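To give a flavor of two of the topics listed above, recursion and searching, here is a small sketch of a recursive binary search over a sorted list. It is an illustrative example rather than code from the book, and the function name and signature are assumptions made here.

```python
def binary_search(items, target, lo=0, hi=None):
    """Return the index of target in the sorted list items, or -1 if absent."""
    if hi is None:
        hi = len(items) - 1
    if lo > hi:
        # Empty search range: the target is not in the list.
        return -1
    mid = (lo + hi) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        # Target can only be in the right half.
        return binary_search(items, target, mid + 1, hi)
    # Target can only be in the left half.
    return binary_search(items, target, lo, mid - 1)


sorted_values = [1, 3, 5, 7, 9, 11]
print(binary_search(sorted_values, 7))
print(binary_search(sorted_values, 4))
```

Each call halves the search range, which is why binary search needs only about log2(n) comparisons on a list of n items.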


3. An Introduction to Statistical Learning with Applications in R (James et al.)

While not wholly devoted to statistics, this book introduces what we call ‘statistical learning methods’, a set of methods for modeling and studying complex datasets.

The book focuses on methods of direct relevance to machine learning, such as regression, classification, decision trees, and support vector machines, with some sections on unsupervised learning such as clustering, all illustrated with applications in R.

4. Think Stats (O’Reilly* – Downey)

Think Stats focuses on probability and statistics using Python.

Basic probability and statistics concepts such as descriptive statistics, cumulative distribution functions, probability, hypothesis testing, and correlation are explained and applied directly to datasets. Every chapter ends with review questions and a glossary of specialized terms.
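As a small illustration of two of the concepts named above, descriptive statistics and cumulative distribution functions, the sketch below computes a mean, a median, and an empirical CDF with the standard library. The dataset and the `empirical_cdf` helper are invented for this example and do not come from the book.

```python
import statistics


def empirical_cdf(sample, x):
    """Return the fraction of values in sample that are less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)


# Hypothetical sample values, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)
cdf_at_4 = empirical_cdf(data, 4)

print(mean, median, cdf_at_4)
```

The empirical CDF answers questions such as "what share of observations is at most 4?", which is often easier to read than a histogram when comparing two samples.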

*O’Reilly is a famous IT publisher whose book covers always include an animal illustration in black and white.

Machine Learning and Data Science

5. Machine Learning for Dummies (IBM Limited Edition – Hurwitz and Kirsch)

This so-called “for dummies” book is actually for those who have already learned about machine learning or data science. Its target readers are managers and project managers who want to use ML in business and build their own team of data scientists.

The book is written simply, with no mathematical explanations, using plain language and icons to mark important sections worth remembering.

6. Understanding Machine Learning (Shai Shalev-Shwartz and Shai Ben-David)

This book introduces ML and its relevant algorithms, moving from the mathematical and theoretical foundations to real-life applications of that theory.

Topics covered include the computational complexity of learning and important algorithms and models such as stochastic gradient descent, neural networks, and structured output learning. Readers should have basic knowledge of both statistics and linear algebra.
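To make one of the algorithms mentioned above more concrete, here is a minimal sketch of stochastic gradient descent fitting a straight line to noiseless synthetic data. It is not the book's presentation: the squared-error loss, the learning rate, the number of epochs, and the name `sgd_linear_fit` are all assumptions chosen for this example.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b by SGD on the squared error, one sample per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Visit the samples in a fresh random order every epoch.
        rng.shuffle(indices)
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of (prediction - y)^2 with respect to w and b.
            w -= lr * 2 * error * xs[i]
            b -= lr * 2 * error
    return w, b


# Synthetic data generated from y = 2x + 1 (an assumption for this sketch).
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]

w, b = sgd_linear_fit(xs, ys)
print(round(w, 3), round(b, 3))
```

Because each update uses a single sample rather than the full dataset, the cost per step stays constant as the dataset grows, which is the main practical appeal of the method.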

7. A Programmer’s Guide to Data Mining (Zacharski)

‘Guide to Data Mining’ is interesting in that it not only conveys data science knowledge through funny illustrations, but also lets readers apply that knowledge by processing the provided data and practicing Python coding. In other words, the book takes a learn-by-doing approach.

The algorithms introduced in the book are mostly those used for building recommendation systems, for example classification, Naïve Bayes, and clustering.

8. A Brief Introduction to Neural Networks (Kriesel)

This book has not been officially released by a publisher; instead, it is available as a free download on the author’s website.

Although it is called ‘a brief introduction’, this book goes into depth on advanced machine learning algorithms and neural networks, from their history and biological background to artificial neural networks and their components in machine learning (perceptron, backpropagation).

9. Deep Learning (An MIT Press Book – Goodfellow et al.)

This book is suitable for those who want to dig further into deep learning and artificial intelligence (AI). Mathematical models, machine learning concepts, deep learning algorithms, and academic approaches are all summarized, explained, and discussed here.

10. Machine Learning Yearning (Andrew Ng)

Andrew Ng has made myriad contributions to academia, research, and the AI and machine learning movement as a co-founder of Google Brain and Coursera and as Chief Scientist of Baidu.

This book provides a technical strategy for deploying machine learning projects effectively and accurately, including the optimization of metrics (such as confidence, runtime, and error rate) and related problems such as overfitting, bias, and variance.

Hanh Hoang – DSinbrief Group
