10 free Data Science books you must read


There are so many great resources out there for learning data science and analysis for free. If you are studying or practicing data science and haven’t read these books, they are worth adding to your reading list for 2019. Below are 10 of the most useful that are currently available online.

Automate the boring stuff

This book is a simple, practical introduction to getting started with Python. Although it is not specifically a data science book, it covers most of the basic concepts you need to use Python for data science, including flow control, functions, web scraping, working with CSV and JSON files, and running programs. It is very much aimed at absolute beginners, so it is a great book for getting started with Python. As well as step-by-step instructions for each technique, there are practice questions and problems at the end of each chapter.
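To give a flavour of the basics mentioned above, here is a minimal sketch that combines a function, a loop with a condition, and CSV and JSON handling using only the standard library. It is not taken from the book: the function name `rows_over_threshold` and the sample data are made up for illustration.

```python
import csv
import io
import json


def rows_over_threshold(csv_text, column, threshold):
    """Return the rows whose numeric `column` value exceeds `threshold`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    selected = []
    for row in reader:  # flow control: loop over rows, keep some of them
        if float(row[column]) > threshold:
            selected.append(row)
    return selected


# Made-up sample data, held in a string so the example needs no files on disk.
SAMPLE_CSV = "name,score\nana,72\nben,55\ncat,91\n"

passed = rows_over_threshold(SAMPLE_CSV, "score", 60)
print(json.dumps(passed, indent=2))  # the selected rows as a JSON array
```

Note that `csv.DictReader` returns every value as a string, which is why the score is converted with `float` before the comparison.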

Data science at the command line

You can start using Python for data analysis purely in Jupyter Notebooks. However, over time you will find that using the command line makes you much more efficient. For example, you can very quickly obtain data, run programs and search through files, all by typing commands and pressing enter in the terminal window. This book is a highly accessible and comprehensive guide to data science at the command line. Through working examples, its chapters cover how to obtain, clean, explore, model and interpret data via the command line.

Think stats

This is a really practical overview of statistics for data science. The book uses a data set from the National Institutes of Health throughout to explain the core concepts in probability and statistics needed for data science and analysis. It is another highly practical book and includes lots of example Python code and simple programs to explain the concepts. It is much more lightweight than many of the more theoretical textbooks you may find on this subject.
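As an illustration of explaining a statistical concept with a simple program, the sketch below builds a probability mass function from a list of values and computes its mean. It is written for this post, not copied from the book: the helper names `pmf` and `pmf_mean` and the sample values are assumptions.

```python
from collections import Counter


def pmf(values):
    """Map each distinct value to the fraction of the sample it makes up."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def pmf_mean(distribution):
    """Mean of a PMF: each value weighted by its probability."""
    return sum(value * prob for value, prob in distribution.items())


# Made-up sample values, not the book's data set.
lengths = [39, 39, 40, 38, 39, 41, 40, 39]

dist = pmf(lengths)
print(dist)
print(pmf_mean(dist))
```

The probabilities in `dist` sum to 1, and the mean computed from the PMF matches the ordinary arithmetic mean of the sample.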

Python data science handbook

This is a really comprehensive guide to Python for data science that builds from beginner to advanced concepts. There is a chapter on IPython which really made a difference to my efficiency as a data science practitioner. The book also covers NumPy, data manipulation with Pandas, visualization methods, and Machine Learning. The Machine Learning chapter, in particular, is really good and covers both the practical use of the various libraries and the nuts and bolts of how they work.

R for data science

Even if you mainly work in Python, it is really useful to have at least a working knowledge of R. If a good library for a particular method is not available in Python, R usually has one. This book is a really comprehensive guide to doing data science with R and covers everything from data visualization and transformation to the R workflow and data modeling.

Probabilistic Programming and Bayesian methods for hackers

In the author’s own words, this book is an attempt to “bridge the gap between Bayesian mathematics and probabilistic programming”, and it arguably does this very well. As with Think Stats, it moves away from heavily theoretical textbooks and offers practical use cases for Bayesian inference, aiming for a computational understanding first and a mathematical understanding second. It is another Python-based book with lots of practical examples, and it predominantly uses the PyMC library.
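To show what a computation-first view of Bayesian inference can look like, here is a minimal sketch that estimates the bias of a coin from observed flips. It uses a simple grid approximation in plain Python rather than the sampling that PyMC provides, so it is a stand-in for the idea and not the book's method. The function name `grid_posterior` and the flip counts are made up.

```python
def grid_posterior(heads, tails, grid_size=101):
    """Posterior over a coin's heads probability, evaluated on an even grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Uniform prior times binomial likelihood; constant factors cancel
    # when the weights are normalised below.
    unnormalised = [p ** heads * (1 - p) ** tails for p in grid]
    total = sum(unnormalised)
    posterior = [weight / total for weight in unnormalised]
    return grid, posterior


# Hypothetical data: 7 heads and 3 tails.
grid, posterior = grid_posterior(heads=7, tails=3)

# Index of the grid point with the highest posterior probability.
best = max(range(len(grid)), key=lambda i: posterior[i])
print(grid[best])
```

With a uniform prior the most probable grid point is the observed proportion of heads. The rest of `posterior` shows how much uncertainty remains after only ten flips.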

Machine learning yearning

This book was released in draft form by Andrew Ng this year. It is designed to teach data scientists how to structure Machine Learning projects and set the direction for a data science team. It is a good overview of when and how to use Machine Learning, and how to handle the complexities involved in implementing AI in the real world.

Ethics and data science

There has been a lot in the news this year relating to bias in machine learning applications, and to data protection and privacy concerns. This book covers how to put ethical principles into practice in data science projects. It includes a really good checklist to go through when designing a project, as well as lots of suggestions for building ethics into a general data culture. Another resource released this year along very similar lines was the deon command line tool from drivendata.org. This tool allows you to build an ethics checklist into data science projects.

Deep learning

This is an excellent book that is now available to read for free online. It covers applied maths for Machine Learning and has a large emphasis on deep learning in particular. It covers the mathematics behind key concepts in deep learning such as convolutional networks, regularisation, and recurrent and recursive nets. It is very much a theory-based book but gives a deep understanding of the subject. It also includes chapters on the practical implementation of these techniques.

Rules for machine learning

This is really an ebook/paper and is only about 24 pages long. However, it is such a great resource. It covers some best practices from Google on how to implement a machine learning project. It emphasizes the importance of data engineering, to create great features and a solid data pipeline, over machine learning expertise.

Source: Blog Rebecca Vickery
