There are many great free resources for learning data science and analysis. If you are studying or practicing data science and haven’t read these books, they are worth adding to your reading list for 2019. Below is a list of 10 of the most useful ones that are currently available online.
Automate the boring stuff
Data science at the command line
You can start using Python for data analysis purely in Jupyter Notebooks. However, over time you will find that using the command line makes you much more efficient. For example, you can very quickly obtain data, run programs and search through files, all by typing commands and pressing enter in the terminal window. This book is a highly accessible and comprehensive guide to data science at the command line. Each chapter covers, alongside working examples, how to obtain, clean, explore, model and interpret data via the command line.
Think Stats
This is a really practical overview of statistics for data science. The book uses a data set from the National Institutes of Health throughout to explain the core concepts in probability and statistics necessary for data science and analysis. This is another highly practical book and includes lots of example Python code and simple programs to explain the concepts. It is much more lightweight than many of the more theoretical textbooks you may find on this subject.
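To give a flavour of what explaining a statistical concept with a simple program can look like, here is a minimal sketch of a permutation test for a difference in means. It is not taken from the book; the function name `permutation_p_value` and the two samples are made up for illustration, and it uses only the Python standard library.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, iterations=2000, seed=0):
    """Estimate how often a random relabelling of the pooled data gives a
    difference in means at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / iterations


# Made-up measurements, for illustration only
sample_a = [38.5, 39.1, 40.2, 38.9, 39.7, 40.0]
sample_b = [38.8, 39.4, 39.0, 39.9, 38.6, 39.5]

p_value = permutation_p_value(sample_a, sample_b)
print(f"estimated p-value: {p_value:.3f}")
```

A small estimated p-value suggests the observed difference would rarely arise from random relabelling alone. The idea is expressed as a loop you can read and modify rather than as a formula.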
Python data science handbook
R for data science
Even if you mainly work in Python, it is really useful to have at least a working knowledge of R. If a good library for a particular method is not available in Python, R usually has one. This book is a really comprehensive guide to doing data science with R and covers everything from data visualization and transformation, to the R workflow, to data modeling.
Probabilistic Programming and Bayesian methods for hackers
In the author’s own words this book is an attempt to “bridge the gap between Bayesian mathematics and probabilistic programming”, and it does this very well. As with Think Stats, it moves away from heavily theoretical textbooks and offers practical use cases for Bayesian inference. The approach is a computational understanding first and a mathematical understanding second. It is another Python-based book with lots of practical examples, and it predominantly uses the PyMC library.
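As a rough illustration of the "computational understanding first" idea, the sketch below approximates a posterior distribution on a grid for a coin's probability of heads. It deliberately uses plain Python rather than PyMC, and it is not taken from the book. The function name `grid_posterior`, the flat prior and the counts of 7 heads and 3 tails are all assumptions chosen for the example.

```python
def grid_posterior(heads, tails, grid_size=101):
    """Approximate the posterior over a coin's heads probability on an
    evenly spaced grid, assuming a flat prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # With a flat prior, the unnormalised posterior is just the likelihood
    unnormalised = [p ** heads * (1 - p) ** tails for p in grid]
    total = sum(unnormalised)
    posterior = [w / total for w in unnormalised]
    return grid, posterior


# Hypothetical data: 7 heads and 3 tails
grid, posterior = grid_posterior(heads=7, tails=3)

posterior_mean = sum(p * w for p, w in zip(grid, posterior))
most_likely = grid[posterior.index(max(posterior))]
print(f"posterior mean: {posterior_mean:.2f}, most likely value: {most_likely:.2f}")
```

Libraries such as PyMC replace the grid with sampling, which scales to models with many parameters, but the underlying idea of weighting candidate values by how well they explain the data is the same.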
Machine learning yearning
This book was released in draft form by Andrew Ng this year. It is designed to teach data scientists how to structure Machine Learning projects and set the direction for a data science team. It is a good overview of when and how to use Machine Learning, and how to handle the complexities involved in implementing AI in the real world.
Ethics and data science
There has been a lot in the news this year relating to bias in machine learning applications, and data protection and privacy concerns. This book covers how to put ethical principles into data science projects. It includes a really good checklist to go through when designing a project as well as lots of suggestions for building ethics into a general data culture. Another resource released this year along very similar lines was the deon command line tool from drivendata.org. This tool allows you to build an ethics checklist into data science projects.
Deep learning
This is an excellent book now available to read for free online. It covers applied maths for Machine Learning and has a large emphasis on deep learning in particular. It covers the mathematics behind key concepts in deep learning such as convolutional networks, regularisation, and recurrent and recursive nets. It is very much a theory-based book but gives a deep understanding of the subject. It also includes chapters on the practical implementation of these techniques.
Rules for machine learning
This is really an ebook/paper and only about 24 pages long. However, it is such a great resource. This covers some best practices from Google on how to implement a machine learning project. It emphasizes the importance of data engineering to create great features and a solid data pipeline over machine learning expertise.
Source: Blog Rebecca Vickery