In the ever-growing field of data science and machine learning, it can be difficult even for practitioners to stay up to date on the latest research. And with public interest in all things AI on the rise, now is as good a time as any to learn more about the expanding world of data analysis.
With that said, I have compiled three books that I think every data scientist should read:
Machine Learning Yearning, by Andrew Ng
Large modern datasets can produce accurate machine learning systems. If you are at all interested in how big data affects machine learning, this book is a must-read. Ng lays out, in a simple yet profound way, what would normally take years of apprenticeship to learn.
This book answers questions like: How much data should you collect? Should you use end-to-end deep learning? And how do you deal with a training set that does not match your test set?
Storytelling With Data, by Cole Nussbaumer Knaflic
This book isn’t interested in simply helping you read data more effectively; it wants to teach you how to relay that information to other people. In data science it is easy to present findings only in the jargon of your peers, but learning to tell the unique story of the data in front of you is what sets truly innovative data scientists apart from the rest.
The book does an excellent job of reaching beyond the conventional tools typically used to present data, opting instead for an approach that is engaging, useful, and truer to the story each dataset has to tell.
Hadoop: The Definitive Guide, by Tom White
If you’ve spent much time at all in the world of programming or data science, then you’re probably familiar with Hadoop. So when Tom White, an expert Hadoop consultant and Apache Software Foundation member, writes a definitive guide on how best to use it, you probably don’t want to miss out.
This book provides clear, insightful guidance that can help you set up your Hadoop clusters, navigate the platform with more expertise, and gain a better overall understanding of this important and intricate corner of data science.
While this list is far from exhaustive, it’s certainly a great start for people looking to further their knowledge of data science, programming, AI, and machine learning. If you learn to apply the information in these books well, I think you will find that both your understanding and your appreciation of big data grow immensely.