Friday, April 23, 2021

Review - Machine Learning For Absolute Beginners - Oliver Theobald

Machine Learning For Absolute Beginners: A Plain English Introduction (Second Edition) 

Oliver Theobald

This book at Amazon

ISBN-10 : 1549617214

ISBN-13 : 978-1549617218

There is a wealth of information out there on the subject of Machine Learning (ML), especially from the last 10 years. There are books with catch-phrase titles that lack substance. Conversely, there are books that delve too deeply into a particular programming framework for ML (nothing wrong with that). For a data scientist, there are not many books out there that cover a broad spectrum of ML with substance on each topic. 'Machine Learning for Absolute Beginners' by Oliver Theobald seems to be one of these rare gems: it covers ML broadly and explains important concepts without too much mathematics or too many code snippets. Despite the title, a seasoned data scientist may still find it a valuable reference.

Following a pleasant introduction to ML and an overview of a typical workflow, the book discusses in detail the three classes of ML: Supervised, Unsupervised and Reinforcement Learning. It describes a special case of Reinforcement Learning known as Q-Learning. This is followed by a chapter on the ML Toolbox, which discusses the data requirements, the infrastructure, the algorithms and the visualisation options that are available. Here it explores the various programming languages (Python, C, R, etc.), the cloud platforms (Azure, AWS, Google) and the algorithm frameworks (TensorFlow, Caffe, Torch, etc.).
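For readers curious what Q-Learning actually computes, here is a minimal sketch of its update rule (the states, rewards and hyperparameter values below are made up for illustration; the book itself stays light on code):

```python
import numpy as np

# Q-learning update rule (illustrative sketch):
# Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))

n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))   # table of state-action values
alpha, gamma = 0.5, 0.9               # learning rate and discount factor

def q_update(Q, state, action, reward, next_state):
    """Apply one Q-learning update to the table in place."""
    best_next = Q[next_state].max()   # value of the greedy next action
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

# Example: taking action 1 in state 0 yields reward 1.0 and moves to state 2.
q_update(Q, state=0, action=1, reward=1.0, next_state=2)
print(Q[0, 1])  # 0.5 * (1.0 + 0.9 * 0 - 0) = 0.5
```

Repeating such updates over many episodes lets the table converge towards the expected long-term reward of each action.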

The chapters on data preparation and feature selection cover many of the techniques as well as the motivation behind them. These include row and column compression, one-hot encoding, binning, treatment of missing data, normalization and standardization. Partitioning of data and cross validation (k-fold) are discussed as the final stages of preparing the data before model training.
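A few of these preparation steps can be sketched in a few lines with pandas and scikit-learn (the toy dataset here is mine, not the book's):

```python
import pandas as pd
from sklearn.model_selection import KFold

# Toy dataset with a categorical column and a missing value.
df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green"],
    "size":   [10.0, None, 14.0, 12.0],
})

# Treatment of missing data: fill the gap with the column mean.
df["size"] = df["size"].fillna(df["size"].mean())

# One-hot encoding of the categorical column.
df = pd.get_dummies(df, columns=["colour"])

# Normalization: rescale 'size' to the [0, 1] range.
df["size"] = (df["size"] - df["size"].min()) / (df["size"].max() - df["size"].min())

# k-fold cross validation: split the rows into k train/test partitions.
for train_idx, test_idx in KFold(n_splits=2, shuffle=True, random_state=0).split(df):
    print(len(train_idx), len(test_idx))  # 2 2 on each fold
```

Each fold trains on one partition and validates on the other, so every row is used for validation exactly once.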

The next two chapters are devoted to Regression and Clustering respectively. These are perhaps the two foundational families of algorithms, and they are also well known outside the context of ML. The conceptual difference between Linear and Logistic regression, and their applicability to real scenarios, is well illustrated. A very simple hand calculation shows how linear regression really works. Logistic regression is then explained by contrasting its usage with linear regression. Similarly, the concepts and applications of k-Nearest Neighbours and k-Means Clustering, for supervised and unsupervised learning respectively, are well explained, including the reasoning behind the algorithms.
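The kind of hand calculation the book walks through for linear regression is the classic least-squares formula, which fits in a few lines of plain Python (the data points below are made up):

```python
# Simple linear regression by hand: fit y = intercept + slope * x
# using the least-squares formulas.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(slope, intercept)  # 0.6 2.2, i.e. the line y = 2.2 + 0.6x
```

Working it out once by hand like this makes it much clearer what a library call such as scikit-learn's LinearRegression is doing under the hood.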


Bias and variance get a short chapter of their own, which explains them in sufficient detail. This is another example of what sets this book above other ML books that just focus on coding. Understanding the interplay between bias and variance, which is also emphasized in Andrew Ng's Machine Learning course, is what differentiates a data scientist from an ML coder.

The next two chapters deal with the advanced algorithms Support Vector Machines (SVM) and Artificial Neural Networks (ANN). SVM is contrasted with Logistic regression, with an explanation of the hyperplane concept and the margins around the decision boundary for SVM. ANN concepts are explained from the basic components of neurons and activation functions, building up to the perceptron with mathematical examples for very simple cases. The Multilayer Perceptron (MLP) and the various Deep Learning techniques are briefly discussed.
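The perceptron build-up the book describes is easy to reproduce: a weighted sum plus a bias, passed through a step activation function. The weights below are hand-picked by me to implement a logical AND, not an example taken from the book:

```python
# A single perceptron: weighted sum of inputs plus bias, passed through
# a step activation function.
def step(z):
    return 1 if z >= 0 else 0

def perceptron(inputs, weights, bias):
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return step(z)

# Hand-picked weights make this perceptron act as an AND gate:
# it fires only when both inputs are 1.
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", perceptron([a, b], weights=[1, 1], bias=-1.5))
```

Stacking layers of such units, and swapping the step for a differentiable activation so the weights can be learned, is exactly the jump from the perceptron to the MLP.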

While ANN may be very effective in some cases, it is well known for its lack of explainability (except perhaps with recent developments in SHAP, ICE, LIME, etc.). On the other hand, Decision Trees have often been hailed for their high explainability. So a decent chapter is devoted to Decision Trees, with simple mathematical calculations of entropy, which is key to deciding how to split the nodes of the tree. This is expanded to the advanced tree family, including Boosting, Bagging and Random Forest, and when to use them effectively. This is followed by a short chapter on Ensemble Modelling, where the Stacking method is also discussed.
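The entropy calculation that drives tree splits is short enough to show in full; the class counts below are my own illustrative values:

```python
import math

# Entropy of a class distribution: H = -sum(p_i * log2(p_i)).
# A decision tree picks the split that most reduces this impurity.
def entropy(counts):
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)

print(entropy([5, 5]))   # 1.0 — a 50/50 node is maximally impure
print(entropy([10, 0]))  # 0.0 — a pure node needs no further splitting
```

Comparing the entropy of a parent node with the weighted entropy of its children gives the information gain, which is the quantity the tree maximises when choosing a split.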

The last three chapters, finally but rightly placed, deal with introducing the development environment and the tools necessary for machine learning work. These include using the Anaconda environment to program in the Python language and using the pandas library's dataframes. The book also discusses the concept of model optimization, illustrated using the Grid Search technique. Towards the end there is the source code for a simple but complete ML workflow in Python, recommendations of other useful books, and an Appendix introducing Python, but what I consider most useful is the pointers to a few free datasets.
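As a flavour of the Grid Search technique, here is a minimal sketch using scikit-learn (my own choice of model, dataset and parameter grid, not the book's example): every hyperparameter combination is tried with cross validation and the best one is kept.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Grid search: exhaustively evaluate each candidate hyperparameter value
# with k-fold cross validation and keep the best-scoring combination.
X, y = load_iris(return_X_y=True)

grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=5,                      # 5-fold cross validation per candidate
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

The same pattern scales to multi-parameter grids, at the cost of one cross-validated fit per combination.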