Machine Learning for Tabular Data

Author: Mark Ryan, Luca Massaron

Computers

Business runs on tabular data in databases, spreadsheets, and logs. Crunch that data using deep learning, gradient boosting, and other machine learning techniques.

Machine Learning for Tabular Data teaches you to train insightful machine learning models on common tabular business data sources such as spreadsheets, databases, and logs. You’ll discover how to use XGBoost and LightGBM on tabular data, optimize deep learning libraries like TensorFlow and PyTorch for tabular data, and use cloud tools like Vertex AI to create an automated MLOps pipeline.

Machine Learning for Tabular Data will teach you how to:
• Pick the right machine learning approach for your data
• Apply deep learning to tabular data
• Deploy tabular machine learning locally and in the cloud
• Build pipelines to automatically train and maintain a model

Machine Learning for Tabular Data covers classic machine learning techniques like gradient boosting, and more contemporary deep learning approaches. By the time you’re finished, you’ll be equipped with the skills to apply machine learning to the kinds of data you work with every day.

Foreword by Antonio Gulli.

About the technology
Machine learning can accelerate everyday business chores like account reconciliation, demand forecasting, and customer service automation, not to mention more exotic challenges like fraud detection, predictive maintenance, and personalized marketing. This book shows you how to unlock the vital information stored in spreadsheets, ledgers, databases, and other tabular data sources using gradient boosting, deep learning, and generative AI.

About the book
Machine Learning for Tabular Data delivers practical ML techniques to upgrade every stage of the business data analysis pipeline. In it, you’ll explore examples like using XGBoost and Keras to predict short-term rental prices, deploying a local ML model with Python and Flask, and streamlining workflows using large language models (LLMs). Along the way, you’ll learn to make your models both more powerful and more explainable.

What's inside
• Master XGBoost
• Apply deep learning to tabular data
• Deploy models locally and in the cloud
• Build pipelines to train and maintain models

About the reader
For readers experienced with Python and the basics of machine learning.

About the author
Mark Ryan is the AI Lead of the Developer Knowledge Platform at Google. A three-time Kaggle Grandmaster, Luca Massaron is a Google Developer Expert (GDE) in machine learning and AI. He has published 17 other books.

Table of Contents
Part 1
1 Understanding tabular data
2 Exploring tabular datasets
3 Machine learning vs. deep learning
Part 2
4 Classical algorithms for tabular data
5 Decision trees and gradient boosting
6 Advanced feature processing methods
7 An end-to-end example using XGBoost
Part 3
8 Getting started with deep learning with tabular data
9 Deep learning best practices
10 Model deployment
11 Building a machine learning pipeline
12 Blending gradient boosting and deep learning
A Hyperparameters for classical machine learning models
B K-nearest neighbors and support vector machines

Echoes

Books with similar themes and ideas

Echoes summary

For professionals who work with tabular data, whether in everyday spreadsheets, business databases, or extensive logs, "Machine Learning for Tabular Data" by Mark Ryan and Luca Massaron is a practical guide. It addresses the core challenge of extracting meaningful insights from this prevalent data format, using both classic and more recent machine learning techniques. It speaks to readers who understand that business operations depend on the effective processing of structured information. The book's value is particularly apparent when it is set beside a conceptually close title, "Practical Statistics for Data Scientists" by Peter Bruce, Andrew Bruce, and Peter Gedeck.

The two works are connected by more than subject matter: they take complementary approaches to working with data. "Machine Learning for Tabular Data" builds on the fundamental statistical principles explored in "Practical Statistics for Data Scientists," providing the practical application layer for those foundational concepts. While statistical analysis focuses on understanding data distributions, hypothesis testing, and exploratory data analysis, this machine learning text turns to building predictive and analytical models. It bridges the gap between understanding *what* the data tells you and actively *using* that knowledge to forecast, classify, and optimize. The authors of "Machine Learning for Tabular Data" don't shy away from the mathematical underpinnings, which reinforces the rigor that readers of "Practical Statistics for Data Scientists" would expect.