by Mark Ryan, Luca Massaron
Business runs on tabular data in databases, spreadsheets, and logs. Crunch that data using deep learning, gradient boosting, and other machine learning techniques.
Machine Learning for Tabular Data teaches you to train insightful machine learning models on common tabular business data sources such as spreadsheets, databases, and logs. You’ll discover how to use XGBoost and LightGBM on tabular data, optimize deep learning libraries like TensorFlow and PyTorch for tabular data, and use cloud tools like Vertex AI to create an automated MLOps pipeline.
Machine Learning for Tabular Data will teach you how to:
• Pick the right machine learning approach for your data
• Apply deep learning to tabular data
• Deploy tabular machine learning locally and in the cloud
• Build pipelines to automatically train and maintain a model
Machine Learning for Tabular Data covers classic machine learning techniques like gradient boosting, and more contemporary deep learning approaches. By the time you’re finished, you’ll be equipped with the skills to apply machine learning to the kinds of data you work with every day.
Foreword by Antonio Gulli.
About the technology
Machine learning can accelerate everyday business chores like account reconciliation, demand forecasting, and customer service automation, not to mention more exotic challenges like fraud detection, predictive maintenance, and personalized marketing. This book shows you how to unlock the vital information stored in spreadsheets, ledgers, databases, and other tabular data sources using gradient boosting, deep learning, and generative AI.
About the book
Machine Learning for Tabular Data delivers practical ML techniques to upgrade every stage of the business data analysis pipeline. In it, you’ll explore examples like using XGBoost and Keras to predict short-term rental prices, deploying a local ML model with Python and Flask, and streamlining workflows using large language models (LLMs). Along the way, you’ll learn to make your models both more powerful and more explainable.
What's inside
• Master XGBoost
• Apply deep learning to tabular data
• Deploy models locally and in the cloud
• Build pipelines to train and maintain models
About the reader
For readers experienced with Python and the basics of machine learning.
About the author
Mark Ryan is the AI Lead of the Developer Knowledge Platform at Google. A three-time Kaggle Grandmaster, Luca Massaron is a Google Developer Expert (GDE) in machine learning and AI. He has published 17 other books.
Table of Contents
Part 1
1 Understanding tabular data
2 Exploring tabular datasets
3 Machine learning vs. deep learning
Part 2
4 Classical algorithms for tabular data
5 Decision trees and gradient boosting
6 Advanced feature processing methods
7 An end-to-end example using XGBoost
Part 3
8 Getting started with deep learning with tabular data
9 Deep learning best practices
10 Model deployment
11 Building a machine learning pipeline
12 Blending gradient boosting and deep learning
A Hyperparameters for classical machine learning models
B K-nearest neighbors and support vector machines
Books with similar themes and ideas
Echoes summary
For professionals grappling with the ubiquitous nature of tabular data – found in everything from everyday spreadsheets and business databases to extensive logs – "Machine Learning for Tabular Data" by Mark Ryan and Luca Massaron emerges as an indispensable guide. This comprehensive volume directly addresses the core challenges of extracting meaningful insights from this prevalent data format, employing both classic and cutting-edge machine learning techniques. It’s a book that speaks to those who understand that the bedrock of business operations hinges on the effective processing of structured information. The inherent value of this book becomes particularly apparent when considering its close conceptual linkage to titles like "Practical Statistics for Data Scientists" by Peter Bruce, Andrew Bruce, and Peter Gedeck.
The connection between these two works is not merely superficial; rather, it highlights a shared intellectual lineage and a complementary approach to data mastery. "Machine Learning for Tabular Data" builds upon the fundamental statistical principles often explored in "Practical Statistics for Data Scientists," providing the practical application layer for those foundational concepts. While statistical analysis focuses on understanding data distributions, hypothesis testing, and exploratory data analysis, this machine learning text delves into actively building predictive and analytical models. It bridges the gap between understanding *what* the data tells you and actively *using* that knowledge to forecast, classify, and optimize. The authors of "Machine Learning for Tabular Data" don't shy away from the mathematical underpinnings, subtly reinforcing the rigor that readers of "Practical Statistics for Data Scientists" would expect.
A key aspect of "Machine Learning for Tabular Data" is its nuanced exploration of both gradient boosting algorithms, such as XGBoost and LightGBM, and the application of deep learning to tabular datasets, leveraging libraries like TensorFlow and PyTorch. This provides a sophisticated pathway for users to move beyond more basic statistical inference and engage with advanced predictive modeling. The book meticulously guides readers through the process of selecting the most appropriate machine learning approach for their specific tabular data, a decision-making process that is often informed by the statistical understanding cultivated by texts like "Practical Statistics for Data Scientists." Furthermore, the inclusion of MLOps concepts, with a focus on cloud tools like Vertex AI, underscores a commitment to real-world deployment and the automation of model training and maintenance, bringing theoretical models into practical, scalable applications. This forward-thinking perspective on the entire machine learning lifecycle, from data understanding to deployment, solidifies its relevance for professionals who need to not only analyze but also operationalize their data-driven insights. The practical examples, such as predicting short-term rental prices or building local ML models with Python and Flask, demonstrate the tangible outcomes of mastering these techniques. This focus on impactful application makes "Machine Learning for Tabular Data" a vital resource for anyone looking to harness the full potential of their tabular data.
Joel Grus
Noah Gift, Alfredo Deza
Peter Bruce, Andrew Bruce, Peter Gedeck
Jason Hodson
Vadim Smolyakov
Brian Godsey
Chip Huyen
Books that connect different domains
Bridges summary
The exploration of **Machine Learning for Tabular Data** reveals a compelling interconnectedness with a curated selection of foundational and practical texts, forming significant conceptual bridges that illuminate the reader's learning journey. This book, by Mark Ryan and Luca Massaron, directly addresses the ubiquitous nature of tabular data in business operations – from spreadsheets and databases to log files – and equips you with the skills to unlock its inherent value through advanced machine learning techniques. The strength of this connection is immediately apparent when considering a text like **Python for Data Analysis** by Wes McKinney. While McKinney’s work provides the essential toolkit for data manipulation and initial exploration in Python, *Machine Learning for Tabular Data* builds directly upon these foundations by showing you how to apply sophisticated algorithms like XGBoost, LightGBM, and even deep learning models to this structured data. The shared theme is one of **disciplined exploration and structured problem-solving**, where the analytical prowess honed in *Python for Data Analysis* is seamlessly transitioned into the predictive and analytical power offered by machine learning models trained on tabular datasets.
Furthermore, the practical application emphasized in Ryan and Massaron’s book forms a crucial bridge to the more comprehensive coverage found in **Hands-On Machine Learning with Scikit-Learn and PyTorch** by Aurélien Géron. Both books advocate for moving beyond theoretical understanding to actively building and deploying models. *Machine Learning for Tabular Data* specifically targets the often-overlooked but critically important domain of tabular data, demonstrating how to optimize widely used deep learning libraries such as TensorFlow and PyTorch for this data type, and how to leverage cloud platforms like Vertex AI for MLOps pipelines. This directly complements the broader machine learning and deep learning architectures discussed in Géron’s work, creating a pathway for readers to apply foundational ML concepts to a concrete and prevalent business data format. The intellectual craftsmanship that underpins complex systems is a shared appreciation, as both texts encourage a deep dive into *how* these models function and *how* to effectively implement them.
The book also offers a vital bridge to the theoretical underpinnings of statistical modeling, particularly when viewed alongside **An Introduction to Statistical Learning** by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor. While *An Introduction to Statistical Learning* provides the rigorous mathematical and statistical framework for understanding concepts like regression and classification, *Machine Learning for Tabular Data* demonstrates the practical instantiation of these principles using modern machine learning algorithms on real-world business data. This connection highlights a shared intellectual lineage focused on the fundamental challenge of **extracting reliable signals from noisy data**. Both books, in their distinct languages of statistics and applied algorithms, explore this core endeavor, with Ryan and Massaron guiding you through the specific application to tabular datasets, and the James et al. text providing the conceptual bedrock.
The dive into deep learning for tabular data within *Machine Learning for Tabular Data* also creates a parallel with **Deep Learning from Scratch** by Seth Weidman. Despite their differing approaches – one focusing on the practical application to tabular data and the other on fundamental deep learning architectures – both books are fundamentally concerned with the **abstraction of signal from noise**. Ryan and Massaron show how to systematically distill meaningful patterns from complex, raw tabular data using deep learning, a concept that resonates with Weidman’s exploration of how deep representation learning itself works at a fundamental level.
Finally, the emphasis on extracting and communicating insights, a key theme in *Machine Learning for Tabular Data*, implicitly connects to the power of narrative found in **Storytelling with Data** by Cole Nussbaumer Knaflic. While Knaflic’s work focuses on the visual communication of data-driven findings, Ryan and Massaron’s book provides the algorithmic engine that produces the findings those narratives are built on. Together, these books form a comprehensive ecosystem for understanding, manipulating, modeling, and ultimately communicating the value hidden within the vast quantities of tabular data that drive modern business.
Wes McKinney