New Book Release: Advanced Data Science Systems Engineering
Building Reproducible, Scalable, and Production-Ready Systems in Python
I am excited to announce the publication of the second volume in the Data Science Engineering series:
Advanced Data Science Systems Engineering: Building Reproducible, Scalable, and Production-Ready Systems in Python
The first volume focused on building reusable workflows.
This second volume focuses on engineering reusable systems.
Many data science projects successfully train models, generate predictions, and produce useful results. Yet as projects grow, new challenges emerge:
- How do you reproduce experiments months later?
- How should models and artifacts be managed?
- How can workflows be tested and debugged systematically?
- How should configurations be organized?
- How can reusable workflows evolve into reusable frameworks?
These questions are fundamentally systems-engineering challenges.
This book addresses them directly.
What Readers Will Learn
Throughout the book, readers learn how to design and implement:
- reproducible workflows
- configuration-driven systems
- artifact and model management
- logging infrastructure
- testing and debugging workflows
- performance optimization techniques
- scalable processing architectures
- reusable framework design
The book places a strong emphasis on practical implementation and engineering best practices rather than abstract theory.
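To make two of these topics concrete, here is a minimal sketch of a reproducible, configuration-driven run. It is not taken from the book or from dskit; the names `RunConfig`, `config_fingerprint`, and `run_experiment` are hypothetical and chosen only for illustration. The idea is that every source of randomness is driven by the configuration, and each run is labeled with an identifier derived from that configuration.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical experiment settings; the field names are illustrative."""
    seed: int = 42
    n_samples: int = 5
    noise: float = 0.1


def config_fingerprint(config: RunConfig) -> str:
    """Return a short, stable identifier derived from the config values."""
    payload = json.dumps(asdict(config), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_experiment(config: RunConfig) -> dict:
    """Run a toy 'experiment' whose only randomness comes from the config seed."""
    rng = random.Random(config.seed)  # local generator, no global state
    samples = [rng.gauss(0.0, config.noise) for _ in range(config.n_samples)]
    return {
        "run_id": config_fingerprint(config),
        "config": asdict(config),
        "mean": sum(samples) / len(samples),
    }


if __name__ == "__main__":
    first = run_experiment(RunConfig())
    second = run_experiment(RunConfig())
    # Same configuration, same result and same run identifier.
    print(first == second)
    print(first["run_id"])
```

Because the result depends only on the configuration, rerunning with the same `RunConfig` months later should give the same output, and the fingerprint offers a simple key for storing and finding the artifacts of a run.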
Building dskit
A central feature of the book is the progressive development of dskit, a reusable data science framework.
Readers begin with a workflow architecture and gradually transform it into a structured, installable, and extensible Python framework.
Along the way, they learn:
- package architecture
- API design
- project organization
- framework engineering
- distribution through PyPI
The result is a complete framework that demonstrates how modern data science systems can be designed, maintained, and extended.
A Natural Progression
The two books together form a complete learning journey:
- Book 1: Notebook → Workflow
- Book 2: Workflow → System → Framework
This progression reflects the path many practitioners follow as their projects mature from experiments into long-lived software systems.
Who Should Read This Book?
This volume is intended for:
- data scientists seeking stronger engineering practices
- machine learning practitioners building production workflows
- Python developers working with data-intensive applications
- analysts interested in reproducibility and automation
- readers who want to design frameworks rather than simply use them
Looking Forward
As data science continues to evolve, engineering practices become increasingly important.
Models matter.
But reproducibility, maintainability, testing, scalability, and system design matter as well.
My hope is that this book helps readers build systems that remain useful long after the first experiment succeeds.
Thank you for joining me on this journey.
— Shouke Wei, PhD
Deepsim Press

