New Book Release: Advanced Data Science Systems Engineering

Building Reproducible, Scalable, and Production-Ready Systems in Python

I am excited to announce the publication of the second volume in the Data Science Engineering series:

Advanced Data Science Systems Engineering: Building Reproducible, Scalable, and Production-Ready Systems in Python

The first volume focused on building reusable workflows.

This second volume focuses on engineering reusable systems.

Many data science projects successfully train models, generate predictions, and produce useful results. Yet as projects grow, new challenges emerge:

  • How do you reproduce experiments months later?
  • How should models and artifacts be managed?
  • How can workflows be tested and debugged systematically?
  • How should configurations be organized?
  • How can reusable workflows evolve into reusable frameworks?

These questions are fundamentally systems-engineering challenges.

This book addresses them directly.

What Readers Will Learn

Throughout the book, readers learn how to design and implement:

  • reproducible workflows
  • configuration-driven systems
  • artifact and model management
  • logging infrastructure
  • testing and debugging workflows
  • performance optimization techniques
  • scalable processing architectures
  • reusable framework design

The book places a strong emphasis on practical implementation and engineering best practices rather than abstract theory.

Building dskit

A central feature of the book is the progressive development of dskit, a reusable data science framework.

Readers begin with a workflow architecture and gradually transform it into a structured, installable, and extensible Python framework.

Along the way, they learn:

  • package architecture
  • API design
  • project organization
  • framework engineering
  • distribution through PyPI

The result is a complete framework that demonstrates how modern data science systems can be designed, maintained, and extended.

A Natural Progression

The two books together form a complete learning journey:

  • Book 1: Notebook → Workflow
  • Book 2: Workflow → System → Framework

This progression reflects the path many practitioners follow as their projects mature from experimentation into long-term software systems.

Who Should Read This Book?

This volume is intended for:

  • data scientists seeking stronger engineering practices
  • machine learning practitioners building production workflows
  • Python developers working with data-intensive applications
  • analysts interested in reproducibility and automation
  • readers who want to design frameworks rather than simply use them

Looking Forward

As data science continues to evolve, engineering practices become increasingly important.

Models matter.

But reproducibility, maintainability, testing, scalability, and system design matter as well.

My hope is that this book helps readers build systems that remain useful long after the first experiment succeeds.

Thank you for joining me on this journey.

— Shouke Wei, PhD

Deepsim Press
