Titanic ML Production Pipeline
Turned the Titanic notebook workflow into a modular Python ML system — configuration-driven, validated, with custom transformers, persisted artifacts, and a clean prediction interface.
Published August 2025
Case Brief
A working notebook is not a working system.
The research project produced a well-performing model — 83% cross-validation accuracy — through iterative experimentation in Jupyter. But a notebook runs top-to-bottom once, with no input validation, no reusable components, and no way to serve predictions without re-executing cells.
This project is the second half: taking every decision from that research — the features, the preprocessing logic, the model configuration — and rebuilding it as a proper software system. Same model, done properly.
At a Glance
Architecture
The system is organized around six explicit module boundaries, each with a single responsibility:
Nothing crosses those boundaries. The data manager does not know about features. The prediction interface does not know about training. Each module is independently readable, testable, and replaceable.
Directory
src/
├── config/
│   ├── configuration.yml        ← All parameters, paths, schema
│   └── core.py                  ← Config loader
├── data_manager/
│   ├── data_loader.py           ← CSV I/O, artifact serialization
│   ├── data_validator.py        ← Schema and dtype checks
│   └── datasets/                ← Raw and processed data
├── features/
│   └── feature_engineering.py   ← Custom sklearn transformers
├── pipeline.py                  ← Pipeline definitions
├── train_pipeline.py            ← Training orchestration
└── predict.py                   ← Prediction interface
Configuration as a Contract
All paths, model hyperparameters, imputation strategies, feature lists, and expected dtypes live in a single configuration.yml file. Nothing is hardcoded across scripts.
Yaml
model_params:
  SEED: 20250708
  learning_rate: 0.1
  max_depth: 3
  max_features: "sqrt"
  n_estimators: 200
  subsample: 1.0
preprocessing_params:
  age_imputer_strategy: "median"
  embarked_imputer_strategy: "frequent"
This makes the system auditable at a glance: one file describes every assumption the pipeline makes about data and model behavior. It also means changing a hyperparameter or swapping a file path requires editing exactly one line, in exactly one place.
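One way to make the "contract" enforceable is to turn the parsed configuration into a typed, immutable object that fails loudly when a key is missing or unexpected. The sketch below is an illustration, not the project's actual core.py: the ModelParams class and load_model_params function are hypothetical names, the input dict is assumed to come from parsing configuration.yml (for example with yaml.safe_load), and the six hyperparameters are assumed to sit under model_params.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class ModelParams:
    """Typed, read-only view of the model_params section (hypothetical)."""

    SEED: int
    learning_rate: float
    max_depth: int
    max_features: str
    n_estimators: int
    subsample: float


def load_model_params(raw_config: dict[str, Any]) -> ModelParams:
    """Build ModelParams from an already-parsed config dict.

    Raises ValueError if the section has missing or unexpected keys,
    so a typo in the YAML fails at load time rather than at training time.
    """
    section = raw_config["model_params"]
    expected = {f.name for f in fields(ModelParams)}
    missing = expected - set(section)
    unexpected = set(section) - expected
    if missing or unexpected:
        raise ValueError(
            f"model_params mismatch: missing={sorted(missing)}, "
            f"unexpected={sorted(unexpected)}"
        )
    return ModelParams(**section)


# Stand-in for the parsed YAML; in practice this dict would come from the file.
raw_config = {
    "model_params": {
        "SEED": 20250708,
        "learning_rate": 0.1,
        "max_depth": 3,
        "max_features": "sqrt",
        "n_estimators": 200,
        "subsample": 1.0,
    },
}

params = load_model_params(raw_config)
print(params.n_estimators)  # 200
```

Freezing the dataclass means no downstream module can quietly mutate a hyperparameter after loading, which keeps the YAML file the single source of truth.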
Feature Engineering
The five features that drove most of the model's lift in the research phase are now implemented as proper sklearn transformer classes — each with a fit() step that learns from training data and a transform() step that applies the same learned logic to new data.
This distinction matters in practice: a transformer fitted on training data must apply those same learned parameters at inference time. Notebook code re-run on test data recomputes its statistics from the test data. A fitted sklearn transformer does not.
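The fit/transform split can be sketched with a small group-median imputer. This is an illustrative reconstruction, not the project's GroupMedianImputer: the class name GroupMedianImputerSketch, the global-median fallback for unseen groups, and the toy data are all assumptions.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class GroupMedianImputerSketch(TransformerMixin, BaseEstimator):
    """Fill missing `variable` values with the median of their `group_var` group."""

    def __init__(self, variable: str, group_var: str):
        self.variable = variable
        self.group_var = group_var

    def fit(self, X: pd.DataFrame, y=None):
        # Learn one median per group from the training data only.
        self.group_medians_ = (
            X.groupby(self.group_var)[self.variable].median().to_dict()
        )
        # Assumed fallback for groups never seen during fit.
        self.global_median_ = X[self.variable].median()
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        X = X.copy()
        # Reuse the medians learned in fit(); nothing is recomputed from X.
        fill_values = (
            X[self.group_var].map(self.group_medians_).fillna(self.global_median_)
        )
        X[self.variable] = X[self.variable].fillna(fill_values)
        return X


train = pd.DataFrame({"Pclass": [1, 1, 3, 3], "Fare": [80.0, 100.0, 8.0, 10.0]})
new = pd.DataFrame({"Pclass": [1, 3], "Fare": [float("nan"), float("nan")]})

imputer = GroupMedianImputerSketch(variable="Fare", group_var="Pclass").fit(train)
print(imputer.transform(new)["Fare"].tolist())  # [90.0, 9.0]
```

The new rows carry no fare information at all, yet they are filled with the class medians learned from the training rows, which is exactly the behavior a notebook cell recomputing medians on the new data could not reproduce.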
These transformers are assembled into a sequential pipeline alongside standard imputation, encoding, and scaling steps:
Python
feature_pipeline = Pipeline([
    ("age_imputer", MeanMedianImputer(variables=["Age"], ...)),
    ("embarked_imputer", CategoricalImputer(variables=["Embarked"], ...)),
    ("fare_imputer", GroupMedianImputer(variable="Fare", group_var="Pclass")),
    ("fare_capping", CapFareOutliers()),
    ("sex_encoder", CustomMapping(variable="Sex", mapping={"male": 0, "female": 1})),
    ("embarked_encoder", SklearnTransformerWrapper(OneHotEncoder(drop="first"), ...)),
    ("title_extractor", TitleExtractor()),
    ("age_group_feature", AgeGroupEncoder()),
    ("isfamilyonboard_feature", IsFamilyOnBoard()),
    ("ticket_size_feature", TicketCounter()),
    ("drop_features", DropFeatures([...])),
    ("scaling", SklearnTransformerWrapper(StandardScaler(), ...)),
])
The pipeline transforms the raw 11-column input into a clean 13-feature numerical matrix. The entire fitted object is then serialized to disk so inference requires no retraining.
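Persisting a fitted pipeline can be sketched as a dump/load round trip. The example below assumes joblib as the serializer (the write-up does not say which one the project uses), substitutes a tiny two-step pipeline for the real feature_pipeline, and uses a hypothetical artifact file name.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny stand-in pipeline, used here only to keep the example self-contained.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaling", StandardScaler()),
])

X_train = np.array([[22.0, 7.25], [38.0, 71.28], [np.nan, 8.05], [35.0, 53.10]])
pipeline.fit(X_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    artifact_path = Path(tmp_dir) / "feature_pipeline.joblib"  # hypothetical name
    joblib.dump(pipeline, artifact_path)

    # At inference time: load the fitted object instead of refitting it.
    restored = joblib.load(artifact_path)

X_new = np.array([[30.0, 10.0]])
same_output = np.allclose(pipeline.transform(X_new), restored.transform(X_new))
print(same_output)  # True
```

The restored object carries the learned medians and scaling statistics with it, so the prediction side only needs the artifact file and the class definitions on its import path.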
Validation & Prediction
Before any transformation runs, validate_data() enforces the data contract on every input — whether that input is a batch DataFrame or a single passenger record passed as a dictionary:
- Is the input actually a DataFrame after normalization?
- Are all required columns present?
- Does each column match its expected dtype?
- Are missing values surfaced before they silently propagate?
Schema violations raise immediately with a clear message. Missing values trigger a log warning and are passed on to the imputation steps downstream. Failing fast on schema errors is deliberate: a corrupted prediction is a harder problem than a failed one.
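The checks above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the project's validate_data(): the function name validate_data_sketch, the EXPECTED_DTYPES mapping (a hypothetical numeric slice of the schema that would normally come from configuration.yml), and the choice of exception types are all invented for the example.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

# Hypothetical slice of the schema; the real one would be read from the config.
EXPECTED_DTYPES = {"Pclass": "int64", "Age": "float64", "Fare": "float64"}


def validate_data_sketch(
    data: pd.DataFrame, expected_dtypes: dict[str, str] = EXPECTED_DTYPES
) -> pd.DataFrame:
    # 1. The input must already be a DataFrame (dicts are wrapped by the caller).
    if not isinstance(data, pd.DataFrame):
        raise TypeError(f"Expected a DataFrame, got {type(data).__name__}")

    # 2. Every required column must be present.
    missing_cols = [col for col in expected_dtypes if col not in data.columns]
    if missing_cols:
        raise ValueError(f"Missing required columns: {missing_cols}")

    # 3. Every column must match its expected dtype.
    wrong_dtypes = {
        col: str(data[col].dtype)
        for col, expected in expected_dtypes.items()
        if str(data[col].dtype) != expected
    }
    if wrong_dtypes:
        raise TypeError(f"Unexpected dtypes: {wrong_dtypes}")

    # 4. Missing values are surfaced, not rejected: imputation handles them later.
    na_counts = data[list(expected_dtypes)].isna().sum()
    na_counts = na_counts[na_counts > 0]
    if not na_counts.empty:
        logger.warning("Missing values found: %s", na_counts.to_dict())

    return data


# A single passenger record with a missing age passes, with a logged warning.
record = {"Pclass": 3, "Age": float("nan"), "Fare": 7.25}
validated = validate_data_sketch(pd.DataFrame([record]))
print(validated.shape)  # (1, 3)
```

Structural problems stop the call, while missing values only produce a warning, which mirrors the split between hard schema violations and values the imputers are designed to repair.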
Python
import numpy as np
import pandas as pd

def make_predictions(data: pd.DataFrame | dict | None = None) -> np.ndarray:
    # Wrap a single passenger record in a one-row DataFrame.
    if isinstance(data, dict):
        data = pd.DataFrame([data])
    # Enforce the data contract before any transformation runs.
    validate_data(data)
    X_transformed = feature_pipeline.transform(data[feature_pipeline.feature_names_in_])
    return model_pipeline.predict(X_transformed)
The function accepts both shapes of input intentionally: a DataFrame maps to batch inference, and a dictionary maps to the natural structure of a single API request body. Either way, validation runs first.
Results
The model is Gradient Boosting — selected in the research phase for consistent accuracy and low cross-validation variance. The pipeline adds no new features and changes no hyperparameters. Performance is identical to the notebook prototype, which is the point.
The ~4pp gap between cross-validation and leaderboard is expected — Kaggle's held-out test set is never seen during training or tuning, and the training set of 891 records limits generalization regardless of model choice.
Reflection
The clearest thing this project demonstrates is not technical — it is judgment about when to stop experimenting and start building.
The research phase was the right place to try algorithms, engineer features freely, and iterate without consequence. The pipeline phase is the right place to lock those decisions into a structure that another person — or a future version of yourself — can run, inspect, and modify without reconstructing the reasoning from scratch.
The skill is knowing which phase you are in, and not conflating the two.
This project shows the translation between the two: modeling decisions converted into a Python workflow with clear module boundaries, saved artifacts, and a consistent path from raw data to predictions — built to be extended rather than re-explained every time it runs.
Future Steps
The architecture is structured so the remaining steps are additions, not rewrites: