Project 01
Network Anomaly Detection
Production-grade ML pipeline for network intrusion detection — trained on 2.83M labelled flows, containerised FastAPI inference deployed on AWS ECS Fargate, with automated CI/CD and runtime drift monitoring.
Overview
Built a full production ML pipeline for network intrusion detection on the CICIDS2017 dataset — 2.83M labelled flows, 15 attack classes, 48 features after preprocessing.
The core model is an XGBoost classifier (macro F1 = 0.9036) trained with sample weights to handle class imbalance without distorting the data distribution. An Isolation Forest, trained exclusively on benign traffic, acts as a novelty gate — flagging out-of-distribution flows before XGBoost classifies them.
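The gate-then-classify logic is simple at inference time. A minimal sketch, assuming already-fitted iso_forest and xgb_clf objects and a preprocessed feature matrix (function and variable names are illustrative, not the project's actual module layout):

```python
import numpy as np

def ensemble_predict(iso_forest, xgb_clf, X):
    """Two-stage ensemble: Isolation Forest gates, XGBoost classifies.

    Flows the benign-trained forest scores as out-of-distribution (-1)
    are flagged as anomalies; only the rest reach the classifier.
    """
    labels = np.empty(len(X), dtype=object)
    is_novel = iso_forest.predict(X) == -1  # -1 = out-of-distribution
    labels[is_novel] = "ANOMALY"
    if (~is_novel).any():
        labels[~is_novel] = xgb_clf.predict(X[~is_novel])
    return labels
```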
The pipeline runs end-to-end in production: a FastAPI inference API with four endpoints is containerised with Docker, pushed to AWS ECR, and deployed on ECS Fargate. GitHub Actions handles CI/CD — a push to main triggers a full rebuild and redeploy. Model artifacts are stored in S3 and pulled at startup via boto3. Runtime drift is monitored per feature using KS tests, logged to MLflow.
Pipeline
01 Preprocessing
Writes feature_names.json and scaler.pkl to artifacts/. 79 raw features reduced to 48.
02 Training
Benchmarks Random Forest, a Keras DNN, and XGBoost; XGBoost trained with sample weights is the production model (macro F1 = 0.9036).
03 Ensemble
An Isolation Forest, trained on BENIGN traffic only, gates out-of-distribution flows before they reach the XGBoost classifier.
04 Inference API
Four endpoints: /predict/classify, /predict/anomaly, /predict/ensemble, /predict/batch.
Model artifacts pulled from AWS S3 at startup via boto3. Interactive docs at /docs.
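A sketch of the startup pull, assuming the bucket name comes from an environment variable and that the artifact keys are fixed file names (both hypothetical here):

```python
import os
import boto3

def pull_artifacts(local_dir: str = "models") -> None:
    """Download model artifacts from S3 once, when the API starts."""
    bucket = os.environ["MODEL_BUCKET"]  # hypothetical env var name
    keys = ["xgb_model.pkl", "isolation_forest.pkl", "scaler.pkl"]  # illustrative
    os.makedirs(local_dir, exist_ok=True)
    s3 = boto3.client("s3")
    for key in keys:
        s3.download_file(bucket, key, os.path.join(local_dir, key))
```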
05 CI/CD & Drift Monitoring
drift.py computes per-feature Kolmogorov–Smirnov statistics against the training distribution at runtime. KS > 0.5 triggers a high-drift alert; results are logged to MLflow.
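A minimal sketch of the KS check, assuming a reference sample retained from the training set; compute_drift and the alert print are illustrative stand-ins, with the 0.5 threshold matching the rule above:

```python
import numpy as np
from scipy.stats import ks_2samp

def compute_drift(reference: np.ndarray, live: np.ndarray,
                  feature_names: list[str], threshold: float = 0.5) -> dict:
    """Two-sample KS statistic per feature: live traffic vs. training data."""
    results = {}
    for i, name in enumerate(feature_names):
        stat, _ = ks_2samp(reference[:, i], live[:, i])
        results[name] = stat
        if stat > threshold:
            # stand-in for the real alert and mlflow.log_metric(...) call
            print(f"HIGH DRIFT: {name} KS={stat:.3f}")
    return results
```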
Model Performance
| Model | Approach | Macro F1 |
|---|---|---|
| Random Forest v1 | SMOTE oversampling | 0.8785 |
| Random Forest v2 | class_weight='balanced' | 0.8696 |
| Keras DNN | Categorical cross-entropy | 0.3371 |
| XGBoost | Sample weights (production) | 0.9036 |
| Isolation Forest | BENIGN-only novelty detection | — |
| IF → XGBoost Ensemble | Novelty gate + classifier | 0.87* |
* Ensemble F1 is lower than standalone XGBoost by design — IF gates traffic that XGBoost was never trained on. The ensemble's value is breadth of detection, not benchmark maximisation.
Design Decisions
Sample weights over SMOTE
SMOTE introduces synthetic samples that can distort minority-class decision boundaries. Sample weights rebalance training without touching the data distribution — preferable for production where inference runs on raw flows.
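A sketch of the weighting approach using scikit-learn's 'balanced' heuristic on imbalanced toy data (the project's exact weighting scheme may differ):

```python
from sklearn.datasets import make_classification
from sklearn.utils.class_weight import compute_sample_weight
from xgboost import XGBClassifier

# Imbalanced toy data standing in for the preprocessed flow features.
X_train, y_train = make_classification(
    n_samples=2000, n_classes=3, n_informative=6,
    weights=[0.85, 0.10, 0.05], random_state=0,
)

# Each sample is weighted by n_samples / (n_classes * class_count): rare
# classes count more during training, but no synthetic rows are added.
weights = compute_sample_weight(class_weight="balanced", y=y_train)

model = XGBClassifier(objective="multi:softprob", eval_metric="mlogloss")
model.fit(X_train, y_train, sample_weight=weights)
```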
No log1p transformation
Tree-based models are invariant to monotonic feature transformations. Log1p was evaluated and deliberately dropped; the notebooks document this with supporting evidence.
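The invariance is easy to check empirically. A toy demonstration (not from the project's notebooks), with a single decision tree standing in for the boosted ensemble:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.exponential(scale=1000, size=(500, 1))  # heavy-tailed, like byte counts
y = (X[:, 0] > 800).astype(int)

raw = DecisionTreeClassifier(random_state=0).fit(X, y)
logd = DecisionTreeClassifier(random_state=0).fit(np.log1p(X), y)

# log1p preserves sample ordering, so the trees pick equivalent splits
# and produce identical predictions.
assert (raw.predict(X) == logd.predict(np.log1p(X))).all()
```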
Isolation Forest as a novelty gate, not a classifier
IF is trained on BENIGN-only traffic to flag out-of-distribution flows. Hyperparameter tuning was deliberately skipped — overfitting the contamination parameter to labelled benchmarks would undermine its real purpose.
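A sketch of the benign-only fit, assuming X_train, y_train, and X_live are the preprocessed arrays from the pipeline and, per the decision above, default IsolationForest hyperparameters:

```python
from sklearn.ensemble import IsolationForest

# Fit on BENIGN flows only: the forest models "normal", so a -1 score at
# inference means the flow falls outside the benign training distribution.
benign = X_train[y_train == "BENIGN"]
iso_forest = IsolationForest(random_state=42).fit(benign)

# Deliberately no contamination tuning: the gate should stay generic.
novel_mask = iso_forest.predict(X_live) == -1
```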
Deterministic preprocessing
Correlation-based feature dropping uses np.triu(k=1) to guarantee a stable, reproducible column set regardless of execution order.
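A sketch of the idiom, assuming a pandas DataFrame of raw features and an illustrative 0.95 correlation threshold:

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one feature from every highly correlated pair, deterministically.

    np.triu(k=1) keeps only the strict upper triangle of the correlation
    matrix, so each pair is considered exactly once and the same columns
    are dropped on every run.
    """
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```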
Project Structure
notebooks/
src/
api/
monitor/
artifacts/ git-ignored · feature_names.json, scaler.pkl
models/ git-ignored · stored in AWS S3
What I Learned
Getting XGBoost to 0.90 macro F1 was the straightforward part — the harder work was designing the system around it. Keeping preprocessing deterministic, giving the Isolation Forest a genuinely different purpose from the classifier, and arguing why a lower ensemble F1 is actually the correct outcome all required thinking beyond benchmark numbers.
Deploying to ECS Fargate and wiring up CI/CD made clear how much production ML differs from notebook ML. Artifact management, environment variables, container versioning, and drift monitoring never appear in a Kaggle competition — but they dominate real deployments.