AI Skill Course · Course 2 · Intermediate
Module 11 of 12
Course 2 · Module 11 · 70 minutes

From notebook
to a real
production API.

A model in a Jupyter notebook helps nobody. To create value, the model has to run when users call it — reliably, at scale, with proper error handling. This module is the bridge: from model.predict() in a cell to a live HTTPS endpoint at api.yourcompany.com/predict. You'll watch a real API evolve from naive to production-grade in three stages.

You'll build: A FastAPI service
You'll test: Real HTTP requests
You'll score: Your deploy readiness
[Figure: From cell to cloud. Notebook (model.ipynb, localhost, 1 user) → API (FastAPI, wrapped as /predict) → cloud (live, deployed, ∞ users). Sample request: POST /predict · 200 OK · 23ms.]
Part 01 · The gap

What works in a notebook
breaks in production.

A Jupyter notebook is a fantastic place to develop. It is a terrible place to serve users. Four things change the moment your model has to handle real traffic.

Concurrency

One user vs. one thousand.

A notebook serves one user (you). A production API needs to handle hundreds of simultaneous requests without falling over.

Notebook: You run a cell. Wait. See output.
Production: 10/s requests · auto-scaling · queues
Reliability

Crashes are catastrophic.

In a notebook, a crash just means re-run the cell. In production, a crash means downtime — and possibly waking someone up at 3am.

Notebook: Bad input → exception → restart kernel
Production: Validate everything · return proper errors · stay up
Observability

You can't watch every prediction.

You can eyeball notebook outputs. You can't eyeball a million daily predictions. You need logs, metrics, and alerts that tell you when something's wrong.

Notebook: Print statements · visual inspection
Production: Structured logs · dashboards · alerting
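To make "structured logs" concrete, here is a minimal sketch using only Python's standard `logging` and `json` modules. It writes one JSON object per prediction so a log tool can filter and aggregate them. The logger name, the field names, and the `log_prediction` helper are assumptions made for illustration, not part of the course code.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": round(record.created, 3),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} are merged into the JSON line.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (stdout in a real service)."""
    logger = logging.getLogger("prediction_service")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def log_prediction(logger, features, label, latency_ms):
    """Log one prediction with its input, output, and latency."""
    logger.info(
        "prediction",
        extra={"fields": {"features": features, "label": label, "latency_ms": latency_ms}},
    )


# Usage: an in-memory buffer stands in for stdout so the output is easy to inspect.
buffer = io.StringIO()
logger = build_logger(buffer)
log_prediction(logger, [5.1, 3.5, 1.4, 0.2], "setosa", 23)
print(buffer.getvalue().strip())
```

In a deployed service you would likely pass `sys.stdout` instead of the buffer and let the hosting platform collect the lines.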
Reproducibility

"It worked on my machine" doesn't fly.

The model that worked in your notebook had specific Python, library, and OS versions. Production needs to recreate that environment exactly — every time.

Notebook: conda env · works for you · today
Production: Dockerfile · requirements pinned · works always
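One way to find out what "that environment" actually is: record it from inside the notebook. The sketch below uses only the standard library to capture the Python version, the OS, and the installed version of each package you name, then formats them as pinned requirement lines. The function names and the package list are illustrative assumptions.

```python
import platform
from importlib import metadata


def environment_fingerprint(packages):
    """Return the Python, OS, and package versions of the current environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        "packages": versions,
    }


def as_pinned_requirements(fingerprint):
    """Format the installed packages as name==version lines for requirements.txt."""
    return [
        f"{name}=={version}"
        for name, version in sorted(fingerprint["packages"].items())
        if version is not None
    ]


# The last name is deliberately bogus to show how a missing package is reported.
fingerprint = environment_fingerprint(["scikit-learn", "fastapi", "surely-not-installed-pkg"])
print(fingerprint)
print(as_pinned_requirements(fingerprint))
```

Saving this fingerprint next to the model file gives you something to compare against when production behaves differently from the notebook.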
Part 02 · Hands on · Build an API in 3 stages

Watch an API evolve from
naive to production-grade.

Below is a FastAPI service that wraps an Iris classifier, shown in three stages of growing sophistication. Switch between them, watch the code change, and send simulated HTTP requests to see what happens at each stage. Bonus: in Stage 1, try sending broken input. The server crashes. That's the whole point: you'll fix it in Stage 2.

How to use it.

Pick a stage at the top, and the code panel updates to show that stage's implementation. Then, on the right, pick an endpoint, choose a body preset (or write your own JSON), and click Send Request. The simulated server runs that stage's logic and returns an HTTP-style response with a status code, latency, and explanatory notes.

[Interactive demo: code panel (main.py, stage 1), request panel (POST, JSON body), and response panel.]
Part 03 · The 5-stage deployment pipeline

Five steps from commit
to live URL.

Every ML deployment, regardless of platform, goes through these five stages. Modern tools collapse multiple stages into one click, but knowing the underlying flow helps you debug when something breaks.

01

Serialize

Save the trained model to disk so it can be loaded by the API. Pickle or joblib for scikit-learn models, ONNX for cross-framework portability, GGUF for LLMs.

// joblib · pickle · ONNX
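A minimal sketch of the serialize step, using the standard-library `pickle` module. The "model" here is a hypothetical stand-in (a dict of class centroids with illustrative values) so the example runs without any ML library. With a real scikit-learn estimator, the usual equivalents are `joblib.dump(model, path)` and `joblib.load(path)`. The checksum is an added assumption that ties in with model versioning: it lets the API confirm it loaded the file you meant to ship.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: per-class centroids of
# (petal_length, petal_width). The numbers are illustrative only.
model = {
    "centroids": {
        "setosa": (1.5, 0.2),
        "versicolor": (4.3, 1.3),
        "virginica": (5.6, 2.0),
    }
}


def predict(model, petal_length, petal_width):
    """Return the class whose centroid is closest to the input."""
    def squared_distance(name):
        cx, cy = model["centroids"][name]
        return (cx - petal_length) ** 2 + (cy - petal_width) ** 2
    return min(model["centroids"], key=squared_distance)


def save_model(model, path):
    """Write the model to disk and return a SHA-256 checksum of the file."""
    data = pickle.dumps(model)
    Path(path).write_bytes(data)
    return hashlib.sha256(data).hexdigest()


def load_model(path, expected_checksum):
    """Load the model, refusing files whose checksum does not match."""
    data = Path(path).read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_checksum:
        raise ValueError("model file does not match the recorded checksum")
    return pickle.loads(data)


# Round trip: save in the "training" step, load in the "API" step.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    checksum = save_model(model, model_path)
    restored = load_model(model_path, checksum)

print(predict(restored, 1.4, 0.2))  # prints "setosa"
```

Only unpickle files you produced yourself: loading a pickle can execute arbitrary code.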
02

Wrap

Put the model behind an HTTP API. The whole world calls your model the same way: an HTTP request to your endpoint.

// FastAPI · Flask · LitServe
03

Containerize

Package the code, model, and dependencies into a Docker image. The same image runs everywhere: your laptop, AWS, a friend's cluster.

// Dockerfile · docker build
04

Deploy

Push the image to a hosting platform that runs it on demand. The platform handles HTTPS, scaling, restarts.

// HF Spaces · Railway · AWS
05

Monitor

Watch latency, errors, and prediction quality post-deployment. Set up alerts so you know before users complain.

// Datadog · Sentry · Grafana
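The monitoring tools above do this for you, but the underlying arithmetic is simple. Here is a sketch, under assumed thresholds, of tracking p95 latency and error rate over a rolling window of recent requests. The `RequestMetrics` class, the window size, and the alert limits are hypothetical choices for illustration.

```python
from collections import deque


class RequestMetrics:
    """Keep the most recent requests and summarize latency and error rate."""

    def __init__(self, window=1000):
        self.samples = deque(maxlen=window)  # (latency_ms, is_error) pairs

    def record(self, latency_ms, status_code):
        # Count 5xx responses as errors; 4xx responses are the client's problem.
        self.samples.append((latency_ms, status_code >= 500))

    def p95_latency_ms(self):
        """Nearest-rank 95th percentile of the recorded latencies."""
        latencies = sorted(latency for latency, _ in self.samples)
        if not latencies:
            return 0.0
        rank = (95 * len(latencies) + 99) // 100  # integer ceil(0.95 * n)
        return latencies[rank - 1]

    def error_rate(self):
        """Fraction of recorded requests that ended in a server error."""
        if not self.samples:
            return 0.0
        return sum(1 for _, is_error in self.samples if is_error) / len(self.samples)

    def alerts(self, max_p95_ms=200, max_error_rate=0.01):
        """Return a message for each threshold that is currently exceeded."""
        messages = []
        if self.p95_latency_ms() > max_p95_ms:
            messages.append(f"p95 latency {self.p95_latency_ms()}ms exceeds {max_p95_ms}ms")
        if self.error_rate() > max_error_rate:
            messages.append(f"error rate {self.error_rate():.1%} exceeds {max_error_rate:.1%}")
        return messages


# Usage: 100 requests with latencies of 1..100 ms, two of which fail with a 500.
metrics = RequestMetrics()
for i in range(1, 101):
    metrics.record(latency_ms=i, status_code=500 if i in (10, 20) else 200)

print(metrics.p95_latency_ms(), metrics.error_rate(), metrics.alerts())
```

Prediction quality, the third thing to watch, needs ground-truth labels or drift statistics and is not covered by this sketch.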
Part 04 · Where to deploy

Six platforms.
Pick the one that fits.

Deployment platforms have multiplied in the last five years. These six cover most cases, from "free demo this weekend" to "100M predictions a day."

Beginner · Free

Hugging Face Spaces

huggingface.co/spaces

Free hosting for ML demos. Push your Gradio or Streamlit app, get a public URL. Perfect for portfolios and quick prototypes.

Best for: Demos, prototypes
Cost: Free
Setup time: 10 minutes
Scales to: Small traffic
Beginner · Free

Streamlit Cloud

share.streamlit.io

Free hosting for Streamlit apps. Connect to GitHub, push code, deploy. Excellent for data dashboards and internal tools.

Best for: Dashboards, internal tools
Cost: Free tier generous
Setup time: 5 minutes
Scales to: Medium traffic
Intermediate · Paid

Railway / Render

railway.app · render.com

Push any Dockerfile or Python service, get a URL. No Kubernetes hell. Autoscaling, HTTPS, custom domains included.

Best for: Real APIs
Cost: $5+ per month
Setup time: 30 minutes
Scales to: Real production
Intermediate · Paid

Modal / Replicate

modal.com · replicate.com

Serverless GPU-backed ML hosting. Starts containers on demand (expect a cold-start delay after idle periods) and scales to zero when idle. Pay per second of GPU time.

Best for: GPU inference, batch jobs
Cost: Per-second GPU usage
Setup time: 1 hour
Scales to: Massive bursts
Advanced · Cloud

AWS / GCP / Azure

SageMaker · Vertex · Azure ML

Full hyperscaler ML platforms. Everything you'd need at any scale — but with significant complexity. Most enterprises end up here.

Best for: Enterprise scale
Cost: Variable, often expensive
Setup time: Days to weeks
Scales to: Anything
Advanced · DIY

Self-hosted / k8s

Your own server · Kubernetes

Maximum control, maximum responsibility. Run on your own hardware or VPS. Use when costs, compliance, or latency demand it.

Best for: Privacy, custom needs
Cost: Hardware + your time
Setup time: Days+
Scales to: Whatever you build
Part 05 · Hands on · Are you ready?

12 questions.
Score your deploy-readiness.

Check off what you've actually done for a model you want to ship. The verdict at the top updates live. If you score below 7, your weekend will be longer than you planned.

Deployment readiness
// Model
Model is serialized to disk
joblib, pickle, ONNX, or similar — can be loaded outside the training notebook
Model version is tracked
Git tag, MLflow run ID, or Weights & Biases run, so you can roll back
Training data is versioned
DVC, snapshots, or at minimum a checksum — so retraining is reproducible
// API
Input validation (Pydantic/schemas)
Bad inputs return 422 errors, not 500 crashes
Proper error handling
try/except for predictable failures, appropriate HTTP status codes
/health endpoint exists
Lets your platform's load balancer know the service is alive
// Infrastructure
Containerized (Dockerfile)
Same image runs on your laptop and in production — zero "works on my machine"
Config via environment variables
No hardcoded API keys, DB URLs, or model paths in your source
Pinned dependencies
requirements.txt or pyproject.toml with exact versions — not "scikit-learn" but "scikit-learn==1.3.2"
// Monitoring
Structured logs (inputs + outputs)
Can replay any prediction, debug any complaint — JSON logs to stdout
Latency & error rate tracked
A dashboard answers: p95 latency? error rate? requests/second?
Alerts on anomalies
Get paged before users notice — error rate spike, latency spike, model drift
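As one example from the checklist, "config via environment variables" can be sketched with the standard library alone. The variable names (`MODEL_PATH`, `API_KEY`, `LOG_LEVEL`) and the `Settings` class are hypothetical. The point is that secrets and paths come from the environment, and that the service refuses to start when a required one is missing.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Runtime configuration for the service (field names are illustrative)."""
    model_path: str
    api_key: str
    log_level: str


def load_settings(env=os.environ):
    """Build settings from environment variables, failing fast if a required one is missing."""
    missing = [name for name in ("MODEL_PATH", "API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        model_path=env["MODEL_PATH"],
        api_key=env["API_KEY"],
        log_level=env.get("LOG_LEVEL", "INFO"),  # optional, with a default
    )


# Usage: a plain dict stands in for the real environment here.
settings = load_settings({"MODEL_PATH": "/models/iris.pkl", "API_KEY": "example-key"})
print(settings.model_path, settings.log_level)
```

Calling `load_settings()` with no argument reads the real process environment, which is what you would do at service startup.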
Part 06 · Knowledge check

Five questions on what
you just deployed.

Aim for 4/5. Wrong answers explain themselves.

Course 2 · Module 11 complete

You can now ship
what you build.

You watched a real API mature from naive to production-grade. You know what makes deployments reliable — and what makes them brittle. You can score your own projects against a 12-point readiness rubric. The notebook-to-production gap is no longer a mystery. It's just a checklist.

Up next · Course 2 · Module 12 · CAPSTONE

Your Own End-to-End ML Project

The grand finale. You'll synthesize every skill from this course into one complete project: pick a problem, prepare data, train a model, evaluate it, deploy it. The capstone is where the course modules become a portfolio piece.
