danielnnadi / inventory-forecasting / demand_forecast.py
Python
scikit-learn
Case Study

How I Built a Python Demand Forecasting Model That Saved $18,000 in Emergency Procurement

D
pandas NumPy scikit-learn Supply Chain

Emergency procurement is expensive. Not just financially - it's disruptive. When a part runs out unexpectedly, everything stops: you're calling suppliers at the last minute, paying premium prices, and watching operational timelines slip. At the automotive parts company where I work, this was happening regularly across our 500+ SKU inventory.

The root cause wasn't bad purchasing decisions. It was reactive purchasing decisions - we were restocking after stock hit zero rather than before. I was asked to look at whether data could help. This is what I built, what I learned, and what it saved.

$18k Annual savings
65% Fewer emergency orders
12+ Months of training data

Understanding the Problem First

Before writing a single line of Python, I spent time understanding the procurement workflow. Emergency orders were being triggered when warehouse staff noticed a part was nearly gone - by which point the lead time from suppliers meant we'd already have a gap. The question wasn't "can we predict demand?" but specifically: "can we predict demand early enough to order within normal lead times?"

That reframing was important. I wasn't trying to build a state-of-the-art forecasting engine. I was trying to predict, for each SKU, whether stock would fall below the reorder threshold within the next 14 days - early enough for a standard purchase order to work.

Design Principle

Build the model your business needs, not the most technically impressive model. A well-scoped simple model beats an ambitious complex one that doesn't fit the workflow.

The Data

I pulled 13 months of stock movement data from our SQL inventory database - daily snapshots of quantity on hand per SKU, combined with purchase order history, supplier lead times, and sales records. After cleaning, I had roughly 195,000 rows across 512 SKUs.

Python 01_load_and_clean.py
import pandas as pd
import numpy as np
from sqlalchemy import create_engine

# Connect to inventory database
engine = create_engine("mssql+pyodbc://user:pass@server/InventoryDB")

# Pull 13 months of daily stock movements
query = """
    SELECT
        sm.sku_id,
        sm.movement_date,
        sm.qty_on_hand,
        sm.qty_dispatched,
        sm.qty_received,
        i.reorder_threshold,
        i.category_code,
        s.avg_lead_days
    FROM stock_movements sm
    JOIN inventory i ON sm.sku_id = i.sku_id
    JOIN suppliers s ON i.primary_supplier_id = s.supplier_id
    WHERE sm.movement_date >= DATEADD(MONTH, -13, GETDATE())
"""

df = pd.read_sql(query, engine)

# Parse dates and sort
df['movement_date'] = pd.to_datetime(df['movement_date'])
df = df.sort_values(['sku_id', 'movement_date'])

# Flag: will stock drop below threshold in next 14 days?
df['target'] = (
    df.groupby('sku_id')['qty_on_hand']
      .transform(lambda x: x.shift(-14) < df['reorder_threshold'])
).astype(int)

print(df['target'].value_counts())
# Output:
# 0    162,043 (no stockout risk)
# 1     33,211 (stockout risk within 14 days)

Feature Engineering

Raw stock numbers aren't enough. The model needs context: how fast is this SKU moving? Is demand seasonal? How many days of stock remain at the current consumption rate? I engineered several features to capture this.

Python 02_features.py
def engineer_features(df):
    # Rolling average daily consumption (7-day and 30-day)
    df['avg_consumption_7d'] = (
        df.groupby('sku_id')['qty_dispatched']
          .transform(lambda x: x.rolling(7, min_periods=1).mean())
    )
    df['avg_consumption_30d'] = (
        df.groupby('sku_id')['qty_dispatched']
          .transform(lambda x: x.rolling(30, min_periods=1).mean())
    )

    # Days of stock remaining at current consumption rate
    df['days_of_stock'] = np.where(
        df['avg_consumption_7d'] > 0,
        df['qty_on_hand'] / df['avg_consumption_7d'],
        999  # no consumption = no risk
    )

    # Stock buffer relative to reorder threshold
    df['stock_buffer_ratio'] = (
        df['qty_on_hand'] / df['reorder_threshold'].clip(lower=1)
    )

    # Month and day-of-week for seasonality
    df['month'] = df['movement_date'].dt.month
    df['day_of_week'] = df['movement_date'].dt.dayofweek

    return df

df = engineer_features(df)
Watch Out: Data Leakage

When creating lag features, always use .shift() so the model never sees "future" consumption during training. I made this mistake early and my validation accuracy was suspiciously high - a sure sign of leakage.

Model Selection and Training

I tested three approaches: a simple threshold rule (if days_of_stock < lead_days, flag it), a Gradient Boosting Classifier, and a Random Forest. The threshold rule had reasonable recall but too many false positives - it was essentially flagging the same SKUs every day. Random Forest gave the best balance between precision and recall on the holdout set.

Python 03_train.py
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import classification_report, precision_recall_curve
from sklearn.preprocessing import LabelEncoder

FEATURES = [
    'qty_on_hand', 'avg_consumption_7d', 'avg_consumption_30d',
    'days_of_stock', 'stock_buffer_ratio',
    'avg_lead_days', 'month', 'day_of_week',
    'category_encoded'
]

le = LabelEncoder()
df['category_encoded'] = le.fit_transform(df['category_code'])

X = df[FEATURES]
y = df['target']

# Time-series split - NEVER use random shuffle on time-series data
tscv = TimeSeriesSplit(n_splits=5)

model = RandomForestClassifier(
    n_estimators=200,
    max_depth=12,
    class_weight='balanced',  # handle class imbalance
    random_state=42,
    n_jobs=-1
)

# Train on all but last fold, validate on last
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
    y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    print(f"Fold {fold+1}:", classification_report(y_val, preds))

Results Across Model Comparison

Model Precision Recall F1 Score False Positives / week
Threshold Rule 0.51 0.79 0.62 approximately 47
Gradient Boosting 0.74 0.71 0.72 approximately 18
Random Forest ✓ 0.78 0.76 0.77 approximately 11

Precision mattered more than recall here. False positives (unnecessary early orders) cost money and storage space. False negatives (missed stockouts) cost more - but the purchasing team had a manual fallback. So I tuned the classification threshold slightly toward precision.

Deployment: Making It Actionable

A model that lives in a Jupyter notebook saves no one any money. The final step was integrating predictions into our daily workflow. I wrote a script that runs each morning, scores all active SKUs, and outputs a prioritised reorder list - sorted by days of stock remaining - directly into the shared operations folder as a formatted Excel report.

Python 04_daily_score.py
import schedule import time import joblib model = joblib.load("models/rf_demand_v2.pkl") def generate_daily_reorder_list(): # Pull today's snapshot df_today = get_today_snapshot(engine) df_today = engineer_features(df_today) # Score with model df_today['stockout_risk'] = model.predict_proba( df_today[FEATURES] )[:, 1] # Filter and sort by urgency alerts = ( df_today[df_today['stockout_risk'] > 0.65] .sort_values('days_of_stock') [['sku_id', 'product_name', 'qty_on_hand', 'days_of_stock', 'stockout_risk', 'avg_lead_days']] ) # Export to shared ops folder today_str = pd.Timestamp.today().strftime('%Y-%m-%d') alerts.to_excel( f"//ops-share/reorder-alerts/alert_{today_str}.xlsx", index=False ) print(f"[{today_str}] {len(alerts)} SKUs flagged for reorder") # Schedule for 07:00 daily schedule.every().day.at("07:00").do(generate_daily_reorder_list) while True: schedule.run_pending() time.sleep(60)

What Actually Drove the $18,000 Saving

The saving wasn't from perfect predictions - the model isn't perfect. It came from shifting purchasing from emergency channels (phone orders, courier freight, above-list prices) to standard purchase orders placed with 7-14 days lead time. That alone is the price difference. The model reduced emergency orders by 65% over the first 8 months of deployment. The cost differential per emergency order averaged around $55-80, across roughly 300-350 fewer emergency orders per year.

"The model's job isn't to be right every time. Its job is to shift enough decisions from reactive to proactive that the cost difference compounds into real savings."

What I Would Do Differently

  1. Model per SKU category sooner. I initially trained one model across all SKUs. Brake parts and engine components have very different demand patterns. Splitting by category_code improved F1 by approximately 4 points after retraining.
  2. Include supplier reliability as a feature. Some suppliers have variable lead times. A SKU flagged 10 days before threshold with a supplier who averages 12 days is already a problem. I added lead time variance in v2.
  3. Build the Excel output earlier. I initially delivered predictions as a CSV, which the purchasing team didn't adopt. Formatting the output as a colour-coded Excel report with priority tiers drove actual usage. Adoption is the metric that matters.

The full code, including feature engineering and the trained model weights, is available on my GitHub. If you're working on a similar problem in supply chain or operations, feel free to open an issue or fork the repo.

Python Machine Learning pandas scikit-learn Supply Chain Demand Forecasting Random Forest Data Science