Emergency procurement is expensive. Not just financially - it's disruptive. When a part runs out unexpectedly, everything stops: you're calling suppliers at the last minute, paying premium prices, and watching operational timelines slip. At the automotive parts company where I work, this was happening regularly across our 500+ SKU inventory.
The root cause wasn't bad purchasing decisions. It was reactive purchasing decisions - we were restocking after stock hit zero rather than before. I was asked to look at whether data could help. This is what I built, what I learned, and what it saved.
Before writing a single line of Python, I spent time understanding the procurement workflow. Emergency orders were being triggered when warehouse staff noticed a part was nearly gone - by which point the lead time from suppliers meant we'd already have a gap. The question wasn't "can we predict demand?" but specifically: "can we predict demand early enough to order within normal lead times?"
That reframing was important. I wasn't trying to build a state-of-the-art forecasting engine. I was trying to predict, for each SKU, whether stock would fall below the reorder threshold within the next 14 days - early enough for a standard purchase order to work.
Build the model your business needs, not the most technically impressive model. A well-scoped simple model beats an ambitious complex one that doesn't fit the workflow.
I pulled 13 months of stock movement data from our SQL inventory database - daily snapshots of quantity on hand per SKU, combined with purchase order history, supplier lead times, and sales records. After cleaning, I had roughly 195,000 rows across 512 SKUs.
import pandas as pd import numpy as np from sqlalchemy import create_engine # Connect to inventory database engine = create_engine("mssql+pyodbc://user:pass@server/InventoryDB") # Pull 13 months of daily stock movements query = """ SELECT sm.sku_id, sm.movement_date, sm.qty_on_hand, sm.qty_dispatched, sm.qty_received, i.reorder_threshold, i.category_code, s.avg_lead_days FROM stock_movements sm JOIN inventory i ON sm.sku_id = i.sku_id JOIN suppliers s ON i.primary_supplier_id = s.supplier_id WHERE sm.movement_date >= DATEADD(MONTH, -13, GETDATE()) """ df = pd.read_sql(query, engine) # Parse dates and sort df['movement_date'] = pd.to_datetime(df['movement_date']) df = df.sort_values(['sku_id', 'movement_date']) # Flag: will stock drop below threshold in next 14 days? df['target'] = ( df.groupby('sku_id')['qty_on_hand'] .transform(lambda x: x.shift(-14) < df['reorder_threshold']) ).astype(int) print(df['target'].value_counts()) # Output: # 0 162,043 (no stockout risk) # 1 33,211 (stockout risk within 14 days)
Raw stock numbers aren't enough. The model needs context: how fast is this SKU moving? Is demand seasonal? How many days of stock remain at the current consumption rate? I engineered several features to capture this.
def engineer_features(df): # Rolling average daily consumption (7-day and 30-day) df['avg_consumption_7d'] = ( df.groupby('sku_id')['qty_dispatched'] .transform(lambda x: x.rolling(7, min_periods=1).mean()) ) df['avg_consumption_30d'] = ( df.groupby('sku_id')['qty_dispatched'] .transform(lambda x: x.rolling(30, min_periods=1).mean()) ) # Days of stock remaining at current consumption rate df['days_of_stock'] = np.where( df['avg_consumption_7d'] > 0, df['qty_on_hand'] / df['avg_consumption_7d'], 999 # no consumption = no risk ) # Stock buffer relative to reorder threshold df['stock_buffer_ratio'] = ( df['qty_on_hand'] / df['reorder_threshold'].clip(lower=1) ) # Month and day-of-week for seasonality df['month'] = df['movement_date'].dt.month df['day_of_week'] = df['movement_date'].dt.dayofweek return df df = engineer_features(df)
When creating lag features, always use .shift() so the model never sees "future" consumption during training. I made this mistake early and my validation accuracy was suspiciously high - a sure sign of leakage.
I tested three approaches: a simple threshold rule (if days_of_stock < lead_days, flag it), a Gradient Boosting Classifier, and a Random Forest. The threshold rule had reasonable recall but too many false positives - it was essentially flagging the same SKUs every day. Random Forest gave the best balance between precision and recall on the holdout set.
from sklearn.ensemble import RandomForestClassifier from sklearn.model_selection import TimeSeriesSplit from sklearn.metrics import classification_report, precision_recall_curve from sklearn.preprocessing import LabelEncoder FEATURES = [ 'qty_on_hand', 'avg_consumption_7d', 'avg_consumption_30d', 'days_of_stock', 'stock_buffer_ratio', 'avg_lead_days', 'month', 'day_of_week', 'category_encoded' ] le = LabelEncoder() df['category_encoded'] = le.fit_transform(df['category_code']) X = df[FEATURES] y = df['target'] # Time-series split - NEVER use random shuffle on time-series data tscv = TimeSeriesSplit(n_splits=5) model = RandomForestClassifier( n_estimators=200, max_depth=12, class_weight='balanced', # handle class imbalance random_state=42, n_jobs=-1 ) # Train on all but last fold, validate on last for fold, (train_idx, val_idx) in enumerate(tscv.split(X)): X_train, X_val = X.iloc[train_idx], X.iloc[val_idx] y_train, y_val = y.iloc[train_idx], y.iloc[val_idx] model.fit(X_train, y_train) preds = model.predict(X_val) print(f"Fold {fold+1}:", classification_report(y_val, preds))
| Model | Precision | Recall | F1 Score | False Positives / week |
|---|---|---|---|---|
| Threshold Rule | 0.51 | 0.79 | 0.62 | approximately 47 |
| Gradient Boosting | 0.74 | 0.71 | 0.72 | approximately 18 |
| Random Forest ✓ | 0.78 | 0.76 | 0.77 | approximately 11 |
Precision mattered more than recall here. False positives (unnecessary early orders) cost money and storage space. False negatives (missed stockouts) cost more - but the purchasing team had a manual fallback. So I tuned the classification threshold slightly toward precision.
A model that lives in a Jupyter notebook saves no one any money. The final step was integrating predictions into our daily workflow. I wrote a script that runs each morning, scores all active SKUs, and outputs a prioritised reorder list - sorted by days of stock remaining - directly into the shared operations folder as a formatted Excel report.
import schedule import time import joblib model = joblib.load("models/rf_demand_v2.pkl") def generate_daily_reorder_list(): # Pull today's snapshot df_today = get_today_snapshot(engine) df_today = engineer_features(df_today) # Score with model df_today['stockout_risk'] = model.predict_proba( df_today[FEATURES] )[:, 1] # Filter and sort by urgency alerts = ( df_today[df_today['stockout_risk'] > 0.65] .sort_values('days_of_stock') [['sku_id', 'product_name', 'qty_on_hand', 'days_of_stock', 'stockout_risk', 'avg_lead_days']] ) # Export to shared ops folder today_str = pd.Timestamp.today().strftime('%Y-%m-%d') alerts.to_excel( f"//ops-share/reorder-alerts/alert_{today_str}.xlsx", index=False ) print(f"[{today_str}] {len(alerts)} SKUs flagged for reorder") # Schedule for 07:00 daily schedule.every().day.at("07:00").do(generate_daily_reorder_list) while True: schedule.run_pending() time.sleep(60)
The saving wasn't from perfect predictions - the model isn't perfect. It came from shifting purchasing from emergency channels (phone orders, courier freight, above-list prices) to standard purchase orders placed with 7-14 days lead time. That alone is the price difference. The model reduced emergency orders by 65% over the first 8 months of deployment. The cost differential per emergency order averaged around $55-80, across roughly 300-350 fewer emergency orders per year.
"The model's job isn't to be right every time. Its job is to shift enough decisions from reactive to proactive that the cost difference compounds into real savings."
The full code, including feature engineering and the trained model weights, is available on my GitHub. If you're working on a similar problem in supply chain or operations, feel free to open an issue or fork the repo.