By Okparaji Wisdom | Data Scientist | Nigeria
Retailers in Nigeria lose millions of naira every year to two problems: stockouts (shelves go empty, customers leave) and overstock (too much inventory, capital tied up, goods expire). Both are avoidable with data.
So I built DemandForecast AI — a machine learning–powered app that predicts weekly product demand up to 26 weeks ahead, across 20 products in 4 retail categories.
In this article I’ll walk you through exactly how I built it, the technical decisions I made, and what I learned.
What the App Does
- Forecasts weekly demand for 20 retail products (Electronics, Fashion, Food & Grocery, Home & Living)
- Supports forecast horizons from 4 to 26 weeks
- Models Nigerian festivity demand spikes (December, Easter, New Year)
- Analyses the impact of promotions on demand lift
- Displays confidence bands on every forecast
- Shows model performance metrics (MAPE, MAE, RMSE) for all 20 models
Live app: https://demandforecast-ai-78egnrsv5ijehv4sayrduu.streamlit.app/
GitHub: github.com/Santandave961/demandforecast-ai
The Dataset
I generated a synthetic retail dataset of 3,140 weekly records spanning January 2022 to December 2024, covering 20 products across 4 categories.
Each record contains:
{
  "date": "2022-01-02",
  "category": "Food & Grocery",
  "product": "Rice (5kg)",
  "units_sold": 412,
  "price_naira": 18500.00,
  "promotion": 0,
  "month": 1,
  "week_of_year": 1,
  "year": 2022,
  "quarter": 1
}
The demand values were generated with realistic business logic baked in — trend, seasonality, and Nigerian festivity boosts:
prob = (
    base_demand * (1 + trend * i + seasonal + festivity_boost)
    + np.random.normal(0, base_demand * 0.08)
)
Nigerian festivity boosts applied:
- December → +35% (Christmas & New Year)
- January → +20% (New Year spending)
- April → +15% (Easter)
- November → +10% (pre-Christmas buildup)
Promotions randomly fire 15% of the time and boost demand by 25% while cutting price by 15% — simulating real promotional mechanics.
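To make the generation logic concrete, here is a minimal sketch of how one product's weekly demand could be simulated from the rules above. The function name `simulate_demand`, the default values for `base_demand`, `trend` and `seasonal_amplitude`, and the order in which noise and the promo lift are applied are all assumptions for illustration, not the values used to build the actual dataset. The 15% price cut during promotions is left out to keep the sketch short.

```python
import numpy as np

# Festivity boosts from the list above, keyed by calendar month.
FESTIVITY_BOOST = {12: 0.35, 1: 0.20, 4: 0.15, 11: 0.10}


def simulate_demand(months, base_demand=400.0, trend=0.001,
                    seasonal_amplitude=0.05, noise_scale=0.08,
                    promo_rate=0.15, seed=0):
    """Simulate weekly units sold for one product.

    `months` holds the calendar month (1-12) of each week, in order.
    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    months = np.asarray(months)
    n = len(months)
    i = np.arange(n)  # week counter, drives the linear trend

    # Smooth yearly wave plus the month-specific festivity boost.
    seasonal = seasonal_amplitude * np.sin(2 * np.pi * i / 52)
    festivity = np.array([FESTIVITY_BOOST.get(int(m), 0.0) for m in months])

    demand = base_demand * (1 + trend * i + seasonal + festivity)
    demand = demand + rng.normal(0, base_demand * noise_scale, size=n)

    # Promotions fire at random and lift demand by 25%.
    promotion = (rng.random(n) < promo_rate).astype(int)
    demand = demand * np.where(promotion == 1, 1.25, 1.0)

    units = np.clip(np.round(demand), 0, None).astype(int)
    return units, promotion


units, promotion = simulate_demand(months=[1, 2, 3, 4, 11, 12])
```

Setting `noise_scale=0` and `promo_rate=0` is a quick way to check that the boosts land where expected before switching the randomness back on.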
Feature Engineering
Raw dates aren’t useful to ML models. I converted them into numerical features, using sine and cosine (Fourier) terms to capture seasonality:
df["time_index"] = (df["date"] - df["date"].min()).dt.days
df["sin_week"] = np.sin(2 * np.pi * df["week_of_year"] / 52)
df["cos_week"] = np.cos(2 * np.pi * df["week_of_year"] / 52)
df["sin_month"] = np.sin(2 * np.pi * df["month"] / 12)
df["cos_month"] = np.cos(2 * np.pi * df["month"] / 12)
df["is_q4"] = (df["quarter"] == 4).astype(int)
Why Fourier features?
A raw month column tells the model January = 1 and December = 12, but doesn’t tell it they’re actually close together in seasonal behaviour. Sine and cosine transforms encode the circular nature of time — so the model understands that week 52 and week 1 are neighbours, not opposites.
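A small worked example makes the "neighbours, not opposites" point measurable. The sketch below uses only the standard library; `cyclical_encode` and `encoded_distance` are hypothetical helper names, not functions from the app.

```python
import math


def cyclical_encode(value, period):
    """Map a position in a cycle (e.g. week 1-52) to a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a, b, period):
    """Straight-line distance between two encoded positions."""
    sin_a, cos_a = cyclical_encode(a, period)
    sin_b, cos_b = cyclical_encode(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# Raw week numbers put week 52 and week 1 51 units apart.
raw_gap = abs(52 - 1)

# After encoding they sit next to each other (about 0.12 apart),
# while week 26 and week 1 are almost as far apart as the circle allows (about 2.0).
new_year_gap = encoded_distance(52, 1, period=52)
half_year_gap = encoded_distance(26, 1, period=52)
```

Both the sine and the cosine column are needed: a sine value alone is shared by two different weeks of the year, and the cosine is what tells them apart.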
The full feature set:
feature_cols = [
    "time_index",   # captures long-term trend
    "sin_week",     # position in the year, by week (52-week cycle)
    "cos_week",
    "sin_month",    # position in the year, by month (12-month cycle)
    "cos_month",
    "is_q4",        # Q4 festivity flag
    "promotion",    # promo indicator
    "price_naira",  # price elasticity
]
The Model
I trained a separate Linear Regression model for each of the 20 products. Each model learns the trend, seasonality pattern, and price/promo sensitivity specific to that product.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
model = LinearRegression()
model.fit(X_train, y_train)
preds = np.clip(model.predict(X_test), 0, None) # demand can't be negative
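The snippet above fits a single model. A minimal sketch of the per-product loop with a last-12-weeks hold-out might look like the following. The function name `train_per_product` and the tiny demo frame (two made-up products with a purely linear trend) are assumptions for illustration; the real app uses the full feature set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def train_per_product(df, feature_cols, target="units_sold", test_weeks=12):
    """Fit one LinearRegression per product, holding out the last `test_weeks` rows."""
    models, holdouts = {}, {}
    for product, group in df.groupby("product"):
        group = group.sort_values("date")
        train, test = group.iloc[:-test_weeks], group.iloc[-test_weeks:]

        model = LinearRegression().fit(train[feature_cols], train[target])
        preds = np.clip(model.predict(test[feature_cols]), 0, None)  # no negative demand

        models[product] = model
        holdouts[product] = (test[target].to_numpy(), preds)
    return models, holdouts


# Tiny illustrative frame: two hypothetical products, 40 weeks each.
weeks = np.arange(40)
demo = pd.concat(
    [
        pd.DataFrame({
            "date": pd.date_range("2022-01-02", periods=40, freq="7D"),
            "product": name,
            "time_index": weeks * 7,
            "units_sold": base + 2 * weeks,
        })
        for name, base in [("Product A", 100), ("Product B", 300)]
    ],
    ignore_index=True,
)

models, holdouts = train_per_product(demo, feature_cols=["time_index"])
```

Splitting by position after sorting on date keeps the test set strictly in the future of the training set, which matters for any time-series evaluation.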
Why not XGBoost or Prophet?
I specifically chose Linear Regression + Fourier features for the Streamlit Cloud deployment because:
- Minimal dependencies: scikit-learn is the only modelling library the app needs
- Fast training — all 20 models train in under a second on app startup
- Fourier features do the heavy lifting for seasonality, so a linear model performs well
- XGBoost fails silently on some Streamlit Cloud Python versions
In a production system I would use Prophet or XGBoost with lag features for higher accuracy.
Model Performance
Evaluation on the last 12 weeks (held-out test set) per product:
| Metric | Value |
|---|---|
| Avg MAPE | ~9.5% |
| Avg MAE | ~28 units |
| Avg RMSE | ~35 units |
MAPE (Mean Absolute Percentage Error) below 10% is generally considered good for retail demand forecasting.
mae = mean_absolute_error(y_test, preds)
rmse = np.sqrt(mean_squared_error(y_test, preds))
mape = np.mean(np.abs((y_test.values - preds) / (y_test.values + 1))) * 100
Note: I add 1 to the denominator to avoid division by zero on weeks with zero demand.
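For a feel of how the three metrics behave, here is a self-contained sketch with a worked example. `forecast_metrics` is a hypothetical helper name and the numbers are made up; the MAPE line keeps the same +1 adjustment in the denominator.

```python
import numpy as np


def forecast_metrics(y_true, y_pred):
    """Return MAE, RMSE and the +1-adjusted MAPE (in percent)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    # The +1 keeps the ratio finite when a week has zero demand.
    mape = np.mean(np.abs(errors) / (y_true + 1)) * 100
    return {"mae": mae, "rmse": rmse, "mape": mape}


# Two weeks: actual 99 and 199 units, forecast 109 and 179 units.
# Absolute errors are 10 and 20, so MAE is 15 and the adjusted MAPE is
# the mean of 10/100 and 20/200, i.e. 10%.
metrics = forecast_metrics([99, 199], [109, 179])

# A zero-demand week no longer raises a division error.
zero_week = forecast_metrics([0], [3])
```

The adjustment slightly understates the percentage error on low-volume products, so it is worth reading MAPE alongside MAE for slow sellers.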
Forecasting Future Demand
For future periods, I generate the feature rows synthetically — extending the time index forward and computing future Fourier values from the future dates:
def make_future_features(last_date, last_time_idx, periods, avg_price, promo_rate):
    rows = []
    for i in range(1, periods + 1):
        future_date = last_date + pd.Timedelta(weeks=i)
        week = future_date.isocalendar()[1]
        month = future_date.month
        rows.append({
            "date": future_date,
            "time_index": last_time_idx + i * 7,
            "sin_week": np.sin(2 * np.pi * week / 52),
            "cos_week": np.cos(2 * np.pi * week / 52),
            "sin_month": np.sin(2 * np.pi * month / 12),
            "cos_month": np.cos(2 * np.pi * month / 12),
            "is_q4": int(((month - 1) // 3 + 1) == 4),
            "promotion": 1 if np.random.rand() < promo_rate else 0,
            "price_naira": avg_price * np.random.uniform(0.95, 1.05),
        })
    return pd.DataFrame(rows)
Confidence bands are approximated as a fixed ±12% around the point forecast. This is a visual heuristic rather than a statistical interval, but it gives a useful sense of uncertainty.
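The band calculation itself is a one-liner per bound. A minimal sketch, assuming the point forecasts are already available as a list or array (`add_confidence_band` is a hypothetical helper name):

```python
import numpy as np


def add_confidence_band(point_forecast, band=0.12):
    """Return (lower, upper) bounds at a fixed percentage around each forecast."""
    point = np.clip(np.asarray(point_forecast, dtype=float), 0, None)
    lower = point * (1 - band)
    upper = point * (1 + band)
    return lower, upper


# Forecasts of 100 and 200 units give bands of 88-112 and 176-224.
lower, upper = add_confidence_band([100, 200])
```

Because the width is a fixed share of the forecast, the band does not widen with the horizon the way a model-based interval usually would. One possible refinement is to scale `band` by each product's hold-out error instead of using the same 12% everywhere.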
The Streamlit App
The app has 5 pages:
- Forecast — select product, horizon, promo rate → get forecast chart + table
- Model Performance — MAPE and RMSE charts for all 20 models
- Trend Explorer — historical demand lines + monthly seasonality heatmap
- Insights — promo impact analysis + Nigerian festivity calendar
- About — project details and links
One important Streamlit trick I used — @st.cache_resource to train all 20 models once at startup and reuse them across sessions:
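@st.cache_resource
def train_all_models(df):
    models, metrics = {}, {}
    for product in df["product"].unique():
        # train and store each model (hold-out evaluation omitted here)
        rows = df[df["product"] == product]
        model = LinearRegression().fit(rows[feature_cols], rows["units_sold"])
        models[product] = model
    return models, metrics, feature_cols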
Without this, the app would retrain 20 models on every user interaction — very slow.
Deployment
Deployed on Streamlit Community Cloud in 3 steps:
- Push to GitHub
- Connect repo at share.streamlit.io
- Add runtime.txt containing 3.11 to pin Python version
The runtime.txt file is critical — without it Streamlit Cloud may use Python 3.14+ which breaks some dependencies silently.
What I’d Improve in v2
- Replace Linear Regression with Prophet for better seasonality decomposition
- Add lag features (demand from last week, last month) for autocorrelation
- Connect to a real retail database (SQLite or PostgreSQL)
- Add inventory optimisation — recommend reorder points based on forecasts
- Deploy as a FastAPI backend with a Streamlit frontend
Key Takeaways
- Fourier features are a powerful, lightweight way to encode seasonality without needing Prophet
- Training one model per SKU can beat training one global model when products have very different demand patterns
- @st.cache_resource is essential for any Streamlit app that trains models at startup
- Nigerian retail has strong festivity-driven seasonality that generic models miss, so localisation matters
Connect
If you found this useful or want to collaborate on data science projects in the Nigerian tech space, connect with me:
- GitHub: github.com/Santandave961
- X: @Santandave961
- LinkedIn: Okparaji Wisdom
- Portfolio: santandave961.github.io
Tags: #python #machinelearning #datascience #streamlit #nigeria #retailtech #beginners #tutorial