If your company runs on an on-premise ERP, you know the drill: the data is in there, but getting it out is painful. Export to Excel, build manual reports, cross-reference spreadsheets between sales, purchasing, and inventory. Every time someone needs a number, the hunt begins.
The problem isn't your ERP. It does what it was designed to do: invoicing, inventory management, recording transactions. The problem is that your ERP wasn't designed to be your analytics tool. And forcing it into that role ends in endless spreadsheets, outdated data, and decisions made in the dark.
The solution is to extract your data and move it somewhere purpose-built for analysis. That's a cloud data warehouse. And in this guide, we'll show you exactly how to do it.
What is a data warehouse and why should you care
A data warehouse is a database designed specifically for analytics. Unlike your ERP's database (which is optimized for recording transactions), a data warehouse is optimized for answering questions: How much did we sell last month? What's our margin by product? Which customers owe us more than 90 days?
When your data lives in a data warehouse, you can:
- Build dashboards that update automatically
- Cross-reference sales with inventory with purchasing in a single view
- Run historical analysis across years without slowing down your ERP
- Share information with your team without depending on the one person who "knows the system"
We use Google BigQuery as our data warehouse of choice. It's serverless (no infrastructure to maintain), scales automatically, and costs a few dollars per month for a typical SMB.
The complete architecture: from ERP to dashboard
Before diving into the steps, here's the full picture of what we're building:
Your ERP (local server) β Extraction agent (Python) β Google Cloud Storage (raw files) β BigQuery (data warehouse) β dbt (transformation) β Looker Studio (dashboards)
It sounds like a lot, but each piece has a clear role and once it's set up, it runs on its own. Let's go piece by piece.
Step 1: Understand what data your ERP holds
Most on-premise ERPs run on SQL Server (Express or full), PostgreSQL, or MySQL. All the information you see on screen β customers, products, invoices, inventory, vendors β lives in SQL tables that you can query directly.
The most relevant tables for analytics typically fall into these categories:
- Sales transactions: invoice headers and line items
- Inventory: current stock levels by warehouse
- Products/items: master catalog with descriptions, categories, costs
- Customers: contact info, credit terms, sales history
- Purchasing: purchase orders, vendor invoices
- Accounts receivable: outstanding balances, aging
You don't need to extract everything. Start with the tables that answer the most urgent questions for your business. In our experience, sales + inventory + receivables cover 80% of what an SMB needs to see.
Never modify your ERP's tables directly. All database operations should be read-only (SELECT). Modifying data can break the system's integrity.
Step 2: Extract data automatically
The extraction is handled by a lightweight Python agent that runs on the same server as your ERP (or any machine with database access).
The agent does three things:
- Connects to SQL Server using
pyodbcand runs read-only queries on the ERP tables - Generates compressed CSV files (
.csv.gz) with the extracted data - Uploads the files to Google Cloud Storage via HTTPS
The agent is scheduled with Windows Task Scheduler (or cron on Linux) to run once or twice daily β typically in the early morning when nobody is using the ERP. Each run takes between 1 and 5 minutes depending on data volume.
# Simplified extraction flow
import pyodbc
import pandas as pd
from google.cloud import storage
# Connect to SQL Server (ERP database)
conn = pyodbc.connect(
'DRIVER={SQL Server};'
'SERVER=localhost;'
'DATABASE=ERP_DB;'
'Trusted_Connection=yes;'
)
# Extract sales transactions
query = """
SELECT h.*, d.item_code, d.quantity, d.unit_price
FROM sales_header h
INNER JOIN sales_detail d ON h.invoice_id = d.invoice_id
WHERE h.invoice_date >= DATEADD(month, -3, GETDATE())
"""
df = pd.read_sql(query, conn)
# Compress and upload to Cloud Storage
df.to_csv('/tmp/sales.csv.gz', compression='gzip', index=False)
# ... upload to GCS bucketThe agent doesn't need ERP application credentials. It connects directly to the SQL Server instance using Windows authentication or a read-only database user. It never touches the ERP application itself.
What if my ERP is hosted remotely?
If your ERP runs on a remote server or terminal server (common with hosted setups), the agent is installed directly on that server. You just need your hosting provider to grant access to install Python and schedule a task. It's a lightweight script that doesn't impact ERP performance.
Step 3: Store in the cloud (Data Lake)
The compressed CSV files land in a Google Cloud Storage bucket organized by date:
gs://your-company-datalake/
βββ raw/
βββ 2026-04-20/
β βββ sales.csv.gz
β βββ inventory.csv.gz
β βββ customers.csv.gz
βββ 2026-04-19/
βββ sales.csv.gz
βββ ...
This is your "smart backup" of business data. Even though your ERP continues running normally, you have a complete history of your data in the cloud, organized and versioned. That alone has value: if you ever need to see what your inventory looked like 6 months ago, it's there.
Storage cost for a typical SMB is less than USD 1 per month. Literally pennies.
Step 4: Load into BigQuery (Data Warehouse)
From Cloud Storage, data is automatically loaded into BigQuery as raw tables. This load can be triggered every time a new file arrives, or scheduled with Cloud Scheduler.
At this stage, data arrives exactly as it came from the ERP: same column names, same formats, same codes. It's not easy to analyze yet β but it's now in a place where we can work with it.
BigQuery
βββ your_company_raw
βββ raw_sales (invoice headers + details)
βββ raw_customers (customer master)
βββ raw_items (product catalog)
βββ raw_inventory (stock levels)
βββ raw_purchasing (purchase orders)
βββ raw_receivables (account balances)
Step 5: Transform with dbt (the key step)
This is where raw data becomes useful information. dbt (data build tool) transforms data using pure SQL, with built-in testing and documentation.
What dbt does in our pipeline:
- Cleaning: renames cryptic ERP columns (like
GVA12.T_COMPβdocument_type), converts date formats, handles null values - Standardization: normalizes categories, unifies codes, creates calculated fields (margin, receivables aging, days of stock)
- Modeling: creates clean dimensions and fact tables ready for analytics
The result is clean, clear tables:
BigQuery
βββ your_company_analytics
βββ dim_customers (name, region, category, tenure)
βββ dim_items (description, category, family, cost)
βββ fact_sales (date, customer, item, qty, amount, margin)
βββ fact_purchasing (date, vendor, item, qty, amount)
βββ fact_inventory (item, warehouse, qty, valuation)
βββ fact_receivables (customer, document, amount, balance, aging)
The advantage of dbt is that every transformation is documented and tested. If a margin calculation goes wrong, there's a test that catches it automatically. It's the difference between a fragile spreadsheet and a robust system.
Step 6: Visualize with Looker Studio (Dashboards)
With clean data in BigQuery, building dashboards is the fastest part. We use Looker Studio (free from Google), which connects natively to BigQuery.
The 5 dashboards we build as a baseline:
- Executive: total sales, margin, top customers, monthly trend
- Sales: analysis by rep, product, region, year-over-year comparison
- Inventory: current levels, stockouts, overstock, turnover
- Receivables: total debt, aging, delinquent accounts, trend
- Purchasing: spend by vendor, payment terms, price comparison
Each dashboard updates automatically with every extraction. Your team opens a link, sees today's numbers, and makes decisions. No asking someone to "pull the data."
What this costs
Let's be concrete about monthly costs for a typical SMB:
- Google Cloud Storage: less than USD 1/month
- BigQuery: between USD 5 and USD 20/month (depends on query volume)
- Looker Studio: free
- Agent server: you already have it (same server running your ERP)
- Cloud Scheduler + Functions: less than USD 1/month
Total infrastructure cost: USD 10-25 per month. Less than a license for any reporting tool.
Development and implementation cost is separate and depends on your case's complexity, but it's a one-time investment. Once built, the pipeline runs on its own with minimal operational costs.
How long does implementation take
For a typical SMB with a standard ERP:
- Week 1: Table mapping, query definition, agent installation
- Week 2: Cloud Storage + BigQuery setup, initial data loads
- Week 3: dbt transformations (staging + analytics models)
- Week 4: Looker Studio dashboards + team feedback and adjustments
In one month you have the complete pipeline running. And from there, adding new tables, new dashboards, or new analyses is incremental β you never start from scratch.
What if I use a different ERP?
The architecture is identical for any ERP running on a relational database. Whether you use SAP Business One, Odoo, NetSuite, or any other, what changes are the extraction queries and the transformation models. The cloud infrastructure is the same.
Even if your data source is an Excel file that someone exports daily, we can start there and professionalize later.
Conclusion
Having your company's data trapped in a local ERP is like having money in a safe without the key. You know it's there, but you can't use it when you need it.
Connecting your ERP to a cloud data warehouse isn't an innovation project. It's basic infrastructure that every SMB should have: accessible data, up-to-date dashboards, and decisions based on real information instead of intuition.
The cost is low, implementation is fast, and the impact is immediate from the first dashboard your team opens in the morning.
At Richmond Analytics, we design and build this architecture for SMBs. If you want to see your business data in dashboards that update automatically, let's talk.
