The Problem
Magento store data is customer data — orders, account details, addresses, and for merchants without a fully offloaded payment flow, payment history. Under GDPR it carries compliance obligations. Under basic business continuity requirements, losing it is unacceptable.
Managing backups individually per client doesn’t scale. Without a centralised system, backup quality is inconsistent: some stores are well covered, others only partially, and there is no portfolio-wide visibility to confidently assert that every store has working, tested recovery capability.
A second problem sits alongside this: developers need realistic data to work with locally. Synthetic data doesn’t reproduce the edge cases that appear in production data shapes. Real production data in developer environments is a compliance liability.
The Solution
A centralised backup system with tiered storage and automated integrity verification, plus a developer data pipeline that solves the realistic-data problem without touching real PII.
Automated Backups
Database snapshots: Daily, triggered by scheduled CI-style jobs (monitored; a missed job raises an alert). Full database dump, compressed, encrypted with a store-specific key before transmission. Each dump includes a manifest: store identifier, Magento version, database schema hash, and row counts for key tables.
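A minimal sketch of the dump-and-manifest step, assuming a MySQL backend and the stock mysqldump/mysql CLIs; the store identifier, credentials, and key-table list are hypothetical stand-ins for per-store configuration, and encryption is omitted for brevity:

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone

# Hypothetical per-store configuration; supplied by the scheduler in practice.
STORE_ID = "store-example"
DB_ARGS = ["-h", "127.0.0.1", "-u", "backup", "-psecret"]
DB_NAME = "magento"
KEY_TABLES = ["customer_entity", "sales_order", "catalog_product_entity"]

def row_count(table: str) -> int:
    """Count rows via the mysql CLI (--batch output: header line, then value)."""
    out = subprocess.run(
        ["mysql", *DB_ARGS, "--batch", "-e",
         f"SELECT COUNT(*) FROM {table}", DB_NAME],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.splitlines()[1])

def build_manifest() -> dict:
    # Hash a schema-only dump so DDL drift is detectable at restore time.
    # (--skip-comments keeps the output stable; AUTO_INCREMENT counters in
    # CREATE TABLE statements may still need normalising for a strict hash.)
    schema = subprocess.run(
        ["mysqldump", *DB_ARGS, "--no-data", "--skip-comments", DB_NAME],
        check=True, capture_output=True, text=True,
    ).stdout
    return {
        "store": STORE_ID,
        "magento_version": "2.4.x",  # read from the application in practice
        "schema_sha256": hashlib.sha256(schema.encode()).hexdigest(),
        "row_counts": {t: row_count(t) for t in KEY_TABLES},
        "created_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    print(json.dumps(build_manifest(), indent=2))
```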
Filesystem backups: Weekly, covering app/etc, pub/media, and any store-specific configuration outside version control. Incremental where possible to manage storage cost.
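“Incremental where possible” could take the shape of GNU tar’s --listed-incremental mode; a sketch with illustrative paths:

```python
import subprocess
from pathlib import Path

# Illustrative paths; real targets are app/etc, pub/media, and any
# store-specific configuration living outside version control.
store_root = Path("/var/www/store-example")
snapshot = Path("/var/backups/store-example/filesystem.snar")
archive = Path("/var/backups/store-example/filesystem-weekly.tar.gz")

# GNU tar keeps state in the snapshot file: the first run is a full
# (level-0) backup, later runs archive only what changed since the
# previous one. Deleting the snapshot file forces a fresh full backup.
subprocess.run(
    ["tar", f"--listed-incremental={snapshot}", "-czf", str(archive),
     "-C", str(store_root), "app/etc", "pub/media"],
    check=True,
)
```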
Trigger: Scheduled Ansible playbook, not server-side cron. Server cron is invisible to monitoring; CI-style job scheduling integrates with alerting infrastructure.
Storage Tiering
Warm storage (S3): Most recent 30 days. Access within minutes. Covers the operational recovery use case — most recovery requests are for recent data.
Cold storage (Glacier): Beyond 30 days. Significantly lower cost per GB. Access within hours. Covers compliance and long-term retention requirements.
Retention is configured per client to match their data retention requirements: a minimum of 90 days, with longer periods where industry or contractual obligations require them.
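The tiering maps naturally onto an S3 lifecycle rule; a sketch via boto3, with an illustrative bucket and prefix, and the 90-day floor as the expiration (stores with longer obligations would get a larger value from their config):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket; one lifecycle rule per store prefix. Objects stay
# in standard (warm) storage for 30 days, then transition to Glacier
# (cold). Expiration enforces the per-client retention floor of 90 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="acme-magento-backups",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "store-example-tiering",
                "Filter": {"Prefix": "store-example/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 90},
            }
        ]
    },
)
```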
Integrity Verification
Every backup is verified immediately after creation:
- Backup restored to a throwaway instance (containerised MySQL, provisioned and destroyed per verification)
- Row counts checked against the manifest — significant delta triggers an alert
- Schema integrity check — key tables present and structurally matching expected schema
- Full teardown of verification environment
A backup that can’t be verified is treated as a failed backup — alert dispatched, retry triggered. The “backup exists” assurance is only meaningful if “backup can be restored” has been verified.
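A minimal sketch of that verification pass, assuming Docker provides the throwaway MySQL instance and the dump has already been decrypted and decompressed; the readiness wait, delta threshold, and container handling are assumptions rather than the production code, and the schema check is elided for brevity:

```python
import json
import subprocess
import time
import uuid

ALLOWED_DELTA = 0.01  # tolerate 1% drift for tables written during the dump

def verify(dump_path: str, manifest_path: str) -> bool:
    with open(manifest_path) as f:
        manifest = json.load(f)
    name = f"verify-{uuid.uuid4().hex[:8]}"
    # Throwaway MySQL instance, provisioned per verification.
    subprocess.run(
        ["docker", "run", "-d", "--name", name,
         "-e", "MYSQL_ROOT_PASSWORD=verify", "-e", "MYSQL_DATABASE=magento",
         "mysql:8.0"],
        check=True, capture_output=True,
    )
    try:
        time.sleep(30)  # crude readiness wait; poll `mysqladmin ping` in practice
        # Restore the dump into the container.
        with open(dump_path, "rb") as dump:
            subprocess.run(
                ["docker", "exec", "-i", name,
                 "mysql", "-uroot", "-pverify", "magento"],
                stdin=dump, check=True,
            )
        # Row counts against the manifest; a significant delta fails the check.
        for table, expected in manifest["row_counts"].items():
            out = subprocess.run(
                ["docker", "exec", name, "mysql", "-uroot", "-pverify",
                 "--batch", "-e", f"SELECT COUNT(*) FROM magento.{table}"],
                check=True, capture_output=True, text=True,
            ).stdout
            actual = int(out.splitlines()[1])
            if expected and abs(actual - expected) / expected > ALLOWED_DELTA:
                return False  # caller alerts and triggers a retry
        return True
    finally:
        # Full teardown of the verification environment, pass or fail.
        subprocess.run(["docker", "rm", "-f", name],
                       check=True, capture_output=True)
```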
Developer Data Pipeline
A sanitisation pipeline that strips customer PII from a copy of the production database and produces a developer-usable dump (sketched in code after the list):
- Customer email addresses → customer-{id}@example.com
- Customer names → anonymised
- Physical addresses → replaced with plausible synthetic addresses
- Phone numbers → replaced
- Payment data → stripped entirely
- Order data, product data, category structure → preserved
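A sketch of that rewrite pass, run only ever against a copy of the database, never production; table and column names follow a stock Magento 2 schema and should be verified against the target version, and PyMySQL is an assumed client:

```python
import pymysql

# Statements follow a stock Magento 2 schema; verify against your version.
SANITISE = [
    # Customers: deterministic synthetic emails keyed on entity_id keep
    # uniqueness constraints and login-by-email flows intact.
    "UPDATE customer_entity SET"
    " email = CONCAT('customer-', entity_id, '@example.com'),"
    " firstname = 'Test', lastname = CONCAT('Customer', entity_id)",
    # Addresses: plausible synthetic replacements.
    "UPDATE customer_address_entity SET"
    " firstname = 'Test', lastname = 'Customer',"
    " street = '1 Example Street', telephone = '0000000000'",
    # Orders hold denormalised copies of customer PII; rewrite those too.
    "UPDATE sales_order SET"
    " customer_email = CONCAT('customer-', entity_id, '@example.com'),"
    " customer_firstname = 'Test', customer_lastname = 'Customer'",
    "UPDATE sales_order_address SET"
    " email = 'order@example.com', firstname = 'Test', lastname = 'Customer',"
    " street = '1 Example Street', telephone = '0000000000'",
    # Payment data is stripped entirely, not anonymised.
    "UPDATE sales_order_payment SET"
    " additional_information = NULL, cc_owner = NULL, cc_last_4 = NULL",
]

# Run against a dedicated sanitisation copy of the database.
conn = pymysql.connect(host="127.0.0.1", user="sanitiser",
                       password="secret", database="magento_copy")
try:
    with conn.cursor() as cur:
        for stmt in SANITISE:
            cur.execute(stmt)
    conn.commit()
finally:
    conn.close()
```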
The result: a database dump with real Magento data shapes — real product catalogues, real order histories, real attribute configurations — without any real customer data. The dump is available to engineers via a CLI command that pulls, imports, and sets up the local environment in a single step.
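The CLI itself isn’t shown here, so the following is only a hypothetical shape of that one-step command: pull the latest sanitised dump, decompress, and import it locally (assumes an authenticated AWS CLI and a running local MySQL):

```python
#!/usr/bin/env python3
"""Hypothetical one-step dev-data command: pull, import, done."""
import argparse
import subprocess

def main() -> None:
    parser = argparse.ArgumentParser(prog="dev-data")
    parser.add_argument("store", help="store identifier, e.g. store-example")
    args = parser.parse_args()

    dump = f"/tmp/{args.store}-sanitised.sql.gz"
    # Pull the latest sanitised dump (bucket and prefix are illustrative).
    subprocess.run(
        ["aws", "s3", "cp",
         f"s3://acme-magento-backups/sanitised/{args.store}/latest.sql.gz",
         dump],
        check=True,
    )
    # Decompress and import into the local environment in one shot.
    sql = subprocess.run(["gunzip", "-c", dump],
                         check=True, capture_output=True).stdout
    subprocess.run(["mysql", "-uroot", "magento"], input=sql, check=True)

if __name__ == "__main__":
    main()
```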
This solves two problems simultaneously. Developers get realistic data that reproduces production edge cases. Compliance teams have documented evidence that developers never access unredacted customer data.
Documentation
Per-client backup coverage is documented with the following, sketched as a record after the list:
- Storage location and retention policy
- RTO and RPO targets
- Last successful verification date and result
- Restore procedure (tested and documented, not theoretical)
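As a sketch, that coverage documentation reduces to a small per-store record; field names and example values here are hypothetical rather than the production schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class BackupCoverage:
    store: str
    storage_location: str     # e.g. "s3://acme-magento-backups/store-example/"
    retention_days: int       # 90-day floor; longer where contracts require
    rto_hours: float          # recovery time objective
    rpo_hours: float          # recovery point objective (24h for daily dumps)
    last_verified: date       # most recent integrity-verification run
    last_verified_result: str # "pass" / "fail"
    restore_procedure_url: str  # tested, documented runbook

# Illustrative record with made-up values.
record = BackupCoverage(
    store="store-example",
    storage_location="s3://acme-magento-backups/store-example/",
    retention_days=90,
    rto_hours=4.0,
    rpo_hours=24.0,
    last_verified=date(2024, 1, 15),
    last_verified_result="pass",
    restore_procedure_url="https://wiki.example.com/runbooks/store-example",
)
```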
Not “we have backups” — evidenced coverage that can be presented in a compliance audit or to a client’s legal team.
Impact
The portfolio has 100% documented backup coverage with verified recovery capability. The integrity verification step has caught several cases where a backup was technically present but would not have restored correctly — caught before anyone needed to rely on it.
The developer data pipeline has become a standard part of the development workflow. Engineers with realistic local data reproduce bugs faster, build features against representative data shapes, and don’t need to request production access for debugging tasks that only need data structure, not real data.