The Problem
Magento store data is customer data — orders, account details, addresses, and for merchants without a fully offloaded payment flow, payment history. Under GDPR it carries compliance obligations. Under basic business continuity requirements, losing it is unacceptable.
Managing backups client by client doesn’t scale. Without a centralised system, backup quality is inconsistent: some stores are well covered, others only partially, and there is no portfolio-wide visibility to confidently assert that every store has working, tested recovery capability.
A second problem sits alongside this: developers need realistic data to work with locally. Synthetic data doesn’t reproduce the edge cases that appear in production data shapes. Real production data in developer environments is a compliance liability.
The Solution
A centralised backup system with tiered storage and automated integrity verification, plus a developer data pipeline that solves the realistic-data problem without touching real PII.
Automated Backups
Database snapshots: Daily, triggered by scheduled CI-style jobs (monitored — missed jobs alert). Full database dump, compressed, encrypted with a store-specific key before transmission.
Filesystem backups: Weekly, covering app/etc, pub/media, and any store-specific configuration outside version control. Incremental where possible to manage storage cost.
Trigger: Scheduled system job, with external monitoring to ensure backups land in the right place on schedule. Missed backup triggers an alert to the on-call engineer.
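As a rough illustration of the missed-backup check, the sketch below flags any store whose latest backup is older than its schedule allows. The grace periods, the `last_seen` structure, and the function name are assumptions made for this example, not details of the actual monitoring setup.

```python
from datetime import datetime, timedelta, timezone

# Daily database snapshots and weekly filesystem backups, as described above.
# The extra hours are an assumed grace period before a backup counts as missed.
MAX_AGE = {
    "database": timedelta(days=1, hours=2),
    "filesystem": timedelta(days=7, hours=6),
}


def missed_backups(last_seen, now):
    """Return (store, kind) pairs whose latest backup is older than allowed.

    last_seen maps (store, kind) to the datetime the most recent backup
    landed in storage, or None if no backup has ever landed.
    """
    missed = []
    for (store, kind), landed_at in sorted(last_seen.items()):
        if landed_at is None or now - landed_at > MAX_AGE[kind]:
            missed.append((store, kind))
    return missed


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    example = {
        ("store-a", "database"): now - timedelta(hours=3),
        ("store-b", "database"): now - timedelta(days=2),
    }
    for store, kind in missed_backups(example, now):
        print(f"ALERT: {kind} backup missed for {store}")
```

Checking what actually landed in storage, rather than whether the job ran, also catches jobs that exit cleanly without producing a usable backup.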
Storage Tiering
Hot storage (central server): Immediate access to the most recent backups. Retained for 7 days to cover operational needs, including development and ephemeral environments that may need recent data.
Warm storage (S3): Most recent 30 days. Access within minutes. Covers the operational recovery use case — most recovery requests are for recent data.
Cold storage (Glacier): Beyond 30 days. Significantly lower cost per GB. Access within hours. Covers compliance and long-term retention requirements.
Retention configured per client based on their specific data retention requirements — minimum 90 days, some stores with longer retention based on their industry or contractual obligations.
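The tiering rules can be expressed as a small function of backup age. This is a minimal sketch: the function name is hypothetical, boundary days are treated as inclusive, and a backup is reported under the fastest tier that holds it, all of which are assumptions rather than statements about the real system.

```python
from datetime import date

HOT_DAYS = 7             # hot storage on the central server
WARM_DAYS = 30           # warm storage in S3
MIN_RETENTION_DAYS = 90  # minimum per-client retention


def tier_for(backup_date, today, retention_days=MIN_RETENTION_DAYS):
    """Return the fastest tier holding a backup, or "expired"."""
    if retention_days < MIN_RETENTION_DAYS:
        raise ValueError("retention must be at least 90 days")
    age = (today - backup_date).days
    if age < 0:
        raise ValueError("backup date is in the future")
    if age <= HOT_DAYS:
        return "hot"
    if age <= WARM_DAYS:
        return "warm"
    if age <= retention_days:
        return "cold"  # Glacier, until the client's retention period ends
    return "expired"


if __name__ == "__main__":
    today = date.today()
    print(tier_for(today, today))
```

A client with a longer contractual retention period would pass a larger `retention_days`, which only extends how long backups stay in the cold tier.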
Storage Locations
We mirror both cold and warm storage across multiple cloud providers (AWS and GCP) to mitigate the risk of provider-specific outages or issues, and we use multi-AZ redundancy within each provider.
Storage locations are chosen to maximise durability and availability during regional outages. For example, if the primary warm storage on AWS is in Stockholm, the GCP copy might be in Madrid: different regions and different underlying infrastructure, reducing the risk of a single event affecting both copies.
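A placement rule like this can be checked mechanically. The sketch below assumes a hypothetical `Replica` record per storage copy and flags placements where every copy sits with one provider or where two copies share a physical location. The region identifiers in the usage example are the AWS and GCP names for Stockholm and Madrid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    provider: str  # e.g. "aws" or "gcp"
    region: str    # provider region identifier
    city: str      # physical location, used to spot co-located copies


def placement_problems(replicas):
    """Return a list of human-readable problems with a set of copies."""
    problems = []
    if len({r.provider for r in replicas}) < 2:
        problems.append("all copies are with a single provider")
    if len({r.city for r in replicas}) < len(replicas):
        problems.append("two or more copies share a physical location")
    return problems


if __name__ == "__main__":
    copies = [
        Replica("aws", "eu-north-1", "Stockholm"),
        Replica("gcp", "europe-southwest1", "Madrid"),
    ]
    print(placement_problems(copies) or "placement looks independent")
```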
Integrity Verification
Every backup is verified as soon as it lands in hot storage. The verification process includes:
- Backup restored to a throwaway instance (containerised MySQL, provisioned and destroyed per verification)
- Schema integrity check — key tables present and structurally matching expected schema
- Full teardown of verification environment
A backup that can’t be verified is treated as a failed backup — alert dispatched, retry triggered. The “backup exists” assurance is only meaningful if “backup can be restored” has been verified.
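The schema integrity step can be sketched as a comparison between an expected schema and what the restored instance reports. The function name, the data shape, and the column types below are illustrative assumptions. In practice the `actual` mapping would be built from the throwaway MySQL instance, for example by querying `information_schema.columns`.

```python
def schema_problems(expected, actual):
    """Compare two schemas given as {table: {column: type}} mappings.

    Returns a list of problems. An empty list means every expected table
    and column is present with the expected type.
    """
    problems = []
    for table, columns in sorted(expected.items()):
        if table not in actual:
            problems.append(f"missing table: {table}")
            continue
        for column, col_type in sorted(columns.items()):
            found = actual[table].get(column)
            if found is None:
                problems.append(f"missing column: {table}.{column}")
            elif found != col_type:
                problems.append(
                    f"type mismatch: {table}.{column} is {found}, expected {col_type}"
                )
    return problems


if __name__ == "__main__":
    # Illustrative subset of key tables; types are placeholders.
    expected = {
        "sales_order": {"entity_id": "int", "grand_total": "decimal"},
        "customer_entity": {"entity_id": "int", "email": "varchar"},
    }
    restored = {"sales_order": {"entity_id": "int", "grand_total": "decimal"}}
    for problem in schema_problems(expected, restored):
        print("VERIFICATION FAILED:", problem)
```

Any non-empty result would mark the backup as failed, matching the rule that an unverifiable backup is treated as no backup at all.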
Regular staging syncs also serve as ongoing verification of the restore process — staging environments are refreshed from production backups, so any issues with backup integrity or restore procedures are caught in the staging environment before they impact a real recovery scenario.
Developer Data Pipeline
An on-demand sanitisation pipeline that strips customer PII from a copy of the production database and produces developer-usable dumps. Dumps are available as anonymised catalog-only or anonymised full dumps; full raw dumps are restricted to a few authorised developers.
The sanitisation process includes:
- Customer email addresses → customer-{id}@example.com
- Customer names → anonymised
- Physical addresses → replaced with plausible synthetic addresses
- Phone numbers → replaced
- Payment data → stripped entirely
- Order data, product data, category structure → preserved
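The rules above can be sketched as a per-row transformation. This is a simplified illustration, not the pipeline itself: it assumes a flattened row with hypothetical field names, a tiny hard-coded pool of synthetic streets, and a fixed placeholder phone number.

```python
# Illustrative pool; a real pipeline would need a much larger one.
SYNTHETIC_STREETS = ["1 Example Street", "22 Sample Road", "300 Test Avenue"]
# Assumed names for payment-related fields that are dropped entirely.
PAYMENT_FIELDS = {"cc_last_4", "cc_type", "cc_owner"}


def sanitise_row(row):
    """Return a copy of a flattened customer/order row with PII replaced."""
    cid = row["customer_id"]
    # Payment data is stripped, not masked.
    clean = {k: v for k, v in row.items() if k not in PAYMENT_FIELDS}
    replacements = {
        "email": f"customer-{cid}@example.com",
        "firstname": "Customer",
        "lastname": str(cid),
        # Deterministic choice keeps repeated runs stable for the same customer.
        "street": SYNTHETIC_STREETS[cid % len(SYNTHETIC_STREETS)],
        "telephone": "0000000000",
    }
    for field, value in replacements.items():
        if field in clean:
            clean[field] = value
    return clean


if __name__ == "__main__":
    raw = {
        "customer_id": 42,
        "email": "jane@real.test",
        "firstname": "Jane",
        "lastname": "Doe",
        "street": "5 Real Lane",
        "telephone": "555 0100",
        "cc_last_4": "4242",
        "grand_total": 99.5,
    }
    print(sanitise_row(raw))
```

Deriving replacement values from the customer ID keeps relationships between customers and their orders intact, which is what preserves the production data shapes.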
The result: a database dump with real Magento data shapes — real product catalogues, real order histories, real attribute configurations — without any real customer data. The dump is available to engineers via a CLI command that pulls, imports, and sets up the local environment in a single step.
This solves two problems simultaneously. Developers get realistic data that reproduces production edge cases. Compliance teams have documented evidence that day-to-day development never touches unredacted customer data, with raw dumps limited to the few authorised developers noted above.
Documentation
Per-client backup coverage is documented with:
- Storage location and retention policy
- RTO and RPO targets
- Last successful verification date and result
- Restore procedure (tested and documented, not theoretical)
Not “we have backups” — evidenced coverage that can be presented in a compliance audit or to a client’s legal team.
Impact
The portfolio has 100% documented backup coverage with verified recovery capability. The integrity verification step has caught several cases where a backup was technically present but would not have restored correctly — caught before anyone needed to rely on it.
The developer data pipeline has become a standard part of the development workflow. Engineers with realistic local data reproduce bugs faster, build features against representative data shapes, and don’t need to request production access for debugging tasks that only need data structure, not real data.
The multi-region, multi-provider storage strategy has also provided reassurance during region-specific outages, such as in the UAE during the recent Iran and USA conflict, when AWS data centres in the region were impacted.