Project Phoenix Rebuilding: End-to-End Backup and Access Architecture
This writeup documents the full buildout of the environment, including OneDrive to S3 backup, NAS to S3 backup, lifecycle rules into Deep Archive, EC2 role permissions, non-EC2 key usage, Microsoft Entra app registrations for stable OAuth handling, and Cloudflare Tunnel for EC2, Windows, and Synology DSM. What started as “protect my own data without spending a fortune” became a full architecture exercise around identity, object storage, validation, lifecycle economics, and zero-trust access.
1. Objectives
The design goals were simple on paper:
- Back up Microsoft OneDrive reliably without refresh token problems killing long jobs.
- Back up a 100–130 TB NAS footprint in phases.
- Use object storage as the landing zone.
- Use archive storage for long-term cost control.
- Eliminate exposed ports and avoid legacy VPN dependency.
- Make the system converge over time instead of depending on one perfect transfer.
- Maintain a true disaster-recovery backup, including NAS sidecar/indexing artifacts where desired.
The final design naturally split into two planes:
Data plane
OneDrive / NAS → rclone → S3 bucket → Glacier Deep Archive
Access plane
Client → Cloudflare Access → Entra ID → Tunnel → protected service
2. Core Components
- rclone
- AWS S3
- AWS Glacier Deep Archive
- AWS EC2
- IAM roles and IAM user keys
- Microsoft Entra app registrations
- Cloudflare Tunnel / cloudflared
- Synology DSM + Docker
3. S3 Bucket Layout
The primary bucket used was:
phoenix-backup-archive
Prefixes were organized by purpose:
onedrive-chris/
onedrive-pam/
nas/
Under nas/, prefixes mirrored the exact NAS folder structure:
nas/Movies - NAS/
nas/UHD Movies - NAS/
nas/TV Shows - NAS/
nas/Documents/
nas/music/
nas/video/
That exact namespace symmetry mattered because it ensured future delta copies from the NAS would line up with already-uploaded content instead of duplicating or restructuring it.
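The namespace symmetry can be sketched as a pure mapping from a NAS-relative path to its object key. This is only an illustration: the helper names nas_object_key and rclone_destination are hypothetical, and the bucket and remote names are taken from this writeup.

```python
from pathlib import PurePosixPath

BUCKET = "phoenix-backup-archive"  # bucket name from this writeup
NAS_PREFIX = "nas"                 # top-level prefix for NAS content


def nas_object_key(relative_path: str) -> str:
    """Map a path relative to the NAS share root to its S3 object key.

    Folder names are kept exactly as they are on the NAS (spaces, case,
    " - NAS" suffix) so a later delta copy lands on existing objects.
    """
    parts = PurePosixPath(relative_path).parts
    if not parts or parts[0] == "/" or ".." in parts:
        raise ValueError(f"expected a relative path, got {relative_path!r}")
    return "/".join((NAS_PREFIX, *parts))


def rclone_destination(folder: str, remote: str = "aws_archive") -> str:
    """Build the rclone destination string for one top-level NAS folder."""
    return f"{remote}:{BUCKET}/{nas_object_key(folder)}"
```

For example, rclone_destination("UHD Movies - NAS") yields aws_archive:phoenix-backup-archive/nas/UHD Movies - NAS, which is the same prefix a seed copy from an external disk would need to target.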
4. Lifecycle Rules and Deep Archive
The storage cost model depended on not leaving backup data in S3 Standard any longer than necessary. S3 Standard was treated as the landing zone. Glacier Deep Archive was treated as the resting state.
The most important lesson here was that S3 lifecycle rules match on bucket-relative prefixes, not URIs and not filesystem-style paths.
Correct examples:
nas/video/
nas/music/
onedrive-chris/
Incorrect examples that matched nothing:
s3://phoenix-backup-archive/nas/video/
/nas/video/
Because versioning was enabled, lifecycle had to consider both current and noncurrent versions:
- Current versions → transition to Deep Archive after 7 days
- Noncurrent versions → transition later
- Optional version cleanup later
Another operational lesson: lifecycle does not run immediately when a rule is created or corrected. Objects become eligible, then AWS evaluates and processes them asynchronously. There is no “run now” button.
That meant lifecycle should not be enabled for a prefix until the copy had completed, validation had completed, and the dataset was considered ready for archive.
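The prefix and versioning points above can be sketched in a few lines. This is an illustration under stated assumptions, not the rules actually deployed: the function names are hypothetical, the 30-day noncurrent transition is a placeholder (the writeup only says "later"), and the output is shaped like the lifecycle configuration document S3 accepts.

```python
import json
from typing import Dict, Iterable


def normalize_prefix(raw: str, bucket: str) -> str:
    """Reduce a URI or filesystem-style path to a bucket-relative prefix."""
    prefix = raw.strip()
    uri_head = f"s3://{bucket}/"
    if prefix.startswith(uri_head):
        prefix = prefix[len(uri_head):]
    elif prefix.startswith("s3://"):
        raise ValueError(f"{raw!r} is not a path inside bucket {bucket!r}")
    prefix = prefix.lstrip("/")
    if not prefix:
        raise ValueError("refusing to build a rule for the whole bucket")
    return prefix if prefix.endswith("/") else prefix + "/"


def deep_archive_rule(prefix: str, current_days: int = 7,
                      noncurrent_days: int = 30) -> Dict:
    """One rule covering current and noncurrent versions under a prefix.

    noncurrent_days=30 is an assumed placeholder, not a value from the build.
    """
    return {
        "ID": "deep-archive-" + prefix.strip("/").replace("/", "-"),
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": current_days, "StorageClass": "DEEP_ARCHIVE"}
        ],
        "NoncurrentVersionTransitions": [
            {"NoncurrentDays": noncurrent_days, "StorageClass": "DEEP_ARCHIVE"}
        ],
    }


def lifecycle_configuration(prefixes: Iterable[str], bucket: str) -> Dict:
    """Build a full configuration from prefixes that passed validation."""
    return {"Rules": [deep_archive_rule(normalize_prefix(p, bucket))
                      for p in prefixes]}


if __name__ == "__main__":
    config = lifecycle_configuration(["nas/video/"], "phoenix-backup-archive")
    print(json.dumps(config, indent=2))
```

Normalizing first means a pasted s3://phoenix-backup-archive/nas/video/ or /nas/video/ both become nas/video/ instead of silently matching nothing. Only prefixes whose copy and validation have finished should be passed in.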
5. EC2 Role Permissions for S3
The EC2 instance used role-based auth, not hardcoded keys.
That meant the EC2 instance could talk to S3 using temporary credentials from the attached IAM role. The role needed permissions like:
- s3:ListBucket
- s3:GetBucketLocation
- s3:PutObject
- s3:GetObject
- s3:AbortMultipartUpload
- s3:ListBucketMultipartUploads and s3:ListMultipartUploadParts, as needed for multipart uploads
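A policy document along those lines might look like the sketch below. It is an assumption about shape, not the policy actually attached to the role: the function name and Sid values are hypothetical, and the two multipart listing actions are included as the likely meaning of "multipart/list permissions". The main point it illustrates is that bucket-level actions attach to the bucket ARN while object-level actions attach to the objects under it.

```python
import json
from typing import Dict


def backup_role_policy(bucket: str) -> Dict:
    """Build a least-privilege S3 policy for a backup writer on one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "BucketLevel",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                    "s3:ListBucketMultipartUploads",
                ],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to keys under the bucket.
                "Sid": "ObjectLevel",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(backup_role_policy("phoenix-backup-archive"), indent=2))
```

Note that no delete permission is granted, which fits a copy-only backup model.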
The EC2 S3 remote looked like this:
[aws_archive]
type = s3
provider = AWS
env_auth = true
region = us-east-2
This worked on EC2 because AWS provides role credentials automatically through the instance metadata service. It does not work the same way on a Synology or workstation outside AWS.
6. Access Key / Secret Key Usage Outside EC2
For systems outside AWS, role-style EC2 auth was not available. That applied to the Windows workstation ingest host and the Synology NAS.
So an IAM user with scoped bucket permissions was created for non-EC2 access. The handling model was straightforward:
- Create IAM user
- Generate Access Key ID and Secret Access Key
- Use those in rclone.conf on non-EC2 systems
The actual secret key is intentionally not reproduced here.
7. rclone Configuration Model
EC2 S3 remote
[aws_archive]
type = s3
provider = AWS
env_auth = true
region = us-east-2
Non-EC2 S3 remote
[aws_archive]
type = s3
provider = AWS
access_key_id = <redacted>
secret_access_key = <redacted>
region = us-east-2
Keeping the remote name aws_archive the same across environments reduced command drift and made scripts portable.
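One way to keep the two variants from drifting is to render both from a single function. The sketch below is illustrative only: render_remote is a hypothetical helper, and it uses Python's configparser because rclone.conf is INI-style. The keys and values are the ones shown above.

```python
import configparser
import io
from typing import Optional

REMOTE_NAME = "aws_archive"  # same name on every host, as in this writeup


def render_remote(access_key_id: Optional[str] = None,
                  secret_access_key: Optional[str] = None,
                  region: str = "us-east-2") -> str:
    """Render the remote for EC2 (no keys given) or a non-EC2 host (keys given)."""
    if (access_key_id is None) != (secret_access_key is None):
        raise ValueError("provide both key parts or neither")
    section = {"type": "s3", "provider": "AWS"}
    if access_key_id is None:
        # On EC2, credentials come from the attached IAM role.
        section["env_auth"] = "true"
    else:
        section["access_key_id"] = access_key_id
        section["secret_access_key"] = secret_access_key
    section["region"] = region

    # interpolation=None so a "%" in a secret is written literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser[REMOTE_NAME] = section
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

Calling render_remote() with no arguments produces the EC2 form. Passing both key parts produces the non-EC2 form under the same [aws_archive] name, so scripts do not need to know which host they run on.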
8. OneDrive Authentication Problem and Entra App Fix
Default OneDrive auth behavior through rclone was not sufficient for unattended long-running jobs. Token handling kept becoming a problem.
Symptoms included:
- Token refresh problems
- /common endpoint issues
- Refresh-token-related failures
- Long-running transfers dying
The fix was to use a custom Microsoft Entra app registration and tenant-specific auth endpoints.
Key settings included:
- Single-tenant app registration
- Redirect URI: http://localhost:53682/
- Delegated Microsoft Graph permissions appropriate for OneDrive access
- Critically included: offline_access
Tenant-specific endpoints used:
https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/authorize
https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token
That stabilized the auth flow and solved the refresh-token problem cleanly.
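The endpoint pattern is mechanical enough to generate. In this sketch, tenant_endpoints is a hypothetical helper. The dictionary keys are named after the auth_url and token_url options of the rclone OneDrive backend, which should be confirmed against the rclone version in use.

```python
import uuid
from typing import Dict


def tenant_endpoints(tenant_id: str) -> Dict[str, str]:
    """Return the tenant-specific OAuth 2.0 v2.0 endpoints for one Entra tenant."""
    # Tenant IDs are GUIDs; parsing rejects a pasted "common" or a typo early.
    tenant = str(uuid.UUID(tenant_id))
    base = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0"
    return {
        "auth_url": f"{base}/authorize",
        "token_url": f"{base}/token",
    }
```

Passing the literal string "common" raises ValueError, which is the point: the job should fail at configuration time rather than hours into a transfer.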
9. OneDrive Backup Execution
Separate OneDrive remotes were maintained for separate datasets, such as od_chris_app and od_pam_app.
The backup model used rclone copy, not sync.
- copy only sends new or changed files
- it does not delete the destination
- it is safer for backup convergence
rclone copy od_chris_app:/ aws_archive:phoenix-backup-archive/onedrive-chris ...
Validation used:
rclone size
rclone check --size-only
The rule became simple: trust rclone validation, not the pretty usage bar.
10. EC2 Automation
Backup scripts were created per dataset and run manually first, then by cron.
- Create a shell script
- Embed remote, destination, logging path, and copy flags
- Test manually
- Only then schedule with cron
A cron job that has not survived real manual execution is not automation. It is roulette.
Once stable, the jobs were scheduled every six hours in staggered fashion, creating a convergence loop where each run reduced drift.
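The staggering can be sketched as a small schedule generator. The one-hour offset between jobs is an assumption (the writeup only says the jobs were staggered), and staggered_cron is a hypothetical helper that emits standard five-field cron expressions.

```python
from typing import Dict, List


def staggered_cron(jobs: List[str], interval_hours: int = 6) -> Dict[str, str]:
    """Give each job the same interval, offset by one hour per job."""
    if interval_hours < 1 or 24 % interval_hours != 0:
        raise ValueError("interval must divide 24 evenly")
    if len(jobs) > interval_hours:
        raise ValueError("more jobs than one-hour slots in the interval")
    schedule = {}
    for offset, job in enumerate(jobs):
        # e.g. offset 1 with a 6-hour interval runs at 01:00, 07:00, 13:00, 19:00
        hours = ",".join(str(h) for h in range(offset, 24, interval_hours))
        schedule[job] = f"0 {hours} * * *"
    return schedule
```

With the two remotes named earlier, staggered_cron(["od_chris_app", "od_pam_app"]) gives "0 0,6,12,18 * * *" for the first and "0 1,7,13,19 * * *" for the second, so the jobs never start at the same moment.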
11. NAS Ingest Strategy
The NAS footprint is large enough that it had to be treated as a phased ingestion project rather than a casual cloud upload.
Home upload bandwidth was extremely poor, which made full direct ingest from home impractical for bulk seeding.
Practical solution:
- Copy a chunk to external disk
- Take it to a high-bandwidth location
- Upload from there
- Later validate and delta from home / NAS
This cleanly separated bulk seeding from ongoing delta maintenance.
12. Validation from Synology
Once rclone was running on the Synology, the NAS became not just a source, but also a validation node.
rclone check --size-only --one-way
A full check against Movies - NAS showed both Synology sidecar artifacts and genuinely missing payload files, proving the copy was not fully converged yet.
The system did exactly what it was supposed to do: it stopped lifecycle from being applied to an incomplete dataset.
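Separating the two kinds of misses makes that check easier to read. The sketch below assumes the list of missing paths has already been written to a text file, one path per line (rclone check can do this with its --missing-on-dst option), and that Synology sidecar data lives under @eaDir directories. Both are assumptions about this environment, and split_missing is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple

# Assumption: Synology keeps sidecar/indexing artifacts under "@eaDir".
SIDECAR_DIR_NAMES = {"@eaDir"}


def split_missing(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split missing paths into (sidecar artifacts, payload files)."""
    sidecar: List[str] = []
    payload: List[str] = []
    for raw in paths:
        path = raw.strip()
        if not path:
            continue  # skip blank lines in the report
        parts = PurePosixPath(path).parts
        if SIDECAR_DIR_NAMES.intersection(parts):
            sidecar.append(path)
        else:
            payload.append(path)
    return sidecar, payload


def ready_for_lifecycle(paths: Iterable[str], full_fidelity: bool = True) -> bool:
    """A prefix is ready only when nothing that matters is still missing."""
    sidecar, payload = split_missing(paths)
    return not payload and not (full_fidelity and sidecar)
```

With full_fidelity=True, sidecar misses also block lifecycle, which matches a disaster-recovery goal of backing up the NAS exactly as it exists.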
13. Full Fidelity Backup Decision
A decision had to be made:
- Exclude Synology-generated metadata and keep the backup “clean”
- Or back up everything exactly as it exists on the NAS
The final decision was to back up everything, including:
- Media files
- Sidecar files
- Indexing metadata
- Synology-generated directory artifacts
That increases object count and noise, but it aligns with the stated disaster-recovery goal.
14. Synology-native rclone via Docker
rclone was run on the Synology via Docker.
- Mount NAS root storage into the container as /data
- Mount the rclone config/log area into the container as /config/rclone
This allowed both direct copies and direct checks from the NAS itself.
15. Scheduled Tasks for NAS Folders
Scheduled Synology tasks were prepared for:
- Adult
- docker
- Documents
- music
- photo
- Plex
- PlexMediaServer
- reports
- TV Shows - NAS
- UHD - TV Shows
- video
- web
- web_packages
Because home upload bandwidth is currently weak, those tasks were prepared but are not yet run as aggressive, always-on bulk transfer jobs.
16. Cloudflare Tunnel for EC2
Cloudflare Tunnel was used to reach the EC2 backup host without exposing SSH publicly. This established the access-plane pattern reused elsewhere: identity-gated, no exposed inbound ports, and Cloudflare Access in front.
17. Cloudflare Tunnel for Windows RDP Host
A Windows system was also tunneled through Cloudflare for RDP access. Key lessons included localhost vs 127.0.0.1, ensuring RDP was actually listening, and getting Access policy and persistence right.
18. Cloudflare Tunnel for Synology DSM
The Synology DSM interface was tunneled through Cloudflare using:
neocountprime.phoenix-comp.com
The process was:
- Run cloudflared tunnel login
- Create the tunnel
- Route DNS to the hostname
- Create config.yml
- Run the tunnel manually first
- Test Access + DSM
- Convert to a persistent detached container
Key lessons from that build:
- Cert location mismatch initially caused login artifacts to land in the wrong place
- JSON tunnel credentials belonged in the main cloudflared directory, not the config subfolder
- Using the tunnel ID in config was the safest route
- The initial DSM port was wrong; correcting the port made the origin reachable
Final result: DSM became reachable through Cloudflare Access without exposing the NAS publicly.
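A config.yml reflecting those lessons can be rendered as below. This is a sketch, not the file from the build: render_cloudflared_config is a hypothetical helper, the credentials directory and origin URL are placeholders, and the DSM port is left as an argument because the corrected port is not recorded here.

```python
def render_cloudflared_config(tunnel_id: str, hostname: str, origin_url: str,
                              credentials_dir: str = "/etc/cloudflared") -> str:
    """Render a minimal cloudflared config.yml for one hostname."""
    lines = [
        # Referencing the tunnel by ID rather than by name.
        f"tunnel: {tunnel_id}",
        # The JSON credentials file sits in the main cloudflared directory.
        f"credentials-file: {credentials_dir}/{tunnel_id}.json",
        "ingress:",
        f"  - hostname: {hostname}",
        f"    service: {origin_url}",
        "  # Catch-all rule for requests that match no hostname.",
        "  - service: http_status:404",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cloudflared_config(
        tunnel_id="00000000-0000-0000-0000-000000000000",  # placeholder ID
        hostname="neocountprime.phoenix-comp.com",
        origin_url="http://nas.example.internal:5000",     # placeholder origin
    ))
```

If the origin port is wrong, the tunnel itself still connects and only requests to the hostname fail, which is why running the tunnel manually and testing DSM before making it persistent is worth the extra step.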
19. Cloudflare Access Policy Model
- Microsoft login method
- phoenix-comp.com domain restriction
- MFA requirement
Email OTP was unreliable and unnecessary in a Microsoft-heavy environment. Entra was the better fit.
20. Entra App Registration for Cloudflare Access
Separate from the rclone OneDrive app registration, a second Entra app registration was used for Cloudflare Access. The redirect URI had to match the exact Cloudflare team domain callback path.
21. Synology SSO to Microsoft Entra
After getting DSM safely behind Cloudflare Tunnel, the next logical step was getting rid of yet another standalone local login workflow and tying DSM into Microsoft Entra for home-lab SSO. The goal here was not domain join, file share mapping, or anything that would require an added-cost service. The goal was simple: use the existing Entra tenant for DSM login without bolting on a paid identity tier.
The first pass used OIDC. That got far enough to prove the Microsoft login itself worked, but it ran into claim-mapping problems on the DSM side. The Entra sign-in would succeed, but DSM would not map the returned identity to the local NAS account the way it needed to. In this case, the local DSM account that needed to be matched was cruggieri, while the sign-in identity was still naturally email-shaped.
The clean pivot was to move from OIDC to SAML. That exposed the much more useful Entra-side Attributes & Claims controls and made it possible to send the exact username value DSM needed.
The SAML app configuration ended up using:
Identifier (Entity ID): https://neocountprime.phoenix-comp.com
Reply URL (ACS URL): https://neocountprime.phoenix-comp.com
Sign on URL: https://neocountprime.phoenix-comp.com
Unique User Identifier (NameID): user.mailnickname
NameID format: Unspecified
That meant the assertion being returned from Entra contained:
<NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified">cruggieri</NameID>
At that point, the Entra side was correct. The sign-in logs all showed success, and the SAML assertion itself proved the NAS was being handed the exact username it needed.
The problem turned out to be on the DSM side and it was maddeningly specific: signature type.
DSM had originally been configured to expect:
Response signature verification: Sign SAML response
But the actual Entra SAML payload was signing the Assertion, not the outer Response. In other words, the username was right, the audience was right, the recipient was right, the Microsoft auth was right, and DSM was still rejecting the login because it was validating the wrong layer of the SAML message.
The fix was to change DSM to verify the signed assertion instead of only the signed response. Once that was corrected, the login worked.
That entire path is worth documenting for anyone trying to light up SSO in a home lab because it is exactly the kind of problem that wastes hours:
- Entra logs show success
- DSM still throws a generic failure
- the username claim looks correct
- the real issue is signature verification mode on the service provider
The practical lesson is simple: when troubleshooting SAML, do not stop at “the IdP says success.” Capture the assertion, inspect the NameID, inspect the Audience, inspect the Recipient, and confirm whether the identity provider is signing the Response, the Assertion, or both.
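That inspection checklist can be scripted. The sketch below takes an already-decoded SAML Response (the SAMLResponse form field is base64-encoded in the browser POST) and reports the fields named above. It only detects where a signature element sits and does not verify it. inspect_saml_response is a hypothetical helper for debugging, not part of any SAML library.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

NS = {
    "samlp": "urn:oasis:names:tc:SAML:2.0:protocol",
    "saml": "urn:oasis:names:tc:SAML:2.0:assertion",
    "ds": "http://www.w3.org/2000/09/xmldsig#",
}


def _text(node: Optional[ET.Element]) -> Optional[str]:
    """Return stripped element text, or None if the element is absent or empty."""
    return node.text.strip() if node is not None and node.text else None


def inspect_saml_response(xml_text: str) -> Dict[str, object]:
    """Summarize the fields worth checking in a decoded SAML Response."""
    root = ET.fromstring(xml_text)
    assertion = root.find("saml:Assertion", NS)
    if assertion is None:
        raise ValueError("no unencrypted Assertion found in the Response")
    confirmation = assertion.find(
        "saml:Subject/saml:SubjectConfirmation/saml:SubjectConfirmationData", NS
    )
    return {
        "name_id": _text(assertion.find("saml:Subject/saml:NameID", NS)),
        "audience": _text(assertion.find(
            "saml:Conditions/saml:AudienceRestriction/saml:Audience", NS)),
        "recipient": (confirmation.get("Recipient")
                      if confirmation is not None else None),
        # Only direct children count: a Signature inside the Assertion
        # does not sign the outer Response.
        "response_signed": root.find("ds:Signature", NS) is not None,
        "assertion_signed": assertion.find("ds:Signature", NS) is not None,
    }
```

A result with response_signed False and assertion_signed True, on a service provider set to verify the response signature, is the mismatch described in this section.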
For home lab builders, this is one of the bigger takeaways from the whole project: you can absolutely bring enterprise-style identity patterns into a personal environment without turning it into a six-figure science project. But once you do, you also inherit enterprise-style troubleshooting. The upside is that when you finally solve it, you end up with something a lot cleaner, safer, and frankly more satisfying than another pile of isolated local accounts.
22. Lessons Learned
- Prefix precision matters. S3 lifecycle rules are brutally literal.
- Validation matters more than intuition.
- Convergence beats perfection.
- Identity is the perimeter.
- Bulk ingest and delta sync are different phases.
- UI can lie.
- Archive economics only work if lifecycle is correct.
23. Current State
- OneDrive backups work and converge
- EC2 role-based S3 access works
- Non-EC2 S3 key-based access works
- Lifecycle rules work once prefixes are correct
- Data is transitioning to Deep Archive
- Synology validation works
- Synology direct copy works
- Cloudflare Tunnel works for EC2, Windows, and Synology
- Access policies work with Microsoft login and MFA
This is no longer a lab experiment. It is an operating system.
24. Final Operating Model
Now
- Office / high-bandwidth location handles heavy seed copies
- EC2 handles OneDrive delta automation
- Synology handles validation and targeted catch-up
- Cloudflare secures access to the key systems
Later, after better connectivity
- Synology becomes the primary long-term delta copy engine
- The same namespace alignment ensures only new/changed data moves
- The access and lifecycle models stay the same
25. Closing
This build was not about “copying files to the cloud.” It was about building a system that reduces cost over time, tolerates imperfect first runs, uses identity instead of perimeter exposure, preserves recoverability, and keeps converging without constant manual intervention.
That system now exists.