Building social features without betraying your users

Sunney Sood is a Program Manager with experience delivering complex programs in Banking and Insurance. His interests span Program Leadership, Governance, Artificial Intelligence, Data Science, Cloud Technologies, and Digital Transformation. Through hands-on experimentation, industry research, and practical experience, he shares insights that help bridge strategy, technology, and business outcomes. Skills Program Management • Leadership • Governance • Stakeholder Management • Risk Management • Business Transformation • Artificial Intelligence • Data Science • Cloud Technologies • Agile Delivery

Architecture · Android · Privacy

A practical journey from cloud-first to local-first social sync, and why the conventional approach was the wrong one.

When I started designing the social layer for a health tracking app, the obvious path was Firebase Firestore: real-time listeners and cloud-synced state. It's what everyone does. It works. And it took about fifteen minutes of honest reflection to realise it was completely wrong for this product.

This is the story of how I ended up with a CRDT-based local-first architecture — and why I think the conventional cloud-first approach for social health features deserves far more scrutiny than it gets.

The problem with the obvious approach

The typical social feature in a health app works like this: when you complete a workout, finish a fast, or hit your step target, the app writes your status to a cloud database. Your friends' apps subscribe to that database and display your status in real time. Simple, proven, scalable.

But pause and ask: what exactly is sitting on that cloud database?

What "aggregated booleans" actually reveal: isActive=true, elapsedHours=14, completedToday=false, written at 7:04am. Repeat daily. That's a detailed behavioural profile of someone's health habits — when they eat, when they fast, how consistent they are — sitting on a third-party server indefinitely.

The instinct to call this "just metadata" or "aggregated data" is a rationalisation. A continuous stream of health-adjacent behavioural signals is a health profile, regardless of what you name the fields. And once that data is on someone else's infrastructure, you no longer control what happens to it.

What users actually think

The research points one way. Studies of health app reviews have found that privacy complaints in negative reviews correlate with how much personal data collection an app's code performs. Competing apps in the health space explicitly market "your data stays on your device" as a feature, because users recognise it as one.

The regulatory direction is also one-way. Platform policies are tightening around health data, requiring developers to prove that any health-adjacent data accessed is essential to the app's primary function. "We use it for social features" is not a strong answer.

Cloud-first social

Health behaviour written to third-party servers continuously. Server operator can read behavioural patterns. Data persists indefinitely. User consent is buried in the terms of service.

Local-first social

Health data stays on device. Only leaves as encrypted ciphertext directly to circle members. Server sees coordination metadata only. User controls their data physically.

The architecture that emerged

The core insight is that social features require two fundamentally different things: coordination (who's in my group, how do I find them) and health data (what's my status right now). These can be decoupled. Coordination metadata can live on a server you control. Health data never has to leave the device.
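To make the split concrete, here is a minimal Kotlin sketch of the two kinds of record. The type and function names (`MemberCoordination`, `HealthStatus`, `registrationFields`) are hypothetical, not taken from the app; the point is only that the body registered with the server is built from coordination fields and has no path to health fields.

```kotlin
// Hypothetical types: coordination metadata may live on the signaling server,
// health status stays on the device.

/** Coordination metadata the signaling server is allowed to hold. */
data class MemberCoordination(
    val groupId: String,
    val memberId: String,
    val displayName: String,
    val fcmToken: String,
    val publicKeyBase64: String, // key-agreement public key shared with the circle
)

/** Health status: lives only in the local database, never sent in plaintext. */
data class HealthStatus(
    val isActive: Boolean,
    val elapsedHours: Int,
    val completedToday: Boolean,
    val updatedAtMillis: Long,
)

/** Builds the registration body sent to the server. It accepts no HealthStatus at all. */
fun registrationFields(member: MemberCoordination): Map<String, String> = mapOf(
    "groupId" to member.groupId,
    "memberId" to member.memberId,
    "displayName" to member.displayName,
    "fcmToken" to member.fcmToken,
    "publicKey" to member.publicKeyBase64,
)
```

Keeping the two as separate types means a code review can check the privacy boundary by looking at function signatures rather than at field names.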

The three layers

Signaling server — coordination only
  • Stores: group membership, display names, FCM tokens, invite codes, members' public keys (Ed25519 and X25519), encrypted snapshots (ciphertext only, 7-day TTL)
  • Never stores: any health status, activity data, streak values, or plaintext health-adjacent data
  • Stack: Ktor 3.x on a $0 Oracle Cloud free-tier VM — sufficient for thousands of users
CRDT sync engine — health data stays encrypted
  • Type: Last-Write-Wins registers per member — correct for this use case since each member is the sole writer of their own record
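  • Transport: encrypted deltas (AES-256-GCM, with keys derived via ECDH key agreement) are posted to the server as opaque ciphertext, collected by the requester and decrypted locally
  • Keys: Ed25519 keypairs for signing plus X25519 keypairs for the ECDH step (Ed25519 itself is a signature scheme), held in Android Keystore, hardware-backed where the device supports it and never extractable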
Local database — source of truth
  • member_state: merged CRDT state per member per group, LWW semantics
  • activity_feed: local-only feed entries, never sent to any server
  • challenge_history: append-only record of completed and expired challenges
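The LWW semantics of `member_state` fit in a few lines. This sketch assumes each device stamps its own record with a monotonically increasing `version` (a hypothetical field, not from the article). Because each member is the sole writer of their record, a higher version is always the later write, so the merge is commutative, associative and idempotent, and deltas can arrive in any order, or more than once, without changing the result.

```kotlin
/**
 * Last-Write-Wins register for one member's record.
 * Assumption: `version` increases monotonically on the writing device.
 */
data class LwwRegister<T>(val value: T, val version: Long) {
    /** Keeps whichever side was written last; equal versions denote the same write. */
    fun merge(other: LwwRegister<T>): LwwRegister<T> =
        if (other.version > version) other else this
}

/** In-memory stand-in for the member_state table: (groupId, memberId) -> register. */
class MemberStateStore<T> {
    private val rows = mutableMapOf<Pair<String, String>, LwwRegister<T>>()

    /** Merges an incoming decrypted delta; stale or duplicate deltas change nothing. */
    fun applyDelta(groupId: String, memberId: String, incoming: LwwRegister<T>) {
        val key = groupId to memberId
        rows[key] = rows[key]?.merge(incoming) ?: incoming
    }

    fun get(groupId: String, memberId: String): LwwRegister<T>? = rows[groupId to memberId]
}
```

With multiple writers per record, equal versions would need a deterministic tie-breaker (for example the writer ID); the sole-writer property is what lets this version skip it.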

How a sync actually works

01. User pulls to refresh. The app posts a sync request to the signaling server.

02. The server sends FCM data messages to all group members. The payload contains only a group ID and a request ID: zero health content.

03. Each member's device receives the message silently. WorkManager schedules a background job that encrypts the member's current state for the requester, using a key derived from the requester's public key, and posts the ciphertext to the server.

04. The requester collects the deltas and decrypts them locally using its private key, which is held in hardware and never leaves the device. Each delta is merged into the local CRDT via LWW semantics.

05. The UI renders progressively as members respond. Non-responders show their last-known cached state with a timestamp. The server deletes all deltas after serving them.
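Steps 03 and 04 can be sketched with standard JDK crypto (Java 11+). This is an illustration under stated assumptions, not the app's code: it uses an ephemeral-static X25519 exchange (the responder generates a one-off keypair, which is one way to encrypt "with the requester's public key"), a plain SHA-256 of the shared secret as a simplified stand-in for a proper KDF such as HKDF, and AES-256-GCM with the group and request IDs bound in as associated data. On Android the requester's private key would live in Android Keystore rather than in memory as it does here, and Keystore support for X25519 should be checked on the target devices.

```kotlin
import java.security.KeyPair
import java.security.KeyPairGenerator
import java.security.MessageDigest
import java.security.PrivateKey
import java.security.PublicKey
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyAgreement
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

val GCM_TAG_BITS = 128 // 16-byte authentication tag
val IV_BYTES = 12      // 96-bit nonce, the recommended size for GCM

/** Generates an X25519 key-agreement keypair. */
fun newKeyPair(): KeyPair = KeyPairGenerator.getInstance("X25519").generateKeyPair()

/** X25519 shared secret hashed with SHA-256 into a 256-bit AES key (simplified KDF; prefer HKDF). */
fun deriveAesKey(myPrivate: PrivateKey, theirPublic: PublicKey): SecretKeySpec {
    val agreement = KeyAgreement.getInstance("X25519")
    agreement.init(myPrivate)
    agreement.doPhase(theirPublic, true)
    val digest = MessageDigest.getInstance("SHA-256").digest(agreement.generateSecret())
    return SecretKeySpec(digest, "AES")
}

/** What a responder posts to the server: every field is opaque to the server. */
class SealedDelta(val ephemeralPublic: PublicKey, val iv: ByteArray, val ciphertext: ByteArray)

/** Step 03: a responder encrypts its state for the requester; `aad` binds it to the request. */
fun sealForRequester(plaintext: ByteArray, requesterPublic: PublicKey, aad: ByteArray): SealedDelta {
    val ephemeral = newKeyPair() // fresh per delta, so every delta gets its own AES key
    val key = deriveAesKey(ephemeral.private, requesterPublic)
    val iv = ByteArray(IV_BYTES).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(GCM_TAG_BITS, iv))
    cipher.updateAAD(aad)
    return SealedDelta(ephemeral.public, iv, cipher.doFinal(plaintext))
}

/** Step 04: the requester decrypts locally; throws AEADBadTagException if anything was altered. */
fun openAsRequester(delta: SealedDelta, requesterPrivate: PrivateKey, aad: ByteArray): ByteArray {
    val key = deriveAesKey(requesterPrivate, delta.ephemeralPublic)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(GCM_TAG_BITS, delta.iv))
    cipher.updateAAD(aad)
    return cipher.doFinal(delta.ciphertext)
}
```

Because the ephemeral key says nothing about who sent the delta, a full design would also authenticate the sender, for example by signing the sealed delta with the member's Ed25519 key.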

What the server actually knows

This is the most important property of the architecture. At no point does the signaling server receive plaintext health data. The encrypted snapshots it holds are AES-256-GCM ciphertext, and without the private keys, which live in device hardware, reading them is computationally infeasible for the server.
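On the server side, that property means the storage layer only ever handles opaque byte arrays. Below is a hypothetical in-memory model (`CiphertextMailbox` is an illustrative name; a real Ktor service would need persistent storage) of the two retention rules the article describes: deltas are deleted once the requester collects them, and encrypted snapshots expire after seven days.

```kotlin
import java.time.Duration
import java.time.Instant

/** Hypothetical model of the signaling server's ciphertext storage. It never sees plaintext. */
class CiphertextMailbox(private val snapshotTtl: Duration = Duration.ofDays(7)) {
    // requestId -> ciphertext deltas posted by responding members
    private val deltas = mutableMapOf<String, MutableList<ByteArray>>()
    // memberId -> (time stored, encrypted snapshot)
    private val snapshots = mutableMapOf<String, Pair<Instant, ByteArray>>()

    fun postDelta(requestId: String, ciphertext: ByteArray) {
        deltas.getOrPut(requestId) { mutableListOf() }.add(ciphertext)
    }

    /** Returns all deltas for a request and deletes them, so nothing is kept after serving. */
    fun collect(requestId: String): List<ByteArray> = deltas.remove(requestId) ?: emptyList()

    fun putSnapshot(memberId: String, ciphertext: ByteArray, now: Instant) {
        snapshots[memberId] = now to ciphertext
    }

    fun getSnapshot(memberId: String, now: Instant): ByteArray? {
        purgeExpired(now)
        return snapshots[memberId]?.second
    }

    /** Drops snapshots older than the TTL. */
    fun purgeExpired(now: Instant) {
        snapshots.entries.removeIf { (_, stored) -> Duration.between(stored.first, now) > snapshotTtl }
    }
}
```

Passing `now` in explicitly keeps the expiry logic deterministic and easy to test.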

Data residency
Group membership list: signaling server
Display names: signaling server
Activity status: device only
Step counts: device only
Any health metric: device only
Streak values: device only
Encrypted snapshots (ciphertext): server (unreadable)
Health data on third-party infrastructure: never

The honest trade-offs

This architecture is not free. It requires writing a small signaling server rather than using managed Firebase rules. The CRDT merge logic adds complexity that Firestore listeners would hide. The crypto layer — while using standard Android APIs — requires careful key lifecycle management.

There is one irreducible minimum of server dependency: you cannot reach an offline device without some form of addressing service. FCM fills this role here, carrying only a wake-up signal with no health content. Eliminating this would require both devices to be online simultaneously — an unacceptable UX constraint.
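On the receiving device, it is worth treating that wake-up strictly. A minimal sketch with hypothetical names and a hypothetical wire format: the client accepts a data message only if it carries exactly the expected keys, so a payload that ever contained anything else (a step count, say) would be rejected rather than processed. In an Android app the map would come from `RemoteMessage.getData()` inside `FirebaseMessagingService.onMessageReceived`, and a valid `WakeUp` would then be handed to the WorkManager job from step 03.

```kotlin
/** The only information a sync wake-up carries: no health content. */
data class WakeUp(val groupId: String, val requestId: String)

/** Keys a wake-up data message may contain (hypothetical wire format). */
val WAKE_UP_KEYS = setOf("groupId", "requestId")

/**
 * Parses an FCM data-message map into a WakeUp.
 * Returns null if any key is missing, any unexpected key is present, or a value is blank.
 */
fun parseWakeUp(data: Map<String, String>): WakeUp? {
    if (data.keys != WAKE_UP_KEYS) return null
    val groupId = data.getValue("groupId")
    val requestId = data.getValue("requestId")
    if (groupId.isBlank() || requestId.isBlank()) return null
    return WakeUp(groupId, requestId)
}
```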

The scale ceiling also matters. With a hard cap of 50 members per group, vector clock overhead stays bounded at roughly 2KB of metadata, sync completes in under a second on any reasonable connection, and the signaling server handles thousands of users on infrastructure costing nothing per month.
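The 2KB figure is consistent with a simple estimate. Assuming (this is not stated in the article) that each vector-clock entry pairs a 32-byte member identifier, such as a public key, with an 8-byte counter, a full 50-member clock costs:

```latex
50 \times (32 + 8)\ \text{bytes} = 2000\ \text{bytes} \approx 2\ \text{KB}
```

Because the member cap bounds the number of entries, this overhead does not grow with usage or history.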

Why this matters beyond health apps

The pattern generalises. Any social feature where the shared data is sensitive — financial status, location, relationship state, communication history — deserves this level of scrutiny. The default assumption that "we need a cloud database for social features" is worth questioning every time. The coordination layer and the data layer are separable, and treating them as one is an architectural decision with real privacy consequences.

Local-first is not a niche philosophy. It is, increasingly, what users expect from apps that handle anything personal. Building it in from the start is significantly easier than retrofitting it. The architecture described here adds perhaps two weeks to a social feature sprint. The cost of not building it — in user trust, in regulatory exposure, in the genuine harm of unnecessary data collection — is harder to quantify but considerably larger.

CRDT · local-first · Android · Ktor · privacy-first architecture
