Rajkiran

Posted on Jun 13

System Design - 19. Authentication & Authorization: OAuth2, JWT, and the Equifax Breach That Changed Everything

#designsystem #distributedsystems #security #software


Covers: OAuth2 Flow, JWT vs Sessions, SAML, RBAC vs ABAC, mTLS, Zero Trust, Token Revocation

The Breach That Exposed 147 Million People

In 2017, Equifax — one of the three major US credit bureaus — suffered a breach that exposed the personal data of 147 million people: Social Security numbers, birth dates, addresses, and in some cases credit card numbers.

The root cause wasn't a sophisticated zero-day exploit. It was a known vulnerability in Apache Struts that Equifax had failed to patch for months after a fix was available — combined with an internal network where, once an attacker got in, they could move laterally with minimal additional authentication.

The lesson the security world took from this: authentication and authorization can't be an afterthought, and they can't be "strong at the perimeter, weak inside." This is the philosophy behind Zero Trust — and it's reshaped how every system designs identity and access from the ground up.

Today we cover the core building blocks: how users prove who they are (authentication), how systems decide what they can do (authorization), and the protocols that make this work at scale.

Authentication vs Authorization: The Distinction That Matters

These two words get conflated constantly, but they answer fundamentally different questions:

Authentication (AuthN): Who are you? Verifying identity. Logging in with a password, fingerprint, or token.

Authorization (AuthZ): What are you allowed to do? Verifying permissions. Once we know you're "Priya," can Priya delete this file?

Authentication: "Prove you're Priya"
  → Password check, biometric, OTP

Authorization: "Is Priya allowed to delete order #4521?"
  → Check Priya's role, ownership, permissions

A system can authenticate you perfectly and still deny you access — you proved who you are, but you don't have permission for this specific action.

OAuth2: The Protocol Behind "Sign in with Google"

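OAuth2 isn't actually an authentication protocol; it's an authorization framework. It answers: "Can this third-party app access my Google Calendar without me giving the app my Google password?" Strictly speaking, "Sign in with Google" layers OpenID Connect (OIDC), an identity protocol, on top of OAuth2, but the underlying redirect-and-exchange flow is OAuth2's.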

The Authorization Code Grant Flow (Most Common)

This is the flow you experience every time you click "Sign in with Google" on a website.

User (Browser)        YourApp (Server)        Google (Auth Server)
      │                       │                         │
      │──(1) click ──────────►│                         │
      │◄─(2) redirect ────────│                         │
      │──(3) log in + approve ─────────────────────────►│
      │◄─(4) redirect + AUTHORIZATION CODE ─────────────│
      │──(4) code ───────────►│                         │
      │                       │──(5) code + secret ────►│
      │                       │◄─(6) tokens ────────────│
      │                       │──(7) API call ─────────►│

(1) User clicks "Sign in with Google" on YourApp
(2) YourApp redirects the browser to Google's login/consent page
(3) User logs in and approves the requested permissions
(4) Google redirects the browser back to YourApp with an AUTHORIZATION CODE
(5) YourApp's BACKEND exchanges the code (plus its client_secret) for tokens
(6) Google returns access_token + refresh_token
(7) YourApp uses the access_token to call Google APIs

Why the "authorization code" step exists (and isn't just the token directly):

The redirect in step 4 happens through the browser — visible in the URL, browser history, server logs. If Google sent the actual access_token in this redirect, it would be exposed in all those places.

Instead, Google sends a short-lived, single-use authorization code. Only YourApp's backend (step 5), which holds a client_secret that never touches the browser, can exchange this code for the actual tokens. This exchange happens server-to-server and is never exposed to the browser.

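This is the core security property of the Authorization Code flow: the actual access token never appears in a URL, browser history, or front-end JavaScript where it could be intercepted. (The older Implicit flow returned tokens directly in the redirect URL, which is why current guidance discourages it. Public clients such as mobile apps and SPAs cannot keep a client_secret. They use this same flow with PKCE, where a one-time secret generated per login plays that role.)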
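The redirect and exchange steps above can be sketched with the standard library alone. This is a minimal sketch, not a complete client. The endpoint URLs are the ones Google publishes at the time of writing, so verify them against Google's current docs. The function names are hypothetical. The `state` parameter is standard OAuth2 CSRF protection, added here even though the diagram doesn't show it. No network calls are made: POSTing the token request is left to whatever HTTP client you use.

```python
import secrets
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlparse

# Google's published OAuth2 endpoints (verify against current Google docs)
AUTHORIZE_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"

def build_authorization_url(client_id: str, redirect_uri: str, scope: str) -> Tuple[str, str]:
    """Step 2: where YourApp redirects the browser. Returns (url, state)."""
    state = secrets.token_urlsafe(16)  # store in the user's session; checked in step 4
    params = {
        "response_type": "code",  # ask for an authorization code, not a token
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}", state

def extract_code(callback_url: str, expected_state: str) -> str:
    """Step 4: YourApp's callback handler. Reject responses whose state doesn't match."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible CSRF")
    return query["code"][0]

def build_token_request(code: str, client_id: str, client_secret: str, redirect_uri: str) -> dict:
    """Step 5: form body the BACKEND POSTs to TOKEN_ENDPOINT (server-to-server)."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,    # must match the value used in step 2
        "client_id": client_id,
        "client_secret": client_secret,  # never sent to the browser
    }
```

Only the authorization code and `state` ever pass through the browser. The `client_secret` appears only in the server-side request body.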

The Tokens OAuth2 Produces

access_token:  Short-lived (minutes to hours). Used to call APIs.
               "This bearer can access Priya's Calendar for the next hour."

refresh_token: Long-lived (days to months). Used to get NEW access tokens
               without the user logging in again.

JWT: Stateless, Self-Contained Tokens

A JWT (JSON Web Token) is a specific token format — widely used for access_tokens — that's self-contained and cryptographically signed.

Anatomy of a JWT

eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJwcml5YSIsImV4cCI6MTcxODg4ODg4OH0.4f8a92...
└──────────┬──────────┘└──────────────┬──────────────┘└────┬────┘
         HEADER                     PAYLOAD              SIGNATURE
   (algorithm info)            (claims — the data)    (verifies integrity)

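Decoded payload (claims). The header and payload are only base64url-encoded, not encrypted: anyone holding the token can read them, so never put secrets in claims.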

{
  "sub": "priya_12345",
  "name": "Priya Sharma",
  "role": "admin",
  "exp": 1718888888,
  "iat": 1718885288
}
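To see that decoding needs no key at all, here is a short sketch that decodes the first two segments of the example token above. The signature segment is left out because it is elided in the example. The helper name is hypothetical. Note that the example token's actual payload is a smaller claim set than the illustrative one shown above.

```python
import base64
import json

def decode_segment(segment: str) -> dict:
    """Decode one base64url JWT segment. No key involved: this is NOT verification."""
    padded = segment + "=" * (-len(segment) % 4)  # JWTs strip the '=' padding
    return json.loads(base64.urlsafe_b64decode(padded))

# Header and payload of the example token above (signature omitted)
token = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJwcml5YSIsImV4cCI6MTcxODg4ODg4OH0"
header_b64, payload_b64 = token.split(".")[:2]
print(decode_segment(header_b64))   # {'alg': 'HS256'}
print(decode_segment(payload_b64))  # {'sub': 'priya', 'exp': 1718888888}
```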

Why JWT Is "Stateless" — And Why That's Powerful

Traditional session-based auth:
  User logs in → Server creates session → stores in Redis/DB
  Every request: server looks up session in Redis → "yes, this is Priya"
  → Requires a database/cache lookup on EVERY request

JWT-based auth:
  User logs in → Server creates JWT, SIGNS it, gives to user
  Every request: server VERIFIES the signature (no DB lookup needed!)
  → If signature is valid, the claims inside are trusted
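For contrast, the session-based side can be sketched as follows. A plain in-memory dict stands in for Redis, and the class and method names are hypothetical. The point is the `lookup` call that has to run on every request, and the instant revocation that comes with it.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

class SessionStore:
    """In-memory stand-in for Redis: session_id -> (user_id, expires_at)."""

    def __init__(self, ttl_seconds: int = 3600):
        self.ttl = ttl_seconds
        self._sessions: Dict[str, Tuple[str, float]] = {}

    def create(self, user_id: str) -> str:
        session_id = secrets.token_urlsafe(32)  # opaque, unguessable ID sent as a cookie
        self._sessions[session_id] = (user_id, time.time() + self.ttl)
        return session_id

    def lookup(self, session_id: str) -> Optional[str]:
        """The per-request lookup: returns the user_id, or None if missing or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        user_id, expires_at = entry
        if time.time() >= expires_at:
            del self._sessions[session_id]
            return None
        return user_id

    def revoke(self, session_id: str) -> None:
        """Instant revocation: the very next lookup fails."""
        self._sessions.pop(session_id, None)
```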

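The signature is what makes this work. When issuing a JWT, the server signs the encoded header and payload with a key. With HMAC algorithms (e.g., HS256), the same secret key both signs and verifies. With asymmetric algorithms (RSA such as RS256, or ECDSA such as ES256), the issuer signs with a private key and verifiers check the signature with the matching public key.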
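To make "sign, then verify without a lookup" concrete, here is a stdlib-only sketch of HS256 signing. Under HS256 the signature is HMAC-SHA256 over `base64url(header) + "." + base64url(payload)`. This is for illustration only: it skips checks a real library such as PyJWT performs (the header's `alg`, `exp`, and so on), so use a maintained library in practice. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def sign_hs256(claims: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

def verify_hs256(token: str, key: bytes) -> dict:
    """Recompute the HMAC over header.payload; raise ValueError on mismatch. No DB call."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):  # constant-time compare
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))
```

Changing even one character of the payload (say, `"role": "admin"` to `"superadmin"`) changes the expected HMAC. Without the key, an attacker can't produce a matching signature.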

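import datetime
import secrets

import jwt  # PyJWT (pip install pyjwt)

class AuthError(Exception):
    pass

# Demo key only; in production, load a stable secret from a secrets manager
secret_key = secrets.token_hex(32)
expiry = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(minutes=15)

# Issuing (server signs header + payload with the secret key)
token = jwt.encode(
    {"sub": "priya_12345", "role": "admin", "exp": expiry},
    secret_key,
    algorithm="HS256",
)

# Verifying (any service holding the same secret can verify: NO DB CALL)
try:
    payload = jwt.decode(token, secret_key, algorithms=["HS256"])
    # payload["sub"] == "priya_12345": trusted, because the signature is valid
except jwt.ExpiredSignatureError:
    raise AuthError("token expired")
except jwt.InvalidTokenError:
    # Tampered, malformed, or otherwise invalid token: reject
    raise AuthError("invalid token")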

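This is why JWTs suit microservices well: any service holding the verification key can independently verify a token without calling a central auth service. No network hop, no database lookup, no shared session store. One caveat: with HS256, every verifier holds the same secret and could therefore also mint tokens. Multi-service setups usually prefer asymmetric signing instead, where only the auth service holds the private key and every other service gets just the public key.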

JWT's Achilles Heel: Revocation

Here's the catch. Since JWTs are self-contained and stateless, a server verifying a JWT has no way to know if it's been "revoked" — there's no database to check.

The problem: A user's account is compromised. You want to immediately invalidate all their tokens. But their JWT is valid until exp — and there's no central record to delete.

Solving Token Revocation at Scale

Approach 1: Short TTL + Refresh Token Pattern

access_token: expires in 15 minutes (short-lived)
refresh_token: expires in 30 days, but STORED in a database

To get a new access_token:
  Client sends refresh_token to /token/refresh
  Server checks: is this refresh_token in the database AND not revoked?
  If yes → issue new access_token (15 min)
  If no  → reject, user must log in again

To revoke a user's access:
  DELETE the refresh_token from the database
  → Within 15 minutes, ALL their access_tokens expire naturally
  → They can't get new ones (refresh_token is gone)
  → Maximum exposure window: 15 minutes

This is the industry-standard pattern. You accept a small window (the access token's TTL) where a "revoked" token still technically works, in exchange for the massive performance win of stateless verification for the vast majority of requests.
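The refresh-and-revoke pattern above can be sketched with an in-memory dict standing in for the refresh-token table. The function names are hypothetical. So is the `mint_access_token` callback, which in practice would wrap something like `jwt.encode`. Storing a SHA-256 hash of each refresh token instead of the raw value is a common hardening step assumed here; the pattern itself doesn't require it.

```python
import hashlib
import secrets
import time
from typing import Callable, Dict, Optional, Tuple

ACCESS_TTL = 15 * 60          # 15 minutes
REFRESH_TTL = 30 * 24 * 3600  # 30 days

# Stand-in for the refresh-token table: sha256(token) -> (user_id, expires_at)
refresh_store: Dict[str, Tuple[str, float]] = {}

def _hash(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()

def issue_refresh_token(user_id: str) -> str:
    token = secrets.token_urlsafe(32)
    refresh_store[_hash(token)] = (user_id, time.time() + REFRESH_TTL)
    return token

def refresh(refresh_token: str, mint_access_token: Callable[..., str]) -> Optional[str]:
    """POST /token/refresh: a new short-lived access token, or None (user must log in)."""
    entry = refresh_store.get(_hash(refresh_token))
    if entry is None or time.time() >= entry[1]:
        return None
    return mint_access_token(entry[0], ttl=ACCESS_TTL)

def revoke_user(user_id: str) -> None:
    """Delete all of the user's refresh tokens; outstanding access tokens die within ACCESS_TTL."""
    for key in [k for k, (uid, _) in refresh_store.items() if uid == user_id]:
        del refresh_store[key]
```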

Approach 2: Blacklist (for immediate revocation)

Maintain a Redis set of "revoked token IDs" (jti claim), each entry expiring
when the token itself would have expired, so the set stays small
On verification: check signature (stateless) AND check blacklist (one Redis call)

Trade-off: re-introduces a lookup on every request — but it's a fast 
Redis lookup, not a full session database query. Used when 15-minute 
exposure windows are unacceptable (e.g., financial systems).
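A denylist check can be sketched in memory as follows. In Redis, the same idea would map to something like `SET revoked:<jti> 1 EX <seconds until exp>` on revocation and `EXISTS revoked:<jti>` on each request. The key naming and the class and function names here are hypothetical.

```python
import time
from typing import Dict

class JtiDenylist:
    """In-memory stand-in for Redis keys that expire at the token's own exp."""

    def __init__(self):
        self._revoked: Dict[str, float] = {}  # jti -> token's exp (unix seconds)

    def revoke(self, jti: str, exp: float) -> None:
        # The entry only needs to outlive the token; after exp, normal expiry rejects it
        self._revoked[jti] = exp

    def is_revoked(self, jti: str) -> bool:
        exp = self._revoked.get(jti)
        if exp is None:
            return False
        if time.time() >= exp:
            # Token has expired on its own (the exp check rejects it); drop the entry
            del self._revoked[jti]
            return False
        return True

def check_token(claims: dict, denylist: JtiDenylist) -> bool:
    """Run AFTER signature verification, so `claims` come from a verified token."""
    return not denylist.is_revoked(claims["jti"])
```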

JWT vs Sessions: The Honest Trade-off

               Session-based                                 JWT-based
State          Server stores session (Redis/DB)              Stateless: token is self-contained
Scaling        Requires shared session store across servers  Any server can verify independently
Revocation     Instant: delete the session                   Hard: requires short TTL + refresh pattern
Token size     Small (just a session ID)                     Larger (contains claims)
Microservices  Every service needs access to session store   Any service with the key can verify
Mobile/SPA     Cookies awkward for mobile apps               Works naturally: token in header

The honest take: JWTs aren't "better" than sessions — they trade instant revocation for statelessness. For a monolith with a fast Redis session store, sessions are simpler and have no revocation problem. For microservices and mobile clients, JWT's statelessness is usually worth the revocation complexity.

SAML: Enterprise SSO

SAML (Security Assertion Markup Language) is an older protocol (SAML 2.0 was standardized in 2005) but is still dominant for enterprise Single Sign-On. It powers the "Login with your company account" flow offered by identity providers such as Okta and OneLogin and by corporate Active Directory integrations.

User → tries to access SaaS App (Service Provider, SP)
SaaS App → redirects to company's Identity Provider (IdP) — e.g., Okta
User → already logged into Okta (corporate SSO session)
Okta → generates a SIGNED XML "assertion": "This is priya@company.com, 
        verified, here are her roles"
User → browser POSTs this assertion back to SaaS App
SaaS App → verifies signature, creates session for Priya

SAML vs OAuth2/OIDC:

SAML uses XML, OAuth2/OIDC use JSON — SAML is older and more verbose
SAML is dominant in enterprise/B2B SSO (legacy systems, Active Directory integration)
OAuth2 + OpenID Connect (OIDC, which adds authentication on top of OAuth2's authorization) is dominant for consumer apps and modern APIs

When you see "Enterprise SSO" as a requirement in a system design interview — that's a SAML signal. "Sign in with Google/GitHub" — that's OAuth2/OIDC.

RBAC vs ABAC: Two Models of Authorization

Once you know who the user is, how do you decide what they can do?

RBAC: Role-Based Access Control

Users are assigned roles. Roles have permissions.

Roles:
  "admin"   → permissions: [read, write, delete, manage_users]
  "editor"  → permissions: [read, write]
  "viewer"  → permissions: [read]

User "priya" → role: "editor"
→ Priya can read and write, but not delete or manage users
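The role table above translates directly into a small lookup. This is a minimal sketch in which hypothetical in-memory dicts stand in for wherever roles are actually stored. It denies by default and includes the kind of audit query the next paragraph mentions.

```python
from typing import Dict, List, Set

ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"read", "write", "delete", "manage_users"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}
USER_ROLES: Dict[str, Set[str]] = {"priya": {"editor"}}  # hypothetical user store

def has_permission(user: str, permission: str) -> bool:
    """Deny by default: unknown users or roles get no permissions."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )

def users_with_role(role: str) -> List[str]:
    """Audit query: 'show me everyone with this role'."""
    return sorted(user for user, roles in USER_ROLES.items() if role in roles)
```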

Advantages: Simple to understand, easy to audit ("show me everyone with admin role"), maps naturally to org structures.

Limitation: Roles are coarse-grained. What if Priya should be able to edit documents she created but not documents created by others? RBAC alone can't express this — every "editor" has the same permissions regardless of context.

ABAC: Attribute-Based Access Control

Access decisions are based on attributes of the user, resource, action, and environment — evaluated against policies.

Policy: "A user can EDIT a document IF:
  user.role == 'editor' 
  AND document.owner_id == user.id
  AND current_time is within business_hours
  AND user.department == document.department"

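from datetime import datetime

def is_business_hours(now: datetime) -> bool:
    # Hypothetical definition: Monday-Friday, 09:00-17:59
    return now.weekday() < 5 and 9 <= now.hour < 18

def can_edit_document(user, document, context):
    # All four attribute conditions from the policy must hold
    return (
        user.role == "editor"
        and document.owner_id == user.id
        and is_business_hours(context.current_time)
        and user.department == document.department
    )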

Advantages: Extremely fine-grained — context-aware decisions (time of day, location, resource ownership, relationships between entities).

Limitation: More complex to implement, audit, and reason about. Policies can become a tangled web of conditions that are hard to verify for correctness.
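One way to keep ABAC policies easier to audit is to express them as data, with named conditions, rather than as one large boolean expression. Dedicated engines such as Open Policy Agent take this idea much further. The `Policy` shape and the flat `"user.role"`-style attribute keys below are hypothetical, and the business-hours rule is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Attrs = Dict[str, Any]

@dataclass
class Policy:
    """Hypothetical policy shape: one action plus named attribute conditions."""
    action: str
    conditions: Dict[str, Callable[[Attrs], bool]] = field(default_factory=dict)

def is_allowed(policies: List[Policy], action: str, attrs: Attrs) -> bool:
    """Deny by default; allow if ANY policy for this action has ALL its conditions true."""
    return any(
        p.action == action and all(cond(attrs) for cond in p.conditions.values())
        for p in policies
    )

def failing_conditions(policies: List[Policy], action: str, attrs: Attrs) -> List[str]:
    """Audit helper: names of the conditions that did not hold for this action."""
    return [name for p in policies if p.action == action
            for name, cond in p.conditions.items() if not cond(attrs)]

# The "edit own document" rule from above, minus the business-hours check
EDIT_OWN_DOC = Policy("edit", {
    "is_editor": lambda a: a["user.role"] == "editor",
    "is_owner": lambda a: a["document.owner_id"] == a["user.id"],
    "same_department": lambda a: a["user.department"] == a["document.department"],
})
```

Because each condition has a name, a denied request can report *why* it was denied (for example `["is_owner"]`). That addresses part of the "hard to reason about" limitation.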

The practical guideline:

RBAC for broad, organizational access control ("admins can access the admin panel")
ABAC for fine-grained, contextual rules ("users can edit their own posts, but only during business hours, and only within their department")
Many real systems use both: RBAC for coarse roles, ABAC for fine-grained exceptions layered on top.

mTLS: Service-to-Service Authentication

Regular TLS (the "S" in HTTPS) authenticates the server to the client — your browser verifies it's really talking to bank.com. But the server doesn't verify who the client is beyond what application-layer auth (passwords, tokens) provides.

Mutual TLS (mTLS) requires both sides to present certificates:

Service A wants to call Service B:

1. Service B (the server) presents its certificate to Service A
   "I am service-b.internal, signed by our internal CA"

2. Service B requests a certificate, and Service A presents its own
   "I am service-a.internal, signed by our internal CA"

3. Both verify each other's certificates against the trusted CA
   (and each proves it holds the private key matching its certificate)
4. Connection established: BOTH sides cryptographically verified
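In Python's standard `ssl` module, the difference between regular TLS and mTLS comes down to a few context settings. This is a minimal sketch. The function names and certificate file paths are hypothetical, and a service mesh sidecar would do the equivalent for you.

```python
import ssl
from typing import Optional

def server_context(ca_file: Optional[str] = None, cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """Service B (server): present its own cert AND demand one from the caller."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # the "mutual" part: no valid client cert, no connection
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust ONLY the internal CA for client certs
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)   # service-b.internal's cert + private key
    return ctx

def client_context(ca_file: Optional[str] = None, cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    """Service A (client): verify the server as normal TLS does, AND present its own cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # defaults: CERT_REQUIRED + hostname check
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # trust ONLY the internal CA for server certs
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)   # service-a.internal's cert + private key
    return ctx
```

Usage would look something like `server_context("internal-ca.pem", "service-b.pem", "service-b.key").wrap_socket(sock, server_side=True)` on Service B. On Service A it would be `client_context(...).wrap_socket(sock, server_hostname="service-b.internal")`. The only mTLS-specific lines are `verify_mode = CERT_REQUIRED` on the server and `load_cert_chain` on the client; everything else is ordinary TLS.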

This is exactly what the service mesh from Topic 17 automates. Istio issues certificates to every service and can enforce mTLS for all internal traffic (its STRICT mode) without application code changes, so every service-to-service call is mutually authenticated and encrypted.

Zero Trust: "Never Trust, Always Verify"

The Equifax breach happened partly because, once an attacker breached the perimeter, the internal network trusted them. Zero Trust is the architectural philosophy that emerged in response: no request is trusted by default, regardless of whether it originates inside or outside the network perimeter.

Traditional ("castle and moat"):
  Strong perimeter security (firewall, VPN)
  Once inside → relatively trusted, broad access

Zero Trust:
  Every request — internal or external — must be authenticated 
  AND authorized, regardless of network location

  Service A calling Service B internally:
    → mTLS authenticates A's identity
    → Service B checks: is A authorized for THIS specific operation?
    → Every hop verified, nothing assumed because "it's internal"
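The "is A authorized for THIS specific operation?" check can be sketched as a per-operation allowlist keyed by the caller identity that mTLS has already verified. The certificate dict has the shape returned by Python's `ssl.SSLSocket.getpeercert()`. The policy table and operation names are hypothetical. DNS names are used to match the article's `service-a.internal` examples; meshes such as Istio typically encode identity as a SPIFFE URI in the certificate instead.

```python
from typing import Dict, Optional, Set

# Hypothetical policy for Service B: which callers may perform which operations
ALLOWED_CALLERS: Dict[str, Set[str]] = {
    "GET /orders": {"service-a.internal", "reporting.internal"},
    "POST /refunds": {"payments.internal"},
}

def peer_identity(peer_cert: dict) -> Optional[str]:
    """First DNS name in the subjectAltName of an already-verified peer certificate."""
    for kind, value in peer_cert.get("subjectAltName", ()):
        if kind == "DNS":
            return value
    return None

def authorize(peer_cert: dict, operation: str) -> bool:
    """Checked on EVERY request: being authenticated is not enough. Deny by default."""
    caller = peer_identity(peer_cert)
    return caller is not None and caller in ALLOWED_CALLERS.get(operation, set())
```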

Practical implementation: mTLS for service identity (Topic 17's service mesh), short-lived credentials everywhere (no long-lived API keys), continuous verification (not just at login), and least-privilege access (services get only the permissions they need, nothing more).

Google's BeyondCorp is the most famous Zero Trust implementation — Google employees access internal tools the same way whether they're in a Google office or a coffee shop, because the network location confers zero trust. Identity and device posture are what matter.

Interview Scenario: "Walk Through OAuth2 Flow Step by Step"

The structured answer (this question comes up very often, frequently almost verbatim):

"I'll walk through the Authorization Code Grant, the most common and secure flow for server-side apps.

Step 1: The user clicks 'Sign in with Google' on our app. We redirect them to Google's authorization endpoint, including our client_id, the redirect_uri, and the requested scope (e.g., calendar access).

Step 2: The user authenticates with Google (if not already) and approves the requested permissions.

Step 3: Google redirects the browser back to our redirect_uri with a short-lived, single-use authorization_code in the URL.

Step 4: Our backend server — not the browser — exchanges this code for tokens by calling Google's token endpoint, including our client_secret. This is a server-to-server call, so the secret never touches the browser.

Step 5: Google returns an access_token and refresh_token. We store the refresh token securely server-side and associate it with the user's session.

Step 6: We use the access token to call Google APIs on the user's behalf. When it expires, we use the refresh token to get a new one — without bothering the user.

The key security property: the actual tokens never appear in browser-visible locations like URLs or history — only the one-time authorization code does, and that's useless without the client_secret to exchange it."

Key Takeaways

Authentication = who you are. Authorization = what you can do. Different problems, different solutions.
OAuth2 is an authorization framework — the Authorization Code Grant flow keeps tokens out of browser-visible locations via a server-side exchange step.
JWT is self-contained and stateless — any service with the key can verify without a database lookup. Its weakness is revocation.
Solve JWT revocation with short access token TTLs (minutes) + long-lived refresh tokens stored server-side (revoke by deleting the refresh token).
SAML dominates enterprise SSO (XML-based). OAuth2/OIDC dominates consumer and API auth (JSON-based).
RBAC (roles → permissions) for broad access control. ABAC (attribute-based policies) for fine-grained, contextual rules. Most systems use both.
mTLS authenticates both sides of a connection — the foundation of service mesh security.
Zero Trust: never trust based on network location — verify every request, everywhere, always.

What's Next

Topic 20 covers Observability — the 3 pillars (metrics, logs, traces), the 4 Golden Signals, distributed tracing, and how to avoid alert fatigue when you're running hundreds of services.

Tags: system-design authentication oauth2 jwt security backend interview-prep

DEV Community