April 2026

Eliminating SSH Key Sprawl

Short-Lived Certificates for Modern Infrastructure

Replace thousands of static SSH keys with HSM-backed, identity-bound certificates that expire automatically. One trust anchor per server. Full audit trail. Revocation in seconds.

30 min read | Security Engineers, Platform Teams, DevOps Leaders | Technical Whitepaper
  • 3-10x: average SSH keys per enterprise (more keys than employees)
  • 2+ years: unmanaged key lifetime (average before rotation or removal)
  • 99.97%: exposure reduction (with 8-hour certificate validity)
  • Seconds: emergency revocation (via automatic KRL regeneration)

The SSH Key Management Crisis

SSH key sprawl is one of the most underestimated security risks in enterprise infrastructure. A typical organization with 500 engineers and 2,000 servers will have 5,000-15,000 SSH public keys distributed across server authorized_keys files -- with no centralized inventory, no expiration, and no audit trail.

The Hidden Attack Surface

  • No centralized inventory of which keys grant access to which servers
  • No expiration mechanism -- keys remain valid until manually removed
  • Orphaned keys from departed employees found on servers years later
  • Service account keys shared across teams, stored in CI/CD, never rotated
  • A single compromised key often grants lateral movement to every server
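The missing inventory can be approximated from data you already have. The following is a minimal sketch, assuming you have already collected the text of each server's authorized_keys file by some other means; the function names and the server names in the usage note are hypothetical. It groups entries by OpenSSH-style SHA256 fingerprint so that a single key deployed on many servers shows up as one row.

```python
import base64
import hashlib
from collections import defaultdict

# Key-type tokens as they appear in authorized_keys (ssh-ed25519, ssh-rsa,
# ecdsa-sha2-nistp256, sk-ssh-ed25519@openssh.com, ...).
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-sha2-", "sk-")


def fingerprint(blob_b64: str) -> str:
    """Return the SHA256 fingerprint of a base64-encoded public key blob."""
    digest = hashlib.sha256(base64.b64decode(blob_b64)).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")


def parse_authorized_keys(text: str) -> list[str]:
    """Extract key fingerprints from the contents of one authorized_keys file."""
    found = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Simplification: treat the first field with a known key-type prefix
        # as the key type. Quoted options that contain spaces are not handled.
        for i, field in enumerate(fields[:-1]):
            if field.startswith(KEY_TYPE_PREFIXES):
                found.append(fingerprint(fields[i + 1]))
                break
    return found


def build_inventory(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each key fingerprint to the set of servers that trust it."""
    inventory = defaultdict(set)
    for server, text in files.items():
        for fp in parse_authorized_keys(text):
            inventory[fp].add(server)
    return dict(inventory)
```

Calling build_inventory({"web-1": web_text, "db-1": db_text}) returns a mapping from fingerprint to server names; fingerprints that appear on many servers, or that match no current employee, are the first candidates for removal.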

How SSH Certificates Work

OpenSSH has supported certificate-based authentication since version 5.4 (2010). The server never needs to know about individual users -- it trusts the CA, and the CA decides who gets access based on identity.

The Certificate Flow

1. Generate ephemeral key pair (ssh-keygen)
2. Submit public key to CA (API or OIDC)
3. CA signs with metadata (principals, validity, extensions)
4. Receive signed certificate (-cert.pub file)
5. SSH with certificate (server trusts the CA)

Why Short-Lived Certificates Change Everything

  • Revocation becomes optional -- most certificates expire before you would need to revoke them
  • Stolen credentials are time-limited -- a compromised cert is useless tomorrow
  • No per-user server-side state -- no authorized_keys, no LDAP queries at auth time
  • Access is always current -- reflects identity at issuance time, not when the key was first deployed
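The time limit is enforced by the certificate's validity window, which the server checks on every login. A minimal sketch of that check, assuming an 8-hour lifetime and a half-open window (valid from valid_after up to, but not including, valid_before); the class and function names are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CertWindow:
    valid_after: datetime
    valid_before: datetime

    def accepts(self, now: datetime) -> bool:
        """True only while now falls inside the validity window."""
        return self.valid_after <= now < self.valid_before


def issue(now: datetime, lifetime: timedelta = timedelta(hours=8)) -> CertWindow:
    """Model issuance: the window opens at issuance and closes after lifetime."""
    return CertWindow(valid_after=now, valid_before=now + lifetime)


issued_at = datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc)
cert = issue(issued_at)

print(cert.accepts(issued_at + timedelta(hours=4)))   # True: mid-shift
print(cert.accepts(issued_at + timedelta(days=1)))    # False: stolen cert, next day
```

No server-side cleanup is involved in the second case: the credential simply stops being accepted once the window closes.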

KeyGrid SSH CA Architecture

KeyGrid SSH CA is a first-class module in the KeyGrid PKI platform -- alongside X.509, ACME, SCEP, EST, SPIFFE, TrustSign, and RADIUS. It reuses the existing infrastructure for HSM-backed key management, multi-tenant isolation, and audit logging.

HSM-Backed Signing

CA signing keys never leave the HSM. Supports Ed25519, ECDSA P-256/P-384, and RSA-4096. Even a full application compromise does not expose the CA private key.

OIDC Integration

Browser-based certificate issuance through your corporate IdP (Okta, Azure AD, Google). Maps OIDC claims to SSH principals automatically.

KRL Revocation

Generates OpenSSH Key Revocation Lists in binary format. Auto-regenerates on certificate revocation. Servers fetch KRLs via REST endpoint.

Multi-Tenant Isolation

Every SSH CA is scoped to a tenant. Database queries filtered by tenant_id at every layer. Same isolation model used across all KeyGrid services.

Use Cases

Developer Access to Production

Before

Copy each developer's public key to authorized_keys on every server. Hunt through servers to remove keys when someone leaves.

After

Servers trust one CA key. Developers authenticate via SSO, get an 8-hour certificate. Offboarding = disable IdP account.

CI/CD Pipeline SSH Access

Before

Static deploy key stored as CI secret. Shared across all pipelines, root-level access, never rotated.

After

Pipeline requests a 10-minute certificate with only the "deploy" principal. Expires before cleanup finishes.

Contractor & Vendor Access

Before

Vendor receives SSH keys that persist long after engagement ends. Keys found on servers years later.

After

Vendor authenticates via federated OIDC. 4-hour certificates. Remove the vendor from the IdP group = no new certificates are issued, and any outstanding one expires within 4 hours.

Host Identity Verification

Before

"Are you sure you want to continue connecting (yes/no)?" Users blindly type yes. TOFU risk on every new server.

After

Servers get host certificates at boot. Client machines trust the host CA. No TOFU prompt, verified identity.
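Host certificates rely on three pieces: a signed host key, an sshd_config line that presents it, and a known_hosts line on clients that trusts the host CA. The sketch below builds each piece with standard OpenSSH syntax. It assumes a file-based host CA key for illustration; the helper names, host pattern, and paths are hypothetical, and the validity period is left to the caller because the text above does not specify one.

```python
def host_sign_command(host_ca_key: str, host_public_key: str,
                      hostname: str, validity: str) -> list[str]:
    """Build the ssh-keygen call that signs a host key (-h marks a host cert)."""
    return ["ssh-keygen", "-s", host_ca_key, "-h",
            "-I", hostname,          # certificate identity
            "-n", hostname,          # principal: the name clients connect to
            "-V", validity,
            host_public_key]


def sshd_host_certificate_line(cert_path: str) -> str:
    """sshd_config directive that makes the server present its certificate."""
    return f"HostCertificate {cert_path}"


def known_hosts_ca_line(host_pattern: str, ca_public_key: str) -> str:
    """known_hosts entry telling clients to trust hosts signed by this CA."""
    return f"@cert-authority {host_pattern} {ca_public_key.strip()}"


print(known_hosts_ca_line("*.example.com", "ssh-ed25519 AAAA... host-ca"))
```

With the @cert-authority line in place, a client accepts any host in the pattern that presents a valid certificate from that CA, so the first-connection prompt does not appear.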

Emergency Revocation

Before

Laptop stolen. Security team must find and remove the key from every server. Takes hours or days.

After

Revoke the serial in KeyGrid. The KRL regenerates automatically, and servers reject the cert as soon as they fetch the updated KRL, typically within minutes. Done.

Identity Integration: OIDC & SSO

KeyGrid SSH CA integrates with any OIDC-compliant Identity Provider. Developers authenticate through their normal corporate SSO flow and receive an SSH certificate automatically. No separate SSH key management workflow.

Claims-to-Principals Mapping Example

{
  "principal_sources": [
    {
      "claim": "email",
      "transform": "local_part"    // e.g. jdoe@example.com → principal "jdoe"
    },
    {
      "claim": "groups",
      "mapping": {
        "ssh-admins": ["root", "admin"],
        "developers": ["deploy"],
        "sre-team":   ["root"]
      }
    }
  ]
}
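One plausible reading of this mapping can be written out as a short function. This is a sketch of assumed semantics, not KeyGrid's implementation: "local_part" is taken to mean the text before the "@" in the claim value, group mappings are unioned, unknown groups are ignored, and duplicates are dropped while preserving order. The function name is hypothetical.

```python
import json

# Same configuration as above, without the inline comment (plain JSON).
CONFIG = json.loads("""
{
  "principal_sources": [
    {"claim": "email", "transform": "local_part"},
    {"claim": "groups",
     "mapping": {
       "ssh-admins": ["root", "admin"],
       "developers": ["deploy"],
       "sre-team":   ["root"]
     }}
  ]
}
""")


def principals_for(claims: dict, config: dict) -> list[str]:
    """Derive SSH principals from OIDC claims under the assumed semantics."""
    principals: list[str] = []

    def add(name: str) -> None:
        if name and name not in principals:
            principals.append(name)

    for source in config["principal_sources"]:
        value = claims.get(source["claim"])
        if value is None:
            continue  # claim absent from the token
        if source.get("transform") == "local_part":
            add(str(value).split("@", 1)[0])
        elif "mapping" in source:
            groups = [value] if isinstance(value, str) else value
            for group in groups:
                for principal in source["mapping"].get(group, []):
                    add(principal)
    return principals


claims = {"email": "jdoe@example.com", "groups": ["developers", "sre-team"]}
print(principals_for(claims, CONFIG))  # ['jdoe', 'deploy', 'root']
```

Because principals are derived at issuance, a group change in the IdP takes effect on the next certificate request rather than requiring any change on the servers.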

Revocation and KRL Distribution

When a certificate is revoked, KeyGrid automatically regenerates the KRL. Servers fetch the updated KRL via a simple REST endpoint. With 8-hour certificates, the window where revocation is even necessary is small -- KRL is a safety net, not the primary security mechanism.
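On the server side, the fetched KRL should be swapped into place atomically, because sshd refuses public key authentication for all users if the RevokedKeys file is unreadable. The sketch below covers only the install step and assumes the KRL bytes have already been downloaded by whatever client you use; the endpoint itself is not specified here, and the function name is hypothetical. Rejecting an empty payload is a defensive assumption so that a failed download cannot silently clear existing revocations.

```python
import os
import tempfile
from pathlib import Path


def install_krl(data: bytes, dest: Path) -> bool:
    """Atomically replace dest with data. Return True if the file changed."""
    if not data:
        return False  # treat an empty download as a failure, keep the old KRL
    if dest.exists() and dest.read_bytes() == data:
        return False  # already up to date
    # Write to a temp file in the same directory so os.replace stays atomic.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.chmod(tmp_name, 0o644)
        os.replace(tmp_name, dest)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

Run from a timer or cron job, the polling interval of this step is what determines how quickly a regenerated KRL takes effect on a given server.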

Deployment Models

Cloud-Hosted

No on-premises infrastructure required. Configure servers to trust KeyGrid's CA public key and fetch KRLs from the API.

On-Premises

For strict data residency or air-gapped requirements. Uses your own HSM infrastructure (PKCS#11, AWS CloudHSM, Azure Key Vault).

Hybrid

Production on-premises, dev/staging in the cloud. Multi-tenant isolation ensures complete separation between environments.

Static Keys vs SSH Certificates

Metric | Static Keys | SSH Certificates | Improvement
Credential lifetime | Permanent (until manually removed) | 8 hours (configurable) | 99.97% reduction
Revocation speed | Hours to days (manual per-server) | Seconds (automatic KRL) | ~1000x faster
Server-side state | O(users) keys per server | 1 CA trust anchor per server | Constant complexity
Audit trail | None | Full (who, when, principals, auth method) | Complete visibility
Offboarding effort | Touch every server | Disable IdP account | Zero server changes
Key rotation | Manual, rarely done | Automatic (certs expire) | Continuous rotation
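The credential-lifetime figure can be checked with simple arithmetic. The paper does not state the baseline behind 99.97%, so this sketch assumes the reduction is computed as 1 minus the ratio of certificate lifetime to static key lifetime; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760


def exposure_reduction(cert_hours: float, key_years: float) -> float:
    """Fraction by which the credential's valid lifetime shrinks."""
    return 1 - cert_hours / (key_years * HOURS_PER_YEAR)


for years in (2, 3):
    print(f"{years}-year key vs 8-hour cert: {exposure_reduction(8, years):.2%}")
# 2-year key vs 8-hour cert: 99.95%
# 3-year key vs 8-hour cert: 99.97%
```

Under that assumption, 99.97% corresponds to a static key that lives about three years, which is consistent with the "2+ years" unmanaged lifetime cited earlier. A key that is never removed pushes the figure closer to 100%.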

Implementation Guide

Server Configuration (2 lines)

# /etc/ssh/sshd_config

# Trust user certificates signed by KeyGrid
TrustedUserCAKeys /etc/ssh/keygrid-user-ca.pub

# Revocation list (optional but recommended)
RevokedKeys /etc/ssh/keygrid-krl
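When rolling these two lines out across a fleet, an idempotent edit avoids duplicate entries on repeated runs. The following is a minimal sketch with a hypothetical function name. It assumes whitespace-separated "Keyword value" lines, only inspects the global section, and inserts missing directives before the first Match block so they are not accidentally scoped to it. A keyword that is already set is left untouched, even if its value differs.

```python
def ensure_directives(config_text: str, directives: dict[str, str]) -> str:
    """Return config_text with any missing global directives added."""
    lines = config_text.splitlines()
    present = set()
    first_match = len(lines)
    for i, line in enumerate(lines):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        keyword = fields[0].lower()  # sshd_config keywords are case-insensitive
        if keyword == "match":
            first_match = i
            break  # everything after this belongs to Match blocks
        present.add(keyword)
    missing = [f"{key} {value}" for key, value in directives.items()
               if key.lower() not in present]
    lines[first_match:first_match] = missing
    return "\n".join(lines) + "\n"


KEYGRID_DIRECTIVES = {
    "TrustedUserCAKeys": "/etc/ssh/keygrid-user-ca.pub",
    "RevokedKeys": "/etc/ssh/keygrid-krl",
}
```

After writing the result back, running sshd -t validates the configuration before the daemon is reloaded.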

Phased Migration

1. Deploy CA trust anchor: Add TrustedUserCAKeys alongside existing authorized_keys. Both methods work simultaneously.
2. Issue certificates to early adopters: Static keys continue to work for everyone else. Zero disruption.
3. Remove static keys: For users successfully using certificates. Gradual, server-by-server.
4. Certificate-only authentication: Set AuthorizedKeysFile to none so sshd stops reading authorized_keys. Certificates become the only public-key auth method.

Conclusion

SSH key sprawl is a solvable problem. The technology has existed in OpenSSH since 2010. What has been missing is a production-grade Certificate Authority that integrates with enterprise identity, provides HSM-backed security, and operates at scale.

KeyGrid SSH CA fills this gap as part of a comprehensive Modern PKI platform -- X.509 certificate lifecycle, ACME, SCEP, EST, SPIFFE workload identity, document signing, and RADIUS authentication. Cloud-hosted or on-premises.