Hire.Monster

Production Support & Reliability Engineer

Hybrid · DevOps · USA · $130k-$150k

Responsibilities

  • This Production Support & Reliability Engineer role is focused on customers who deploy Cyberhaven’s Content Inspection (CI) stack in their own cloud or datacenter environments (AWS, GCP, Azure, Kubernetes)
  • You will be the primary L3 owner for on‑prem CI deployments and incidents, especially as DSPM on‑prem connectors become the norm
  • You must be deeply technical, comfortable with Linux, containers, Kubernetes, and virtual appliances, and enjoy operating at the intersection of support, SRE, and product engineering
  • This role reports to the Director of Technical Support
  • Act as the primary L3 owner for all support cases involving on-prem Content Inspection (CI), including deployments, upgrades, performance, correctness, and stability
  • Own customer issues end-to-end, from initial triage and reproduction in lab or test clusters through root-cause analysis, remediation, or escalation
  • Serve as the internal and external point of contact for high-severity CI incidents, joining customer bridges and coordinating closely with SRE, R&D, and Product until resolution
  • Lead or co-lead on-prem CI deployments and upgrades alongside Professional Services and SRE, validating prerequisites, reviewing Kubernetes and Helm configurations, and coordinating maintenance windows and rollback plans
  • Monitor and support on-prem CI environments, understand CI metrics and logs, and drive first-line response to capacity, health, latency, and stability issues
  • Act as the L3 specialist for DSPM on-prem connectors that depend on CI, validating new capabilities and supporting large-scale and design-partner deployments
  • Design, build, and maintain a Support-owned on-prem CI lab across AWS, GCP, and Azure to reproduce customer issues, validate fixes, and test upgrades and mitigations
  • Create and maintain internal runbooks and knowledge base documentation for CI deployments, upgrades, troubleshooting, and escalation best practices
  • Enable internal teams by training Support, Professional Services, CX, SRE, and R&D on CI workflows, escalation patterns, and customer-facing considerations
  • Drive supportability improvements by filing and owning engineering feedback related to diagnostics, logging, health checks, and safer upgrade and rollback behavior
  • Partner cross-functionally with SRE, R&D, and Product Management to improve CI reliability and scalability, align releases with customer commitments, and participate in design and readiness reviews

Requirements

  • 5+ years of experience in technical support, SRE, or production operations for a SaaS or security product
  • Experience deploying and supporting virtual appliances or on‑prem products in enterprise environments
  • Comfortable reading and interpreting logs and metrics from distributed systems; able to form and test hypotheses quickly
  • Able to manage high‑priority incidents calmly, communicate clearly with enterprise customers, and coordinate multiple internal teams
  • Curious, self‑directed, and comfortable taking ownership of ambiguous problems until they are fully resolved
  • Excellent written and verbal communication skills; able to turn repeated patterns into clear runbooks and training material

Joining Cyberhaven is a chance to revolutionize data security

Skills

  • Deep, hands‑on experience with Linux, containers, and Kubernetes (EKS, GKE, AKS, or self‑managed clusters)
  • Strong understanding of networking (load balancers, TLS, DNS, proxies, firewalls) in hybrid/on‑prem settings

Salary

$130,000-$150,000

Published: 11.01.2026