- They are seeking a Senior Staff Engineer, Online Datastores, to build and operate high-performance datastores, ensuring reliability and scalability while providing technical leadership and mentorship
- Ensure high reliability, uptime, and query performance for Apache Druid clusters on EC2
- Lead monitoring, alerting, troubleshooting, and incident response using observability tools such as Datadog and CloudWatch
- Manage data lifecycle: tiering strategies (hot, warm, default), retention policies, deep storage (S3), local SSD caches, and Aurora MySQL metadata store
- Operate and tune ZooKeeper clusters while ensuring overall cluster stability, coordination, and service discovery
- Optimize performance and cost efficiency by right-sizing clusters, scaling instances, and balancing ingestion throughput with query workloads
- Define and drive standards for online datastore operations, including multi-region deployments, failover strategies, SLAs and SLOs, and best practices for real-time analytics workloads
- Provide technical leadership and mentorship, collaborating with data and product teams to define governance, reliability practices, and long-term strategy across datastores (starting with Druid, and over time MongoDB and Postgres)