About LanceDB
LanceDB is a developer-friendly, open-source database for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application, and powers some of the most groundbreaking applications and challenging requirements today.
About the role
We’re seeking a seasoned Cloud Infrastructure Engineer with deep expertise in automation, infrastructure-as-code (IaC), and cloud platform management. You’ll design, deploy, and maintain robust cloud environments while collaborating with cross-functional teams to streamline CI/CD pipelines, enhance system reliability, and drive operational excellence.
Your responsibilities as a Cloud Infrastructure Engineer at LanceDB will include:
- Design & Build Cloud Infrastructure: Architect and manage secure, scalable cloud environments (AWS, Azure, GCP) using IaC tools like Puppet, Terraform, or CloudFormation.
- Automate Everything: Develop and maintain automation scripts to streamline deployments, monitoring, and system operations.
- Systems Reliability: Implement monitoring/alerting solutions (Prometheus, Grafana, Datadog) to proactively address performance bottlenecks and ensure 99.9% uptime.
- Security & Compliance: Enforce security policies, manage secrets (Vault, AWS KMS), and ensure compliance with industry standards (GDPR, SOC 2).
- Troubleshoot & Optimize: Resolve complex infrastructure issues and lead cost-optimization initiatives for cloud resources.
- Collaborate & Mentor: Partner with software engineering teams to integrate DevOps practices into the SDLC, and mentor junior engineers on IaC and cloud best practices.
Requirements:
- 5+ years in DevOps, Cloud Infrastructure, or SRE roles, with hands-on experience in public cloud platforms (AWS, Azure, GCP, Heroku).
- Expertise in IaC and configuration management tools (Puppet, Terraform, Ansible) and in orchestration tools (Kubernetes, Helm).
- Deep understanding of networking, security, and cloud architecture best practices.
- Experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk).
- Strong knowledge of CI/CD tools (Jenkins, GitHub Actions, CircleCI) and containerization (Docker, Kubernetes).