Platform / Infrastructure Engineer
About the Role
Our AI agents run 24/7 in live business environments where downtime means lost revenue. You'll build and maintain the infrastructure that deploys, scales, and monitors our agents across dozens of client environments, keeping them fast, reliable, and observable.
What You'll Do
- Design and manage cloud infrastructure (AWS/GCP) for deploying AI agents at scale
- Build CI/CD pipelines for rapid, reliable agent deployment across multiple client environments
- Implement monitoring, alerting, and observability systems for agent health and performance
- Manage containerized workloads and orchestration (Docker, Kubernetes)
- Optimize infrastructure costs while meeting 99.999% uptime targets
- Build tooling for multi-tenant agent isolation, configuration management, and secret handling
What We're Looking For
- 3+ years of experience in infrastructure, DevOps, or platform engineering
- Strong experience with cloud platforms (AWS or GCP)
- Proficiency with infrastructure-as-code (Terraform, Pulumi, or CloudFormation)
- Experience with container orchestration (Kubernetes or ECS)
- Solid understanding of networking, security, and Linux systems
- Comfort with on-call responsibilities and incident response
Nice to Have
- Experience scaling ML/AI workloads (GPU management, model serving)
- Familiarity with edge computing or IoT device management
- Experience with multi-tenant SaaS architectures
- Background in high-availability systems for mission-critical applications