Hathora | NYC (In-Person) | $160-200k + equity
We're building the fastest voice AI inference platform in the world. Our models platform (models.hathora.dev) runs voice models close to users around the globe to minimize network latency. As voice AI explodes (ElevenLabs, Cartesia, and Hume are raising billions), we're positioning to own the infrastructure layer through a marketplace of optimized models.
What you'll do
- Optimize inference latency and throughput for voice AI models using CUDA, kernel-level tuning, and engine modifications (vLLM, NeMo, etc.); see the sketch after this list for a taste of this work
- Deploy and scale models on our Docker/Kubernetes infrastructure with high reliability
- Own the performance stack end-to-end: from GPU kernels to production deployment
- Rapidly integrate new models as they're released and ensure we're the fastest option available
- Publish research and speak at conferences, establishing yourself and Hathora as the authority on voice inference performance
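To give a flavor of the kernel-level work (a minimal sketch, not Hathora's code; the toy `scale` kernel and all names here are illustrative assumptions), the inner loop of latency tuning is timing a kernel in isolation, before and after a change:

```cuda
// Minimal sketch: time a toy elementwise kernel with CUDA events.
// The kernel and sizes are hypothetical; real work targets the hot
// kernels inside engines like vLLM.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;  // 1M floats
    float *d_x;
    cudaMalloc((void **)&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Warm-up launch so one-time setup costs don't skew the timing.
    scale<<<grid, block>>>(d_x, 2.0f, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    scale<<<grid, block>>>(d_x, 2.0f, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return 0;
}
```

Compile with `nvcc -o timing timing.cu`. Shaving time off individual kernels, repeated across a full model's kernel graph, is what adds up to sub-100ms voice responses.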
What we're looking for
- Proven experience with GPU programming (CUDA) and low-level performance optimization
- Contributions to inference engines (SGLang, vLLM, etc.) or PyTorch internals preferred
- Comfort with deployment infrastructure (Docker, Kubernetes)
- Demonstrated curiosity and depth, whether through projects, contributions, or research
- Any experience level welcome if you've gotten your hands dirty and want to go deeper
Why join
- Build and own the entire inference stack from scratch for a rapidly growing platform; as the voice AI market explodes, your decisions will define how the industry runs models
- Solve hard technical problems: sub-100ms voice responses at global scale
- Backed by Upfront Ventures, Founders Fund, and Lunar Ventures
About Hathora