Hathora | NYC (In-Person) | $160-200k + equity
We're building the fastest voice AI inference platform in the world. Our models platform (models.hathora.dev) runs voice models close to users around the globe to minimize network latency. As voice AI explodes (ElevenLabs, Cartesia, and Hume are raising billions), we're positioning to own the infrastructure layer through a marketplace of optimized models.
What you'll do
- Optimize inference latency and throughput for voice AI models using CUDA, kernel-level tuning, and engine modifications (vLLM, NeMo, etc.); see the sketch after this list for a taste of this work
- Deploy and scale models on our Docker/Kubernetes infrastructure with high reliability
- Own the performance stack end-to-end: from GPU kernels to production deployment
- Rapidly integrate new models as they're released and ensure we're the fastest option available
- Publish research and speak at conferences, establishing yourself and Hathora as the authority on voice inference performance
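To give a flavor of the kernel-level work (a minimal sketch, not Hathora's code; the toy `scale` kernel and all names here are illustrative assumptions), the inner loop of latency tuning is timing a kernel in isolation, before and after a change:

```cuda
// Minimal sketch: time a toy elementwise kernel with CUDA events.
// The kernel and sizes are hypothetical; real work targets the hot
// kernels inside engines like vLLM.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;  // 1M floats
    float *d_x;
    cudaMalloc((void **)&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Warm-up launch so one-time setup costs don't skew the timing.
    scale<<<grid, block>>>(d_x, 2.0f, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    scale<<<grid, block>>>(d_x, 2.0f, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return 0;
}
```

Compile with `nvcc -o timing timing.cu`. Shaving time off individual kernels, repeated across a full model's kernel graph, is what adds up to sub-100ms voice responses.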
What we're looking for
- Proven experience with GPU programming (CUDA) and low-level performance optimization
- Contributions to inference engines (SGLang, vLLM, etc.) or PyTorch internals preferred
- Comfort with deployment infrastructure (Docker, Kubernetes)
- Demonstrated curiosity and depth, whether through projects, contributions, or research
- Any experience level welcome if you've gotten your hands dirty and want to go deeper
Why join
- Build and own the entire inference stack from scratch for a rapidly growing platform; as the voice AI market explodes, your decisions will define how the industry runs models
- Solve hard technical problems: sub-100ms voice responses at global scale
- Backed by Upfront Ventures, Founders Fund, and Lunar Ventures
About Hathora