Senior AI Quality Engineer
💰 $80,000 – $130,000/yr
Job Description
About Roofr
Roofr is building the next generation of CRM software for the roofing and construction industry. We started by creating essential sales tools like aerial roof measurements and digital sales proposals. When customers asked for a comprehensive way to manage and scale their entire businesses, we listened and built a unified CRM platform that connects these solutions with payments, material ordering, and more into a seamless, powerful ecosystem. With strong financials, best-in-class company metrics, and an amazing culture, Roofr is an exciting startup that is already successful yet early enough to offer team members significant growth opportunities, equity, and the chance to make a real impact.
The Role
Roofr is building the application foundation that will define how AI is integrated across the entire product. As our Senior AI Quality Engineer, you'll own the eval frameworks, testing standards, and quality gates that every engineering team at Roofr depends on to ship AI with confidence. You'll sit on the AI Platform team and work horizontally across the entire testing organization, training teams on best practices and raising the bar on how Roofr tests everything it builds with AI. This is foundational, early-stage work—you're setting the standard for the whole organization, not just one team.
Key Responsibilities
- Define testing standards and patterns for AI at Roofr, establishing how product teams validate AI behaviour when building on the application foundation
- Build and own Roofr's LLM evaluation framework: selecting and extending the right tooling (Promptfoo, DeepEval, Braintrust) and designing the methodology for measuring whether AI integrations and agent outputs perform correctly, consistently, and safely
- Integrate quality gates into CI/CD pipelines to catch regressions in AI behaviour before production deployment (a minimal sketch of such a gate follows this list)
- Design and implement human-in-the-loop review processes for AI outputs where automated evaluation isn't sufficient
- Work embedded on the AI Platform team, ensuring quality is designed into the integration architecture from day one
- Coach QA engineers and developers on AI evaluation patterns, embed best practices into team workflows, and actively raise the quality bar across engineering
- Stay current with the evolving AI quality landscape—new evaluation techniques, benchmarking approaches, and tooling like Ragas, Arize Phoenix, and LangSmith—and bring the best of it to Roofr
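To make the quality-gate work described above concrete, here is a minimal, hypothetical sketch of an LLM regression gate written as a pytest test. Every name in it (call_model, EvalCase, EVAL_SET, PASS_RATE_THRESHOLD) is illustrative rather than anything Roofr prescribes, and a real framework would delegate scoring to tooling like Promptfoo or DeepEval instead of the keyword heuristic shown:

```python
"""Minimal sketch of an LLM regression gate, runnable with pytest.

All names here are hypothetical; a production framework would swap
the keyword check for semantic or model-graded metrics.
"""
from dataclasses import dataclass


@dataclass
class EvalCase:
    prompt: str
    required_phrases: list[str]   # output must mention all of these
    forbidden_phrases: list[str]  # output must mention none of these


# Hypothetical golden set; in practice this would be versioned
# alongside the prompts it guards.
EVAL_SET = [
    EvalCase(
        prompt="Summarize the roof measurement report for 12 Elm St.",
        required_phrases=["12 Elm St"],
        forbidden_phrases=["I cannot", "as an AI"],
    ),
]

# The gate fails CI if the pass rate drops below this threshold.
PASS_RATE_THRESHOLD = 0.95


def call_model(prompt: str) -> str:
    """Placeholder for the real LLM call behind the AI platform."""
    return "Summary for 12 Elm St: ..."  # stubbed response


def passes(case: EvalCase, output: str) -> bool:
    """Cheap deterministic check standing in for a real metric."""
    lowered = output.lower()
    return all(p.lower() in lowered for p in case.required_phrases) and not any(
        p.lower() in lowered for p in case.forbidden_phrases
    )


def test_llm_regression_gate():
    results = [passes(case, call_model(case.prompt)) for case in EVAL_SET]
    pass_rate = sum(results) / len(results)
    assert pass_rate >= PASS_RATE_THRESHOLD, (
        f"AI quality gate failed: pass rate {pass_rate:.0%} "
        f"is below threshold {PASS_RATE_THRESHOLD:.0%}"
    )
```

Wired into a CI pipeline, a gate like this blocks deployment whenever the pass rate on the golden set drops, which is the regression-catching behaviour the responsibilities above describe.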
What You'll Bring
- 5–8 years of software engineering or quality assurance experience
- Hands-on experience building evaluation frameworks for machine learning or AI systems
- Strong proficiency with Python and experience with testing/validation frameworks
- Familiarity with LLM evaluation tooling and methodologies (Promptfoo, DeepEval, Braintrust, Ragas, or similar)
- Experience designing and implementing CI/CD pipelines and quality gates
- Understanding of AI/LLM behaviour, limitations, and failure modes
- Ability to work across teams, mentor junior engineers, and establish organizational best practices
- Strong communication skills and a passion for quality and continuous improvement