Senior AI Quality Engineer
💰 $80,000 – $130,000/yr
Job Description
About Roofr
Roofr is building the next generation of CRM software for the roofing and construction industry. We started by creating essential sales tools like aerial roof measurements and digital sales proposals. When customers asked for a comprehensive way to manage and scale their entire businesses, we listened and built a unified CRM platform that connects these solutions with payments, material ordering, and more into a seamless, powerful ecosystem. With strong financials, best-in-class company metrics, and an amazing culture, Roofr is an exciting startup that is already successful yet early enough to offer team members significant growth opportunities, equity, and the chance to make a real impact.
The Role
Roofr is building the application foundation that will define how AI is integrated across the entire product. As our Senior AI Quality Engineer, you'll own the eval frameworks, testing standards, and quality gates that every engineering team at Roofr depends on to ship AI with confidence. You'll sit on the AI Platform team and work horizontally across the entire testing organization, training teams on best practices and raising the bar on how Roofr tests everything it builds with AI. This is foundational, early-stage work—you're setting the standard for the whole organization, not just one team.
Key Responsibilities
- Define testing standards and patterns for AI at Roofr, establishing how product teams validate AI behaviour when building on the application foundation
- Build and own Roofr's LLM evaluation framework: selecting and extending the right tooling (Promptfoo, DeepEval, Braintrust) and designing the methodology for measuring whether AI integrations and agent outputs perform correctly, consistently, and safely
- Integrate quality gates into CI/CD pipelines to catch regressions in AI behaviour before production deployment (a minimal sketch of such a gate follows this list)
- Design and implement human-in-the-loop review processes for AI outputs where automated evaluation isn't sufficient
- Work embedded on the AI Platform team, ensuring quality is designed into the integration architecture from day one
- Coach QA engineers and developers on AI evaluation patterns, embed best practices into team workflows, and actively raise the quality bar across engineering
- Stay current with the evolving AI quality landscape—new evaluation techniques, benchmarking approaches, and tooling like Ragas, Arize Phoenix, and LangSmith—and bring the best of it to Roofr
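To make the quality-gate work described above concrete, here is a minimal, hypothetical sketch of an LLM regression gate written as a pytest test. Every name in it (call_model, EvalCase, EVAL_SET, PASS_RATE_THRESHOLD) is illustrative rather than anything Roofr prescribes, and a real framework would delegate scoring to tooling like Promptfoo or DeepEval instead of the keyword heuristic shown:

```python
"""Minimal sketch of an LLM regression gate, runnable with pytest.

All names here are hypothetical; a production framework would swap
the keyword check for semantic or model-graded metrics.
"""
from dataclasses import dataclass


@dataclass
class EvalCase:
    prompt: str
    required_phrases: list[str]   # output must mention all of these
    forbidden_phrases: list[str]  # output must mention none of these


# Hypothetical golden set; in practice this would be versioned
# alongside the prompts it guards.
EVAL_SET = [
    EvalCase(
        prompt="Summarize the roof measurement report for 12 Elm St.",
        required_phrases=["12 Elm St"],
        forbidden_phrases=["I cannot", "as an AI"],
    ),
]

# The gate fails CI if the pass rate drops below this threshold.
PASS_RATE_THRESHOLD = 0.95


def call_model(prompt: str) -> str:
    """Placeholder for the real LLM call behind the AI platform."""
    return "Summary for 12 Elm St: ..."  # stubbed response


def passes(case: EvalCase, output: str) -> bool:
    """Cheap deterministic check standing in for a real metric."""
    lowered = output.lower()
    return all(p.lower() in lowered for p in case.required_phrases) and not any(
        p.lower() in lowered for p in case.forbidden_phrases
    )


def test_llm_regression_gate():
    results = [passes(case, call_model(case.prompt)) for case in EVAL_SET]
    pass_rate = sum(results) / len(results)
    assert pass_rate >= PASS_RATE_THRESHOLD, (
        f"AI quality gate failed: pass rate {pass_rate:.0%} "
        f"is below threshold {PASS_RATE_THRESHOLD:.0%}"
    )
```

Wired into a CI pipeline, a gate like this blocks deployment whenever the pass rate on the golden set drops, which is the regression-catching behaviour the responsibilities above describe.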
What You'll Bring
- 5–8 years of software engineering or quality assurance experience
- Hands-on experience building evaluation frameworks for machine learning or AI systems
- Strong proficiency with Python and experience with testing/validation frameworks
- Familiarity with LLM evaluation tooling and methodologies (Promptfoo, DeepEval, Braintrust, Ragas, or similar)
- Experience designing and implementing CI/CD pipelines and quality gates
- Understanding of AI/LLM behaviour, limitations, and failure modes
- Ability to work across teams, mentor junior engineers, and establish organizational best practices
- Strong communication skills and a passion for quality and continuous improvement