Freelance Agent Evaluation Engineer at Mindrift — NeverHard
- Company: Mindrift
- Location: Ontario, Canada
- Type: Part-time
Please submit your CV in English and indicate your level of English proficiency.
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.
What this opportunity involves
We're building a dataset to evaluate AI coding agents — how well a model handles real-world developer tasks. You'll create challenging tasks and evaluation criteria within realistic simulated environments:
- Build virtual companies following a high-level plan: codebase, infrastructure, and context (conversations, documentation, tickets) that together form a realistic environment with development history
- Assemble and calibrate tasks from intermediate states of the virtual company: craft the prompt, define evaluation criteria, and ensure the task is solvable and the evaluation is fair
- Design tasks set in isolated environments that emulate a developer's workstation: a Linux machine with development tools (terminal, CLI), MCP servers (repository, task tracker, messenger, documentation, etc.), and a real web application codebase
- Write tests that accept all correct solutions and