Responsibilities
• Evaluate AI model responses for personalization quality, including grounding, integration, and helpfulness.
• Design and execute multi-turn prompts based on personal context to test AI capabilities.
• Analyze responses for hallucinations, incorrect personalization, and poor inferences.
• Perform side-by-side comparisons of model outputs to determine quality and effectiveness.
• Write clear and structured rationales for response evaluations and rankings.
• Extract and verify debug information to ensure proper use of data sources.
• Maintain strict data hygiene and ensure accurate documentation of evaluations.
• Collaborate with cross-functional teams to improve AI model performance.
Requirements
• Strong proficiency in Polish, with excellent reading and writing skills.
• Experience in data annotation, AI evaluation, content moderation, or a related role.
• Strong analytical thinking and ability to assess nuanced AI responses.
• Ability to design creative, multi-turn prompts based on personal context.
• Understanding of personalization concepts, including identifying incorrect or forced personalization.
• High attention to detail in evaluating subtle differences in model outputs.
• Excellent written communication and structured reasoning skills.
• Ability to work independently in a remote environment.
• Willingness to use a personal Google account for evaluation purposes.
• Full-time availability with at least 4 hours of overlap with PST.
• Bachelor’s degree or equivalent experience in a relevant analytical field.