Improve evaluations' handling of wrong answers