Evaluations: what are you really measuring?