How the MT-Bench test measures and compares LLMs