Methodology for final results: All scores are pass@1. "One endeavor" settings permit no majority voting or parallel test-time compute; "numerous tries" settings allow test-time selection of the candidate answer. All are run with the AI Studio API with default sampling settings. To reduce variance, we average over many tria