[Paper Review] UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios
A detailed review of the paper "UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios," posted on [arXiv] by Zeyu Qin.