[Paper Review] AMFT: Aligning LLM Reasoners by Meta-Learning the Optimal Imitation-Exploration Balance
This is a detailed review of the paper “AMFT: Aligning LLM Reasoners by Meta-Learning the Optimal Imitation-Exploration Balance,” posted on [arXiv] by Yong Li.