[논문리뷰] RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Kechi Zhang이 [arXiv]에 게시한 ‘RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization’ 논문에 대한 자세한 리뷰입...