secrett2633's blog

[Paper Review] VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

October 1, 2025

A detailed review of the paper ‘VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications’, posted on [arXiv].

October 1, 2025

A detailed review of the paper ‘VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes’, posted on [arXiv] by Muhammad Huzaifa.

October 1, 2025

A detailed review of the paper ‘Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play’, posted on [arXiv] by Jing Shi.

October 1, 2025

A detailed review of the paper ‘TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning’, posted on [arXiv].

October 1, 2025

A detailed review of the paper ‘Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training’, posted on [arXiv].