Posts in the 'Project/피지여행' category (Page 2)

5. Natural Policy Gradient

2023.03.06

피지여행 paper no. 4. Author: Sham Kakade. Paper link: https://papers.nips.cc/paper/2073-a-natural-policy-gradient.pdf. Proceeding: Advances in Neural Information Processing Systems (NIPS) 2002. Summary by: 김동민, 이동민, 이웅원, 차금강. 1. Introduction... When this paper was published in 2002, many researchers were already trying to find a good policy $\pi$ by following the gradient of an objective function. However, the gradient descent method we were familiar with may not follow the steepest descent direction (put simply, the most ..

4. Deep Deterministic Policy Gradient (DDPG)

2023.03.06

피지여행 paper no. 3. Authors: Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver & Daan Wierstra. Paper link: https://arxiv.org/pdf/1509.02971.pdf. Proceeding: International Conference on Learning Representations (ICLR) 2016. Summary by: 양혁렬, 이동민, 차금강. 1. Introduction... 1.1 Success & Limitation of DQN. Success: DQN uses raw pixel input instead of preprocessed input coming from sensors. In this way..

3. Deterministic Policy Gradient Algorithms

2023.03.05

피지여행 paper no. 2. Authors: David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller. Paper link: main text, supplementary material. Proceeding: International Conference on Machine Learning (ICML) 2014. Summary by: 김동민, 공민서, 장수영, 차금강. 1. Introduction... The paper proposes the Deterministic Policy Gradient (DPG) Theorem. The key point is that the DPG takes the form of the expected gradient of the action-value function. Policy variance to 0 ..

2. Policy Gradient Methods for Reinforcement Learning with Function Approximation

2023.03.05

피지여행 paper no. 1. Authors: Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour. Paper link: NIPS. Proceeding: Advances in Neural Information Processing Systems (NIPS) 2000. Summary by: 김동민, 이동민. 1. Intro to Policy Gradient. This paper is effectively the origin of policy gradient (PG) methods and an important work that led to a great deal of follow-up research. It is only 7 pages long, but it is not an easy read. We first explain the background needed to understand the paper and then walk through it step by step. 1.1 Value Function Approach..
