On Proximal Policy Optimization

This is the story of a man named Proximal Policy Optimization (you may want to read this along with this article). His friends call him PPO. He's one of reinforcement learning's most popular guys. PPO is a simple man, so simple that you can summarize how he acts in any given situation in just under …
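One common way to write down how PPO acts is the clipped surrogate objective from the PPO paper (Schulman et al., 2017). Below is a minimal sketch in plain Python. It assumes you already have per-sample log-probabilities under the new and old policies and precomputed advantage estimates. The function name `ppo_clip_objective` and the default `epsilon=0.2` are illustrative choices, not code from the post.

```python
import math

def ppo_clip_objective(new_logps, old_logps, advantages, epsilon=0.2):
    """Mean clipped surrogate objective (to be maximized), hypothetical helper.

    For each sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = pi_new(a|s) / pi_old(a|s) and A is the advantage.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio from log-probs
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum makes the objective a pessimistic bound.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio above 1 + ε contributes only (1 + ε) * A, so there is no reward for pushing that action's probability further in one update; with a negative advantage, the unclipped (worse) term is kept.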

The Summary of Basic Reinforcement Learning Algorithms

Vanilla Policy Gradient (VPG): Make trajectories leading to rewards more likely to be chosen by the policy. On-policy; discrete and continuous action spaces.
Trust Region Policy Optimization (TRPO): Make trajectories leading to rewards more likely to be chosen by the policy, but take the largest step in each update such that the KL-divergence between the updated …
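To make the two summaries concrete, here is a minimal sketch for a toy single-state softmax policy over discrete actions. `vpg_gradient` computes the Monte Carlo policy-gradient estimate (grad log pi(a) weighted by the return), which is the "make rewarding trajectories more likely" step. `kl_limited_step` then takes the largest step along that gradient, halving it until the KL-divergence between the old and new policy stays within a limit. All names and defaults here are hypothetical, and this backtracking along the plain gradient is a simplified stand-in: actual TRPO uses a natural-gradient direction (via conjugate gradient) and constrains the average KL over visited states.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def vpg_gradient(logits, actions, returns):
    """Policy-gradient estimate for a single-state softmax policy (hypothetical).

    For a softmax policy, grad log pi(a) = onehot(a) - softmax(logits);
    each sample's gradient is weighted by the return that followed it.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, g in zip(actions, returns):
        for i, p in enumerate(probs):
            grad[i] += ((1.0 if i == a else 0.0) - p) * g
    return [x / len(actions) for x in grad]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_limited_step(logits, grad, max_kl=0.01, step=1.0, shrink=0.5, max_tries=20):
    """Largest step along grad (shrinking geometrically) with KL(old || new) <= max_kl."""
    old_probs = softmax(logits)
    for _ in range(max_tries):
        new_logits = [t + step * g for t, g in zip(logits, grad)]
        if kl_divergence(old_probs, softmax(new_logits)) <= max_kl:
            return new_logits
        step *= shrink
    return list(logits)  # no acceptable step found: keep the old policy
```

For example, with `logits = [0.0, 0.0]`, actions `[0, 1, 0]` and returns `[1.0, 0.0, 1.0]`, the gradient favors action 0, and the KL-limited step raises its probability only as far as the `max_kl` budget allows.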

Illusion 100

We really want to have Illusion 100 in Skyrim. But when we were confronted with the choice of whether to use a console-command cheat, we stopped. It's not about being a powerful mage. It's about becoming a powerful mage. If we could choose to suddenly have great ability in artificial intelligence, would we accept? We would, but …