Train Your Lunar-Lander | Reinforcement Learning | OpenAIGYM

6 min read · Apr 20, 2019

Lunar Lander is another interesting problem in OpenAI Gym. In my previous blog post, I solved the classic control environments. In this post, I will solve the Lunar Lander environment.

Reinforcement Learning | Brief Intro

Reinforcement learning is an interesting area of machine learning. The rough idea is that you have an agent and an environment. The agent takes actions, and the environment gives a reward based on those actions. The goal is to teach the agent the optimal behavior so that it maximizes the reward it receives from the environment.

For example, have a look at the diagram. This maze represents our environment. Our goal is to teach the agent an optimal policy so that it can solve the maze. The maze gives the agent a reward based on how good each action is. Each action the agent takes also moves it to a new state in the environment.
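To make the agent-environment loop concrete, here is a minimal sketch in plain Python. `CorridorEnv` is a hypothetical toy stand-in for the maze (a one-dimensional corridor), not a real Gym environment; it only imitates the classic Gym pattern of `reset()` returning a state and `step(action)` returning `(state, reward, done, info)`. The reward values are arbitrary choices for illustration.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a 1-D corridor with `length` cells.

    The agent starts in cell 0 and must reach the last cell.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Put the agent back at the start and return the initial state.
        self.position = 0
        return self.position

    def step(self, action):
        # action 1 moves right, any other action moves left (walls clamp the move).
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        # Illustrative rewards: a small penalty per step, a bonus at the goal.
        reward = 10.0 if done else -1.0
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward the agent collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                     # the agent picks an action
        state, reward, done, _ = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    # An untrained agent: ignores the state and acts at random.
    return random.choice([0, 1])


def always_right_policy(state):
    # The optimal policy for this corridor: always move toward the goal.
    return 1


if __name__ == "__main__":
    env = CorridorEnv(length=5)
    print("random policy:", run_episode(env, random_policy))
    print("always-right policy:", run_episode(env, always_right_policy))
```

The always-right policy collects the highest possible return in this toy corridor, while the random policy usually does worse. Training an agent means moving its policy from the second kind of behavior toward the first.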

About Lunar-Lander

As you can see in the picture below, there is a spaceship. The task is to land the spaceship smoothly between the flags. The ship has three throttles: one points downward, and the other two point left and right. You control the ship with these.

There are two different Lunar Lander environments in OpenAI Gym. One has a discrete action space and the other has a continuous action space. Let’s solve both, one by one. Please read this doc to learn how to use Gym environments.
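The difference between the two action spaces can be sketched without installing Gym. The snippet below is only an illustration in plain Python: the helper names are hypothetical, and the action meanings follow my reading of the Gym documentation (four discrete actions, or two floats in [-1, 1] for the continuous version), so check them against the version you have installed.

```python
import random

# Discrete version: the agent sends one integer per step.
# Meanings assumed from the Gym docs: one action per throttle, plus "do nothing".
DISCRETE_ACTIONS = {
    0: "do nothing",
    1: "fire left orientation engine",
    2: "fire main engine",
    3: "fire right orientation engine",
}


def sample_discrete_action(rng):
    """Pick one of the four discrete actions uniformly at random."""
    return rng.randrange(len(DISCRETE_ACTIONS))


def sample_continuous_action(rng):
    """Pick a continuous action: (main engine, lateral engines), each in [-1, 1]."""
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))


if __name__ == "__main__":
    rng = random.Random(0)
    action = sample_discrete_action(rng)
    print("discrete:", action, "->", DISCRETE_ACTIONS[action])
    print("continuous:", sample_continuous_action(rng))
```

In practice this means a discrete agent chooses among a handful of options, while a continuous agent has to output real-valued throttle levels, which usually calls for a different kind of algorithm.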

LunarLander-v2 (Discrete)

The landing pad is always at coordinates (0,0). The coordinates are the first two numbers in the state vector. The reward for moving from the top of the screen to the landing pad and reaching zero speed is about 100 to 140 points. If the lander moves away from the landing pad, it loses that reward again. The episode finishes if the lander crashes or comes to rest, receiving an additional -100 or +100…
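Because the pad sits at (0,0) and the first two entries of the state vector are the lander's coordinates, the distance to the pad can be read straight from the state. The helper below is a hypothetical illustration, not part of Gym, and the example state is made up; the eight-number layout in the comment is my understanding of the environment's observation.

```python
import math


def distance_to_pad(state):
    """Distance from the lander to the landing pad at (0, 0).

    Only the first two entries of `state` (x and y) are used.
    """
    x, y = state[0], state[1]
    return math.hypot(x, y)


if __name__ == "__main__":
    # Made-up state, assuming the layout:
    # x, y, x velocity, y velocity, angle, angular velocity, left leg contact, right leg contact
    example_state = [0.3, 0.4, 0.0, -0.1, 0.0, 0.0, 0.0, 0.0]
    print("distance to pad:", distance_to_pad(example_state))
```

A quantity like this is handy for logging while you train, since it shows whether the agent is learning to steer toward the pad.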

Written by Shiva Verma
