Reinforcement learning

whoisslimshady
3 min read · Feb 13, 2021

As kids, we all dreamed of creating an A.I. that could do amazing things, like play games with us. Now we can actually make that happen with something called reinforcement learning. Even if you don't know machine learning or deep learning yet, you can still build a little A.I. for yourself that can play, learn, make mistakes, and more. So the question is: how?

You can do this with Python in a Gym environment. It won't be a very complex agent (that's the term for our A.I. in reinforcement learning).

I know what you're all thinking: you don't know much about deep learning, so how are we going to make this? Don't worry, for this particular example you don't even need a background in deep learning, but it will give you a sense of what you could do with deep reinforcement learning.

Okay, enough of the chit-chat. Let's get to the point: how?

Step 1: Open any IDE that suits your taste. I prefer Jupyter Notebook with Miniconda; don't judge if you use a different one.

Step 2: Install the packages from a terminal:

pip install gym
pip install numpy

(Make sure Python and pip are already installed before running these.)
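If you want to make sure the install worked before writing any learning code, a quick sanity check (just my suggestion, not part of the tutorial itself) is to let a completely random agent drive the car. Note that this sketch uses the old Gym API from when this was written, where env.step() returns four values; newer gym/gymnasium versions return five.

import gym

env = gym.make("MountainCar-v0")   # the same environment we train on below
state = env.reset()
done = False

while not done:
    action = env.action_space.sample()          # one of the 3 actions, chosen at random
    state, reward, done, _ = env.step(action)   # reward is -1 every step until the episode ends
    env.render()                                # pops up a window so you can watch the car

env.close()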

Step 3: The code.

import gym
import numpy as np

env = gym.make("MountainCar-v0")

LEARNING_RATE = 0.1
DISCOUNT = 0.95
EPISODES = 25000
SHOW_EVERY = 3000

# Discretize the continuous observation space (position, velocity) into a 20x20 grid
DISCRETE_OS_SIZE = [20, 20]
discrete_os_win_size = (env.observation_space.high - env.observation_space.low) / DISCRETE_OS_SIZE

# Exploration settings
epsilon = 1  # not a constant, going to be decayed
START_EPSILON_DECAYING = 1
END_EPSILON_DECAYING = EPISODES // 2
epsilon_decay_value = epsilon / (END_EPSILON_DECAYING - START_EPSILON_DECAYING)

q_table = np.random.uniform(low=-2, high=0, size=(DISCRETE_OS_SIZE + [env.action_space.n]))


def get_discrete_state(state):
    discrete_state = (state - env.observation_space.low) / discrete_os_win_size
    # we use this tuple to look up the 3 Q values for the available actions in the q-table
    return tuple(discrete_state.astype(int))


for episode in range(EPISODES):
    discrete_state = get_discrete_state(env.reset())
    done = False

    if episode % SHOW_EVERY == 0:
        render = True
        print(episode)
    else:
        render = False

    while not done:
        if np.random.random() > epsilon:
            # Get action from Q table
            action = np.argmax(q_table[discrete_state])
        else:
            # Get random action
            action = np.random.randint(0, env.action_space.n)

        new_state, reward, done, _ = env.step(action)
        new_discrete_state = get_discrete_state(new_state)

        if render:
            env.render()

        # If simulation did not end yet after last step - update Q table
        if not done:
            # Maximum possible Q value in next step (for new state)
            max_future_q = np.max(q_table[new_discrete_state])

            # Current Q value (for current state and performed action)
            current_q = q_table[discrete_state + (action,)]

            # And here's our equation for a new Q value for current state and action
            new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)

            # Update Q table with new Q value
            q_table[discrete_state + (action,)] = new_q

        # Simulation ended (for any reason) - if goal position is achieved - update Q value with reward directly
        elif new_state[0] >= env.goal_position:
            q_table[discrete_state + (action,)] = 0

        discrete_state = new_discrete_state

    # Decaying is being done every episode if episode number is within decaying range
    if END_EPSILON_DECAYING >= episode >= START_EPSILON_DECAYING:
        epsilon -= epsilon_decay_value

env.close()
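Once training finishes, you can watch what the agent actually learned by running one purely greedy episode, with no exploration, straight from the final q_table. This is just a small sketch I'm adding for illustration: it assumes the script above has already run (so q_table and get_discrete_state exist), and it uses the same 4-value gym step API as the tutorial code.

# Greedy evaluation sketch - run after training, reuses q_table and get_discrete_state
eval_env = gym.make("MountainCar-v0")

state = get_discrete_state(eval_env.reset())
done = False
total_reward = 0

while not done:
    action = np.argmax(q_table[state])            # always take the best-known action
    new_state, reward, done, _ = eval_env.step(action)
    state = get_discrete_state(new_state)
    total_reward += reward
    eval_env.render()

print("greedy episode total reward:", total_reward)
eval_env.close()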
  • Don't worry if the agent seems to learn slowly; how long training takes depends on your machine, but it will work on any machine (if you want to see the progress, there's a small tracking sketch after this list).
  • I will be posting a deep reinforcement learning project with code, so stay connected.
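A simple way to check that the agent really is improving, rather than stuck, is to log the total reward of every episode and look at a moving average. This isn't in the script above; it's a sketch that assumes you append one total reward per episode to a list called ep_rewards during training.

import numpy as np

def moving_average(ep_rewards, window=500):
    # In MountainCar the reward is -1 per step, so an average closer to 0
    # means the car is reaching the flag faster.
    window = min(window, len(ep_rewards))
    return np.convolve(ep_rewards, np.ones(window) / window, mode="valid")

You could call this every few thousand episodes and print the last value to see the trend.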

