Reinforcement learning

whoisslimshady
3 min read · Feb 13, 2021

As kids, we all dreamed of creating an A.I. that could do amazing things, like play games with us. Now we can actually make that happen with something called reinforcement learning. Even if you don't know machine learning or deep learning yet, you can still build a little A.I. for yourself that can play, learn, make mistakes, and more. So the question is: how?

You can do this with Python in a Gym environment. It won't be a very complex agent (that's the term for our A.I. in reinforcement learning).

I know what you're all thinking: you don't know much about deep learning, so how are we going to make this? Don't worry, for this particular example you don't even need a background in deep learning, but it will give you a sense of what you could do with deep reinforcement learning.

Okay, enough of the chit-chat. Let's get to the point: how?

Step 1: Open any IDE that suits your taste. I prefer Jupyter Notebook with Miniconda; don't judge if you use a different one.

Step 2: Install the packages from a terminal:

pip install gym
pip install numpy

(Make sure Python and pip are already installed before running these.)
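If you want to make sure the install worked before writing any learning code, a quick sanity check (just my suggestion, not part of the tutorial itself) is to let a completely random agent drive the car. Note that this sketch uses the old Gym API from when this was written, where env.step() returns four values; newer gym/gymnasium versions return five.

import gym

env = gym.make("MountainCar-v0")   # the same environment we train on below
state = env.reset()
done = False

while not done:
    action = env.action_space.sample()          # one of the 3 actions, chosen at random
    state, reward, done, _ = env.step(action)   # reward is -1 every step until the episode ends
    env.render()                                # pops up a window so you can watch the car

env.close()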

Step 3: The code.

import gym
import numpy as np

env = gym.make("MountainCar-v0")

LEARNING_RATE = 0.1
DISCOUNT = 0.95
EPISODES = 25000
SHOW_EVERY = 3000

# Discretize the continuous observation space (position, velocity) into a 20x20 grid
DISCRETE_OS_SIZE = [20, 20]
discrete_os_win_size = (env.observation_space.high - env.observation_space.low) / DISCRETE_OS_SIZE

# Exploration settings
epsilon = 1  # not a constant, going to be decayed
START_EPSILON_DECAYING = 1
END_EPSILON_DECAYING = EPISODES // 2
epsilon_decay_value = epsilon / (END_EPSILON_DECAYING - START_EPSILON_DECAYING)

q_table = np.random.uniform(low=-2, high=0, size=(DISCRETE_OS_SIZE + [env.action_space.n]))


def get_discrete_state(state):
    discrete_state = (state - env.observation_space.low) / discrete_os_win_size
    # we use this tuple to look up the 3 Q values for the available actions in the q-table
    return tuple(discrete_state.astype(int))


for episode in range(EPISODES):
    discrete_state = get_discrete_state(env.reset())
    done = False

    if episode % SHOW_EVERY == 0:
        render = True
        print(episode)
    else:
        render = False

    while not done:
        if np.random.random() > epsilon:
            # Get action from Q table
            action = np.argmax(q_table[discrete_state])
        else:
            # Get random action
            action = np.random.randint(0, env.action_space.n)

        new_state, reward, done, _ = env.step(action)
        new_discrete_state = get_discrete_state(new_state)

        if render:
            env.render()

        # If simulation did not end yet after last step - update Q table
        if not done:
            # Maximum possible Q value in next step (for new state)
            max_future_q = np.max(q_table[new_discrete_state])

            # Current Q value (for current state and performed action)
            current_q = q_table[discrete_state + (action,)]

            # And here's our equation for a new Q value for current state and action
            new_q = (1 - LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)

            # Update Q table with new Q value
            q_table[discrete_state + (action,)] = new_q

        # Simulation ended (for any reason) - if goal position is achieved - update Q value with reward directly
        elif new_state[0] >= env.goal_position:
            q_table[discrete_state + (action,)] = 0

        discrete_state = new_discrete_state

    # Decaying is being done every episode if episode number is within decaying range
    if END_EPSILON_DECAYING >= episode >= START_EPSILON_DECAYING:
        epsilon -= epsilon_decay_value

env.close()
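Once training finishes, you can watch what the agent actually learned by running one purely greedy episode, with no exploration, straight from the final q_table. This is just a small sketch I'm adding for illustration: it assumes the script above has already run (so q_table and get_discrete_state exist), and it uses the same 4-value gym step API as the tutorial code.

# Greedy evaluation sketch - run after training, reuses q_table and get_discrete_state
eval_env = gym.make("MountainCar-v0")

state = get_discrete_state(eval_env.reset())
done = False
total_reward = 0

while not done:
    action = np.argmax(q_table[state])            # always take the best-known action
    new_state, reward, done, _ = eval_env.step(action)
    state = get_discrete_state(new_state)
    total_reward += reward
    eval_env.render()

print("greedy episode total reward:", total_reward)
eval_env.close()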
  • Don't worry if the agent seems to learn slowly; how long training takes depends on your machine, but it will work on any machine (if you want to see the progress, there's a small tracking sketch after this list).
  • I will be posting a deep reinforcement learning project with code, so stay connected.
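A simple way to check that the agent really is improving, rather than stuck, is to log the total reward of every episode and look at a moving average. This isn't in the script above; it's a sketch that assumes you append one total reward per episode to a list called ep_rewards during training.

import numpy as np

def moving_average(ep_rewards, window=500):
    # In MountainCar the reward is -1 per step, so an average closer to 0
    # means the car is reaching the flag faster.
    window = min(window, len(ep_rewards))
    return np.convolve(ep_rewards, np.ones(window) / window, mode="valid")

You could call this every few thousand episodes and print the last value to see the trend.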

