Windy Grid World

This assignment is to use Reinforcement Learning to solve the following ‘Windy Grid World’ problem. There are four actions: move up, down, right, and left. This is a deterministic domain: each action moves the agent one cell in the direction indicated. If the agent is on the boundary of the world and executes an action that would move it ‘off’ the world, it remains in the cell from which it executed the action.

Some states in the grid are "windy" states. In these states, the agent experiences an extra ‘push’ upward. For example, if the agent is in a windy state and executes an action to the left or right, the result is to move left or right (respectively) and also to move one cell upward. As a result, the agent moves diagonally upward to the left or right.
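The movement rules above can be sketched as a single transition function. This is only an illustration under stated assumptions: the original diagram is not reproduced here, so the grid size and the set of windy cells are placeholders. The text also does not say what happens when the wind would push the agent off the grid, or how wind combines with up and down moves; the sketch assumes the push is simply added to the move and that each coordinate is clipped to the grid independently.

```python
# Placeholder layout: the real grid size and windy cells come from the
# assignment diagram, which is not available here.
N_ROWS, N_COLS = 5, 5
WINDY = {(r, 2) for r in range(N_ROWS)}  # assumed: column 2 is windy

# Row 0 is the top row, so moving "up" decreases the row index.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def clamp(value, low, high):
    """Clip value to the closed interval [low, high]."""
    return max(low, min(high, value))


def step(state, action):
    """Return the cell reached by taking `action` in `state`."""
    row, col = state
    d_row, d_col = MOVES[action]
    if state in WINDY:
        d_row -= 1  # extra one-cell push upward (assumed additive)
    # Clipping keeps the agent on the grid; a move off the edge of a
    # non-windy cell therefore leaves it where it was.
    return (clamp(row + d_row, 0, N_ROWS - 1),
            clamp(col + d_col, 0, N_COLS - 1))
```

With this placeholder layout, `step((2, 2), "left")` moves diagonally to `(1, 1)`, while `step((0, 0), "up")` leaves the agent at `(0, 0)`.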

This is an episodic task where each episode lasts no more than 30 time steps. At the start of each episode, the agent is placed in the ‘Start’ state. Reward in this domain is zero everywhere except when the agent is in the goal state (labeled "goal" in the diagram). The agent obtains a reward of +10 when it executes any action from the goal state. The episode ends after 30 time steps or when the agent takes any action after having landed in the goal state.
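The episode and reward rules can be sketched as a loop. The start and goal positions and the grid size below are placeholders, since they come from the diagram, and wind is left out so that the example shows only the reward and termination logic.

```python
MAX_STEPS = 30
GOAL_REWARD = 10.0

# Placeholder positions and size; the real ones come from the diagram.
START, GOAL = (3, 0), (0, 3)
SIZE = 4  # assumed square grid, wind omitted for brevity

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(state, action):
    """Wind-free move; stepping off the grid leaves the agent in place."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < SIZE and 0 <= new_col < SIZE:
        return (new_row, new_col)
    return state


def run_episode(policy):
    """Follow `policy` (a function state -> action).

    Returns (total_reward, number_of_actions_taken).
    """
    state = START
    for t in range(1, MAX_STEPS + 1):
        action = policy(state)
        if state == GOAL:
            # Any action taken from the goal earns +10 and ends the episode.
            return GOAL_REWARD, t
        state = move(state, action)
    return 0.0, MAX_STEPS  # step limit reached without acting from the goal
```

Note that under these rules an agent that first lands on the goal with its 30th action receives no reward, because the episode ends before it can act from the goal.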

You must solve the problem using Q-learning. Use ε-greedy exploration with ε = 0.1 (the agent takes a random action 10 percent of the time in order to explore). Use a learning rate of 0.1 and a discount rate of 0.9.
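A minimal sketch of the two pieces this paragraph names, ε-greedy action selection and the tabular Q-learning update, using the stated constants. Storing Q as a dictionary keyed by `(state, action)` and the function names are illustrative choices, not requirements of the assignment.

```python
import random

ACTIONS = ["up", "down", "left", "right"]
EPSILON = 0.1  # exploration rate
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount rate


def choose_action(q, state, rng=random):
    """Epsilon-greedy: random action with probability EPSILON, else greedy."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    # Ties are broken in favour of the action listed first in ACTIONS.
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, terminal):
    """Apply one Q-learning update in place.

    Q(s, a) <- Q(s, a) + ALPHA * (r + GAMMA * max_a' Q(s', a') - Q(s, a))
    For a terminal transition there is no successor value to bootstrap from.
    """
    if terminal:
        best_next = 0.0
    else:
        best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])
```

In this domain the only terminal transition is the action taken from the goal state, so that is the one case where `terminal` would be `True`.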

The programming should be done in MATLAB. Students can get access to MATLAB here. Alternatively, students may code in Python (using NumPy). If you would rather code in a different language, please see Dr Platt or the TA.

Students must submit their homework in the form of a ZIP file that includes the following:

1. A PDF of a plot of the grid world that illustrates the policy and a path found by Q-learning after it has approximately converged. The policy plot must identify the action taken by the policy in each state. The path must begin in the start state and follow the policy to the goal state.

2. A PDF of a plot of reward per episode.

3. A text file showing output from a sample run of your code.

4. A directory containing all source code for your project.

5. A short readme file listing the important files in your submission.
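For the policy and path required by item 1, one possible way to check the result before producing the PDF is a plain-text rendering. The sketch below is only an illustration: the function names, the arrow characters, and the bracket notation for path cells are arbitrary choices, and the transition function is passed in as a parameter so the example does not depend on any particular grid layout. The PDF itself would still need a plotting tool.

```python
ARROWS = {"up": "^", "down": "v", "left": "<", "right": ">"}


def follow_policy(policy, start, goal, step, max_steps=30):
    """Return the list of cells visited when following `policy` from `start`.

    `policy` maps each state to an action; `step(state, action)` is the
    environment's transition function. Stops at the goal or after max_steps.
    """
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        current = path[-1]
        path.append(step(current, policy[current]))
    return path


def render_policy(policy, n_rows, n_cols, path=()):
    """Return a text grid of arrows; cells on `path` are shown in brackets."""
    on_path = set(path)
    lines = []
    for r in range(n_rows):
        cells = []
        for c in range(n_cols):
            arrow = ARROWS[policy[(r, c)]]
            cells.append(f"[{arrow}]" if (r, c) in on_path else f" {arrow} ")
        lines.append("".join(cells))
    return "\n".join(lines)
```

Printing `render_policy(policy, n_rows, n_cols, follow_policy(...))` gives one arrow per state with the path highlighted.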


You can initialize the Q function randomly, or you can initialize it to a uniform value of 10, i.e., so that every value in the table equals 10.
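Both initialization options can be sketched as follows. The range used for the random option, [0, 1), is an assumption, since the text does not specify one, and the dictionary layout is an illustrative choice.

```python
import random

ACTIONS = ["up", "down", "left", "right"]


def init_q(states, mode="uniform", value=10.0, seed=None):
    """Build a Q table as a dict keyed by (state, action).

    mode="uniform": every entry is `value` (10 by default).
    mode="random":  every entry is drawn from [0, 1) (assumed range).
    """
    if mode == "uniform":
        return {(s, a): value for s in states for a in ACTIONS}
    if mode == "random":
        rng = random.Random(seed)
        return {(s, a): rng.random() for s in states for a in ACTIONS}
    raise ValueError(f"unknown mode: {mode!r}")
```

Since the only reward is a single +10, no true Q value can exceed 10, so the uniform option starts every entry at or above its final value. This kind of optimistic start tends to push the agent toward actions it has not yet tried.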

There have been questions about how to know when the algorithm has converged. The algorithm has converged when the value function has stopped changing significantly and the policy has stopped changing completely. Because we are using Q-learning, the algorithm should converge to a single optimal policy.
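One way to turn this criterion into a check is to compare two snapshots of the Q table, for example taken some number of episodes apart. The tolerance of 1e-3 and the function names are assumptions for illustration; "significantly" is not quantified in the text. The value function is taken to be V(s) = max over actions of Q(s, a).

```python
ACTIONS = ["up", "down", "left", "right"]


def state_values(q, states):
    """V(s) = max_a Q(s, a) for each state."""
    return {s: max(q[(s, a)] for a in ACTIONS) for s in states}


def greedy_policy(q, states):
    """Greedy action for each state (ties go to the first action listed)."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in states}


def has_converged(q_before, q_after, states, tol=1e-3):
    """True if values moved by less than `tol` and the policy is unchanged."""
    v_before = state_values(q_before, states)
    v_after = state_values(q_after, states)
    max_change = max(abs(v_after[s] - v_before[s]) for s in states)
    same_policy = greedy_policy(q_before, states) == greedy_policy(q_after, states)
    return max_change < tol and same_policy
```

A caller would copy the table (for instance with `dict(q)`) before a batch of episodes and pass both copies to `has_converged` afterwards.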

Please also submit a short readme file with your homework that lists the important files in your submission.

  • Category:- MATLAB
  • Reference No.:- M9402
