Windy Grid World

This assignment is to use reinforcement learning to solve the following ‘Windy Grid World’ problem. There are four actions: move up, down, right, and left. This is a deterministic domain: each action moves the agent one cell in the direction indicated. If the agent is on the boundary of the world and executes an action that would move it ‘off’ the world, it remains on the grid in the same cell from which it executed the action.
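The movement rule above can be sketched in a few lines of Python. This is only an illustration: the grid size, the `(row, col)` state encoding with row 0 at the top, and the names `N_ROWS`, `N_COLS`, `ACTIONS`, and `move` are assumptions, since the assignment's diagram is not reproduced here.

```python
# Hypothetical grid size; the real dimensions come from the assignment diagram.
N_ROWS, N_COLS = 5, 5

# Assumed encoding: states are (row, col) with row 0 at the top,
# so "up" decreases the row index.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(state, action):
    """Move one cell deterministically; stay put if the move leaves the grid."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < N_ROWS and 0 <= new_col < N_COLS:
        return (new_row, new_col)
    return state  # boundary case: remain in the same cell
```

For example, `move((0, 0), "up")` leaves the agent at `(0, 0)`, while `move((2, 2), "right")` moves it to `(2, 3)`.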

Certain states in the diagram are "windy" states. In these states, the agent experiences an extra ‘push’ upward. For example, if the agent is in a windy state and executes an action to the left or right, the result of the action is to move left or right (respectively) but also to move one cell upward. As a result, the agent moves diagonally upward to the left or right.
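A minimal sketch of the wind rule follows. Several details are assumptions rather than part of the assignment text: the grid size, the set `WINDY_CELLS`, the choice to apply the push to up and down actions as well (the text only describes left and right), and the choice to clip each coordinate to the grid independently when a diagonal move would leave it.

```python
N_ROWS, N_COLS = 5, 5  # hypothetical grid size
WINDY_CELLS = {(2, 1), (2, 2), (3, 1), (3, 2)}  # hypothetical windy states

# Assumed encoding: (row, col) with row 0 at the top.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def clip(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def windy_move(state, action):
    """Apply the action, plus one extra cell upward when starting in a windy state."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    if state in WINDY_CELLS:
        d_row -= 1  # wind depends on the state the action is executed from
    # Assumption: each coordinate is clipped to the grid independently.
    new_row = clip(row + d_row, 0, N_ROWS - 1)
    new_col = clip(col + d_col, 0, N_COLS - 1)
    return (new_row, new_col)
```

Under these assumptions, moving right from the windy cell `(2, 1)` lands in `(1, 2)`, which is the diagonal move described above.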

This is an episodic task in which each episode lasts no more than 30 time steps. At the start of each episode, the agent is placed in the ‘Start’ state. Reward in this domain is zero everywhere except when the agent is in the goal state (labeled "goal" in the diagram). The agent obtains a reward of +10 when it executes any action from the goal state. The episode ends after 30 time steps or when the agent takes any action after having landed in the goal state, whichever comes first.
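The episode structure might be sketched as below. The function `run_episode` and its parameters are hypothetical names; `choose_action` and `transition` stand in for whatever policy and movement function you write. The sketch assumes that the action taken from the goal state counts as one of the 30 time steps.

```python
MAX_STEPS = 30    # episode length limit from the assignment
GOAL_REWARD = 10  # reward for executing any action from the goal state


def run_episode(choose_action, transition, start, goal):
    """Run one episode and return (total_reward, steps_taken)."""
    state = start
    total_reward = 0
    steps_taken = 0
    while steps_taken < MAX_STEPS:
        action = choose_action(state)
        steps_taken += 1
        if state == goal:
            # Any action executed from the goal earns the reward and ends the episode.
            total_reward += GOAL_REWARD
            break
        state = transition(state, action)  # reward is zero everywhere else
    return total_reward, steps_taken
```

As a quick check on a toy one-dimensional corridor, `run_episode(lambda s: "right", lambda s, a: s + 1, 0, 3)` reaches the goal in three moves and collects the reward on the fourth step.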

You must solve the problem using Q-learning. Use ε-greedy exploration with ε = 0.1 (the agent takes a random action 10 percent of the time in order to explore). Use a learning rate of 0.1 and a discount rate of 0.9.
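The two pieces named here, ε-greedy action selection and the Q-learning update, could look roughly like this. The dictionary-based Q table keyed by `(state, action)`, the function names, and the `terminal` flag are assumptions made for illustration. The sketch does not bootstrap from the next state when the episode has ended, which is the standard treatment of terminal transitions.

```python
import random

EPSILON = 0.1  # exploration probability
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount rate
ACTIONS = ["up", "down", "left", "right"]


def epsilon_greedy(q, state, rng=random):
    """With probability EPSILON pick a random action, otherwise a greedy one."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    # Ties are broken in favor of the first action in ACTIONS.
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, terminal):
    """One Q-learning step: Q(s,a) += ALPHA * (r + GAMMA * max_a' Q(s',a') - Q(s,a))."""
    if terminal:
        best_next = 0.0  # no future value once the episode has ended
    else:
        best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])
```

With every entry initialized to 10, a zero-reward step changes the visited entry from 10 to 10 + 0.1 × (0.9 × 10 − 10) = 9.9.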

The programming must be done in MATLAB. Students can get access to MATLAB here. Alternatively, students may code in Python (using NumPy). If you would rather code in a different language, please see Dr Platt or the TA.

Students must submit their homework in the form of a ZIP file that includes the following:

1. A PDF of a plot of the grid world that illustrates the policy and a path found by Q-learning after it has approximately converged. The policy plot must identify the action taken by the policy in each state. The path must begin in the start state and follow the policy to the goal state.

2. A PDF of a plot of reward per episode.

3. A text file showing output from a sample run of your code.

4. A directory containing all source code for your project.

5. A short readme file listing the important files in your submission.


You can initialize the Q function randomly, or you can initialize it to a uniform value of 10; that is, you can initialize Q such that every value in the table is equal to 10.
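Both initialization options can be sketched as follows. The dictionary representation, the function names, and the range used for random values (uniform in [0, 1)) are assumptions; the assignment does not say which random distribution to use.

```python
import random


def init_q_uniform(states, actions, value=10.0):
    """Q table in which every (state, action) entry equals `value`."""
    return {(s, a): value for s in states for a in actions}


def init_q_random(states, actions, seed=None):
    """Q table with entries drawn uniformly from [0, 1) (assumed range)."""
    rng = random.Random(seed)
    return {(s, a): rng.random() for s in states for a in actions}


# Example with a hypothetical 5x5 grid of (row, col) states.
states = [(r, c) for r in range(5) for c in range(5)]
actions = ["up", "down", "left", "right"]
q = init_q_uniform(states, actions)
```

Because the only reward is a single +10 at the end of an episode, no true action value can exceed 10, so the uniform option is an optimistic initialization that tends to encourage the agent to try actions it has not yet taken.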

There have been questions about how to know when the algorithm has converged. The algorithm has converged when the value function has stopped changing significantly and the policy has stopped changing completely. Because we are using Q-learning, the algorithm must converge to a single optimal policy.
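One way to turn this criterion into a check is to compare two snapshots of the Q table. The function names and the tolerance `tol` below are assumptions; choose a threshold that suits your own value scale.

```python
def greedy_policy(q, states, actions):
    """Map each state to its highest-valued action (ties go to the first action)."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


def has_converged(q_before, q_after, states, actions, tol=1e-3):
    """True if no Q value moved by tol or more and the greedy policy is unchanged."""
    max_change = max(abs(q_after[key] - q_before[key]) for key in q_after)
    same_policy = (greedy_policy(q_before, states, actions)
                   == greedy_policy(q_after, states, actions))
    return max_change < tol and same_policy
```

Since a single episode may visit only a few states, it is probably safer to take the two snapshots many episodes apart rather than one episode apart.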

Please also submit a short readme file with your homework that lists the significant files in your submission.
