Windy Grid World

This assignment is to use Reinforcement Learning to solve the following ‘Windy Grid World’ problem. There are four actions: move up, down, right, and left. This is a deterministic domain: each action moves the agent one cell in the direction indicated. If the agent is on the boundary of the world and executes an action that would move it off the world, it remains in the cell from which it executed the action.
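The boundary rule can be sketched in a few lines of Python. This is a minimal illustration, not the required implementation: the grid size (`N_ROWS`, `N_COLS`), the (row, column) coordinate convention, and the function name `move` are assumptions, since the diagram is not reproduced here.

```python
# Assumed grid dimensions; the real values come from the assignment diagram.
N_ROWS, N_COLS = 5, 7

# Assumed convention: row 0 is the top row, so "up" decreases the row index.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "right": (0, 1), "left": (0, -1)}


def move(state, action):
    """Return the cell reached from `state`; stay put if the move leaves the grid."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < N_ROWS and 0 <= new_col < N_COLS:
        return (new_row, new_col)
    return state  # off-grid move: remain in the same cell
```

For instance, `move((0, 0), "up")` leaves the agent in the top-left cell, because the move would take it off the grid.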

The diagram marks certain cells as "windy" states. In these states, the agent experiences an extra ‘push’ upward. For example, if the agent is in a windy state and executes an action to the left or right, it moves left or right (respectively) and also one cell upward. As a result, the agent moves diagonally upward to the left or right.
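One possible way to model the wind is sketched below. Several things here are assumptions rather than part of the assignment: the grid size, the cells listed in `WINDY_CELLS` (purely hypothetical placeholders for the cells marked in the diagram), applying the push to all four actions rather than only left and right, and clamping the push at the top row.

```python
N_ROWS, N_COLS = 5, 7             # assumed grid size
WINDY_CELLS = {(2, 3), (3, 3)}    # hypothetical; replace with the cells from the diagram

# Assumed convention: row 0 is the top row, so "up" decreases the row index.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "right": (0, 1), "left": (0, -1)}


def windy_step(state, action):
    """Apply the action, then the upward push if the action was taken in a windy cell."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < N_ROWS and 0 <= new_col < N_COLS):
        new_row, new_col = row, col       # blocked by the boundary
    if state in WINDY_CELLS:
        new_row = max(new_row - 1, 0)     # extra push upward, clamped at the top row
    return (new_row, new_col)
```

Under these assumptions, taking "left" from the windy cell `(2, 3)` lands the agent diagonally up and to the left, in `(1, 2)`.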

This is an episodic task in which each episode lasts no more than 30 time steps. At the start of each episode, the agent is placed in the ‘Start’ state. Reward in this domain is zero everywhere except when the agent is in the goal state (labeled "goal" in the diagram). The agent obtains a reward of +10 when it executes any action from the goal state. The episode ends after 30 time steps or when the agent takes any action after having landed in the goal state.
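The episode structure can be sketched as a loop that counts actions and pays the reward only for an action taken from the goal. The `START` and `GOAL` coordinates and the `choose_action` and `transition` callables are hypothetical placeholders; only the 30-step cap and the +10 reward come from the assignment text.

```python
START, GOAL = (3, 0), (3, 5)   # hypothetical coordinates; the diagram defines the real ones
MAX_STEPS = 30                 # episode length cap from the assignment
GOAL_REWARD = 10.0             # reward for executing any action from the goal state


def run_episode(choose_action, transition):
    """Run one episode and return (total reward, number of actions taken)."""
    state = START
    total_reward = 0.0
    steps = 0
    while steps < MAX_STEPS:
        action = choose_action(state)
        steps += 1
        if state == GOAL:
            total_reward += GOAL_REWARD   # reward for acting from the goal
            break                         # the episode ends after that action
        state = transition(state, action)
    return total_reward, steps
```

Calling `run_episode` once per episode and appending the returned total to a list gives the data needed for a reward-per-episode plot.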

You must solve the problem using Q-learning. Use ε-greedy exploration with ε = 0.1 (the agent takes a random action 10 percent of the time in order to explore). Use a learning rate of 0.1 and a discount rate of 0.9.
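As a rough sketch, the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], and ε-greedy action selection might look as follows in Python. The dictionary-based Q table keyed by `(state, action)`, the function names, and the `terminal` flag are illustrative assumptions; only the values of ε, the learning rate, and the discount rate come from the assignment.

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount rate, exploration rate
ACTIONS = ["up", "down", "right", "left"]


def epsilon_greedy(q, state, rng=random):
    """With probability EPSILON pick a random action, otherwise the greedy one."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, terminal):
    """One Q-learning step; no bootstrapping from next_state when the episode has ended."""
    if terminal:
        target = reward
    else:
        target = reward + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (target - q[(state, action)])
```

Note that a random exploratory draw can still pick the greedy action, so the greedy action is chosen slightly more than 90 percent of the time.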

The programming must be done in MATLAB. Students may get access to MATLAB here. Alternatively, students may code in Python (using NumPy). If you would rather code in a different language, please see Dr Platt or the TA.

Students must submit their homework in the form of a ZIP file that includes the following:

1. A PDF of a plot of the grid world that illustrates the policy and a path found by Q-learning after it has approximately converged. The policy plot must identify the action taken by the policy in each state. The path must begin in the start state and follow the policy to the goal state.

2. A PDF of a plot of reward per episode.

3. A text file showing output from a sample run of your code.

4. A directory containing all source code for your project.

5. A short readme file listing the important files in your submission.
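For the policy plot in item 1, one way to obtain the data is to read the greedy action out of the learned Q table for every cell and then follow those actions from the start state. The sketch below prints the policy as text arrows rather than producing a PDF; the Q-table layout (a dictionary keyed by `((row, col), action)`), the function names, and the `transition` callable are assumptions made for illustration.

```python
ARROWS = {"up": "^", "down": "v", "right": ">", "left": "<"}
ACTIONS = list(ARROWS)


def greedy_policy(q, n_rows, n_cols):
    """Map every cell to the action with the highest Q value."""
    return {(r, c): max(ACTIONS, key=lambda a: q[((r, c), a)])
            for r in range(n_rows) for c in range(n_cols)}


def render_policy(policy, n_rows, n_cols):
    """Return the policy as one line of arrows per grid row."""
    return "\n".join(" ".join(ARROWS[policy[(r, c)]] for c in range(n_cols))
                     for r in range(n_rows))


def follow_policy(policy, transition, start, goal, max_steps=30):
    """List the cells visited when following the policy from start toward goal."""
    path = [start]
    state = start
    while state != goal and len(path) <= max_steps:
        state = transition(state, policy[state])
        path.append(state)
    return path
```

The returned path and policy dictionary can then be passed to whatever plotting tool is used to produce the PDF.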

Updates

You can initialize the Q function randomly, or you can initialize it to a uniform value of 10; that is, every value in the table is set to 10.
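Both initialization options can be sketched with a small helper. The function name `init_q`, the dictionary layout, and the choice of drawing random values from [0, 1) are assumptions; the assignment specifies only "randomly" or a uniform value of 10.

```python
import random

ACTIONS = ["up", "down", "right", "left"]


def init_q(n_rows, n_cols, mode="uniform", value=10.0, seed=None):
    """Build a Q table keyed by ((row, col), action).

    mode="uniform" sets every entry to `value`; any other mode draws each
    entry from [0, 1), which is an assumed range.
    """
    rng = random.Random(seed)
    q = {}
    for r in range(n_rows):
        for c in range(n_cols):
            for a in ACTIONS:
                q[((r, c), a)] = value if mode == "uniform" else rng.random()
    return q
```

Since the only reward is +10 and the discount rate is below 1, a uniform value of 10 is at least as large as any true action value, so this choice acts as an optimistic initialization that tends to encourage early exploration.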

There have been questions about how to know when the algorithm has converged. The algorithm has converged when the value function has stopped changing significantly and the policy has stopped changing completely. Because we are using Q-learning, the algorithm should converge to a single optimal policy.
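One way to turn this criterion into a check is to compare two snapshots of the Q table, for example taken some number of episodes apart. In the sketch below the value function is taken as V(s) = max_a Q(s, a); the tolerance `tol`, the function names, and the snapshot approach are assumptions, not part of the assignment.

```python
def state_values(q, actions):
    """Value function V(s) = max over actions of Q(s, a)."""
    states = {s for (s, _) in q}
    return {s: max(q[(s, a)] for a in actions) for s in states}


def greedy_actions(q, actions):
    """Greedy action for every state in the Q table."""
    states = {s for (s, _) in q}
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


def has_converged(q_before, q_after, actions, tol=1e-3):
    """True if V changed by less than tol everywhere and the greedy policy is identical."""
    v_before = state_values(q_before, actions)
    v_after = state_values(q_after, actions)
    max_change = max(abs(v_after[s] - v_before[s]) for s in v_before)
    same_policy = greedy_actions(q_before, actions) == greedy_actions(q_after, actions)
    return max_change < tol and same_policy
```

Because ε-greedy exploration keeps perturbing the table, it may help to require that the check passes over several consecutive snapshots before stopping.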

Please also submit a short readme file with your homework that lists the significant files in your submission.

  • Reference No.:- M9402
