Windy Grid World

This assignment is to use reinforcement learning to solve the following ‘Windy Grid World’ problem. There are four actions: move up, down, right, and left. This is a deterministic domain: each action moves the agent one cell in the direction indicated. If the agent is on the boundary of the world and executes an action that would move it off the world, it remains in the cell from which it executed the action.
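The movement rule above can be sketched as a small transition function. The grid size here (5 rows by 7 columns) and the `(row, col)` state encoding are assumptions made for illustration; the real dimensions come from the assignment's diagram.

```python
# Hypothetical grid size; the assignment's diagram defines the real one.
N_ROWS, N_COLS = 5, 7

# Row 0 is taken to be the top row, so "up" decreases the row index.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "right": (0, 1), "left": (0, -1)}


def move(state, action):
    """Return the next (row, col) after a deterministic one-cell move.

    If the move would leave the grid, the agent stays where it is.
    """
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < N_ROWS and 0 <= new_col < N_COLS:
        return (new_row, new_col)
    return state
```

For example, `move((0, 0), "up")` leaves the agent at `(0, 0)`, while `move((2, 3), "right")` returns `(2, 4)`.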

Certain states, marked in the diagram, are the "windy" states. In these states, the agent experiences an extra ‘push’ upward. For example, if the agent is in a windy state and executes an action to the left or right, it moves left or right (respectively) and also one cell upward. As a result, the agent moves diagonally upward to the left or right.
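One way to model the wind is to apply the chosen move first and then, if the agent started in a windy state, push it one more cell upward. This is only a sketch: the set `WINDY`, the grid size, and the choice to apply the boundary rule to the wind push as well are all assumptions, since the text does not say what happens when the wind would push the agent off the grid.

```python
# Hypothetical grid size and windy cells; the diagram defines the real ones.
N_ROWS, N_COLS = 5, 7
WINDY = {(2, 3), (3, 3)}

# Row 0 is taken to be the top row, so "up" decreases the row index.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "right": (0, 1), "left": (0, -1)}


def shift(state, delta):
    """Move by delta if the result is on the grid, otherwise stay put."""
    row, col = state[0] + delta[0], state[1] + delta[1]
    if 0 <= row < N_ROWS and 0 <= col < N_COLS:
        return (row, col)
    return state


def step(state, action):
    """Apply the action, then the upward push if the start state is windy."""
    next_state = shift(state, ACTIONS[action])
    if state in WINDY:
        # Assumption: the push obeys the same boundary rule as a normal move.
        next_state = shift(next_state, ACTIONS["up"])
    return next_state
```

With these assumed windy cells, `step((3, 3), "right")` returns `(2, 4)`, the diagonal move described above.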

This is an episodic task in which each episode lasts no more than 30 time steps. At the start of each episode, the agent is placed in the ‘Start’ state. Reward in this domain is zero everywhere except when the agent is in the goal state (labeled "goal" in the diagram). The agent obtains a reward of +10 when it executes any action from the goal state. The episode ends after 30 time steps or when the agent takes any action after having landed in the goal state.
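The episode and reward rules can be sketched as a loop that is independent of the grid layout. Here `policy` and `step_fn` are hypothetical callables supplied by the caller (a function from state to action, and a transition function), and the reading that the action taken from the goal counts as one of the 30 time steps is an assumption.

```python
MAX_STEPS = 30      # episode length limit from the assignment
GOAL_REWARD = 10.0  # reward for acting from the goal state


def run_episode(start, goal, policy, step_fn):
    """Run one episode and return (total_reward, steps_taken).

    The reward is zero on every step except the one on which the agent
    executes an action from the goal state, which also ends the episode.
    """
    state = start
    total_reward = 0.0
    for t in range(MAX_STEPS):
        action = policy(state)
        if state == goal:
            total_reward += GOAL_REWARD
            return total_reward, t + 1
        state = step_fn(state, action)
    return total_reward, MAX_STEPS
```

The total reward per episode is therefore either 0 or 10, which is the quantity a reward-per-episode plot would track.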

You must solve the problem using Q-learning. Employ ε-greedy exploration with ε = 0.1 (the agent takes a random action 10 percent of the time in order to explore). Employ a learning rate of 0.1 and a discount rate of 0.9.
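A minimal sketch of the two pieces named here, ε-greedy action selection and the standard Q-learning backup Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], might look as follows. The table layout (a dict mapping each state to a list of four action values), the function names, and the tie-breaking rule are assumptions. Only the standard library is used; NumPy arrays could replace the lists.

```python
import random

ALPHA = 0.1    # learning rate from the assignment
GAMMA = 0.9    # discount rate from the assignment
EPSILON = 0.1  # exploration probability from the assignment


def epsilon_greedy(q_values, epsilon=EPSILON, rng=random):
    """Return an action index: random with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Assumption: ties are broken in favour of the lowest action index.
    return q_values.index(max(q_values))


def q_update(q, state, action, reward, next_state, terminal):
    """Apply one Q-learning backup in place and return the updated value.

    When the transition ends the episode (the action taken from the goal
    state), there is no next state to bootstrap from, so the target is
    just the reward.
    """
    if terminal:
        target = reward
    else:
        target = reward + GAMMA * max(q[next_state])
    q[state][action] += ALPHA * (target - q[state][action])
    return q[state][action]
```

For instance, with every entry at 10 and a zero reward, one non-terminal backup moves the value to 10 + 0.1 × (0 + 0.9 × 10 − 10) = 9.9.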

The programming must be done in MATLAB. Students may get access to MATLAB here. Alternatively, students may code in Python (using NumPy). Students who would rather code in a different language should see Dr Platt or the TA.

Students must submit their homework in the form of a ZIP file that includes the following:

1. A PDF of a plot of grid world that illustrates the policy and a path found by Q-learning after it has approximately converged. The policy plot must identify the action taken by the policy in each state. The path must begin in the start state and follow the policy to the goal state.

2. A PDF of a plot of reward per episode.

3. A text file showing output from a sample run of your code.

4. A directory containing all source code for your project.

5. A short readme file listing the important files in your submission.
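Before producing the policy and path plots, it can help to extract the greedy policy from the learned table and check it in text form. The sketch below assumes the table is a dict from `(row, col)` to four action values ordered up, down, right, left, and that `step_fn` is a hypothetical transition function supplied by the caller; it is a debugging aid, not a replacement for the required PDF plots.

```python
# One symbol per action, in the assumed order up, down, right, left.
ARROWS = ["^", "v", ">", "<"]


def greedy_action(q_values):
    """Index of the largest action value (first one wins on ties)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def render_policy(q, n_rows, n_cols):
    """Return one string per grid row showing the greedy action in each cell."""
    return [
        "".join(ARROWS[greedy_action(q[(row, col)])] for col in range(n_cols))
        for row in range(n_rows)
    ]


def follow_policy(q, start, goal, step_fn, max_steps=30):
    """Follow the greedy policy from start; stop at the goal or the step limit."""
    path = [start]
    state = start
    while state != goal and len(path) <= max_steps:
        state = step_fn(state, greedy_action(q[state]))
        path.append(state)
    return path
```

The list returned by `follow_policy` is the sequence of cells to draw as the path on top of the policy plot.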

Updates

You can initialize the Q function randomly, or you can initialize it to a uniform value of 10, i.e., such that every value in the table is equal to 10.
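Both initialization options can be sketched in a few lines. The dict-of-lists table layout and the range used for the random option (0 to 10) are assumptions; the text only says "randomly".

```python
import random

N_ACTIONS = 4  # up, down, right, left


def init_q_uniform(states, value=10.0):
    """Every entry of the table starts at the same value (10 by default)."""
    return {s: [value] * N_ACTIONS for s in states}


def init_q_random(states, low=0.0, high=10.0, seed=None):
    """Every entry is drawn uniformly from [low, high] (an assumed range)."""
    rng = random.Random(seed)
    return {s: [rng.uniform(low, high) for _ in range(N_ACTIONS)] for s in states}
```

Because the only reward is +10 and collecting it ends the episode, no action value can exceed 10, so the uniform option is an optimistic starting point that tends to encourage trying every action.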

There have been questions about how to know when the algorithm has converged. The algorithm has converged when the value function has stopped changing significantly and the policy has stopped changing completely. Because we are using Q-learning, the algorithm should converge to a single optimal policy.
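The two convergence conditions can be checked by comparing two snapshots of the Q table taken some number of episodes apart. This is a sketch under assumptions: the tolerance `tol` is a hypothetical threshold for "not changing significantly", and the table is assumed to be a dict from state to a list of action values.

```python
def greedy_policy(q):
    """Map each state to the index of its largest action value."""
    return {s: max(range(len(v)), key=lambda a: v[a]) for s, v in q.items()}


def max_change(q_old, q_new):
    """Largest absolute difference between corresponding table entries."""
    return max(
        abs(old - new)
        for s in q_old
        for old, new in zip(q_old[s], q_new[s])
    )


def has_converged(q_old, q_new, tol=1e-3):
    """True if values moved by less than tol and the greedy policy is identical."""
    small_change = max_change(q_old, q_new) < tol
    same_policy = greedy_policy(q_old) == greedy_policy(q_new)
    return small_change and same_policy
```

A snapshot can be taken with `copy.deepcopy(q)` before a batch of episodes and compared with the table afterwards.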

Please also submit a short readme file with your homework that enumerates the significant files in your submission.
