Windy Grid World

The assignment is to use reinforcement learning to solve the "Windy Grid World" problem illustrated in the above picture. Each cell in the image is a state. There are four actions: move up, down, left, and right. This is a deterministic domain: each action moves the agent exactly one cell in the direction indicated. If the agent is on the boundary of the world and executes an action that would move it off the world, it remains in the cell from which it executed the action.
Notice that there are arrows drawn in some states in the diagram. These are the "windy" states. In these states, the agent experiences an extra "push" upward. For example, if the agent is in a windy state and executes an action to the left or right, the result of the action is to move left or right (respectively) but also to move one cell upward. As a result, the agent moves diagonally upward to the left or right.
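The dynamics described above can be sketched as a single transition function. This is only an illustration: the grid size and the set of windy cells below are hypothetical placeholders, since the real layout comes from the assignment's figure. The sketch also assumes that the wind is determined by the cell the agent acts from, that it applies to every action, and that it still pushes the agent up when the chosen move was blocked by a boundary.

```python
# Hypothetical layout: replace with the grid size and windy cells from the figure.
N_ROWS, N_COLS = 5, 7
WINDY = {(r, c) for r in range(N_ROWS) for c in (3, 4)}  # assumed windy columns

# Row 0 is the top row, so "up" decreases the row index.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return the next cell for a deterministic move plus the upward wind push."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col

    # A move that would leave the grid keeps the agent in its current cell.
    if not (0 <= new_row < N_ROWS and 0 <= new_col < N_COLS):
        new_row, new_col = row, col

    # Acting from a windy cell adds one extra cell upward, clipped at the top row.
    if state in WINDY:
        new_row = max(new_row - 1, 0)

    return (new_row, new_col)
```

With this layout, `step((2, 3), "right")` moves diagonally to `(1, 4)`, while `step((4, 0), "left")` leaves the agent at `(4, 0)`.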
This is an episodic task where each episode lasts no more than 30 time steps. At the beginning of each episode, the agent is placed in the "Start" state. Reward in this domain is zero everywhere except when the agent is in the goal state (labeled "goal" in the diagram). The agent receives a reward of positive ten when it executes any action from the goal state. The episode ends after 30 time steps or when the agent takes any action after having landed in the goal state.
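One possible reading of the episode rules is sketched below. The function name `run_episode` and its parameters `policy_fn` and `step_fn` are hypothetical: `policy_fn(state)` is assumed to return an action and `step_fn(state, action)` the next state. The sketch counts the action taken from the goal as one of the 30 time steps, which is an assumption rather than something the assignment states.

```python
def run_episode(start, goal, policy_fn, step_fn, max_steps=30, goal_reward=10):
    """Run one episode and return (total_reward, visited_states)."""
    state = start
    total_reward = 0
    path = [state]
    for _ in range(max_steps):
        action = policy_fn(state)
        if state == goal:
            # Any action executed from the goal earns the reward and ends the episode.
            total_reward += goal_reward
            break
        # Every other transition has zero reward.
        state = step_fn(state, action)
        path.append(state)
    return total_reward, path
```

For example, on a toy corridor where `step_fn` always moves one cell to the right, an agent starting at cell 0 with the goal at cell 3 collects a total reward of 10 on its fourth time step.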
You should solve the problem using Q-learning. Use ε-greedy exploration with ε = 0.1 (the agent takes a random action 10 percent of the time in order to explore). Use a learning rate of 0.1 and a discount rate of 0.9.
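A minimal sketch of the two pieces named above, ε-greedy action selection and the tabular Q-learning update, might look as follows. The table layout (a dict mapping each state to a dict of action values) and the function names are assumptions made for illustration; a NumPy array indexed by state and action would work equally well. Ties between equally good actions are broken at random, which is a design choice and not a requirement of the assignment.

```python
import random

ACTIONS = ["up", "down", "left", "right"]
EPSILON = 0.1  # exploration rate from the assignment
ALPHA = 0.1    # learning rate from the assignment
GAMMA = 0.9    # discount rate from the assignment


def choose_action(q, state, rng=random):
    """Epsilon-greedy: explore with probability EPSILON, otherwise act greedily."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    values = q[state]
    best = max(values.values())
    return rng.choice([a for a in ACTIONS if values[a] == best])


def q_update(q, state, action, reward, next_state, terminal):
    """Q(s,a) += ALPHA * (r + GAMMA * max_a' Q(s',a') - Q(s,a))."""
    if terminal:
        target = reward  # no future value after the episode ends
    else:
        target = reward + GAMMA * max(q[next_state].values())
    q[state][action] += ALPHA * (target - q[state][action])
```

In this domain the only terminal transition is the action taken from the goal state, so that is the one place where `terminal=True` would be passed.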
The programming should be done in MATLAB. Students may get access to MATLAB here. Alternatively, students may code in Python (using NumPy). If the student would rather code in a different language, please see Dr Platt or the TA.
Students should submit their homework via email to the TA (suchismi@buffalo.edu) in the form of a ZIP file that includes the following:
1. A PDF of a plot of the gridworld that illustrates the policy and a path found by Q-learning after it has approximately converged. The policy plot should identify the action taken by the policy in each state. The path should begin in the start state and follow the policy to the goal state.
2. A PDF of a plot of reward per episode. It should look like the diagram in Figure 6.13 in Sutton and Barto (SB).
3. A text file showing output from a sample run of your code.
4. A directory containing all source code for your project.
5. A short readme file enumerating the important files in your submission.
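Before producing the PDF for item 1, it can help to inspect the learned policy as plain text. The sketch below is one hypothetical way to do that: `render_policy` is an illustrative name, and the Q table is assumed to be a dict mapping `(row, col)` cells to dicts of action values. The actual PDF plot would still need a plotting tool of your choice.

```python
ARROWS = {"up": "^", "down": "v", "left": "<", "right": ">"}


def render_policy(q, n_rows, n_cols, goal):
    """Return a text grid with one arrow per cell for the greedy action."""
    lines = []
    for r in range(n_rows):
        cells = []
        for c in range(n_cols):
            if (r, c) == goal:
                cells.append("G")  # mark the goal cell instead of an arrow
            else:
                values = q[(r, c)]
                cells.append(ARROWS[max(values, key=values.get)])
        lines.append(" ".join(cells))
    return "\n".join(lines)
```

The same text output could also go into the sample-run file requested in item 3.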
Updates
You can initialize the Q function randomly or you can initialize it to a uniform value of 10. That is, you can initialize Q such that each value in the table is equal to 10.
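Both initialization options can be sketched in a few lines. The function name `init_q`, its parameters, and the range used for the random option (uniform on [0, 1)) are assumptions; the assignment does not specify a random range.

```python
import random


def init_q(states, actions, mode="uniform", value=10.0, seed=None):
    """Build a Q table as {state: {action: value}}."""
    rng = random.Random(seed)
    if mode == "uniform":
        # Every entry starts at the same value (10 in the assignment).
        return {s: {a: value for a in actions} for s in states}
    if mode == "random":
        # Assumed range: uniform on [0, 1).
        return {s: {a: rng.random() for a in actions} for s in states}
    raise ValueError("mode must be 'uniform' or 'random'")
```

Since the largest reward is 10 and the discount rate is 0.9, no true action value exceeds 10, so the uniform option is an optimistic start that tends to push the agent toward actions it has not yet tried.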
There have been questions about how to know when the algorithm has converged. The algorithm has converged when the value function has stopped changing significantly and the policy has stopped changing completely. Since we are using Q-learning, the algorithm should converge to a single optimal policy.
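One way to turn this convergence criterion into a check is to compare snapshots of the Q table taken some number of episodes apart. The sketch below is an assumption-laden illustration: the threshold `tol` is a hypothetical value standing in for "significantly", and the Q table is assumed to be a dict of dicts.

```python
def greedy_policy(q):
    """Map each state to its highest-valued action."""
    return {s: max(values, key=values.get) for s, values in q.items()}


def has_converged(q_before, q_after, tol=1e-3):
    """True if no Q value moved by tol or more and the greedy policy is unchanged."""
    max_delta = max(
        abs(q_after[s][a] - q_before[s][a])
        for s in q_after
        for a in q_after[s]
    )
    return max_delta < tol and greedy_policy(q_before) == greedy_policy(q_after)
```

Because exploration continues at a fixed rate and the learning rate is constant, individual Q values may keep moving slightly, so comparing snapshots several episodes apart is likely to be more informative than comparing consecutive updates.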

  • Category:- Computer Engineering
  • Reference No.:- M9521358
