Question - The city of Pittsburgh, Pennsylvania, lies where three rivers, the Allegheny, Monongahela and Ohio, meet. It has long been important to build bridges there so that residents can cross the rivers safely. See the Wikipedia page "List of bridges of Pittsburgh" for a listing (with pictures) of the bridges. The data contain details for a large number of past and present bridges in Pittsburgh. All the variables we will use are categorical.

Here they are:

  • id: identifies the bridge (we ignore)
  • river: initial letter of river that the bridge crosses
  • location: a numerical code indicating the location within Pittsburgh (we ignore)
  • erected: time period in which the bridge was built (a name, from CRAFTS, the earliest, to MODERN, the most recent).
  • purpose: what the bridge carries: foot traffic ("walk"), water (aqueduct), road or railroad.
  • length: categorized as long, medium or short.
  • lanes of traffic (or number of railroad tracks): a number, 1, 2, 4 or 6, that we will count as categorical.
  • clear_g: whether a vertical navigation requirement was included in the bridge design (that is, ships of a certain height had to be able to get under the bridge). I think G means "yes".
  • t_d: method of construction. DECK means the bridge deck is on top of the construction, THROUGH means that when you cross the bridge, some of the bridge supports are next to you or above you.
  • material the bridge is made of: iron, steel or wood.
  • span: whether the bridge covers a short, medium or long distance.
  • rel_l: Relative length of the main span of the bridge (between the two central piers) to the total crossing length. The categories are S, S-F and F. I don't know what these mean.
  • type of bridge: wood, suspension, arch and three types of truss bridge: cantilever, continuous and simple.

The website SteelConstruction is an excellent source of information about bridges.

(a) The bridges are stored in CSV format. Some of the information is not known and was recorded in the spreadsheet as ?. Turn these into genuine missing values by adding na="?" to your file-reading command. Display some of your data, enough to see that you have some missing data.
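A minimal sketch of part (a) in base R, using a few invented rows in place of the real file (the values below are illustrative only, not the actual data). Note that `na = "?"` is the argument name in readr's `read_csv`; base R's `read.csv` calls the same option `na.strings`.

```r
# Toy CSV text standing in for the real bridges file (values are invented).
csv_text <- "id,river,location,length,material
B1,M,3,?,WOOD
B2,A,25,MEDIUM,WOOD
B3,A,39,?,?
B4,M,4,MEDIUM,IRON
B5,O,28,LONG,STEEL"

# na.strings = "?" turns every "?" field into a genuine NA.
bridges <- read.csv(text = csv_text, na.strings = "?",
                    stringsAsFactors = FALSE)

head(bridges)            # the NAs show up in length and material
colSums(is.na(bridges))  # number of missing values per column
```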

(b) The R function complete.cases takes a data frame as input and returns a vector of TRUE or FALSE values. Each row of the data frame is checked to see whether it is "complete" (has no missing values), in which case the result is TRUE, or not (has one or more missing values), in which case the result is FALSE. Add a new column called is_complete to your data frame that indicates whether each row is complete. Save the result, and then display (some of) your length column along with your new column. Do the results make sense?
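As a rough illustration of how `complete.cases` behaves, here is a sketch on an invented four-row data frame (the column name `is_complete` and all values are assumptions for the example).

```r
# Toy data frame with some missing values (invented, not the real bridges).
bridges <- data.frame(
  id       = c("B1", "B2", "B3", "B4"),
  length   = c(NA, "MEDIUM", "SHORT", "LONG"),
  material = c("WOOD", "WOOD", NA, "STEEL"),
  stringsAsFactors = FALSE
)

# TRUE for rows with no NA in any column, FALSE otherwise.
bridges$is_complete <- complete.cases(bridges)

# Display length next to the new column.
bridges[, c("length", "is_complete")]
```

A missing length always forces `is_complete` to be FALSE, but a row with a known length (B3 here) can still be incomplete because some other column is missing.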

(c) Create the data frame that will be used for the analysis by picking out only those rows that have no missing values. (Use what you have done so far to help you.)
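One possible way to do this, sketched on invented data, is to use the logical column from the previous part as a row index. The name `bridges_complete` is hypothetical.

```r
# Toy data frame (invented values) with a completeness flag.
bridges <- data.frame(
  id       = c("B1", "B2", "B3", "B4"),
  length   = c(NA, "MEDIUM", "SHORT", "LONG"),
  material = c("WOOD", "WOOD", NA, "STEEL"),
  stringsAsFactors = FALSE
)
bridges$is_complete <- complete.cases(bridges)

# Keep only the rows flagged as complete.
bridges_complete <- bridges[bridges$is_complete, ]

nrow(bridges_complete)   # rows that survive
anyNA(bridges_complete)  # should be FALSE: no missing values remain
```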

(d) We are going to assess the dissimilarity between two bridges by the number of the categorical variables they disagree on. This is called a "simple matching coefficient", and is the same thing we did in the question about clustering fruits based on their properties. This time, though, we want to count matches in things that are rows of our data frame (properties of two different bridges), so we will need to use a strategy like the one I used in calculating the Bray-Curtis distances.

First, write a function that takes as input two vectors v and w and counts the number of their entries that differ (comparing the first with the first, the second with the second, ..., the last with the last). I can think of a quick way and a slow way, but either way is good. To test your function, create two vectors (using c) of the same length, and see whether it correctly counts the number of corresponding values that are different.
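The quick and slow ways might look something like the sketch below. The function names `count_diff` and `count_diff_loop` are hypothetical, and the test vectors are made up.

```r
# Quick way: compare elementwise, then count the TRUEs.
count_diff <- function(v, w) {
  stopifnot(length(v) == length(w))
  sum(v != w)
}

# Slow way: loop over positions and keep a running count.
count_diff_loop <- function(v, w) {
  stopifnot(length(v) == length(w))
  n_diff <- 0
  for (i in seq_along(v)) {
    if (v[i] != w[i]) n_diff <- n_diff + 1
  }
  n_diff
}

v <- c("a", "b", "c", "d", "e")
w <- c("a", "x", "c", "y", "e")
count_diff(v, w)       # 2 (positions 2 and 4 differ)
count_diff_loop(v, w)  # 2
```

Both versions assume there are no missing values in `v` or `w`, which holds once the incomplete rows have been removed.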

(e) Write a function that has as input two row numbers and a data frame to take those rows from. The function needs to select all the columns except for id, location and is_complete, select the rows required one at a time, and turn them into vectors. (There may be some repetitiousness here. That's OK.) Then those two vectors are passed into the function you wrote in the previous part, and the count of the number of differences is returned. This is like the code in the Bray-Curtis problem. Test your function on rows 3 and 4 of your bridges data set (with the missings removed).

There should be six variables that are different.
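A base-R sketch of such a function, run on an invented three-row data frame, so it will not reproduce the count of six for the real rows 3 and 4. The names `row_diff` and `count_diff` are hypothetical, and each value is converted to character so that text and numeric columns can be compared in one vector.

```r
# Count positions where two equal-length vectors disagree.
count_diff <- function(v, w) sum(v != w)

# Dissimilarity between rows i and j of data frame d.
row_diff <- function(i, j, d) {
  # Drop the columns that should not take part in the comparison.
  keep <- setdiff(names(d), c("id", "location", "is_complete"))
  # Pull out each row and turn it into a character vector.
  v <- vapply(d[i, keep], as.character, character(1))
  w <- vapply(d[j, keep], as.character, character(1))
  count_diff(v, w)
}

# Toy data (invented values).
toy <- data.frame(
  id          = c("B1", "B2", "B3"),
  river       = c("A", "A", "M"),
  location    = c(1, 2, 3),
  material    = c("WOOD", "IRON", "IRON"),
  lanes       = c(2, 2, 4),
  is_complete = c(TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

row_diff(1, 2, toy)  # 1: only material differs
row_diff(1, 3, toy)  # 3: river, material and lanes all differ
```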

(f) Create a matrix or data frame of pairwise dissimilarities between each pair of bridges (using only the ones with no missing values). Use loops, or crossing and map2_int, as you prefer. Display the first six rows of your matrix (using head) or the first few rows of your data frame. (The whole thing is big, so don't display it all.)
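The loop version could be sketched as follows, again on invented data and with hypothetical helper names. It fills an n-by-n matrix one cell at a time; the crossing and map2_int route would produce the same numbers in long (data frame) form.

```r
count_diff <- function(v, w) sum(v != w)

row_diff <- function(i, j, d) {
  keep <- setdiff(names(d), c("id", "location", "is_complete"))
  v <- vapply(d[i, keep], as.character, character(1))
  w <- vapply(d[j, keep], as.character, character(1))
  count_diff(v, w)
}

# Toy data (invented values).
toy <- data.frame(
  id       = c("B1", "B2", "B3"),
  river    = c("A", "A", "M"),
  location = c(1, 2, 3),
  material = c("WOOD", "IRON", "IRON"),
  lanes    = c(2, 2, 4),
  stringsAsFactors = FALSE
)

# Empty matrix with bridge ids as row and column names.
n <- nrow(toy)
m <- matrix(0L, n, n, dimnames = list(toy$id, toy$id))

# Fill every cell with the dissimilarity between rows i and j.
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    m[i, j] <- row_diff(i, j, toy)
  }
}

head(m)  # first six rows (here, all three)
```

The result is symmetric with zeros on the diagonal, which is a useful check before converting it to a dist object.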

(g) Turn your matrix or data frame into a dist object. Do not display your distance object.
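For a square matrix of dissimilarities, `as.dist` does the conversion. The sketch below uses a small invented matrix and inspects the result without printing the distances themselves.

```r
# Toy symmetric dissimilarity matrix (invented values).
ids <- c("B1", "B2", "B3")
m <- matrix(c(0, 1, 3,
              1, 0, 2,
              3, 2, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(ids, ids))

# as.dist keeps the lower triangle and the row labels.
d <- as.dist(m)

class(d)         # "dist"
attr(d, "Size")  # number of objects
```

If the dissimilarities are held in a long data frame instead, they would first need to be reshaped into a square matrix like `m`.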

(h) Run a cluster analysis using Ward's method, and display a dendrogram. The labels for the bridges (rows of the data frame) may come out too big; experiment with a cex less than 1 on the plot so that you can see them.
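A sketch of the clustering step on an invented six-bridge dissimilarity matrix. `hclust` offers two Ward variants, "ward.D" and "ward.D2"; using "ward.D" here is an assumption about which one is intended.

```r
# Toy dissimilarities with two obvious groups (invented values).
ids <- paste0("B", 1:6)
m <- matrix(c(0, 1, 1, 5, 6, 5,
              1, 0, 2, 6, 5, 6,
              1, 2, 0, 5, 6, 6,
              5, 6, 5, 0, 1, 2,
              6, 5, 6, 1, 0, 1,
              5, 6, 6, 2, 1, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(ids, ids))
d <- as.dist(m)

# Hierarchical clustering with Ward's method.
hc <- hclust(d, method = "ward.D")

# cex below 1 shrinks the leaf labels so they do not overlap.
plot(hc, cex = 0.6)
```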

(i) How many clusters do you think is reasonable for these data? Draw them on your plot.
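Once a number of clusters has been chosen by looking at the dendrogram, `rect.hclust` can draw boxes around them on the existing plot. In the sketch below the data are invented and `k = 2` suits only this toy example, not the real bridges.

```r
# Toy dissimilarities with two obvious groups (invented values).
ids <- paste0("B", 1:6)
m <- matrix(c(0, 1, 1, 5, 6, 5,
              1, 0, 2, 6, 5, 6,
              1, 2, 0, 5, 6, 6,
              5, 6, 5, 0, 1, 2,
              6, 5, 6, 1, 0, 1,
              5, 6, 6, 2, 1, 0),
            nrow = 6, byrow = TRUE,
            dimnames = list(ids, ids))
hc <- hclust(as.dist(m), method = "ward.D")

# rect.hclust adds to a dendrogram that has already been plotted.
plot(hc, cex = 0.6)
rects <- rect.hclust(hc, k = 2)  # one rectangle per cluster
```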

(j) Pick three bridges in the same one of your clusters (it doesn't matter which three bridges or which cluster). Display the data for these bridges. Does it make sense that these three bridges ended up in the same cluster? Explain briefly.
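To look at the bridges in one cluster, `cutree` gives the cluster membership of each row, which can then be used to subset the data. The following is only a sketch on invented data, with `k = 2` chosen for the toy example.

```r
# Toy bridges (invented values), labelled by row name.
toy <- data.frame(
  river    = c("A", "A", "A", "M", "M", "M"),
  material = c("WOOD", "WOOD", "IRON", "STEEL", "STEEL", "STEEL"),
  span     = c("SHORT", "SHORT", "SHORT", "LONG", "LONG", "MEDIUM"),
  row.names = paste0("B", 1:6),
  stringsAsFactors = FALSE
)

# Pairwise count of variables on which two bridges disagree.
n <- nrow(toy)
m <- matrix(0, n, n, dimnames = list(rownames(toy), rownames(toy)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    m[i, j] <- sum(unlist(toy[i, ]) != unlist(toy[j, ]))
  }
}

hc <- hclust(as.dist(m), method = "ward.D")

# Cluster membership for each bridge, then the rows sharing B1's cluster.
cl <- cutree(hc, k = 2)
same <- toy[cl == cl["B1"], ]
same
```

Bridges in the same cluster should agree on most variables, which is what the displayed rows let you check.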

Finish Question 8, parts (d), (e), (f) and (g). Give me both the R code and the output.

Attachment:- Assignment Files.rar
