Data Science Project Report

Submit the files listed below in a single ZIP file:

• Titanic.rmd - R Markdown document used to generate your Data Science Project Report. An initial Sample_RMD.rm template file has been provided for you.

• Titanic.html - standalone HTML document (with embedded images and code) generated in RStudio using knitr and your Titanic.rmd R Markdown file.

• /data directory with your dataset files
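The submission list above can be checked before zipping. The following is a minimal sketch in base R; `check_submission` is a hypothetical helper name, and the demo runs against an empty temporary directory rather than a real project.

```r
# Hypothetical helper: reports which required submission items are absent
# from a project directory. The item names come from the list above.
check_submission <- function(project_dir = ".") {
  required_files <- c("Titanic.rmd", "Titanic.html")
  absent <- required_files[!file.exists(file.path(project_dir, required_files))]
  if (!dir.exists(file.path(project_dir, "data"))) {
    absent <- c(absent, "data/")
  }
  absent
}

# Demo on an empty temporary directory, where every item is absent.
demo_dir <- tempfile("titanic_project_")
dir.create(demo_dir)
print(check_submission(demo_dir))
```

Once the function returns an empty vector, the three items can be bundled with `utils::zip()`, which relies on a zip program being available on the system.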

Reproducible Research

For your Data Science Project Report you are expected to meet the criteria of a reproducible research project. Your project report will document your analysis of the Titanic dataset. It will include your initial data exploration, model building and evaluation, and your final predicted outcomes for the test dataset. For your research to be considered reproducible, you must provide:

• The data used for your analysis

• All final code files, with appropriate comments

• A report of your analysis that includes background information explaining the question you are trying to answer, a discussion of the analysis, and the conclusions reached for your project, with appropriate supporting explanations and figures.

To comply with this final requirement, your final report will be a standalone HTML document created in RStudio with the knitr and R Markdown tools. Using knitr with R Markdown allows you to create a report that interweaves your discussion with your code and figures. See R Markdown - Dynamic Documents for R in the list of online resources provided below for further information.
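The report can also be knitted from the R console instead of the RStudio Knit button. This is a rough sketch, assuming the rmarkdown package and pandoc are installed and that Titanic.rmd sits in the working directory. `render_report` is a hypothetical wrapper name.

```r
# Sketch: knit an R Markdown file to a standalone HTML document.
# Assumes the rmarkdown package (which calls knitr) and pandoc are installed.
render_report <- function(rmd_path = "Titanic.rmd") {
  if (!file.exists(rmd_path) || !requireNamespace("rmarkdown", quietly = TRUE)) {
    message("Nothing rendered: missing file or rmarkdown package.")
    return(invisible(NULL))
  }
  rmarkdown::render(
    rmd_path,
    # self_contained = TRUE embeds images and CSS in the HTML file itself
    output_format = rmarkdown::html_document(self_contained = TRUE)
  )
}

render_report("Titanic.rmd")
```

Rendering in a fresh R session is a useful reproducibility check, because the document then cannot depend on objects left over in the interactive workspace.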

Data Analysis Project

This assessment is based on a Kaggle competition. For this assignment you are asked to predict which of the Titanic's passengers survived the disaster. More information on the competition is available at the Kaggle competition site, Titanic: Machine Learning from Disaster [https://www.kaggle.com/c/titanic].

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.

In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy. (Kaggle 2012)

Project Report Outline

Please use the project report outline provided below as a general guide to the specific sections and content that you should include in your project report.

1. Background

Introduce and discuss the background and purpose of your project. What information does the dataset provide? What question(s) are you trying to answer?

2. Exploratory analysis

Conduct exploratory analysis to discover which of the independent variables are most informative. You are required to explore and report on at least four variables. Three of the four must be Age, Sex and Class. You are free to explore and report on any other independent variables in the dataset. Your discussion should include at least one table or figure for each variable illustrating the relationship between each variable and passenger survival.
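One possible way to build the required tables is sketched below in base R. The column names (Survived, Pclass, Sex, Age) are assumed to match the Kaggle training file, and the eight rows are made up for illustration only. The age bins are an arbitrary choice, not part of the assignment.

```r
# Made-up rows for illustration only; replace with read.csv("data/train.csv").
train <- data.frame(
  Survived = c(0, 1, 0, 1, 0, 0, 1, 1),
  Pclass   = c(3, 1, 3, 1, 3, 2, 2, 3),
  Sex      = c("male", "female", "female", "female", "male", "male", "female", "male"),
  Age      = c(22, 38, 26, 35, NA, 54, 14, 2)
)

# Survival rate by sex: the mean of a 0/1 column is the proportion of 1s.
rate_by_sex <- tapply(train$Survived, train$Sex, mean)

# Row-wise proportions of each outcome within each passenger class.
class_table <- prop.table(
  table(Class = train$Pclass, Survived = train$Survived),
  margin = 1
)

# Age is continuous, so bin it first (assumed, arbitrary cut points).
# Rows with a missing Age fall out of this table, which is worth reporting.
train$AgeGroup <- cut(train$Age, breaks = c(0, 12, 18, 60, Inf),
                      labels = c("child", "teen", "adult", "senior"))
rate_by_age <- tapply(train$Survived, train$AgeGroup, mean)

print(rate_by_sex)
print(round(class_table, 2))
print(rate_by_age)
```

The same tables can be turned into figures with `barplot()`, for example `barplot(rate_by_sex)`.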

3. Building and evaluating the model

a. Discuss your choice of model. Explain why you've chosen this specific model. What are its strengths? What are its limitations?

b. Evaluate your model. The discussion for the evaluation section should include answers to the following questions: How well does your model predict? Is it overfitting to the training set? Do you trust this model?

c. This section should include at least 2 tables or figures to summarize or illustrate your discussion.
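The evaluation questions above can be approached with a held-out validation set. The sketch below uses logistic regression purely as an example model, not as a recommended choice, and runs on simulated data because the real dataset is not reproduced here. The survival mechanism, the 25% split, and the 0.5 cutoff are all assumptions.

```r
set.seed(42)  # reproducible simulated data; NOT the real Titanic dataset
n <- 400
passengers <- data.frame(
  Sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  Pclass = sample(1:3, n, replace = TRUE),
  Age    = round(runif(n, 1, 70))
)
# Hypothetical survival mechanism, chosen only so the model has signal to find.
log_odds <- 1.5 - 2.5 * (passengers$Sex == "male") -
  0.6 * (passengers$Pclass - 1) - 0.02 * passengers$Age
passengers$Survived <- rbinom(n, size = 1, prob = plogis(log_odds))

# Hold out 25% of the rows as a validation set.
holdout  <- sample(n, size = n / 4)
fit_rows <- passengers[-holdout, ]
val_rows <- passengers[holdout, ]

model <- glm(Survived ~ Sex + Pclass + Age, data = fit_rows, family = binomial)

# Classify as survived when the predicted probability exceeds 0.5.
accuracy <- function(model, data) {
  predicted <- as.integer(predict(model, newdata = data, type = "response") > 0.5)
  mean(predicted == data$Survived)
}
train_acc <- accuracy(model, fit_rows)
val_acc   <- accuracy(model, val_rows)

# Confusion matrix on the held-out rows.
val_pred  <- as.integer(predict(model, newdata = val_rows, type = "response") > 0.5)
confusion <- table(Actual = val_rows$Survived, Predicted = val_pred)

print(c(train = train_acc, validation = val_acc))
print(confusion)
```

A training accuracy well above the validation accuracy is one sign of overfitting. The accuracy comparison and the confusion matrix could serve as the two required tables.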

4. Predicting passenger survival

Finally, use the model you've built to predict the outcomes for the test set and compare these results to your training data. Optionally, I encourage you to submit your predictions to the Kaggle competition site and include your results in your report.
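A sketch of this final step follows. It assumes the Kaggle submission format of a CSV file with PassengerId and Survived columns. The training and test rows are made up, the model is again an example logistic regression, and the output path is a temporary file chosen for the demo.

```r
# Made-up stand-ins for data/train.csv and data/test.csv.
train <- data.frame(
  Survived = c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0),
  Sex      = factor(c("male", "female", "female", "male", "female", "male",
                      "male", "female", "female", "male", "female", "male")),
  Pclass   = c(3, 1, 2, 3, 1, 2, 1, 3, 3, 1, 2, 2)
)
test <- data.frame(
  PassengerId = 892:895,
  Sex    = factor(c("male", "female", "male", "female"), levels = levels(train$Sex)),
  Pclass = c(3, 1, 2, 3)
)

model <- glm(Survived ~ Sex + Pclass, data = train, family = binomial)

# Predicted survival probabilities for the test rows, cut at 0.5.
test_prob  <- predict(model, newdata = test, type = "response")
submission <- data.frame(
  PassengerId = test$PassengerId,
  Survived    = as.integer(test_prob > 0.5)
)

# One simple comparison with the training data:
# observed survival rate in train vs. predicted rate in test.
print(c(train_rate = mean(train$Survived),
        predicted_rate = mean(submission$Survived)))

out_file <- file.path(tempdir(), "submission.csv")  # hypothetical output location
write.csv(submission, out_file, row.names = FALSE)
```

Setting `row.names = FALSE` keeps the file to the two expected columns.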

5. Conclusions

Discuss the conclusions you've drawn based on your analysis.

List of online resources

• Titanic: Machine Learning from Disaster

Kaggle competition site.

https://www.kaggle.com/c/titanic

• R Markdown - Dynamic Documents for R

http://rmarkdown.rstudio.com/

• Getting Started with R: Kaggle's Titanic Competition

List of 4 excellent tutorials for using R to compete in the Titanic competition. https://www.kaggle.com/c/titanic/details/new-getting-started-with-r

• Kaggle and DataCamp R Tutorial on Machine Learning

Interactive tutorial by Kaggle and DataCamp which provides coding exercises to help you predict the passenger survival rates for Kaggle's Titanic competition.

https://www.datacamp.com/courses/kaggle-tutorial-on-machine-learing-the-sinking-of-thetitanic

References

Titanic: Machine Learning from Disaster 2012, Kaggle, viewed 8 Oct 2015, https://www.kaggle.com/c/titanic.
