Assignment

1. Machine learning has now permeated multiple disciplines, even politics. The current landscape in the US is rife with data scientists and other quantitative experts making predictions about ongoing and upcoming elections. Consider the Congressional Voting Records dataset from the UCI Machine Learning Repository.

The dataset contains two files: one with a ".names" suffix and one with a ".data" suffix. The actual data is in the ".data" file, and the ".names" file describes the metadata (i.e., what the different columns mean). Each row of the ".data" file contains one instance and includes both the features and the class label (take care to note the order). The machine learning problem here is to take the votes of US congressmen/congresswomen as input and predict whether each is a Republican or a Democrat. In particular, our goal is to solve this problem using both decision trees and a naïve Bayes classifier.

First, spend some time understanding the structure of the dataset: how the instances are organized, how the features and class are organized, and so on. You need to "massage" this data into the form that scikit-learn requires before you can apply either a decision tree or a naïve Bayes classifier, so plan how you will do this massaging. You can do it in Python, in Excel, or any other way you choose. This step is a natural part of the machine learning and knowledge discovery process. Data is rarely given in a form to which machine learning can be applied directly, so considerable effort goes into cleaning, manipulating, and massaging it. Do not apply scikit-learn before ensuring that the data is in the required form.
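As a minimal sketch of this massaging step, the snippet below splits rows into a feature list and a label list using only the standard library. It assumes the layout described in the ".names" file is comma-separated with the class label in the first column followed by 16 votes coded "y", "n", or "?"; verify this against your copy of the data. The three-row sample and the function name `load_votes` are hypothetical.

```python
import csv
import io

# Hypothetical three-row sample in the ASSUMED layout: class label first,
# then 16 votes coded "y", "n", or "?" (missing).
SAMPLE = """republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,y,y,y,n,n,n,y,y,y,n,n,n,n,n,y,y
"""


def load_votes(handle):
    """Split each row into a list of votes and a class label."""
    features, labels = [], []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        labels.append(row[0])     # assumption: label is the first column
        features.append(row[1:])  # assumption: the remaining columns are votes
    return features, labels


X_raw, y = load_votes(io.StringIO(SAMPLE))
print(len(X_raw), "instances,", len(X_raw[0]), "features each")
print(y)
```

To read the real file, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` sample. The votes are still strings at this point and must be encoded numerically before scikit-learn can use them.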

Just like the PlayTennis dataset, the features are binary-valued, but some features have missing values for some rows (instances). You need to decide how to handle them. There are three possibilities: i) discard instances that have missing feature values; ii) treat "missing" as a value in its own right (so a binary feature becomes a ternary, or three-valued, feature); iii) impute missing values (i.e., for each feature, replace missing values with the most common value for that feature), so that they are no longer missing or unknown. The ".names" file explains why some values are missing and what they mean.
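The three options can be sketched in plain Python as follows. The toy data, the "?" marker, the integer codes, and the function names are all assumptions made for illustration, not part of the assignment.

```python
from collections import Counter

MISSING = "?"  # assumption: marker used for a missing vote

# Hypothetical toy data: 4 instances, 3 binary features.
X_raw = [
    ["y", "n", "?"],
    ["y", "?", "n"],
    ["n", "n", "n"],
    ["y", "y", "y"],
]
y = ["democrat", "democrat", "republican", "republican"]


def discard_missing(X, labels):
    """Option (i): keep only the rows that contain no missing value."""
    kept = [(row, lab) for row, lab in zip(X, labels) if MISSING not in row]
    return [row for row, _ in kept], [lab for _, lab in kept]


def missing_as_value(X):
    """Option (ii): encode each vote as one of three integer categories."""
    codes = {"n": 0, "y": 1, MISSING: 2}
    return [[codes[v] for v in row] for row in X]


def impute_most_common(X):
    """Option (iii): replace each missing entry with its column's mode.

    Raises IndexError if a column has no observed value at all.
    """
    modes = []
    for column in zip(*X):
        observed = [v for v in column if v != MISSING]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if v == MISSING else v for j, v in enumerate(row)]
            for row in X]


X_drop, y_drop = discard_missing(X_raw, y)
X_ternary = missing_as_value(X_raw)
X_imputed = impute_most_common(X_raw)
print(X_drop, y_drop)
print(X_ternary)
print(X_imputed)
```

Note that option (i) shrinks the dataset, so its cross-validation folds are built from fewer instances than those of the other two options. For option (iii), a stricter protocol would compute the modes on the training folds only; the sketch above computes them on all rows for brevity.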

  • Implement a decision tree and a naïve Bayes classifier, each combined with each of the above three ways of dealing with missing values. This gives 6 scenarios (2 classifiers × 3 strategies).
  • Perform 5-fold cross-validation and report precision, recall, and F1-score for each of the 6 scenarios.
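To make the evaluation step concrete, here is a small standard-library sketch of how k folds can be formed and how precision, recall, and F1 are computed for one class treated as "positive". The helper names and the example labels are hypothetical, and the classifier itself is left out; the predictions are simply given.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical true labels and predictions for one test fold.
y_true = ["democrat", "democrat", "democrat", "republican", "republican"]
y_pred = ["democrat", "democrat", "republican", "democrat", "republican"]
print(precision_recall_f1(y_true, y_pred, positive="democrat"))

# Fold sizes for 12 samples split into 5 folds.
print([len(test) for _, test in k_fold_indices(12, k=5)])
```

The folds here are contiguous and unshuffled, so rows should be shuffled first if the file is ordered by class. In practice, scikit-learn's `StratifiedKFold` and `cross_validate` cover the same ground; whether to report per-fold scores or their average across the 5 folds is a choice the assignment leaves open.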

2. For what type of dataset would you choose a decision tree over naïve Bayes as a classifier? And vice versa?
