Assignment

1. Machine learning has now permeated multiple disciplines, even politics. The current landscape in the US is rife with data scientists and other quantitative experts making predictions about ongoing and upcoming elections. Consider the Congressional Voting Records dataset from the UCI Machine Learning Repository.

The dataset contains two files: one with a ".names" suffix and one with a ".data" suffix. The actual data is in the ".data" file, and the ".names" file describes the metadata (i.e., what the different columns mean). Note that each row of the ".data" file contains one instance and includes both the features and the class label (please take care to note the order). The machine learning problem here is to take the votes of US congressmen/congresswomen as input and predict whether each is a Republican or a Democrat. In particular, our goal is to solve this problem using both decision trees and a naïve Bayes classifier.

First, spend some time understanding the structure of the dataset: how the instances are organized, how the features and class are organized, and so on. You need to "massage" this data into the form that scikit-learn requires before you can apply either a decision tree or a naïve Bayes classifier, so spend some time understanding and planning how you will do this massaging. You can do this in Python, in Excel, or any way you choose. Note that this step is a natural part of the machine learning and knowledge discovery process. Data is rarely given in a form to which machine learning can be directly applied, so considerable effort goes into cleaning, manipulating, and massaging it. Do not apply scikit-learn before ensuring that the data is in the required form.
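As a starting point for the massaging step, here is a minimal sketch of reading rows in the ".data" layout into separate feature and label lists. It assumes each row is comma-separated with the class label in the first field followed by 16 votes coded y, n, or ?; check this against the ".names" file before relying on it. The function name `load_votes` and the three sample rows are made up for illustration and are not taken from the real dataset.

```python
import csv
import io

# Synthetic rows in the assumed layout: label first, then 16 votes (y / n / ?).
# These are illustrative only, not rows from the real dataset.
SAMPLE = """\
republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,y,y,y,n,n,n,y,y,y,n,n,n,n,n,y,y
"""


def load_votes(handle):
    """Return (X, y): X is a list of vote rows, y the list of party labels."""
    X, y = [], []
    for row in csv.reader(handle):
        if not row:           # skip blank lines
            continue
        y.append(row[0])      # assumption: the class label is the first field
        X.append(row[1:])     # the remaining fields are the votes
    return X, y


X, y = load_votes(io.StringIO(SAMPLE))
print(len(X), "instances,", len(X[0]), "features, labels:", y)
```

For the real file, the same function can be given an open file handle in place of the `io.StringIO` wrapper. The string values still need to be turned into numbers before scikit-learn can use them.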

Just like the PlayTennis dataset, the features are binary-valued, but note that some features have missing values for some rows (instances). You need to decide how you will handle them. There are three possibilities here: i) discard instances that have missing feature values, ii) treat "missing" as if it is a value (and thus a binary feature becomes a ternary, or three-valued, feature), iii) impute missing values (i.e., for each feature, replace missing values with the most common value for that feature), so that they are no longer missing or unknown. If you read the ".names" file, it explains why some values are missing and what they mean.
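The three options above can be sketched in plain Python as follows. This is one possible reading of each option, not a prescribed solution: it assumes missing votes are coded as "?", and the function names and the small three-feature toy table are hypothetical.

```python
from collections import Counter

MISSING = "?"  # assumption: missing votes are coded as "?"


def drop_missing(X, y):
    """Option (i): keep only the instances that have no missing votes."""
    kept = [(row, label) for row, label in zip(X, y) if MISSING not in row]
    return [row for row, _ in kept], [label for _, label in kept]


def missing_as_value(X):
    """Option (ii): leave "?" in place so each feature has three values."""
    return [list(row) for row in X]


def impute_most_common(X):
    """Option (iii): replace "?" with the most common observed vote per feature."""
    modes = []
    for column in zip(*X):
        observed = [v for v in column if v != MISSING]
        # If a column has no observed votes at all, leave it as missing.
        modes.append(Counter(observed).most_common(1)[0][0] if observed else MISSING)
    return [[m if v == MISSING else v for v, m in zip(row, modes)] for row in X]


# Toy table with three features, made up for illustration.
X_toy = [["y", "n", "?"], ["y", "?", "n"], ["n", "n", "n"], ["?", "y", "y"]]
y_toy = ["democrat", "republican", "republican", "democrat"]

X_drop, y_drop = drop_missing(X_toy, y_toy)
X_keep = missing_as_value(X_toy)
X_imputed = impute_most_common(X_toy)
print(X_drop, y_drop)
print(X_imputed)
```

Note that option (i) shrinks the dataset, while options (ii) and (iii) keep every instance.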

  • Implement a decision tree classifier and a naïve Bayes classifier, with each of the above three ways of dealing with missing values, so that you are experimenting with 6 scenarios.
  • Perform 5-fold cross-validation and report precision, recall, and F1-scores for each of the 6 scenarios.
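A rough sketch of how the 6 scenarios might be looped over with scikit-learn is shown below. Several choices here are assumptions rather than requirements of the assignment: the data is a synthetic stand-in (votes encoded as 0 = n, 1 = y, 2 = missing), `CategoricalNB` is used as the naïve Bayes variant, the folds are stratified and shuffled, and the metrics are macro-averaged.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import CategoricalNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded votes: 0 = n, 1 = y, 2 = missing.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)                        # 0 / 1 party labels
X = (rng.random((200, 16)) < 0.25 + 0.5 * y[:, None]).astype(int)
X[rng.random(X.shape) < 0.05] = 2                       # sprinkle in missing votes


def discard(X, y):
    """(i) Drop instances that contain any missing vote."""
    keep = (X != 2).all(axis=1)
    return X[keep], y[keep]


def as_value(X, y):
    """(ii) Keep the missing code as a third category."""
    return X, y


def impute(X, y):
    """(iii) Replace missing votes with each feature's most common observed vote."""
    X = X.copy()
    for j in range(X.shape[1]):
        col = X[:, j]                                   # view into the copy
        observed = col[col != 2]
        col[col == 2] = np.bincount(observed, minlength=2).argmax()
    return X, y


strategies = {"discard": discard, "as_value": as_value, "impute": impute}
models = {
    "tree": lambda: DecisionTreeClassifier(random_state=0),
    # min_categories=3 keeps the category count fixed across folds.
    "nb": lambda: CategoricalNB(min_categories=3),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["precision_macro", "recall_macro", "f1_macro"]

results = {}
for s_name, prepare in strategies.items():
    Xs, ys = prepare(X, y)
    for m_name, make_model in models.items():
        scores = cross_validate(make_model(), Xs, ys, cv=cv, scoring=scoring)
        results[(s_name, m_name)] = {k: scores[f"test_{k}"].mean() for k in scoring}

for key, vals in results.items():
    print(key, {k: round(v, 3) for k, v in vals.items()})
```

One design caveat: this sketch imputes over the whole dataset before splitting, so each test fold influences the imputed values slightly. A stricter variant would compute the most common value from the training folds only, for example by placing the imputation step inside a scikit-learn `Pipeline`.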

2. For what type of dataset would you choose a decision tree as a classifier over naïve Bayes? And vice versa?
