Questions

Q1. The following dataset is based on the fraud detection data discussed in class, with one extra record (the last one) added. It also includes an additional predictor, AccountAge, with three categories (<10, 10~30 and >30) giving the number of days since the account was created. Using the Naïve Bayes method, calculate by hand the probabilities that the last record is truthful and fraudulent. Does Naïve Bayes classify this new record correctly? Use all 11 records in your calculation, and show the calculation steps as in the Naïve Bayes lecture notes.

Transaction Time   Transaction Amount   Account Age   Class
night              small                >30           truthful
day                small                10~30         truthful
day                large                <10           truthful
day                large                >30           truthful
day                small                <10           truthful
day                small                >30           truthful
night              small                <10           fraudulent
night              large                10~30         fraudulent
day                large                >30           fraudulent
night              large                10~30         fraudulent
day                small                10~30         fraudulent
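The hand calculation multiplies each class prior by the per-attribute conditional frequencies and then normalizes. The sketch below checks that arithmetic with exact fractions, using raw counts with no smoothing (an assumption; the lecture notes may differ if they apply a Laplace correction). RECORDS and naive_bayes_posteriors are illustrative names, not from the assignment.

```python
from fractions import Fraction

# (Transaction Time, Transaction Amount, Account Age, Class), copied from the table;
# the last tuple is the new record being classified.
RECORDS = [
    ("night", "small", ">30", "truthful"),
    ("day", "small", "10~30", "truthful"),
    ("day", "large", "<10", "truthful"),
    ("day", "large", ">30", "truthful"),
    ("day", "small", "<10", "truthful"),
    ("day", "small", ">30", "truthful"),
    ("night", "small", "<10", "fraudulent"),
    ("night", "large", "10~30", "fraudulent"),
    ("day", "large", ">30", "fraudulent"),
    ("night", "large", "10~30", "fraudulent"),
    ("day", "small", "10~30", "fraudulent"),
]


def naive_bayes_posteriors(records, query):
    """Return P(class | query) for each class, using raw (unsmoothed) counts."""
    classes = sorted({r[-1] for r in records})
    scores = {}
    for c in classes:
        rows = [r for r in records if r[-1] == c]
        # Prior P(c) = (records in class c) / (all records)
        score = Fraction(len(rows), len(records))
        for j, value in enumerate(query):
            # Conditional P(x_j | c) = (matching records in c) / (records in c)
            matches = sum(1 for r in rows if r[j] == value)
            score *= Fraction(matches, len(rows))
        scores[c] = score
    # Normalize the class scores so the posteriors sum to 1
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


posteriors = naive_bayes_posteriors(RECORDS, RECORDS[-1][:3])
print(posteriors)
# {'fraudulent': Fraction(108, 233), 'truthful': Fraction(125, 233)}
```

The unnormalized scores are 6/11 · 5/6 · 4/6 · 1/6 = 5/99 ≈ 0.0505 for truthful and 5/11 · 2/5 · 2/5 · 3/5 = 12/275 ≈ 0.0436 for fraudulent. These give posteriors of about 0.536 and 0.464, so at a 0.5 cutoff the record is labeled truthful even though its actual class is fraudulent.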

Q2. Download the data file CongressVote.arff. Open it with Notepad or WordPad and read the description of the data. Our task is to classify each record (i.e., a House member) as either a democrat or a republican based on his/her voting record. Note that this dataset has many missing values, labeled '?'.

a. Run the Naïve Bayes classifier in Weka on the data, using the default parameters. What is the 10-fold cross-validation error rate? Show the output screen with the error rate and confusion matrix.

b. Run the k-nearest neighbor classifier in Weka on the data, using the default parameters but with k = 5. What is the 10-fold cross-validation error rate? With all attributes categorical, how can the distance between two records be measured? Explain using the following three records (records 27, 28 and 29 of the dataset). Which two of these records are closest to each other? Why?

y,n,y,n,n,n,y,y,y,n,y,n,n,n,y,y,democrat

y,y,y,n,n,n,y,y,y,n,y,n,n,n,y,y,democrat

y,n,n,y,y,n,y,y,y,n,n,y,y,y,n,y,republican
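A common distance for records whose attributes are all categorical is the overlap (Hamming) distance: the number of attributes on which two records differ. The sketch below applies it to the three records above, with the class label excluded. These records contain no '?' values, so no missing-value rule is needed here. With 0/1 mismatch coding, the Euclidean distance is the square root of this count. As we understand it, Weka's default Euclidean distance scores a nominal mismatch as 1, so it ranks the pairs the same way. The helper name mismatch_distance is illustrative.

```python
import math

# Records 27, 28 and 29 of CongressVote.arff: 16 votes each (class label dropped).
R27 = "y,n,y,n,n,n,y,y,y,n,y,n,n,n,y,y".split(",")
R28 = "y,y,y,n,n,n,y,y,y,n,y,n,n,n,y,y".split(",")
R29 = "y,n,n,y,y,n,y,y,y,n,n,y,y,y,n,y".split(",")


def mismatch_distance(a, b):
    """Overlap (Hamming) distance: number of attributes whose values differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(1 for x, y in zip(a, b) if x != y)


pairs = {"27-28": (R27, R28), "27-29": (R27, R29), "28-29": (R28, R29)}
for name, (a, b) in pairs.items():
    d = mismatch_distance(a, b)
    # With 0/1 mismatch coding, Euclidean distance is the square root of the count.
    print(name, d, round(math.sqrt(d), 3))
# 27-28 1 1.0
# 27-29 8 2.828
# 28-29 9 3.0
```

Records 27 and 28 differ only on the second vote, so they are the closest pair. Both are democrats, while the republican record 29 differs from them on 8 and 9 votes.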

Q3. Download the BostonHousing2.xls file and read the data description. The dataset in the FullData sheet is taken from the BostonHousing.xls file used in Assignment 1. The target attribute is CATMEDV, which is a binary attribute converted from MEDV (which was removed).

a. Consider the data in the SmallData sheet, which contains the first 10 records of the full data and a subset of the original predictors. Using Excel calculations, classify record 6 (row 7, highlighted) with 1-NN and 3-NN respectively, based on the other 9 records. Present your Excel results in a format similar to the screenshot on page 2 of the Nearest Neighbors lecture notes. Do 1-NN and 3-NN classify the record correctly?
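One common recipe for this Excel calculation is sketched below. It is an assumption here, and the lecture notes may normalize differently, for example with z-scores or over all 10 records. Rescale each predictor j to [0, 1] using the minimum and maximum over the nine reference records r, then compute the Euclidean distance d_i from each reference record i to record 6 over the p predictors. 1-NN takes the class of the record with the smallest d_i, and 3-NN takes the majority class of the three smallest. In Excel, each distance can be computed as SQRT(SUMXMY2(...)) over the two rows of normalized values.

```latex
z_{ij} = \frac{x_{ij} - \min_{r} x_{rj}}{\max_{r} x_{rj} - \min_{r} x_{rj}},
\qquad
d_i = \sqrt{\sum_{j=1}^{p} \left( z_{ij} - z_{6j} \right)^2}
```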

b. Now, work on the FullData sheet. Within Excel, save the FullData sheet as a CSV file. Run k-NN in Weka on the CSV data file using the default parameters (10-fold cross-validation, k = 1). Show the output screen that displays the 10-fold cross-validation error rate and the related confusion matrix.

c. Run the C4.5 (J48) decision tree algorithm in Weka on the CSV data file created for Part (b) above. Show the output screen that displays the 10-fold cross-validation error rate and the related confusion matrix.

d. Which technique do you believe is better, k-NN or decision trees? Why? Please consider factors other than the error rates, which are about the same for the two techniques. (This is an open-ended question. It is more important to justify your choice than the choice itself.)

e. Now return to the small dataset with 10 records. Save the data as a CSV file. Write and run R commands to classify record 6 (row 7) using 1-NN and 3-NN respectively, based on the other 9 records. Show the R commands and results (similar to those in the Nearest Neighbors lecture notes for the Admission example).
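Parts (a) and (e) perform the same computation. The sketch below is a Python stand-in for the R commands, not the R solution itself. It reads the saved CSV, holds out record 6, min-max rescales the predictors on the nine remaining records, and votes among the k nearest by Euclidean distance. It assumes that the last CSV column is the class label (e.g. CATMEDV) and that every other column is a numeric predictor; drop any ID column first. The function names are illustrative.

```python
import csv
import math
from collections import Counter


def load_rows(lines):
    """Parse CSV text (header first). Assumes the last column is the class label
    and every other column is a numeric predictor."""
    reader = csv.reader(lines)
    header = next(reader)
    X, y = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        X.append([float(v) for v in row[:-1]])
        y.append(row[-1])
    return header, X, y


def hold_out(X, y, i):
    """Split off record i (0-based) as the query; the rest become training data."""
    return X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], y[i]


def min_max_scale(train, query):
    """Rescale each predictor to [0, 1] using the training records' min and max."""
    lo = [min(col) for col in zip(*train)]
    hi = [max(col) for col in zip(*train)]

    def scale(row):
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]

    return [scale(r) for r in train], scale(query)


def knn_predict(train_X, train_y, query, k):
    """Classify query by majority vote of its k nearest (Euclidean) neighbours."""
    train_s, query_s = min_max_scale(train_X, query)
    # Sort (distance, label) pairs; distance ties fall back to label order.
    neighbours = sorted(
        (math.dist(r, query_s), label) for r, label in zip(train_s, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0], neighbours
```

To use it, open the saved CSV with newline="" and pass the file object to load_rows. Then call hold_out(X, y, 5), since record 6 is index 5 once the header is removed, and compare knn_predict(..., k=1) and knn_predict(..., k=3) with the held-out label. Distance ties may be broken differently here than in Excel or R.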

Attachment: Assignment Files.rar
