
Purpose

In this assessment, you need to demonstrate your skill in applying regularized logistic regression to perform two-class and multi-class classification for real-world tasks. You also need to demonstrate your skill in recognizing underfitting and overfitting situations.

Instructions

This is a group assessment task. Students will be required to analyse a given real-world scenario and contribute to the classifier design.

The group's response to the problem should not exceed 30 pages. Students will be required to consolidate their individual solutions and propose the best solution, one that evidences each group member's contribution, along with a rationale for the group's response to solving the problem.

Task A - Binary Classification

For this problem, we will use a subset of the dataset stored in the "data/wisconsin_data" folder (see 1.1). Note that this dataset has some information missing.

1.1 Data Munging

Cleaning the data is essential when dealing with real-world problems. Training and testing data are stored in the "data/wisconsin_data" folder. You have to perform the following:

- Read the training and testing data. Print the number of features in the dataset.

- For the data label, print the total number of 1's and 0's in the training and testing data. Comment on the class distribution. Is it balanced or unbalanced?

- Print the number of features with missing entries.

- Fill the missing entries. To fill a feature, you can use either the mean or the median of its observed values.

- Normalize the training and testing data.
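The cleaning steps above can be sketched as follows. This is a minimal illustration, not the required solution: the file names inside "data/wisconsin_data" are not given here, so the sketch works on small in-memory arrays, and the helper names `count_labels` and `impute_and_normalize` are hypothetical. It also assumes that missing entries are encoded as NaN and that test data should be filled and scaled with statistics computed on the training data only.

```python
import numpy as np


def count_labels(y):
    """Return (number of 0s, number of 1s) in a binary label vector."""
    y = np.asarray(y)
    return int(np.sum(y == 0)), int(np.sum(y == 1))


def impute_and_normalize(X_train, X_test):
    """Median-impute NaNs, then z-score both sets with training statistics."""
    X_train = np.array(X_train, dtype=float)
    X_test = np.array(X_test, dtype=float)

    # Number of features (columns) with at least one missing training entry.
    n_missing_features = int(np.isnan(X_train).any(axis=0).sum())

    # Median of the observed entries of each feature.
    medians = np.nanmedian(X_train, axis=0)
    for X in (X_train, X_test):
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = medians[cols]

    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    return (X_train - mean) / std, (X_test - mean) / std, n_missing_features


# Toy data standing in for the real files (assumption).
X_train = [[1.0, 10.0], [np.nan, 20.0], [3.0, 30.0], [5.0, np.nan]]
y_train = [0, 1, 1, 1]
X_test = [[np.nan, 40.0]]

Xtr_n, Xte_n, n_missing = impute_and_normalize(X_train, X_test)
print("features:", Xtr_n.shape[1])
print("label counts (0s, 1s):", count_labels(y_train))
print("features with missing entries:", n_missing)
```

Swapping `np.nanmedian` for `np.nanmean` gives mean imputation instead.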

1.2 Logistic Regression

Train logistic regression models with L1 regularization and L2 regularization using alpha = 0.1 and lambda = 0.1, respectively. Report accuracy, precision, recall and f1-score, and print the confusion matrix.

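A minimal sketch of this step, assuming scikit-learn is the library in use. The assignment does not say how alpha and lambda map onto a particular implementation, so the conversion `C = 1 / strength` below is an assumption, the helper `fit_and_report` is hypothetical, and the synthetic data from `make_classification` merely stands in for the cleaned and normalized data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Synthetic stand-in for the cleaned, normalized data (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]


def fit_and_report(penalty, strength):
    """Fit a regularized logistic regression and return test-set metrics."""
    # Assumption: alpha/lambda is converted to scikit-learn's inverse
    # regularization strength as C = 1 / strength.
    model = LogisticRegression(penalty=penalty, C=1.0 / strength,
                               solver="liblinear")
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }


l1_report = fit_and_report("l1", 0.1)  # alpha = 0.1
l2_report = fit_and_report("l2", 0.1)  # lambda = 0.1
for name, report in (("L1", l1_report), ("L2", l2_report)):
    print(name, report)
```

The `liblinear` solver is chosen here because it supports both penalties for binary problems.
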
1.3 Choosing the best hyper-parameter

For the L1 model, choose the best alpha value from the following set:

{0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333}.

For the L2 model, choose the best lambda value from the following set:

{0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 33}.

To choose the best hyperparameter (alpha/lambda) value, you have to do the following:

- For each hyperparameter value, perform 100 random splits of the training data into training and validation data.

- Find the average validation accuracy over the 100 train/validation pairs for each value. The best hyperparameter is the one that gives the maximum average validation accuracy.

Use the best alpha and lambda values to re-train your final L1 and L2 regularized models. Evaluate the prediction performance on the test data and report the following:

- Precision

- Accuracy

- The top 5 features selected in decreasing order of feature weights.

- Confusion matrix
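For the feature ranking in the list above, one possible reading is to rank features by the absolute value of their learned weights, since a large negative weight is as influential as a large positive one. That reading is an assumption (the task only says "decreasing order of feature weights"), and the helper `top_k_features`, the weights and the feature names below are all hypothetical.

```python
import numpy as np


def top_k_features(weights, feature_names, k=5):
    """Return the k (name, weight) pairs with the largest absolute weight."""
    weights = np.asarray(weights, dtype=float)
    # argsort on the negated magnitudes gives a descending order.
    order = np.argsort(-np.abs(weights))[:k]
    return [(feature_names[i], float(weights[i])) for i in order]


# Made-up weight vector standing in for a fitted model's coefficients.
weights = [0.2, -1.5, 0.0, 0.9, -0.1, 2.3, 0.4]
names = [f"f{i}" for i in range(len(weights))]

top5 = top_k_features(weights, names)
for name, w in top5:
    print(name, w)
```

With a fitted scikit-learn binary logistic regression, the weight vector would typically be `model.coef_[0]`.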

Finally, discuss if there is any sign of underfitting or overfitting with appropriate reasoning.
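The selection procedure in this section can be sketched as below, again assuming scikit-learn. The validation fraction is not specified in the task, so the 20% used here is an assumption, as are the mapping `C = 1 / strength`, the hypothetical helper `mean_validation_accuracy`, and the synthetic stand-in data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit


def mean_validation_accuracy(X, y, penalty, strength, n_splits=100, seed=0):
    """Average validation accuracy over repeated random train/validation splits."""
    # Assumption: 20% of the training data is held out in each split.
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.2, random_state=seed)
    scores = []
    for train_idx, val_idx in splitter.split(X):
        # Assumption: alpha/lambda maps to scikit-learn's C as C = 1 / strength.
        model = LogisticRegression(penalty=penalty, C=1.0 / strength,
                                   solver="liblinear")
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))
    return float(np.mean(scores))


# Synthetic stand-in for the training data (assumption).
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           random_state=0)

lambdas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 33]
scores = {lam: mean_validation_accuracy(X, y, "l2", lam) for lam in lambdas}
best_lambda = max(scores, key=scores.get)
print("best lambda:", best_lambda, "mean validation accuracy:", scores[best_lambda])
```

The same function can be called with `"l1"` and the alpha set. Comparing the final model's training accuracy with its test accuracy is one simple way to look for signs of overfitting (a large gap) or underfitting (both low).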

Task B - Multiclass Classification

For this experiment, we will use a small subset of the MNIST dataset for handwritten digits. This dataset has no missing data. You will have to implement a one-versus-rest scheme to perform multi-class classification using a binary classifier based on L1 regularized logistic regression.
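To illustrate the one-versus-rest idea, the sketch below trains one binary L1 regularized classifier per class (that class against all others) and predicts the class whose classifier gives the highest decision score. It is a hedged sketch rather than the expected solution: the class `OneVsRestL1` is hypothetical, `C = 1 / alpha` is an assumed mapping, and scikit-learn's bundled 8x8 digits data stands in for reduced_mnist.csv.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression


class OneVsRestL1:
    """One binary L1 logistic regression per class; predict by highest score."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = []
        for c in self.classes_:
            # Assumption: alpha maps to scikit-learn's C as C = 1 / alpha.
            clf = LogisticRegression(penalty="l1", C=1.0 / self.alpha,
                                     solver="liblinear")
            clf.fit(X, (y == c).astype(int))  # class c versus the rest
            self.models_.append(clf)
        return self

    def predict(self, X):
        # One column of decision scores per class, then pick the largest.
        scores = np.column_stack([m.decision_function(X) for m in self.models_])
        return self.classes_[np.argmax(scores, axis=1)]


# Stand-in data (assumption): pixel values are 0..16, scaled to [0, 1].
X, y = load_digits(return_X_y=True)
X = X / 16.0

ovr = OneVsRestL1(alpha=1.0).fit(X, y)
train_accuracy = float(np.mean(ovr.predict(X) == y))
print("number of binary classifiers:", len(ovr.models_))
print("training accuracy:", train_accuracy)
```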

2.1 Read and understand the data, create a default One-vs-Rest Classifier

1- Use the data from the file reduced_mnist.csv in the data directory. Begin by reading the data. Print the following information:

- Number of data points

- Total number of features

- Unique labels in the data

2- Split the data into 70% training data and 30% test data. Fit a One-vs-Rest Classifier (which uses a logistic regression classifier with alpha=1) on the training data, and report accuracy, precision and recall on the testing data.
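The two steps above might look like the following with scikit-learn's `OneVsRestClassifier`. Several details are assumptions: the layout of reduced_mnist.csv is not described, so the bundled digits data is used as a stand-in; alpha is mapped to `C = 1 / alpha`; and precision and recall are macro-averaged over classes, which the task does not specify.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Stand-in for reduced_mnist.csv (assumption).
X, y = load_digits(return_X_y=True)
X = X / 16.0

print("data points:", X.shape[0])
print("features:", X.shape[1])
print("unique labels:", np.unique(y))

# 70% training data, 30% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

alpha = 1.0
# Assumption: alpha maps to scikit-learn's C as C = 1 / alpha.
base = LogisticRegression(penalty="l1", C=1.0 / alpha, solver="liblinear")
ovr = OneVsRestClassifier(base).fit(X_train, y_train)
pred = ovr.predict(X_test)

accuracy = accuracy_score(y_test, pred)
precision = precision_score(y_test, pred, average="macro")
recall = recall_score(y_test, pred, average="macro")
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
```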

2.2 Choosing the best hyper-parameter

1- As in section 1.3 above, now create 10 random splits of training data into training and validation data. Choose the best value of alpha from the following set: {0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333}. To choose the best alpha hyperparameter value, you have to do the following:

- For each hyperparameter value, perform 10 random splits of the training data into training and validation data, as described above.

- For each value of hyperparameter, use its 10 random splits and find the average training and validation accuracy.

- On a graph, plot both the average training accuracy (in red) and average validation accuracy (in blue) w.r.t. each hyperparameter setting. Comment on this graph by identifying regions of overfitting and underfitting.

- Print the best value of the alpha hyperparameter.

2- Evaluate the prediction performance on test data and report the following:

- Total number of non-zero features in the final model.

- The confusion matrix

- Precision, recall and accuracy for each class.
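The reporting items above can be computed directly from the weight matrix and the confusion matrix. The sketch below makes three assumptions that the task leaves open: a feature counts as non-zero if any of the one-vs-rest classifiers gives it a non-zero weight, the confusion matrix has true classes in rows and predicted classes in columns, and per-class accuracy means one-vs-rest accuracy, (TP + TN) / N. The helper names and the numbers are hypothetical.

```python
import numpy as np


def count_nonzero_features(coef):
    """Features with a non-zero weight in at least one binary classifier.

    coef has shape (n_classes, n_features).
    """
    coef = np.asarray(coef)
    return int(np.any(coef != 0, axis=0).sum())


def per_class_metrics(cm):
    """Per-class precision, recall and one-vs-rest accuracy.

    cm[i, j] = number of samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: TP + FP
    actual = cm.sum(axis=1)     # row sums: TP + FN
    precision = np.divide(tp, predicted, out=np.zeros_like(tp),
                          where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    tn = total - actual - predicted + tp
    accuracy = (tp + tn) / total
    return precision, recall, accuracy


# Made-up values standing in for a fitted model and its predictions.
coef = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.0], [0.0, -1.2, 0.3]]
cm = [[5, 1, 0], [2, 3, 1], [0, 0, 4]]

n_nonzero = count_nonzero_features(coef)
precision, recall, accuracy = per_class_metrics(cm)
print("non-zero features:", n_nonzero)
print("precision:", precision)
print("recall:", recall)
print("accuracy:", accuracy)
```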

Finally, discuss if there is any sign of underfitting or overfitting with appropriate reasoning.
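The training and validation curves requested in 2.2 are a natural basis for this discussion. A minimal sketch follows, assuming scikit-learn and matplotlib, a 20% validation fraction, the mapping `C = 1 / alpha`, and a slice of the bundled digits data as a stand-in for reduced_mnist.csv. To keep the demo quick it uses 3 splits instead of the 10 the task asks for; the hypothetical helper `accuracy_curves` defaults to 10.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display interactively
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit
from sklearn.multiclass import OneVsRestClassifier

ALPHAS = [0.1, 1, 3, 10, 33, 100, 333, 1000, 3333, 10000, 33333]


def accuracy_curves(X, y, alphas, n_splits=10, seed=0):
    """Average training/validation accuracy of L1 one-vs-rest models per alpha."""
    # An integer random_state makes every alpha see the same splits.
    splitter = ShuffleSplit(n_splits=n_splits, test_size=0.2, random_state=seed)
    train_acc, val_acc = [], []
    for alpha in alphas:
        tr, va = [], []
        for train_idx, val_idx in splitter.split(X):
            # Assumption: alpha maps to scikit-learn's C as C = 1 / alpha.
            base = LogisticRegression(penalty="l1", C=1.0 / alpha,
                                      solver="liblinear")
            model = OneVsRestClassifier(base).fit(X[train_idx], y[train_idx])
            tr.append(model.score(X[train_idx], y[train_idx]))
            va.append(model.score(X[val_idx], y[val_idx]))
        train_acc.append(float(np.mean(tr)))
        val_acc.append(float(np.mean(va)))
    return train_acc, val_acc


# Stand-in data (assumption): first 500 digits, pixels scaled to [0, 1].
X, y = load_digits(return_X_y=True)
X, y = X[:500] / 16.0, y[:500]

train_acc, val_acc = accuracy_curves(X, y, ALPHAS, n_splits=3)
best_alpha = ALPHAS[int(np.argmax(val_acc))]

fig, ax = plt.subplots()
ax.semilogx(ALPHAS, train_acc, "r-o", label="average training accuracy")
ax.semilogx(ALPHAS, val_acc, "b-o", label="average validation accuracy")
ax.set_xlabel("alpha")
ax.set_ylabel("accuracy")
ax.legend()
print("best alpha:", best_alpha)
```

Under the assumed mapping, very large alpha values shrink the weights towards zero, so both curves drop together, which points to underfitting. At small alpha values, a training curve that sits well above the validation curve points to overfitting. Call `fig.savefig(...)` or `plt.show()` to view the figure.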

Attachment:- Machine learning.zip
