10-701 Machine Learning - Spring 2012 - Problem Set 2

Q1. Logistic regression

1.1 Logistic vs linear regression

In both logistic (LR) and linear regressions (R), given input X, the goal is to predict the response Y. The difference is that LR is typically used for classification whereas R is used for regression.

1. Propose a simple modification to R that makes it amenable to classification (instead of regression) tasks. Comment on whether such a proposal is superior to LR or not and briefly explain why.

2. Recall that in R, where y = wx (w being the estimated linear coefficient for a one-dimensional variable x), a unit change in x induces an additive change of w in y. In LR, y and wx are linked by the sigmoid function. Explain how you would interpret the w coefficients in logistic regression. Suppose w = 2; calculate the change in the odds of the classes induced by a unit change in x, assuming there are two available classes.
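As a numerical aid for part 2, the sketch below evaluates the odds P(Y = 1|x) / P(Y = 0|x) before and after a unit change in x. It assumes the usual parameterization P(Y = 1|x) = σ(wx) with no intercept; the starting point x = 0.3 and the helper names are arbitrary choices made for illustration, not part of the problem set.

```python
import math

def sigmoid(u):
    """Logistic function sigma(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def odds(p):
    """Odds of class 1 versus class 0 for a class-1 probability p."""
    return p / (1.0 - p)

w = 2.0   # coefficient given in the question
x = 0.3   # arbitrary starting point (assumption, any value works)

odds_before = odds(sigmoid(w * x))
odds_after = odds(sigmoid(w * (x + 1.0)))

# Ratio of the odds after and before the unit change in x.
odds_ratio = odds_after / odds_before
print(odds_ratio, math.exp(w))
```

Running it with different values of x is a quick way to check whether the ratio depends on where the unit change starts.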

1.2 Logistic vs naive Bayes

Suppose in a binary classification problem, the input variable X = [x1, ..., xM] is M-dimensional and the response variable Y is a class indicator (0 or 1). In this section, you will work in steps to establish a connection between logistic regression and Gaussian naive Bayes.

1. Write down expressions of the class conditional probability for each class, P(Y = 1|X) and P(Y = 0|X), for logistic regression.

2. Using Bayes' rule, derive the posterior probabilities for each class, P(Y = 1|X) and P(Y = 0|X), for naive Bayes.

3. Assuming a Gaussian likelihood function in each of the M dimensions, write down the full likelihood f(X|Y) for naive Bayes.

4. Assuming a uniform prior on the two classes and using the results from part 2 and 3, derive a full expression for P(Y = 1|X) for naive Bayes.

5. Show that with appropriate manipulation and parameterization, P(Y = 1|X) in naive Bayes from part 4 is equivalent to P(Y = 1|X) for logistic regression in part 1.

1.3 Loss function

Write down the loss function, or the negative log likelihood, for logistic regression. Denote y as the class indicator, x as the predictor vector, w as the coefficient vector and N as the number of data points. Derive the derivative of the loss function with respect to w (hint: first derive the derivative of the sigmoid function σ(u) with respect to a generic input u.).
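Once a derivative has been derived by hand, it can be checked numerically. The sketch below assumes one common form of the loss, the negative log likelihood with labels y in {0, 1} and P(Y = 1|x) = σ(w·x), and compares a candidate analytic gradient against central finite differences. The toy data, the function names, and the step size h are illustrative assumptions, not part of the problem set.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def nll(w, xs, ys):
    """Negative log likelihood summed over the N data points."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(dot(w, x))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def nll_gradient(w, xs, ys):
    """Candidate analytic gradient: sum over points of (sigma(w.x) - y) * x."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        residual = sigmoid(dot(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj
    return grad

def numerical_gradient(w, xs, ys, h=1e-6):
    """Central finite differences, used only to sanity-check the formula."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += h
        w_minus[j] -= h
        grad.append((nll(w_plus, xs, ys) - nll(w_minus, xs, ys)) / (2.0 * h))
    return grad

# Toy data (hypothetical): first feature is a constant 1 acting as a bias.
xs = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]]
ys = [1, 0, 1, 0]
w = [0.1, -0.3]

print(nll_gradient(w, xs, ys))
print(numerical_gradient(w, xs, ys))
```

If the two printed vectors disagree beyond rounding, the hand derivation (or its sign convention) is worth revisiting.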

Q2. Learning theory

2.1 PAC learning

Imagine yourself as an apprentice chef in a restaurant. Your first task is to figure out how to make a salad. The rules are supposedly simple: 1) you are free to combine any of the ingredients as they are; 2) you can also slice any of the ingredients into two distinct pieces before mixing them. Since you have learnt PAC learning theory, you wonder how much effort you would need to figure out the makeup of a salad.

1. Suppose that a naive chef makes salads following only rule 1. Given N available ingredients, and given that each salad made out of these constitutes a distinct hypothesis, how large would the hypothesis space be? Explain how you arrive at your answer.

2. Suppose that a more experienced chef follows both rules when making a salad. How large is the hypothesis space now? Explain.

3. An experienced chef decides to train you to discern the makeup of a salad by showing you the salad samples he has made. There are 6 available ingredients. If you would like to learn any salad at 0.01 error with probability 99%, how many sample salads would you want to see? Show your workings in clear steps.
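For part 3, a small helper can turn a hypothesis-space size into a sample count. The sketch below assumes the standard bound for a consistent learner over a finite hypothesis space, m ≥ (1/ε)(ln|H| + ln(1/δ)); whether this is the bound intended here is an assumption. The size of the hypothesis space is left as a parameter, and the value 2**6 in the usage line is purely illustrative, not the answer to parts 1 or 2.

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Smallest integer m with m >= (ln|H| + ln(1/delta)) / epsilon."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    bound = (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon
    return math.ceil(bound)

# Error 0.01 with probability 99% means epsilon = 0.01 and delta = 0.01.
# The hypothesis count below is an illustrative placeholder.
print(pac_sample_bound(2 ** 6, epsilon=0.01, delta=0.01))
```

Because the bound depends on |H| only through its logarithm, even a much larger hypothesis space increases the required number of samples only modestly.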

2.2 VC dimensions

Consider a 2D space, the x1-x2 plane. What is the VC dimension of circles where points inside are labeled as 1's and those outside as 0's? Draw an example scenario with the minimal number of points where these circles would fail to shatter the space.

Q3. Mistake bounds

Suppose you have a team of N robot agents and you wish to train them to help you make predictions in life. As a simple start, the prediction will be based on majority votes from these agents. To ensure that they won't fail you in crucial tasks, you went on to analyze their prediction mistakes. Soon you find that their mistakes are curiously related to those of the best agent...

Your strategy of training these agents on a binary classification problem is as follows:

1. Initialize all robots with equal weight wi = 1 for i = 1...N.

2. Since each robot makes a prediction of either class (yi = 0 or 1), the ensemble prediction follows the weighted majority and predicts 1 if

$$\sum_{i=1}^{N} w_i \, I(y_i = 1) \;\ge\; \sum_{i=1}^{N} w_i \, I(y_i = 0) \qquad (1)$$

and otherwise 0, where I(·) is the indicator function and equals 1 if its argument is true.

3. If any robot makes a mistake, you penalize it by reducing its weight by half.

4. Go to step 2.
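The four steps above can be sketched as a short simulation. This is a minimal illustration under stated assumptions: predictions and true labels are supplied up front as lists, ties go to class 1 as in inequality (1), and every robot that errs in a round is halved regardless of whether the ensemble erred. The function name and the toy data are hypothetical.

```python
def weighted_majority(agent_predictions, truths):
    """Simulate the training strategy.

    agent_predictions[t][i] is robot i's 0/1 prediction in round t,
    truths[t] is the correct label for round t.
    Returns (ensemble mistakes, per-robot mistakes, final weights).
    """
    n_agents = len(agent_predictions[0])
    weights = [1.0] * n_agents            # step 1: equal initial weights
    agent_mistakes = [0] * n_agents
    ensemble_mistakes = 0

    for preds, truth in zip(agent_predictions, truths):
        # step 2: weighted vote; ">=" sends ties to class 1, as in (1)
        weight_for_1 = sum(w for w, p in zip(weights, preds) if p == 1)
        weight_for_0 = sum(w for w, p in zip(weights, preds) if p == 0)
        ensemble_pred = 1 if weight_for_1 >= weight_for_0 else 0
        if ensemble_pred != truth:
            ensemble_mistakes += 1

        # step 3: halve the weight of every robot that was wrong
        for i, p in enumerate(preds):
            if p != truth:
                weights[i] *= 0.5
                agent_mistakes[i] += 1
        # step 4: continue with the next round

    return ensemble_mistakes, agent_mistakes, weights

# Toy run with 3 robots over 4 rounds (made-up data).
rounds = [[1, 0, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
truths = [1, 0, 1, 1]
ensemble_mistakes, agent_mistakes, weights = weighted_majority(rounds, truths)
print(ensemble_mistakes, agent_mistakes, weights)
```

Tracking the total weight alongside the best robot's weight across rounds is a convenient way to build intuition for the questions that follow.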

You discover that your best agent makes Mb prediction mistakes while the ensemble agent makes Me mistakes. You are now going to figure out how these numbers are related. In other words, you are going to find an upper bound for Me in terms of Mb.

1. What would the weight of the best agent be after making Mb prediction errors and why? Let's denote it as wb.

2. What is the maximal ensemble weight (∑wi) after making Me errors? Let's denote it as Wmax.

3. Write a simple inequality that relates part 1 (wb) and 2 (Wmax).

4. Using the inequality in part 3 and your solutions to parts 1 and 2, derive an upper bound for the ensemble mistake Me in terms of the mistake from the best agent Mb.

Q4. Guess the lean animal

In the animal kingdom, there are lean candidates such as the monkey and chubby ones such as the giant panda. In this section, you will develop a classifier that predicts whether an animal is lean or not given some of its properties.

1. Load "LeanAnimals.mat" in MATLAB. You should see the following: "names" for animals, "properties" for properties of animals, "labs" for indicators of leanness and "D" for the relational matrix (animals vs properties). Sort "labs" into groups of 0s and 1s and sort the rows in D accordingly. How many animals are lean? Plot the sorted matrix D in black and white using the imagesc command. Formulate the problem using logistic regression given that the goal is to predict whether an animal is lean or not. Specify clearly your inputs and outputs, and write down the expressions for the class conditional probabilities.

2. Write a generic logistic regression classifier. Attach your MATLAB code for the LR classifier ONLY, in compact format, in the writeup.

3. Apply your LR classifier to the data and perform leave-one-out cross-validated (LOOCV) predictions on the animals. In other words, at each round you would first train the classifier on 49 animals, and then predict whether the held-out animal is lean or not. Report your classification accuracies for the lean and non-lean classes separately, as percentages.

4. Now, instead of LOOCV, fit your classifier on the entire dataset "D". This should return a single set of weights. List them and interpret them for properties 2 to 6: do they make sense, and why or why not? The annotation for property 1 is missing. Can you guess what property it might be given your estimated weight? [Note: no credit will be deducted or granted here, so don't agonize if you are stuck.]

5. Using the "corr" function in MATLAB, compute the correlation coefficients between each of the properties and "labs", and tabulate these. Do these match well with your estimated weights in part 4? Explain briefly why or why not.

6. From your outputs in part 4, you should be able to compute p(lean|animal) for each animal. Rank the animals by sorting their class conditionals in descending order. You should produce a table that consists of two columns: Animal Name (sorted) and Conditional Probability (p(lean|animal)). Is this how you would sort these animals?

Attachment: Data.rar
