Assume p ≫ n (many more predictor variables than observations). You have a design matrix X and a quantitative response vector y, and you plan to fit a linear regression model.
(a) Explain why the ordinary least squares solution is not unique. What can you say about the residuals of any solution?
(b) Is the ridge regression solution unique? Explain why or why not.
(c) Suppose you compute a series of ridge solutions β̂(λ) for X and y, letting λ decrease monotonically. What can you say about the limiting ridge solution as λ ↓ 0?
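
A small numerical experiment can help check your reasoning for parts (a) through (c). The sketch below is only an illustration and not part of the exercise: it assumes NumPy, simulated Gaussian data with n = 5 and p = 50 (so X has full row rank with probability one), and hypothetical names such as `ridge_solution`, `beta_min_norm`, and `beta_other`. It builds two different least squares solutions, then prints how the ridge solution behaves as λ shrinks.

```python
import numpy as np


def ridge_solution(X, y, lam):
    """Ridge estimate for lam > 0, computed in the n-by-n "dual" form
    X^T (X X^T + lam I)^{-1} y, which equals (X^T X + lam I)^{-1} X^T y."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


# Simulated data (an assumption for illustration): p much larger than n.
rng = np.random.default_rng(0)
n, p = 5, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# One least squares solution: the minimum-norm one from the pseudoinverse.
X_pinv = np.linalg.pinv(X)
beta_min_norm = X_pinv @ y

# A second least squares solution: add any vector from the null space of X.
v = rng.standard_normal(p)
null_vec = v - X_pinv @ (X @ v)  # component of v orthogonal to the row space of X
beta_other = beta_min_norm + null_vec

print("residual norm, min-norm solution:", np.linalg.norm(X @ beta_min_norm - y))
print("residual norm, other solution:   ", np.linalg.norm(X @ beta_other - y))
print("coefficient norms:", np.linalg.norm(beta_min_norm), np.linalg.norm(beta_other))

# Ridge path: residual norm and distance to the minimum-norm solution as lam decreases.
for lam in [1.0, 1e-2, 1e-4, 1e-6]:
    b = ridge_solution(X, y, lam)
    print(
        f"lam={lam:g}",
        "residual norm:", np.linalg.norm(X @ b - y),
        "distance to min-norm:", np.linalg.norm(b - beta_min_norm),
    )
```

Comparing the two residual norms relates to (a), and the last column of the loop output relates to (c). The dual form is used only because it solves an n-by-n system instead of a p-by-p one; for λ > 0 it gives the same vector as the usual formula.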