Ask Homework Help/Study Tips Expert

Big Data Assignment -

Regression Models - Regression models are concerned with target variables that can take any real value. The underlying principle is to find a model that maps input features to predicted target variables. Regression is also a form of supervised learning.

Regression models can be used to predict just about any variable of interest. A few examples include the following:

  • Predicting stock returns and other economic variables
  • Predicting loss amounts for loan defaults (this can be combined with a classification model that predicts the probability of default, while the regression model predicts the amount in the case of a default)
  • Recommendations (the Alternating Least Squares factorization model from Chapter 5, Building a Recommendation Engine with Spark, uses linear regression in each iteration)
  • Predicting customer lifetime value (CLTV) in a retail, mobile, or other business, based on user behavior and spending patterns

In the different sections of this chapter, we will do the following:

Introduce the various types of regression models available in ML

  • Explore feature extraction and target variable transformation for regression models
  • Train a number of regression models using ML
  • Building a Regression Model with Spark
  • See how to make predictions using the trained model
  • Investigate the impact on performance of various parameter settings for regression using cross-validation

Types of regression models - The core idea of linear models (or generalized linear models) is that we model the predicted outcome of interest (often called the target or dependent variable) as a function of a simple linear predictor applied to the input variables (also referred to as features or independent variables).

y = f(wTx)

Here, y is the target variable, w is the vector of parameters (known as the weight vector), and x is the vector of input features. wTx is the linear predictor (or vector dot product) of the weight vector w and feature vector x. To this linear predictor, we applied a function f (called the link function). Linear models can, in fact, be used for both classification and regression simply by changing the link function. Standard linear regression uses an identity link (that is, y = wTx directly), while binary classification uses alternative link functions as discussed here.

Spark's ML library offers different regression models, which are as follows:

  • Linear regression
  • Generalized linear regression
  • Logistical regression
  • Decision trees
  • Random forest regression
  • Gradient boosted trees
  • Survival regression
  • Isotonic regression
  • Ridge regression

Regression models define the relationship between a dependent variable and one or more independent variables. It builds the best model that fits the values of independent variables or features.

Linear regression unlike classification models such as support vector machines and logistic regression is used for predicting the value of a dependent variable with generalized value rather than predicting the exact class label.

Linear regression models are essentially the same as their classification counterparts, the only difference is that linear regression models use a different loss function, related link function, and decision function. Spark ML provides a standard least squares regression model (although other types of generalized linear models for regression are planned).

Assignment -

1. Utilising Python 3 Build the following regression models:

  • Decision Tree
  • Gradient Boosted Tree
  • Linear regression

2. Select a dataset (other than the example dataset given in section 3) and apply the Decision Tree and Linear regression models created above. Choose a dataset from Kaggle.

3. Build the following in relation to the gradient boost tree and the dataset choosen in step 2

  • Gradient boost tree iterations
  • Gradient boost tree Max Bins

4. Build the following in relation to the decision tree and the dataset choosen in step 2

  • Decision Tree Categorical features
  • Decision Tree Log
  • Decision Tree Max Bins
  • Decision Tree Max Depth

5. Build the following in relation to the linear regression and the dataset choosen in step 2

a) Linear regression Cross Validation

  • Intercept
  • Iterations
  • Step size
  • L1 Regularization
  • L2 Regularization

b) Linear regression Log (see section 5.4)

6. Follow the provided example of the Bike sharing data set and the guide lines in the sections that follow this section to develop the requirements given in steps 1, 3, 4 and 5.

Attachment:- Assignment Files.rar

Homework Help/Study Tips, Others

  • Category:- Homework Help/Study Tips
  • Reference No.:- M93104357
  • Price:- $60

Priced at Now at $60, Verified Solution

Have any Question?


Related Questions in Homework Help/Study Tips

Review the website airmail service from the smithsonian

Review the website Airmail Service from the Smithsonian National Postal Museum that is dedicated to the history of the U.S. Air Mail Service. Go to the Airmail in America link and explore the additional tabs along the le ...

Read the article frank whittle and the race for the jet

Read the article Frank Whittle and the Race for the Jet from "Historynet" describing the historical influences of Sir Frank Whittle and his early work contributions to jet engine technologies. Prepare a presentation high ...

Overviewnow that we have had an introduction to the context

Overview Now that we have had an introduction to the context of Jesus' life and an overview of the Biblical gospels, we are now ready to take a look at the earliest gospel written about Jesus - the Gospel of Mark. In thi ...

Fitness projectstudents will design and implement a six

Fitness Project Students will design and implement a six week long fitness program for a family member, friend or co-worker. The fitness program will be based on concepts discussed in class. Students will provide justifi ...

Read grand canyon collision - the greatest commercial air

Read Grand Canyon Collision - The greatest commercial air tragedy of its day! from doney, which details the circumstances surrounding one of the most prolific aircraft accidents of all time-the June 1956 mid-air collisio ...

Qestion anti-trustprior to completing the assignment

Question: Anti-Trust Prior to completing the assignment, review Chapter 4 of your course text. You are a manager with 5 years of experience and need to write a report for senior management on how your firm can avoid the ...

Question how has the patient and affordable care act of

Question: How has the Patient and Affordable Care Act of 2010 (the "Health Care Reform Act") reshaped financial arrangements between hospitals, physicians, and other providers with Medicare making a single payment for al ...

Plate tectonicsthe learning objectives for chapter 2 and

Plate Tectonics The Learning Objectives for Chapter 2 and this web quest is to learn about and become familiar with: Plate Boundary Types Plate Boundary Interactions Plate Tectonic Map of the World Past Plate Movement an ...

Question critical case for billing amp codingcomplete the

Question: Critical Case for Billing & Coding Complete the Critical Case for Billing & Coding simulation within the LearnScape platform. You will need to create a single Microsoft Word file and save it to your computer. A ...

Review the cba provided in the resources section between

Review the CBA provided in the resources section between the Trustees of Columbia University and Local 2110 International Union of Technical, Office, and Professional Workers. Describe how this is similar to a "contract" ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As