Project Tasks

Task 1: Analytic Objective

Each group is expected to define an analytic objective that applies pattern discovery and predictive modelling using the SAS Enterprise Miner software. A well-drafted business case will help you understand your data set, identify variable roles and measurement levels, and ultimately choose the method for your analysis.

An example of your analytic objective could take this form:

"A radio station wants to analyze the use of Web services such as simulcasts, podcasts, news streams, music streams, archives, and live Web music to see whether any unusual patterns exist in the combinations of services selected by its Web users. In this case study, you perform an association analysis."

Note: Groups are encouraged to come up with different analytic objectives. No two groups should have the same objective. Each group should attempt pattern discovery and predictive modelling using the data set assigned for this exercise.

Task 2: Data Analysis and Definition  

Prepare, in tabular form, a data dictionary that defines the variables as they appear in your data set, together with their model roles and measurement levels. An example is shown below.

Name       Model Role   Measurement Level   Description
STOREID    ID           Nominal             Identification number of the store
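Outside SAS Enterprise Miner, the same data dictionary can be kept as a small data structure and checked for consistency. The following is a minimal Python sketch, not part of the Enterprise Miner workflow. The allowed role and level sets are a partial list assumed for illustration, and `check_dictionary` is a hypothetical helper name.

```python
# Partial lists of role and level names, assumed for illustration only.
ALLOWED_ROLES = {"ID", "Input", "Target", "Rejected"}
ALLOWED_LEVELS = {"Interval", "Nominal", "Ordinal", "Binary"}

# One entry per variable; STOREID is the example row from the table above.
data_dictionary = [
    {
        "name": "STOREID",
        "role": "ID",
        "level": "Nominal",
        "description": "Identification number of the store",
    },
]


def check_dictionary(entries):
    """Return a list of problems found in the dictionary entries."""
    problems = []
    for entry in entries:
        if entry["role"] not in ALLOWED_ROLES:
            problems.append(f"{entry['name']}: unknown role {entry['role']!r}")
        if entry["level"] not in ALLOWED_LEVELS:
            problems.append(f"{entry['name']}: unknown level {entry['level']!r}")
    return problems


if __name__ == "__main__":
    for problem in check_dictionary(data_dictionary):
        print(problem)
```

An empty result means every variable has a recognised role and measurement level.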

Tip 1: Execute the following steps in SAS Enterprise Miner

(i). Create a project with your group and group number as its name.

(ii). Create a library.

(iii). Create a data source by defining the data set (the one assigned to you) as a data source.

(iv). Determine whether the variable roles and measurement levels assigned to the variables are appropriate. They should match the values in your data dictionary table above. Examine the distribution of the variables.

2.1 Answer the following Questions.

1. Are there any unusual data values in any of your assigned input variables? Support your answer with appropriate argument.

2. List two possible strategies for handling cases with unusual values before attaching your desired analysis node. Explain the scenarios in which each strategy is appropriate.

3. Are there missing values in any of the input variables?

4. If you assigned a variable the Rejected role, why is this the case?
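One common way to support an answer about unusual and missing values is a simple numeric screen. The sketch below is a minimal Python illustration, assuming the 1.5 x IQR rule as the definition of "unusual" and `None` as the marker for a missing value. Both are assumptions, and the function names are hypothetical. It is not how Enterprise Miner itself flags values.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    clean = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(clean, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in clean if v < low or v > high]


def missing_count(values):
    """Count entries recorded as missing (None)."""
    return sum(v is None for v in values)


if __name__ == "__main__":
    sample = [10, 12, 11, 13, 12, None, 95, 11]  # made-up illustration values
    print("outliers:", iqr_outliers(sample))
    print("missing:", missing_count(sample))
```

Whether a flagged value is an error or a genuine extreme still has to be argued from the business context.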

Task 3: Cluster and Association analysis

Groups whose objective requires cluster or association analysis should follow the tips below and answer the accompanying questions.

Tip 2: Execute the following steps in SAS Enterprise Miner

(v). Add your data source to the diagram workspace.

(vi). Add a Cluster node to the diagram workspace and connect it to the data source node.

(vii). Select the Cluster node and set the Internal Standardization property to Standardization.

(viii). Specify a maximum of six clusters and run the diagram from the Cluster node.

(ix). Add a Segment Profile node to the diagram workspace and connect it to the Cluster node.

(x). Run the diagram from the Segment Profile node.

3.1 Answer the following Questions.

5. What would happen if you did not standardize your inputs?

6. Using the results of the Segment Profile node, interpret the characteristics of the first three biggest clusters.

7. Why was cluster analysis chosen?
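To see why standardization matters for question 5, it can help to compare distances before and after rescaling. The following is a minimal Python sketch, assuming z-score standardization and Euclidean distance. The two-column rows (age, income) are made-up illustration values, and the exact computation used by the Cluster node may differ.

```python
import math
import statistics


def standardize(rows):
    """Rescale each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Made-up rows: (age in years, income in currency units).
rows = [[25, 30000], [60, 31000], [26, 52000]]

if __name__ == "__main__":
    z = standardize(rows)
    # Raw distances are dominated by income because of its larger scale.
    print("raw:", math.dist(rows[0], rows[1]), math.dist(rows[0], rows[2]))
    # After standardization both variables contribute on a comparable scale.
    print("standardized:", math.dist(z[0], z[1]), math.dist(z[0], z[2]))
```

On raw values the 35-year age gap between the first two rows barely registers next to the income differences. After standardization it counts about as much as the income gap.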

Tip 3: Execute the following steps in SAS Enterprise Miner

(i). Create a new diagram and name it after your data set.

(ii). Create a new data source using the data set.

(iii). Assign the variable roles to the variables.

(iv). Add the node for the data set and an Association node to the diagram.

(v). Change the setting for Export Rule by ID to Yes.

(vi). Leave the remaining default settings for the Association node and run the analysis.

3.2 Answer the following Questions.

1. What is the highest lift value for the resulting rules?

2. Which rule has this value?  

3. Why was an Association Analysis run?
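When interpreting the lift values asked about above, it can be useful to recompute support, confidence, and lift by hand for one rule. The sketch below is a minimal Python illustration using the standard definitions. The transactions are made-up values loosely based on the radio station example, and `rule_metrics` is a hypothetical helper name.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a = set(antecedent)
    b = set(consequent)
    count_a = sum(a <= t for t in transactions)
    count_b = sum(b <= t for t in transactions)
    count_ab = sum((a | b) <= t for t in transactions)
    support = count_ab / n                  # share of transactions with A and B
    confidence = count_ab / count_a         # P(B | A)
    lift = confidence / (count_b / n)       # P(B | A) / P(B)
    return support, confidence, lift


# Made-up transactions: the services selected by each Web user.
transactions = [
    {"podcast", "news"},
    {"podcast", "news", "music"},
    {"music"},
    {"podcast"},
]

if __name__ == "__main__":
    print(rule_metrics(transactions, {"podcast"}, {"news"}))
```

A lift above 1 means the consequent appears more often alongside the antecedent than it does overall. A lift near 1 suggests the two are roughly independent.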

Task 4: Predictive Modeling

Groups whose objective requires decision trees, regression, or neural networks should follow the tips below and answer the accompanying questions.

Tip 4: Decision trees - Execute the following steps in SAS Enterprise Miner

(i). Create a new diagram named Predictive Analysis in your project.

(ii). Define the data set as a data source for the project. Set the roles for the analysis variables as shown above.

(iii). Add the data set to the diagram workspace.

(iv). Add a Data Partition node to the diagram and connect it to the Data Source node. Assign 50% of the data for training and 50% for validation.

(v). Add a Decision Tree node to the workspace and connect it to the Data Partition node.

(vi). Create a decision tree model autonomously using average squared error as the model assessment statistic.

(vii). Add a second Decision Tree node to the diagram and connect it to the Data Partition node.

(viii). In the Properties panel of the new Decision Tree node, change the maximum number of branches from a node to 3 to allow for three-way splits.

(ix). Create a second decision tree model autonomously using average squared error as the model assessment statistic.
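The 50/50 partition in step (iv) can be pictured as a seeded random split. The following is a minimal Python sketch, assuming simple random sampling. The Data Partition node may use a different method (for example, stratifying on the target), and `split_half` and the seed value are hypothetical.

```python
import random


def split_half(rows, seed=12345):
    """Shuffle the rows reproducibly and return (training, validation) halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) // 2
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    training, validation = split_half(range(10))
    print(len(training), len(validation))
```

Fixing the seed keeps the two decision trees comparable, because both are trained and validated on the same cases.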

4.1 Answer the following Questions.

1. Why was the Target Variable assigned that variable role?  

3. How many leaves are there in the optimal tree created in step (vi)? Which variable was used for the first split? Explain why this variable was chosen over the others.

4. How many leaves are there in the optimal tree created in step (ix)?

5. Which of the decision tree models appears to be better

a. based on average squared error on training data?

b. based on average squared error on validation data?
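Average squared error, the assessment statistic used throughout this task, can be sketched as the mean of the squared differences between actual and predicted values. The Python below is a minimal illustration under that assumption. For class targets, Enterprise Miner's reported value may be computed across all target levels and so differ from this simplified form.

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Made-up binary targets and predicted probabilities, for illustration only.
    print(average_squared_error([1, 0, 1], [0.5, 0.5, 0.5]))
```

A model whose training error is much lower than its validation error is likely overfitting, which is why parts (a) and (b) can give different answers.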

Tip 5: Regression - Execute the following steps in SAS Enterprise Miner

(x). Attach the StatExplore tool to the data source and run it. View the results of the StatExplore tool and determine if any of the variables have missing values.

(xi). Add an Impute node to the diagram and connect it to the Data Partition node. Set the node to impute U for unknown class variable values and the overall mean for unknown interval variable values. Create imputation indicators for all imputed inputs.

(xii). Add a Regression node to the diagram and connect it to the Impute node. Choose the stepwise selection and average squared error as the selection criterion. Run the Regression node and view the results.

(xiii). Disconnect the Impute node from the Data Partition node. Add a Transform Variables node to the diagram and connect it to the Data Partition node. Connect the Transform Variables node to the Impute node.

(xiv). Apply a log transformation to the DemAffl and PromTime inputs and run the Transform Variables node.

(xv). Rerun the Regression node.
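The imputation and transformation steps above can be sketched outside Enterprise Miner as follows. This is a minimal Python illustration, assuming `None` marks a missing value and that the log transformation adds 1 before taking the logarithm so that zeros are defined. Both are assumptions, and the function names are hypothetical.

```python
import math
import statistics


def impute_interval(values):
    """Replace missing interval values with the mean; return values and indicators."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    imputed = [mean if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


def impute_class(values, fill="U"):
    """Replace missing class values with a fill level; return values and indicators."""
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


def log_transform(values):
    """Apply log(x + 1), leaving missing values for the later imputation step."""
    return [None if v is None else math.log(v + 1) for v in values]


if __name__ == "__main__":
    print(impute_interval([1, None, 3]))
    print(impute_class(["A", None]))
    print(log_transform([0, None, 9]))
```

The indicator columns record which cases were imputed, so the regression can still use "was missing" as a signal.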

4.2 Answer the following Questions.

6. In preparation for regression, is any missing value imputation needed? If yes, should you do this imputation before generating the decision tree models? Why or why not?

7. Which variables are included in the final regression model generated in step (xii)? List the variables in the descending order of importance to the model.

8. Which variables are included in the final regression model generated in the last step?

9. Based on average squared error on the validation data, which of the two regression models appears to be better?

Tip 6: Neural Networks - Execute the following steps in SAS Enterprise Miner

(xvi). Add a Neural Network tool to the diagram. Connect the Impute node to the Neural Network node.

(xvii). Set the model selection criterion to average squared error. Run the Neural Network node.

4.3 Answer the following Questions.

10. How many weights does the neural network model generated in step (xvii) include?

11. Examine the validation average squared error of the neural network model. How does it compare to the two decision tree models and the regression model generated after applying log transformation?
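As a cross-check for question 10, the number of weights in a fully connected network with one hidden layer can be counted directly. The sketch below is a minimal Python illustration, assuming a single hidden layer with one bias term per hidden and output unit. The actual architecture produced by the Neural Network node depends on its property settings, and class inputs may expand into several dummy inputs.

```python
def count_weights(n_inputs, n_hidden, n_outputs=1):
    """Count weights, including biases, for a single-hidden-layer network."""
    input_to_hidden = (n_inputs + 1) * n_hidden     # +1 for each hidden bias
    hidden_to_output = (n_hidden + 1) * n_outputs   # +1 for each output bias
    return input_to_hidden + hidden_to_output


if __name__ == "__main__":
    # Illustration only: 5 inputs, 3 hidden units, 1 output.
    print(count_weights(5, 3, 1))
```

If the count reported by Enterprise Miner differs from this formula, check how many inputs the model actually received after dummy coding and imputation indicators.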

Task 5: Compare your models

Execute the following steps in SAS Enterprise Miner

(xviii). Add a Model Comparison node to the diagram. Connect it to all the predictive models generated in the earlier steps.

(xix). Run the Model Comparison node.

4.4 Answer the following Questions.

12. Examine the results of the Model Comparison node. Of the predictive models compared, which model has been selected by the Model Comparison node? Based on what selection criterion was this model selected?

13. Change the default values of the Model Comparison node properties so that it selects the model having the least average squared error on the validation data. Run the Model Comparison node again. Which model has been selected now?

14. Why are the models compared?
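The selection rule described in question 13 amounts to picking the model with the smallest validation average squared error. The following is a minimal Python sketch of that rule. The model names and error values are placeholders for illustration only, not results from any data set, and `select_best` is a hypothetical helper.

```python
def select_best(validation_ase):
    """Return the name of the model with the least validation error."""
    if not validation_ase:
        raise ValueError("no models to compare")
    return min(validation_ase, key=validation_ase.get)


# Placeholder values for illustration only; replace with your own results.
example_results = {
    "model_a": 0.30,
    "model_b": 0.20,
    "model_c": 0.25,
}

if __name__ == "__main__":
    print(select_best(example_results))
```

Selecting on validation rather than training error favours the model that generalises best to cases it has not seen.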

Task 6: Business Implication

1. From the outcome of your analysis of the data set and the business case you have come up with, what can you deduce, recommend and conclude?

2. What are the business implications that can be drawn from the process of building and comparing these models, and has this practice helped resolve the business issue? Why or why not?
