Data Engineering and Mining

Part I:

For this part, you need to explore the bank data (bankdata_csv_all.csv), available on the LMS, and an accompanying description (bankdataDescription.doc) of the attributes and their values. The dataset contains attributes on each person's demographics and banking information, with the aim of determining whether they will want to obtain the new PEP (Personal Equity Plan).

Your goal is to perform Association Rule discovery on the dataset using R.

First, perform the preprocessing steps required for association rule mining: specifically, the id field needs to be removed, and several numeric fields need to be discretized or otherwise converted to nominal values.
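A minimal Python sketch of these two steps, assuming equal-width binning into three levels. The assignment asks for R; here the column names `age` and `income`, the sample rows, and the bin labels are hypothetical, so check bankdataDescription.doc for the real attributes. Equal-width binning is only one option: equal-frequency bins or domain-based cut points may give more meaningful categories.

```python
# Sketch of association-rule preprocessing: drop the id field and
# discretize numeric columns into nominal bins.
# Hypothetical: the column names, sample rows and bin labels below.

def equal_width_cuts(values, k):
    """Return the k-1 interior cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]

def to_label(value, cuts, labels):
    """Label of the first bin whose upper cut point the value does not exceed."""
    for cut, label in zip(cuts, labels):
        if value <= cut:
            return label
    return labels[-1]  # above every cut point: last bin

def preprocess(rows, numeric_cols, labels=("low", "medium", "high")):
    """Return new rows without 'id' and with each numeric column replaced by a bin label."""
    cuts = {c: equal_width_cuts([float(r[c]) for r in rows], len(labels))
            for c in numeric_cols}
    result = []
    for r in rows:
        new = {key: val for key, val in r.items() if key != "id"}  # remove the id field
        for c in numeric_cols:
            new[c] = to_label(float(r[c]), cuts[c], labels)
        result.append(new)
    return result

# Hypothetical sample rows in the shape csv.DictReader would produce.
SAMPLE = [
    {"id": "ID1", "age": "25", "income": "12000", "pep": "YES"},
    {"id": "ID2", "age": "48", "income": "30000", "pep": "NO"},
    {"id": "ID3", "age": "67", "income": "45000", "pep": "NO"},
]

for row in preprocess(SAMPLE, ["age", "income"]):
    print(row)
```

Cut points are computed once per column over all rows, so every customer is binned against the same boundaries.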

Next, set PEP as the right-hand side of the rules and see what rules are generated.
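To illustrate what fixing the right-hand side means, this Python sketch enumerates every rule X => pep=YES with up to `max_len` items on the left and keeps those that pass minimum support and confidence thresholds. It is a brute-force stand-in for the Apriori algorithm, not a replacement for it; in R, the arules package's `apriori()` restricts the consequent through its `appearance` argument. Items are assumed to be "attribute=value" strings, and the toy transactions are made up.

```python
from itertools import combinations

def rules_with_rhs(transactions, rhs, min_support=0.1, min_confidence=0.6, max_len=2):
    """Enumerate rules X => rhs with |X| <= max_len that meet both thresholds, sorted by lift."""
    n = len(transactions)
    rhs_count = sum(1 for t in transactions if rhs in t)
    if n == 0 or rhs_count == 0:
        return []
    rhs_attr = rhs.split("=", 1)[0]
    # Left-hand-side candidates: every item except those of the RHS attribute (e.g. pep=NO).
    items = sorted({i for t in transactions for i in t if i.split("=", 1)[0] != rhs_attr})
    rules = []
    for size in range(1, max_len + 1):
        for lhs in combinations(items, size):
            lhs_set = set(lhs)
            lhs_count = sum(1 for t in transactions if lhs_set <= t)
            if lhs_count == 0:
                continue
            both = sum(1 for t in transactions if lhs_set <= t and rhs in t)
            support = both / n                   # share of all customers matching X and rhs
            confidence = both / lhs_count        # share of X-customers that also match rhs
            lift = confidence / (rhs_count / n)  # confidence relative to the base rate of rhs
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[3], reverse=True)

# Made-up transactions: each customer is a set of "attribute=value" items.
TOY = [
    {"married=YES", "children=0", "pep=YES"},
    {"married=YES", "children=0", "pep=YES"},
    {"married=NO", "children=1", "pep=NO"},
    {"married=YES", "children=1", "pep=NO"},
]

rules = rules_with_rhs(TOY, "pep=YES", min_support=0.25, min_confidence=0.6)
for lhs, support, confidence, lift in rules:
    print(lhs, round(support, 2), round(confidence, 2), round(lift, 2))
```

On the toy data, `married=YES` passes the 0.6 confidence threshold with a lift of only about 1.33, while `children=0` reaches a lift of 2.0; the two-item rule `children=0, married=YES` adds nothing beyond `children=0`.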

Select the top 5 most "interesting" rules and for each specify the following:

• Support, Confidence and Lift values

• An explanation of the pattern and why you believe it is interesting based on the business objectives of the company.

• Any recommendations based on the discovered rule that might help the company to better understand behavior of its customers or to develop a business opportunity.

Note that the top 5 most interesting rules are most likely not the top 5 strong rules. They are rules that, in addition to having high lift and confidence, also provide some non-trivial, actionable knowledge based on the underlying business objectives.
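Part of the gap between "strong" and "interesting" is redundancy: a rule whose extra left-hand-side conditions do not raise confidence over a simpler rule tells the client nothing new. A Python sketch of that filter, assuming rules in the form (lhs, support, confidence, lift) that all share the same right-hand side; R's arules package offers a comparable `is.redundant()` check. Removing redundancy is only one filter, and judging whether a rule is actionable for the bank still requires business reasoning.

```python
def prune_redundant(rules):
    """Keep a rule only if no rule with a strictly smaller LHS reaches at least its confidence.

    Each rule is (lhs_tuple, support, confidence, lift); all rules are assumed to share
    the same right-hand side (here, the PEP item).
    """
    kept = []
    for lhs, support, confidence, lift in rules:
        lhs_set = set(lhs)
        redundant = any(
            set(other_lhs) < lhs_set and other_conf >= confidence
            for other_lhs, _, other_conf, _ in rules
        )
        if not redundant:
            kept.append((lhs, support, confidence, lift))
    return kept

# Hypothetical rules for pep=YES, in the (lhs, support, confidence, lift) form.
CANDIDATES = [
    (("children=0",), 0.50, 1.00, 2.00),
    (("children=0", "married=YES"), 0.50, 1.00, 2.00),
    (("married=YES",), 0.50, 0.67, 1.33),
]

for rule in prune_redundant(CANDIDATES):
    print(rule)
```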

To complete this assignment, write a short report describing your association rule mining process and the resulting 5 interesting rules, each with the three items of explanation and recommendation listed above. For at least one of the rules, discuss the support, confidence and lift values and how they are interpreted in this data set.
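For reference, the standard definitions, writing $N$ for the number of customers, $N_X$ for the number whose attributes match the left-hand side $X$, $N_Y$ for the number matching the right-hand side $Y$ (for example pep=YES), and $N_{XY}$ for the number matching both:

```latex
\text{support}(X \Rightarrow Y) = \frac{N_{XY}}{N}, \qquad
\text{confidence}(X \Rightarrow Y) = \frac{N_{XY}}{N_X}, \qquad
\text{lift}(X \Rightarrow Y) = \frac{\text{confidence}(X \Rightarrow Y)}{N_Y / N}
                             = \frac{N \, N_{XY}}{N_X \, N_Y}
```

A lift of 1 means $X$ and $Y$ occur together exactly as often as they would if they were independent; a lift above 1 means customers matching $X$ take up the PEP more often than customers in general, which is what makes a rule useful for targeting.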

You should write your answers as if you are working for a client who knows little about data mining. Your report should give your client some insightful and reliable suggestions on what kinds of potential buyers your client should contact, and convince your client that your suggestions are reliable based on the evidence gathered from your experiment results.

In more detail, your answers should include:

• Description of preprocessing steps

• Description of parameters and experiments in order to obtain strong rules

• Give the top 5 most interesting rules and the 3 items listed above for each rule.

Part II:

In this part of the homework, you are expected to apply a decision tree induction algorithm to solve a mystery in history: who wrote the disputed essays, Hamilton or Madison?

1. About the Federalist Papers

Quote from the Library of Congress

The Federalist Papers were a series of eighty-five essays urging the citizens of New York to ratify the new United States Constitution. Written by Alexander Hamilton, James Madison, and John Jay, the essays originally appeared anonymously in New York newspapers in 1787 and 1788 under the pen name "Publius."

A bound edition of the essays was first published in 1788, but it was not until the 1818 edition published by the printer Jacob Gideon that the authors of each essay were identified by name. The Federalist Papers are considered one of the most important sources for interpreting and understanding the original intent of the Constitution.

2. About the disputed authorship

The original essays can be downloaded from the Library of Congress.

In the author column, you will find 74 essays with identified authors: 51 essays written by Hamilton, 15 by Madison, 3 by Hamilton and Madison jointly, and 5 by Jay. The remaining 11 essays, however, are attributed to "Hamilton or Madison". These are the famous essays with disputed authorship. Hamilton put his claim of authorship in writing before he was killed in a duel; later, Madison also claimed authorship. Historians have long tried to determine which of them was the real author.

3. Computational approach for authorship attribution

In the 1960s, statisticians Mosteller and Wallace analyzed the frequency distributions of common function words in the Federalist Papers and drew their conclusions. This was pioneering work on using mathematical approaches for authorship attribution.

Nowadays, authorship attribution is a classic problem in the data mining field, with applications in forensics (e.g. deception detection) and information organization.

The Federalist Papers data set (fedPapers85.csv) is provided on the LMS. The features are a set of "function words", for example "upon". The feature value is the percentage of the words in an essay that are that function word. For example, if the function word "upon" appears 3 times in the essay "Hamilton_fed_31.txt" and the essay contains 1000 words in total, the feature value is 3/1000 = 0.3%.
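A short Python sketch of how such a feature could be computed from an essay's raw text. The tokenizer (lowercased runs of letters, optionally with one apostrophe part) and the word list are assumptions, and the values in fedPapers85.csv may have been produced with different tokenization or scaling, so this is for illustration only. Values are returned in percent, matching the 0.3% example.

```python
import re

def function_word_features(text, function_words):
    """Percentage of all tokens in text accounted for by each function word."""
    # Assumed tokenizer: lowercase runs of letters, optionally with one apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in function_words}
    counts = {w: 0 for w in function_words}
    for tok in tokens:
        if tok in counts:
            counts[tok] += 1
    return {w: 100.0 * counts[w] / total for w in function_words}

# Worked example from the text: "upon" appears 3 times in a 1000-word essay.
essay = " ".join(["upon"] * 3 + ["the"] * 997)
print(function_word_features(essay, ["upon", "the", "whilst"]))
```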

Organize your report using the following template:

Section 1: Data preparation

You will need to separate the original data set into training and testing data for classification experiments. Describe which examples are in your training data and which are in your test data.
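One possible split, sketched in Python: train on the essays with a single known author who is one of the two candidates (Hamilton or Madison), and treat the disputed essays as the test set. With the counts given above, that would mean 51 + 15 = 66 training essays and 11 disputed essays. The `author` column name and the label `"dispt"` for disputed essays are hypothetical; check the actual values in fedPapers85.csv. Since the disputed essays have no ground truth, measuring accuracy also requires holding out some known essays or using cross-validation.

```python
# Sketch of a train/test split for the authorship question.
# Hypothetical: the "author" column name and the "dispt" label for disputed essays.
TRAIN_AUTHORS = {"Hamilton", "Madison"}
DISPUTED_LABEL = "dispt"

def split_by_author(rows, author_key="author"):
    """Train on essays by Hamilton or Madison alone; test on the disputed essays."""
    train = [r for r in rows if r[author_key] in TRAIN_AUTHORS]
    test = [r for r in rows if r[author_key] == DISPUTED_LABEL]
    return train, test

# Made-up rows standing in for csv.DictReader output (feature columns omitted).
ROWS = [
    {"author": "Hamilton"},
    {"author": "Madison"},
    {"author": "Jay"},
    {"author": "HM"},
    {"author": "dispt"},
]

train, test = split_by_author(ROWS)
print(len(train), "training essays,", len(test), "disputed essays")
```

Jay's essays and the joint essays are left out here because they would add classes that are not candidates for the disputed essays; including them is a defensible alternative.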

Section 2: Build and tune decision tree models

First build a DT model using the default settings, then tune the parameters to see whether a better model can be generated. Compare these models using appropriate evaluation measures, and describe and compare the patterns learned in these models.
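To make "tuning" concrete, here is a hand-rolled Python sketch of a CART-style tree that splits on Gini impurity, with `max_depth` and `min_split` as the tuning knobs and leave-one-out cross-validation as the evaluation measure. It is not any library's implementation (R users would typically reach for a package such as rpart), and the toy feature values are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature, threshold) with the lowest weighted Gini, or None if no split helps."""
    best, best_score, n = None, gini(y), len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint between adjacent distinct values
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, max_depth, min_split=2, depth=0):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_split or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], max_depth, min_split, depth + 1),
            build_tree([X[i] for i in ri], [y[i] for i in ri], max_depth, min_split, depth + 1))

def predict(tree, x):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def loocv_accuracy(X, y, **params):
    """Leave-one-out accuracy: train on all essays but one, test on the held-out essay."""
    hits = 0
    for i in range(len(y)):
        tree = build_tree(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **params)
        hits += predict(tree, X[i]) == y[i]
    return hits / len(y)

# Toy data: two made-up function-word features per essay.
X = [[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.9, 1.0], [1.0, 0.8], [1.1, 1.2]]
y = ["Hamilton"] * 3 + ["Madison"] * 3

for depth in (1, 2, 3):
    print("max_depth", depth, "LOOCV accuracy", loocv_accuracy(X, y, max_depth=depth))
```

Printing the tree tuple (feature index and threshold at each node) is one simple way to describe the pattern each model has learned, for example which function word separates the two authors.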

Section 3: Prediction

After building the classification model, apply it to the disputed papers to determine their authorship, and report the performance accuracy of your models.
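Because the disputed essays have no known author, accuracy can only be measured on essays whose authors are known (a held-out set or cross-validation folds); for the disputed essays themselves, the report can only state which author the model predicts. This Python sketch shows both pieces; the predictions are made up for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """matrix[a][p] = number of essays whose true author is a and predicted author is p."""
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy(actual, predicted):
    """Fraction of held-out essays whose predicted author matches the known author."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def tally_disputed(predictions):
    """How many disputed essays the model attributes to each candidate author."""
    return Counter(predictions)

# Made-up predictions for illustration only.
actual = ["Hamilton", "Hamilton", "Madison", "Madison"]
predicted = ["Hamilton", "Madison", "Madison", "Madison"]
print(accuracy(actual, predicted))
print(confusion_matrix(actual, predicted, ["Hamilton", "Madison"]))
print(tally_disputed(["Madison", "Madison", "Hamilton"]))
```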
