The Groceries Dataset

Imagine 10,000 receipts sitting on your table. Each receipt represents a transaction and lists the items that went into a customer's basket. That is exactly what the Groceries dataset contains: a collection of receipts, one per line, together with the items purchased. Each line is called a transaction, and each column in a row represents one item.
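To make this layout concrete, here is a minimal sketch of how a file in this shape could be parsed into baskets. The assignment itself asks for R; this Python snippet only illustrates the one-receipt-per-line structure. The sample lines and the `parse_transactions` helper are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample in the layout described above:
# each line is one transaction, each comma-separated field is one item.
SAMPLE = """citrus fruit,margarine,ready soups
whole milk
tropical fruit,yogurt,coffee
"""

def parse_transactions(text):
    """Return a list of baskets, one set of item names per non-empty line."""
    reader = csv.reader(io.StringIO(text))
    baskets = []
    for row in reader:
        # Ignore empty fields so rows of different lengths are handled uniformly.
        items = {field.strip() for field in row if field.strip()}
        if items:
            baskets.append(items)
    return baskets

baskets = parse_transactions(SAMPLE)
print(len(baskets), "transactions")
print(sorted(baskets[0]))
```

Storing each basket as a set drops duplicate items on a receipt, which is usually what association rule mining expects.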

Task 1: Data Pre-processing

Read the data into R. There are many ways to read CSV tables in R; for more details, refer to the R Data Import/Export manual.

For the clustering experiments, the column of class labels needs to be removed. Refer to lecture Module 10 to see how to do so.

Check whether any other pre-processing would benefit the analysis, for example replacing missing values, normalizing attribute ranges, or converting numerical or string values to nominal values.
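The pre-processing steps named in this task can be sketched as follows. This is only an illustration of the logic in Python, not the R code the assignment expects. The tiny table, the choice of mean imputation, and the helper names `drop_label`, `impute_mean`, and `min_max` are all assumptions made for the example.

```python
# Hypothetical table: the last column is the class label,
# and None marks a missing value.
rows = [
    [1.0, 10.0, "g"],
    [None, 20.0, "b"],
    [3.0, 40.0, "g"],
]

def drop_label(rows, label_index=-1):
    """Remove the class-label column so clustering never sees it."""
    label_index %= len(rows[0])
    return [[v for i, v in enumerate(r) if i != label_index] for r in rows]

def impute_mean(rows):
    """Replace None with the mean of the observed values in that column.

    Assumes every column has at least one observed value.
    """
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def min_max(rows):
    """Rescale every column to [0, 1]; a constant column maps to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j, v in enumerate(r)]
        for r in rows
    ]

features = min_max(impute_mean(drop_label(rows)))
print(features)
```

Range normalization matters most for distance-based methods such as k-means, where an attribute with a large numeric range would otherwise dominate the distance.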

Task 2: Data Mining

- Association rule mining experiments: Use R to explore association rules on the Groceries dataset. Try out different algorithms. Visualize the results you find. Report any interesting association rules discovered in the experiments and explain why they are interesting.
- Classification experiments: Use R to construct classifiers on the Mushroom or Ionosphere dataset. Randomly split the dataset into training and test sets (80% vs. 20%). Select at least one classifier from each of two of the following categories: tree-based models, Bayes classifiers, and rule-based classifiers. Compare the results of the chosen classifiers.
- Clustering experiments: Use R to explore clusters on the Mushroom or Ionosphere dataset. Select and compare two clustering algorithms available in R (e.g. k-means vs. density-based). Use R to visually explore the resulting clusters.
- For all the above experiments, try different parameter settings to fine-tune the outcome. In principle, select methods that work well on the given dataset.
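
When judging which association rules are interesting, the usual measures are support, confidence, and lift. The sketch below computes them by brute force for single-item rules on a handful of made-up baskets. It is written in Python purely to show the arithmetic; it is not Apriori, and in R a package such as arules would normally do this work. The baskets, thresholds, and the `pair_rules` helper are hypothetical.

```python
from itertools import combinations

# Hypothetical toy baskets, not rows from the Groceries dataset.
baskets = [
    {"milk", "bread"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Enumerate single-item rules lhs -> rhs that pass both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        both = support({a, b}, baskets)
        if both < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = P(rhs | lhs); lift compares it with P(rhs).
            confidence = both / support({lhs}, baskets)
            lift = confidence / support({rhs}, baskets)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both, confidence, lift))
    return rules

for lhs, rhs, sup, conf, lift in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than independence would predict, which is one common argument for calling a rule interesting. Raising or lowering the two thresholds is the kind of parameter tuning the last bullet refers to.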

Task 3: Prepare a report

Your report should contain the following:

- Theoretical discussion: At most two pages discussing the data pre-processing steps, the motivation for selecting a particular method, and how the parameters were chosen.
- Results: Include results and screenshots of the above experiments.
- Discussion and error analysis: Try to interpret the results of your model. Discuss intuitions or hypotheses that can be drawn from visual inspection of the resulting classes or clusters. State any assumptions you made, and discuss issues that might have affected the model's performance.
- References: If you use information from sources other than the R manual and the official website, cite them.
