Assignment - Data Mining Practice and Analysis

Aims

- Become familiar with some well-known data mining techniques in order to understand their working principles;
- Apply data mining techniques to domain-specific datasets;
- Review cutting-edge data mining techniques to gain a good overview of current data mining technology.

Requirements (Tasks)

The whole task of this assignment consists of the following procedural steps.

Step 1 :
Set up (from your own imagination of a realistic business situation, or from an actual analysis problem) a scenario in which you are given a domain-specific dataset and asked to analyze it. The purpose of the analysis might be to understand (get an overview of, or learn about) the given data or to solve a specific analytical problem, depending on the scenario you made up.

Step 2 :
Find and obtain your own domain-specific dataset that fits the scenario you made up. The dataset can be unique or publicly available. Some public datasets are available from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/).

Step 3 :
Choose appropriate data mining techniques (algorithms) - see the details for each option in Step 4 below.
** Note: The order of the above three steps can be changed. For example, you may find an interesting dataset first and then set up a specific data-mining scenario that fits the analysis of the chosen dataset. **

Step 4 :

You can select either of two options for this assignment.

- Option (1) - Programming-intensive Assignment

- Once you have your own domain-specific dataset and a chosen data mining algorithm, you need to design and implement the chosen algorithm in your preferred programming language.

- A series of preprocessing steps will be required at this stage. The preprocessing procedure should be designed carefully (consider what kind of processing is required, how it will be done, and why) to make your data ready to be fed to your program. Some parts of this preprocessing procedure can be included in your program as part of a "pre-data-mining module".

- Your final program must be a stand-alone data-mining tool designed for your own data analysis purpose. Your program is expected to include the following modules (and may include more sub-modules if needed):

1) pre-data-mining module - designed for the necessary preprocessing and for getting the data ready to be fed to the next module (the data-mining module). You don't need to include all of the required preprocessing in this module. It is assumed that some initial preprocessing (e.g. cleaning noisy data) can be done externally using other software tools (e.g. Excel or Weka).

2) data-mining module - implements the chosen data mining algorithm. You can directly borrow the algorithm from a popular existing data mining method, or you can design your own algorithm (by amending an existing one).

3) post-mining module - presents/reports the output produced by the previous modules. The result can be presented as a simple text report or, additionally, as a non-text visualization (e.g. graph, chart or diagram).
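
As an illustration only, the three modules listed above could be laid out as three functions chained by a small driver. This is a minimal sketch, not a required design: the toy CSV text, the function names, and the choice of a simple one-attribute rule learner as the "data mining algorithm" are all assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical toy data standing in for a real dataset file.
RAW_CSV = """outlook,play
sunny,no
overcast,yes
rainy,yes
sunny,no
rainy,
overcast,yes
"""


def pre_data_mining(text):
    """Pre-data-mining module: parse CSV text and drop rows with missing values."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def data_mining(rows, attribute, target):
    """Data-mining module: for each attribute value, predict the majority class."""
    counts = {}
    for row in rows:
        counts.setdefault(row[attribute], Counter())[row[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def post_mining(rules, attribute, target):
    """Post-mining module: render the learned rules as a plain-text report."""
    lines = [f"IF {attribute} = {value} THEN {target} = {label}"
             for value, label in sorted(rules.items())]
    return "\n".join(lines)


def run(text, attribute="outlook", target="play"):
    """Driver: chain the three modules in order."""
    rows = pre_data_mining(text)
    rules = data_mining(rows, attribute, target)
    return post_mining(rules, attribute, target)


if __name__ == "__main__":
    print(run(RAW_CSV))
```

Keeping each module behind its own function makes it straightforward to swap in a different preprocessing scheme or algorithm later for the comparison experiments.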

- This programming-intensive assignment still requires an analysis. Try to find all the patterns you can detect with your implemented algorithm. Try to compare and contrast the results obtained with your chosen preprocessing scheme and algorithm against those obtained with other existing algorithms or other preprocessing methods.

Note: for comparing the results of your program with those of other existing algorithms, you can use existing data mining tools (e.g. Weka) to obtain the results of the other algorithms.

- Option (2) - Analysis-intensive Assignment

- Once you have chosen your own domain-specific dataset, you need to design your own data-mining analysis scheme. This analysis scheme can consist of multiple steps:

1) Set up a strategy for preprocessing your data.
A series of preprocessing steps will be required and needs to be designed carefully (consider what kind of processing is required, how it will be done, and why). You may include multiple different preprocessing schemes for the comparison analysis.

2) Set up a strategy for data-mining.

You need to select one data mining area of your choice (clustering, classification, or association rule mining) and select AT LEAST TWO existing data mining algorithms in that area. For example, if you choose clustering as your data mining area, you can apply two algorithms, DBSCAN and k-means, and compare the two results. Alternatively, you can design a combined algorithm that applies multiple algorithms from the same or different data mining areas in series. Your strategy can also be designed to apply different parameters to one algorithm. Another possible strategy is to apply multiple preprocessing (attribute selection) schemes to one algorithm.
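
To make the DBSCAN versus k-means comparison mentioned above concrete, here is a small pure-Python sketch of both algorithms run on the same points. It is illustrative only: the toy points, the parameter values (`k=2`, `eps=0.5`, `min_pts=3`) and the deterministic "first k points" initialisation for k-means are assumptions chosen for the example. In the analysis-intensive option you would normally run the equivalent algorithms in a tool such as Weka instead.

```python
import math

# Hypothetical 2-D toy data: two dense groups plus one isolated point.
POINTS = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2),
          (9.0, 1.0)]

NOISE = -1


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm. Centroids start at the first k points (a simplification
    that keeps the example deterministic; random restarts are more usual)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels


def dbscan(points, eps, min_pts):
    """Density-based clustering; points in no dense region are labelled NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
    return labels


if __name__ == "__main__":
    print("k-means:", kmeans(POINTS, k=2))
    print("DBSCAN: ", dbscan(POINTS, eps=0.5, min_pts=3))
```

On this toy data both algorithms recover the two dense groups, but k-means is forced to assign the isolated point (9.0, 1.0) to one of its two clusters, while DBSCAN labels it as noise (-1). Differences of this kind are what the comparison in your report should explain.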

- You can choose one data mining tool (e.g. Weka) to analyze your chosen dataset. Apply the data-mining strategy you have set up to your chosen (preprocessed) data using the data mining tool and try to find all the patterns you can detect.

- Run various comparison experiments, either by applying different data mining algorithms (or strategies) to the same dataset or by applying the same algorithm to differently preprocessed datasets.

- Critically analyze the experimental results and discuss/demonstrate why a chosen algorithm (strategy) is superior or inferior to another algorithm (strategy).
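
As one possible shape for such a comparison experiment, the sketch below applies the same algorithm (a 1-nearest-neighbour classifier scored by leave-one-out accuracy) to the same records with and without min-max scaling. Everything here is assumed for illustration: the toy records, the attribute meanings (an income-like value and a small rating), and the choice of classifier. For brevity the scaling uses the minimum and maximum of the whole dataset rather than recomputing them inside each leave-one-out fold.

```python
import math

# Hypothetical toy records: (income, rating) -> class label.
# The class depends on the small-scale rating, not on income.
DATA = [((30000.0, 1.0), "A"), ((52000.0, 1.2), "A"), ((41000.0, 0.9), "A"),
        ((31000.0, 3.0), "B"), ((50000.0, 3.2), "B"), ((40000.0, 2.9), "B")]


def min_max_scale(data):
    """Rescale every attribute to the range [0, 1]."""
    columns = list(zip(*(features for features, _ in data)))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid division by zero
    return [(tuple((v - lo) / s for v, lo, s in zip(features, lows, spans)), label)
            for features, label in data]


def loo_accuracy_1nn(data):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, (features, label) in enumerate(data):
        others = [d for j, d in enumerate(data) if j != i]
        _, predicted = min(others, key=lambda d: math.dist(features, d[0]))
        correct += predicted == label
    return correct / len(data)


if __name__ == "__main__":
    print("raw data:   ", loo_accuracy_1nn(DATA))
    print("scaled data:", loo_accuracy_1nn(min_max_scale(DATA)))
```

On this deliberately contrived data the raw attributes give an accuracy of 0.0, because the large income values dominate the distance, while the scaled attributes give 1.0. A real dataset will rarely be this extreme, but the report should explain any such gap in the same way.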

Step 5 :

- You need to give an in-class presentation (15 minutes of presentation + 5 minutes of questions) based on your chosen algorithm (strategy) and experimental tests, and you also need to write a scientific paper as an experimental report.
The presentation must include a good overview of your project, its aims/objectives, the reasons for your choices, a brief overview of the strategy/algorithm you chose, findings, a comparison including experimental results, and a conclusion.

You need to write a research report paper of not more than 15 pages (for CP3403 students) or not more than 20 pages (for CP5605/CP5634 students) on your project to summarise your algorithm and experimental results. The report should contain all topics listed above for the presentation, but in more detail. CP5605/CP5634 students need to add one additional section to the report containing a brief (mini) literature review of the data mining methods (strategy, algorithm and/or preprocessing methods) chosen for the project.

- The research paper must follow the generally accepted format of a research article, consisting of an introduction, related work (a brief review of the methodologies, i.e. the algorithm/strategy used), a summarized description of your experimental settings and procedures (description of data, justification of the chosen data mining area, justification of the chosen algorithm, preprocessing details, etc.), comparison, discussion, issues, conclusion, possible future work and a list of references. (You may add more sections if needed.)

- In addition to the general components listed above, the report for the "Programming-intensive option" should include a summary of your program (including the program structure, implementation details, a summarized algorithm for the main modules, etc., and code if necessary).

- For the "Analysis-intensive option", the report must include a more in-depth analysis of the investigation and experimental comparisons made during the project.

