Data Mining Project Assignment

Choice One: Data Analysis

In this project, you are asked to identify a dataset suitable for data mining, perform data mining tasks on it (such as classification, association analysis, or clustering), and report your results and observations.

The following steps are a suggested way to complete the project and the report.

Step 1. Identify suitable datasets and application

To identify a suitable dataset, start with an application domain that interests you. Many datasets are publicly available, such as NBA performance data, climate data, intrusion detection benchmark data, manufacturing data, and public wish list data. The class notes contain a collection of websites (in the first self-learning slides), but you are encouraged to use a search engine to find your own datasets that interest you. Be prepared to spend substantial time preprocessing and exploring the datasets that you choose.

The data set needs to have at least 20 variables and at least 3,000 observations; OR at least 10 variables and at least 30,000 observations. You need to have both categorical and quantitative interval variables.
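The size rule above can be checked mechanically before you commit to a dataset. The following is a minimal sketch in plain Python; the function names and the `'categorical'`/`'interval'` labels are assumptions made for illustration and are not part of the assignment.

```python
def meets_size_requirement(n_variables, n_observations):
    """Return True if the dataset satisfies either size rule from the assignment."""
    # Rule A: at least 20 variables and at least 3,000 observations.
    wide_enough = n_variables >= 20 and n_observations >= 3_000
    # Rule B: at least 10 variables and at least 30,000 observations.
    long_enough = n_variables >= 10 and n_observations >= 30_000
    return wide_enough or long_enough


def has_both_variable_types(column_types):
    """Check that both categorical and interval variables are present.

    column_types maps a column name to 'categorical' or 'interval'
    (these labels are an assumption of this sketch).
    """
    kinds = set(column_types.values())
    return {"categorical", "interval"} <= kinds
```

For example, a dataset with 12 variables and 5,000 observations fails both rules, so `meets_size_requirement(12, 5000)` returns `False`.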

What to turn in? The final report for this section should contain the following components:

a. An introduction of the application domain that you are interested in

b. A description of the dataset that you selected. It should include details such as the origin and size of the dataset, how the data is represented (e.g., graph, records), the attributes, and statistics of the attribute values.

c. A description of any exploratory analysis that you performed on your dataset, and its results.

d. The raw and processed datasets (a brief summary and comparison).

e. A formal problem definition with what is given, what is the goal, and what are the constraints.

f. A plan for your data mining task.

Step 2. Perform data mining tasks on your dataset

In this phase, you will try the intended data mining tasks on your dataset. You can use SAS Enterprise Miner, write your own scripts or code, or combine the two to mine the data. Select one or more alternative methods to compare your method with. The alternatives could be, for example, tree classification versus logistic regression, trees with different maximum numbers of branches, or clustering with different distance choices. Compare the results of the different methods.
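If you write your own scripts, the comparison can be as simple as scoring each method's predictions against the same held-out labels. The sketch below is one possible way to do this in plain Python; the function names are hypothetical, and accuracy is only one of several measures you might report.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of observations whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair; off-diagonal pairs are errors."""
    return Counter(zip(actual, predicted))


def compare_methods(actual, predictions_by_method):
    """Return {method name: accuracy} for predictions made on the same test set."""
    return {
        name: accuracy(actual, predicted)
        for name, predicted in predictions_by_method.items()
    }
```

Here `predictions_by_method` would hold, for instance, the labels predicted by a tree and by a logistic regression on the same validation data, so that the numbers are directly comparable.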

What to turn in? A report describing

a. Your method, e.g. the algorithm, the workflow and any other tasks that you performed.
b. Your experimental results.
c. A comparison of the experimental results of your method and the alternative methods.
d. Possible explanations for the experimental results.

Step 3. Make the conclusion

Summarize what you did and what you learned from these data mining tasks. Describe any future work you think is worthwhile.

The final report should be no longer than 25 pages (single-sided, single-spaced, letter size 12), so include only the most important information. Points will be deducted if the report is too long.

Choice Two: Free Data Mining Software Evaluation

In this project, you are asked to choose a free data mining software package and write a report that evaluates it or explains how to use it. The available free data mining software, and where to download it, can be found in the Topic 1 folder in D2L.

The following steps are a suggested way to complete the project and the report.

Step 1. Download the software

Include (but do not limit yourself to) the following in your report:

- Where to download the software?

- Are there any operating system requirements? For example, can it run on both Mac and Windows machines?

- How large is the package?

- In general, are download and installation straightforward? Does anything need attention during download and installation? If yes, provide step-by-step guidelines.

- The platform (the look) of the software after installation, and general instructions for each component.

- Anything else you can think of that shows the characteristics of the software.

Step 2. Choose a data set and import it into the software

You can use any data set we used in this class for your project, including the IRIS data. Note that the IRIS data is so popular that it may already exist as one of the built-in data sets in the software.

Include (but do not limit yourself to) the following in your report:

- The requirement for the data format, or structure.

- Does the software support mining very large databases? What is the maximum amount of data the software can handle (maximum sample size, maximum number of observations, maximum number of variables, maximum number of levels/categories for a class variable, etc.)?

- Does it support multiple formats? If not, is it easy to convert other formats to the format the software requires?

- Briefly explain how to import data into the software. If any format conversion is needed, explain how to do that as well.

- How to set up the modeling rules and measurement levels for variables?

Step 3. Data Exploration

Include (but do not limit yourself to) the following in your report:

- Does the software support graphs, maps, tables, rotation, etc.?
- What are the exploration tools available for interval variables?
- What are the exploration tools available for class variables?
- Illustrate some explorations based on your data (refer to homework 2, problem 2).
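As a point of reference when judging a tool's exploration features, the two kinds of summaries asked about above can be sketched in a few lines of plain Python: descriptive statistics for an interval variable and a frequency table for a class variable. The function names are hypothetical, and a real tool would typically add plots on top of these numbers.

```python
import statistics
from collections import Counter


def summarize_interval(values):
    """Basic descriptive statistics for an interval (numeric) variable."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation, needs n >= 2
    }


def summarize_class(values):
    """Frequency table for a class (categorical) variable, most common level first."""
    return Counter(values).most_common()
```

Applied to the IRIS data, for example, `summarize_interval` could be run on each measurement column and `summarize_class` on the species column.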

Step 4. Data Preparation

Include (but do not limit yourself to) the following in your report:

- How does the software identify missing, inconsistent, or incorrect data?
- How does the software fix the above problem?
- How does the software perform data conversion and transformations?
- How does the software assist with the sampling process?
- How does the software assist with selection of independent variables (before any modeling)?
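To make the first two questions concrete, here is a minimal sketch of what identifying and fixing missing values can look like when done by hand in plain Python. The set of missing-value markers and the choice of mean imputation are assumptions of this sketch; the software you evaluate may use different markers and offer other replacement strategies.

```python
import statistics

# Values treated as missing; this particular set is an assumption of the sketch.
MISSING_MARKERS = {None, "", "NA", "?"}


def find_missing(rows, column):
    """Return the indexes of rows whose value in `column` is missing."""
    return [i for i, row in enumerate(rows) if row.get(column) in MISSING_MARKERS]


def impute_mean(rows, column):
    """Return copies of the rows with missing values replaced by the column mean."""
    observed = [row[column] for row in rows if row.get(column) not in MISSING_MARKERS]
    fill = statistics.mean(observed)
    return [
        {**row, column: fill} if row.get(column) in MISSING_MARKERS else dict(row)
        for row in rows
    ]
```

Each row is represented as a dictionary keyed by column name, and the input rows are left unchanged so the raw and processed data can still be compared.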

Step 5. Modeling

Run at least ONE model for the data, and include the following in your report if it is appropriate for the model you illustrate:

- Does the software support the major prediction approaches (tree, regression, neural network, nearest neighbor, etc.) and description approaches (clustering, principal components, etc.)?

- How many data mining techniques are supported?

- A detailed illustration of how to run the model you chose: how to set up the parameters, how to run the model, how to get the results, and how the results and running time compare to Enterprise Miner.

- How is the scoring process (scoring means applying the model on a new data set) performed in the software?
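Scoring, as defined above, separates fitting a model from applying it to new records. The sketch below illustrates that separation with a nearest-centroid classifier, which is used here only as a simple stand-in and is not a method prescribed by the assignment; the function names are hypothetical.

```python
import math


def fit_centroids(features, labels):
    """Fit step: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def score(model, new_features):
    """Scoring step: assign each new record the class of its nearest centroid."""
    return [
        min(model, key=lambda label: math.dist(model[label], x))
        for x in new_features
    ]
```

The model returned by `fit_centroids` is built once from the training data, and `score` can then be called on any new data set with the same variables.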

If you built more than one model, also consider the following in your report:

- Does the software support model comparison? If yes, how?

Step 6. Conclusion and some additional evaluation

Overall, is the software easy to learn? Is it easy to use? How does it compare to Enterprise Miner in general? Would you recommend it to new data miners? ....

The final report should be no longer than 25 pages (single-sided, single-spaced, letter size 12).

About the Presentation

If you decide to present on SAS Day, it will be a poster presentation. Please contact Dr. Priestley as soon as possible to reserve a slot and to get suggestions about the format of the poster.

Other teams, please prepare slides for an in-class presentation about what you did.

We will discuss in class how long you have for the presentation. Every person should present part of the project. You will get a separate grade for the presentation. Please refer to the syllabus for the grading rubrics.

Some suggestions for the slides

1. Include a clear outline/agenda/schedule.

2. Avoid using more than six lines of text and minimize the number of words on each visual aid

3. Simple is better; avoid unnecessary formatting. We are interested in your technical content, not your PowerPoint skills.

4. Put your company or university logo on your title slide only; this is a technical presentation to your peers, not a marketing pitch to a customer

5. Use spell check.

6. Avoid flashy, Christmas-light-style color schemes and other distracting effects.

Some general suggestions for presentations:

• A fast presentation is one slide per minute. A more relaxed pace would be two minutes per slide.

• Practice the presentation. The grading rubrics in the syllabus describe what is expected of an outstanding presentation.

• Time your practice sessions to ensure you keep within your allotted time. Remember a team has a time limit and points will be deducted if the presentation is too long.

• Never read the slides verbatim.
