Data Mining Project Assignment

Choice One: Data Analysis

In this project, you are asked to identify a dataset suitable for data mining, perform data mining tasks on it (such as classification, association analysis, or clustering), and report your results and observations.

The following steps are a suggested way to complete the project and the report.

Step 1. Identify suitable datasets and application

To identify a suitable dataset, start with an application domain that interests you. Many datasets are publicly available, such as NBA performance data, climate data, intrusion detection benchmark data, manufacturing data, and public wish list data. The class notes contain a collection of websites (in the first self-learning slides), but you are encouraged to use a search engine to find datasets of your own that interest you. Be prepared to spend substantial time preprocessing and exploring the datasets that you choose.

The dataset needs to have at least 20 variables and at least 3,000 observations, OR at least 10 variables and at least 30,000 observations. It must include both categorical and quantitative interval variables.
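Before committing to a dataset, it can help to check it against the size rule above. The following is a minimal sketch in Python; the function names and the "categorical"/"interval" labels are conventions assumed here for illustration, not part of the assignment.

```python
def meets_size_requirement(n_variables: int, n_observations: int) -> bool:
    """Return True if the dataset satisfies either size rule from the brief."""
    wide_enough = n_variables >= 20 and n_observations >= 3_000
    long_enough = n_variables >= 10 and n_observations >= 30_000
    return wide_enough or long_enough


def has_required_variable_types(variable_types: dict) -> bool:
    """Check that at least one categorical and one interval variable exist.

    `variable_types` maps a column name to "categorical" or "interval";
    this mapping is a hypothetical convention used only in this sketch.
    """
    kinds = set(variable_types.values())
    return {"categorical", "interval"} <= kinds


if __name__ == "__main__":
    # 25 variables and 5,000 rows pass the first rule.
    print(meets_size_requirement(25, 5_000))
    # 12 variables and 5,000 rows fail both rules.
    print(meets_size_requirement(12, 5_000))
    print(has_required_variable_types({"team": "categorical", "points": "interval"}))
```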

What to turn in? The final report for this section should contain the following components:

a. An introduction of the application domain that you are interested in

b. A description of the dataset that you selected. It should include details such as the origin and size of the dataset, how the data is represented (e.g. graph, records, attributes), statistics of the attribute values, etc.

c. A description of any exploratory analysis that you performed on your dataset, and its results.

d. The raw and processed datasets (a brief summary and comparison).

e. A formal problem definition with what is given, what is the goal, and what are the constraints.

f. A plan for your data mining task.

Step 2. Perform data mining tasks on your dataset

In this phase, you will try the intended data mining tasks on your dataset. You can use SAS Enterprise Miner, write your own scripts/code, or combine the two to mine the data. Select one or several alternative methods to compare your method with. The alternative method(s) could be tree classification or logistic regression, trees with different maximum numbers of branches, clustering with different distance measures, etc. Compare the results of the different methods.
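For teams writing their own code, the comparison workflow might look like the sketch below. It is only an illustration: it swaps in two deliberately simple methods (a majority-class baseline and a 1-nearest-neighbor classifier) in place of trees or logistic regression, and the training and test points are made up for the example.

```python
from collections import Counter
import math


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common


def one_nearest_neighbor(train_features, train_labels):
    """Return a predictor that copies the label of the closest training point."""
    def predict(features):
        distances = [math.dist(features, other) for other in train_features]
        return train_labels[distances.index(min(distances))]
    return predict


def accuracy(predict, features, labels):
    """Fraction of held-out observations that the predictor labels correctly."""
    hits = sum(predict(x) == y for x, y in zip(features, labels))
    return hits / len(labels)


# Made-up toy data, used only to show the workflow.
TRAIN_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9)]
TRAIN_Y = ["a", "a", "a", "b", "b"]
TEST_X = [(1.1, 0.9), (2.9, 3.0), (3.2, 3.1), (0.8, 1.0)]
TEST_Y = ["a", "b", "b", "a"]


def compare_methods():
    """Evaluate every method on the same held-out data so results are comparable."""
    methods = {
        "majority baseline": majority_baseline(TRAIN_Y),
        "1-nearest neighbor": one_nearest_neighbor(TRAIN_X, TRAIN_Y),
    }
    return {name: accuracy(predict, TEST_X, TEST_Y) for name, predict in methods.items()}


if __name__ == "__main__":
    for name, acc in compare_methods().items():
        print(f"{name}: {acc:.2f}")
```

The point of the design is that every method is scored on the same held-out observations with the same metric; otherwise the comparison in the report is not meaningful.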

What to turn in? A report describing

a. Your method, e.g. the algorithm, the workflow and any other tasks that you performed.
b. Your experimental results.
c. Comparison of the experiment results of your method and the alternative methods.
d. Possible explanations for the experiment results.

Step 3. Make the conclusion

Summarize what you did and what you learned from these data mining tasks. Describe any future work that you think is worthwhile.

The final report should be no longer than 25 pages (single-sided, single-spaced, 12-point font), so please pick the most important information to include. Points will be deducted if the report is too long.

Choice Two: Free Data Mining Software Evaluation

In this project, you are asked to choose a free data mining software package and write a report that evaluates it or explains how to use it. Choices of free data mining software, and where to download them, can be found in the Topic 1 folder in D2L.

The following steps are a suggested way to complete the project and the report.

Step 1. Download the software

Include (but do not limit yourself to) the following in your report:

- Where can the software be downloaded?

- Are there any operating system requirements? For example, can it run on both Mac and Windows machines?

- How large is the package?

- In general, are downloading and installation straightforward? Does anything need attention during downloading and installation? If yes, provide step-by-step guidelines.

- The look of the software after installation, and general instructions for each component.

- Anything else you can think of that shows the character of the software.

Step 2. Choose a data set and import it into the software

You can use any data set we used in this class for your project, including the IRIS data. Note that the IRIS data is so popular that it may already exist as one of the built-in data sets in the software.

Include (but do not limit yourself to) the following in your report:

- The required data format or structure.

- Does the software support mining very large databases? What is the maximum amount of data the software can handle (maximum sample size, maximum number of observations, maximum number of variables, maximum number of levels/categories for a class variable, etc.)?

- Does it support multiple formats? Or is it easy to convert other formats into the format the software requires?

- Briefly explain how to import a data set into the software. If any format conversion is needed, explain how to do that as well.

- How do you set up the modeling roles and measurement levels for variables?

Step 3. Data Exploration

Include (but do not limit yourself to) the following in your report:

- Does the software support graphs, maps, tables, rotation, etc.?
- What are the exploration tools available for interval variables?
- What are the exploration tools available for class variables?
- Illustrate some explorations based on your data (refer to homework 2, problem 2).
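As a point of reference when judging a package's exploration tools, the sketch below shows the kind of basic summary one would expect for each variable type: descriptive statistics for an interval variable and a frequency table for a class variable. The function names and sample values are hypothetical and use only the Python standard library.

```python
from collections import Counter
import statistics


def summarize_interval(values):
    """Basic descriptive statistics for an interval (numeric) variable."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def summarize_class(values):
    """Frequency table for a class (categorical) variable, most frequent first."""
    return Counter(values).most_common()


if __name__ == "__main__":
    # Made-up values, used only to show the output shape.
    print(summarize_interval([2.0, 4.0, 4.0, 5.0, 10.0]))
    print(summarize_class(["red", "blue", "red", "green", "red"]))
```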

Step 4. Data Preparation

Include (but do not limit yourself to) the following in your report:

- How does the software identify missing, inconsistent, or incorrect data?
- How does the software fix the above problems?
- How does the software perform data conversion and transformations?
- How does the software assist with the sampling process?
- How does the software assist with selection of independent variables (before any modeling)?

Step 5. Modeling

Run at least ONE model for the data, and include the following in your report if it is appropriate for the model you illustrate:

- Does the software support the major prediction approaches (tree, regression, neural network, nearest neighbor, etc.) and description approaches (clustering, principal components, etc.)?

- How many data mining techniques are supported?

- A detailed illustration of how to run the model you chose: how to set up the parameters, how to run the model, how to get the results, and how the results and running time compare to Enterprise Miner.

- How is the scoring process (scoring means applying the model to a new data set) performed in the software?

If you built more than one model, also consider the following in your report:

- Does the software support model comparison? If yes, how?
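The distinction between fitting a model and scoring new data can be sketched as follows. The model here is a deliberately simple one-variable cutoff rule chosen for illustration, not a method the assignment prescribes; the function names and the numbers are hypothetical.

```python
import statistics


def fit_threshold_model(values, labels, positive):
    """Fit a one-variable rule: the cutoff is the midpoint of the two class means."""
    pos_values = [v for v, y in zip(values, labels) if y == positive]
    neg_values = [v for v, y in zip(values, labels) if y != positive]
    pos_mean = statistics.mean(pos_values)
    neg_mean = statistics.mean(neg_values)
    return {
        "cutoff": (pos_mean + neg_mean) / 2,
        # Records whether the positive class lies above or below the cutoff.
        "positive_is_high": pos_mean > neg_mean,
    }


def score(model, new_values):
    """Apply a fitted model to new observations; True means predicted positive."""
    if model["positive_is_high"]:
        return [v > model["cutoff"] for v in new_values]
    return [v < model["cutoff"] for v in new_values]


if __name__ == "__main__":
    # Made-up training data: small values are "low", large values are "high".
    model = fit_threshold_model(
        [1, 2, 3, 7, 8, 9],
        ["low", "low", "low", "high", "high", "high"],
        positive="high",
    )
    # Scoring uses only the fitted model, never the training labels.
    print(score(model, [4, 6, 10]))
```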

Step 6. Conclusion and some additional evaluation

Overall, is the software easy to learn? Is it easy to use? How does it compare to Enterprise Miner in general? Would you recommend it to new data miners? ....

The final report should be no longer than 25 pages (single-sided, single-spaced, 12-point font).

About the Presentation

If you decide to present on SAS day, it will be a poster presentation. Please contact Dr. Priestley as soon as possible to reserve a slot and to get suggestions about the format of the poster.

Other teams, please prepare slides for an in-class presentation about what you did.

We will discuss in class how long you have for the presentation. Every person should present part of the project. You will get a separate grade for the presentation. Please refer to the syllabus for the grading rubrics.

Some suggestions for the slides

1. Include a clear outline/agenda/schedule.

2. Avoid using more than six lines of text, and minimize the number of words on each visual aid.

3. Simple is better; avoid unnecessary formatting. We are interested in your technical content, not your PowerPoint skills.

4. Put your company or university logo on your title slide only; this is a technical presentation to your peers, not a marketing pitch to a customer.

5. Use spell check.

6. Avoid flashy, Christmas-light color schemes and other distractions.

Some general suggestions for presentations:

• A fast presentation is one slide per minute. A more relaxed pace would be two minutes per slide.

• Practice the presentation. The grading rubrics in the syllabus describe what is expected of an outstanding presentation.

• Time your practice sessions to ensure you keep within your allotted time. Remember a team has a time limit and points will be deducted if the presentation is too long.

• Never read the slides verbatim.
