Ask Homework Help/Study Tips Expert

The project must be carried out using any programming language or one of the suggested

platforms and libraries: references to them are listed here and are also available on Blackboard.

·         KNIME, open source Data Mining platform (http://www.knime.org).

·         Weka, open source ML library in Java (http://www.cs.waikato.ac.nz/ml/weka).

·         R, free programming language for statistical computing (http://www.r-project.org).

The following data files are required for this coursework and are provided in Blackboard:

·         wine.csv (data file for tasks 1 and 2)

·         training100Ku.csv (data file for tasks 3)

·         test1K.csv (data file for tasks 3)

·

Wine dataset for Task #1 and Task #2

The data set (wine.csv) is obtained from a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 chemical constituents found in each wine. Each data record contains the cultivar ID (1, 2 or 3) and 13 numerical attributes.

Task #1 – Data Exploration and Clustering

You are required to perform a clustering analysis for the multidimensional data set indicated above. This task has to be carried out two times: with and without normalisation.

Task1.1: Clustering without normalisation

Apply Principal Component Analysis (PCA) to generate two-dimensional coordinates and a 2D plot (plot1) of the records. The data points in plot1 should be represented with a colour associated to their class label. Apply a clustering algorithm to the data set to generate three partitions. Generate a 2D plot (plot2) based on the same PCA projection, similarly to the previous one, where the colour is associated to the cluster ID (use different colours w.r.t. plot1), and compare it with plot1. For the records associated to each cluster generate a 2D plot (plot3a, plot3b, plot3c) with colour associated to the class label (same colours of plot1): visually verify the distribution of class labels in each cluster.

Select, describe and apply at least one cluster validity measure: report the results in the report. Task1.2: Clustering with normalisation

Apply a normalisation pre-processing to the data set and repeat the steps of the part 1. Compare the new plots and the cluster validity measure with the previous ones.

The submission for Task #1 must contain two components:

·         a report section dedicated to your solution for Task #1,

·         any KNIME workflow(*) and source code used (a zip/jar archive).

 

Task #2 – Comparison of Classification Models

You are required to learn and test classification models for the wine data set. For this task you need to carry out a performance comparison of TWO different classification algorithms. You should use a 10-fold cross-validation method to estimate the generalisation error.

In the report you should briefly describe the two algorithms and the method used to compare the two algorithms.

The submission for Task #2 must contain two components:

·         a report section dedicated to your solution for Task #2,

·         any KNIME workflow(*) and source code used (a zip/jar archive).

 

Task #3 – The Search for God Particle: a Binary Classification Challenge

The CERN’s Large Hadron Collider (LHC) typically produces approximately 1011 collisions per hour and about 300 (0.0000003%) of these collisions result in a Higgs boson, the so called God particle. Detecting when interesting particles are produced is an important challenge, which is typically studied by the use of simulations. The data set for this task is related to simulations of collision events, which can be used to train a classification model to distinguish between collisions producing particles of interest (signal) and those producing other particles (background).

 Two data files are provided: the training set (training100Ku.csv) and the test set (test1K.csv). The training set file has 100,000 records, each containing, in this order, 21 numerical low-level attributes, 7 high-level attributes and the class label (signal/background). The low-level attributes are kinematic properties measured by the particle detectors in the accelerator during the experiment. The high-level attributes are computed after the experiment by means of some complex model as function of the low-level attributes (feature transformation).

The test set has 1,000 records, each containing a unique record identifier and 21 numerical low-level attributes (the same measurements in the same order as in the training set). The 7 high-level attributes and the class label are not present.

Your task is to predict the class label for the records of the test set. The resulting predictions must be submitted as a single file (CSV format) with only two columns: the record ID and the predicted class label (signal/background).

You must also include a section in the report to describe the method used to generate the submitted predictions and an estimation of these performance indices: accuracy, F-measure, precision and recall.

In summary, the submission for Task #3 must contain three components:

·         a report section dedicated to your solution for Task #3,

·         any KNIME workflow(*) and/or source code used (a zip/jar archive) and

·         the file “Task3-predictions.csv”.

 

 

 

(*) Important: do not include data when you export a KNIME workflow as a zip archive.

Homework Help/Study Tips, Others

  • Category:- Homework Help/Study Tips
  • Reference No.:- M91708538
  • Price:- $170

Guranteed 48 Hours Delivery, In Price:- $170

Have any Question?


Related Questions in Homework Help/Study Tips

Review the website airmail service from the smithsonian

Review the website Airmail Service from the Smithsonian National Postal Museum that is dedicated to the history of the U.S. Air Mail Service. Go to the Airmail in America link and explore the additional tabs along the le ...

Read the article frank whittle and the race for the jet

Read the article Frank Whittle and the Race for the Jet from "Historynet" describing the historical influences of Sir Frank Whittle and his early work contributions to jet engine technologies. Prepare a presentation high ...

Overviewnow that we have had an introduction to the context

Overview Now that we have had an introduction to the context of Jesus' life and an overview of the Biblical gospels, we are now ready to take a look at the earliest gospel written about Jesus - the Gospel of Mark. In thi ...

Fitness projectstudents will design and implement a six

Fitness Project Students will design and implement a six week long fitness program for a family member, friend or co-worker. The fitness program will be based on concepts discussed in class. Students will provide justifi ...

Read grand canyon collision - the greatest commercial air

Read Grand Canyon Collision - The greatest commercial air tragedy of its day! from doney, which details the circumstances surrounding one of the most prolific aircraft accidents of all time-the June 1956 mid-air collisio ...

Qestion anti-trustprior to completing the assignment

Question: Anti-Trust Prior to completing the assignment, review Chapter 4 of your course text. You are a manager with 5 years of experience and need to write a report for senior management on how your firm can avoid the ...

Question how has the patient and affordable care act of

Question: How has the Patient and Affordable Care Act of 2010 (the "Health Care Reform Act") reshaped financial arrangements between hospitals, physicians, and other providers with Medicare making a single payment for al ...

Plate tectonicsthe learning objectives for chapter 2 and

Plate Tectonics The Learning Objectives for Chapter 2 and this web quest is to learn about and become familiar with: Plate Boundary Types Plate Boundary Interactions Plate Tectonic Map of the World Past Plate Movement an ...

Question critical case for billing amp codingcomplete the

Question: Critical Case for Billing & Coding Complete the Critical Case for Billing & Coding simulation within the LearnScape platform. You will need to create a single Microsoft Word file and save it to your computer. A ...

Review the cba provided in the resources section between

Review the CBA provided in the resources section between the Trustees of Columbia University and Local 2110 International Union of Technical, Office, and Professional Workers. Describe how this is similar to a "contract" ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As