The project must be carried out using any programming language or one of the suggested platforms and libraries: references to them are listed here and are also available on Blackboard.

·         KNIME, open source Data Mining platform (http://www.knime.org).

·         Weka, open source ML library in Java (http://www.cs.waikato.ac.nz/ml/weka).

·         R, free programming language for statistical computing (http://www.r-project.org).

The following data files are required for this coursework and are provided on Blackboard:

·         wine.csv (data file for tasks 1 and 2)

·         training100Ku.csv (data file for task 3)

·         test1K.csv (data file for task 3)

Wine dataset for Task #1 and Task #2

The data set (wine.csv) is obtained from a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 chemical constituents found in each wine. Each data record contains the cultivar ID (1, 2 or 3) and 13 numerical attributes.

Task #1 – Data Exploration and Clustering

You are required to perform a clustering analysis of the multidimensional data set described above. This task has to be carried out twice: with and without normalisation.

Task1.1: Clustering without normalisation

Apply Principal Component Analysis (PCA) to generate two-dimensional coordinates and a 2D plot (plot1) of the records. Colour each data point in plot1 according to its class label. Apply a clustering algorithm to the data set to generate three partitions. Generate a second 2D plot (plot2) using the same PCA projection, with the colour now indicating the cluster ID (use different colours from those in plot1), and compare it with plot1. For the records in each cluster, generate a separate 2D plot (plot3a, plot3b, plot3c) coloured by class label (same colours as plot1), and visually verify the distribution of class labels in each cluster.
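As a rough illustration of the clustering step, the sketch below implements plain k-means using only the Python standard library. The brief does not prescribe a clustering algorithm, so the choice of k-means, the farthest-first initialisation and all function names are assumptions made for this example; KNIME, Weka and R ship their own clustering implementations.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points, k):
    """Deterministic start (an assumption of this sketch): take the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k=3, max_iter=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    centroids = farthest_first_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:  # no centroid moved, so the partition is stable
            break
        centroids = updated
    return labels, centroids
```

Calling `kmeans(records, k=3)` on a list of 13-value attribute vectors would give one cluster ID per record, which can then be used to colour plot2 and to split the records for plot3a, plot3b and plot3c.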

Select, describe and apply at least one cluster validity measure, and include the results in the report.

Task1.2: Clustering with normalisation

Apply a normalisation pre-processing step to the data set and repeat the steps of Task1.1. Compare the new plots and the cluster validity measure with the previous ones.
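The brief does not say which normalisation to use. One common option is z-score standardisation, which rescales every attribute to zero mean and unit standard deviation so that attributes with large numeric ranges do not dominate distance computations. The following is a minimal sketch under that assumption; the function name is hypothetical, and min-max scaling would be an equally valid choice.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardise each column of a list of numeric rows.

    Columns with zero spread are mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population standard deviation
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

Only the 13 numerical attributes should be passed in; the cultivar ID is a class label and should be left out of the scaling.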

The submission for Task #1 must contain two components:

·         a report section dedicated to your solution for Task #1,

·         any KNIME workflow(*) and source code used (a zip/jar archive).

 

Task #2 – Comparison of Classification Models

You are required to learn and test classification models for the wine data set. For this task you need to carry out a performance comparison of TWO different classification algorithms. You should use a 10-fold cross-validation method to estimate the generalisation error.

In the report you should briefly describe the two algorithms and the method used to compare them.
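To make the 10-fold cross-validation procedure concrete, here is a small sketch in standard-library Python. It shuffles the record indices, splits them into k folds, and averages the misclassification rate over the folds. The names `k_fold_indices`, `cross_validated_error` and the `fit_predict` callback are hypothetical and stand in for whichever classifiers are chosen; the sketch also assumes there are at least k records.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validated_error(fit_predict, X, y, k=10, seed=0):
    """Average misclassification rate of fit_predict over k folds.

    fit_predict(train_X, train_y, test_X) must return one label per test row.
    """
    errors = []
    for train, test in k_fold_indices(len(X), k, seed):
        predictions = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        wrong = sum(p != y[i] for p, i in zip(predictions, test))
        errors.append(wrong / len(test))
    return sum(errors) / k
```

Passing the same `seed` for both algorithms evaluates them on identical folds, which is one reasonable way to keep the comparison fair.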

The submission for Task #2 must contain two components:

·         a report section dedicated to your solution for Task #2,

·         any KNIME workflow(*) and source code used (a zip/jar archive).

 

Task #3 – The Search for God Particle: a Binary Classification Challenge

CERN’s Large Hadron Collider (LHC) typically produces approximately 10^11 collisions per hour, and about 300 (0.0000003%) of these collisions result in a Higgs boson, the so-called God particle. Detecting when interesting particles are produced is an important challenge, which is typically studied by means of simulations. The data set for this task comes from simulations of collision events, which can be used to train a classification model to distinguish between collisions producing particles of interest (signal) and those producing other particles (background).

Two data files are provided: the training set (training100Ku.csv) and the test set (test1K.csv). The training set file has 100,000 records, each containing, in this order, 21 numerical low-level attributes, 7 high-level attributes and the class label (signal/background). The low-level attributes are kinematic properties measured by the particle detectors in the accelerator during the experiment. The high-level attributes are computed after the experiment, by means of a complex model, as functions of the low-level attributes (feature transformation).

The test set has 1,000 records, each containing a unique record identifier and 21 numerical low-level attributes (the same measurements in the same order as in the training set). The 7 high-level attributes and the class label are not present.

Your task is to predict the class label for the records of the test set. The resulting predictions must be submitted as a single file (CSV format) with only two columns: the record ID and the predicted class label (signal/background).
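Because the test set lacks the 7 high-level attributes, a model trained on the raw files can only rely on the first 21 columns of each training record. The sketch below shows one way to split a training row by position and to write the two-column prediction file with Python's `csv` module. The function names, the header row and the column names `id` and `class` are assumptions, since the brief does not specify them.

```python
import csv


def split_training_row(row):
    """Split one training record into (low_level, high_level, label).

    Assumes the column order stated in the brief: 21 low-level attributes,
    7 high-level attributes, then the class label.
    """
    low_level = [float(v) for v in row[:21]]
    high_level = [float(v) for v in row[21:28]]
    label = row[28]
    return low_level, high_level, label


def write_predictions(record_ids, labels, out_file):
    """Write one (record ID, predicted label) pair per line to an open file."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["id", "class"])  # hypothetical header; drop it if not wanted
    for record_id, label in zip(record_ids, labels):
        writer.writerow([record_id, label])
```

A typical call would open the output with `open("Task3-predictions.csv", "w", newline="")` and pass the resulting file object as `out_file`.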

You must also include a section in the report to describe the method used to generate the submitted predictions and an estimation of these performance indices: accuracy, F-measure, precision and recall.
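The four performance indices can be computed from the counts of true and false positives and negatives. The sketch below treats "signal" as the positive class, which is an assumption, as is the function name. Since the test set has no labels, such estimates would have to come from held-out training data, for example a validation split.

```python
def binary_metrics(y_true, y_pred, positive="signal"):
    """Return accuracy, precision, recall and F-measure (F1) as a dict."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```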

In summary, the submission for Task #3 must contain three components:

·         a report section dedicated to your solution for Task #3,

·         any KNIME workflow(*) and/or source code used (a zip/jar archive) and

·         the file “Task3-predictions.csv”.

(*) Important: do not include data when you export a KNIME workflow as a zip archive.
