Ask Computer Engineering Expert

Hand in: A printed report about your experiments. Electronic copies of your report, source and parameter files (online hand-in). Use Excel to create performance graphs, to be included in your report. Submit electronic copies of all data and report files using the "submit5p71" script on sandcastle. It will submit the entire directory substructure of your assignment.

System: You are free to use any GP system you want (ECJ, lilGP, Open Beagle, RobGP, etc.).The system that will be discussed in class is lilGP 1.1. This is a C-based GP system with the same functionality as Koza's GP system in his first and second books. It also has many additional enhancements (strong typing, multiple populations, etc.). However, ECJ has been the most popular choice for students, as it is more contemporary, better maintained, has more features, and is documented. See the end of the assignment for the location of these systems.

A. Symbolic regression

This question will introduce you to genetic programming. You are to take an existing symbolic regression example, compile it for the platform you are using, and execute it to get some results. The lilGP, ECJ, and Open Beagle systems all include this example. Once you get it running, modify the training initialization setup, by reading in the training points from a text file. You should have the total number of points to process and the name of the text file as parameters in the "input file" user parameter file. This will let you run your system on new data sets, without having to recompile the GP system. (Note that this modification will be useful for part B below).

Try the system a few times, on new data you have put into a new training file. Also try it on a few variations of GP languages. For example, try it on the default language, as well as a stronger language (add new functions).

Decide on two comparative experiments to do, using the same training data. You might compare the GP language variants, or selection strategy, or fitness evaluation scoring. Do 10 runs using each of these experimental variations. Your report should compare these two experiments. A summary table consisting of the average over 10 runs of all the population averages, and average "best" at the end of each run, should be computed for each experiment, and put into a table. Performance graphs showing the overall run averages of the population average and best per generations should also be plotted in Excel.

B. Auto MPG Data Set

The Auto MPG data set is a famous classification problem from the UCI Machine Learning Repository:
http://archive.ics.uci.edu/ml/datasets/Auto+MPG

You are to use genetic programming to evolve a rule that predicts the fuel consumption (mpg, miles per gallon) for a given var. The data set has 398 cases, all with the following numeric attributes:

1. mpg: continuous
2. cylinders: multi-valued discrete
3. displacement: continuous
4. horsepower: continuous
5. weight: continuous
6. acceleration: continuous
7. model year: multi-valued discrete
8. origin: multi-valued discrete
9. car name: string (unique for each instance)

Here are three examples from the database, for three different cars:

25.0 4 113.0 95.00 2228. 14.0 71 3 "toyota corona"
25.0 4 98.00 ? 2046. 19.0 71 1 "ford pinto"
19.0 6 232.0 100.0 2634. 13.0 71 1 "amc gremlin"

Attributes 1, 3, 4, 5, and 6 are continuous (floating point); attributes 2, 7, 8 are discrete (integers). Attribute 9 is a string of the car name. You should ignore this value, as it will not be useful for GP to use. (You can load the data into Excel, and delete that column). Once this is done, then all values in the data are numeric (with an exception mentioned below). Since all the values are numeric, it is not necessary to use strongly typed GP. (However, strong typing can still be useful.)

Attribute #1 is the MPG for the car. It is the "answer value" for a car example. We want to evolve a GP program will evolve an expression that uses attributes 2 through 8, to determine attribute 1. For example, if we supply an evolved GP expression with the attribute values for the Toyota Corona, we want the GP program to return a value close to 25.0. Of course, do not give your GP tree expression access to the MPG value, because it is the solution (that would be like giving students an exam key during an exam!).

The Ford Pinto example has a missing horsepower ("?"). There are 6 examples in the database that have missing values like this. Please delete these examples the database.

Here is a recommended approach to follow.

1. Read the database into a large table, one row per example. You should randomly shuffle the rows. Then split the table into 2 independent sets of examples - a training set, and a testing set. The exact size of these sets is up to you, and may have to be experimentally determined. Be aware that machine learning is usually affected by the size of training sets. For example, sometimes smaller training sets are more effective.

2. Define a set of language primitives to be used by GP. These primitives should work sensibly on the input data. A possible set of primitives to consider are:
a. functions: arithmetic operators, min and max, relationals, if-then-else,...
b. terminals: attribute variables (7), and ephemeral random constants.

3. The top-level wrapper that executes each classifier program works as follows. The 7 attribute values, from a row in the table, are copied into a set of 7 variables accessible by the GP expression, and which will take the form of leaf terminals within the tree. The program is executed by the GP system interpreter using these values. The resulting answer is recorded. The fitness function will then compare this answer to the known quality solution from the example table, and the fitness score will be appropriately updated (see next point). So in other words, the following pseudo-code shows the general execution strategy:

float cylinders, displacement, horsepower, ..., origin; int ans;

loop for i=0 to N-1 : // N training examples from table cylinders = training[i][1];
displacement = training[i][2]; horsepower = training[i][3];
(... etc. all the way to training[i][7])
computed_mpg = execute_tree(t); // t is the current GP tree
// now the predicted quality found in ans can be compared to
// real recorded quality of that example,
// perhaps stored in integer array ans[i] (i.e. training[i][0]).

4. Fitness is based upon how well the GP answers match the "real" mpg for cars as recorded in the database. There are a few options.
(i) Compute a sum of absolute error between the actual mpg and target mpg for a car:
∑ abs( computed_mpg - ans[i] ) This summation is computed over all the training examples.
(ii) Compute the sum of errors squared:
∑ (computed_mpg - ans[i]) ^ 2 This tends to punish large differences in predicted mpg.

Note that both these formula prefer smaller sums (an exact match would have a sum of 0.0).

5. At the end of the run, you must test the best GP solution by running it on the testing set (i.e. all the examples not used for training). This gives an idea of the generality of your evolved solution when given new data it has never seen. Record the test performance of each run. This information will be included in the report.

6. Do 10 runs, each with different random number seeds and shuffled data sets. Collate the performance of your runs together. Your report should have a table summarizing all the runs, for both the training scores and testing scores. You should report the average best solution scores, as well as the scores for the overall best solution from all runs. Include a confusion matrix with each experiment. Also include performance

graphs showing the population fitness and best fitness plotted over generations (averaged over for all the runs).

7. Your report should describe all relevant aspects of your experiment. The goal of the report is to give enough information so that someone else could reproduce your experiments. Describe your problem set (and its source), and describe what your GP program is attempting to do. Be sure to describe your GP language, all GP parameters, and your fitness evaluation methodology. Include source code of evolved GP programs for your 2 best rules obtained. If they are large, they can be put into an appendix of your report. Include a discussion on your results. Tables and graphs do not speak for themselves, and so you must describe and discuss the important features in them. Also include a conclusion section that summarizes the strengths or weaknesses of your experiment.

The report format should be:

1. Introduction: problem description
2. Experiment details:
a. parameters for GP
b. fitness function
c. GP language
d. format of data
e. variations of experiments
3. Results
a. tables of results (best, avg, etc.), confusion matrices
b. graphs
4. Analytical discussion of results
5. Conclusions
6. Bibliography (data set URL, references,...)

Please write separate reports for parts A Regression and B Auto MPG.

https://drive.google.com/file/d/0B3-8Q4yDXVU8SkNDRW4xT0gzWUk/view

Attachment:- Genetic programming.zip

Attachment:- Tutorial.txt

Computer Engineering, Engineering

  • Category:- Computer Engineering
  • Reference No.:- M92238233

Have any Question?


Related Questions in Computer Engineering

Does bmw have a guided missile corporate culture and

Does BMW have a guided missile corporate culture, and incubator corporate culture, a family corporate culture, or an Eiffel tower corporate culture?

Rebecca borrows 10000 at 18 compounded annually she pays

Rebecca borrows $10,000 at 18% compounded annually. She pays off the loan over a 5-year period with annual payments, starting at year 1. Each successive payment is $700 greater than the previous payment. (a) How much was ...

Jeff decides to start saving some money from this upcoming

Jeff decides to start saving some money from this upcoming month onwards. He decides to save only $500 at first, but each month he will increase the amount invested by $100. He will do it for 60 months (including the fir ...

Suppose you make 30 annual investments in a fund that pays

Suppose you make 30 annual investments in a fund that pays 6% compounded annually. If your first deposit is $7,500 and each successive deposit is 6% greater than the preceding deposit, how much will be in the fund immedi ...

Question -under what circumstances is it ethical if ever to

Question :- Under what circumstances is it ethical, if ever, to use consumer information in marketing research? Explain why you consider it ethical or unethical.

What are the differences between four types of economics

What are the differences between four types of economics evaluations and their differences with other two (budget impact analysis (BIA) and cost of illness (COI) studies)?

What type of economic system does norway have explain some

What type of economic system does Norway have? Explain some of the benefits of this system to the country and some of the drawbacks,

Among the who imf and wto which of these governmental

Among the WHO, IMF, and WTO, which of these governmental institutions do you feel has most profoundly shaped healthcare outcomes in low-income countries and why? Please support your reasons with examples and research/doc ...

A real estate developer will build two different types of

A real estate developer will build two different types of apartments in a residential area: one- bedroom apartments and two-bedroom apartments. In addition, the developer will build either a swimming pool or a tennis cou ...

Question what some of the reasons that evolutionary models

Question : What some of the reasons that evolutionary models are considered by many to be the best approach to software development. The response must be typed, single spaced, must be in times new roman font (size 12) an ...

  • 4,153,160 Questions Asked
  • 13,132 Experts
  • 2,558,936 Questions Answered

Ask Experts for help!!

Looking for Assignment Help?

Start excelling in your Courses, Get help with Assignment

Write us your full requirement for evaluation and you will receive response within 20 minutes turnaround time.

Ask Now Help with Problems, Get a Best Answer

Why might a bank avoid the use of interest rate swaps even

Why might a bank avoid the use of interest rate swaps, even when the institution is exposed to significant interest rate

Describe the difference between zero coupon bonds and

Describe the difference between zero coupon bonds and coupon bonds. Under what conditions will a coupon bond sell at a p

Compute the present value of an annuity of 880 per year

Compute the present value of an annuity of $ 880 per year for 16 years, given a discount rate of 6 percent per annum. As

Compute the present value of an 1150 payment made in ten

Compute the present value of an $1,150 payment made in ten years when the discount rate is 12 percent. (Do not round int

Compute the present value of an annuity of 699 per year

Compute the present value of an annuity of $ 699 per year for 19 years, given a discount rate of 6 percent per annum. As