Ask Computer Engineering Expert

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
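One of the sampling techniques mentioned above is random undersampling of the majority class. The sketch below is only an illustration under stated assumptions: labels are 1 for binding-site kmers and 0 for everything else, and the function name undersample and its ratio parameter are hypothetical, not part of the assignment.

```python
import random

def undersample(samples, labels, ratio=1.0, seed=0):
    """Randomly drop negatives so that at most ratio * (#positives) remain.

    Assumption: labels are 1 (binding site) or 0 (non-binding site).
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than actually exist.
    keep = min(len(negatives), int(ratio * len(positives)))
    chosen = positives + rng.sample(negatives, keep)
    rng.shuffle(chosen)
    return [samples[i] for i in chosen], [labels[i] for i in chosen]

# Toy data: 2 positives among 10 kmers (made-up values for illustration).
kmers = ["k%d" % i for i in range(10)]
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
balanced_kmers, balanced_labels = undersample(kmers, labels)
```

Undersampling should be applied to the training partition only, so that the test partition keeps the original class distribution.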

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
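The four measures can be computed from the counts of true and false positives and negatives. The following is a minimal sketch, assuming binary labels (1 for binding site, 0 otherwise), that "recognition rate" means overall accuracy, and that F-measure means the balanced F1 score. The function name evaluate is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall, F-measure and recognition rate.

    Assumptions: labels are 0/1, recognition rate is taken to be accuracy,
    and F-measure is the harmonic mean of precision and recall (F1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "recognition_rate": rate}

# Made-up labels for illustration: one TP, one FP, one FN, one TN.
scores = evaluate([1, 0, 1, 0], [1, 1, 0, 0])
```

On a heavily imbalanced dataset the recognition rate alone can look high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.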

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites ACACGGGA and ACACGGGA (shown in uppercase) in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because they have at least 50% overlap with a true binding site. Take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
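The overlap rule above can be checked with a short script. This is a sketch only, assuming that true binding sites are marked as uppercase runs as in sequence (1) and that overlap is measured as the fraction of the kmer's positions covered by a single site. The helper names find_sites, overlap_fraction and is_motif_instance are hypothetical.

```python
import re

SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"

def find_sites(seq):
    """Return (start, end) spans of uppercase runs, end exclusive.

    Assumption: binding sites are the uppercase stretches of the sequence.
    """
    return [m.span() for m in re.finditer(r"[A-Z]+", seq)]

def overlap_fraction(start, k, sites):
    """Largest fraction of the kmer [start, start + k) covered by one site."""
    best = 0
    for s, e in sites:
        best = max(best, min(start + k, e) - max(start, s))
    return best / k

def is_motif_instance(start, k, sites, threshold=0.5):
    """Apply the 'at least 50% overlap' rule from the assignment text."""
    return overlap_fraction(start, k, sites) >= threshold

sites = find_sites(SEQ)
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = SEQ.find(kmer)  # case-sensitive, so each kmer is located uniquely
    frac = overlap_fraction(start, len(kmer), sites)
    print(kmer, frac, is_motif_instance(start, len(kmer), sites))
# prints:
# acaaACAC 0.5 True
# ACGGGAtc 0.75 True
# GAgaatta 0.25 False
```

The resulting True/False value serves as the ground-truth label for each kmer, against which the classifier's prediction is then compared.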

  • Category:- Computer Engineering
  • Reference No.:- M9533129
