Coding for biologists:

SUBMISSION INSTRUCTIONS

You should submit a single zipped file containing the entire work directory for the assignment.

This should include: all FASTA files, all of your code, and an iPython notebook with the details of your work. All code should be either included in the notebook, or written in separate files that are either imported or run from the notebook via the %run iPython magic; see https://ipython.org/ipython-doc/rel-0.10.2/html/interactive/tutorial.html

All output and all comments should appear in the notebook. It should be possible to run the entire notebook by running the cells sequentially from the beginning to the end (check that this works by restarting the kernel and working through the notebook from the top). Code and graphic output not linked to (directly or indirectly) from the notebook will not be marked. Only the notebook, Python code, text files required by the software and graphic output produced by the software will be marked. Comment your code thoroughly and format it properly.

MARKING CRITERIA

Your work will be marked based on:
- completeness and correctness: 60%
- quality of the algorithmic solutions (including appropriate use of data and control structures, use of functions, etc.): 30%
- coding style (comments, variable names, readability of code): 10%

Outline

For this assignment, you will implement a simple utility for drawing dotplots comparing two proteins. You can refer to the dotter program and the lecture notes for the Computational Genomics module for inspiration. The assignment is presented as a sequence of stages.

Attempt all questions in the "Required functionality" part before implementing any features marked as "Optional functionality". You can implement any subset you like of the optional functionality. Check that your program runs correctly in the terminal, then use the iPython %run magic to run it from within a notebook. Include sample output for each functionality you implement and any other relevant information in the notebook.

Indicate clearly near the top of the notebook which of the questions you have attempted.

Required functionality

a) Write a dotplot program that reads two proteins from FASTA files specified on the command line (see sys.argv in the Python documentation). The program should output a simple dotplot to the terminal. The dotplot should involve only the first 70 residues of the sequence displayed horizontally and the first 20 residues of the sequence displayed vertically, so as to fit in the standard terminal screen. The first row and the first column should display the two sequences. In the dotplot proper, an asterisk (*) should mark locations corresponding to matching entries, while the rest should be left empty. A sample output (limited here for convenience to 10 residues from one sequence and 5 from the other) should look like:

     TSLWWAPQQR
    A     *
    K
    Q       **
    P      *
    R         *

Include a sample output in your notebook.
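
As a rough illustration of part (a), the sketch below reads the first record from FASTA-formatted lines and builds the text dotplot as a list of lines. The function names `read_first_fasta_record` and `text_dotplot` are hypothetical, and the choice to strip trailing spaces from each row is an assumption made here, not something the brief prescribes.

```python
def read_first_fasta_record(lines):
    """Return the sequence of the first record in an iterable of FASTA lines."""
    parts = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here; only the first is used
            seen_header = True
        elif line:
            parts.append(line)
    return "".join(parts).upper()


def text_dotplot(horizontal, vertical, width=70, height=20):
    """Return the dotplot rows for the first width x height residues."""
    top = horizontal[:width]
    side = vertical[:height]
    # The leading space leaves room for the column holding the vertical sequence.
    lines = [" " + top]
    for v in side:
        row = "".join("*" if h == v else " " for h in top)
        lines.append((v + row).rstrip())  # assumption: drop trailing blanks
    return lines


# The 10 x 5 sample from the brief.
for line in text_dotplot("TSLWWAPQQR", "AKQPR"):
    print(line)
```

In a full program the two sequences would come from the files named in `sys.argv[1]` and `sys.argv[2]`, for instance by passing an open file handle to `read_first_fasta_record`.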

b) Code a simple help message to be displayed when the program is invoked with wrong or insufficient arguments or with the string help on the command line. Run your program from within the notebook to display the help message. To allow for easy modification and translation, the help message should be stored in a separate text file and loaded and displayed upon request.
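
One possible shape for part (b) is sketched below. The file name `dotplot_help.txt` and the helper names are hypothetical, and the rule used to decide when the arguments count as "wrong or insufficient" (anything other than zero or two arguments) is an assumption that only covers the required functionality.

```python
HELP_FILE = "dotplot_help.txt"  # hypothetical name for the separate help file


def load_help(path=HELP_FILE):
    """Read the help text from its own file so it can be edited or translated."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        return "Help file not found: " + path


def wants_help(argv):
    """Decide whether the help message should be shown for this command line."""
    args = argv[1:]
    if "help" in args:
        return True
    # Assumption: no arguments leads to the menu of part (c),
    # two arguments are the two FASTA files, anything else is an error.
    return len(args) not in (0, 2)
```

The main program could then call `print(load_help())` whenever `wants_help(sys.argv)` is true.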

c) Program a simple menu system of the type found in clustalw that allows the user to specify the names of the input files, obtain help, and quit the program. The menu should be displayed if the program is invoked without command-line arguments, or in any case after a dotplot is produced. You should wait for the user to press the enter key before reverting to the menu, to avoid wiping out the dotplot immediately when running in a terminal. For clarity, print the following line just below the dotplot: Hit <enter> to return to menu:
Include a screenshot of the menu in the notebook.
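
A menu loop of the kind described in (c) might look like the sketch below. The option labels, the function name `run_menu`, and the placeholder text printed in place of the dotplot and help are all assumptions; the `read` and `show` parameters exist only so the loop can be exercised without a keyboard.

```python
def run_menu(read=input, show=print):
    """Loop over a small text menu until the user quits; return the file names."""
    files = [None, None]
    while True:
        show("1. Set first FASTA file  [%s]" % files[0])
        show("2. Set second FASTA file [%s]" % files[1])
        show("3. Draw dotplot")
        show("h. Help")
        show("q. Quit")
        choice = read("Your choice: ").strip().lower()
        if choice == "1":
            files[0] = read("File name: ").strip()
        elif choice == "2":
            files[1] = read("File name: ").strip()
        elif choice == "3":
            if None in files:
                show("Please set both input files first.")
            else:
                # Placeholder: the real program would print the dotplot here.
                show("(dotplot for %s vs %s goes here)" % tuple(files))
                read("Hit <enter> to return to menu: ")
        elif choice == "h":
            show("(help text goes here)")  # placeholder for the help file contents
        elif choice == "q":
            return files
        else:
            show("Unrecognised option: " + choice)
```

Calling `run_menu()` with no arguments uses the built-in `input` and `print`, which is what an interactive terminal session would need.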

d) Implement panning through the sequences to visualise the rest of the dotplot. When a dotplot is displayed, the user should have a choice to press one of five keys to "page" forwards or backwards through either sequence, or return to the main menu. Following this a different portion of the dotplot should be displayed, or the user should be returned to the main menu. For example, a text line printed just below the dotplot should read:

Enter [r]ight, [l]eft, [u]p, [d]own or [m]enu:

The system should be able to handle sequences with a number of residues that isn't a multiple of 20 or 70. Demonstrate this feature in the notebook.
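
The paging logic in (d) comes down to moving a window origin by one page and clamping it to the sequence ends. The sketch below is one way to do that; `move` and `window` are hypothetical names, and the decision to keep the view on the last (possibly partial) page rather than wrapping around is an assumption.

```python
def move(col, row, key, n_cols, n_rows, width=70, height=20):
    """Return the new (col, row) window origin after one key press."""
    if key == "r":
        col += width
    elif key == "l":
        col -= width
    elif key == "d":
        row += height
    elif key == "u":
        row -= height
    # Start of the last page, which may hold fewer than width/height residues.
    max_col = max(0, (n_cols - 1) // width * width)
    max_row = max(0, (n_rows - 1) // height * height)
    col = min(max(col, 0), max_col)
    row = min(max(row, 0), max_row)
    return col, row


def window(horizontal, vertical, col, row, width=70, height=20):
    """Slice out the part of each sequence that is currently visible."""
    return horizontal[col:col + width], vertical[row:row + height]


# A 150 x 45 comparison: the last page is only 10 residues wide and 5 tall.
col, row = move(70, 20, "r", 150, 45)
col, row = move(col, row, "d", 150, 45)
top, side = window("A" * 150, "C" * 45, col, row)
print(col, row, len(top), len(side))
```

Python slicing already tolerates an end index past the sequence length, so lengths that are not multiples of 70 or 20 need no special case in `window`.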

Optional functionality

e) Use a scoring matrix instead of a simple identity check to score corresponding amino acids. Only plot a (*) if the score is above a threshold. The scoring matrix should be stored in a separate file that is loaded as required. The user should be able to select the threshold with a command line option and through the menu; for example, mydotplot -t0.3 proteinA.fasta proteinB.fasta should select a threshold value of 0.3. Include sample output in the notebook and comment on the difference with respect to the simpler scoring scheme, if any (you can return to identity matching by choosing the identity matrix as your scoring scheme).
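
For (e), the sketch below parses a scoring matrix and applies the threshold test. The brief does not fix a file format, so the layout assumed here (a header row of residue letters, then one row per residue, with `#` comment lines allowed) is only one plausible choice, and `parse_matrix` and `is_dot` are hypothetical names.

```python
def parse_matrix(lines):
    """Build a {(row_residue, col_residue): score} dict from matrix text lines."""
    scores = {}
    columns = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if columns is None:
            columns = fields  # the first data line names the columns
            continue
        row, values = fields[0], fields[1:]
        for col, value in zip(columns, values):
            scores[(row, col)] = float(value)
    return scores


def is_dot(a, b, scores, threshold):
    """True if the pair scores strictly above the threshold."""
    # Assumption: residue pairs missing from the matrix never produce a dot.
    return scores.get((a, b), float("-inf")) > threshold


# A toy 2 x 2 identity matrix stands in for a real scoring file.
toy = parse_matrix(["# identity", "   A  R", "A  1  0", "R  0  1"])
print(is_dot("A", "A", toy, 0.3), is_dot("A", "R", toy, 0.3))
```

With an identity matrix and any threshold between 0 and 1 this reduces to the plain equality check of part (a).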

f) Implement filtering with a window of length w.

If you are not implementing (e): only draw a (*) at position (i,j) on the dotplot if the number of matching residues in corresponding positions within windows of length w centred at positions i (respectively j) on the two sequences is above a threshold t. So for instance if w=5 and t=3 a (*) should appear at any given position only if at least 3 corresponding residues within windows of length 5 match (both in the sense that they are the same residue, and that they are in the same position within the window; so for example if the two filtering windows contain "APKTR" and "AKQWR" then A and R count as matches but K does not).

If you are implementing (e): For each position (i,j) in the two sequences, pairs of amino acids in corresponding positions in the filtering windows should be scored using the scoring matrix. These scores should be averaged and compared against the threshold. A (*) should then be printed only if the resulting average score is above the threshold.
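
Both filtering variants can share one helper that walks the two windows in step. The sketch below is written under several assumptions: the function names are hypothetical, window positions that fall off either sequence end are skipped, the average is taken over the pairs actually present, and pairs missing from the scoring dictionary score 0.0.

```python
def window_pairs(seq_a, seq_b, i, j, w):
    """Yield residue pairs at matching offsets in windows centred at i and j."""
    half = w // 2
    for k in range(-half, w - half):  # exactly w offsets, centred for odd w
        a, b = i + k, j + k
        if 0 <= a < len(seq_a) and 0 <= b < len(seq_b):
            yield seq_a[a], seq_b[b]


def window_matches(seq_a, seq_b, i, j, w):
    """Count identical residues at the same offset (variant without (e))."""
    return sum(1 for x, y in window_pairs(seq_a, seq_b, i, j, w) if x == y)


def window_average(seq_a, seq_b, i, j, w, scores):
    """Average the matrix scores over the window (variant with (e))."""
    values = [scores.get((x, y), 0.0)
              for x, y in window_pairs(seq_a, seq_b, i, j, w)]
    return sum(values) / len(values) if values else 0.0


# The example windows from the brief: only A and R line up.
print(window_matches("APKTR", "AKQWR", 2, 2, 5))
```

A dot would then be drawn when `window_matches(...)` reaches t (following the "at least 3" reading of the worked example) or when `window_average(...)` exceeds the threshold.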

In either case you should implement a command line option -f to allow the user to request the use of the filter and specify the length of the window, and an option -t for threshold selection. For instance, mydotplot -f5 -t2.0 proteinA.fasta proteinB.fasta should produce a dotplot of protein A vs protein B, filtered with a window of length 5 and a threshold of 2.0. The same functionality should also be accessible through the menu. Invoke your dotplot program on two sample sequences, without and with filtering, include the output in the notebook and comment on the differences.
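
Options written as `-f5` and `-t2.0` can be handled with the standard `argparse` module, which accepts a value attached directly to a short option. The sketch below is one possible setup; the destination names `window` and `threshold` and the choice of `nargs="*"` for the file names (so that an empty command line can still fall through to the menu) are assumptions.

```python
import argparse


def build_parser():
    """Create a parser for the -f and -t options plus the FASTA file names."""
    parser = argparse.ArgumentParser(prog="mydotplot")
    parser.add_argument("-f", dest="window", type=int, default=None,
                        help="filter with a window of this length")
    parser.add_argument("-t", dest="threshold", type=float, default=None,
                        help="score threshold for drawing a dot")
    # Zero or more positional arguments; the caller checks how many were given.
    parser.add_argument("fasta", nargs="*", help="input FASTA files")
    return parser


args = build_parser().parse_args(
    ["-f5", "-t2.0", "proteinA.fasta", "proteinB.fasta"])
print(args.window, args.threshold, args.fasta)
```

In the real program `parse_args()` would be called without an argument list so that it reads `sys.argv`.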

g) Give the user the option to display the dotplot for the entire sequences using a graphic library. I suggest the imshow function from the matplotlib library, but other equivalent choices are also fine (if this library is not present on your system, use the software installer to install python-matplotlib). http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow

Note that you will not be able to display the sequences with imshow, only the dots will be displayed as an image.

For this to work, you will need to create a two-dimensional array of the appropriate size and set each single entry to 0.0 (black) or 1.0 (white) to differentiate between dots and background. You can pass the keyword argument cmap='gray' to imshow to select a grey scale colormap. If you have implemented point (e) and/or (f), you may want to display the matching score itself as a grey level, instead of creating a black-and-white (two-level) dotplot. It is still useful to set a threshold below which the point is set to white. According to your scoring scheme, you may need to rescale/normalize the thresholded scores for display with imshow (read the function description and the example carefully). Graphic output should be selectable from the command line (via option -g) and from the menu of your program. Include sample output in the notebook (you can get matplotlib images to display directly in the notebook by running the magic %matplotlib inline).
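
The two-level image described above can be sketched as follows. The matrix is built as a plain list of lists, which `imshow` accepts as array-like input; `dot_matrix` and `show_dotplot` are hypothetical names, and importing matplotlib inside the plotting function is a choice made here so that the text-only parts of a program still run where the library is missing.

```python
def dot_matrix(horizontal, vertical):
    """Return rows of 0.0 (dot, black) and 1.0 (background, white)."""
    return [[0.0 if h == v else 1.0 for h in horizontal] for v in vertical]


def show_dotplot(horizontal, vertical):
    """Display the full identity dotplot as a grey-scale image."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for graphics

    matrix = dot_matrix(horizontal, vertical)
    # vmin/vmax pin the grey scale so 0.0 is black and 1.0 is white.
    plt.imshow(matrix, cmap="gray", vmin=0.0, vmax=1.0,
               interpolation="nearest")
    plt.xlabel("horizontal sequence position")
    plt.ylabel("vertical sequence position")
    plt.show()


# Each row corresponds to one residue of the vertical sequence.
matrix = dot_matrix("TSLWWAPQQR", "AKQPR")
print(len(matrix), len(matrix[0]))
```

For grey-level scores from (e) or (f), the entries would instead hold rescaled scores between 0.0 and 1.0, with sub-threshold positions set to 1.0.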

SUBMISSION CHECKLIST:
- Notebook contains links to all relevant code and all output required
- Notebook runs in a sequence from the first cell to the last with a fresh kernel
- Notebook and software include name of author and/or student number
- No Microsoft Word or other files other than Python code, text and a notebook file, and images generated by the code (with links in the notebook)
- All relevant files are included in the submission as a single .zip file


  • Category:- Biology
  • Reference No.:- M92017581
  • Price:- $200
