Assignment Description
CHEM 450: Spring 2017: Python Project. 10 points.
In this project you will develop a Python script to calculate characteristics of a nucleotide sequence.
On Cougar Courses you will find a file named “NucleotideSequence.txt”. This file is in the familiar FASTA format: the first line starts with a “>” character and contains the name of the nucleotide sequence, and the subsequent lines contain the nucleotide sequence itself.
Your program should do the following:
1. Read the FASTA file containing the nucleotide sequence.
2. Calculate the AT and GC content of the nucleotide sequence.
The GC content of a nucleotide sequence is generally taken as an indicator of molecular stability. A nucleotide sequence with a higher GC content than AT content is said to be more stable because G-C pairs form three hydrogen bonds, compared to only two between A-T pairs. The GC content largely determines the melting temperature of double-stranded nucleotide sequences, so it must be taken into consideration in applications such as PCR, where double-stranded DNA is melted to produce single strands. The GC content (percentage) is defined as:
(number of G nucleotides + number of C nucleotides)*100/sequence length
The AT content is defined in a similar manner. Note that the GC/AT content can also be reported as a ratio.
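As an illustration of the formula above, the following is a minimal sketch of the percentage calculation. The function name base_content and the short test sequence are made up for this example and are not part of the assignment; the sketch also assumes the sequence contains only the letters A, T, G and C.

```python
def base_content(sequence):
    """Return (at_percent, gc_percent) for a nucleotide sequence string."""
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence is empty")
    # (number of G + number of C) * 100 / sequence length
    gc_percent = (seq.count("G") + seq.count("C")) * 100 / length
    # The AT content is defined in the same way
    at_percent = (seq.count("A") + seq.count("T")) * 100 / length
    return at_percent, gc_percent


# Made-up six-base sequence: 2 of 6 bases are A/T, 4 of 6 are G/C
at, gc = base_content("ATGCGC")
print(f"AT content: {at:.1f}%")
print(f"GC content: {gc:.1f}%")
```

If the ratio form is wanted instead, dividing gc_percent by at_percent gives the GC/AT ratio, provided the AT content is not zero.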
3. Find the complementary strand of the nucleotide sequence.
4. Determine the coding region of the nucleotide sequence given the following information:
The nucleotide sequence given in the FASTA file has two exons and one intron. The first exon spans from nucleotide 1 to 63 and the second exon spans from nucleotide 91 to the end of the sequence.
5. Find the mRNA strand transcribed from the coding region.
6. Calculate the number of amino acids that would be translated from the mRNA strand (determined in step 5).
7. Print out the following (formatted in a manner that is easy to understand for the user):
a. The name of the nucleotide sequence
b. The number of bases in the nucleotide sequence
c. The AT content of the nucleotide sequence
d. The GC content of the nucleotide sequence
e. The complementary strand of the nucleotide sequence
f. The coding region of the nucleotide sequence (i.e. the sequence without the intron)
g. The mRNA strand transcribed from the coding region
h. The number of amino acids that would be translated from the mRNA strand
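
The remaining steps could be sketched roughly as follows. This is one possible approach, not a reference solution, and it rests on several assumptions that the assignment does not state: the FASTA file holds a single record; the given sequence is the coding (sense) strand, so the mRNA is the same sequence with T replaced by U; the complementary strand is reported base by base without reversing it; and amino acids are counted as complete codons read from the first base up to, but not including, the first stop codon. All function names and the short demo sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
COMPLEMENT = str.maketrans("ATGC", "TACG")


def parse_fasta(lines):
    """Return (name, sequence) from the lines of a single-record FASTA file."""
    name, parts = "", []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()   # header line without the ">"
        else:
            parts.append(line.upper())
    return name, "".join(parts)


def complement(seq):
    """Base-by-base complement (A<->T, G<->C), not reversed."""
    return seq.translate(COMPLEMENT)


def coding_region(seq, exon1_end=63, exon2_start=91):
    """Join the two exons; positions are 1-based and inclusive."""
    # Nucleotides 1..63 are seq[0:63]; nucleotides 91..end are seq[90:]
    return seq[:exon1_end] + seq[exon2_start - 1:]


def transcribe(coding):
    """Assumes the input is the sense strand: mRNA is the same with U for T."""
    return coding.replace("T", "U")


def count_amino_acids(mrna):
    """Count complete codons from the first base until a stop codon."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


# Made-up record, with exon boundaries shortened to fit the tiny sequence
name, seq = parse_fasta([">demo_sequence", "ATGGCC", "TAAGGG"])
coding = coding_region(seq, exon1_end=6, exon2_start=10)
mrna = transcribe(coding)

print("Name:                 ", name)
print("Number of bases:      ", len(seq))
print("Complementary strand: ", complement(seq))
print("Coding region:        ", coding)
print("mRNA strand:          ", mrna)
print("Amino acids:          ", count_amino_acids(mrna))
```

For the actual file, the same parse_fasta function can be given an open file handle, for example inside "with open("NucleotideSequence.txt") as handle:", and coding_region can then be called with its default exon boundaries of 63 and 91. If the course treats the given sequence as the template strand instead, transcribe would need to take the complement before replacing T with U.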