Hi,
OK, so I'm new to bioinformatics and don't really know what I'm doing!
I am doing an assignment on tumor protein p53 (TP53) and have been using the NCBI website, and at the moment I'm stuck on a question.
For the record below, I am supposed to find the nucleotide sequence of the gene (which is the sequence under the ORIGIN heading, yes?) and the amino acid sequence, which I have been told is under the FEATURES heading, though I have no idea how to read the information given! Any ideas?
FEATURES Location/Qualifiers
source 1..2331
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/chromosome="17"
/map="17p13.1"
gene 1..2331
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/note="tumor protein p53"
/db_xref="GeneID:7157"
/db_xref="HGNC:11998"
/db_xref="MIM:191170"
exon 1..441
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/inference="alignment:Splign"
/number=5a
STS 243..558
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/standard_name="GDB:178567"
/db_xref="UniSTS:155019"
STS 243..486
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/standard_name="GDB:363689"
/db_xref="UniSTS:156784"
STS 243..353
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/standard_name="GDB:177724"
/db_xref="UniSTS:154952"
CDS 279..923
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/note="isoform f is encoded by transcript variant 7; p53
tumor suppressor; phosphoprotein p53; p53 antigen; p53
transformation suppressor; transformation-related protein
53"
/codon_start=1
/product="tumor protein p53 isoform f"
/protein_id="NP_001119589.1"
/db_xref="GI:187830909"
/db_xref="GeneID:7157"
/db_xref="HGNC:11998"
/db_xref="MIM:191170"
/translation="MFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRC
PHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHY
NYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKK
GEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQMLLDLRWCYFLINSS"
STS 360..434
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/standard_name="PMC340938P3"
/db_xref="UniSTS:273171"
exon 442..554
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/inference="alignment:Splign"
/number=6
exon 555..664
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/inference="alignment:Splign"
/number=7
STS 635..833
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/standard_name="PMC310707P2"
/db_xref="UniSTS:272633"
STS 639..713
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/standard_name="GDB:190076"
/db_xref="UniSTS:155620"
exon 665..801
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/inference="alignment:Splign"
/number=8
exon 802..875
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/inference="alignment:Splign"
/number=9
exon 876..935
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/inference="alignment:Splign"
/number=10b
exon 936..1042
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/inference="alignment:Splign"
/number=11
exon 1043..2331
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/inference="alignment:Splign"
/number=12
STS 1194..1310
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/standard_name="D17S1678"
/db_xref="UniSTS:82485"
STS 1651..1797
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/standard_name="D17S1506E"
/db_xref="UniSTS:151711"
STS 2186..2262
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
/standard_name="WI-20715"
/db_xref="UniSTS:59997"
polyA_signal 2295..2300
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
polyA_site 2312
/gene="TP53"
/gene_synonym="FLJ92943; LFS1; p53; TRP53"
ORIGIN
1 tgaggccagg agatggaggc tgcagtgagc tgtgatcaca ccactgtgct ccagcctgag
61 tgacagagca agaccctatc tcaaaaaaaa aaaaaaaaaa gaaaagctcc tgaggtgtag
121 acgccaactc tctctagctc gctagtgggt tgcaggaggt gcttacgcat gtttgtttct
181 ttgctgccgt cttccagttg ctttatctgt tcacttgtgc cctgactttc aactctgtct
241 ccttcctctt cctacagtac tcccctgccc tcaacaagat gttttgccaa ctggccaaga
301 cctgccctgt gcagctgtgg gttgattcca cacccccgcc cggcacccgc gtccgcgcca
361 tggccatcta caagcagtca cagcacatga cggaggttgt gaggcgctgc ccccaccatg
421 agcgctgctc agatagcgat ggtctggccc ctcctcagca tcttatccga gtggaaggaa
481 atttgcgtgt ggagtatttg gatgacagaa acacttttcg acatagtgtg gtggtgccct
541 atgagccgcc tgaggttggc tctgactgta ccaccatcca ctacaactac atgtgtaaca
601 gttcctgcat gggcggcatg aaccggaggc ccatcctcac catcatcaca ctggaagact
661 ccagtggtaa tctactggga cggaacagct ttgaggtgcg tgtttgtgcc tgtcctggga
721 gagaccggcg cacagaggaa gagaatctcc gcaagaaagg ggagcctcac cacgagctgc
781 ccccagggag cactaagcga gcactgccca acaacaccag ctcctctccc cagccaaaga
841 agaaaccact ggatggagaa tatttcaccc ttcagatgct acttgactta cgatggtgtt
901 acttcctgat aaactcgtcg taagttgaaa atattatccg tgggcgtgag cgcttcgaga
961 tgttccgaga gctgaatgag gccttggaac tcaaggatgc ccaggctggg aaggagccag
1021 gggggagcag ggctcactcc agccacctga agtccaaaaa gggtcagtct acctcccgcc
1081 ataaaaaact catgttcaag acagaagggc ctgactcaga ctgacattct ccacttcttg
1141 ttccccactg acagcctccc acccccatct ctccctcccc tgccattttg ggttttgggt
1201 ctttgaaccc ttgcttgcaa taggtgtgcg tcagaagcac ccaggacttc catttgcttt
1261 gtcccggggc tccactgaac aagttggcct gcactggtgt tttgttgtgg ggaggaggat
1321 ggggagtagg acataccagc ttagatttta aggtttttac tgtgagggat gtttgggaga
1381 tgtaagaaat gttcttgcag ttaagggtta gtttacaatc agccacattc taggtagggg
1441 cccacttcac cgtactaacc agggaagctg tccctcactg ttgaattttc tctaacttca
1501 aggcccatat ctgtgaaatg ctggcatttg cacctacctc acagagtgca ttgtgagggt
1561 taatgaaata atgtacatct ggccttgaaa ccacctttta ttacatgggg tctagaactt
1621 gacccccttg agggtgcttg ttccctctcc ctgttggtcg gtgggttggt agtttctaca
1681 gttgggcagc tggttaggta gagggagttg tcaagtctct gctggcccag ccaaaccctg
1741 tctgacaacc tcttggtgaa ccttagtacc taaaaggaaa tctcacccca tcccacaccc
1801 tggaggattt catctcttgt atatgatgat ctggatccac caagacttgt tttatgctca
1861 gggtcaattt cttttttctt tttttttttt ttttttcttt ttctttgaga ctgggtctcg
1921 ctttgttgcc caggctggag tggagtggcg tgatcttggc ttactgcagc ctttgcctcc
1981 ccggctcgag cagtcctgcc tcagcctccg gagtagctgg gaccacaggt tcatgccacc
2041 atggccagcc aacttttgca tgttttgtag agatggggtc tcacagtgtt gcccaggctg
2101 gtctcaaact cctgggctca ggcgatccac ctgtctcagc ctcccagagt gctgggatta
2161 caattgtgag ccaccacgtc cagctggaag ggtcaacatc ttttacattc tgcaagcaca
2221 tctgcatttt caccccaccc ttcccctcct tctccctttt tatatcccat ttttatatcg
2281 atctcttatt ttacaataaa actttgctgc cacctgtgtg tctgaggggt g
Then I'm supposed to mark the sites of initiation and termination on the nucleotide sequence. I know the initiation codon is atg, so does that mean every atg I find in the sequence is an initiation site?
For example, in this line, is the place where I've put the (()) an initiation site?
1 tgaggccagg ag((atg))gaggc tgcagtgagc tgtgatcaca ccactgtgct ccagcctgag
How many initiation and termination sites should there be?
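In case it helps anyone answering, here is a minimal Python sketch of how I think the record could be checked. It assumes the CDS 279..923 coordinates are 1-based and inclusive and run from the first base of the start codon to the last base of the stop codon. The helper names (parse_origin, slice_1based) are made up for illustration, and I only pasted in the three ORIGIN rows that matter.

```python
def parse_origin(lines):
    """Join ORIGIN rows into one lowercase string, dropping position numbers and spaces."""
    return "".join(ch for line in lines for ch in line if ch.isalpha()).lower()


def slice_1based(seq, start, end, first_pos=1):
    """Return bases start..end (1-based, inclusive); first_pos is the position of seq[0]."""
    return seq[start - first_pos : end - first_pos + 1]


# Rows copied from the ORIGIN section above (rows starting at 1, 241 and 901).
row_1 = parse_origin(["1 tgaggccagg agatggaggc tgcagtgagc tgtgatcaca ccactgtgct ccagcctgag"])
row_241 = parse_origin(["241 ccttcctctt cctacagtac tcccctgccc tcaacaagat gttttgccaa ctggccaaga"])
row_901 = parse_origin(["901 acttcctgat aaactcgtcg taagttgaaa atattatccg tgggcgtgag cgcttcgaga"])

# Coordinates taken from the "CDS 279..923" feature line.
cds_start, cds_end = 279, 923

start_codon = slice_1based(row_241, cds_start, cds_start + 2, first_pos=241)
stop_codon = slice_1based(row_901, cds_end - 2, cds_end, first_pos=901)
cds_codons = (cds_end - cds_start + 1) // 3  # includes the stop codon

# The atg I marked with (()) sits at positions 13..15, before the CDS begins.
marked = slice_1based(row_1, 13, 15)
marked_in_cds = cds_start <= 13 <= cds_end

print(start_codon)    # atg
print(stop_codon)     # taa
print(cds_codons)     # 215
print(marked, marked_in_cds)  # atg False
```

If I am reading this right, the record annotates one initiation site (279..281) and one termination site (921..923), and 215 codons would mean 214 amino acids plus the stop, which matches the length of the /translation. That would also mean the atg I marked at 13..15 is not an initiation site, but I would like someone to confirm.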
I'm so confused! Hopefully someone can set me straight. I tried talking to my lecturer, but he isn't very good with English and has a hard time explaining. I've been trying to learn this course from a textbook and can only get so far.
Thanks!