# --------------------------------------------------------------------------------------------------
## |5| Dotplots and Sequence Alignments
# --------------------------------------------------------------------------------------------------

## Dot Plots

# Nucleic Acid Dot Plots using the Molecular Toolkit (http://arbl.cvmbs.colostate.edu/molkit/)
a1) Generate a random DNA sequence with a GC content of 0.5 (http://www.faculty.ucr.edu/~mmaduro/random.htm)
a2) Copy and paste the sequence into both windows, choose a window size of 9, and click [Make Plot]
a3) Generate a new random DNA sequence with a GC content of 0.1 and repeat step a2
a4) Change the window size to 5 and the mismatch setting to 1
a5) Change the window size to 11 and the mismatch setting to 2, and compare the plot with the one from a4

# Protein Dot Plots using Dotlet (http://myhits.isb-sib.ch/cgi-bin/dotlet)
(b1) [Input]: Upload the following four protein sequences
>S1 P05049 SNAK_DROME Serine protease snake OS=Drosophila melanogaster GN=snk PE=1 SV=2
MIILWSLIVHLQLTCLHLILQTPNLEALDALEIINYQTTKYTIPEVWKEQPVATIGEDVD
DQDTEDEESYLKFGDDAEVRTSVSEGLHEGAFCRRSFDGRSGYCILAYQCLHVIREYRVH
GTRIDICTHRNNVPVICCPLADKHVLAQRISATKCQEYNAAARRLHLTDTGRTFSGKQCV
PSVPLIVGGTPTRHGLFPHMAALGWTQGSGSKDQDIKWGCGGALVSELYVLTAAHCATSG
SKPPDMVRLGARQLNETSATQQDIKILIIVLHPKYRSSAYYHDIALLKLTRRVKFSEQVR
PACLWQLPELQIPTVVAAGWGRTEFLGAKSNALRQVDLDVVPQMTCKQIYRKERRLPRGI
IEGQFCAGYLPGGRDTCQGDSGGPIHALLPEYNCVAFVVGITSFGKFCAAPNAPGVYTRL
YSYLDWIEKIAFKQH
>S2 P08246 ELNE_HUMAN Neutrophil elastase OS=Homo sapiens GN=ELANE PE=1 SV=1
MTLGRRLACLFLACVLPALLLGGTALASEIVGGRRARPHAWPFMVSLQLRGGHFCGATLI
APNFVMSAAHCVANVNVRAVRVVLGAHNLSRREPTRQVFAVQRIFENGYDPVNLLNDIVI
LQLNGSATINANVQVAQLPAQGRRLGNGVQCLAMGWGLLGRNRGIASVLQELNVTVVTSL
CRRSNVCTLVRGRQAGVCFGDSGSPLVCNGLIHGIASFVRGGCASGLYPDAFAPVAQFVN
WIDSIIQRSEDNPCPHPRDPDPASRTH
>S3 Q9P255 ZN492_HUMAN Zinc finger protein 492 OS=Homo sapiens GN=ZNF492 PE=2 SV=2
MLENYRNLVFVGIAASKPDLITCLEQGKEPWNVKRHEMVAEPPVVCSYFARDLWPKQGKK
NYFQKVILRRYKKCGCENLQLRKYCKSMDECKVHKECYNGLNQCLTTTQNKIFQCDKYVK
VFHKFSNSNRHTIRHTGKKSFKCKECEKSFCMLSHLAQHKRIHSGEKPYKCKECGKAYNE
TSNLSTHKRIHTGKKPYKCEECGKAFNRLSHLTTHKIIHTGKKPYKCEECGKAFNQSANL
TTHKRIHTGEKPYKCEECGRAFSQSSTLTAHKIIHAGEKPYKCEECGKAFSQSSTLTTHK
IIHTGEKFYKCEECGKAFSQLSHLTTHKRIHSGEKPYKCEECGKAFKQSSTLTTHKRIHA
GEKFYKCEVCSKAFSRFSHLTTHKRIHTGEKPYKCEECGKAFNLSSQLTTHKIIHTGEKP
YKCEECGKAFNQSSTLSKHKVIHTGEKPYKYEECGKAFNQSSHLTTHKMIHTGEKPYKCE
ECGKAFNNSSILNRHKMIHTGEKLYKPESCNNACDNIAKISKYKRNCAGEK
>S4 P21997 SSGP_VOLCA Sulfated surface glycoprotein 185 OS=Volvox carteri PE=1 SV=1
MSKLLLVALFGAIAVVATSAEVLNLNGRSLLNNDDPNAFPYCKCTYRQRRSPYRLKYVGA
ENNYKGNDWLCYSIVLDTTGTVCQTVPLTEPCCSADLYKIEFDVKPSCKGTVTRAMVFKG
IDRTVGGVRVLESISTVGIDDVTGVPGAAILRIVKDLALPYSVVASFLPNGLPVCINRVP
GSCTFPELFMDVNGTASYSVFNSDKDCCPTGLSGPNVNPIGPAPNNSPLPPSPQPTASSR
PPSPPPSPRPPSPPPPSPSPPPPPPPPPPPPPPPPPSPPPPPPPPPPPPPPPPPPSPSPP
RKPPSPSPPVPPPPSPPSVLPAATGFPFCECVSRSPSSYPWRVTVANVSAVTISGGAGER
VCLKISVDNAAAATCNNGLGGCCSDGLEKVELFANGKCKGSILPFTLSNTAEIRSSFSWD
STRPVLKFTRLGLTYAQGVAGGSLCFNIKGAGCTKFADLCPGRGCTVAVFNNPDNTCCPR
VGTIA
(b2) Compare sequence #1 with itself
(b3) Compare sequence #3 with itself (window size 21, zoom 1:2)
(b4) Compare sequence #4 with itself (window size 7, zoom 1:2)
(b5) Compare sequence #1 with #2 (window size 51, zoom 1:2)

# Sequence comparison
(c1) Design a sequence with an inverted region in the middle and compare it with itself using the dot plot approach. You may use the Molecular Toolkit to create the sequence (http://arbl.cvmbs.colostate.edu/molkit/).

## Alignments
(a) Calculate "identity" and "similarity" for the two sequences below. Use a binary (0/1) matrix and apply a -1 penalty for gaps.
(Hint: you could use a spreadsheet application like Excel)
>SeqWordA
THECATISINTHEHOUSE
>SeqWordB
THETWOCATSAREONTHEHOUSE

# For the next exercises you may use one of the following web-based alignment applications:
Pairwise: LALIGN (http://www.ch.embnet.org/software/LALIGN_form.html)
MSA:      ClustalW2 (http://www.ebi.ac.uk/Tools/msa/clustalw2/)
          MAFFT (http://www.ebi.ac.uk/Tools/msa/mafft/)

(b2) Align the two DNA sequences. How many of the 111 sites are identical?
>Seq_2b1
TGTAGTAGAGGCGACGCGGGTGCGGTCATCACTAATAAGGATATTAATCAACATAGTAGAAAAACTCACAGGCCTCCGCCTTTAGGCGGTGCTTACTCTTAC
>Seq_2b2
TGTAGTAGAAGGCGACGCGGGTGCGGTCATCACTAATAAGGATATTAATAAACAGCACAGCAAGTTGTCAAAAACTCACAGGCCTCCGCCTTTAGGCGGTGCTTACTCTTAC
(b3) Translate the DNA sequences into protein sequences and re-align them. Is the alignment better? What happens?
(b4) Try to find a way to get a better protein alignment.

(c) Global vs. local alignment
(c1) Which is better, global or local alignment?
(c2) Perform a local and a global alignment of the following two sequences. You can use LALIGN (http://www.ch.embnet.org/software/LALIGN_form.html) for the alignments.
>Seq_2c1
GCGATTTAGCGTGACAGCCCCAGGGAACCCACAAAATGTGATCGCAGTCCATCCGATCGTACACAGAAAGGAAGGTCCCCATACACCGACGCACCTGTTT
>Seq_2c2
ACACGTCGTATGCATAAACGAGCCGCACGAACCAGAGAGCATAAAGAGGACCTCTAGTTCTGACAGCCCCAGGGAACCCACAAAATGCCAAGATGCCTTA

(d1) Align the following four protein sequences. Do these sequences have anything in common?
>PS_2d1
DGVPARQVILDTCGARRDDMSAALQCSNSMQRQINLKGRVIAVGKSRGLPKRKYTNPAINYRKSFKGWAFQVIYFGFSAIGELELSLLLIIGPRDEA
>PS_2d2
YLSDSPQEFAMQGYVLLFGQGISYPAKELKADVEKPARQVILDTCGARRDDMSAAFIHARTTSSAGCKKVERWNSPARLSRFIVKPNIPR
>PS_2d3
AAFAIKESEELEGSKIVCDPVGGHGPVFRDSSELFRAKNVPELALVYKTQAGQYLLLEMNIPARQVILDTCGARRDDMSAARADKFDGTRSLTNGIVRLP
>PS_2d4
EDKQQEPARQVILDTCGARRDDMSAARPHENVPVETAGFVKSGTWSLSFLKFSRGITLSLVEDLLEPAISQNAFSGLQANKVMVDHLDELTYDY
(d2) Generate a sequence logo with the aligned protein sequences using WebLogo (http://weblogo.berkeley.edu/logo.cgi)
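The dot plot exercises in the first part (a1-a5, b2-b5) vary the window size and the number of tolerated mismatches. Below is a minimal Python sketch of that idea. It assumes a simple sliding-window rule: a dot is drawn at (i, j) when the windows starting at i and j differ at no more than `max_mismatch` positions. The Molecular Toolkit and Dotlet may filter and shade dots differently. `random_dna`, `dot_plot` and `render` are hypothetical helper names and are not part of either tool.

```python
import random


def random_dna(length, gc=0.5, seed=None):
    """Random DNA in which G and C together make up a fraction `gc` on average."""
    rng = random.Random(seed)
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def dot_plot(seq_a, seq_b, window=9, max_mismatch=0):
    """Window start pairs (i, j) whose windows differ at <= max_mismatch positions."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            mismatches = sum(x != y for x, y in zip(win_a, win_b))
            if mismatches <= max_mismatch:
                hits.add((i, j))
    return hits


def render(seq_a, seq_b, hits):
    """Text dot plot: rows follow seq_a, columns follow seq_b."""
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(len(seq_b)))
        for i in range(len(seq_a))
    )


if __name__ == "__main__":
    seq = random_dna(60, gc=0.1, seed=1)
    print(render(seq, seq, dot_plot(seq, seq, window=5, max_mismatch=1)))
```

The GC content matters because it changes how often two random positions match. With a GC content of 0.5 every base has probability 0.25, so the per-position match probability is 4 × 0.25² = 0.25. With a GC content of 0.1 it is 2 × 0.45² + 2 × 0.05² = 0.41, so more background dots pass a given window and mismatch setting.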
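For alignment exercise (a) and the global vs. local comparison in (c), here is a minimal Python sketch of dynamic-programming alignment with a linear gap penalty. It runs Needleman-Wunsch when `local=False` and Smith-Waterman when `local=True`. The defaults encode the binary (0/1) matrix and the -1 gap penalty from (a). The helper `identity` divides identical columns by the alignment length, which is only one of several conventions, so check which definition the course uses. Both function names are hypothetical. LALIGN uses its own scoring scheme, so its scores will differ from this sketch.

```python
def align(s, t, match=1, mismatch=0, gap=-1, local=False):
    """Return (score, aligned_s, aligned_t) for one optimal alignment.

    local=False: global alignment (Needleman-Wunsch).
    local=True:  local alignment (Smith-Waterman).
    """
    n, m = len(s), len(t)
    # H[i][j] = best score for s[:i] vs t[:j]; global alignments pay for leading gaps.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        for i in range(1, n + 1):
            H[i][0] = i * gap
        for j in range(1, m + 1):
            H[0][j] = j * gap

    def sub(i, j):
        return match if s[i - 1] == t[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(H[i - 1][j - 1] + sub(i, j), H[i - 1][j] + gap, H[i][j - 1] + gap)
            H[i][j] = max(best, 0) if local else best

    if local:
        # A local alignment ends at the highest-scoring cell anywhere in the matrix.
        i, j = max(((a, b) for a in range(n + 1) for b in range(m + 1)),
                   key=lambda p: H[p[0]][p[1]])
    else:
        i, j = n, m
    score = H[i][j]

    # Trace back, preferring diagonal, then vertical (gap in t), then horizontal (gap in s).
    out_s, out_t = [], []
    while i > 0 or j > 0:
        if local and H[i][j] == 0:
            break
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return score, "".join(reversed(out_s)), "".join(reversed(out_t))


def identity(aln_s, aln_t):
    """Identical columns divided by alignment length (gap columns count)."""
    same = sum(a == b and a != "-" for a, b in zip(aln_s, aln_t))
    return same / len(aln_s) if aln_s else 0.0


if __name__ == "__main__":
    score, a, b = align("THECATISINTHEHOUSE", "THETWOCATSAREONTHEHOUSE")
    print(a, b, sep="\n")
    print("score:", score, "identity:", round(identity(a, b), 3))
```

With the 0/1 matrix a mismatch costs nothing. A local alignment can therefore usually be extended through mismatches and ends up close to the global one. Passing a negative mismatch score (for example `mismatch=-1`) makes the difference between global and local alignment easier to see.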
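For (b3) and (b4), the DNA sequences have to be translated before the protein alignment. Below is a minimal Python sketch that builds the standard genetic code (NCBI translation table 1) and translates from a chosen offset. Translating each sequence in all three forward frames is one way to explore (b4). `translate` and `CODON_TABLE` are hypothetical names, and the sketch only reads the forward strand.

```python
from itertools import product

# NCBI standard genetic code (translation table 1). Codons are enumerated with
# the bases in T, C, A, G order, the first codon position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, frame=0):
    """Translate `dna` starting at offset `frame` (0, 1 or 2).

    '*' marks a stop codon and 'X' a codon containing a non-ACGT character;
    a trailing partial codon is ignored.
    """
    dna = dna.upper()
    codons = (dna[k:k + 3] for k in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    seq = "TGTAGTAGAGGCGACGCGGGTGCGGTCATC"  # first 30 nt of Seq_2b1
    for frame in range(3):
        print(frame, translate(seq, frame))
```

An insertion or deletion whose length is not a multiple of three shifts the reading frame for everything downstream. This is worth keeping in mind when comparing the protein alignment with the DNA alignment.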
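For (d2), WebLogo stacks letters so that the total height of each column equals its information content in bits. For proteins the maximum is log2(20), about 4.32 bits. Below is a minimal Python sketch of that calculation. It assumes the aligned sequences are equal-length strings with '-' for gaps. Unlike WebLogo, it drops gaps and applies no small-sample correction, which matters with only four sequences, so the heights will not match WebLogo's output exactly. `column_information` and `logo_heights` are hypothetical names.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Gaps ('-') are ignored and no small-sample correction is applied.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def logo_heights(aligned, alphabet_size=20):
    """Per column, map each residue to its letter height (frequency x information)."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    heights = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        info = column_information(column, alphabet_size)
        counts = Counter(residues)
        heights.append({aa: n / len(residues) * info for aa, n in counts.items()})
    return heights
```

A fully conserved column gets the maximum height, and a column with four different residues loses 2 bits of entropy. This is why a shared motif stands out in the logo.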