Lecture 2.4

Slide 1: Sequencing & Sequence Alignment

Slide 2: Objectives
- Understand how DNA sequence data is collected and prepared
- Be aware of the importance of sequence searching and sequence alignment in biology and medicine
- Be familiar with the different algorithms and scoring schemes used in sequence searching and sequence alignment

Slide 3: High Throughput DNA Sequencing

Slide 4: 30,000

Slide 5: Shotgun Sequencing
- Isolate chromosome
- Shear DNA into fragments
- Clone into sequencing vectors
- Sequence

Slide 6: Principles of DNA Sequencing
[Figure labels: Primer; pBR322 (Amp, Tet, Ori); DNA fragment; denature with heat to produce ssDNA; Klenow + ddNTP + dNTP + primers]

Slide 7: The Secret to Sanger Sequencing

Slide 8: Principles of DNA Sequencing
[Figure: a 5' primer annealed to a 3' template; synthesized strand G C A T G C. Four reactions, each containing dATP, dCTP, dGTP and dTTP plus one dideoxynucleotide (ddA, ddC, ddG or ddT). Terminated products shown: ddG, G-ddC, GC-ddA, GCA-ddT, GCAT-ddG, GCATG-ddC]

Slide 9: Principles of DNA Sequencing
[Figure: gel with lanes G, C, T, A run between the - and + electrodes; sequence read from the bands: G C A T G C]

Slide 10: Capillary Electrophoresis
- Separation by electro-osmotic flow

Slide 11: Multiplexed CE with Fluorescent Detection
- ABI 3700
- 96 x 700 bases

Slide 12: Shotgun Sequencing
- Sequence chromatogram
- Send to computer
- Assembled sequence

Slide 13: Shotgun Sequencing
- Very efficient process for small-scale (10 kb) sequencing (preferred method)
- First applied to whole genome sequencing in 1995 (H. influenzae)
- Now standard for all prokaryotic genome sequencing projects
- Successfully applied to D. melanogaster
- Moderately successful for H. sapiens

Slide 14: The Finished Product
GATTACAGATTACAGATTACAGATTACAGATTACAG
ATTACAGATTACAGATTACAGATTACAGATTACAGA
TTACAGATTACAGATTACAGATTACAGATTACAGAT
TACAGATTAGAGATTACAGATTACAGATTACAGATT
ACAGATTACAGATTACAGATTACAGATTACAGATTA
CAGATTACAGATTACAGATTACAGATTACAGATTAC
AGATTACAGATTACAGATTACAGATTACAGATTACA
GATTACAGATTACAGATTACAGATTACAGATTACAG
ATTACAGATTACAGATTACAGATTACAGATTACAGA
TTACAGATTACAGATTACAGATTACAGATTACAGAT

Slide 15: Sequencing Successes
- T7 bacteriophage: completed in 1983; 39,937 bp; 59 coded proteins
- Escherichia coli: completed in 1997; 4,639,221 bp; 4293 ORFs
- Saccharomyces cerevisiae: completed in 1996; 12,069,252 bp; 5800 genes

Slide 16: Sequencing Successes
- Caenorhabditis elegans: completed in 1998; 95,078,296 bp; 19,099 genes
- Drosophila melanogaster: completed in 2000; 116,117,226 bp; 13,601 genes
- Homo sapiens: 1st draft completed in 2001; 3,160,079,000 bp; 31,780 genes

Slide 17: So what do we do with all this sequence data?

Slide 18: Sequence Alignment

Slide 19: Alignments tell us about:
- Function or activity of a new gene/protein
- Structure or shape of a new protein
- Location or preferred location of a protein
- Stability of a gene or protein
- Origin of a gene or protein
- Origin or phylogeny of an organelle
- Origin or phylogeny of an organism

Slide 20: Factoid
- Sequence comparisons lie at the heart of all bioinformatics

Slide 21: Similarity versus Homology
- Similarity refers to the likeness or % identity between 2 sequences
- Similarity means sharing a statistically significant number of bases or amino acids
- Similarity does not imply homology
- Homology refers to shared ancestry
- Two sequences are homologous if they are derived from a common ancestral sequence
- Homology usually implies similarity

Slide 22: Similarity versus Homology
- Similarity can be quantified
- It is correct to say that two sequences are X% identical
- It is correct to say that two sequences have a similarity score of Z
- It is generally incorrect to say that two sequences are X% similar

Slide 23: Similarity versus Homology
- Homology cannot be quantified
- If two sequences have a high % identity it is OK to say they are homologous
- It is incorrect to say two sequences have a homology score of Z
- It is incorrect to say two sequences are X% homologous

Slide 24: Sequence Complexity
- MCDEFGHIKLAN. High complexity
- ACTGTCACTGAT. Mid complexity
- NNNNTTTTTNNN. Low complexity
- Translate those DNA sequences!

Slide 25: Assessing Sequence Similarity
Two character strings:
  THESTORYOFGENESIS
  THISBOOKONGENETICS
Character comparison:
  THESTORYOFGENESI-S
  THISBOOKONGENETICS
Context comparison:
  THE STORY OF GENESIS
  THIS BOOK ON GENETICS

Slide 26: Assessing Sequence Similarity
Rbn  KETAAAKFERQHMD
Lsz  KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNT
Rbn  SST SAASSSNYCNQMMKSRNLTKDRCKPMNTFVHESLA
Lsz  QATNRNTDGSTDYGILQINSRWWCNDGRTP GSRN
Rbn  DVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKY
Lsz  LCNIPCSALLSSDITASVNC AKKIVSDGDGMNAWVAWR
Rbn  PNACYKTTQANKHIIVACEGNPYVPHFDASV
Lsz  NRCKGTDVQA WIRGCRL
Is this alignment significant?

Slide 27: Is This Alignment Significant?

Slide 28: Some Simple Rules
- If two sequences are more than 100 residues long and more than 25% identical, they are likely related
- If two sequences are 15-25% identical, they may be related, but more tests are needed
- If two sequences are less than 15% identical, they are probably not related
- If you need more than 1 gap for every 20 residues, the alignment is suspicious

Slide 29: Doolittle's Rules of Thumb
- Twilight Zone

Slide 30: Sequence Alignment - Methods
- Dot Plots
- Dynamic Programming
- Heuristic (Fast) Local Alignment
- Multiple Sequence Alignment
- Contig Assembly

Slide 31: PAM Matrices
- Developed by M.O. Dayhoff (1978)
- PAM = Point Accepted Mutation
- Matrix assembled by lo