程序里面有很多小的知识点,大家要认真的看,才能发现:
a.fasta的DNA序列如下:
> sample dna | (This is a typical fasta header.) agatggcggcgctgaggggtcttgggggctctaggccggccacctactgg tttgcagcggagacgacgcatggggcctgcgcaataggagtacgctgcct gggaggcgtgactagaagcggaagtagttgtgggcgcctttgcaaccgcc tgggacgccgccgagtggtctgtgcaggttcgcgggtcgctggcgggggt cgtgagggagtgcgccgggagcggagatatggagggagatggttcagacc cagagcctccagatgccggggaggacagcaagtccgagaatggggagaat gcgcccatctactgcatctgccgcaaaccggacatcaactgcttcatgat cgggtgtgacaactgcaatgagtggttccatggggactgcatccggatca ctgagaagatggccaaggccatccgggagtggtactgtcgggagtgcaga gagaaagaccccaagctagagattcgctatcggcacaagaagtcacggga gcgggatggcaatgagcgggacagcagtgagccccgggatgagggtggag ggcgcaagaggcctgtccctgatccagacctgcagcgccgggcagggtca gggacaggggttggggccatgcttgctcggggctctgcttcgccccacaa atcctctccgcagcccttggtggccacacccagccagcatcaccagcagc agcagcagcagatcaaacggtcagcccgcatgtgtggtgagtgtgaggca tgtcggcgcactgaggactgtggtcactgtgatttctgtcgggacatgaa gaagttcgggggccccaacaagatccggcagaagtgccggctgcgccagt gccagctgcgggcccgggaatcgtacaagtacttcccttcctcgctctca ccagtgacgccctcagagtccctgccaaggccccgccggccactgcccac ccaacagcagccacagccatcacagaagttagggcgcatccgtgaagatg agggggcagtggcgtcatcaacagtcaaggagcctcctgaggctacagcc acacctgagccactctcagatgaggaccta
REBASE.txt的内容如下:
REBASE version 104 bionet.104 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= REBASE,The Restriction Enzyme Database http://rebase.neb.com copyright (c) Dr. Richard J. Roberts,2001. All rights reserved. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Rich Roberts Mar 30 2001 AaaI (XmaIII) C^GGCCG AacI (BamHI) GGATCC AaeI (BamHI) GGATCC AagI (ClaI) AT^CGAT AaqI (Apali) GTGCAC AarI CACCTGCNNNN^ AarI ^NNNNNNNNGCAGGTG AatI (StuI) AGG^CCT AatII GACGT^C AauI (Bsp1407I) T^GTACA AbaI (Bcli) T^GATCA AbeI (BbvCI) CC^TCAGC AbeI (BbvCI) GC^TGAGG AbrI (XhoI) C^TCGAG AcaI (AsuII) TTCGAA AcaII (BamHI) GGATCC AcaIII (MstI) TGCGCA AcaIV (HaeIII) GGCC AccI GT^MKAC AccII (FnuDII) CG^CG AccIII (BspMII) T^CCGGA Acc16I (MstI) TGC^GCA Acc36I (BspMI) ACCTGCNNNN^ Acc36I (BspMI) ^NNNNNNNNGCAGGT Acc38I (EcoRII) ccwGG Acc65I (KpnI) G^GTACC Acc113I (ScaI) AGT^ACT AccB1I (HgiCI) G^GYRCC AccB2I (HaeII) RGCGC^Y AccB7I (PflMI) CCANNNN^NTGG AccBSI (BsrBI) CCG^CTC AccBSI (BsrBI) GAG^CGG AccEBI (BamHI) G^GATCC AceI (TseI) G^CWGC AceII (NheI) GCTAG^C AceIII CAGCTCNNNNNNN^ AceIII ^NNNNNNNNNNNGAGCTG AciI C^CGC AciI G^CGG Acli AA^CGTT AclNI (SpeI) A^CTAGT AclWI (BinI) GGATCNNNN^
这里面要求两次输入要读取的文件,第一次读取的是a.fasta。第二次读取的是REBASE.txt
sub IUB_to_regexp { # A subroutine that,given a sequence with IUB ambiguity codes,# outputs a translation with IUB codes changed to regular Expressions # These are the IUB ambiguity codes my($iub) = @_; my $regular_Expression = ''; my %iub2character_class = ( # 这里除了四种常用的碱基外,用来表明核酸序列中不常见或不明确的碱基 A => 'A',C => 'C',G => 'G',T => 'T',R => '[GA]',#R可以代表G或者A中一种 Y => '[CT]',M => '[AC]',K => '[GT]',S => '[GC]',W => '[AT]',B => '[CGT]',D => '[AGT]',H => '[ACT]',V => '[ACG]',N => '[ACGT]',); # Remove the ^ signs from the recognition sites $iub =~ s/\^//g; # Translate each character in the iub sequence for ( my $i = 0 ; $i < length($iub) ; ++$i ) { $regular_Expression .= $iub2character_class{substr($iub,$i,1)}; } return $regular_Expression; } sub get_file_data { # A subroutine to get data from a file given its filename #读取文件的子序列 my $dna_filename; my @filedata; print "please input the Path just like this f:\\perl\\data.txt\n"; chomp($dna_filename=<STDIN>); open(DNAfilename,$dna_filename)||dIE("can not open the file!"); @filedata = <DNAfilename>; close DNAfilename; return @filedata;#子函数的返回值一定要记住写 } sub parseREBASE { my($rebasefile) = @_; use strict; use warnings; my @rebasefile = ( ); my %rebase_hash = ( ); my $name; my $site; my $regexp; # Read in the REBASE file @rebasefile = get_file_data($rebasefile); foreach ( @rebasefile ) { # discard header lines ( 1 .. /Rich Roberts/ ) and next; # discard blank lines /^\s*$/ and next; # Split the two (or three if includes parenthesized name) fIElds my @fIElds = split( " ",$_); # Get and store the name and the recognition site # by not saving the mIDdle fIEld,if any,# just the first and last $name = $fIElds[0]; $site = $fIElds[-1]; # Translate the recognition sites to regular Expressions $regexp = IUB_to_regexp($site); # Store the data into the hash # $site 表示位点序列,$regexp 表示位点的可以和DNA序列匹配的位点序列 $rebase_hash{$name} = "$site $regexp"; } # Return the hash containing the reformatted REBASE data return %rebase_hash; } sub extract_sequence_from_fasta_data { #******************************************************************* # A subroutine to extract FASTA sequence data from an array # 得到其中的序列 # fasta格式介绍: # 包括三个部分 # 1.第一行中以>开头的注释行,后面是名称和序列的来源 # 2.标准单字母符号的序列 # 3.*表示结尾 #******************************************************************* my (@fasta_file_data) =@_; my $sequence =' '; foreach my $line (@fasta_file_data) { #这里忽略空白行 if ($line=~/^\s*$/) { next; } #忽略注释行 elsif($line=~/^\s*#/) { next; } #忽略fasta的第一行 elsif($line=~/^>/) { next; } else { $sequence .=$line; } } $sequence=~s/\s//g; return $sequence; } sub match_positions{ my ($regexp,$sequence) = @_; use strict; my @positions =( ); while($sequence=~/$regexp/ig) { push (@positions,pos($sequence)-length($&)+1) # pos返回最后一次匹配的位置 # $&代表匹配的位置,$`代表匹配位置之前的位置,$'代表匹配位置之后的位置 } return @positions;}use strict;use warnings;my %rebase_hash = ( );my @file_data = ( );my $query = '';my $dna = '';my $recognition_site= '';my $regexp = '';my @locations = ( );@file_data = get_file_data( );$dna = extract_sequence_from_fasta_data(@file_data);%rebase_hash = parseREBASE();do { print "Please input restriction enzyme name\n"; chomp($query=<STDIN>); if ($query=~/^\s*$/) { exit; } if ($rebase_hash{$query}) { if ($rebase_hash{$query}) { ($recognition_site,$regexp) = split (" ",$rebase_hash{$query}); @locations = match_positions($regexp,$dna); if (@locations) { print "searching for $query $recognition_site $regexp\n"; print "A restriction site for $query at locations:\n"; print join(" ",@locations),"\n"; } else { print "A restriction site for $query is not in the DNA:\n"; } } print "\n"; }}until ($query=~/quit/);
最后的 结果如下:
F:\>perl\a.plplease input the Path just like this f:\perl\data.txtf:\perl\a.fastaplease input the Path just like this f:\perl\data.txtf:\perl\REBASE.txtPlease input restriction enzyme nameAbeIsearching for AbeI GC^TGAGG GCTGAGGA restriction site for AbeI at locations:11Please input restriction enzyme name总结
以上是内存溢出为你收集整理的perl应用:DNA序列酶切图谱的创建全部内容,希望文章能够帮你解决perl应用:DNA序列酶切图谱的创建所遇到的程序开发问题。
如果觉得内存溢出网站内容还不错,欢迎将内存溢出网站推荐给程序员好友。
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)