Perl in practice: building a restriction map of a DNA sequence


The program contains many small details, so read it carefully to spot them all.

The DNA sequence in a.fasta is as follows:

> sample dna | (This is a typical fasta header.)
agatggcggcgctgaggggtcttgggggctctaggccggccacctactgg
tttgcagcggagacgacgcatggggcctgcgcaataggagtacgctgcct
gggaggcgtgactagaagcggaagtagttgtgggcgcctttgcaaccgcc
tgggacgccgccgagtggtctgtgcaggttcgcgggtcgctggcgggggt
cgtgagggagtgcgccgggagcggagatatggagggagatggttcagacc
cagagcctccagatgccggggaggacagcaagtccgagaatggggagaat
gcgcccatctactgcatctgccgcaaaccggacatcaactgcttcatgat
cgggtgtgacaactgcaatgagtggttccatggggactgcatccggatca
ctgagaagatggccaaggccatccgggagtggtactgtcgggagtgcaga
gagaaagaccccaagctagagattcgctatcggcacaagaagtcacggga
gcgggatggcaatgagcgggacagcagtgagccccgggatgagggtggag
ggcgcaagaggcctgtccctgatccagacctgcagcgccgggcagggtca
gggacaggggttggggccatgcttgctcggggctctgcttcgccccacaa
atcctctccgcagcccttggtggccacacccagccagcatcaccagcagc
agcagcagcagatcaaacggtcagcccgcatgtgtggtgagtgtgaggca
tgtcggcgcactgaggactgtggtcactgtgatttctgtcgggacatgaa
gaagttcgggggccccaacaagatccggcagaagtgccggctgcgccagt
gccagctgcgggcccgggaatcgtacaagtacttcccttcctcgctctca
ccagtgacgccctcagagtccctgccaaggccccgccggccactgcccac
ccaacagcagccacagccatcacagaagttagggcgcatccgtgaagatg
agggggcagtggcgtcatcaacagtcaaggagcctcctgaggctacagcc
acacctgagccactctcagatgaggaccta

The contents of REBASE.txt are as follows:

REBASE version 104                                              bionet.104
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    REBASE, The Restriction Enzyme Database   http://rebase.neb.com
    copyright (c)  Dr. Richard J. Roberts, 2001.   All rights reserved.
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Rich Roberts  Mar 30 2001

AaaI (XmaIII)                     C^GGCCG
AacI (BamHI)                      GGATCC
AaeI (BamHI)                      GGATCC
AagI (ClaI)                       AT^CGAT
AaqI (ApaLI)                      GTGCAC
AarI                              CACCTGCNNNN^
AarI                              ^NNNNNNNNGCAGGTG
AatI (StuI)                       AGG^CCT
AatII                             GACGT^C
AauI (Bsp1407I)                   T^GTACA
AbaI (BclI)                       T^GATCA
AbeI (BbvCI)                      CC^TCAGC
AbeI (BbvCI)                      GC^TGAGG
AbrI (XhoI)                       C^TCGAG
AcaI (AsuII)                      TTCGAA
AcaII (BamHI)                     GGATCC
AcaIII (MstI)                     TGCGCA
AcaIV (HaeIII)                    GGCC
AccI                              GT^MKAC
AccII (FnuDII)                    CG^CG
AccIII (BspMII)                   T^CCGGA
Acc16I (MstI)                     TGC^GCA
Acc36I (BspMI)                    ACCTGCNNNN^
Acc36I (BspMI)                    ^NNNNNNNNGCAGGT
Acc38I (EcoRII)                   CCWGG
Acc65I (KpnI)                     G^GTACC
Acc113I (ScaI)                    AGT^ACT
AccB1I (HgiCI)                    G^GYRCC
AccB2I (HaeII)                    RGCGC^Y
AccB7I (PflMI)                    CCANNNN^NTGG
AccBSI (BsrBI)                    CCG^CTC
AccBSI (BsrBI)                    GAG^CGG
AccEBI (BamHI)                    G^GATCC
AceI (TseI)                       G^CWGC
AceII (NheI)                      GCTAG^C
AceIII                            CAGCTCNNNNNNN^
AceIII                            ^NNNNNNNNNNNGAGCTG
AciI                              C^CGC
AciI                              G^CGG
AclI                              AA^CGTT
AclNI (SpeI)                      A^CTAGT
AclWI (BinI)                      GGATCNNNN^
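Each data line holds an enzyme name, an optional prototype enzyme in parentheses, and the recognition site, where `^` marks the cut position. The following is a minimal sketch of splitting such lines, assuming whitespace-separated fields as in the excerpt above. `parse_rebase_line` and `%sites_for` are hypothetical names that do not appear in the original program. Because some enzymes (AarI, for example) are listed twice, the sketch collects a list of sites per enzyme.

```perl
use strict;
use warnings;

# Hypothetical helper: split one REBASE data line into (name, site).
# The first field is the enzyme name and the last field is the site;
# the optional "(Prototype)" field in the middle is ignored.
sub parse_rebase_line {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return unless @fields >= 2;
    return ( $fields[0], $fields[-1] );
}

my @lines = (
    'AaaI (XmaIII)                     C^GGCCG',
    'AarI                              CACCTGCNNNN^',
    'AarI                              ^NNNNNNNNGCAGGTG',
);

# Collect every site per enzyme, since a name can occur more than once.
my %sites_for;
for my $line (@lines) {
    my ( $name, $site ) = parse_rebase_line($line) or next;
    push @{ $sites_for{$name} }, $site;
}

for my $name ( sort keys %sites_for ) {
    print "$name: @{ $sites_for{$name} }\n";
}
```

A plain hash keyed by enzyme name keeps only the last entry for a repeated name, which is worth keeping in mind when reading the program's output.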

The program asks twice for the path of a file to read: the first file is a.fasta and the second is REBASE.txt.
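Typing both paths at a prompt can be tedious when the program is run repeatedly. As a sketch only, and assuming the two paths are passed on the command line instead, file reading could be separated from prompting as shown here. `read_lines` is a hypothetical helper and the command-line usage is an assumption; the original program prompts on STDIN.

```perl
use strict;
use warnings;

# Hypothetical helper: return all lines of a file, or die with the OS error.
sub read_lines {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "cannot open '$path': $!\n";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Assumed usage: perl a.pl a.fasta REBASE.txt
if ( @ARGV == 2 ) {
    my ( $fasta_path, $rebase_path ) = @ARGV;
    my @fasta_lines  = read_lines($fasta_path);
    my @rebase_lines = read_lines($rebase_path);
    printf "read %d FASTA lines and %d REBASE lines\n",
        scalar @fasta_lines, scalar @rebase_lines;
}
```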

    use strict;
    use warnings;

    # Given a recognition site written with IUB ambiguity codes,
    # return a translation with the IUB codes changed to regular expressions.
    sub IUB_to_regexp {
        my ($iub) = @_;
        my $regular_expression = '';

        # Besides the four common bases, the IUB codes denote bases that are
        # uncommon or ambiguous in a nucleic acid sequence.
        my %iub2character_class = (
            A => 'A',
            C => 'C',
            G => 'G',
            T => 'T',
            R => '[GA]',    # R stands for either G or A
            Y => '[CT]',
            M => '[AC]',
            K => '[GT]',
            S => '[GC]',
            W => '[AT]',
            B => '[CGT]',
            D => '[AGT]',
            H => '[ACT]',
            V => '[ACG]',
            N => '[ACGT]',
        );

        # Remove the ^ signs from the recognition site; upper-case it so that
        # lower-case codes are also found in the table
        $iub =~ s/\^//g;
        $iub = uc $iub;

        # Translate each character in the IUB sequence
        for ( my $i = 0 ; $i < length($iub) ; ++$i ) {
            $regular_expression .= $iub2character_class{ substr( $iub, $i, 1 ) };
        }

        return $regular_expression;
    }

    # Prompt for a file path on STDIN and return the lines of that file.
    sub get_file_data {
        print "please input the Path just like this f:\\perl\\data.txt\n";
        chomp( my $filename = <STDIN> );

        open( my $fh, '<', $filename )
            or die "can not open the file '$filename': $!\n";
        my @filedata = <$fh>;
        close $fh;

        return @filedata;    # do not forget the subroutine's return value
    }

    # Read the REBASE file and return a hash: enzyme name => "site regexp".
    # An enzyme that is listed more than once keeps only its last entry.
    sub parseREBASE {
        my %rebase_hash = ();
        my $in_header   = 1;

        foreach ( get_file_data() ) {

            # Discard header lines, up to and including the "Rich Roberts"
            # line. A flag is used because the lines are already in an array,
            # so the line counter $. cannot be relied on here.
            if ($in_header) {
                $in_header = 0 if /Rich Roberts/;
                next;
            }

            # Discard blank lines
            /^\s*$/ and next;

            # Split the two (or three, if a parenthesized name is included) fields
            my @fields = split( " ", $_ );

            # Keep the name and the recognition site, i.e. the first and last
            # fields, and skip the middle field if there is one
            my $name = $fields[0];
            my $site = $fields[-1];

            # Translate the recognition site to a regular expression
            my $regexp = IUB_to_regexp($site);

            # $site is the site as listed in REBASE; $regexp is the pattern
            # that can be matched against the DNA sequence
            $rebase_hash{$name} = "$site $regexp";
        }

        # Return the hash containing the reformatted REBASE data
        return %rebase_hash;
    }

    # Extract the sequence from an array of FASTA lines.
    # A FASTA file has three parts:
    #   1. a first line beginning with >, followed by the name and origin
    #      of the sequence
    #   2. the sequence in standard one-letter symbols
    #   3. an optional * marking the end
    sub extract_sequence_from_fasta_data {
        my (@fasta_file_data) = @_;
        my $sequence = '';

        foreach my $line (@fasta_file_data) {
            next if $line =~ /^\s*$/;     # ignore blank lines
            next if $line =~ /^\s*#/;     # ignore comment lines
            next if $line =~ /^>/;        # ignore the FASTA header line
            $sequence .= $line;
        }

        # Remove all whitespace, including the line breaks
        $sequence =~ s/\s//g;
        return $sequence;
    }

    # Return the 1-based start position of every match of $regexp in $sequence.
    sub match_positions {
        my ( $regexp, $sequence ) = @_;
        my @positions = ();

        while ( $sequence =~ /$regexp/ig ) {

            # pos returns the offset just past the last match.
            # $& is the matched text, $` the text before it, $' the text after it.
            push( @positions, pos($sequence) - length($&) + 1 );
        }
        return @positions;
    }

    # Main program: read the DNA, read REBASE, then answer enzyme queries.
    my @file_data   = get_file_data();
    my $dna         = extract_sequence_from_fasta_data(@file_data);
    my %rebase_hash = parseREBASE();
    my $query       = '';

    do {
        print "Please input restriction enzyme name\n";
        $query = <STDIN> // '';    # treat end of input like an empty line
        chomp $query;

        # An empty line ends the program
        exit if $query =~ /^\s*$/;

        if ( $rebase_hash{$query} ) {
            my ( $recognition_site, $regexp ) = split( " ", $rebase_hash{$query} );
            my @locations = match_positions( $regexp, $dna );

            if (@locations) {
                print "searching for $query $recognition_site $regexp\n";
                print "A restriction site for $query at locations:\n";
                print join( " ", @locations ), "\n";
            }
            else {
                print "A restriction site for $query is not in the DNA:\n";
            }
            print "\n";
        }
    } until ( $query =~ /quit/ );
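To see how the ambiguity codes and the position arithmetic behave, here is a small self-contained sketch on a made-up toy sequence. `site_to_regex` and `find_sites` are hypothetical names, not the program's own subroutines. The sketch uses `$-[0]` (the start offset of the last successful match) in place of `pos` and `$&`; both give the same 1-based start position.

```perl
use strict;
use warnings;

# IUB ambiguity codes mapped to regular-expression character classes.
my %class_for = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => '[GA]', Y => '[CT]', M => '[AC]', K => '[GT]', S => '[GC]',
    W => '[AT]', B => '[CGT]', D => '[AGT]', H => '[ACT]', V => '[ACG]',
    N => '[ACGT]',
);

# Hypothetical helper: drop the ^ cut marker and translate each code.
# Characters that are not in the table are matched literally.
sub site_to_regex {
    my ($site) = @_;
    $site =~ s/\^//g;
    return join '', map { $class_for{$_} // quotemeta($_) } split //, uc $site;
}

# Hypothetical helper: 1-based start positions of all non-overlapping matches.
sub find_sites {
    my ( $regex, $sequence ) = @_;
    my @positions;
    while ( $sequence =~ /$regex/ig ) {
        push @positions, $-[0] + 1;    # $-[0] is the 0-based match start
    }
    return @positions;
}

my $regex = site_to_regex('G^GYRCC');    # AccB1I from the REBASE excerpt
print "$regex\n";                         # prints GG[CT][GA]CC
print join( ' ', find_sites( $regex, 'aaggcgccttggtaccaa' ) ), "\n";    # prints 3 11
```

Note that a `//g` loop resumes after the end of each match, so overlapping occurrences of a site are not reported by either version.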

The final result is as follows:

F:\>perl\a.pl
please input the Path just like this f:\perl\data.txt
f:\perl\a.fasta
please input the Path just like this f:\perl\data.txt
f:\perl\REBASE.txt
Please input restriction enzyme name
AbeI
searching for AbeI GC^TGAGG GCTGAGG
A restriction site for AbeI at locations:
11

Please input restriction enzyme name
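The reported location is 1-based: it is the position of the first base of the recognition site in the sequence. A quick way to check the value 11 by hand is sketched here, using only the first 50 bases of a.fasta. The variable names are illustrative, and `index` returns a 0-based offset, so 1 is added.

```perl
use strict;
use warnings;

# First 50 bases of a.fasta, copied from the listing earlier in the article.
my $dna  = 'agatggcggcgctgaggggtcttgggggctctaggccggccacctactgg';
my $site = 'gctgagg';    # AbeI site GC^TGAGG without the ^ marker, lower-cased

my $offset   = index( $dna, $site );    # 0-based offset of the first occurrence
my $position = $offset + 1;             # 1-based, as printed by the program
print "$position\n";                    # prints 11
```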


Original article: http://outofmemory.cn/langs/1293093.html
