perl – How to find multiple motifs (substrings) in a protein sequence (string)?

Overview: the following script is used to find a motif in a protein sequence.

```perl
use strict;
use warnings;

my $h = '[VLIM]';                  # hydrophobic residues
my $s = '[AG]';                    # defined but not used in this motif
my $x = '[ARNDCEQGHILKMFPSTWYV]';  # any of the 20 standard amino acids
my $regexp = "($h){4}D($x){4}D";   # motif to be searched is hhhhDxxxxD

my @file_data   = get_file_data("seq.txt");
my $protein_seq = extract_sequence(@file_data);

# searching for the motif hhhhDxxxxD in the extracted protein sequence
if ($protein_seq =~ /$regexp/) {
    print "found motif\n\n";
} else {
    print "not found\n\n";
}

# recording the location/position of the motif to be output
my @locations = match_positions($regexp, $protein_seq);
if (@locations) {
    print "Searching for motifs $regexp\n";
    print "Catalytic site is at location: @locations\n";
} else {
    print "motif not found\n\n";
}
exit;

# read all lines of the given file
sub get_file_data {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open '$filename': $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# join the sequence lines, skipping blank, comment (#) and FASTA header (>) lines
sub extract_sequence {
    my (@fasta_file_data) = @_;
    my $sequence = '';
    foreach my $line (@fasta_file_data) {
        if ($line =~ /^\s*(#.*)?$|^>/) {
            next;
        } else {
            $sequence .= $line;
        }
    }
    $sequence =~ s/\s//g;
    return $sequence;
}

# 0-based start offset of every non-overlapping match of $regexp
sub match_positions {
    my ($regexp, $sequence) = @_;
    my @position = ();
    while ($sequence =~ /$regexp/ig) {
        push(@position, $-[0]);
    }
    return @position;
}
```
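The script above needs a seq.txt file. To check the motif pattern and `match_positions` in isolation, here is a minimal self-contained sketch that runs them against a short inline sequence. The sequence `GGGLIVMDAAAADKKK` is hypothetical, made up for illustration, and is not from the question.

```perl
use strict;
use warnings;

# Motif hhhhDxxxxD, built the same way as in the script above
my $h = '[VLIM]';                  # hydrophobic residues
my $x = '[ARNDCEQGHILKMFPSTWYV]';  # any standard amino acid
my $regexp = "($h){4}D($x){4}D";

# Hypothetical test sequence: LIVMDAAAAD starts at 0-based offset 3
my $protein_seq = 'GGGLIVMDAAAADKKK';

# Collect the 0-based start offset ($-[0]) of every non-overlapping match
sub match_positions {
    my ($regexp, $sequence) = @_;
    my @position;
    while ($sequence =~ /$regexp/ig) {
        push @position, $-[0];
    }
    return @position;
}

my @locations = match_positions($regexp, $protein_seq);
print "Catalytic site is at location: @locations\n";   # prints ... location: 3
```

The offsets are 0-based; add 1 to each if you want 1-based residue numbers.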

I don't know how to extend this to find multiple motifs (in a fixed order, i.e. motif1, motif2, motif3) in a given file containing protein sequences.

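Solution: you can simply combine the motif patterns into one regular expression using alternation (the alternatives are separated by |). The regex engine then tries every alternative at each position of the sequence, so whichever motif occurs there is matched. Note that alternation reports the motifs in the order they appear in the sequence; by itself it does not require motif1, motif2 and motif3 to occur in that order.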
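A minimal sketch of the alternation approach, assuming the motifs are kept in a hash. `motif1` is the hhhhDxxxxD pattern from the question; `motif2` (`C..C`) and `motif3` (`W[ST]P`) are hypothetical placeholder patterns, and `find_motifs` / `find_in_order` are illustrative helper names, not part of any library. Each motif becomes a named capture group so a hit reports which motif was found. For the fixed order asked about in the question, the sketch also joins the patterns with `.*?` instead of `|`.

```perl
use strict;
use warnings;

# motif1 is the hhhhDxxxxD pattern from the question; motif2 and motif3
# are hypothetical placeholder patterns for illustration only.
my $h = '[VLIM]';
my $x = '[ARNDCEQGHILKMFPSTWYV]';
my %motifs = (
    motif1 => "(?:$h){4}D(?:$x){4}D",
    motif2 => 'C..C',
    motif3 => 'W[ST]P',
);
my @order = qw(motif1 motif2 motif3);

# Any motif: one named group per motif, joined with '|' (alternation).
my $any = join '|', map { "(?<$_>$motifs{$_})" } @order;

# Fixed order: motif1, then motif2, then motif3, anything in between.
my $in_order = join '.*?', map { "($motifs{$_})" } @order;

# Returns ([name, 0-based offset], ...) for every non-overlapping hit.
sub find_motifs {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /$any/g) {
        # %+ holds the named groups that took part in this match
        my ($name) = grep { defined $+{$_} } @order;
        push @hits, [ $name, $-[0] ];
    }
    return @hits;
}

# Returns the 0-based start offsets of motif1..motif3,
# or an empty list if the motifs do not occur in that order.
sub find_in_order {
    my ($seq) = @_;
    return unless $seq =~ /$in_order/;
    return map { $-[$_] } 1 .. scalar @order;
}

my $seq = 'AAWSPGGLIVMDAAAADKKCAACQQ';    # hypothetical test sequence
printf "%s at offset %d\n", @$_ for find_motifs($seq);
# motif3 at offset 2
# motif1 at offset 7
# motif2 at offset 19

my @pos = find_in_order('LIVMDAAAADKKCAACQQWSP');  # hypothetical, in order
print "in order at offsets @pos\n";                 # in order at offsets 0 12 18
```

The motif1 pattern uses non-capturing `(?:...)` groups so that, in the fixed-order regex, capture groups 1 to 3 correspond exactly to motif1 to motif3 and `$-[1]` to `$-[3]` give their start offsets. Because `/g` resumes after each hit, a motif that overlaps a previous hit is not reported.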