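We define a start codon as the sequence ATG, and a stop codon as one of the sequences TGA, TAA, or TAG.
The problem I am having is that the code below works only for the first two sequences (DM208659 and AF038953), but not for the rest.
What is wrong with my approach below?
This code can be copied and pasted from here.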

    #!/usr/bin/perl -w
    while (<DATA>) {
        chomp;
        print "$_\n";
        my ($ID,$rna_sq) = split(/\s+/,$_);
        local $_ = $rna_sq;
        while (/atg/g) {
            my $start = pos() - 2;
            if (/tga|taa|tag/g) {
                my $stop = pos();
                my $gene = substr( $_,$start - 1,$stop - $start + 1 ),$/;
                my $genelen = length($gene);
                my $ct = "$ID $start $stop $gene $genelen";
                print "\t$ct\n";
            }
        }
    }
    __DATA__
    DM208659 gtgggcctcaaatgtggagcactattctgatgtccaagtggaaagtgctgcgacatttgagcgtcac
    AF038953 gatcccagacctcggcttgcagtagtgttagactgaagataaagtaagtgctgtttgggctaacaggatctcctcttgcagtctgcagcccaggacgctgattccagcagcgccttaccgcgcagcccgaagattcactatggtgaaaatcgccttcaatacccctaccgccgtgcaaaaggaggaggcgcggcaagacgtggaggccctcctgagccgcacggtcagaactcagatactgaccggcaaggagctccgagttgccacccaggaaaaagagggctcctctgggagatgtatgcttactctcttaggcctttcattcatcttggcaggacttattgttggtggagcctgcatttacaagtacttcatgcccaagagcaccatttaccgtggagagatgtgcttttttgattctgaggatcctgcaaattcccttcgtggaggagagcctaacttcctgcctgtgactgaggaggctgacattcgtgaggatgacaacattgcaatcattgatgtgcctgtccccagtttctctgatagtgaccctgcagcaattattcatgactttgaaaagggaatgactgcttacctggacttgttgctggggaactgctatctgatgcccctcaatacttctattgttatgcctccaaaaaatctggtagagctctttggcaaactggcgagtggcagatatctgcctcaaacttatgtggttcgagaagacctagttgctgtggaggaaattcgtgatgttagtaaccttggcatctttatttaccaactttgcaataacagaaagtccttccgccttcgtcgcagagacctcttgctgggtttcaacaaacgtgccattgataaatgctggaagattagacacttccccaacgaatttattgttgagaccaagatctgtcaagagtaagaggcaacagatagagtgtccttggtaataagaagtcagagatttacaatatgactttaacattaaggtttatgggatactcaagatatttactcatgcatttactctattgcttatgccgtaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    BC021011 ggggagtccggggcggcgcctggaggcggagccgcccgctgggctaaatggggcagaggccgggaggggtgggggttccccgcgccgcagccatggagcagcttcgcgccgccgcccgtctgcagattgttctg
    DM208660 gggatactcaaaatgggggcgctttcctttttgtctgtactgggaagtgcttcgattttggggtgtccc
    AF038954 ggacccaagggggccttcgaggtgccttaggccgcttgccttgctctcagaatcgctgccgccatggctagtcagtctcaggggattcagcagctgctgcaggccgagaagcgggcagccgagaaggtgtccgaggcccgcaaaagaaagaaccggaggctgaagcaggccaaagaagaagctcaggctgaaattgaacagtaccgcctgcagagggagaaagaattcaaggccaaggaagctgcggcattgggatcccgtggcagttgcagcactgaagtggagaaggagacccaggagaagatgaccatcctccagacatacttccggcagaacagggatgaagtcttggacaacctcttggcttttgtctgtgacattcggccagaaatccatgaaaactaccgcataaatggatagaagagagaagcacctgtgctgtggagtggcattttagatgccctcacgaatatggaagcttagcacagctctagttacattcttaggagatggccattaaattatttccatatattataagagaggtccttccactttttggagagtagccaatctagctttttggtaacagacttagaaattagcaaagatgtccagctttttaccacagattcctgagggattttagatgggtaaatagagtcagactttgaccaggttttgggcaaagcacatgtatatcagtgtggacttttcctttcttagatctagtttaaaaaaaaaaaccccttaccattctttgaagaaaggaggggattaaataattttttcccctaacactttcttgaaggtcaggggctttatctatgaaaagttagtaaatagttctttgtaacctgtgtgaagcagcagccagccttaaagtagtccattcttgctaatggttagaacagtgaatactagtggaattgtttgggctgcttttagtttctcttaatcaaaattactagatgatagaattcaagaacttgttacatgtattacttggtgtatcgataatcatttaaaagtaaagactctgtcatgcaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Solution

I removed the use of $_ (I especially shudder when you localize it. You did it correctly, but why force yourself to worry about some other function clobbering $_ when $rna_sq is already available?).
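I also changed $start and $stop to be 0-based indexes into the string (which makes the rest of the math more straightforward), and computed $genelen up front so that it can be used directly in the substr operation. (Alternatively, you could localize $[ to 1 to use 1-based array indexes; see perldoc perlvar. Note, however, that $[ is deprecated, and assigning a non-zero value to it is a fatal error as of Perl 5.30.)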
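To make the index arithmetic concrete, here is a minimal sketch on a made-up sequence. The string `ggatgccttaagg` and the `@found` array are illustrative assumptions, not part of the original code. It relies only on the documented behavior of `pos()`, which returns the 0-based offset just past the last `//g` match, so subtracting 3 (the codon length) lands on the first base of the start codon.

```perl
use strict;
use warnings;

# Hypothetical toy sequence, not taken from the question's data.
my $seq = 'ggatgccttaagg';

my @found;
if ($seq =~ /atg/g) {
    # pos() is the 0-based offset just past "atg"; back up 3 to its first base.
    my $start = pos($seq) - 3;

    # The next //g match continues from where the previous one stopped.
    if ($seq =~ /tga|taa|tag/g) {
        my $stop = pos($seq);        # offset just past the stop codon
        my $len  = $stop - $start;   # start codon through stop codon, inclusive
        @found = ($start, $stop, $len, substr($seq, $start, $len));
    }
}

print join(' ', @found), "\n";   # prints "2 11 9 atgccttaa"
```

Because $stop is the offset just past the stop codon, it also happens to equal the 1-based position of the last base, which is why only $start needs a +1 when reporting 1-based coordinates.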

    use strict;
    use warnings;

    # Reads the same __DATA__ section as the script in the question.
    while (my $line = <DATA>) {
        chomp $line;
        print "processing $line\n";
        my ($ID, $rna_sq) = split(/\s+/, $line);

        while ($rna_sq =~ /atg/g) {
            # $start and $stop are 0-based indexes
            my $start = pos($rna_sq) - 3;   # back up to include the start sequence

            # discard remnant if no stop sequence can be found
            last unless $rna_sq =~ /tga|taa|tag/g;

            my $stop    = pos($rna_sq);
            my $genelen = $stop - $start;
            my $gene    = substr($rna_sq, $start, $genelen);
            print "\t" . join(' ', $ID, $start+1, $stop, $gene, $genelen) . "\n";
        }
    }
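A small sketch of why the `last unless` guard matters, assuming the trouble comes from a start codon with no stop codon after it (which appears to be the case for BC021011, where no tga, taa, or tag follows the first atg). In Perl, a failed `//g` match resets `pos()` unless the `/c` modifier is used, so without the guard the outer `/atg/g` loop would find the same start codon again and again. The toy sequence and variable names below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical toy sequence: a start codon with no stop codon after it.
my $seq = 'ccatgccc';

$seq =~ /atg/g;
my $after_start = pos($seq);   # 5: offset just past "atg"

# The stop-codon search fails, and a failed //g match resets pos().
my $found_stop = ($seq =~ /tga|taa|tag/g) ? 1 : 0;
my $after_fail = pos($seq);    # undef: the next //g match restarts at offset 0

print "pos after start codon: $after_start\n";
print "stop codon found: $found_stop\n";
print "pos after failed search: ",
      (defined $after_fail ? $after_fail : 'undef'), "\n";
```

Leaving the loop with `last` as soon as the stop search fails avoids the restart. Matching with `/gc` would be another way to keep the position after a failure, though the loop would then need its own exit condition.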