Extracting protein sequences from exonerate result files


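I have recently been using exonerate to align proteins to a genome, and I could not find a convenient script for extracting the results from its log file, so I wrote one in Python. It is not optimized; suggestions for improvement are welcome in the comments.
Usage: python script.py <log file>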

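The approach: extract the translated Target rows from each alignment and write them to a preliminary three-letter protein file (pro-three.fa), then convert that file into one-letter amino-acid codes (pro-one.fa).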
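A minimal sketch of the conversion step, assuming the three-letter string is a plain concatenation of codes such as "MetThrLeu": split before each capital letter and look every code up in a table. The function name `three_to_one` and the `'X'` fallback for codes outside the 20 standard residues are illustrative assumptions, not part of the original script.

```python
import re

# The 20 standard amino acids: three-letter code -> one-letter code
AA_CODES = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E",
    "Phe": "F", "Gly": "G", "His": "H", "Lys": "K",
    "Ile": "I", "Leu": "L", "Met": "M", "Asn": "N",
    "Pro": "P", "Gln": "Q", "Arg": "R", "Ser": "S",
    "Thr": "T", "Val": "V", "Tyr": "Y", "Trp": "W",
}


def three_to_one(seq3: str, unknown: str = "X") -> str:
    """Convert a concatenated three-letter protein string to one-letter codes."""
    # Each code is one capital letter followed by lowercase letters, e.g. "Met"
    codes = re.findall(r"[A-Z][a-z]*", seq3)
    # Codes not in the table (assumed fallback) are replaced by `unknown`
    return "".join(AA_CODES.get(code, unknown) for code in codes)


# A residue split across two alignment rows ("Ar" + "g") is rejoined first
print(three_to_one("MetThrLeuAr" + "gAsp"))  # MTLRD
```

Concatenating all rows of one alignment before converting is what makes split residues come out right; converting row by row would turn "Ar" and "g" into two unknown codes.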

The log file looks like this:

C4 Alignment:
------------
         Query: test
        Target: Chr09a
         Model: protein2genome:local
     Raw score: 10528
   Query range: 0 -> 2034
  Target range: 4150993 -> 4157095

       1 : MetThrLeuSerGlyAspIleLysAlaLeuValAspAsnProGluSerPheLeuAr :      19
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           MetThrLeuSerGlyAspIleLysAlaLeuValAspAsnProGluSerPheLeuAr
 4150994 : ATGACTCTCTCTGGCGATATTAAAGCGTTGGTGGACAATCCAGAATCCTTTTTAAG : 4151048

      20 : gAspAsnArgLeuGlyPheAsnLeuAsnArgAsnIleAlaArgLysAspGlnLeuV :      38
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           gAspAsnArgLeuGlyPheAsnLeuAsnArgAsnIleAlaArgLysAspGlnLeuV
 4151049 : GGATAATCGTCTGGGCTTCAACCTCAATCGCAACATAGCGAGGAAAGACCAGCTTG : 4151105

      39 : alLysLeuValArgValThrAlaAsnSerTyrAspLeuLysPheSerGluThrGlu :      56
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           alLysLeuValArgValThrAlaAsnSerTyrAspLeuLysPheSerGluThrGlu
 4151106 : TAAAACTGGTTCGAGTCACAGCGAACTCGTACGATCTTAAATTTTCCGAGACAGAG : 4151159

      57 : SerGluGluAsnThrIleSerSerTyrIleLeuGlyTyrLysThrAsnGluAlaAs :      75
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           SerGluGluAsnThrIleSerSerTyrIleLeuGlyTyrLysThrAsnGluAlaAs
 4151160 : TCAGAGGAAAACACGATATCCAGCTACATCCTTGGATACAAGACGAACGAAGCAAA : 4151216

      76 : nAspAlaValPheLeuAspIleProSerArgGlyValLysGluGlyThrPheLeuP :      94
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           nAspAlaValPheLeuAspIleProSerArgGlyValLysGluGlyThrPheLeuP
 4151217 : TGATGCCGTGTTTCTGGACATCCCGAGCAGAGGCGTGAAGGAGGGAACATTTTTGT : 4151273

      95 : heThrSerGluLeuSerGlyCysSerLeuValValThrArgLeuLysAspAspThr :     112
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           heThrSerGluLeuSerGlyCysSerLeuValValThrArgLeuLysAspAspThr
 4151274 : TCACATCTGAACTCTCCGGCTGCTCCCTCGTCGTCACACGGCTGAAAGATGATACA : 4151327
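Each alignment block above has four rows: the query in three-letter code with its residue positions, the match line, the target's translation, and the target DNA with genomic coordinates. Residues can be split across blocks ("Ar" at the end of one, "g" at the start of the next). The sketch below applies the same rule as step 1 of the script (take the line right after a match line containing '|' or '!') to the first two blocks of the sample; `target_rows` is a hypothetical helper name. Under this rule, a block whose match line contains neither symbol would be skipped.

```python
import re

# First two alignment blocks of the sample log, copied verbatim
SAMPLE = """\
       1 : MetThrLeuSerGlyAspIleLysAlaLeuValAspAsnProGluSerPheLeuAr :      19
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           MetThrLeuSerGlyAspIleLysAlaLeuValAspAsnProGluSerPheLeuAr
 4150994 : ATGACTCTCTCTGGCGATATTAAAGCGTTGGTGGACAATCCAGAATCCTTTTTAAG : 4151048

      20 : gAspAsnArgLeuGlyPheAsnLeuAsnArgAsnIleAlaArgLysAspGlnLeuV :      38
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
           gAspAsnArgLeuGlyPheAsnLeuAsnArgAsnIleAlaArgLysAspGlnLeuV
 4151049 : GGATAATCGTCTGGGCTTCAACCTCAATCGCAACATAGCGAGGAAAGACCAGCTTG : 4151105
"""


def target_rows(lines):
    """Yield the translated Target row of each block (the line after a match line)."""
    take_next = False
    for line in lines:
        if take_next:
            # Keep letters only: drops the indentation and any non-letter symbols
            yield re.sub(r"[^A-Za-z]", "", line)
            take_next = False
        elif "|" in line or "!" in line:
            take_next = True


rows = list(target_rows(SAMPLE.splitlines()))
print(len(rows))                 # 2
print(rows[0][-2:], rows[1][0])  # Ar g
```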

The code:

import re
import sys

# Three-letter to one-letter amino-acid conversion table
aa_codes = {
    'Ala': 'A', 'Cys': 'C', 'Asp': 'D', 'Glu': 'E',
    'Phe': 'F', 'Gly': 'G', 'His': 'H', 'Lys': 'K',
    'Ile': 'I', 'Leu': 'L', 'Met': 'M', 'Asn': 'N',
    'Pro': 'P', 'Gln': 'Q', 'Arg': 'R', 'Ser': 'S',
    'Thr': 'T', 'Val': 'V', 'Tyr': 'Y', 'Trp': 'W'
}

if len(sys.argv) != 2:
    sys.exit("Usage: python script.py <exonerate log file>")

# Step 1: extract the Target results from the log file into pro-three.fa
with open(sys.argv[1], 'r') as f, open("pro-three.fa", "w", encoding='utf-8') as t:
    take_next = False  # set on a match line; the following line is the translated Target row
    for line in f:
        if take_next:
            # Keep letters only (drops the indentation, the newline and any other symbols)
            print(re.sub(r'[^A-Za-z]', '', line), end="", file=t)
            take_next = False
        elif '|' in line or '!' in line:
            take_next = True
        elif 'Query:' in line:
            print("\n>" + line.split()[1] + " ", end="", file=t)
        elif 'Target:' in line:
            print(line.split()[1] + " ", end="", file=t)
        elif 'Target range:' in line:
            fields = line.split()
            print(fields[2] + "->" + fields[4], file=t)

# Step 2: convert the three-letter sequences to one-letter codes in pro-one.fa
# (blank lines are skipped here, so no temporary file or extra clean-up pass is needed)
with open("pro-three.fa", 'r', encoding='utf-8') as fin, \
        open("pro-one.fa", 'w', encoding='utf-8') as fout:
    for line in fin:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            print(line, file=fout)
        else:
            # Split "MetThrLeu" into ["Met", "Thr", "Leu"]; unknown codes become 'X', not "None"
            codes = re.findall(r'[A-Z][a-z]*', line)
            print(''.join(aa_codes.get(c, 'X') for c in codes), file=fout)

print("Done.\npro-one.fa: one-letter amino-acid sequences\npro-three.fa: three-letter amino-acid sequences")
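Since the post invites optimization, here is one possible single-pass variant, sketched under the assumption that the whole log fits in memory. It applies the same line rules as the script, keeps each alignment's rows in a list, and converts to one-letter codes at the end, so no intermediate files are written. The names `parse_exonerate` and `to_fasta`, the `->` header format and the 60-residue line width are illustrative choices, not part of the original script.

```python
import re

# Three-letter -> one-letter table for the 20 standard amino acids
AA_CODES = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E",
    "Phe": "F", "Gly": "G", "His": "H", "Lys": "K",
    "Ile": "I", "Leu": "L", "Met": "M", "Asn": "N",
    "Pro": "P", "Gln": "Q", "Arg": "R", "Ser": "S",
    "Thr": "T", "Val": "V", "Tyr": "Y", "Trp": "W",
}


def parse_exonerate(lines):
    """Return (header, one-letter sequence) pairs, one per alignment in the log."""
    records = []       # each entry: [header fields, three-letter Target rows]
    take_next = False  # True right after a match line
    for line in lines:
        if take_next:
            if records:
                records[-1][1].append(re.sub(r"[^A-Za-z]", "", line))
            take_next = False
        elif "|" in line or "!" in line:
            take_next = True
        elif "Query:" in line:
            records.append([[line.split()[1]], []])  # a new alignment starts
        elif "Target:" in line and records:
            records[-1][0].append(line.split()[1])
        elif "Target range:" in line and records:
            fields = line.split()
            records[-1][0].append(f"{fields[2]}->{fields[4]}")

    result = []
    for header_fields, rows in records:
        # Concatenate first so residues split across rows ("Ar" + "g") are rejoined
        codes = re.findall(r"[A-Z][a-z]*", "".join(rows))
        seq = "".join(AA_CODES.get(code, "X") for code in codes)
        result.append((" ".join(header_fields), seq))
    return result


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping at `width` residues."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)
```

To use it on a real log, pass an open file (iterating a file yields its lines) to `parse_exonerate` and write the string returned by `to_fasta` to `pro-one.fa`.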

Comments and discussion are welcome!
Author's email: Luanxins@163.com

This script builds on one written by msw521sg:

msw521sg's CSDN blog post "exonerate结果整理,获取target序列" (Organizing exonerate results and extracting the target sequence)

Feel free to share; when reposting, please credit the source: 内存溢出 (outofmemory.cn)

Original article: http://outofmemory.cn/langs/715024.html
