Can nuclear-localized lncRNAs interact with miRNAs?

Nuclear-localized lncRNAs can interact with miRNAs.

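Biological functions: lncRNAs are closely involved in epigenetic regulation, transcriptional regulation, post-transcriptional regulation, miRNA regulation, and cell differentiation and development.

Stress-response functions: in response to various intracellular signals, lncRNAs can recruit proteins into complexes that take part in immune responses and host defense.

lncRNAs and disease: lncRNAs are closely linked to many human diseases, especially age-related ones such as cardiovascular disease, Alzheimer's disease, diabetes and cancer.

Whether lncRNAs can be successfully used as molecular targets in clinical diagnosis and cancer therapy will therefore be both a major challenge and a focus of future research.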

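Most lncRNAs show no obvious sequence conservation across species, and lncRNAs often retain their original biological activity after base substitutions, insertions or partial deletions. This differs from mRNA translation, which strictly follows the triplet codon rules, so that a single frameshift mutation can abolish protein function. The conserved parts of a lncRNA may be confined to short regions that are critical for structural or sequence-specific interactions. Loss of lncRNA function therefore usually requires deleting these conserved regions.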

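Technical principle

lncRNA knockout principle: one gRNA is designed at each end of the lncRNA's genomic sequence, so that the entire lncRNA region, or most of it, is deleted, thereby knocking out the lncRNA.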
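The dual-gRNA deletion strategy can be sketched in Python. This is a minimal illustration under stated assumptions. It scans only the forward strand for SpCas9 NGG PAMs, assumes the blunt cut falls 3 bp upstream of the PAM, and simply picks the cut sites closest to the lncRNA boundaries. Real gRNA design also considers the reverse strand, on-target efficiency and off-target scores. The function names and the toy sequence are hypothetical.

```python
from typing import List, Optional, Tuple

Site = Tuple[str, int]  # (20-nt protospacer, cut position in the sequence)


def spcas9_sites(seq: str, start: int, end: int) -> List[Site]:
    """Forward-strand SpCas9 sites whose NGG PAM lies entirely in seq[start:end]."""
    sites = []
    for pam in range(max(start, 20), min(end, len(seq)) - 2):
        if seq[pam + 1:pam + 3] == "GG":  # PAM = any base at `pam`, then GG
            protospacer = seq[pam - 20:pam]
            sites.append((protospacer, pam - 3))  # assumed cut: 3 bp upstream of PAM
    return sites


def design_deletion(seq: str, lnc_start: int, lnc_end: int, window: int = 200
                    ) -> Optional[Tuple[Site, Site, str]]:
    """Pick one gRNA just upstream of lnc_start and one just downstream of lnc_end."""
    upstream = [s for s in spcas9_sites(seq, lnc_start - window, lnc_start)
                if s[1] <= lnc_start]
    downstream = [s for s in spcas9_sites(seq, lnc_end, lnc_end + window)
                  if s[1] >= lnc_end]
    if not upstream or not downstream:
        return None  # no usable site in one of the flanks
    left = max(upstream, key=lambda s: s[1])     # cut closest to the 5' boundary
    right = min(downstream, key=lambda s: s[1])  # cut closest to the 3' boundary
    edited = seq[:left[1]] + seq[right[1]:]      # allele after the two ends are rejoined
    return left, right, edited


# Toy locus: the lncRNA occupies seq[63:113]; one PAM sits in each flank.
seq = "AT" * 20 + "TGG" + "AT" * 10 + "C" * 50 + "AT" * 5 + "AGG" + "AT" * 10
left, right, edited = design_deletion(seq, lnc_start=63, lnc_end=113, window=40)
print(left[1], right[1], len(seq) - len(edited))  # 37 120 83
```

Choosing the cuts closest to the boundaries keeps the deletion tight around the lncRNA. In practice several gRNA pairs are usually tested, because editing efficiency varies from site to site.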

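Dysregulated gene expression is a major hallmark of cancer. Indeed, altered transcription factor activity has been shown to drive some of the most common cancer subtypes. RNA is central to gene expression, whether as protein-coding RNAs (mRNAs) or as non-coding RNAs involved in and regulating transcription (lncRNAs and snRNAs), splicing (snRNAs) and translation (ribosomal RNAs, tRNAs and microRNAs). Recent evidence shows that RNA processing is systematically altered in cancer, demonstrating the important influence of RNA on tumor initiation, growth and progression.

In October 2020, researchers in Australia published a review titled "RNA in cancer" in Nature Reviews Cancer. It discusses how altered processing or activity of coding and non-coding RNAs promotes tumor initiation, growth and progression, highlighting the established roles of RNA in cancer (miRNAs and lncRNAs), emerging roles (alternative mRNA processing and circRNAs), and their mechanisms of action in cancer.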

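Once RNA polymerase II has synthesized an mRNA, the transcript must first be spliced and further processed into a mature transcript. It is then exported from the nucleus to the cytoplasm to be translated into protein. These interconnected processing steps are carried out by many macromolecular complexes, such as the spliceosome and the transcription-export complexes TREX and TREX-2.

Under physiological conditions, gene expression can also be regulated by non-coding RNAs, including miRNAs, lncRNAs and circRNAs. Typically, miRNAs negatively regulate gene expression by accelerating the deadenylation and degradation of target mRNAs. lncRNAs, in contrast, regulate gene expression in cis or in trans by acting as scaffolds for regulatory protein complexes, localizing to genomic DNA or altering genome structure.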

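Many miRNAs have been found to be associated with cancer, acting either as tumor suppressors or as oncogenes.

Roles of miRNAs: the expression of most proteins in human cells is regulated to some degree by one or more miRNAs. A single miRNA can have many mRNA targets, and a single mRNA can be targeted by multiple miRNAs. miRNAs can act together to repress targets that carry multiple miRNA binding sites in their 3′ untranslated region (UTR). Even so, binding of a single miRNA species to a target mRNA typically causes only a relatively modest reduction in target gene expression. More than 1,000 different miRNAs have been detected by RNA sequencing. Some miRNAs, such as the tumor suppressor let-7, are abundantly expressed in almost every cell type. Others show highly cell-type-specific expression, or are present at very low levels or absent in some cell types. Caution is therefore needed when examining the possible effects of lowly expressed miRNAs.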
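Binding sites in a 3′ UTR can be illustrated with a simple seed-match scan. This sketch assumes the common canonical rule that a 7mer-m8 site is the reverse complement of miRNA nucleotides 2-8. It ignores other site types, site context and pairing beyond the seed, so it only roughly approximates what target-prediction tools do. The let-7a sequence is used only to derive a seed, and the UTR is a made-up example.

```python
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """7mer-m8 target site: reverse complement of miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # 0-based slice = nucleotides 2..8
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_sites(utr: str, mirna: str) -> List[int]:
    """0-based start positions of every 7mer-m8 site in a 3' UTR (RNA alphabet)."""
    site = seed_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAAAAGGCUACCUCAA"  # hypothetical 3' UTR with two let-7 sites
print(seed_site(LET7A))             # CUACCUC
print(find_seed_sites(utr, LET7A))  # [3, 16]
```

A UTR carrying several such sites, for one miRNA or for different miRNAs, is the kind of target on which miRNAs can act together.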

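Oncogenic and tumor-suppressive miRNAs:

1 miRNAs that target negative regulators of oncogenic pathways can, when dysregulated, enhance RAS-MEK-ERK signaling through multiple targets. Similarly, miR-155 and miR-221 target SHIP1 (also known as INPP5D) and PTEN, respectively, both of which are negative regulators of AKT signaling.

2 The miRNAs most commonly reduced in cancer are the let-7 family. They act as major tumor suppressors by targeting potent oncogenes, including MYC, KRAS and HMGA2. let-7 miRNAs are therefore considered important therapeutic targets.

3 Numerous miRNAs have also been reported to limit metastasis and/or chemoresistance by restricting or reversing the epithelial-mesenchymal transition (EMT). The most potent of these is the miR-200 family.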

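Mechanisms of miRNA dysregulation: miRNA genes are transcribed by RNA polymerase II and are therefore subject to the same types of epigenetic regulation as protein-coding genes. Indeed, many miRNA genes are located in introns of protein-coding genes. There are many reports of epigenetic dysregulation of miRNAs in cancer. One pattern of widespread miRNA downregulation in cancer stems from hypoxia. Hypoxic cancer cells show reduced expression of Drosha and Dicer, as well as phosphorylation of AGO2, which reduces the association of Dicer with AGO2 and inhibits the processing of precursors into mature miRNAs. Not all miRNAs are downregulated by hypoxia, however. For example, transcriptional induction of miR-210 can override the hypoxia-induced reduction in processing. miR-210 can suppress the initiation of tumor growth in immunodeficient mice, but it can also promote the adaptation and survival of cells in the stressful hypoxic tumor environment. Another possible mechanism of miRNA downregulation is reduced nuclear export, caused by mutation of the pre-miRNA exporter exportin 5 (XPO5) or by changes in its phosphorylation.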

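lncRNAs have been found to have oncogenic or tumor-suppressive functions.

Roles of lncRNAs: lncRNAs are RNAs longer than 200 nucleotides that do not encode proteins. Like mRNAs, they are transcribed by RNA polymerase II, but unlike mRNAs, many lncRNAs are preferentially localized in the nucleus. They have diverse functions, including nuclear roles such as regulating gene expression in cis or in trans, regulating splicing, and nucleating subnuclear domains. In 2010, the lncRNA HOTAIR was shown to promote breast cancer metastasis by participating in chromatin remodeling. Many lncRNAs were subsequently found to affect cancer development or progression. Some lncRNAs may have several seemingly unrelated functions. For example, lincRNA-p21 was initially identified as a p53-induced tumor-suppressor lncRNA. It was shown to mediate the binding of heterogeneous nuclear ribonucleoprotein K (HNRNPK) to its neighboring gene CDKN1A (which encodes p21), increasing CDKN1A transcription.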

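Oncogenic and tumor-suppressive lncRNAs:

1 A recent study revealed that the lncRNA REG1CP is frequently upregulated in colorectal cancer. REG1CP promotes the growth of colorectal cancer xenografts by tethering the helicase FANCJ to the promoter of the neighboring gene REG3A.

2 PCAT19 is an oncogenic lncRNA that activates genes in trans, promoting prostate cancer growth, invasion and metastasis.

3 Cytoplasmic lncRNAs can also be oncogenic. linc0255 is overexpressed in MYCN-amplified neuroblastoma and specifically activates translation of E2F1 through its interaction with the ribosomal protein RPL35.

4 lncRNAs can also act as tumor suppressors. The nuclear lncRNA DIRC3 affects local chromatin structure and activates transcription of its neighboring gene, which encodes the tumor suppressor IGFBP5.

5 lncRNAs can also suppress tumors by regulating cytoplasmic signaling. The cytoplasmic lncRNA DRAIC is downregulated in advanced castration-resistant prostate cancer. It restrains progression by interfering with the activity of the inhibitor of NF-κB kinase (IKK), thereby inhibiting activation of nuclear factor-κB (NF-κB).

6 Some lncRNAs may still encode small proteins. Indeed, the lncRNA LINC00908 produces a 60-amino-acid polypeptide that is downregulated in triple-negative breast cancer tissue compared with normal tissue samples and is associated with poor overall survival.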

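Multiple, opposing effects of lncRNAs: perhaps the most illustrative example of how lncRNA genes act in cancer is their role in expression of the potent oncogene MYC. MYC transcription is regulated by the transcription of multiple nearby lncRNA genes, which may reflect MYC's key role in driving transcriptional responses to proliferative and growth signals. This example also highlights that lncRNA loci can produce RNAs with distinct or even opposing functions. The contrasting interpretations of gene-deletion studies of the highly abundant lncRNA MALAT1 in mice further underline the complexity of lncRNA effects on gene expression.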

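Emerging roles of circRNAs: circRNAs are expressed in essentially all cells and tissues and may be misregulated in cancer. circRNAs are mainly the products of back-splicing events, in which an exon is spliced to a preceding (upstream) exon rather than to a downstream exon, forming a covalently closed circRNA molecule. Some circRNAs have been reported to localize to the nucleus and regulate transcription, but most are located in the cytoplasm. A single cell can express thousands of circRNAs. Deep sequencing of RNA from patient tumors and cancer cell lines has detected more than 200,000 distinct circRNAs in total. Some circRNAs are overexpressed in cancer relative to the corresponding normal tissue, raising the possibility of using them as disease biomarkers. circRNAs may act as oncogenes or tumor suppressors, possibly by acting as miRNA sponges. A loss-of-function screen showed that some highly abundant circRNAs are essential for maximal proliferation of prostate cancer cells, although more work is needed to establish which circRNAs are oncogenic or tumor-suppressive. circRNAs may also act as nucleating factors or as components of multiprotein complexes.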
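Back-splicing is why circRNAs are detected in RNA-seq data as reads that span a back-splice junction (BSJ), the point where the end of a downstream exon is joined to the start of an upstream exon. The exact-match sketch below is a simplified illustration that uses hypothetical function names and toy exons in the DNA alphabet, as reads are reported. Real circRNA detectors rely on split-read alignment against the genome and tolerate mismatches.

```python
from typing import Tuple


def back_splice_junction(upstream_exon: str, downstream_exon: str,
                         flank: int = 10) -> Tuple[str, int]:
    """Sequence around a back-splice junction and the junction's index in it.

    In a circRNA the 3' end of the downstream exon is joined to the 5' start
    of the upstream exon, the reverse of the order in a linear transcript.
    """
    tail = downstream_exon[-flank:]
    head = upstream_exon[:flank]
    return tail + head, len(tail)


def read_spans_junction(read: str, junction_seq: str, junction_pos: int,
                        min_overhang: int = 5) -> bool:
    """True if the read matches exactly across the junction with enough overhang."""
    start = junction_seq.find(read)
    while start != -1:
        left = junction_pos - start               # read bases before the junction
        right = start + len(read) - junction_pos  # read bases after the junction
        if left >= min_overhang and right >= min_overhang:
            return True
        start = junction_seq.find(read, start + 1)
    return False


exon1 = "AAAACCCCGG"  # hypothetical upstream exon
exon2 = "TTTTGGGGCA"  # hypothetical downstream exon
junction, pos = back_splice_junction(exon1, exon2)
print(junction, pos)                                       # TTTTGGGGCAAAAACCCCGG 10
print(read_spans_junction("GGGGCAAAAACC", junction, pos))  # True
print(read_spans_junction("CCCCGG", junction, pos))        # False
```

A read lying entirely inside one exon cannot tell a circRNA from its linear parent. For that reason, only junction-spanning reads with a minimum overhang on both sides are counted as circRNA evidence.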

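Dysregulated circRNAs: what causes circRNA dysregulation in cancer? Changes in gene copy number or in transcription of circRNA precursors undoubtedly alter circRNA levels in some cancers. However, most circRNAs are alternative splicing products of protein-coding genes, so the effects of such changes must be carefully distinguished from those of changes in the level of the cognate protein. circRNA levels can also change through altered levels of splicing factors involved in circRNA biogenesis.

Splicing of pre-mRNAs to remove introns and join exons in different ways is fundamental to gene expression. Indeed, alternative splicing can increase transcriptome and proteome diversity by producing alternative protein isoforms. This process is carried out by the major spliceosome, which performs most RNA splicing reactions and is associated with more than 300 different proteins.

Once mRNAs have been spliced and polyadenylated, they must be exported from their sites of transcription and processing in the nucleus to the cytoplasm for translation. Efficient mRNA export is achieved by coupling upstream steps of the gene expression pathway (transcription, splicing and polyadenylation) to mRNA export. mRNAs are continuously transported through the central channel of nuclear pore complexes, which allow proteins and other molecules to cross the nuclear envelope. This extensive coupling of transcription, RNA splicing and polyadenylation with mRNA export has important implications for tumorigenesis.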

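Emerging roles of mRNA splicing: mRNA splicing has historically been regarded as a housekeeping process essential for the expression of multi-exon genes. Recent findings, however, reveal the regulatory potential of the RNA splicing machinery. How does an altered mRNA splicing machinery promote tumorigenesis? Mutations in SRSF2, SF3B1 and U2AF1 all affect 3′ splice site recognition to varying degrees. Such altered splicing may affect the stability of transcripts encoding proteins that promote transformation.

Alternative cleavage and polyadenylation: alterations in downstream mRNA processing steps, such as cleavage and polyadenylation of pre-mRNAs, are also widespread in tumors. For example, 3′ UTRs are shortened in both tumor cell lines and tumor specimens.

Emerging role of selective mRNA export: nuclear export of mRNA, one of the final steps of the gene expression pathway, is also altered in cancer. mRNA export has been regarded as a general, default step in gene expression, but specific biological pathways can be regulated by selective mRNA export, which favors certain mRNAs over others. Selective mRNA export can regulate biological processes critical to cancer development, such as cell proliferation and genome integrity. Cancer cells can exploit this regulatory potential of the mRNA export machinery to sustain proliferation.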

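Over the past few years, a large body of research has revealed in great detail the extent to which RNA is systematically altered in cancer. Widespread alterations of coding and non-coding RNAs in cancer affect multiple aspects of tumorigenesis.

The mechanisms by which these different RNA classes, and the proteins that process them, contribute to cancer provide opportunities for therapeutic intervention. For example, some compounds that target the core spliceosome machinery affect RNA splicing in vivo. One of these, E7107, binds the SF3B complex, but it showed significant toxicity when administered intravenously in a phase I clinical trial. More recent studies showed that H3B-8800, an orally available modulator of the SF3B complex, had preferential antitumor activity at well-tolerated doses in mouse models of advanced hematological malignancies carrying spliceosome mutations. As an alternative pharmacological way to interfere with splicing, other studies have modulated selective and regulatory splicing factors such as RBM39 with compounds that mediate their proteasomal degradation. This approach succeeded in mouse models of acute myeloid leukemia. The widespread alteration of RNA in cancer will provide many new therapeutic opportunities. Further elucidating the basic mechanisms by which altered RNA processing promotes tumor initiation, growth and progression is essential. Only then can cancer therapies be designed to target RNA processing specifically, with minimal effects on normal cells.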

First published on the WeChat official account: China National GeneBank Big Data Platform

References

Goodall GJ, Wickramasinghe VO. RNA in cancer. Nat Rev Cancer (2020).

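We have never shared notes on whole-transcriptome data analysis, because we have not had a directly relevant project. We have only discussed a few small questions with readers of our WeChat account. Note that a whole transcriptome is not the same as a full-length transcriptome. Whole-transcriptome sequencing profiles ordinary mRNAs together with three common classes of non-coding RNA: lncRNAs, miRNAs and circRNAs. Full-length transcriptome sequencing, by contrast, uses third-generation (long-read) sequencing to read the entire length of each transcript in one pass, which makes it easy to distinguish alternative splicing isoforms.

So why do we rarely cover whole-transcriptome analysis? Mainly because it includes those three classes of non-coding RNA (lncRNAs, miRNAs and circRNAs), and non-coding genes have a rather poor reputation. Everyone knows they are important, but the evidence for their importance is usually indirect, and they are not systematically curated in biological databases such as GO and KEGG. As a result, when people study and discuss them, they often amount to little more than a symbol.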

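However, whether we are dealing with ordinary mRNAs or with lncRNAs, miRNAs and circRNAs, the end result is an expression matrix, and the analysis is essentially routine differential expression analysis. Our related WeChat posts are:

Understanding how GEO data are stored and downloaded: all you need in one post

Understanding the SRA database: all you need in one post

Getting an expression matrix from the GEO database: all you need in one post

GSEA analysis: all you need in one post (standalone and R versions)

Differential analysis based on group information (one post is not enough for this one)

Ordinary mRNAs can be mapped directly to biological databases such as GO and KEGG. For non-coding genes, you first need to identify their target genes and then annotate those target genes with GO, KEGG and similar databases.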

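Whole-transcriptome sequencing

For example, the paper "Plasma extracellular vesicle long RNA profiles in the diagnosis and prediction of treatment response for breast cancer" (NPJ Breast Cancer, December 2021) performed whole-transcriptome sequencing on two cohorts:

Cohort 1: 172 participants, including 112 patients with breast cancer, 19 patients with benign breast disease and 41 healthy controls (tumor diagnosis model).

Cohort 2: 58 patients receiving neoadjuvant therapy, 24 in the pCR (pathological complete response) group and 34 in the non-pCR group (treatment-response prediction model).

Its transcriptome sequencing data are at >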

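In fact, protein-coding genes account for only about 3% of the human genome, and the remaining 97% is non-coding sequence. Non-coding sequence can nevertheless be expressed, and its products are non-coding RNAs (ncRNAs).

About 93% of human genomic DNA can be transcribed into RNA. Of the transcribed RNA, 2% is mRNA and 98% is non-coding RNA (ncRNA).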

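Classification of RNA transcripts

Non-coding RNAs (ncRNAs) can be divided into two types: regulatory RNAs and housekeeping RNAs.

Regulatory RNAs

miRNA: microRNA, 18-25 nt (nt = nucleotide), single-stranded

siRNA: small interfering RNA, 21-23 nt, double-stranded

piRNA: PIWI-interacting RNA, 26-35 nt, single-stranded; small RNAs specific to animal germ cells that silence transposons

lncRNA: long non-coding RNA, >200 nt, e.g. Xist and PCGEM1

Housekeeping RNAs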

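rRNA: ribosomal RNA, single-stranded; a structural component of the ribosome. It comes in several sizes, such as 28S, 18S and 5S, and the sizes and types of rRNA molecules can differ between species

tRNA: transfer RNA, 70-80 nt, single-stranded, with a cloverleaf structure; carries amino acids during protein synthesis

snoRNA: small nucleolar RNA

scaRNA: small Cajal body-specific RNA, a special class of snoRNA-like RNA that localizes specifically to Cajal bodies and guides the modification of snRNAs during small nuclear ribonucleoprotein (snRNP) biogenesis

Telomerase RNA: part of telomerase; during telomere extension it serves as the template, and telomerase catalyzes the lengthening of the telomere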

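Popular ncRNAs: lncRNA, miRNA and circRNA

Current ncRNA research focuses mainly on three classes: lncRNAs, miRNAs and circRNAs.

lncRNA: lncRNAs can fold into specific three-dimensional structures that interact with many proteins. They can also recognize other nucleic acids through complementary base pairing, which in turn can guide proteins to specific sequence sites. These properties give lncRNAs particularly diverse functions in development and cancer.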


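lncRNAs can act as RNA decoys that bind transcription factors and block their binding to gene promoters, thereby regulating transcription. They can act as molecular sponges that adsorb miRNAs and prevent them from binding mRNAs, protecting those mRNAs from degradation. They can act as scaffolds or bridges for protein interactions, affecting the formation of protein complexes and regulating protein activity. They can recruit chromatin-modifying factors that change chromatin modification levels and thereby affect gene transcription and expression. Finally, they can pair with mRNAs to inhibit translation, affect splicing or affect mRNA stability.

circRNA: circRNA molecules have a closed circular structure with no free 5′ or 3′ ends. They therefore resist degradation by the exonuclease RNase R and are more stable than linear RNAs. They are about 200-2,000 nt long, most commonly around 500 nt.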


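Most circRNAs are derived from exons, and a small fraction are formed by direct circularization of introns. There are four proposed modes of formation: lariat-driven circularization, intron-pairing-driven circularization, circularization of a single intron, and RNA-binding-protein-driven circularization.

circRNAs can exert biological functions by competitively binding miRNAs, by regulating transcription of their linear parental genes, and even by encoding peptides.

circRNAs act as ceRNAs (competing endogenous RNAs) that competitively bind miRNAs. They bind RNA-binding proteins (RBPs) to form RNA-protein complexes (RPCs) that regulate transcription of the linear parental gene. As for coding function, circRNAs carrying an internal ribosome entry site (IRES) can be translated into peptides.

miRNA: miRNAs are a class of endogenously encoded, non-coding, single-stranded RNA molecules about 19-25 nt long. They play key regulatory roles in tumor initiation and progression, development, organogenesis, antiviral defense, epigenetic regulation and metabolism.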


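Interpreting RNA-seq results

The most widely used and mature RNA-seq application in bioinformatics is transcriptome sequencing, which in the narrow sense means measuring the expression levels of all mRNAs. A completed RNA-seq analysis produces a lot of data and figures, such as volcano plots, Venn diagrams and clustered heatmaps.

A volcano plot shows two key metrics: fold change and adjusted p-value. Genes significantly differentially expressed between the two samples are first identified with a t-test. The plot then shows log2(fold change) on the x-axis and the negative log10 of the adjusted p-value, -log10(adj p-value), on the y-axis.

Red indicates upregulated genes and green indicates downregulated genes.

x-axis: the fold change is the ratio of RNA expression in the test sample to that in the control sample (TS vs CK). An x value of 1 corresponds to a 2-fold difference in expression (log2(2) = 1).

y-axis: padj is the adjusted p-value, which indicates whether a difference is statistically significant. By convention, p < 0.05 indicates significance. Since -log10(0.05) ≈ 1.3, points above 1.3 on the plot represent significant differences.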
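Placing and coloring one gene on a volcano plot can be sketched in Python. The adjusted p-value cutoff of 0.05 follows the text. The fold-change cutoff of |log2FC| ≥ 1 (a 2-fold change) is an assumption, chosen because it is a common default; the function name is hypothetical.

```python
import math
from typing import Tuple


def volcano_point(log2fc: float, padj: float,
                  fc_cutoff: float = 1.0, p_cutoff: float = 0.05
                  ) -> Tuple[float, float, str]:
    """Return (x, y, label) for one gene on a volcano plot."""
    y = -math.log10(padj)  # larger y = more significant
    if padj < p_cutoff and log2fc >= fc_cutoff:
        label = "up"    # usually drawn red
    elif padj < p_cutoff and log2fc <= -fc_cutoff:
        label = "down"  # usually drawn green
    else:
        label = "not significant"
    return log2fc, y, label


print(round(-math.log10(0.05), 2))  # 1.3
print(volcano_point(2.0, 0.01)[2])  # up
print(volcano_point(-1.5, 0.2)[2])  # not significant
```

Requiring both cutoffs keeps genes with tiny but highly significant changes, and genes with large but noisy changes, out of the colored groups.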

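A Venn diagram shows the overlapping regions of sets of elements.

In an RNA-seq project, each ellipse represents the differentially expressed genes from one comparison (treatment vs control). The number in an overlapping region is the count of differentially expressed genes shared by the corresponding comparisons. In the example figure, comparisons A, B, C and D share 44 differentially expressed genes.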
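The numbers in a Venn diagram can be computed directly from the gene lists. This sketch groups each gene by the exact set of comparisons it appears in, which gives one count per Venn region. The gene and comparison names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def venn_regions(deg_sets: Dict[str, Set[str]]) -> Counter:
    """Count genes per Venn region, keyed by the set of comparisons they belong to."""
    membership: Dict[str, Set[str]] = {}
    for comparison, genes in deg_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(comparison)
    return Counter(frozenset(names) for names in membership.values())


degs = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}
regions = venn_regions(degs)
print(regions[frozenset({"A", "B", "C"})])  # 1 (only g3 is shared by all three)
print(regions[frozenset({"A", "B"})])       # 1 (g2 is in A and B only)
```

The region keyed by all comparison names corresponds to the central overlap of the figure. Regions that were never seen return 0 from the Counter.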

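A clustered heatmap can be used to examine the expression patterns of differentially expressed genes across experimental conditions. The color of each cell represents the expression of that gene.

Red: high expression; blue: low expression.

The x-axis represents different experimental conditions or samples. The y-axis represents differentially expressed genes, which have been clustered so that genes with similar expression patterns are grouped together.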
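The gene clustering behind such a heatmap can be sketched with NumPy and SciPy. Rows are z-scored so that genes are grouped by their expression pattern rather than their absolute level, and a hierarchical tree is then cut into clusters. The choices of row z-scores, Euclidean distance, average linkage and two clusters are assumptions; heatmap tools may use other defaults.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_genes(expr: np.ndarray, n_clusters: int = 2):
    """Row-z-score a genes x samples matrix and cut a hierarchical tree into clusters.

    Constant rows (zero standard deviation) should be removed first,
    otherwise their z-scores are undefined.
    """
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    tree = linkage(z, method="average", metric="euclidean")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return z, labels


expr = np.array([
    [1.0, 1.0, 5.0, 5.0],  # gene up in the last two samples
    [2.0, 2.0, 6.0, 6.0],  # same pattern at a higher baseline
    [5.0, 5.0, 1.0, 1.0],  # opposite pattern
    [6.0, 6.0, 2.0, 2.0],
])
z, labels = cluster_genes(expr)
print(labels)  # genes 0-1 share one label, genes 2-3 the other
```

The z-score matrix is what would be colored red/blue in the heatmap, with rows reordered by the tree.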

GDCRNATools is an R package that provides a standard, easy-to-use and comprehensive pipeline for downloading, organizing and integratively analyzing RNA expression data in the GDC portal. It places particular emphasis on deciphering the lncRNA-mRNA related ceRNA regulatory network in cancer.

Competing endogenous RNAs (ceRNAs) are RNAs that indirectly regulate other transcripts by competing for shared miRNAs. Only a fraction of long non-coding RNAs have been functionally characterized, but increasing evidence shows that lncRNAs harboring multiple miRNA response elements (MREs) can act as ceRNAs. Such lncRNAs sequester miRNA activity and thus reduce the inhibition of miRNAs on their targets. Deregulation of the ceRNA network may lead to human diseases.

The Genomic Data Commons (GDC) maintains standardized genomic, clinical and biospecimen data from National Cancer Institute (NCI) programs, including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research To Generate Effective Treatments (TARGET). It also accepts high-quality datasets from non-NCI-supported cancer research programs, such as genomic data from Foundation Medicine.

Many analyses can be performed using GDCRNATools:
- differential gene expression analysis (limma, edgeR and DESeq2)
- univariate survival analysis (CoxPH and KM)
- competing endogenous RNA network analysis (hypergeometric test, Pearson correlation analysis, regulation similarity analysis and sensitivity Pearson partial correlation)
- functional enrichment analysis (GO, KEGG and DO)

Besides routine visualization methods such as volcano plots, scatter plots and bubble plots, GDCRNATools includes three simple shiny apps that allow users to visualize the results on a local webpage. All figures are plotted with the ggplot2 package unless otherwise specified.

This user-friendly package allows researchers to perform the analysis by running just a few functions. Researchers can also easily integrate their own pipelines into the workflow, such as molecular subtype classification, weighted correlation network analysis (WGCNA) and TF-miRNA co-regulatory network analysis. This could help accelerate the study of crosstalk among different classes of RNAs and their regulatory relationships in cancer.

The R software for running GDCRNATools can be downloaded from The Comprehensive R Archive Network (CRAN). The GDCRNATools package can be installed from Bioconductor.

GDCRNATools provides functions for downloading and processing GDC data efficiently. Users can also use their own data processed by other tools, such as the UCSC Xena GDC hub, TCGAbiolinks or TCGA-Assembler.

Here we use a small dataset to show the most basic steps of ceRNA network analysis. More detailed instructions for each step are given in the Case Study section.

In this section, we use the whole dataset of the TCGA-CHOL project as an example to illustrate in detail how GDCRNATools works.

By default, data are downloaded automatically with the API method developed in the GenomicDataCommons package, by specifying the datatype and projectid arguments. An alternative method using gdc-client is also provided in case the API method fails.

Users can also download data manually by providing the manifest file downloaded from the GDC cart.

Step 1: Download the GDC Data Transfer Tool from the GDC website.

Step 2: Add data to the GDC cart, then download the manifest file and the metadata of the cart.

Step 3: Download the data with the gdcRNADownload() function by providing the manifest file.

Metadata can be parsed with gdcParseMetadata(), either by providing the metadata file (JSON) downloaded in the data download step or by specifying projectid and datatype. This retrieves information about the data in the manifest file, which helps organize the data. It also retrieves basic clinical information about patients, such as age, stage and gender, for data analysis.

gdcFilterDuplicate() keeps only one sample when a sample has been sequenced more than once. gdcFilterSampleType() filters out samples that are neither Primary Tumor (code: 01) nor Solid Tissue Normal (code: 11).

gdcRNAMerge() merges raw RNA-seq count data into a single expression matrix with Ensembl IDs as rows and samples as columns. For miRNAs, total read counts for the 5p and 3p strands can be computed from isoform quantification files. These counts are then merged into a single expression matrix with miRBase v21 identifiers as rows and samples as columns.
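The merging step can be illustrated in plain Python. This is not gdcRNAMerge()'s implementation, only a sketch of the same idea. It assumes each sample's counts have already been read into a {gene_id: count} dictionary and fills genes missing from a sample with 0. The function name and gene IDs are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_counts(per_sample: Dict[str, Dict[str, int]]
                 ) -> Tuple[List[str], List[str], List[List[int]]]:
    """Merge per-sample {gene_id: count} tables into a genes x samples matrix.

    Genes missing from a sample are filled with 0.
    """
    samples = list(per_sample)
    genes = sorted(set().union(*(counts.keys() for counts in per_sample.values())))
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix


per_sample = {
    "sample1": {"ENSG_A": 10, "ENSG_B": 0},
    "sample2": {"ENSG_A": 7, "ENSG_C": 3},
}
genes, samples, matrix = merge_counts(per_sample)
print(genes)   # ['ENSG_A', 'ENSG_B', 'ENSG_C']
print(matrix)  # [[10, 7], [0, 0], [0, 3]]
```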

gdcVoomNormalization() normalizes raw count data with the TMM method implemented in edgeR and further transforms it with the voom method provided in limma. By default, lowly expressed genes (logCPM < 1 in more than half of the samples) are filtered out. All genes can be kept by setting filter=FALSE in gdcVoomNormalization().
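The low-expression filter can be sketched with NumPy. The log2-CPM formula follows voom, log2((count + 0.5) / (library size + 1) × 10^6). For simplicity, this sketch uses raw column sums as library sizes instead of TMM-scaled ones, so values will differ slightly from the package's. The filtering rule follows the text: drop a gene if its logCPM is below 1 in more than half of the samples. The function names are hypothetical.

```python
import numpy as np


def log2_cpm(counts: np.ndarray) -> np.ndarray:
    """voom-style log2 counts per million for a genes x samples count matrix."""
    lib_size = counts.sum(axis=0)  # raw library sizes (no TMM scaling here)
    return np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)


def keep_expressed(counts: np.ndarray, cutoff: float = 1.0) -> np.ndarray:
    """False for genes whose logCPM < cutoff in more than half of the samples."""
    low = (log2_cpm(counts) < cutoff).sum(axis=1)
    return low <= counts.shape[1] / 2


counts = np.array([
    [1e7, 1e7, 1e7, 1e7],  # highly expressed
    [0, 0, 0, 0],          # never detected
    [100, 0, 100, 0],      # low in exactly half of the samples: kept
    [100, 0, 0, 0],        # low in 3 of 4 samples: dropped
], dtype=float)
print(keep_expressed(counts))  # [ True False  True False]
```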

Usually, people are interested in genes that are differentially expressed between groups (e.g. Primary Tumor vs Solid Tissue Normal). gdcDEAnalysis(), a convenience wrapper, provides three widely used methods (limma, edgeR and DESeq2) to identify differentially expressed genes (DEGs) or miRNAs between any two user-defined groups. Note that DESeq2 may be slow on a single core; when DESeq2 is used, multiple cores can be specified with the nCore argument. Users are encouraged to consult the vignette of each method for more detailed information.

All DEGs, DE long non-coding genes, DE protein-coding genes and DE miRNAs can be reported separately by setting the geneType argument in gdcDEReport(). Gene symbols and biotypes based on the Ensembl 90 annotation are reported in the output.

A hypergeometric test is performed to test whether a lncRNA and an mRNA share a significant number of miRNAs.

A newly developed algorithm, spongeScan, is used to predict MREs in lncRNAs acting as ceRNAs. Databases such as starBase v2.0, miRcode and miRTarBase release 7.0 are used to collect predicted and experimentally validated miRNA-mRNA and/or miRNA-lncRNA interactions. Gene IDs in these databases are updated to the Ensembl 90 annotation of the human genome, and miRNA names are updated to miRBase 21 identifiers. Users can also provide their own datasets of miRNA-lncRNA and miRNA-mRNA interactions.

$$p = 1 - \sum_{k=0}^{m} \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

where m is the number of shared miRNAs, N is the total number of miRNAs in the database, n is the number of miRNAs targeting the lncRNA, and K is the number of miRNAs targeting the protein-coding gene.
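The hypergeometric p-value can be computed exactly with Python's standard library, using the symbols N, n, K and m as defined above. By default, the sketch follows the formula literally, which gives 1 - P(X ≤ m). Many implementations of an "at least m shared miRNAs" test instead use P(X ≥ m), which sums only up to m - 1; this is exposed as an option. The function name and the inclusive flag are hypothetical, and this is not the package's own code.

```python
from fractions import Fraction
from math import comb


def shared_mirna_pvalue(m: int, N: int, n: int, K: int, inclusive: bool = False) -> float:
    """Hypergeometric p-value for a lncRNA-mRNA pair sharing m miRNAs.

    inclusive=False follows the formula literally: 1 - P(X <= m).
    inclusive=True gives P(X >= m), i.e. the sum stops at m - 1.
    """
    upper = m - 1 if inclusive else m
    cdf = sum(Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
              for k in range(0, min(upper, n, K) + 1))  # exact rational arithmetic
    return float(1 - cdf)


# 10 miRNAs in the database, 3 target the lncRNA, 3 target the mRNA, all 3 shared.
print(shared_mirna_pvalue(3, 10, 3, 3))                  # 0.0
print(shared_mirna_pvalue(3, 10, 3, 3, inclusive=True))  # 1/120, about 0.00833
```

The toy example shows why the distinction matters: when the overlap is as large as it can be, the literal formula returns 0, while P(X ≥ m) returns the probability of that maximal overlap.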

The Pearson correlation coefficient measures the strength of the linear association between two variables. miRNAs are negative regulators of gene expression. If a lncRNA occupies more of the shared miRNAs, fewer of them bind the target mRNA, which increases the mRNA's expression level. The expression levels of the lncRNA and the mRNA in a ceRNA pair should therefore be positively correlated.

We define a regulation similarity score to measure how similar the miRNA-lncRNA expression correlations are to the miRNA-mRNA expression correlations:

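$$\text{Regulation similarity score} = 1 - \frac{1}{M}\sum_{k=1}^{M}\left[\frac{\left|\operatorname{corr}(m_k,l)-\operatorname{corr}(m_k,g)\right|}{\left|\operatorname{corr}(m_k,l)\right|+\left|\operatorname{corr}(m_k,g)\right|}\right]^{M}$$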

where M is the total number of shared miRNAs and k indexes the kth shared miRNA. corr(m_k,l) and corr(m_k,g) are the Pearson correlations between the kth miRNA and the lncRNA, and between the kth miRNA and the mRNA, respectively.
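The regulation similarity score translates directly into Python. This sketch takes the M per-miRNA correlations as inputs and does not compute them from expression data. The function name is hypothetical. A score of 1 means the shared miRNAs correlate identically with the lncRNA and with the mRNA.

```python
from typing import Sequence


def regulation_similarity(corr_ml: Sequence[float], corr_mg: Sequence[float]) -> float:
    """Regulation similarity score from per-miRNA correlations.

    corr_ml[k] = corr(m_k, lncRNA) and corr_mg[k] = corr(m_k, mRNA) for the M
    shared miRNAs. A term is undefined if both correlations of a miRNA are 0.
    """
    M = len(corr_ml)
    total = sum((abs(a - b) / (abs(a) + abs(b))) ** M
                for a, b in zip(corr_ml, corr_mg))
    return 1 - total / M


print(regulation_similarity([-0.4, -0.6], [-0.4, -0.6]))  # 1.0 (identical regulation)
print(regulation_similarity([0.5, -0.5], [-0.5, 0.5]))    # 0.0 (opposite regulation)
```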

Sensitivity correlation was defined by Paci et al. to measure whether the correlation between a lncRNA and an mRNA is mediated by a miRNA in the lncRNA-miRNA-mRNA triplet. We average over all triplets formed by a lncRNA-mRNA pair and their shared miRNAs and take the result as the sensitivity correlation between the selected lncRNA and mRNA:

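$$\text{Sensitivity correlation} = \operatorname{corr}(l,g) - \frac{1}{M}\sum_{k=1}^{M}\frac{\operatorname{corr}(l,g)-\operatorname{corr}(m_k,l)\operatorname{corr}(m_k,g)}{\sqrt{1-\operatorname{corr}(m_k,l)^2}\,\sqrt{1-\operatorname{corr}(m_k,g)^2}}$$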

where M is the total number of shared miRNAs and k indexes the kth shared miRNA. corr(l,g), corr(m_k,l) and corr(m_k,g) are the Pearson correlations between the lncRNA and the protein-coding gene, between the kth miRNA and the lncRNA, and between the kth miRNA and the mRNA, respectively.
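The sensitivity correlation is the lncRNA-mRNA correlation minus the average first-order partial correlation after controlling for each shared miRNA. This sketch takes the correlations as inputs and uses hypothetical function names. It assumes no miRNA is perfectly correlated (|r| = 1) with the lncRNA or the mRNA, since the denominator would then be zero.

```python
import math
from typing import Sequence


def partial_corr(r_lg: float, r_ml: float, r_mg: float) -> float:
    """First-order partial correlation of lncRNA and mRNA given one miRNA."""
    return (r_lg - r_ml * r_mg) / (math.sqrt(1 - r_ml ** 2) * math.sqrt(1 - r_mg ** 2))


def sensitivity_correlation(r_lg: float, corr_ml: Sequence[float],
                            corr_mg: Sequence[float]) -> float:
    """corr(l, g) minus the mean partial correlation over the M shared miRNAs."""
    partials = [partial_corr(r_lg, a, b) for a, b in zip(corr_ml, corr_mg)]
    return r_lg - sum(partials) / len(partials)


print(round(sensitivity_correlation(0.8, [0.6], [0.6]), 4))  # 0.1125
```

If a miRNA is uncorrelated with both RNAs, the partial correlation equals corr(l,g) and the sensitivity is 0: that miRNA explains none of the lncRNA-mRNA correlation.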

The hypergeometric test of shared miRNAs, the expression correlation analysis of lncRNA-mRNA pairs and the regulation pattern analysis of shared miRNAs are all implemented in the gdcCEAnalysis() function.

Users can perform ceRNA network analysis with the built-in interaction databases: starBase v2.0, miRcode and miRTarBase v7.0 for miRNA-mRNA interactions, and starBase v2.0, miRcode and spongeScan for miRNA-lncRNA interactions.

gdcCEAnalysis() can also take user-provided miRNA-mRNA and miRNA-lncRNA interaction datasets for ceRNA network analysis, such as miRNA-target interactions predicted by TargetScan, miRanda or DIANA Tools.

lncRNA-miRNA-mRNA interactions can be exported with gdcExportNetwork() and visualized in Cytoscape. Edges should be imported as a network and nodes as a feature table.

shinyCorPlot() is an interactive plotting function based on the shiny package. It can be operated simply by clicking the genes in each drop-down box of the GUI window. Running shinyCorPlot() opens a local webpage that automatically shows the correlation plot between a lncRNA and an mRNA.

Downstream analyses, such as univariate survival analysis and functional enrichment analysis, are included in the GDCRNATools package. They help identify genes in the ceRNA network that play important roles in prognosis or are involved in important pathways.

Two methods are provided for univariate survival analysis, both based on the survival package: the Cox Proportional-Hazards (CoxPH) model and Kaplan-Meier (KM) analysis. The CoxPH model treats the expression value as a continuous variable. KM analysis instead divides patients into high-expression and low-expression groups using a user-defined threshold such as the median or mean. gdcSurvivalAnalysis() takes a list of genes as input and reports the hazard ratio, 95% confidence interval and test significance of each gene for overall survival.
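The KM branch can be illustrated from scratch. This sketch splits patients at the median expression and computes the product-limit (Kaplan-Meier) survival estimate, S(t) = Π(1 - d_i / n_i) over event times. It is not gdcSurvivalAnalysis(); the function names are hypothetical, and it omits the log-rank test and confidence intervals that the survival package provides.

```python
import statistics
from typing import List, Sequence, Tuple


def split_by_median(expr: Sequence[float]) -> List[str]:
    """Label each patient 'high' or 'low' relative to the median expression."""
    cutoff = statistics.median(expr)
    return ["high" if x > cutoff else "low" for x in expr]


def kaplan_meier(times: Sequence[float], events: Sequence[int]
                 ) -> List[Tuple[float, float]]:
    """Product-limit survival estimate; events: 1 = death observed, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


print(split_by_median([2.0, 8.0, 5.0, 1.0]))     # ['low', 'high', 'high', 'low']
print(kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1]))  # steps down at t = 1, 2 and 4
```

In a real analysis, one curve is computed per expression group and the two curves are compared, typically with a log-rank test.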

The shinyKMPlot() function is another simple shiny app. It lets users conveniently view KM plots (based on the R package survminer) of all genes of interest on a local webpage.

gdcEnrichAnalysis() can perform Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Disease Ontology (DO) functional enrichment analyses of a list of genes simultaneously. These analyses are based on the R/Bioconductor packages clusterProfiler and DOSE. Redundant GO terms can be removed by specifying simplify=TRUE in gdcEnrichAnalysis(), which uses the simplify() function in the clusterProfiler package.

shinyPathview() allows users to view and download pathways of interest by selecting the pathway terms on a local webpage.

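Free online databases can be used for prediction, for example starBase. starBase is a microRNA target database supported by high-throughput CLIP-seq data (also known as HITS-CLIP, PAR-CLIP or iCLIP) and mRNA degradome sequencing data. It integrates several popular target-prediction programs and uses the intersection of their results to predict miRNA targets. Its predictions cover miRNA-mRNA, miRNA-lncRNA, miRNA-circRNA, miRNA-ceRNA and RNA-protein regulatory relationships. It has also built a comprehensive Pan-Cancer expression atlas and interaction network covering 14 cancer types (>6,000 samples).

How to find lncRNAs that may interact with transcription factors

To study the relationships among human miRNAs, their transcription factors and their target genes. Methods: upstream transcription factors and downstream target genes of miRNAs were predicted with bioinformatic methods. The predictions were subjected to Gene Ontology analysis to obtain the proportions involved in each biological process and molecular function, and correlation analysis was performed with the statistical software PASW. Results: 382 human miRNAs are involved in 10 biological processes, including stress response, metabolism and development. They also perform 12 molecular functions, including transcriptional regulation, translational regulation and catalytic activity. There are widespread positive and negative correlations among upstream transcription factors, among downstream target genes, and between upstream factors and downstream genes. Conclusion: as genes participate in biological processes and perform molecular functions, they achieve synergistic or isolating effects through miRNAs.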


Source: https://outofmemory.cn/sjk/10132086.html
