Research Output Details

Publication status: Published
Title: Deep learning for HGT insertion sites recognition
Authors: Li, Chen; Chen, Jiaxing; Li, Shuaicheng
Publication date: 2020-12-01
Journal: BMC Genomics
ISSN/eISSN: 1471-2164
Volume: 21
Abstract

Background: Horizontal gene transfer (HGT) refers to the sharing of genetic material between distant species that are not in a parent-offspring relationship. HGT insertion sites are important for understanding HGT mechanisms. Recent studies of the main agents of HGT, such as transposons and plasmids, demonstrate that insertion sites usually carry specific sequence features. This motivated us to develop a method that infers HGT insertion sites from sequence features.

Results: In this paper, we propose a deep residual network, DeepHGT, to recognize HGT insertion sites. To train DeepHGT, we extracted about 1.55 million sequence segments as training instances from 262 metagenomic samples, with a ratio of positive to negative instances of about 1:1. These segments were randomly partitioned into three subsets: 80% as the training set, 10% as the validation set, and the remaining 10% as the test set. The training loss of DeepHGT is 0.4163 and the validation loss is 0.423. On the test set, DeepHGT achieved an area under the curve (AUC) of 0.8782. To further evaluate the generalization of DeepHGT, we constructed an independent test set containing 689,312 sequence segments from another 147 gut metagenomic samples, on which DeepHGT achieved an AUC of 0.8428, close to the previous test AUC. For comparison, the gradient boosting classifier implemented in PyFeat achieved AUC values of 0.694 and 0.686 on these two test sets, respectively. DeepHGT could also learn discriminant sequence features; for example, it learned a pattern of palindromic subsequences as a significant (P-value = 0.0182) local feature. Hence, DeepHGT is a reliable model for recognizing HGT insertion sites.

Conclusion: DeepHGT is the first deep learning model that can accurately recognize HGT insertion sites on genomes according to the sequence pattern.
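To make the evaluation protocol described in the abstract concrete, here is a minimal Python sketch of a random 80/10/10 partition and of the AUC metric used to report results. It is not DeepHGT's code: the function names `split_80_10_10` and `roc_auc` are hypothetical, the fixed shuffle seed is an assumption, and AUC is computed here with the Mann-Whitney rank formulation rather than with any particular library the authors may have used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: randomly partition items into 80% train, 10% validation, 10% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for reproducibility only
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, about 10%
    return train, val, test


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUC as the probability that a random positive outranks a random negative.

    Uses the Mann-Whitney U statistic; tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of equal scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative instance")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    train, val, test = split_80_10_10(range(1000), seed=42)
    print(len(train), len(val), len(test))  # 800 100 100
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

On this scale an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of positive and negative instances, which is the context for the reported values of 0.8782 and 0.8428 for DeepHGT versus about 0.69 for the PyFeat baseline.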

Keywords: Deep residual model; DNA sequence feature; HGT insertion site
DOI: 10.1186/s12864-020-07296-1
Indexed in: SCIE; CPCI-S
Language: English
WOS research areas: Biotechnology & Applied Microbiology; Genetics & Heredity
WOS categories: Biotechnology & Applied Microbiology; Genetics & Heredity
WOS accession number: WOS:000605610300008
Scopus EID: 2-s2.0-85098260752
Citation statistics
Times cited: 1 (WOS)
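Document type: Journal article
Item identifier: https://repository.uic.edu.cn/handle/39GCC9TT/9045
Collection: Personal research output produced outside this institution
Corresponding author: Li, Shuaicheng
Affiliation
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong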
Recommended citation
GB/T 7714
Li, Chen,Chen, Jiaxing,Li, Shuaicheng. Deep learning for HGT insertion sites recognition[J]. BMC Genomics, 2020, 21.
APA Li, Chen, Chen, Jiaxing, & Li, Shuaicheng. (2020). Deep learning for HGT insertion sites recognition. BMC Genomics, 21.
MLA Li, Chen, et al. "Deep learning for HGT insertion sites recognition". BMC Genomics 21 (2020).
Files in this item
No associated files.
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.