Details of Research Outputs

Status: Published
Title: Deep learning for HGT insertion sites recognition
Creator: Li, Chen; Chen, Jiaxing; Li, Shuaicheng
Date Issued: 2020-12-01
Source Publication: BMC Genomics
ISSN: 1471-2164
Volume: 21
Abstract

Background: Horizontal Gene Transfer (HGT) refers to the sharing of genetic material between distant species that are not in a parent-offspring relationship. HGT insertion sites are important for understanding HGT mechanisms. Recent studies of the main agents of HGT, such as transposons and plasmids, show that insertion sites usually have specific sequence features. This motivates us to find a method that infers HGT insertion sites from sequence features.

Results: In this paper, we propose a deep residual network, DeepHGT, to recognize HGT insertion sites. To train DeepHGT, we extracted about 1.55 million sequence segments as training instances from 262 metagenomic samples, with a ratio of positive to negative instances of about 1:1. These segments are randomly partitioned into three subsets: 80% as the training set, 10% as the validation set, and the remaining 10% as the test set. The training loss of DeepHGT is 0.4163 and the validation loss is 0.423. On the test set, DeepHGT achieves an area under the curve (AUC) value of 0.8782. To further evaluate the generalization of DeepHGT, we constructed an independent test set containing 689,312 sequence segments from another 147 gut metagenomic samples. On this set DeepHGT achieves an AUC value of 0.8428, which is close to the previous test AUC value. For comparison, the gradient boosting classifier model implemented in PyFeat achieves AUC values of 0.694 and 0.686 on the two test sets, respectively. Furthermore, DeepHGT can learn discriminative sequence features; for example, it has learned a sequence pattern of palindromic subsequences as a statistically significant (P-value = 0.0182) local feature. Hence, DeepHGT is a reliable model for recognizing HGT insertion sites.

Conclusion: DeepHGT is the first deep learning model that can accurately recognize HGT insertion sites on genomes according to the sequence pattern.
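The random 80%/10%/10% partition and the AUC metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_80_10_10` and `auc`, the fixed seed, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed by direct pairwise comparison, which is only practical for small inputs.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle a copy of `items` and split it into train/validation/test parts.

    Hypothetical helper: the seed and the integer rounding are illustrative
    choices, not details taken from the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive instance receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, val, test = split_80_10_10(range(100))
    print(len(train), len(val), len(test))
    # Toy labels and scores, invented for this example only.
    print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.8782 and 0.8428 against the baseline's 0.694 and 0.686.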

Keyword: Deep residual model ; DNA sequence feature ; HGT insertion site
DOI: 10.1186/s12864-020-07296-1
Indexed By: SCIE ; CPCI-S
Language: English
WOS Research Area: Biotechnology & Applied Microbiology ; Genetics & Heredity
WOS Subject: Biotechnology & Applied Microbiology ; Genetics & Heredity
WOS ID: WOS:000605610300008
Scopus ID: 2-s2.0-85098260752
Citation statistics
Cited Times: 1 [WOS]
Document Type: Journal article
Identifier: http://repository.uic.edu.cn/handle/39GCC9TT/9045
Collection: Research outside affiliated institution
Corresponding Author: Li, Shuaicheng
Affiliation
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
Recommended Citation
GB/T 7714: Li, Chen, Chen, Jiaxing, Li, Shuaicheng. Deep learning for HGT insertion sites recognition[J]. BMC Genomics, 2020, 21.
APA: Li, Chen, Chen, Jiaxing, & Li, Shuaicheng. (2020). Deep learning for HGT insertion sites recognition. BMC Genomics, 21.
MLA: Li, Chen, et al. "Deep learning for HGT insertion sites recognition". BMC Genomics 21 (2020).

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.