Status | Published |
Title | On triangle inequalities of correlation-based distances for gene expression profiles |
Creator | |
Date Issued | 2023-12-01 |
Source Publication | BMC Bioinformatics |
ISSN | 1471-2105 |
Volume | 24, Issue 1 |
Abstract | Background: Distance functions are fundamental for evaluating the differences between gene expression profiles. Such a function should output a low value if the profiles are strongly correlated, either negatively or positively, and a high value otherwise. One popular distance function is the absolute correlation distance, $d_a = 1 - |\rho|$, where $\rho$ is a similarity measure such as Pearson or Spearman correlation. However, the absolute correlation distance fails to fulfill the triangle inequality, which would have guaranteed better performance at vector quantization, allowed fast data localization, and accelerated data clustering. Results: In this work, we propose $d_r = \sqrt{1 - |\rho|}$ as an alternative. We prove that $d_r$ satisfies the triangle inequality when $\rho$ represents Pearson correlation, Spearman correlation, or cosine similarity. We show $d_r$ to be better than $d_s = \sqrt{1 - \rho^2}$, another variant of $d_a$ that satisfies the triangle inequality, both analytically and experimentally. We empirically compared $d_r$ with $d_a$ in gene clustering and sample clustering experiments on real-world biological data. The two distances performed similarly in both gene clustering and sample clustering under hierarchical clustering and PAM (partitioning around medoids) clustering. However, $d_r$ demonstrated more robust clustering. According to the bootstrap experiment, $d_r$ generated more robust sample pair partitions more frequently (P-value < 0.05). The statistics on the time a class "dissolved" also support the advantage of $d_r$ in robustness. Conclusion: $d_r$, as a variant of the absolute correlation distance, satisfies the triangle inequality and is capable of more robust clustering. |
Keyword | Clustering Correlation Distance Gene expression analysis Single cell Triangle inequality |
DOI | 10.1186/s12859-023-05161-y |
URL | View source |
Indexed By | SCIE |
Language | English |
WOS Research Area | Biochemistry & Molecular Biology ; Biotechnology & Applied Microbiology ; Mathematical & Computational Biology |
WOS Subject | Biochemical Research Methods ; Biotechnology & Applied Microbiology ; Mathematical & Computational Biology |
WOS ID | WOS:000934967300002 |
Scopus ID | 2-s2.0-85147722871 |
Document Type | Journal article |
Identifier | http://repository.uic.edu.cn/handle/39GCC9TT/11098 |
Collection | Faculty of Science and Technology |
Corresponding Author | Li, Shuaicheng |
Affiliation | 1. Department of Computer Science, City University of Hong Kong, Hong Kong, China 2. Department of Computer Science, Beijing Normal University - Hong Kong Baptist University United International College, Zhuhai, China 3. State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China |
First Author Affiliation | Beijing Normal-Hong Kong Baptist University |
Recommended Citation GB/T 7714 | Chen, Jiaxing, Ng, Yen Kaow, Lin, Lu, et al. On triangle inequalities of correlation-based distances for gene expression profiles[J]. BMC Bioinformatics, 2023, 24(1). |
APA | Chen, Jiaxing, Ng, Yen Kaow, Lin, Lu, Zhang, Xianglilan, & Li, Shuaicheng. (2023). On triangle inequalities of correlation-based distances for gene expression profiles. BMC Bioinformatics, 24(1). |
MLA | Chen, Jiaxing, et al. "On triangle inequalities of correlation-based distances for gene expression profiles." BMC Bioinformatics 24.1 (2023). |
Files in This Item: | There are no files associated with this item. |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
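
The abstract's central claim, that the absolute correlation distance violates the triangle inequality while its square-root variant does not, can be checked numerically. The following is a minimal sketch and not the authors' code. It assumes the square-root forms d_r = sqrt(1 - |rho|) and d_s = sqrt(1 - rho^2) with rho taken as Pearson correlation; the function names (`d_a`, `d_r`, `d_s`, `triangle_gap`, `count_violations`) and the three example profiles are hypothetical and chosen only for illustration.

```python
import math
import numpy as np


def pearson(a, b):
    """Pearson correlation between two 1-D profiles."""
    return float(np.corrcoef(a, b)[0, 1])


def d_a(a, b):
    """Absolute correlation distance: 1 - |rho|."""
    return 1.0 - abs(pearson(a, b))


def d_r(a, b):
    """Assumed square-root variant: sqrt(1 - |rho|)."""
    return math.sqrt(max(0.0, 1.0 - abs(pearson(a, b))))


def d_s(a, b):
    """Assumed squared-correlation variant: sqrt(1 - rho^2)."""
    return math.sqrt(max(0.0, 1.0 - pearson(a, b) ** 2))


def triangle_gap(dist, x, y, z):
    """dist(x,y) + dist(y,z) - dist(x,z); a negative value is a violation."""
    return dist(x, y) + dist(y, z) - dist(x, z)


def count_violations(dist, n_triples=2000, dim=6, seed=0, tol=1e-9):
    """Count random triples that violate the triangle inequality beyond tol."""
    rng = np.random.default_rng(seed)
    bad = 0
    for _ in range(n_triples):
        x, y, z = rng.normal(size=(3, dim))
        if triangle_gap(dist, x, y, z) < -tol:
            bad += 1
    return bad


# Hypothetical profiles: u and v are orthogonal and mean-zero, so
# Pearson(x, z) is about 0 while Pearson(x, y) = Pearson(y, z) = cos(45 deg).
u = np.array([1.0, -1.0, 0.0]) / math.sqrt(2.0)
v = np.array([1.0, 1.0, -2.0]) / math.sqrt(6.0)
x, y, z = u, (u + v) / math.sqrt(2.0), v

if __name__ == "__main__":
    for name, dist in (("d_a", d_a), ("d_r", d_r), ("d_s", d_s)):
        print(name, "gap:", round(triangle_gap(dist, x, y, z), 4))
    print("random d_r violations:", count_violations(d_r))
```

On the example triple, x and z are uncorrelated while each correlates with y at about 0.707. Going through y is therefore "shorter" than the direct distance under d_a (a negative gap), whereas the gap is non-negative under the two square-root forms.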