Details of Research Outputs

Title: Information splitting for big data analytics
Creator
Date Issued: 2017-02-23
Conference Name: 2016 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY PROCEEDINGS - CYBERC 2016
Source Publication: Proceedings - 2016 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2016
Pages: 294-302
Conference Date: OCT 13-15, 2016
Conference Place: Chengdu
Country: People's Republic of China
Abstract

Many statistical models require the estimation of unknown (co)variance parameters. The estimates are usually obtained by maximizing a log-likelihood that involves log-determinant terms. In principle, the Newton method requires the observed information (the negative Hessian matrix, that is, the negative second derivative of the log-likelihood) to obtain an accurate maximum likelihood estimate. Replacing the observed information with the Fisher information, the expected value of the observed information, gives the Fisher scoring algorithm, which is simpler than the Newton method. With the advance of high-throughput technologies in the biological sciences, recommendation systems, and social networks, the sizes of data sets, and of the corresponding statistical models, have increased by several orders of magnitude. Neither the observed information nor the Fisher information is easy to obtain for these big data sets. This paper introduces an information splitting technique to simplify the computation. After splitting the mean of the observed information and the Fisher information, a simpler approximate Hessian matrix for the log-likelihood can be obtained. This approximate Hessian matrix can significantly reduce computation and makes the linear mixed model applicable to big data sets. The splitting and the simpler formulas rely heavily on matrix algebra transforms and are applicable to large-scale breeding models and genome-wide association analysis.
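The idea of averaging the observed and Fisher information can be illustrated with a small numerical sketch. The code below is not taken from the paper. It assumes a linear mixed model y = Xb + Zu + e with two variance components (V = s2u ZZ' + s2e I) and the restricted (REML) log-likelihood, for which the standard formulas are: Fisher information (1/2) tr(P V_i P V_j), and observed information y'P V_i P V_j P y - (1/2) tr(P V_i P V_j). Their mean drops the trace term and leaves (1/2) y'P V_i P V_j P y. The function name `reml_information` and all variable names are hypothetical, and the dense inverse is for illustration only and would not scale to big data sets.

```python
import numpy as np

def reml_information(y, X, Z, s2u, s2e):
    """Observed, Fisher, and averaged information for a two-component
    linear mixed model under REML (illustrative, dense linear algebra)."""
    n = len(y)
    V = s2u * (Z @ Z.T) + s2e * np.eye(n)      # marginal covariance of y
    Vinv = np.linalg.inv(V)
    VinvX = Vinv @ X
    # P = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1, which satisfies P X = 0
    P = Vinv - VinvX @ np.linalg.solve(X.T @ VinvX, VinvX.T)
    dV = [Z @ Z.T, np.eye(n)]                  # dV/ds2u and dV/ds2e
    Py = P @ y
    k = len(dV)
    observed = np.empty((k, k))
    fisher = np.empty((k, k))
    averaged = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            trace_term = 0.5 * np.trace(P @ dV[i] @ P @ dV[j])
            quad_term = Py @ dV[i] @ P @ dV[j] @ Py   # y'P V_i P V_j P y
            fisher[i, j] = trace_term
            observed[i, j] = quad_term - trace_term
            averaged[i, j] = 0.5 * quad_term          # trace term cancels
    return observed, fisher, averaged

# Small synthetic example (all data here is made up for illustration).
rng = np.random.default_rng(0)
n, q = 30, 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, q))
y = rng.normal(size=n)
obs, fis, avg = reml_information(y, X, Z, s2u=1.0, s2e=1.0)
print(np.allclose(avg, 0.5 * (obs + fis)))
```

The point of the sketch is that the averaged matrix needs only quadratic forms in y, with no trace of a product of large matrices, which is the kind of saving the abstract describes.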

Keyword: Breeding model; Fisher information matrix; Fisher scoring algorithm; Genome-wide association; Linear mixed model; Observed information matrix; Variance parameter estimation
DOI: 10.1109/CyberC.2016.64
Indexed By: CPCI-S
Language: English
WOS Research Area: Computer Science
WOS Subject: Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods
WOS ID: WOS:000401467600052
Scopus ID: 2-s2.0-85015868026
Citation statistics
Cited Times: 9 [WOS]
Document Type: Conference paper
Identifier: http://repository.uic.edu.cn/handle/39GCC9TT/11512
Collection: Research outside affiliated institution
Affiliation
Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing, P.O. Box 8009, 100088, China
Recommended Citation
GB/T 7714
Zhu, Shengxin, Gu, Tongxiang, Xu, Xiaowen, et al. Information splitting for big data analytics[C], 2017: 294-302.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.