Because the compositional data are constrained by the simplex (sum to 1) and are not unconstrained in the Euclidean space, many standard methods of analysis are not applicable. Here, we evaluate how these challenges impact the performance of existing normalization methods and differential abundance analyses. ResultsĮffects on normalization: Most normalization methods enable successful clustering of samples according to biological origin when the groups differ substantially in their overall microbial composition. Rarefying more clearly clusters samples according to biological origin than other normalization techniques do for ordination metrics based on presence or absence. Alternate normalization measures are potentially vulnerable to artifacts due to library size.Įffects on differential abundance testing: We build on a previous work to evaluate seven proposed statistical methods using rarefied as well as raw data. Our simulation studies suggest that the false discovery rates of many differential abundance-testing methods are not increased by rarefying itself, although of course rarefying results in a loss of sensitivity due to elimination of a portion of available data. DESeq2, without addition of a constant, increased sensitivity on smaller datasets (20 samples per group) but also critically the only method tested that has a good control of false discovery rate.įor groups with large (~10×) differences in the average library size, rarefying lowers the false discovery rate. These findings guide which normalization and differential abundance techniques to use based on the data characteristics of a given study.Īlthough data produced by high-throughput sequencing has been proven extremely useful for understanding microbial communities, the interpretation of these data is complicated by several statistical challenges.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |