On the bias of H-scores for comparing biclusters, and how to correct it


The H-score (or Mean Squared Residue score) underlies Cheng and Church's (2000) biclustering algorithm, one of the best-known and most widely employed algorithms in bioinformatics and computational biology, and many subsequent algorithms (e.g. FLOC (Yang et al., 2005) and CBEB (Huang et al., 2012)). Cheng and Church's algorithm has ∼2600 citations to date, 650 since 2015 and 230 in 2018–2019 alone. It was the first to be applied to gene microarray data, and it is one of the main tools available in biclustering packages (e.g. the 'biclust' R library) and in gene expression data analysis packages (e.g. IRIS-EDA, Monier et al., 2019). In addition, it is widely used as a benchmark: almost all published biclustering algorithms include a comparison with it.

Squared residue measures such as H-scores play a double role in biclustering methods. On the one hand, they are employed by many algorithms as merit functions to guide the discovery of biclusters (see e.g. the reviews in Madeira and Oliveira, 2004; Pontes et al., 2015). On the other hand, they are used to assess solutions—in particular, H-scores are used to assess the 'homogeneity' of the discovered biclusters. Both uses involve comparing biclusters that may have different numbers of rows and columns.

Our findings document a bias that can distort biclustering results. We prove, both analytically and by simulation, that the average H-score increases with the number of rows/columns in a bicluster—even in the 'ideal' (and simplest) case of a single bicluster generated by a constant model plus white noise. This biases the H-score, and hence all H-score-based algorithms, toward small biclusters. Importantly, our analytical proof provides a straightforward way to correct this bias.
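
To make the quantity under discussion concrete, the sketch below computes the H-score of a bicluster as defined by Cheng and Church (2000)—the mean of the squared residues a_ij − a_iJ − a_Ij + a_IJ, where a_iJ, a_Ij and a_IJ are the row, column, and overall means—and then, in a small simulation (our own illustration, not a figure from any cited work), shows the size bias: the average H-score of pure white-noise "biclusters" grows with the matrix dimensions. Under the constant-plus-noise model with noise variance σ², the expected H-score of an n × m bicluster works out to σ²(1 − 1/n)(1 − 1/m), which increases toward σ² as n and m grow.

```python
import numpy as np

def h_score(A):
    """Mean Squared Residue (H-score) of a bicluster submatrix A:
    the mean over (i, j) of (a_ij - a_iJ - a_Ij + a_IJ)**2, where
    a_iJ, a_Ij, a_IJ are the row, column, and overall means."""
    A = np.asarray(A, dtype=float)
    row_means = A.mean(axis=1, keepdims=True)   # a_iJ
    col_means = A.mean(axis=0, keepdims=True)   # a_Ij
    overall = A.mean()                          # a_IJ
    residue = A - row_means - col_means + overall
    return float((residue ** 2).mean())

# A perfectly additive (shifted constant) bicluster has H-score 0:
perfect = np.add.outer([1.0, 2.0, 3.0], [10.0, 20.0])
print(h_score(perfect))  # 0.0

# Size bias: average H-score of pure N(0, 1) noise grows with bicluster size,
# approaching sigma^2 = 1 as the dimensions increase.
rng = np.random.default_rng(0)
for n in (2, 5, 50):
    mean_h = np.mean([h_score(rng.normal(size=(n, n))) for _ in range(200)])
    print(n, round(mean_h, 3))
```

The second loop reproduces, in miniature, the phenomenon proved in the paper: nothing about the data changes except the number of rows and columns, yet the average score rises, so raw H-score comparisons systematically favour smaller biclusters.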