In statistics, biweight midcorrelation (also called bicor) is a
measure of similarity between samples. It is
median-based, rather than
mean-based, thus is less sensitive to outliers, and can be a robust alternative to other similarity metrics, such as
Pearson correlation or
mutual information.
[1]
Derivation
Here we find the biweight midcorrelation of two vectors
and
, with
items, representing each item in the vector as
and
. First, we define
as the
median of a vector
and
as the
median absolute deviation (MAD), then define
and
as,
![{\displaystyle {\begin{aligned}u_{i}&={\frac {x_{i}-\operatorname {med} (x)}{9\operatorname {mad} (x)}},\\v_{i}&={\frac {y_{i}-\operatorname {med} (y)}{9\operatorname {mad} (y)}}.\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b2ca4c57568329d98b0620726c2c8a0b272c792f)
Now we define the weights
and
as,
![{\displaystyle {\begin{aligned}w_{i}^{(x)}&=\left(1-u_{i}^{2}\right)^{2}I\left(1-|u_{i}|\right)\\w_{i}^{(y)}&=\left(1-v_{i}^{2}\right)^{2}I\left(1-|v_{i}|\right)\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/c1b7ca9174b5a26fcc2a8ac55152d7b859d99954)
where
is the identity function where,
![{\displaystyle I(x)={\begin{cases}1,&{\text{if }}x>0\\0,&{\text{otherwise}}\end{cases}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e02345db8cebf51e8e4a8801a857765c83863b2e)
Then we normalize so that the sum of the weights is 1:
![{\displaystyle {\begin{aligned}{\tilde {x}}_{i}&={\frac {\left(x_{i}-\operatorname {med} (x)\right)w_{i}^{(x)}}{\sqrt {\sum _{j=1}^{m}\left[(x_{j}-\operatorname {med} (x))w_{j}^{(x)}\right]^{2}}}}\\{\tilde {y}}_{i}&={\frac {\left(y_{i}-\operatorname {med} (y)\right)w_{i}^{(y)}}{\sqrt {\sum _{j=1}^{m}\left[(y_{j}-\operatorname {med} (y))w_{j}^{(y)}\right]^{2}}}}.\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/4ddaa71533c2d782bdd2b7812536299d3ccc19b9)
Finally, we define biweight midcorrelation as,
![{\displaystyle \mathrm {bicor} \left(x,y\right)=\sum _{i=1}^{m}{\tilde {x}}_{i}{\tilde {y}}_{i}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/4a7f80da97d59f1a3e90d4f8eb49b0e3b4cdda0c)
Applications
Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,
[2] and is often used for
weighted correlation network analysis.
Implementations
Biweight midcorrelation has been implemented in the
R statistical programming language as the function bicor
as part of the WGCNA package
[3]
Also implemented in the
Raku programming language as the function bi_cor_coef
as part of the Statistics module.
[4]
References