R software provids function phyper and fisher.test for Hypergeometric and Fisher's exact test accordingly. However, it is tricky to get it right. I spent some time to make it clear.
Here is a simple example:
Five cards were chosen from a well shuffled deck
x = the number of diamonds selected.
We use a 2x2 table to represent the case:
Diamond Non-Diamond
selected x 5-x total 5 sampled cards
left 13-x 34+x total 47 left cards after sampling
13 Dia 39 Non-Dia total 52 cards
We 're asking if diamond enriched or depleted in our selected cards, comparing to the background.
Here are the different parameters used by phyper and fisher.test:
phyper(x, 13, 39, 5, lower.tail=TRUE);
# Numerical parameters in order:
# (success-in-sample, success-in-bkgd, failure-in-bkgd, sample-size).
fisher.test(matrix(c(x, 13-x, 5-x, 34+x), 2, 2), alternative='less');
# Numerical parameters in order:
# (success-in-sample, success-in-left-part, failure-in-sample, failure-in-left-part).It's obvious that hypergeometric test compares sample to bkgd, while fisher's exact test compares sample to the left part of bkgd after sampling without replacement. They will give the same p-value (because they assume the same distribution).
Here is the results:
x=1; # x could be 0~5
hitInSample = 1 # could be 0~5
hitInPop = 13
failInPop = 52 - hitInPop
sampleSize = 5
- Test for under-representation (depletion)
phyper(hitInSample, hitInPop, failInPop, sampleSize, lower.tail= TRUE);
## [1] 0.6329532
fisher.test(matrix(c(hitInSample, hitInPop-hitInSample, sampleSize-hitInSample, failInPop-sampleSize +hitInSample), 2, 2), alternative='less')$p.value;
## [1] 0.6329532
- Test for over-representation (enrichment)
phyper(hitInSample-1, hitInPop, failInPop, sampleSize, lower.tail= FALSE);
## [1] 0.7784664
fisher.test(matrix(c(hitInSample, hitInPop-hitInSample, sampleSize-hitInSample, failInPop-sampleSize +hitInSample), 2, 2), alternative='greater')$p.value;
## [1] 0.7784664
- Why hitInSample-1 when testing over-representation?
Becauseif
lower.tail
is TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x]. We subtract x by 1, when P[X ≥ x] is needed.
So are there any advantages fisher.test has over phyper, as they give the same p-values?
Yes, fisher.test can do two other jobs: two-side test, and giving confidence intervals of odds ratio. Please refer to its manual for details. For one-side p-value calculating, they don't have any difference if correct parameters were used.
Nice note!
ReplyDelete"We subtract x by 1, when P[X ≥ x] is needed." But you subtract one for both depletion and enrichment tests...
ReplyDeleteThanks. You're right, flies! This is a slip of keyboard. Apparently the p-value doesn't match if copy and run the code. I corrected it.
DeleteThanks for the note. Since you begin the post talking about GO term enrichment in a set of genes (reason why I landed here during a google search), I think the example should reflect this analysis instead of the how many diamonds from a set of cards (those of us who don't play cards might not even know how many diamonds should be in a deck to start with ;)). Also, using the same color for different meaning parameters of the two functions can be a bit misleading! All best, Pablo
ReplyDeleteThank you Pablo, for the great suggestions! I would made it better if I know people actually read this. Again, thanks!
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteYour post was actually top-ranked in my search, so people are surely reading it! I got into reading your other posts and the one explaining the PCA is fabulous. Thank you very much for putting this much time and effort on explaining these very important--but often misunderstood--concepts on bioinformatics. Please keep posting, I'll make sure to come back!
ReplyDeleteSuperb summary, Meng, thanks for that.
ReplyDeleteHi - thanks for posting, but please update
ReplyDeletefailInPop = 54-hitInPop
to
failInPop = 52-hitInPop
Good catch. I fixed it. Thank you!
DeleteGreat post but the colors make it a little hard to compare fisher to phyper. Slight re-order of the colors would make it more clear.
ReplyDeleteUseful post.
ReplyDeleteI have question regarding interpretation of p-values.
In both the cases (i.e. depletion and enrichment), the p-value >0.05 (if 0.05 is my threshold), what would I interpret? I understand that it is neither enriched nor depleted in the above example!! Is that correct? In other words, if p-value would have been less than 0.05, I would say that this sampling is enriched or depleted? Thanks
Traditionally we interpret p-value > threshold as INCONCLUSIVE, since null-hypothesis can only be rejected but never be proved. I think the trend is to abandon p-value once for all: http://www.sciencemag.org/news/2017/07/it-will-be-much-harder-call-new-findings-significant-if-team-gets-its-way . http://www.stat.columbia.edu/~gelman/research/unpublished/abandon.pdf
DeleteThank you for reply.
DeleteMy question is: Had P-value been < threshold in your example, would we call sampling a) depleted and b) enriched, respectively?
OR,
What is the null hypothesis in both the cases in above example?
When we set parameter lower.tail=TRUE in phyper (or alternative='less' in fisher.test), p-value <= threshold suggests significant depletion. An interpretation of p-value is the probability of observing equal or more depletion when null hypothesis is true. Similar principle applies to enrichment test. Either way null hypothesis remains the same - samples are randomly chosen from population (thus no depletion nor enrichment). Does this make sense?
DeleteThis comment has been removed by the author.
DeleteYes, it easier to interpret now. Thank you.
ReplyDeleteThis has been really helpful to me, thank you!
ReplyDeleteThis comment has been removed by the author.
ReplyDeleteGreat post! Thank you (:
ReplyDeleteThank you for this beautifully lucid blog entry.
ReplyDelete