mathjax

Wednesday, December 19, 2012

Simple Enrichment Test -- calculate hypergeometric p-values in R

Hypergeometric test are useful for enrichment analysis. For example, having a gene list in hand, people might want to tell which functions (GO terms) are enriched among these genes. Hypergeometric test (or its equivalent: one-tailed Fisher's exact test) will give you statistical confidence in \(p\)-values.

R software provids function phyper and fisher.test for Hypergeometric and Fisher's exact test accordingly. However, it is tricky to get it right. I spent some time to make it clear.


Here is a simple example:
Five cards were chosen from a well shuffled deck
x = the number of diamonds selected.
We use a 2x2 table to represent the case:

                Diamond     Non-Diamond
selected        x                     5-x               total 5 sampled cards
left               13-x                 34+x             total 47 left cards after sampling
                 13 Dia          39 Non-Dia         total 52 cards

We 're asking if diamond enriched or depleted in our selected cards, comparing to the background.


Here are the different parameters used by phyper and fisher.test:

phyper(x, 13, 39, 5, lower.tail=TRUE);
# Numerical parameters in order:
# (success-in-sample, success-in-bkgd, failure-in-bkgd, sample-size).
fisher.test(matrix(c(x, 13-x, 5-x, 34+x), 2, 2), alternative='less');
# Numerical parameters in order:
# (success-in-sample, success-in-left-part, failure-in-sample, failure-in-left-part).
It's obvious that hypergeometric test compares sample to bkgd, while fisher's exact test compares sample to the left part of bkgd after sampling without replacement. They will give the same p-value (because they assume the same distribution).

Here is the results:
x=1; # x could be 0~5 
hitInSample 1  # could be 0~5
hitInPop 13 
failInPop 52 hitInPop 
sampleSize = 5
  • Test for under-representation (depletion)
phyper(hitInSamplehitInPopfailInPopsampleSizelower.tailTRUE);
## [1] 0.6329532
fisher.test(matrix(c(hitInSamplehitInPop-hitInSamplesampleSize-hitInSamplefailInPop-sampleSize +hitInSample), 22), alternative='less')$p.value; 
## [1] 0.6329532
  • Test for over-representation (enrichment)
phyper(hitInSample-1hitInPopfailInPopsampleSizelower.tailFALSE);
## [1] 0.7784664
fisher.test(matrix(c(hitInSamplehitInPop-hitInSamplesampleSize-hitInSamplefailInPop-sampleSize +hitInSample), 22), alternative='greater')$p.value; 
## [1] 0.7784664
  •  Why hitInSample-1 when testing over-representation?
Because if lower.tail is TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x]. We subtract x by 1, when P[X ≥ x] is needed.



So are there any advantages fisher.test has over phyper, as they give the same p-values?
Yes, fisher.test can do two other jobs: two-side test, and giving confidence intervals of odds ratio. Please refer to its manual for details. For one-side p-value calculating, they don't have any difference if correct parameters were used.


21 comments:

  1. "We subtract x by 1, when P[X ≥ x] is needed." But you subtract one for both depletion and enrichment tests...

    ReplyDelete
    Replies
    1. Thanks. You're right, flies! This is a slip of keyboard. Apparently the p-value doesn't match if copy and run the code. I corrected it.

      Delete
  2. Thanks for the note. Since you begin the post talking about GO term enrichment in a set of genes (reason why I landed here during a google search), I think the example should reflect this analysis instead of the how many diamonds from a set of cards (those of us who don't play cards might not even know how many diamonds should be in a deck to start with ;)). Also, using the same color for different meaning parameters of the two functions can be a bit misleading! All best, Pablo

    ReplyDelete
  3. Thank you Pablo, for the great suggestions! I would made it better if I know people actually read this. Again, thanks!

    ReplyDelete
  4. This comment has been removed by the author.

    ReplyDelete
  5. Your post was actually top-ranked in my search, so people are surely reading it! I got into reading your other posts and the one explaining the PCA is fabulous. Thank you very much for putting this much time and effort on explaining these very important--but often misunderstood--concepts on bioinformatics. Please keep posting, I'll make sure to come back!

    ReplyDelete
  6. Superb summary, Meng, thanks for that.

    ReplyDelete
  7. Hi - thanks for posting, but please update
    failInPop = 54-hitInPop
    to
    failInPop = 52-hitInPop

    ReplyDelete
  8. Great post but the colors make it a little hard to compare fisher to phyper. Slight re-order of the colors would make it more clear.

    ReplyDelete
  9. Useful post.
    I have question regarding interpretation of p-values.
    In both the cases (i.e. depletion and enrichment), the p-value >0.05 (if 0.05 is my threshold), what would I interpret? I understand that it is neither enriched nor depleted in the above example!! Is that correct? In other words, if p-value would have been less than 0.05, I would say that this sampling is enriched or depleted? Thanks

    ReplyDelete
    Replies
    1. Traditionally we interpret p-value > threshold as INCONCLUSIVE, since null-hypothesis can only be rejected but never be proved. I think the trend is to abandon p-value once for all: http://www.sciencemag.org/news/2017/07/it-will-be-much-harder-call-new-findings-significant-if-team-gets-its-way . http://www.stat.columbia.edu/~gelman/research/unpublished/abandon.pdf

      Delete
    2. Thank you for reply.
      My question is: Had P-value been < threshold in your example, would we call sampling a) depleted and b) enriched, respectively?
      OR,
      What is the null hypothesis in both the cases in above example?

      Delete
    3. When we set parameter lower.tail=TRUE in phyper (or alternative='less' in fisher.test), p-value <= threshold suggests significant depletion. An interpretation of p-value is the probability of observing equal or more depletion when null hypothesis is true. Similar principle applies to enrichment test. Either way null hypothesis remains the same - samples are randomly chosen from population (thus no depletion nor enrichment). Does this make sense?

      Delete
    4. This comment has been removed by the author.

      Delete
  10. Yes, it easier to interpret now. Thank you.

    ReplyDelete
  11. This has been really helpful to me, thank you!

    ReplyDelete
  12. This comment has been removed by the author.

    ReplyDelete
  13. Great post! Thank you (:

    ReplyDelete
  14. Thank you for this beautifully lucid blog entry.

    ReplyDelete