Tuesday, February 16, 2010

Playing with the GWAS Catalog


When I was looking at an article in PLoS Biology, I noticed that the abstract listed a comprehensive government database for genome-wide association studies (the “GWAS Catalog”). This database provides a lot of interesting information. In order to get a feel for the data in the GWAS Catalog, I looked at the data for four specific diseases (autism, prostate cancer, type I diabetes, and type II diabetes).

[If you are a non-scientist looking at this database, stronger genetic associations (which should more accurately predict genetic predisposition to a disease) should have low p-values and should be reproducible between different studies.]
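The two criteria above (a low p-value and reproducibility across studies) can be sketched as a simple filter. This is only an illustration: the record layout, the study and region names, and the p-values are all made up, and the 5 x 10^-8 cutoff is the conventional genome-wide significance threshold rather than anything taken from the GWAS Catalog itself.

```python
from collections import defaultdict

# Conventional genome-wide significance cutoff (an assumption, not from the post)
GENOME_WIDE_THRESHOLD = 5e-8


def reproducible_regions(records, p_threshold=GENOME_WIDE_THRESHOLD, min_studies=2):
    """Return regions that reach p_threshold in at least min_studies distinct studies."""
    studies_by_region = defaultdict(set)
    for study, region, p_value in records:
        if p_value <= p_threshold:
            studies_by_region[region].add(study)
    return sorted(
        region
        for region, studies in studies_by_region.items()
        if len(studies) >= min_studies
    )


# Hypothetical (study, region, p-value) records, invented purely for illustration
example_records = [
    ("study_1", "region_A", 1e-12),
    ("study_2", "region_A", 3e-9),
    ("study_2", "region_B", 2e-10),
    ("study_3", "region_C", 4e-6),
    ("study_3", "region_A", 1e-3),
]

print(reproducible_regions(example_records))
```

With these invented records, only region_A passes, because it is the only region that is significant in two separate studies.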

1) Autism - There were 3 studies included in the GWAS Catalog. The first two studies identified exactly the same region (but with slightly different variants), and the third study identified a different but nearby region. Although I think that there is probably something interesting going on in this region of chromosome 5, I don’t think it is worth getting very excited about the specific variants identified in these studies. For example, the p-values for the autism studies are the highest (that is, the least significant) out of the four diseases that I analyzed (meaning autism has the weakest genetic component and/or the genetic component of autism is the most complex to model). Furthermore, the most recent study showed that the expression levels of SEMA5A (one of the genes listed in the GWAS Catalog for autism) are very similar for autistic and normal people (see Fig. 2 if you have access to this article). The authors of this study claim that gene expression in autistic patients is significantly lower than in normal patients, but I think the statistical significance may be due to an over-fitting problem because they only looked at 20 autism patients and 10 control patients (and I have a hard time believing this was enough data to adjust for “age at brain acquisition, post-mortem interval and sex”). The genes with the strongest genetic association in the first study (CDH10 and CDH9) also have similar expression patterns in both autistic and control patients, and the authors of this first study report that the difference is not statistically significant. Of course, the autism variants may disrupt protein function without changing gene expression levels, but I would still seriously question the strength of any of the specific variants listed in these studies.

2) Prostate Cancer – I looked at the data for prostate cancer, type I diabetes, and type II diabetes because variants for these three diseases are included in at least two of the three major genomic tests listed in “The Language of Life.” More specifically, the three major genomic testing companies gave completely different predictions regarding Dr. Collins’ risk of getting prostate cancer. The GWAS Catalog lists 11 studies (10 of which have significant associations), and the genetic associations for prostate cancer were much stronger than for autism (the strongest p-values were 3 x 10^-33 vs. 2 x 10^-10, respectively). Highly significant genetic associations were found within the 8q24.21 and 17q12 regions in several independent studies, but many associations were only found in individual studies. According to “The Language of Life”, deCODE has 13 variants for prostate cancer, Navigenics has 9 variants, and 23andMe has 5 variants. Based upon what I’ve seen in the GWAS Catalog, I think that there probably are at least 5 strong, reproducible variants that could be used to calculate genetic predisposition to prostate cancer, but I am not certain that there are 13 variants with well-established genetic associations. However, calculating genetic association for several variants at the same time can be tricky, and the difference in test results may have more to do with the underlying models for calculating genetic association than with the individual variants considered for the analysis.

3) Type I Diabetes – Type I diabetes has a very strong genetic component, and the molecular basis for this disease is well understood. In these respects, the data in the GWAS Catalog are a good reflection of what is known about this disease. The strongest association had the lowest p-value out of all the diseases considered (5 x 10^-134 for a variant within the Major Histocompatibility Complex, or MHC), and either MHC or HLA (which is part of the MHC) had the strongest genetic association for 4 out of the 8 studies in the GWAS Catalog. This makes a lot of sense because the MHC displays antigens to the immune system (thereby telling the body which cells to attack) and type I diabetes is due to an autoimmune response where the immune system attacks and destroys the insulin-producing beta cells in the pancreas. It bothered me that some studies reported pretty different results, but that is why I think it is necessary to only use reproducible associations for genetic testing.
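Because these p-values span more than a hundred orders of magnitude, they are easier to compare on a -log10 scale, where a bigger number means a stronger association. Here is a minimal sketch using only the strongest p-values quoted in this post; the dictionary and function names are my own, not anything from the GWAS Catalog.

```python
import math

# Strongest p-values quoted in this post for each disease
quoted_p_values = {
    "autism": 2e-10,
    "prostate cancer": 3e-33,
    "type I diabetes": 5e-134,
}


def neg_log10(p_value):
    """Convert a p-value to the -log10 scale (larger means stronger evidence)."""
    return -math.log10(p_value)


# Rank diseases from the strongest association (smallest p-value) to the weakest
ranked = sorted(quoted_p_values, key=lambda disease: quoted_p_values[disease])

for disease in ranked:
    print(f"{disease}: -log10(p) = {neg_log10(quoted_p_values[disease]):.1f}")
```

On this scale the type I diabetes association is far ahead of the other two, with autism last, which matches the ordering described in the text.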

4) Type II Diabetes – The GWAS Catalog contained 15 studies on type II diabetes (12 of which had significant results), which is the highest number of studies listed for the four diseases that I looked at. The strongest associations for type II diabetes had p-values similar to those for prostate cancer, but higher than those for type I diabetes. This makes sense because type I diabetes has a stronger genetic component than type II diabetes, so type II diabetes should have weaker associations than type I diabetes. The 8 genes listed as predictors of type II diabetes in “The Language of Life” (TCF7L2, IGF2BP2, CDKN2A, CDKAL1, KCNJ11, HHEX, SLC30A8, and PPARG) were pretty well represented among the different studies listed in the GWAS Catalog, so I bet the predictors of genetic predisposition to type II diabetes are pretty good.

My Biomedical Informatics Blog by Charles Warden is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States License.