MOUNTAIN VIEW, Calif., October 27, 2005 -- This year Perlegen genotyped 4.6 million SNPs in the second phase of the HapMap, almost twice as many as expected. The overwhelming success of the project shows the tremendous potential of combining high-density microarrays with large-scale collaborations, said Chief Scientific Officer David Cox.
"The Phase 2 grant allowed us to put into the public databases an even more dense SNP resource than Perlegen published on its own," said Cox. "It’s a perfect example of how this public and private collaboration was able to produce a much better product than either could have alone."
Perlegen used the same Affymetrix whole-wafer technology that had yielded 1.6 million SNPs in its initial 2002 haplotype study. That study was based on a set of 71 Americans of European, African, and Chinese ancestry (Hinds, Science 2005), and the data from that study was released to the public in 2005. Now, Perlegen is using that data and the HapMap to help scientists explain and predict the effects of prescription drugs in clinical trials.
Cox spoke with Dr. David Craig from the Translational Genomics Research Institute (TGen) about the current and future prospects of whole-genome association studies using the data generated from the HapMap. The two discussed:
The way HapMap data confirmed hypotheses underlying the project
Pooling strategies and association study design
The future role of genetics in patient care
Genome Structure Confirmed by HapMap
Craig: The HapMap is nearing one of its first major publications. I am wondering what you think the biggest surprise is thus far? Perhaps something that you didn’t expect a year or two ago?
Cox: The best news is that there are not very many surprises. The whole basis for beginning a human haplotype map, both originally at Perlegen and in the International HapMap Project, was the belief that a common set of variants would give you useful information about people from different geographic or ethnic origins. I think that is turning out to be the case.
Some SNPs are unique to a population, but it remains to be determined how important they will be as the basis for differences in disease or drug response compared to SNPs that are shared across populations. The HapMap allows us to actually test common sets of SNPs instead of resequencing everybody’s genome each time we want to understand differences. It would be great to do complete resequencing, but right now that’s still cost prohibitive.
The second important assumption of the HapMap project was that due to the correlation structure, it would be possible to select subsets of SNPs that would give much of the information from all common SNPs. I think that the data in Perlegen’s publication in Science in February 2005, and now denser data from Phase 2 of the HapMap Project, suggest that is the case. With a relatively small subset of all of the common human SNPs, you can encompass much of the information content of the complete set of SNPs; that’s good news because otherwise studies would be cost prohibitive.
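To make the idea of selecting a subset from the correlation structure concrete, here is a minimal Python sketch of one possible greedy tag-SNP selection. It is an illustration only, not the method used by Perlegen or the HapMap Project. The function names, the toy genotype matrix, and the r² cutoff of 0.8 are all assumptions made for this example, and correlation is computed on unphased 0/1/2 genotype counts as a simple stand-in for haplotype-based linkage disequilibrium.

```python
import numpy as np


def pairwise_r2(genotypes):
    """Squared Pearson correlation between every pair of SNP columns.

    genotypes: array of shape (individuals, SNPs), coded as minor-allele
    counts 0/1/2 (an assumed encoding for this sketch).
    """
    return np.corrcoef(genotypes, rowvar=False) ** 2


def greedy_tag_snps(genotypes, threshold=0.8):
    """Pick SNP indices until every SNP is correlated with some chosen tag.

    threshold is a hypothetical r^2 cutoff; 0.8 is assumed here.
    """
    r2 = pairwise_r2(genotypes)
    untagged = set(range(r2.shape[0]))
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs;
        # ties go to the lowest index so the result is deterministic.
        best = max(
            sorted(untagged),
            key=lambda s: sum(r2[s, t] >= threshold for t in untagged),
        )
        tags.append(best)
        captured = {t for t in untagged if r2[best, t] >= threshold}
        captured.add(best)  # always retire the chosen SNP itself
        untagged -= captured
    return tags


# Toy data: 6 individuals x 4 SNPs. SNPs 0, 1 and 2 are perfectly
# correlated with one another; SNP 3 is uncorrelated with the rest.
toy_genotypes = np.array([
    [0, 0, 2, 0],
    [1, 1, 1, 0],
    [2, 2, 0, 0],
    [0, 0, 2, 2],
    [1, 1, 1, 2],
    [2, 2, 0, 2],
])

tag_indices = greedy_tag_snps(toy_genotypes)
print(tag_indices)
```

In this toy case two of the four SNPs are enough to capture all four, which is the kind of reduction Cox describes, here on a deliberately tiny and artificial scale.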
Craig: One goal of the HapMap was to identify a minimal number of SNPs, or "tag SNPs," that, when genotyped, can sufficiently characterize an individual given the overall genetic heterogeneity of the population.
There has been a lot of debate over the minimum number of tag SNPs you need to retrieve the majority of information in the genome. How many SNPs do you need to cover the underlying genomic structure in most populations?
Cox: Covering the entire genome is a relative issue. The SNPs studied in HapMap and studied by Perlegen now are common SNPs: variants with a minor allele frequency of five percent or greater, which means that they’re relatively common in the population. There are many, many more SNPs that are very rare. So when people are talking about coverage of the genome today, they are really talking about coverage of the common SNPs. It remains to be seen how important rarer SNPs are