This routine, which Jonathan calls ‘metaclustering,’ has become a primary tool in our work. Likewise, the Genescreen tool he created, which compares gene expression values against outcome for all genes as a group, has allowed us to readily identify important prognostic genes that might have therapeutic value in the future.

Broudy: What about visualization tools?

Triche: Genetrix has all of the standard visualization tools, such as principal component analysis, multi-dimensional scaling, hierarchical clustering with or without expectation maximization, support vector machines, similarity matrices, scatter analyses, and meta-clustering, which uses these tools to define class in a more robust, repetitive manner. We use them all.
We’ve taken the approach that all these tools are useful, but none is perfect. We like to compare results from different methods and choose those methods and gene lists that appear to be the most reproducible.

More and more, we are using MetaCluster, which creates a metagene. A metagene is a group of genes that act as a cancer signature. For example, you can set an arbitrary cutoff based on P-values and then test the power of that group to predict important parameters, such as prognosis. The group of genes is treated as a single gene predictor and incorporates both up- and down-regulated genes. We have found this approach to be very powerful.
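As a rough illustration of the metagene idea described above, the sketch below collapses a group of genes into a single per-sample score. It is not the MetaCluster algorithm, whose details the interview does not give. The function name `metagene_scores`, the per-gene z-scoring, and the `+1`/`-1` direction labels are all assumptions made for this example.

```python
from statistics import mean, pstdev

def metagene_scores(expression, directions):
    """Collapse several genes into one score per sample (hypothetical sketch).

    expression: dict mapping gene name -> list of expression values, one per sample
    directions: dict mapping gene name -> +1 (up-regulated) or -1 (down-regulated)
    """
    n_samples = len(next(iter(expression.values())))
    totals = [0.0] * n_samples
    used = 0
    for gene, sign in directions.items():
        values = expression[gene]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a flat gene carries no information; skip it
        used += 1
        for i, v in enumerate(values):
            # z-score the gene, then flip down-regulated genes so that
            # every gene pushes the score in the same direction
            totals[i] += sign * (v - mu) / sd
    return [t / used for t in totals] if used else totals

# Toy data: gene A rises across the three samples, gene B falls.
expression = {"A": [1.0, 2.0, 3.0], "B": [3.0, 2.0, 1.0]}
directions = {"A": +1, "B": -1}
scores = metagene_scores(expression, directions)
```

Flipping the sign of down-regulated genes before averaging is what lets up- and down-regulated genes contribute to the same single predictor.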

Will we miss some important genes? Probably. Will we miss a lot? I doubt it. Will we miss the most important ones? I would be very surprised if we did. We all recognize these are imperfect tools, but the pleasant surprise is that biologically important genes are being identified by virtually all groups using these tools for purposes such as these. Too much has been made of the lack of correspondence between the gene list from one study when compared to another. People forget that many factors determine whether a specific gene will be included in a given list. Typically the most important genes will appear in most if not all lists. That is the real importance of these studies. We were pleased to see that a European group validated most of the genes we found in RMS in their recent publication, for example. This couldn’t have happened if the results weren’t grounded in biologic reality.

Broudy: How do you go from a large set of microarray data to a double-digit signature?

Triche: If we start off with the U133A chip that contains about 22,000 genes, we typically find fewer than 10,000 expressed at significant levels. From that group of expressed genes, many classification tools will extract dozens to a few hundred genes significantly associated with a class or outcome. We typically use a cutoff of p < 0.001. When we use a reiterative testing algorithm like metaclustering, the number is further reduced, typically to double digits. Depending on the patient dataset (size, complexity, accurate class distinction, and so forth), this number may drop further. In one prognostic analysis, we found little difference between 50 genes and as few as 10. Further analysis on the same dataset and on separate datasets, using leave-n-out analysis, tests the reproducibility of those genes in making a distinction such as a diagnosis or prognosis.
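One way to picture the leave-n-out step is as a stability check: repeatedly hold out a few samples, re-rank the genes on the rest, and count how often each gene makes the short list. The sketch below does this under assumptions of its own. The Welch-style ranking statistic, the function names, and the parameters `n_out`, `top_k`, and `rounds` are hypothetical and are not taken from Genetrix or from the interview.

```python
import random
from collections import Counter
from statistics import mean, stdev

def t_like(values, labels):
    """Welch-style statistic comparing class 1 against class 0 for one gene."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    # Zero variance in both classes is treated as uninformative in this sketch.
    return 0.0 if se == 0 else (mean(a) - mean(b)) / se

def selection_frequency(expression, labels, n_out=2, top_k=10, rounds=200, seed=0):
    """Fraction of leave-n-out rounds in which each gene ranks in the top_k."""
    rng = random.Random(seed)
    n = len(labels)
    counts, valid = Counter(), 0
    for _ in range(rounds):
        held_out = set(rng.sample(range(n), n_out))
        keep = [i for i in range(n) if i not in held_out]
        sub_labels = [labels[i] for i in keep]
        if sub_labels.count(0) < 2 or sub_labels.count(1) < 2:
            continue  # need at least two samples per class for a variance
        valid += 1
        ranked = sorted(
            expression,
            key=lambda g: abs(t_like([expression[g][i] for i in keep], sub_labels)),
            reverse=True,
        )
        counts.update(ranked[:top_k])
    return {g: counts[g] / valid for g in expression} if valid else {}

# Toy data: G1 separates the two classes cleanly, N1 and N2 do not.
labels = [0] * 6 + [1] * 6
expression = {
    "G1": [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 5.0, 5.1, 4.9, 5.2, 4.8, 5.0],
    "N1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "N2": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
}
freq = selection_frequency(expression, labels, n_out=2, top_k=1, rounds=50)
```

Genes that keep a selection frequency near 1.0 across rounds are the ones that survive resampling; genes that drift in and out of the list are less reproducible.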

Signature validation
Broudy: How do you go about validating your signature?

Triche: We have done quantitative polymerase chain reaction (QPCR) with sequence validation, tissue immunohistochemistry on individual tumors and tissue microarrays (TMAs) on over a hundred cases at a time. We identify the most reproducible genes from the expression signatures, find appropriate antibodies from commercial vendors whenever possible, validate the antibodies on Western blots as well as on fresh frozen and paraffin-embedded tissue, and analyze completely unrelated cases from a different source to determine that the proteins are present in these cases. So far, we have found a good correlation with the Affymetrix data using quantitative real-time PCR and histochemistry data from the 3-way comparison.

We are also interested in adapting proteomics methods for our purposes. These mass spectrometry methods offer more quantitation and would free us from the need to identify and validate antibodies, which are themselves

©2007 Affymetrix. All Rights Reserved.