SNP Copy Number and Loss of Heterozygosity Estimation
Compute SNP copy number and loss of heterozygosity (LOH) based on
Affymetrix SNP chip data for paired target/normal samples.
In cancer genomics, copy number change is one of the hallmarks of the genetic instability common
to most human cancers, and LOH of tumor suppressor genes is a crucial step in the development of
sporadic and hereditary cancer (Monti, 2005).
Before You Begin
- CEL files from the Affymetrix 500K Array Chip Set (250K Sty, 250K Nsp) or
100K Array Chip Set (50K Xba, 50K Hind).
Example file: GISTIC_Hind_subset.zip.
- Optionally, for each CEL file, a TXT file containing the genotype calls for the SNP array.
Example file: GISTIC_Hind_subset.zip.
- A tab-delimited text file (sample information file format) that describes the SNP array. The array must include target/normal paired samples for copy number and LOH determination.
Example file: sample_info_subset.txt.
Step 1: SNPFileCreator
SNPFileCreator converts the CEL files from an array into a GenePattern .snp file.
Raw data for the probes in each SNP probe set are converted to a
single intensity value per SNP using one of four modeling algorithms:
Average Difference, PM/MM Difference Model (dChip, the default), Median Probe, or Trimmed Mean.
Processing this example on the GenePattern public server takes 20-30 minutes.
The example source data and resulting SNP file are provided here for your convenience:
GISTIC_Hind_subset.zip, GISTIC_Hind_subset.snp.
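To make the idea of collapsing a probe set into one intensity value concrete, here is a minimal Python sketch of three of the simpler summaries named above. It is an illustration under stated assumptions, not SNPFileCreator's implementation: it assumes each probe set arrives as parallel lists of perfect-match (PM) and mismatch (MM) intensities, that all three summaries operate on the PM - MM differences, and that the trimmed mean drops a fixed fraction from each end. The function names and the trim parameter are hypothetical. The dChip model-based method is not sketched because it fits a model across samples.

```python
from statistics import mean, median


def probe_differences(pm, mm):
    """Per-probe PM - MM differences (assumed input layout: parallel lists)."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return [p - m for p, m in zip(pm, mm)]


def average_difference(pm, mm):
    """Mean of the PM - MM differences."""
    return mean(probe_differences(pm, mm))


def median_probe(pm, mm):
    """Median of the PM - MM differences (assumption: median is taken over differences)."""
    return median(probe_differences(pm, mm))


def trimmed_mean(pm, mm, trim=0.25):
    """Mean of the differences after dropping a fraction `trim` from each end."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    diffs = sorted(probe_differences(pm, mm))
    k = int(len(diffs) * trim)  # number of probes dropped at each end
    return mean(diffs[k:len(diffs) - k])
```

For a probe set with PM values 1200, 1100, 5000, 1150 and MM values 200, 150, 300, 100, the single outlier probe pulls the average difference up to 1925, while the median and the 25% trimmed mean both give 1025.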
Considerations
- SNPFileCreator accepts CEL files from the 500K Array Chip Set (250K Sty, 250K Nsp) or
100K Array Chip Set (50K Xba, 50K Hind). Each chip set uses two unique high density arrays
to genotype over 500,000 and 100,000 SNPs in one experiment, respectively.
The module converts the CEL files for one array into a .snp file.
To create a .snp file for a chip set,
use the MergeRows module to combine the .snp files for the two arrays.
- SNPFileCreator can transfer the CEL files to the GenePattern server for processing or
read the files from a network directory. Due to the size of the files, best practice is to
store the CEL files in a network directory and process them from that directory.
- SNPFileCreator writes the generated .snp file to a network directory or to the
GenePattern server. Typically, writing the file to the GenePattern server provides
greater flexibility and makes the file available for use in GenePattern pipelines.
- SNPFileCreator creates a .snp file in one of two formats:
Non Allele-Specific (default) or Allele-Specific.
For each sample, the Non Allele-Specific format contains an intensity value
and a genotype call; the Allele-Specific format contains an
intensity value for allele A, intensity value for allele B, and genotype call.
All GenePattern modules accept the Non Allele-Specific format; many do not yet accept the Allele-Specific format.
- SNPFileCreator uses the May 2004 human genome assembly (hg17) to add
Chromosome and Physical Location columns to the .snp file. By default, it sorts the SNPs
by chromosome and physical location, as required by the SnpViewer module.
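The sort order mentioned in the last consideration can be sketched as follows. This is an assumption-laden illustration rather than the module's code: rows are modeled as dictionaries keyed by the Chromosome and Physical Location column names, the SNP key is hypothetical, and the ordering (1 to 22, then X, then Y, then anything else alphabetically) is one common convention that the module may or may not follow exactly.

```python
def chromosome_key(chrom):
    """Sort key: numeric chromosomes first, then X, Y, then other labels."""
    name = str(chrom).upper()
    if name.startswith("CHR"):
        name = name[3:]
    if name.isdigit():
        return (0, int(name), "")
    special = {"X": 23, "Y": 24}
    if name in special:
        return (0, special[name], "")
    return (1, 0, name)  # unplaced or unknown labels sort last


def sort_snps(rows):
    """Return rows ordered by chromosome, then by physical location."""
    return sorted(
        rows,
        key=lambda r: (chromosome_key(r["Chromosome"]), int(r["Physical Location"])),
    )


rows = [
    {"SNP": "SNP_A", "Chromosome": "10", "Physical Location": 500},
    {"SNP": "SNP_B", "Chromosome": "2", "Physical Location": 900},
    {"SNP": "SNP_C", "Chromosome": "X", "Physical Location": 100},
    {"SNP": "SNP_D", "Chromosome": "2", "Physical Location": 300},
]
ordered = [r["SNP"] for r in sort_snps(rows)]
```

A plain string sort would place chromosome 10 before chromosome 2, which is why the key converts numeric labels to integers.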
Step 2: XChromosomeCorrect
For gender-specific samples, run the XChromosomeCorrect module to correct intensity values for SNPs on the X chromosome.
For each sample from a male donor, the module doubles the intensity value for SNPs on the X chromosome.
The sample information file must
include a column labeled Gender that contains a value of M or F for each sample.
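The correction described above amounts to a per-sample scaling of the X-chromosome SNPs. The sketch below shows that arithmetic only. The data layout (a list of SNP/chromosome pairs, and dictionaries keyed by sample name) and the function name are assumptions made for illustration; the M and F values mirror the Gender column of the sample information file.

```python
def x_chromosome_correct(snps, intensities, gender):
    """Double X-chromosome intensities for samples from male donors.

    snps:        list of (snp_id, chromosome) tuples, in file order
    intensities: dict mapping sample name -> list of intensity values
    gender:      dict mapping sample name -> "M" or "F"
    """
    corrected = {}
    for sample, values in intensities.items():
        x_factor = 2.0 if gender[sample] == "M" else 1.0
        corrected[sample] = [
            value * x_factor if chrom == "X" else value
            for (_, chrom), value in zip(snps, values)
        ]
    return corrected


snps = [("SNP_1", "1"), ("SNP_2", "X")]
intensities = {"T1": [100.0, 50.0], "T2": [100.0, 50.0]}
gender = {"T1": "M", "T2": "F"}
corrected = x_chromosome_correct(snps, intensities, gender)
```

In this example only the X-chromosome value of the male sample T1 changes, from 50.0 to 100.0, which puts it on the same two-copy scale as the female sample.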
Step 3: CopyNumberDivideByNormals
CopyNumberDivideByNormals computes the raw copy number of each target SNP by dividing its intensity value
by the mean intensity value of all normal SNPs. This calculation is referred to as
copy number normalization or normalization with respect to normals.
CopyNumberDivideByNormals creates one of two files:
- .cn (default) does not include genotype calls.
- .xcn includes genotype calls. The SnpViewer module requires genotype calls to detect and display LOH.
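The normalization step can be illustrated with a short sketch. It assumes that the mean is taken per SNP across the normal samples, which is one reading of the description above, and it returns the plain target/normal ratio. Whether the module rescales that ratio (for example, to a diploid baseline) is not stated here, so no rescaling is applied. The function name and list-based layout are hypothetical.

```python
from statistics import mean


def divide_by_normals(target, normals):
    """Ratio of each target SNP intensity to the mean normal intensity for that SNP.

    target:  list of intensity values for one target sample, one per SNP
    normals: list of lists, one inner list of intensities per normal sample
    """
    if not normals or any(len(n) != len(target) for n in normals):
        raise ValueError("each normal sample must cover the same SNPs as the target")
    normal_means = [mean(values) for values in zip(*normals)]
    return [t / n for t, n in zip(target, normal_means)]


target = [300.0, 100.0, 50.0]
normals = [
    [100.0, 100.0, 100.0],
    [200.0, 100.0, 100.0],
]
ratios = divide_by_normals(target, normals)
```

With these values the per-SNP normal means are 150, 100, and 100, so the ratios are 2.0, 1.0, and 0.5: a gain, no change, and a loss relative to the normals.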
Step 4: SnpViewer
SnpViewer displays SNP copy number and loss of heterozygosity (LOH) data using heat maps.
By default, the viewer displays all chromosomes. To zoom in on a chromosome, select it from the
Chromosome drop-down list in the viewer tool bar. To zoom in on a region of the chromosome,
select the zoom in tool and click on the heat map.
Considerations
- To view a .xcn or .cn file that is on your local drive, run the SnpViewer without specifying
any parameters. When the viewer starts, use the File menu to load the file into the viewer.
If you specify a local file as a parameter to the SnpViewer module,
GenePattern copies the file to the GenePattern server, which can be time consuming for a large
.xcn or .cn file.
- To detect and display LOH calls, the
SnpViewer module requires genotype calls (.xcn file format). The LOH calls are:
- Loss: AB in normal and A or B in target.
- Retained: AB in both normal and target or No Call in normal and AB in target.
- Conflict: A or B in normal and AB in target.
- Non-informative/no call: A or B in normal or No Call in normal or target.
- To sort samples by copy number amplification or deletion: zoom in on a chromosome; select the
define a region of interest tool; in the heat map, click the start of the region of interest and
then the end of that region; a red bar to the left of the heat map defines the region of interest;
right-click the red bar and select sort of amplification or sort of deletion.
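The LOH call rules listed above can be expressed as a small lookup function. This is a sketch of the rules as written, not the viewer's code. It assumes genotype calls are encoded as the strings A, B, AB, and NoCall, and it checks the rules in the order listed so that the Non-informative category acts as the fallback for every remaining combination.

```python
HOMOZYGOUS = {"A", "B"}
NO_CALL = "NoCall"  # assumed encoding for a missing genotype call


def loh_call(normal, target):
    """Classify a paired normal/target genotype into an LOH call."""
    if normal == "AB" and target in HOMOZYGOUS:
        return "Loss"
    if target == "AB" and normal in ("AB", NO_CALL):
        return "Retained"
    if normal in HOMOZYGOUS and target == "AB":
        return "Conflict"
    # Remaining cases: homozygous normal with a non-AB target,
    # or No Call in the normal or the target.
    return "Non-informative"


pairs = [("AB", "A"), ("AB", "AB"), ("NoCall", "AB"), ("B", "AB"), ("A", "A"), ("AB", "NoCall")]
calls = [loh_call(n, t) for n, t in pairs]
```

Checking the specific rules first matters because the Non-informative description overlaps with the Retained and Conflict cases when read on its own.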
Reference
Monti, S. 2005. Class slides: SNP microarrays and high-density genotyping.
http://www.chip.org/teaching/hst950/slides/class6.pdf.