HGM2002 Poster Abstracts: 1. Genome Informatics and Annotation
POSTER NO: 12
The Sequence and Comparative Analysis of Human Chromosome 20
Chromosome 20 represents ~2.2% of the human genome and is the first metacentric chromosome to be 'finished' (Nature, December 2001). The clone map, which was assembled by fingerprinting and STS content analysis, is in 6 contigs, one of which spans the entire p arm. The four gaps in the q arm have all been sized by fibre FISH and together account for less than 320 Kb of DNA. We sequenced a set of 629 overlapping clones and generated 59,421,637 bp of sequence representing 99.4% of euchromatic DNA. The finished sequence was analysed on a clone-by-clone basis using a combination of similarity searches against DNA and protein databases and a series of ab initio gene predictions. We annotated 727 genes and 168 pseudogenes on the basis of human interpretation of the combined supporting evidence. Excluding pseudogenes, chromosome 20 has a gene density of 12.18 genes/Mb, which is intermediate between the 6.71 (low) and 16.31 (high) reported for chromosomes 21 and 22, respectively. We annotated 660 CpG islands in the sequence and predicted 1,432 transcription start (TS) sites using Eponine (T. Down, unpublished), a probabilistic TS site detector. Through analysis of sequence overlaps we identified 6,085 new SNPs which, combined with 26,678 previously reported SNPs, gave 32,763 unique SNPs on chromosome 20. Comparison of chromosome 20 with approximately 16 million mouse shotgun reads (Mouse Sequencing Consortium) and with 2X sequence coverage of the genome of the pufferfish Tetraodon nigroviridis (Genoscope) indicated that the reported annotation may account for over 95% of all coding exons and almost all genes. Since publication we have closed one of the four gaps in the clone map and identified a mouse BAC contig which, on the basis of homologous BAC end-sequences, seems to span the syntenic region of chromosome 20 harbouring the other three gaps. We have established an experimental pipeline to verify and extend all annotated gene structures. In addition, we are starting the PCR amplification of every annotated exon from 48 individuals to identify SNPs by sequencing.
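The summary figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' method: the constant and function names are illustrative, and the abstract does not state which denominator was used for the reported 12.18 genes/Mb, so both denominators tried here (the sequenced length, and that length scaled up by the 99.4% coverage figure) are assumptions.

```python
# Figures quoted in the abstract.
SEQUENCED_BP = 59_421_637      # finished sequence, in base pairs
EUCHROMATIC_FRACTION = 0.994   # share of euchromatic DNA covered by the sequence
GENES = 727                    # annotated genes, excluding pseudogenes
NEW_SNPS = 6_085               # SNPs found in sequence overlaps
KNOWN_SNPS = 26_678            # previously reported SNPs


def genes_per_mb(genes: int, length_bp: float) -> float:
    """Return gene density in genes per megabase (1 Mb = 1,000,000 bp)."""
    return genes / (length_bp / 1_000_000)


# The new SNPs are by definition not among the previously reported ones,
# so the unique total is a plain sum.
total_snps = NEW_SNPS + KNOWN_SNPS

# Assumption 1: density over the sequenced bases only.
density_sequenced = genes_per_mb(GENES, SEQUENCED_BP)

# Assumption 2: density over the estimated full euchromatic length,
# obtained by dividing the sequenced length by the coverage fraction.
estimated_euchromatic_bp = SEQUENCED_BP / EUCHROMATIC_FRACTION
density_euchromatic = genes_per_mb(GENES, estimated_euchromatic_bp)

if __name__ == "__main__":
    print(f"Unique SNPs: {total_snps:,}")
    print(f"Genes/Mb over sequenced bases: {density_sequenced:.2f}")
    print(f"Genes/Mb over estimated euchromatin: {density_euchromatic:.2f}")
```

The SNP total reproduces the reported 32,763 exactly. The two density estimates come out at roughly 12.23 and 12.16 genes/Mb, which bracket the reported 12.18, so the published figure presumably rests on a slightly different length than either assumed here.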