Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing
Ryan D. Morin, Matthew Bainbridge, Anthony Fejes, Martin Hirst, Martin Krzywinski, Trevor J. Pugh, Helen McDonald, Richard Varhol, Steven J.M. Jones, and Marco A. Marra
BioTechniques 45:81-94 (July 2008) doi 10.2144/000112900
Sequence-based methods for transcriptome characterization have typically relied on generation of either serial analysis of gene expression tags or expressed sequence tags. Although such approaches have the potential to enumerate transcripts by counting sequence tags derived from them, they typically do not robustly survey the majority of transcripts along their entire length. Here we show that massively parallel sequencing of randomly primed cDNAs, using a next-generation sequencing-by-synthesis technology, offers the potential to generate relative measures of mRNA and individual exon abundance while simultaneously profiling the prevalence of both annotated and novel exons and exon-splicing events. This technique identifies known single nucleotide polymorphisms (SNPs) as well as novel single-base variants. Analysis of these variants, and previously unannotated splicing events in the HeLa S3 cell line, reveals an overrepresentation of gene categories including those previously implicated in cancer.
INTRODUCTION

Numerous methods exist to characterize transcriptomes and to measure gene expression at both the transcript and exon level (1–5), and recent work (6–8) indicates that much more of the genome is transcribed than previously thought. Techniques to map transcriptional start sites (9–11), quantitatively measure splicing events (12,13), and discover mutations in the transcriptome (14) have been described. Analysis of full-length cDNA data generated from capillary sequencing generally assists in gene discovery and refinement of gene annotations (15), but this approach is limited by the high costs and throughput