2011 Biomedical Informatics Training (BIT) Program Symposium

This year's topic is high-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq), which is widely used to characterize genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA-binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods consider only reads that can be placed uniquely in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here we introduce a probabilistic approach for ChIP-Seq data analysis that uses all reads, including those that map to multiple locations, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model with K components corresponding to enriched regions plus a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (EM) algorithm, called AREM, to update the alignment probabilities of each read to different genomic locations.
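To make the idea of updating alignment probabilities concrete, the following is a minimal toy sketch in Python. It is not the AREM implementation: it assumes the enriched regions are already fixed, treats each read as a list of candidate (component, alignment score) pairs, and estimates only the mixture weights. The function name `em_multiread_weights` and the input layout are hypothetical, chosen for illustration.

```python
def em_multiread_weights(reads, n_components, n_iter=50):
    """Toy EM over mixture weights for reads with ambiguous alignments.

    reads: list of reads; each read is a list of (component, score)
           candidate alignments with positive scores. Component 0 stands
           for the null background and 1..K for enriched regions
           (an assumed encoding, not taken from AREM).
    Returns (weights, posteriors), where posteriors[i][j] is the
    probability that read i comes from its j-th candidate alignment.
    """
    weights = [1.0 / n_components] * n_components
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior of each candidate alignment is proportional
        # to (mixture weight of its component) * (alignment score).
        posteriors = []
        for candidates in reads:
            unnorm = [weights[c] * s for c, s in candidates]
            total = sum(unnorm)
            posteriors.append([u / total for u in unnorm])

        # M-step: each component's weight becomes its share of the
        # expected (fractional) read counts.
        counts = [0.0] * n_components
        for candidates, post in zip(reads, posteriors):
            for (c, _), p in zip(candidates, post):
                counts[c] += p
        n = sum(counts)
        weights = [c / n for c in counts]
    return weights, posteriors


if __name__ == "__main__":
    reads = [
        [(1, 1.0)], [(1, 1.0)], [(1, 1.0)],  # three unique reads in region 1
        [(2, 1.0)],                          # one unique read in region 2
        [(0, 1.0)],                          # one background read
        [(1, 1.0), (2, 1.0)],                # one read ambiguous between 1 and 2
    ]
    weights, posteriors = em_multiread_weights(reads, n_components=3)
    print(weights)
    print(posteriors[-1])
```

In this toy input the ambiguous read starts out split evenly between the two regions and is gradually reassigned toward region 1, which has more uniquely mapped support; its posterior for region 1 settles at 0.75. A method that discards multiply mapped reads would ignore this read entirely.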

Program Schedule and Abstract