Privacy-preserving data sharing for genome-wide association studies

Abstract

The protection of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers since the publication of “an attack” on GWAS data by Homer et al. (2008). Traditional statistical methods for confidentiality protection do not scale well to GWAS databases and the external information available on them. The more recent concept of differential privacy provides a rigorous definition of privacy with meaningful guarantees in the presence of arbitrary external information. Building on this notion, we propose new methods to release aggregate GWAS data without compromising an individual's privacy. In this talk, we give a brief overview of the challenges associated with protecting confidential data, and present methods for releasing differentially private minor allele frequencies, chi-square statistics and p-values. The proposed methods are compared on simulated data and on a GWAS of canine hair length. Time permitting, we may also discuss preliminary results of a risk-utility analysis on a dataset consisting of DNA samples collected by the Wellcome Trust Case Control Consortium (WTCCC), and a privacy-preserving method for finding genome-wide associations based on a differentially private approach to penalized logistic regression.

Date
Location
519 Wartik Lab, with video to Hershey Room CG628
Event
Seminar