Secure Cloud Computing for Genomic Data

Figure from: Peer reviewed commentary on Nature Biotechnology 34, 588–591 (2016)

Large-scale genomics studies involving thousands of whole-genome or exome sequences are underway in the Cloud. The Cloud security landscape is hard to discuss because security recommendations differ across regulatory bodies and are also inconsistent between on-premise and Cloud environments. For example, Institutional Review Boards (IRBs) often require Health Insurance Portability and Accountability Act (HIPAA) level Cloud security even for data that is not Protected Health Information (PHI). As another example, the Database of Genotypes and Phenotypes (dbGaP) has different encryption requirements for on-premise and Cloud environments. This peer-reviewed commentary provides the genomics community with a set of Cloud security guidelines intended to meet a wide range of regulatory requirements. Although the Cloud technology stack will continue to evolve rapidly, changing the specifics of implementation, these guidelines should remain applicable for the foreseeable future.

While security is a necessary prerequisite for genomic privacy, it is not sufficient. Privacy researchers have repeatedly shown that the availability of de-identified partial genomic data can result in patient re-identification. Several studies suggest that algorithmic methods such as partially homomorphic encryption, secure multi-party computation, or differential privacy can provide the necessary privacy-protecting layer within such an architecture. The extent to which these algorithmic methods can be integrated with genomic workflows and with statistical and machine learning tools is under active investigation.
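
To make one of the methods named above concrete, here is a minimal sketch of differential privacy applied to a single aggregate genomic query, using the standard Laplace mechanism. It is an illustration under stated assumptions, not a method taken from the commentary: the function names, the toy cohort, and the choice of epsilon are all hypothetical, and a real deployment would also need to track the privacy budget across queries.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_carrier_count(genotypes, epsilon, rng=None):
    """Return a noisy count of individuals carrying at least one alternate allele.

    Adding or removing one individual changes the true count by at most 1,
    so the query sensitivity is 1 and the Laplace scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for g in genotypes if g > 0)
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical cohort: alternate-allele counts (0, 1, or 2) at one variant site.
cohort = [0, 1, 0, 2, 1, 0, 0, 1]
noisy = private_carrier_count(cohort, epsilon=0.5, rng=random.Random(42))
print(f"noisy carrier count: {noisy:.2f}")
```

A smaller epsilon adds more noise and gives stronger privacy at the cost of accuracy. Only the noisy count would be released to the analyst, while the individual genotypes stay inside the secured environment.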