Biomedical Big Data and the research associated with it are rapidly finding a home on public clouds. Part of this comes from the maturation of cloud technologies, and part comes from the desire to play well in a research community. Cloud democratizes the availability of affordable tools to the broader community. As little as ten years ago, doing large-scale computing required building dedicated data centers. These data centers are still a lot more cost-effective than Cloud, but Cloud takes away the need for upfront investment, thus making it possible to explore first.
In this publication (originally published on the bioRxiv pre-print server), the use of interactive analytics with Google BigQuery is demonstrated for terabyte-scale genomic data. What makes this approach different? Firstly, interactive analysis brings unprecedented power compared to batch-mode analysis. Data exploration, by definition, is iterative. New exploration strategies are often based on intuition gained from previous exploration. If you can reduce the cost sufficiently and make the exploration real time (or near real time), then you can explore faster and further.
Due to the nature of Big Data, it may not be possible to move data around unless you are affiliated with a university that has a fast internet connection. So the authors here demonstrate the use of a Cloud environment for a range of genomic data exploration, all the way from variant calling to QA, GWAS, and the use of machine learning methods.
The following quote was part of a Google Cloud blog post that summarized the adoption of Google Cloud at Stanford University.