10 Thousand Genomes and Counting

By Eran Schenker

President Clinton is remembered in scientific circles for announcing the start of the genomic era, declaring that the first human genome had been sequenced for the ‘modest’ cost of 3 billion dollars. Fifteen years later, in January of this year, President Obama contributed to what seems to be the natural progression of that initiative by announcing the new goal of sequencing 1 million genomes. The price of genomic sequencing has decreased rapidly since that first genome, and scientists have now reached a price tag much lower than anyone expected.

Invited to give a talk to biotechnology students at Columbia University, Dr. Joe Pickrell of the New York Genome Center asserted that, by approaching sequencing as a task of achieving specific goals, sequencing could cost as little as $5. Pickrell, who started his scientific career in the statistical genetic analysis of evolutionary biology, has witnessed the progression of genomic sequencing and decided to add ‘forward genetics’ (the search for the genetic source of phenotypes such as traits and diseases) to his research efforts. Pickrell explained that research in some areas of genomic evolutionary biology (such as tracking prehistoric demographics and migrations) can rely on a relatively small number of sequenced genomes. Genome-wide association studies (GWAS) of human traits, which are used in the biomedical space, need many more. As it happens, genomic sequencing has finally reached the critical point where there are just enough genomes (~10,000) for the biomedical field to get meaningful results.

The true power of studies in the genomic era is demonstrated in Pickrell’s paper on a GWAS of 18 human traits (1). “Most of the time spent on this paper was e-mailing people again and again, nagging them to send me their data,” Pickrell pointed out with a smile. At the dawn of the age of genomes, it is not surprising to see ‘traditional’ biologists shifting their efforts from the “hardware” (the wet lab) to data analysis, a job that can be done on a laptop. Pickrell, who was always on the computational side of things, combined data from academic labs and several open data banks such as 23andMe and Genomes Unzipped. The scope of the study goes beyond the old ‘one trait, one gene’ framework: SNP variations were tested for correlation with a number of traits, such as height, and with diseases. Subsequently, traits could be correlated with each other, in the hope of revealing a shared mechanism.

Pickrell went further on the significance of this type of study, arguing that it does not just show correlation but can also be used to infer causality: “Genetic variation can be a leading cause of a disease, but a disease can’t cause genetic variation.” As a proof of concept, Pickrell ran his study on a known mechanism: high LDL as a cause of CHD (coronary heart disease). His results showed that variants associated with high LDL were correlated with CHD, yet the reverse analysis (CHD to LDL) did not show a correlation. The relationship becomes clearer in terms of simple formal logic: if A -> B, it does not necessarily follow that B -> A.
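The asymmetry described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Pickrell's actual method or data: the effect sizes are randomly generated, and the names `simulate`, `causal_effect` and `noise` are hypothetical. It assumes that variants found through LDL pass part of their effect on to CHD, while variants found through CHD act through pathways unrelated to LDL.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_variants=500, causal_effect=0.5, noise=0.1, seed=0):
    """Return (forward, reverse) correlations of simulated effect sizes.

    All numbers are made up for illustration (assumption, not real data).
    """
    rng = random.Random(seed)

    # Variants found through LDL: their CHD effect is a scaled copy of
    # their LDL effect plus a little noise, because LDL causes CHD.
    ldl_on_ldl = [rng.gauss(0, 1) for _ in range(n_variants)]
    ldl_on_chd = [causal_effect * b + rng.gauss(0, noise) for b in ldl_on_ldl]

    # Variants found through CHD that act via other pathways: their LDL
    # effect is pure noise, because CHD does not cause LDL.
    chd_on_chd = [rng.gauss(0, 1) for _ in range(n_variants)]
    chd_on_ldl = [rng.gauss(0, noise) for _ in range(n_variants)]

    forward = pearson(ldl_on_ldl, ldl_on_chd)  # LDL variants vs. CHD
    reverse = pearson(chd_on_chd, chd_on_ldl)  # CHD variants vs. LDL
    return forward, reverse


if __name__ == "__main__":
    forward, reverse = simulate()
    print(f"LDL variants -> CHD correlation: {forward:.2f}")
    print(f"CHD variants -> LDL correlation: {reverse:.2f}")
```

In this toy setup the forward correlation comes out strong and the reverse one close to zero, which is the pattern the LDL/CHD example relies on. Real analyses have to deal with complications the sketch ignores, such as variants that affect both traits independently.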

The new methods used in statistical and computational biology look very promising. But could some fundamental questions be missed along the way? One definition of a disease is ‘a disruptive biological mechanism that leads to disruptive physiology’. But what if we don’t know the mechanism, as in AD (Alzheimer’s disease)? It could be that different individuals with AD (who have different genomes) have different, independent mechanisms that cause the same pathology. If that is the case, then not only might we find different genes causing AD in different patients, but we may also have to redefine what AD is. Another question might be: ‘By using these methods, somewhere along the way, could we tackle somatic mutations in specific tissues?’ Coming back to AD as an example, somatic mutations in brain tissue could very well be the most significant contributor to the disease. Indeed, only a minority of the AD cases analyzed today have a “genetic factor”, and most cases are defined as sporadic. Hopefully, these questions will be easier to answer once we reach the 1 million genomes mark.

(1) Pickrell (2013). Joint analysis of functional genomic data and genome-wide association studies of 18 human traits.