Sequencing Life for the Future of Life — Earth Biogenome Project
What makes you, you? This is a question that has puzzled philosophers, scientists and probably every person who has ever lived since the origin of cognitive thought, and the answer you get probably depends on who you ask. But in the 1990s, a global network of genetic researchers and experts set out to answer this question in the most literal way possible, rewriting the book on how we understand humanity and creating a new field of research in the process.
The Human Genome Project was tasked with decoding and translating the entirety of the human genetic code, the information locked away within our DNA, to find out how just four letters (A, T, G and C) can dictate how all human life functions. With the technology and expertise available at the time, the task took 13 years, with the results published in 2003. Well, almost all of it: about 92%, to be precise. For a long time that was deemed “good enough”, but researchers only decoded the final 8% a few months ago.
Now, almost 20 years later, genome sequencing technology has come a long way, and the Earth Biogenome Project (EBP) hopes to take a few leaves out of this book and do something that would have been unimaginable back then: map the genetic data of all known eukaryotic organisms on planet Earth. That includes every plant, animal and fungal species in existence today, around 1.5 million species. Although even the EBP team recognises the enormity of this task, they are confident that it is achievable, even within the TEN-YEAR time frame they have given themselves!
“To sequence everything in the world — that is the reason we are here.”
— HUANMING YANG, BIOGENOMICS 2017 CONFERENCE, FEBRUARY 24, 2017
But why has the EBP set themselves this massive challenge, and what do they hope to achieve by doing so?
Launched in 2018, the EBP was established as a global research effort, and its members hope that this research, and the fount of genomic data it produces, will help humanity in three ways:
● to better understand the science behind biology and evolution, and how life on Earth came to be,
● to help in the fight to conserve and protect threatened species and environments around the world,
● and to help drive innovations in biomedical sciences and bioprocessing.
These research efforts will be shared among 25 institutional partners and 19 affiliated organisations. The project is a massive collaboration between networks of genomic scientists, biobank organisations (such as the European Reference Genome Atlas, ERGA) that have preserved DNA samples of a wide variety of organisms, and teams of skilled field workers, the boots on the ground, who collect samples from the remainder of the estimated 1.5 million known eukaryotic species. Through these combined efforts, they strive to digitise this genetic information, the product of 3.5 billion years of evolution, and make it available online in a digital repository of life: an exabyte of information, and the largest data storage project ever undertaken in scientific history.
Is this even possible?
This task is, by any definition, very ambitious; even project chair Prof Harris Lewin from the University of California describes these targets as a “moonshot”. However, the team is keen to say that, with the technology and resources currently available to them, it is indeed possible. The time and monetary costs of genome sequencing have dropped dramatically since the birth of the discipline. In fact, the cost of sequencing a single genome has fallen by a factor of more than a million, from almost 3 billion dollars spent by the Human Genome Project 20 years ago to less than 1000 dollars today. Part of this reduction comes from the increasing efficiency and streamlining of the sequencing process. In particular, longer read lengths (the size of each chunk of DNA read in one go) mean that fewer individual pieces need to be sequenced and then stitched back together to map an entire genome, which increases accuracy and decreases overall processing time.
So, in 2022, how far is the project from achieving this goal? In a 2019 report, the EBP team announced that, during these early stages, they had sorted out most of the back-end governance needed for such a large-scale project and had also started analysing genomes. Working closely with biobanks (such as ERGA), they have been able to prioritise their efforts, working through the samples currently available to them and focusing on species they deem of greater scientific or conservation significance. As of 2019, a total of 3,300 species had been analysed and their genomes documented, including 800 new records since the start of the project. However impressive this is, it is still only a drop in the ocean compared to the 1.5 million species they ultimately hope to analyse. But as the project progresses, the EBP hopes that the rate of sequencing will ramp up greatly. They also hope the project will drive innovation in research and technology, including more capable remote labs, autonomous vehicles for sampling, robotics and AI, as well as advances in sequencing technology and computational power.
But why has the EBP put themselves under so much pressure to digitise these genomes in such a narrow timeframe? Well, unfortunately, time is not on our side! The planet is in the midst of a sixth mass extinction event, with many experts laying the blame solely at humanity’s feet. As a result of decades of habitat destruction and the runaway effects of climate change, the environment has continued to take one for the team: 52% of all vertebrate populations have shown signs of decline in the past 40 years, and 20,000 species are currently on the IUCN endangered species list (1000 x more than a decade ago). We need to act now. If nothing is done, the wealth of this planet’s biodiversity and the knowledge locked up within its DNA will be lost. That would cause irreparable damage to the environment and erase all the potential information found within, the product of 3.5 billion years of evolution, that could have been used to drive innovation in bioservices, bioscience and biomedicine: the birth of a new industrial age, the age of biosciences. But even if the EBP manages to preserve this data for future generations of researchers, this is just a small chapter of a much greater tome. These 1.5 million eukaryotic species equate to just 10% of the total projected biodiversity that calls Earth home, and it may already be too late for humanity to preserve many organisms before they are lost to time and to us.
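The read-length argument can be made concrete with the standard Lander-Waterman coverage relation, C = N·L/G, where C is the average coverage (how many times each base is read), N the number of reads, L the read length and G the genome size. The sketch below simply solves it for N. The figures are illustrative assumptions, not EBP numbers: a genome of roughly 3.1 billion bases (about the size of the human genome), 30x coverage, and read lengths of 150 bases (typical of short-read platforms) versus 15,000 bases (typical of long-read platforms).

```python
# Back-of-the-envelope sketch: how many reads are needed to cover a genome?
# Uses the Lander-Waterman relation C = N * L / G, rearranged to N = C * G / L.
# All example numbers below are illustrative assumptions, not EBP figures.

def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: int) -> int:
    """Return the number of reads of a given length needed for a target average coverage."""
    total_bases = genome_size_bp * coverage
    # Ceiling division, so a partial read still counts as one whole read.
    return -(-total_bases // read_length_bp)


GENOME_SIZE_BP = 3_100_000_000  # assumed: roughly human-genome-sized
COVERAGE = 30                   # assumed: each base read ~30 times on average

short_reads = reads_needed(GENOME_SIZE_BP, 150, COVERAGE)    # assumed short-read length
long_reads = reads_needed(GENOME_SIZE_BP, 15_000, COVERAGE)  # assumed long-read length

print(f"150 bp reads:    {short_reads:,}")
print(f"15,000 bp reads: {long_reads:,}")
print(f"{short_reads // long_reads}x fewer pieces to stitch together with long reads")
```

Under these assumptions, reads that are 100 times longer mean 100 times fewer pieces for the same coverage (620 million versus 6.2 million), and so far fewer overlaps to resolve when reassembling the genome. Real assemblies are messier, since repeats, sequencing errors and uneven coverage all matter, so treat this as a rough estimate only.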
Take a deeper dive into the European Reference Genome Atlas below!
The European Reference Genome Atlas — Reference genomes for biodiversity research and conservation
Written by: Camila Mazzoni, Ann Mc Cartney and Jan Zwilling
As the Anthropocene unfolds, processes of environmental change have accelerated into crises: extinction crises, climate crises and landscape degradation crises.
Europe is no exception, with approximately one-fifth of its 200,000 species at risk of extinction. This alarming trend is being addressed by European policies such as the European Green Deal and the European Biodiversity Strategy for 2030. Yet we also urgently need to deepen our knowledge of this fading biological diversity, to learn what we need to protect and how to protect it.
To do this, a group of almost 700 scientists from 36 European countries, including all 27 European Union members, has formed the European Reference Genome Atlas (ERGA). It focuses on one of the smallest components of life, and one we urgently need to know more about: the genome. ERGA’s goal is to generate reference genomes representing all European biodiversity, creating what is essentially an encyclopaedia of the construction plans of life. This resource will be key to understanding demographic trends of European species, predicting how species may cope with habitat disruption, and assessing their resilience to pathogens and climate change.
As a pan-European collaborative structure, ERGA is embedded in an even larger “network of networks”: it is the European hub of the Earth BioGenome Project and aligns with its goals, guidelines and principles. Among other things, ERGA strives to promote scientific excellence in genome construction and analysis, a balanced increase in the taxonomic, geographical and habitat representation of genomes, and the inclusion of scientists across all dimensions of social diversity.
In fall 2022, the first phase of the ERGA network ends and transitions into the next leap. Completing the Pilot Project means that ERGA members succeeded, in a grassroots manner, in building a pan-European distributed genomics infrastructure that is accessible and inclusive to all. Research labs, bioinformatics groups, European sequencing centres, computational infrastructures and commercial companies worked hand in hand to build the necessary resources and sequence the first 95 reference genomes from species all over Europe. Starting in September 2022, this success will be taken to the next level with an expansion of the distributed infrastructure, the sequencing of an additional ~500 genomes, and support for projects across Europe through state-of-the-art guidelines, training and access to genomics infrastructures. With this clear plan, and a joint spirit and motivation to push forward, ERGA members intend to multiply the project’s outcomes and move closer to sequencing the European branch of the tree of life, in order to understand it and to protect it.