That’s a question at the root of the study of the origin of life. It has a lot to do with information theory – which is a whole academic field that I don’t really understand – but we can relate it back to some of the experiments that deal with the spontaneous emergence of biology from chemistry, which is really what the study of the origin of life is.
The question is: how do you get to the first gene, the first informational molecule? Cleverly designed experiments by researchers such as Jack Szostak at Harvard have effectively simulated the origin of information in biology. They model not DNA but RNA, DNA’s cousin, which is of great biological significance in all organisms, and they have observed RNA molecules with one singular ability: simply to reproduce themselves.
What genes and genomes do is reproduce themselves by coating themselves in an enormously complicated husk, which is us – an organism. But, fundamentally, they are replicators. The current thinking is that the very first replicators were simply that, replicators: they carried no biological information other than the simple instruction to replicate themselves. Once you have a molecule that can replicate itself ad infinitum, it will quickly come to dominate a pool of similar molecules that can’t do the same thing. Once that starts to happen, you can feasibly add functions to that molecule, so that it keeps replicating itself but also does other things.
Once copying introduces variation, the molecule keeps replicating itself, but it creates multiple versions, some of which behave differently from others. At that point you have Darwinian natural selection. When this is simulated in experiments, that is precisely what you see: the molecules that are most efficient at replicating themselves quickly dominate the test tube. When you introduce imperfect copying into the mix, the same thing happens again, but with added variation.
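To make that selection argument concrete, here is a toy simulation in Python. It is a sketch under loud assumptions, not a model of Szostak’s actual RNA chemistry: each molecule is reduced to a single hypothetical “copying efficiency” (the chance it makes one copy of itself per round), and the pool size, starting numbers, mutation rate and mutation step are all invented for illustration. The test tube’s limited capacity is mimicked by randomly keeping a fixed number of molecules after each round.

```python
import random
from statistics import mean


def simulate(generations=30, pool_size=1000, mutation_rate=0.0, seed=1):
    """Toy replicator pool (all parameters hypothetical, not real chemistry).

    Each molecule is just a number: its chance of making one copy of itself
    per round. 0.0 means an inert molecule that cannot replicate.
    """
    rng = random.Random(seed)
    # Start with mostly inert molecules plus a handful of replicators.
    pool = [0.0] * (pool_size - 10) + [0.5] * 10
    for _ in range(generations):
        next_round = []
        for eff in pool:
            next_round.append(eff)  # every molecule survives the round
            if rng.random() < eff:  # inert molecules (eff 0.0) never copy
                child = eff
                if rng.random() < mutation_rate:
                    # Imperfect copying: nudge efficiency, clamped to [0, 1].
                    child = min(1.0, max(0.0, eff + rng.uniform(-0.05, 0.05)))
                next_round.append(child)
        # The "test tube" has fixed capacity: keep a random subset.
        pool = rng.sample(next_round, pool_size)
    return pool


def replicator_share(pool):
    """Fraction of the pool that can replicate at all."""
    return sum(1 for eff in pool if eff > 0) / len(pool)


perfect = simulate()
print(f"replicator share, perfect copying: {replicator_share(perfect):.3f}")

noisy = simulate(generations=150, mutation_rate=0.5)
print(f"mean efficiency, imperfect copying: {mean([e for e in noisy if e > 0]):.3f}")
```

With perfect copying, the replicators take over the pool but never change. With imperfect copying, the average efficiency of the replicators tends to creep upward, because the more efficient variants leave more copies behind.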
So the first bit of information in biology was “replicate yourself”, and all the other bits of information were stapled on top of that basic instruction. That’s where the information in DNA comes from.
There’s loads we still don’t know about DNA, though. We had sussed out the basic rules, replication and evolution, over the course of the 20th century, but the big revelation of the Human Genome Project was that its result didn’t account for all the sophistication and complexity of humans: we didn’t appear to have enough genes. Plus, of the human genome’s roughly three billion base pairs, almost none is actually genes, those units of information: less than two percent.
We do know a lot about what the rest of that stuff is doing. Some of it is architecture, scaffolding that makes sure chromosomes behave the way they ought to. Some of it is control mechanisms, like on/off switches or dimmer switches. But there are still absolutely tonnes of the genome whose function we have no idea about. These bits are clearly important, because they look similar across different species, but we literally have no idea why they exist.