Cat videos and retrotransposons


I recently needed to explain some aspects of Dawkins’ The Selfish Gene, and that was when the idea for this post occurred to me. In this post, I aim to explain the concept of “junk DNA” while pointing out its similarities to internet memes.

Prior to the Human Genome Project, it was believed that humans had about a million genes. Today, we know that this number is somewhere around 25,000-35,000, depending on the definition. The badly mistaken estimate was based on the known size of the genome (3.2 Gb) and the average size of a gene (3-5 kb). The false assumption, however, was that genes make up most of our genome. Now that the human genome has been sequenced, we know that the protein-coding parts of the genes take up only about 1-2% of the genome. Even if we add the regulatory parts of the genes, we can explain the function of maybe 5-10% of the genome. The rest of the genome is frequently referred to as “junk DNA”.
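The back-of-the-envelope reasoning behind the old estimate can be reproduced in a few lines. This is only a sketch of the arithmetic using the figures quoted above; the function and variable names are made up for illustration.

```python
GENOME_SIZE_BP = 3_200_000_000   # 3.2 Gb, as quoted above
GENE_SIZE_RANGE_BP = (3_000, 5_000)  # average gene size of 3-5 kb

def naive_gene_count(genome_bp, gene_bp):
    """Gene count under the false assumption that the genome is wall-to-wall genes."""
    return genome_bp // gene_bp

# Larger genes give the lower bound, smaller genes the upper bound.
low_estimate = naive_gene_count(GENOME_SIZE_BP, GENE_SIZE_RANGE_BP[1])
high_estimate = naive_gene_count(GENOME_SIZE_BP, GENE_SIZE_RANGE_BP[0])
print(low_estimate, high_estimate)  # 640000 1066666

# Protein-coding sequence at 1-2% of the genome, in base pairs.
coding_low_bp = GENOME_SIZE_BP * 1 // 100
coding_high_bp = GENOME_SIZE_BP * 2 // 100
print(coding_low_bp, coding_high_bp)  # 32000000 64000000
```

Dividing genome size by gene size lands between roughly 640,000 and a little over a million, which is where the "about a million genes" figure comes from.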

For most people, including myself, it is hard to digest that the majority of the DNA, the material that codes for all of our biology, is completely meaningless. In the past decades, we discovered that some of the sequences previously deemed junk DNA are actually important and play a role in maintaining the stability or the positioning of the chromosomes. Knowing this, it is easy to subscribe to the view that eventually, once we have understood everything about our genetics, we will see that everything we considered junk DNA has a precise function and is part of our genome for a reason. Nevertheless, several features of the genome make one question whether the entire genome really has to make sense. Retrotransposons are one such feature.

Retrotransposons are mobile elements of the genome which can “copy and paste” themselves into multiple parts of the genome. The way they do this is quite simple: their sequence is copied into RNA, then this RNA is reverse-transcribed into DNA, which subsequently integrates into the genome. They are almost like little viruses replicating inside our genome. In fact, they share common features, and in some cases common origins, with endogenous retroviruses. Retrotransposons take up almost 50% of the human genome. Most retrotransposons are inactive, either because mutations have eroded their functions or because the regulatory mechanisms of the organism keep them in check. Nevertheless, retrotransposons sometimes manage to create copies of themselves in our genome. When they do, they may contribute to cancer formation or disrupt the function of other genes. In some rare cases, the changes induced by retrotransposons are beneficial to the organism. But in most instances, retrotransposons are detrimental or at best neutral. The most successful retrotransposon, the so-called Alu element, is found in over 1 million copies in the human genome.
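The “copy and paste” behaviour described above can be illustrated with a toy simulation. This is a minimal sketch, not a biological model: the genome is just a list of symbols, and the copy and inactivation probabilities are arbitrary values chosen for illustration.

```python
import random

def simulate_retrotransposition(genome_length=1000, generations=20,
                                copy_prob=0.1, inactivation_prob=0.05, seed=0):
    """Toy model; returns a list of (total_copies, active_copies) per generation."""
    rng = random.Random(seed)
    # "." is ordinary sequence, "A" an active element, "i" an inactivated element.
    genome = ["."] * genome_length
    genome[genome_length // 2] = "A"  # a single founding element
    history = []
    for _ in range(generations):
        # Only elements that are active at the start of the generation can copy.
        n_active = genome.count("A")
        for _ in range(n_active):
            if rng.random() < copy_prob:
                # Copy and paste: the original stays put, and a new copy
                # is inserted at a random position in the genome.
                genome.insert(rng.randrange(len(genome) + 1), "A")
        # Mutations erode some active copies; they remain in the genome
        # but can no longer copy themselves.
        genome = ["i" if x == "A" and rng.random() < inactivation_prob else x
                  for x in genome]
        total = genome.count("A") + genome.count("i")
        history.append((total, genome.count("A")))
    return history

for generation, (total, active) in enumerate(simulate_retrotransposition(), start=1):
    print(generation, total, active)
```

Because inactivated copies are never removed in this sketch, the total copy number can only stay the same or grow, even when few elements remain active.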

The spread of retrotransposons in the genome. The DNA of a retrotransposon is transcribed into RNA and translated into proteins. Retrotransposons code for proteins that facilitate their integration into the genome. (Certain retrotransposons don’t code for such proteins; they use the proteins of other retrotransposons.)

Dawkins’ book theorizes that genes (or better: replicator elements) in our genome have the sole “purpose” of replicating. (Disclaimer: not because they are conscious beings that actively think about replication, but because the ones that are not optimized for replication have already been lost.) The main novelty of this theory is that it considers genes, rather than genomes, as the competing units. This means that individual genes don’t have to contribute positively to the survival of the organism; it is enough for them to ensure that they are passed on. In sexually reproducing organisms, such as humans, half of the gene variants in a genome are passed on in each reproductive cycle. One way a gene can increase the chance that it is inherited is by contributing to the success of the vehicle (the host organism). However, if the reproductive success of the vehicle is not likely to be jeopardized, then creating multiple copies in the genome is also a viable strategy. That is the strategy that retrotransposons follow. They create many copies of themselves, which do not decrease the fitness of the organism, or do so only mildly, and thus achieve high copy numbers in our genomes.
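Why extra copies help a replicator can be seen with a small calculation. The sketch below makes simplifying assumptions that are not from the book: each copy sits on only one of the two parental chromosomes, copies are inherited independently of one another, and the copies carry no fitness cost.

```python
def prob_at_least_one_copy_inherited(n_copies):
    """Chance that a given offspring receives at least one copy.

    Assumes each copy is passed on independently with probability 1/2.
    """
    return 1 - 0.5 ** n_copies

def expected_copies_inherited(n_copies):
    """Average number of copies a given offspring receives under the same assumption."""
    return n_copies / 2

for n in (1, 2, 10):
    print(n, prob_at_least_one_copy_inherited(n), expected_copies_inherited(n))
```

Under these assumptions, a single copy reaches a given offspring half of the time, two copies three quarters of the time, and ten copies almost always.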

Another interesting concept in Dawkins’ book is to think about ideas (memes) as genes: as replicators which spread not over generations but in memory, be it individual or collective. Memes, unlike genes, are passed on not by reproduction but by communication. Useful ideas (e.g. the invention of the wheel) can be expected to spread quickly, whereas useless ideas (e.g. “triangles have four sides”) are soon forgotten. However, useful ideas may be complicated and therefore spread more slowly, and they may be outcompeted by memes which are easier to digest and spread faster. That is how the internet, which is now our collective memory, contains more cat videos than works of art or science. And this is how cat videos on the internet are like the retrotransposons of the genome. The “junk” spreads fast merely because it can, crowding out the useful content that spreads slowly. Retrotransposons are so prevalent because they can spread, the same way that cat videos are all over the internet because people like them and share them, even if they have no further meaning.

The author of this post is himself an avid viewer of cat videos, and his genome also harbours millions of retrotransposons. Therefore, this post was in no way meant to judge readers who indulge in either of these. The aim was merely to draw a parallel between these seemingly distant concepts.

Zsolt Balázs

Bioinformatics in Oncogenomics.