Monthly Archives: February 2015

Encode your digital data into DNA, then keep it on ice

The Global Seed Vault on the Arctic island of Svalbard. The vault presently stores almost half a million seed samples. Image source: Gizmodo.

Recent articles by Gizmodo and New Scientist follow the publication of an interesting paper by a team from the Swiss Federal Institute of Technology in Zurich. The paper highlights the challenges of physically preserving digital data for periods longer than 50 years. As an alternative to current methods, it suggests encoding digital data as DNA: “we translated 83 kB of information to 4991 DNA segments, each 158 nucleotides long, which were encapsulated in silica”.
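To give a feel for what "translating" bytes into nucleotides can mean, here is a minimal Python sketch. It is not the scheme used in the paper: it assumes a naive mapping of two bits per base (A, C, G, T) with no error correction, and the function names are my own. It also assumes 1 kB = 1000 bytes when working out how many bits per nucleotide the quoted figures imply.

```python
# Naive illustration only: 2 bits per nucleotide, no error correction.
# This is NOT the paper's encoding; names and mapping are assumptions.
BASES = "ACGT"  # 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def split_segments(strand: str, length: int) -> list[str]:
    """Cut one long strand into fixed-length segments (the last may be shorter)."""
    return [strand[i:i + length] for i in range(0, len(strand), length)]


def bits_per_nucleotide(payload_bytes: int, segments: int, segment_length: int) -> float:
    """Payload bits divided by the total number of nucleotides written."""
    return payload_bytes * 8 / (segments * segment_length)


if __name__ == "__main__":
    strand = bytes_to_dna(b"Hi")
    print(strand)                      # CAGACGGC
    print(dna_to_bytes(strand))        # b'Hi'
    print(split_segments(strand, 3))   # ['CAG', 'ACG', 'GC']
    # Figures quoted from the paper, assuming 83 kB = 83,000 bytes.
    print(round(bits_per_nucleotide(83_000, 4991, 158), 2))
```

On the quoted figures this works out to well under one bit per nucleotide, against the two bits per base of the naive mapping. The difference is presumably taken up by error correction, indexing and similar overhead, though that split is my guess rather than something the articles state.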

The 83 kB in question came from the Swiss federal charter of the 13th century and the 10th-century Archimedes Palimpsest, both items of heritage deemed worthy of inclusion in the experiment (and unlikely to present copyright issues). In terms of performance, the article states that accelerated aging was undertaken at 70°C for a week and showed considerable promise of longevity, “thermally equivalent to storing information on DNA in central Europe for 2000 years”. The data was recovered without error, and the team even integrated error-correcting codes similar to those used in traditional approaches to archiving digital data. With this experimental evidence in hand, the team’s paper notes that “The corresponding experiments show that only by the combination of the two concepts, could digital information be recovered from DNA stored at the Global Seed Vault (at -18°C) after over 1 million years”.

CyArk uses a bespoke LTFS setup as a practical stable-media solution for archiving its terabytes of data, so side by side, 83 kB may not sound like a lot for our purposes. Potentially, though, “just 1 gram of DNA is theoretically capable of holding 455 exabytes”. The economics may not add up yet, as encoding this small excerpt of data is said to have cost around £1000. Ultimately, if we want to see this level of DNA data storage and access become feasible, or an everyday reality, it will need, among other things, strong market forces behind its adoption and development. I am reminded of this picture of the 1956 IBM RAMAC computer, which included the IBM Model 350 disk storage system (seen below), storing 5 MB of data and hired at $3200/month, equivalent to a purchase price of $160,000.

The 1956 IBM RAMAC computer, which included the IBM Model 350 disk storage system. Image source: Faber.se
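The cost figures quoted above can be put on a rough per-megabyte footing. The sketch below is back-of-envelope arithmetic only: it assumes 1 kB = 1000 bytes and 1 MB = 1,000,000 bytes, does not convert between pounds and dollars, and makes no adjustment for inflation since 1956. The function and variable names are mine.

```python
def cost_per_megabyte(total_cost: float, size_bytes: float) -> float:
    """Cost divided by capacity, with capacity expressed in megabytes (10**6 bytes)."""
    return total_cost / (size_bytes / 1_000_000)


# DNA experiment: around 1000 GBP to encode 83 kB (figures as quoted above).
dna_gbp_per_mb = cost_per_megabyte(1000, 83_000)

# IBM Model 350: 5 MB at an equivalent purchase price of 160,000 USD.
ibm_usd_per_mb = cost_per_megabyte(160_000, 5_000_000)

# Months of hire at 3200 USD/month needed to reach the purchase price.
ibm_months_to_purchase = 160_000 / 3200

if __name__ == "__main__":
    print(f"DNA: about {dna_gbp_per_mb:,.0f} GBP per MB")
    print(f"IBM 350: about {ibm_usd_per_mb:,.0f} USD per MB")
    print(f"IBM 350: {ibm_months_to_purchase:.0f} months of hire equals the purchase price")
```

Taken at face value, and with all the caveats about currencies and decades, the per-megabyte price of the DNA experiment is of the same order of magnitude as that of the 1956 disk system.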