
The Researcher’s Dilemma: Protecting DNA Data in a Data-Driven World

For many consumers, at-home DNA tests can be an entertaining way to unearth clues about their heritage. Many are thrilled by the opportunity to discover hidden family secrets such as a long-lost relative. However, few stop to think about the privacy consequences of spitting into a tube and sending their genetic material to an unknown lab. 

In 2018, consumers purchased as many at-home DNA kits as in the previous six years combined, according to MIT Technology Review. If that trend continues, genetic testing companies could be sitting on the information of around a third of the entire US population within the next two years. As interest in genetic testing grows, so does the desire to keep personal genetic information out of unwanted hands. 

Genetic data can reveal such private information as: 

  • Predispositions and disease risks
  • Potential medical conditions
  • Information about family members or future children
  • Information that might not be presently understood but could be known in the future
  • Culturally sensitive information

So how do the DNA giants go about protecting this information?

How was genetic data protected in the past?

One of the biggest issues in genetic research has always been sharing DNA data safely. How was sharing genetic data approached in the past?

The most remarkable example of how DNA data was handled in the past comes from one of the first people to have their entire genome sequenced: geneticist and DNA pioneer James Watson. In 2007, all 6 billion base pairs of his DNA were sequenced and made public for other researchers. However, he left some information out, or at least he thought he did. 

He purposely left one part of his sequence blank: a region on the long arm of chromosome 19, the exact spot where the APOE gene resides. Certain variations of the APOE gene increase one’s chances of developing Alzheimer’s, and Watson wanted to keep that information to himself. 

However, other researchers were quick to point out that Watson’s APOE variant could be predicted by looking at the surrounding DNA signatures. To prevent such predictions, researchers chose to redact an additional two million base pairs around the gene.

The Researcher’s Dilemma

While Watson’s contributions to genetic research paved the way for modern breakthroughs, they also exposed one of the biggest catch-22s of contemporary medicine: in order to benefit the greater scientific good, some amount of private information needs to be obtained. 

To make the process completely anonymous, you would have to eliminate the same identifying details that make the information valuable to scientific research. Instead of eliminating identifying data, researchers are working on perfecting encryption methods to keep that information safe and secure. 

Common Genetic Encryption

There are many forms of encryption, but when it comes to DNA, are all created equal?

Early genetic encryption began with the goal of computing a group of outputs without ever revealing their inputs. While this sounds like an impossible task, it became the basis for one of the best-known techniques for protecting DNA data. Using what later became known as “genome cloaking,” scientists were able to compare patients’ data and link mutations to symptoms, all while keeping 97% of the genetic information wholly hidden. 

They were able to do this by assigning each variation within the genome a linear set of values. Working from these values alone, they could restrict their analysis to the genes that were relevant to their study. 
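To make the idea of computing an output without seeing the inputs more concrete, here is a minimal sketch. It does not reproduce the genome cloaking protocol itself; it swaps in a simpler relative called additive secret sharing, and the function names and patient data are hypothetical. Each participant splits a private value into random-looking shares, and only the combined total is ever revealed.

```python
import secrets

MODULUS = 2**61 - 1  # public modulus; all share arithmetic is done mod this number


def split_into_shares(value, n_parties):
    """Split an integer into n shares that sum to the value (mod MODULUS)."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Return the total of all private values without exposing any single one."""
    n = len(private_values)
    # Participant i creates one share for every party j.
    all_shares = [split_into_shares(v, n) for v in private_values]
    # Party j only ever sees the j-th share from each participant,
    # which looks like random noise on its own.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the partial sums are published; together they reveal just the total.
    return sum(partial_sums) % MODULUS


# Hypothetical data: 1 = patient carries the variant of interest, 0 = does not.
carrier_flags = [1, 0, 1, 1, 0]
print(secure_sum(carrier_flags))  # 3
```

The researchers learn how many patients carry the variant, but no party holds enough shares to tell which patients they are.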

The process can be related to trying to spot errors or “bugs” in lines of computer code. These disease-causing traits can be singled out, just like a flawed section of code. The difference is, genetic data is much more sensitive, and people would be more likely to worry if rogue hackers obtained it. 

Fully Homomorphic Encryption

Fully Homomorphic Encryption, or FHE, is one method genetic companies can use to encrypt DNA data.

Even the standard methods of encryption used today do not make our data 100% secure. If a computation needed to be made to perform medical genetic testing on a genome, data would have to be decrypted first. That means that raw data would be exposed to leaks or theft, even for just a brief moment. 

Enter Fully Homomorphic Encryption, an idea that has been in development since the 1970s. Using the FHE method, the data never gets decrypted, and thus is never open to theft. The information remains encrypted at all times, whether it is being stored, transferred, or processed. This method handles data so that it is not even known to those who are processing it. The only way to decode the information is by using a secret key held by the recipient. 
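A small example may help show what computing on data that stays encrypted looks like. The sketch below is not FHE: it uses the Paillier scheme, which supports only addition on encrypted values, whereas a fully homomorphic scheme supports both addition and multiplication. The primes are toy-sized and insecure, the allele counts are hypothetical, and `pow(x, -1, n)` assumes Python 3.8 or later.

```python
import math
import secrets

# Toy primes, chosen only so the numbers stay readable. Real keys use primes
# that are more than a thousand bits long.
P, Q = 293, 433
N = P * Q                      # public key
N_SQUARED = N * N
G = N + 1                      # standard simple choice of generator
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # secret: lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)        # secret: inverse of LAMBDA mod N (valid for G = N + 1)


def encrypt(message):
    """Encrypt an integer 0 <= message < N with fresh randomness."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, message, N_SQUARED) * pow(r, N, N_SQUARED)) % N_SQUARED


def decrypt(ciphertext):
    """Recover the plaintext using the secret values LAMBDA and MU."""
    x = pow(ciphertext, LAMBDA, N_SQUARED)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % N_SQUARED


# Hypothetical example: risk-allele counts at three variant sites.
encrypted_counts = [encrypt(count) for count in [2, 0, 1]]

# The processor adds the counts without ever decrypting them.
encrypted_total = encrypted_counts[0]
for c in encrypted_counts[1:]:
    encrypted_total = add_encrypted(encrypted_total, c)

# Only the key holder can read the result.
print(decrypt(encrypted_total))  # 3
```

The party doing the addition sees only large random-looking numbers; the total becomes readable only to whoever holds the secret key.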

FHE’s methodology should mean that even when quantum computers gain the ability to break today’s common cryptography, they will not be able to break through this encryption without the key. FHE relies on lattice-based encryption, which hides the data as a point within a lattice, a regular grid of points in many dimensions. Calculating the exact location of that lattice point is believed to be extremely difficult, even with quantum computing power. 
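The lattice idea can be illustrated with a toy version of Learning With Errors (LWE) encryption, one of the lattice problems that FHE schemes are commonly built on. This is a sketch under stated assumptions, not a production scheme: the parameters are far too small to be secure, all names are hypothetical, and it encrypts a single bit. Each ciphertext is a lattice point plus a little noise, and only the secret key can strip the noise away.

```python
import random

Q = 4096   # ciphertext modulus
N = 8      # secret dimension (toy size, far too small to be secure)
M = 20     # number of public samples


def keygen(rng):
    """Return a secret vector and public noisy samples (a, <a, secret> + noise)."""
    secret = [rng.randrange(Q) for _ in range(N)]
    public = []
    for _ in range(M):
        a = [rng.randrange(Q) for _ in range(N)]
        noise = rng.choice([-1, 0, 1])
        b = (sum(x * s for x, s in zip(a, secret)) + noise) % Q
        public.append((a, b))
    return secret, public


def encrypt(public, bit, rng):
    """Add up a random subset of public samples and hide the bit at Q/2."""
    subset = [sample for sample in public if rng.random() < 0.5]
    u = [sum(a[i] for a, _ in subset) % Q for i in range(N)]
    v = (sum(b for _, b in subset) + bit * (Q // 2)) % Q
    return u, v


def decrypt(secret, ciphertext):
    """Remove the lattice point; what is left is near 0 (bit 0) or Q/2 (bit 1)."""
    u, v = ciphertext
    d = (v - sum(x * s for x, s in zip(u, secret))) % Q
    return 1 if Q // 4 < d < 3 * Q // 4 else 0


def add(c1, c2):
    """Adding ciphertexts XORs the hidden bits, as long as the noise stays small."""
    u = [(x + y) % Q for x, y in zip(c1[0], c2[0])]
    return u, (c1[1] + c2[1]) % Q


rng = random.Random(2024)
secret, public = keygen(rng)
c_one = encrypt(public, 1, rng)
c_zero = encrypt(public, 0, rng)
print(decrypt(secret, c_one), decrypt(secret, add(c_one, c_zero)))  # 1 1
```

Without the secret vector, recovering the bit means finding the nearest lattice point through the noise, which is the step believed to be hard at realistic sizes.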

However, even such a powerful tool as FHE has its limits when it is applied commercially. As it stands, the “Privacy Best Practices for Consumer Genetic Testing Services” states: 

“We note that currently, Genetic Data held at the individual-level that has been de-identified cannot be represented as strongly protecting individuals from re-identification, based upon existing de identification tools and standards.”

So how could you further guarantee that an individual isn’t re-identified from their encrypted data? Surprisingly, the answer is based on a technology that many of us use every day. 

Genomic GPS

Researchers can benefit significantly from comparing an individual’s genetic data to a public reference. By making this comparison, they can determine useful information such as the sample’s geographical origins, or conduct a genetic analysis of a larger population. 

However, many are hesitant to offer up such personal information due to privacy concerns, making sharing research quite tricky. This sensitive information can be obtained and transferred with Genomic GPS (Genomic Global Positioning System) while keeping the source safe and secure.

Genomic GPS uses a method called “multilateration,” which is already used in modern GPS. This technique converts the source data into a data point that is only useful when compared as a “distance” to a reference point. In other words, the converted data is designed to make it impossible to reconstruct individual DNA datasets, and it is only useful within the calculations.
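The distance idea can be sketched in a few lines. The example below is an illustration under assumptions, not the published Genomic GPS method: the reference panel, genotypes, and function names are hypothetical, and plain Euclidean distance stands in for whatever genetic distance a real system would use. The private genotype stays with its owner, and only its distances to public reference points are shared.

```python
import math

# Hypothetical public reference panel: allele counts (0, 1, or 2) at five variant sites.
REFERENCE_PANEL = {
    "ref_A": [0, 0, 1, 2, 0],
    "ref_B": [2, 1, 0, 0, 1],
    "ref_C": [1, 2, 2, 0, 0],
}


def distance_vector(genotype):
    """Convert a private genotype into its distances to each public reference."""
    return {name: math.dist(genotype, ref) for name, ref in REFERENCE_PANEL.items()}


def closest_reference(distances):
    """Use only the shared distances to find the most similar reference."""
    return min(distances, key=distances.get)


# This list never leaves the owner's machine (hypothetical values).
private_genotype = [0, 1, 1, 2, 0]

# Only this dictionary of distances is shared with researchers.
shared = distance_vector(private_genotype)
print(closest_reference(shared))  # ref_A
```

In this toy case five sites and three references are few enough that the genotype could be narrowed down. The privacy argument depends on a real genome having vastly more variant sites than there are reference points, so the distances leave the underlying data undetermined.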

Genomic GPS makes it possible to share genomic data without worrying about privacy breaches or ethical concerns. By producing sound and secure data, this technology could lead to a better understanding of how genes and regional origins contribute to an individual’s overall health and even benefit entire populations’ well-being. 

Genealogy Care hopes to bring Genomic GPS technology into the world of DNA data security. However, this technology is not limited to customer-facing applications. We hope to adapt this technology to various research and commercial applications to make sharing data more secure.

What legal protections are in place?

Legal protections are put in place on both the state and federal levels. In the past, these legal protections had gaps that allowed companies to sell private data.

To help fill in the gaps in the current law and protect the American public, California has recently passed a piece of legislation known as the Genetic Information Privacy Act (or GIPA for short). Many were unaware of the need for such a law, but there existed a glaring hole in U.S. healthcare policies that allowed genetic companies to play fast and loose with private DNA data. 

The GIPA is aimed at placing more obligations on consumer companies who handle the DNA samples of their customers. It requires these companies to offer protection and only allows them to use this information for limited purposes. 

HIPAA Gaps

Many are aware of HIPAA regulations, which were put in place to protect the privacy of patients. Still, many are surprised to learn that this law applies only to organizations involved in medical care, such as healthcare providers and health plans. Providing information to a testing company to determine your genetic chances of having B.O. falls outside HIPAA’s protections. 

That means that your valuable information can be used by law enforcement or sold to insurance companies that may discriminate based on it. 

GIPA effectively addresses some of these privacy issues. It requires that companies obtain express consent from the consumer to collect and use DNA information, plus separate consent if they wish to transfer or market DNA data. Additionally, consumers have the right to revoke their consent. If they choose to withdraw consent, companies must destroy any biological samples within 30 days.

Genealogy Cares About Privacy

Genealogy Care is an advocate for consumer privacy protections and security measures. We believe in full transparency because no customer has time to read through 20 pages of fine print. 

While the thought of sharing your personal genetic data may seem like a risky proposition, we strive to offer the latest and strongest security measures to keep our customers safe and to give them peace of mind.

Maintaining a healthy lifestyle shouldn’t be stressful. You can trust that when you submit your DNA data, you are fully protected by the latest encryption technologies.
