Are At-Home DNA Tests Scamming You?

At-home, direct-to-consumer DNA tests are based on good science but even better, and potentially unethical, marketing. In this post, I want to reflect on the science of at-home DNA testing and on what you can realistically expect from the resulting information.

Home DNA tests, or direct-to-consumer (DTC) DNA tests, are a fairly lucrative industry, expected to hit a $2.3 billion market value in 2030. By 2019, MIT Technology Review estimated that over 26 million people had taken a DTC DNA test. About 52.8% of those tests were supplied by Ancestry.com and 34.0% by 23andMe ($ME).

There are several well-established companies in this market, including Ancestry.com and 23andMe ($ME), the latter with a current market capitalization of $440M. Living DNA, FamilyTreeDNA, MyHeritage, and others are among the dozens of start-ups in the DNA space.

The majority of these companies operate on a very simple model. Consumers purchase a saliva kit, which they receive in the mail and return to the company. The company extracts DNA from the saliva sample and analyzes it using DNA microarray chips or sequencing, then builds a profile from the extracted genetic data and currently available research. The final deliverable is a summary of those analyses.

There are two main trait categories that consumers are interested in and companies analyze: ancestry and health. Ancestry reports give us information about our origins and help consumers construct a profile of their global family and/or ethnic descent. Health reports give us information about health traits, like disorders or characteristics rooted in or influenced by DNA. Let’s dig deeper into these two concepts.

Ancestry

Ancestry is a genetic and sociocultural crossing. From a sociocultural perspective, ancestry reflects our social and cultural heritage — our cultural identities and practices, our social norms and tendencies, our religious and political philosophies, etc are all influenced by our ancestors.

From a genetic perspective, ancestry reflects our recent biological origins. Since people tend to mate with those in their proximity, groups of individuals that form clusters within which most mating happens also develop characteristic genetic profiles. This is ultimately what we geneticists call a population.

Strictly genetically speaking, the definition of the term population is a bit more complex: ecologists will focus on a particular set of characteristics that evolutionary biologists won’t, and vice versa. The one I personally gravitate towards was outlined in the textbook Evolution by Futuyma:

A group of conspecific organisms that occupy a more or less well-defined geographical region and exhibit reproductive continuity from generation to generation.

Of course, the definition above is strictly biological, but it encompasses the underlying features discussed above. Take, for example, my origins: southwestern Bosnia-Herzegovina. It’s a well-defined geographical region positioned between the Dinaric Alps in the north and the Adriatic Sea in the south. This population is fairly (but not completely) geographically isolated, experiences limited immigration, and is large enough to sustain reproductive continuity.

Due to such isolation, this population has also had time to develop characteristic and distinct sociocultural norms and practices. Thus, this is not only my ancestry as reflected in my genetics, it is also my ancestry as reflected in my culture.

Figure 1: Screenshot of my 23andMe ancestry report summary. Captured on 11/6/2023.
Figure 2: Screenshot of the top-scoring country in my 23andMe ancestry report summary. Captured on 11/6/2023.

This is also reflected in my 23andMe report (see Figure 1). The entirety of my genetic ancestry is captured by southern European (90.6%) and eastern European (9.4%) populations. In fact, the heat map of the top-scoring country in my 23andMe report (see Figure 2) pinpoints the county I should be most closely related to.

Notice, the top-scoring country was Croatia and the top-scoring county was Split-Dalmatia. This is actually very accurate, as the north-eastern portion of this county is within the same geographical region as my home in southwestern Bosnia-Herzegovina, and I am indeed ethnically a Croat. 

This was quite an impressive performance by 23andMe’s ancestry algorithm, though my case is fairly simple, especially for the 21st century. Due to the advancement of cheap, accessible, and fast travel, as well as mass global migrations set off by the colonial era, this is not the case for most people.

It is, however, those complicated cases that have led to the success of companies like 23andMe. I have always known what my ancestry would be: I grew up in an area that is fairly ethnically homogeneous and hasn’t experienced mass immigration. For me, genetics is pretty much a 1-to-1 mapping of my sociocultural ancestry.

Where these tests become useful and interesting is in admixed populations. Admixed populations are those whose members carry varying proportions of ancestry from multiple source populations. For example, African-Americans may have African and European ancestry, and Latine people may have African, European, and Indigenous American ancestry.

Indeed, due to the effects of colonial-era genocides and ethnic cleansing, many African-Americans and Latine people have had difficulties tracing their ancestry to pre-colonial populations. Because of colonial cultural imperialism and forced cultural assimilation, African-Americans cannot trace their genetic ancestry the way that I can, by using the sociocultural practices or language they engage in every day as a proxy.

For them, DNA testing services like 23andMe or Ancestry.com were essential tools to understand their biological origins and an avenue to reconnect with long-lost cultural heritage. 

How do we determine ancestry?

The way DTC DNA testing companies determine your ancestry is at its core fairly simple, but mathematically it is somewhat complex.

In order to determine your ancestry, DTC DNA testing companies take your DNA, which is of unknown ancestry, and compare it to the DNA of people whose ancestry is known. They do this using publicly available data like the 1000 Genomes Project or proprietary data consisting of individuals like me, who come from well-defined and known populations. This is something we call reference data.

The traditional way of doing this is by using so-called principal component analysis (PCA), which is a technique you’d learn in linear algebra. As it turns out, when applied to genetic data, PCA yields a pretty solid ancestry breakdown for a group of samples. So, throw in your unknown sample with a bunch of known ones, run PCA, and based on its proximity to the known samples we can determine your ancestry.
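
To make the idea concrete, here is a minimal sketch of PCA-based ancestry assignment on simulated data. Everything in it is an assumption made for illustration: the two populations "A" and "B", their allele frequencies, the helper names, and the nearest-centroid rule. It is not the method 23andMe or any other company actually uses.

```python
import numpy as np

def simulate_genotypes(allele_freqs, n_samples, rng):
    """Draw genotypes (0, 1, or 2 copies of an allele) for each sample and variant."""
    return rng.binomial(2, allele_freqs, size=(n_samples, allele_freqs.size))

def pca_coordinates(genotypes, n_components=2):
    """Project samples (rows) onto their top principal components."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix; u * s gives each sample's PC coordinates.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def infer_population(reference, labels, unknown, n_components=2):
    """Label the unknown sample with the reference population whose
    centroid is nearest in PC space (an illustrative rule, not a standard)."""
    stacked = np.vstack([reference, unknown]).astype(float)
    coords = pca_coordinates(stacked, n_components)
    ref_coords, query = coords[:-1], coords[-1]
    labels = np.asarray(labels)
    centroids = {
        str(pop): ref_coords[labels == pop].mean(axis=0)
        for pop in np.unique(labels)
    }
    return min(centroids, key=lambda pop: np.linalg.norm(query - centroids[pop]))

# Toy reference data: two made-up populations, 50 people each, 500 variants.
rng = np.random.default_rng(0)
freqs_a = rng.uniform(0.05, 0.95, size=500)
freqs_b = rng.uniform(0.05, 0.95, size=500)
reference = np.vstack([
    simulate_genotypes(freqs_a, 50, rng),
    simulate_genotypes(freqs_b, 50, rng),
])
labels = ["A"] * 50 + ["B"] * 50

# An "unknown" sample that was secretly drawn from population B.
unknown = simulate_genotypes(freqs_b, 1, rng)[0]
print(infer_population(reference, labels, unknown))
```

The simulated populations here are far more distinct than real human populations, so the unknown sample lands cleanly with population B. With real data the clusters overlap, which is one reason the production methods are more involved.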

The actual process is a bit more complicated and involves fancy steps like phasing and support vector machine (SVM) classification, which you can learn more about in 23andMe’s rough breakdown of its ancestry composition method. It is definitely a topic I will write about in the future.

How accurate are ancestry reports?

The short answer: it depends.

Statistical methods like PCA, SVM, and phasing all depend on quantity and quality of data, particularly the reference data. The more reference data we have, the better results we’ll get. However, due to a pervasive bias in genetic studies which results in over-representation of European and affluent individuals, the importance of diversifying sources of said reference data cannot be overstated.

Due to this bias, ancestry estimates for well-studied populations, like northern Europeans and Ashkenazi Jews, are much more precise than those for less-studied populations like Central Asians, Egyptians, Eastern Europeans, or North Asians. Thanks to some bias-correcting efforts, precision has improved for historically understudied populations like various African populations and Indigenous Americans.

Nonetheless, 23andMe reports >90% precision and >90% recall rates for the majority of their tested populations. Thus, my general inclination is that their ancestry results are trustworthy, particularly for the highest-ranking ancestry results. I will discuss the 23andMe ancestry algorithm in depth in a future post.
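
For readers unfamiliar with the two metrics, here is a small sketch of how precision and recall are computed for one population. The labels are invented toy data and the function name is my own; 23andMe's actual evaluation procedure is not reproduced here.

```python
def precision_recall(true_labels, predicted_labels, population):
    """Precision: of everything called `population`, how much truly was.
    Recall: of everything truly `population`, how much was called correctly."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(t == population and p == population for t, p in pairs)
    false_pos = sum(t != population and p == population for t, p in pairs)
    false_neg = sum(t == population and p != population for t, p in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else float("nan")
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else float("nan")
    return precision, recall

# Invented example: eight samples with known ancestry and a classifier's calls.
truth = ["Southern European"] * 4 + ["Eastern European"] * 4
calls = ["Southern European"] * 3 + ["Eastern European"] * 5

print(precision_recall(truth, calls, "Southern European"))  # (1.0, 0.75)
print(precision_recall(truth, calls, "Eastern European"))   # (0.8, 1.0)
```

In this toy case every Southern European call is correct (precision 1.0), but one true Southern European sample was missed (recall 0.75). A method can score well on one metric and poorly on the other, which is why both are reported.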

It is important to note that genetic ancestry and sociocultural ancestry are each other’s proxies, but for the reasons listed above they are not always concordant.

Health

The science that goes into health reports from DTC DNA testing companies is my bread and butter, and I could probably write a whole book about it. In essence, health reports summarize your risks of developing certain medical conditions based on your genetic profile.

It is important to note that some medical conditions are entirely genetic in origin (think cystic fibrosis), and some are entirely nongenetic in origin (think traumatic brain injury). However, most medical conditions fall somewhere in between, and nearly all medical conditions are affected by both genetic and nongenetic factors.

When it comes to medical conditions with genetic components, there are several categories that we should consider:

  1. Mendelian genetic disorders — these are disorders that occur due to specific mutations in our DNA and their inheritance patterns follow Mendel’s laws. Examples include cystic fibrosis and sickle cell anemia.
  2. Non-Mendelian genetic disorders — these are disorders that occur due to specific mutations in our DNA but their inheritance patterns do not follow Mendel’s laws. Examples include Angelman and Prader-Willi syndromes.
  3. Complex disorders — these are disorders that occur due to several or many mutations throughout the genome, which alone do not cause the disorder, but compounded together increase the risk of developing the disorder. Complex disorders are usually also influenced by nongenetic and environmental factors. Examples include heart disease, diabetes, various forms of cancers, etc.
  4. Nongenetic disorders — these are disorders caused by nongenetic or environmental factors like viral and bacterial infections, some cancers, traumatic injuries, etc. While not caused by genetic factors, our genetic makeup may play a role in the disorder severity, efficacy of treatments, or their outcomes.

Due to numerous concerns like privacy and accuracy, and the subsequent involvement of the FDA, health reports are not as common as ancestry reports in DTC DNA testing services. Nonetheless, companies like 23andMe have begun slowly clearing compliance issues with the FDA and reporting back DNA results for individual conditions.

Figure 3a: Screenshot of the alpha-1 antitrypsin deficiency section from my 23andMe health report summary. Captured on 11/6/2023.
Figure 3b: Screenshot of the anxiety section from my 23andMe health report summary. Captured on 11/6/2023.

Examples in Figures 3a-b include a typical simple genetic disorder (alpha-1 antitrypsin deficiency) and a typical complex disorder (anxiety). Note that for simple genetic disorders, 23andMe reports the number of variants detected, whereas for complex disorders a likelihood (or risk) of being diagnosed is reported.

The choice of words here matters. 23andMe understands that, for simple genetic disorders, detected variants may (but don’t always) result in a diagnosed condition. How often a variant actually leads to the condition is known as penetrance. Likewise, for complex disorders, instead of reporting results for thousands of variants, the output of a risk model is reported.

How do we tell if a genetic variant is associated with disease? Simply put, decades of publicly funded human genetics research have resulted in the identification of millions of variants and their associated risk of disease development. Some of these variants were catalogued in depth through functional studies in cell lines and animal models, and are determined to be causative, but most of them were catalogued through genome-wide association studies (GWAS) and are determined to be correlated or associated, but not necessarily causative.

How do we determine health risks?

Well, GWASes are usually performed on thousands to millions of people to determine associations of particular genetic variants with risk of disease development. Based on these studies, a model can be derived which can score each individual’s genetic profile and estimate a risk of developing a disease. These are called polygenic scores (PGS).
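
As a rough illustration of what "association" means for a single variant, the sketch below compares allele counts between cases and controls using an odds ratio and a z-statistic on its logarithm. The counts are invented and the function name is hypothetical. Real GWAS pipelines typically fit regression models with covariates across millions of variants, so treat this as a simplification.

```python
import math

def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio for carrying the alternate allele in cases vs. controls,
    plus a z-statistic based on the standard error of the log odds ratio."""
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    se_log_or = math.sqrt(
        1 / case_alt + 1 / case_ref + 1 / control_alt + 1 / control_ref
    )
    z = math.log(odds_ratio) / se_log_or
    return odds_ratio, z

# Invented counts: 1,000 cases and 1,000 controls, i.e. 2,000 alleles per group.
odds_ratio, z = allelic_association(
    case_alt=600, case_ref=1400, control_alt=500, control_ref=1500
)
print(f"odds ratio = {odds_ratio:.3f}, z = {z:.2f}")
```

An odds ratio above 1 means the allele is more common among cases. Note that this says nothing about causation, which is exactly the caveat about GWAS findings being associated rather than necessarily causative.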

Briefly and statistically, PGS are linear models that sum up all variants across our genome, each weighted by its associated risk. Based on these scores, we can determine which people belong to high-risk groups and which belong to low-risk groups. PGS, sometimes called polygenic risk scores (PRS), are still very much suboptimal and should be used with care and never treated as deterministic.
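
The weighted sum can be sketched in a few lines. The variant names, weights, and reference scores below are made up for illustration, and treating a missing genotype as zero copies is a simplifying assumption of mine, not a standard.

```python
def polygenic_score(genotypes, weights):
    """Sum of (copies of the risk allele) x (per-allele weight) over scored variants.
    Variants missing from `genotypes` are counted as 0 copies (an assumption)."""
    return sum(weight * genotypes.get(variant, 0) for variant, weight in weights.items())

def percentile(score, reference_scores):
    """Percentage of reference individuals whose score is at or below `score`."""
    at_or_below = sum(s <= score for s in reference_scores)
    return 100.0 * at_or_below / len(reference_scores)

# Hypothetical per-allele weights, as might be estimated from a GWAS.
weights = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}
# One person's genotype: number of copies (0, 1, or 2) of each scored allele.
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}

score = polygenic_score(person, weights)
reference_scores = [-0.10, 0.00, 0.05, 0.10, 0.30]  # invented reference group
print(round(score, 2), percentile(score, reference_scores))
```

The raw score means little on its own; it only becomes interpretable when ranked against a reference group. If that reference group does not match your ancestry, the ranking can mislead.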

In other words, a 23andMe report indicating an increased likelihood of Alzheimer’s disease should not be a cause for alarm or a reason to see a neurologist (assuming absent symptoms). However, if you do have a concern regarding a particular health report in your 23andMe, talking to your physician or a genetic counsellor might be beneficial.

How accurate are health reports?

The short answer: it depends. 

I know, I know. In my opinion, reports on simple genetic disorders are very accurate. If you are indicated to be a carrier of one of the cystic fibrosis or sickle cell anemia variants, you are very likely a carrier. And if you are indicated to have two such variants, you are very likely already diagnosed or will be diagnosed with that disorder.

On the other hand, the reports on complex disorders are not very accurate, especially if you have non-European ancestry. Remember my comment above on the biases in existing genetic studies that favor affluent people and Europeans? The effects of those biases are most painfully present in health association tests.

Simply put, the majority of available GWASes focus exclusively on European people, and as such yield terrible models for people with non-European ancestry. Thus, the section of your health report that deals with complex disorders should not be taken too seriously.

Look, I’m not saying you shouldn’t modify your lifestyle after a DNA health report, but do so within reason. Increasing exercise is beneficial to your health regardless of your heart disease risk DNA report, and limiting sun exposure and using sunscreen are beneficial to your health regardless of your melanoma risk DNA report.

However, never drinking coffee again because you got an increased likelihood of irritable bowel syndrome on your DNA report, despite not having any related symptoms, is unnecessary. Taking DTC DNA testing reports on complex disorders at face value is ill-advised. Nonetheless, if you do have a concern regarding a particular health report in your 23andMe, talking to your physician or a genetic counsellor might be beneficial.

Ethical Concerns and Risks

Participating in DTC DNA testing is not risk-free. In this section I will briefly discuss the ethics of DTC DNA testing and the risks associated with it, using real-world examples (because, sadly, they do exist).

Privacy

Privacy is an important consideration when choosing to purchase a DTC DNA test. Several things you should consider: how is the company protecting your data, will the company sell your data, and how much control do you have over sharing your data.

The recent 23andMe data leak is clear evidence that your data might not be 100% safe. Hackers may exploit a security weakness to breach a company’s servers and websites, or they may exploit the fact that people reuse their login information to access your data.

My assessment: ensure you are well protected by using a unique username/password combination and activating 2-factor authentication on your accounts; make sure the company you are submitting your data to is trustworthy; examine their privacy policy to ensure your data will be kept safe and private.

Accuracy

While the majority of DTC DNA testing companies do indeed follow a fairly rigorous scientific methodology in most of their reports, they do not shy away from embellishments and reports that have no real-world value.

While ancestry and some of the health DNA reports are indeed accurate, informative, and useful, some of the health DNA reports hold nothing more than entertainment value, and these traits would be better predicted by a well-cataloged family history.

My assessment: carefully evaluate the results you are provided. Even assuming all the tests perform magically with 100% accuracy, mistakes in processing do happen and results may be incorrect. Do consult your primary care provider or a genetic counsellor if you are troubled by a result.

Exploitation

While you submit your data to DTC DNA testing companies, you might not be the only customer they have. Some companies might use your data in their internal research, share your data with other academic institutions for non-profit research, or even sell your data to corporations and organizations for for-profit research.

It is important to keep in mind that these activities will be outlined in their privacy policy documents (if not, they should be reported to the FTC and FDA), and you may be able to opt out of these activities. However, these are not always necessarily bad: I, for example, enthusiastically allow 23andMe to use my data for scientific research.

My assessment: make sure the company you are submitting your data to is trustworthy; examine their privacy policy to ensure your data will be kept safe and private.

Law Enforcement

This one is interesting, as law enforcement has used an ancestry database in the past to apprehend a serial rapist and killer. The concern here, in addition to the violation of privacy, is potentially over-zealous policing of already over-policed communities, such as communities of color.

My assessment: make sure the company you are submitting your data to is trustworthy; examine their privacy policy to ensure your data will be kept safe and private.

Quiet Changes

Sadly, companies also don’t always behave ethically and sometimes update their privacy policies without notifying their users. Such was the case with FamilyTreeDNA, which failed to disclose subtle terms of service changes once they started working with the FBI.

My assessment: it is prudent to periodically check the privacy policy and terms of service for changes, though it would be very difficult to note any subtle changes unless you’re specifically looking for them. This one is rather tough, because there are no good mitigating or prophylactic measures one can take to prevent quiet changes in terms of service or privacy policy.

Consent

In addition to informed consent, it is important to note that your DNA is shared with your close relatives but also with seemingly unrelated people. So when submitting your DNA to a DTC DNA testing company, you are also submitting partial information about your relatives (or complete information about your monozygotic twin if you have one).
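
To get a feel for how much of your relatives' information travels with your sample, here is a small sketch of the expected fraction of DNA shared with various relatives. It uses the textbook rule that each generational step (meiosis) halves the shared fraction, summed over the independent lines of descent. The function name is hypothetical, the values are averages, and actual sharing varies from person to person.

```python
def expected_shared_fraction(meioses, paths=1):
    """Expected fraction of DNA shared through `paths` independent lines of
    descent, each `meioses` generational steps long."""
    return paths * 0.5 ** meioses

relatives = {
    "parent or child": expected_shared_fraction(meioses=1),
    "full sibling": expected_shared_fraction(meioses=2, paths=2),
    "half sibling": expected_shared_fraction(meioses=2),
    "grandparent": expected_shared_fraction(meioses=2),
    "first cousin": expected_shared_fraction(meioses=4, paths=2),
    "monozygotic twin": 1.0,  # special case: genetically identical
}

for relative, fraction in relatives.items():
    print(f"{relative}: {fraction:.1%}")
```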

My assessment: there is no way to really mitigate this risk, but it is a question you should consider before purchasing a DNA test kit.

Discrimination

Some people might have concerns about discrimination due to genetic information. It is important to note that in the United States of America, discrimination based on genetic information is illegal under the Genetic Information Nondiscrimination Act (GINA).

Specifically, the law forbids discrimination on the basis of genetic information when it comes to any aspect of employment. This includes hiring, firing, pay, job assignments, promotions, layoffs, training, fringe benefits, etc.

It also prohibits group and individual health insurers from using a person’s genetic information or requiring genetic testing to determine eligibility or premiums.

However, GINA does not protect from genetic-based discrimination in areas such as life, long-term care and disability insurance.

My assessment: when it comes to employment or health insurance, you are protected by GINA, so any genetic information they may have access to is non-actionable; when it comes to non-covered entities, you may be subject to discrimination and should weigh that possibility before purchasing a DNA test kit.

Cloning

Some individuals are concerned that a potential leak of their DNA data may result in eventual production of their clones. While it might sound like science fiction to some, it is completely understandable that some people are having this concern — you are, after all, sharing your DNA.

However, there are a couple of reasons why this should not concern you. The most obvious is that cloning of humans is illegal. The second most obvious is that cloning of humans would be extremely expensive, and if it did happen, the choice of whom to clone would be very intentional. The amount of DNA required to do this would also be more than what’s provided for a DTC DNA test.

But, most importantly, cloning a person from data that can be obtained in DTC DNA test kits is, in fact, impossible. Your DNA is composed of 6,017,847,976 to 6,109,647,513 base pairs, the lower figure representing a typical male and the higher a typical female, because the Y chromosome is smaller than the X chromosome.

DNA microarray chip tests, used by most companies, including 23andMe and Ancestry.com, usually test only about 700,000 base pairs. That’s 0.01% of your entire genome. Even in cases where whole genome sequencing is used, less than 95% of your entire genome would actually be covered. This means that cloning from genetic data, regardless of how they’re obtained, would be next to impossible.
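
The 0.01% figure follows directly from the numbers above, as this quick check shows. It uses only the base pair counts quoted in this post and the approximate 700,000 positions on a typical chip.

```python
ARRAY_POSITIONS = 700_000  # approximate number of positions tested on a chip

GENOME_SIZES = {
    "typical male": 6_017_847_976,
    "typical female": 6_109_647_513,
}

def percent_covered(positions, genome_size):
    """Percentage of the genome's base pairs that the tested positions represent."""
    return 100.0 * positions / genome_size

for label, size in GENOME_SIZES.items():
    print(f"{label}: {percent_covered(ARRAY_POSITIONS, size):.2f}% of the genome")
```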

My assessment: I would not be very concerned about this, given the legal, financial, and technical limitations.

But am I being scammed?!

Short answer: no.

When it’s all said and done, at this particular point in time and with our current understanding of human genetics, the data provided by DTC DNA testing companies are mostly accurate. I would say that ancestry reports and health reports on simple genetic traits are more accurate than health reports on complex traits.

Does this mean they’re error-proof? No. Errors can certainly happen. Even something as silly as a technician swapping samples could result in a completely wrong report. But these errors are not unique to DTC DNA testing.

It helps to point out that ancestry- and health-related traits are some of the most extensively researched traits in genetics, substantially more so than lifestyle- or even nutrition-related traits.

DTC DNA testing companies still have a long way to go as we raise ethical concerns and discover issues that need to be addressed. I think this is one of the biggest reasons why public discourse in bioethics is of such great value: diversity of experiences allows for better identification of current and potential ethical considerations in biomedicine, and dealing with such ethical issues before people suffer should be of utmost importance.

Franjo Ivankovic, PhD

When I'm not focused on studying the genetic underpinnings and phenotypic variability of psychiatric disorders, I love to read and write science fiction and fantasy, or explore one of the hundreds of state and national parks in the United States. Some of those musings in the academic, fictional, and recreational worlds make it to this blog.
