Genome Analytics: The Battle Between Science and Privacy
The application of deep-learning data analytics to genomics (the study and mapping of gene sequences) can achieve feats that were impossible just a few years ago. Genome analytics can help doctors spot life-threatening diseases long before symptoms appear, farmers engineer better crops, and nutritionists design diets tailored to a person’s specific needs. But there’s a catch: privacy rights.
Healthcare, pharmaceutical, and agricultural companies now hire data analytics specialists to study massive amounts of genome data and derive insights that could transform almost every aspect of research and development, all while avoiding privacy violations in the process.
Prospective analytics candidates, especially those with an MS in Business Data Analytics, are poised to launch an influential and potentially lucrative career in the genome analytics field.
The Myriad Benefits Of Genome Analytics
To understand why deep-learning analysis matters so much in genomics, it helps to first understand the overall process.
Genome analytics takes place in three stages, according to commercial strategists Mahni Ghorashi and Gaurav Garg in TechCrunch’s “The Genomics Intelligence Revolution”:
- Primary – Sequencers perform the chemistry and the initial conversion of a physical sample into raw sequence data.
- Secondary – The sequencer outputs base pairs as short, unordered fragments, which must be put back together in the correct order through a compute-intensive process.
- Tertiary – Meaning is extracted from the assembled genetic data. This is the analytics-driven stage capable of discoveries such as matching specific genetic mutations to known diseases.
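The secondary stage is the most algorithmic of the three. As a rough illustration only, the sketch below reassembles short, shuffled fragments ("reads") with a greedy overlap-merge heuristic, a textbook toy approach. The function names and sample reads are hypothetical, and production pipelines typically align reads to a reference genome or use graph-based assemblers instead.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b` (0 if under min_len)."""
    start = 0
    while True:
        # Find the next place where the first min_len bases of b occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept it only if the rest of a, from here on, is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the pair of reads with the largest overlap (toy heuristic)."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No overlaps remain: fall back to simple concatenation.
            return "".join(reads)
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads[0]


reads = ["ACCTGA", "ATGCGT", "CGTACC"]  # toy fragments, deliberately out of order
print(greedy_assemble(reads))  # ATGCGTACCTGA
```

Greedy merging is easy to follow but can mis-assemble repetitive sequences, which is one reason real secondary analysis is so compute-intensive.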
“We can extract meaning from an individual’s genetic data by comparing it to other reference genomes,” Ghorashi and Garg explain, “And the more reference genomes we have to work with the better our software and our processes can become. This is why plans to build giant databases of genetic data are fundamental to the future of this work.”
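Comparing a sample against a reference genome, and then checking the differences against known findings, can be sketched in a few lines. This is a minimal illustration assuming the sample has already been aligned to the reference so both strings have the same length; the `Variant` type, the function names, and the `KNOWN_ASSOCIATIONS` table are all hypothetical stand-ins for real variant-calling tools and curated databases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes Variant hashable, so it can be a dict key
class Variant:
    position: int  # 0-based offset into the aligned region
    ref: str       # base in the reference genome
    alt: str       # base observed in the individual


def call_snvs(reference: str, sample: str) -> list[Variant]:
    """Report single-base differences between an aligned sample and the reference."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        Variant(i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s and s != "N"  # 'N' marks an unreadable base; skip it
    ]


# Hypothetical annotation table standing in for a curated variant database.
KNOWN_ASSOCIATIONS = {
    Variant(4, "G", "A"): "hypothetical_condition_A",
    Variant(9, "C", "T"): "hypothetical_condition_B",
}


def annotate(variants: list[Variant]) -> dict[Variant, str]:
    """Tertiary-style step: attach any known association to each variant."""
    return {v: KNOWN_ASSOCIATIONS.get(v, "no known association") for v in variants}


for variant, label in annotate(call_snvs("ACGTGACCTCAG", "ACGTAACCTTAG")).items():
    print(variant.position, variant.ref, ">", variant.alt, label)
```

The more reference data the lookup table draws on, the more of an individual's differences can be interpreted, which mirrors the point about building giant genetic databases.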
Analysis of these giant databases of genetic data promises to make the field of nutritional genomics, or nutrigenomics, a reality in the near future. “Based on one’s ancestry, clinicians may one day tailor each person’s diet to her or his genome to improve health and prevent diseases,” writes business expert Joe Schwartz in ScienceDaily’s “Healthy Diet? That Depends On Your Genes.”
Schwartz’s article illustrates this point by detailing several experiments on the differing nutritional requirements of people from different ancestral backgrounds. For example, a version of the FADS1 gene, prevalent among people of European ancestry, became common after generations of largely plant-based farming diets and plays a role in the biosynthesis of polyunsaturated fatty acids. Analyses of ancient hunter-gatherer DNA, on the other hand, revealed the opposite version of the same gene.
As nutrigenomics evolves as a field of study, it will give nutritionists, clinicians, and even food manufacturers and nutraceutical companies valuable insight into individuals’ unique nutritional needs. These benefits may also extend to the agricultural industry.
“Genome-editing techniques will drive new improvements in agriculture through a broad range of solutions that could help farmers deliver better harvests,” writes Monsanto strategist Dr. Sherri Brown in “Plant Science And Agri-Genomics: The Importance Of Collaboration.” “In plant breeding, genome-editing technology could enable plant breeders to deliver better hybrids and varieties more efficiently, allowing them to combine specific plant characteristics in initial crosses between plants, as opposed to breeding for such combinations over multiple years.”
The Privacy Hurdle
The benefits of genome analytics are undeniable, but building the large genome datasets that deep-learning algorithms analyze and compare requires sequencing real human DNA on a massive scale.
Companies such as 23andMe and AncestryDNA offer genetic testing services starting at around $100 (down from tens of millions of dollars just a few years ago) to anyone wishing to learn more about their genetic background. Each client receives a report on their ethnic background and ancestry, and for an additional fee some services also provide health information.
Genetic testing companies, however, often retain their clients’ genetic data after providing the service and, depending on their terms and the consent clients give, may share or sell this data to third-party medical and pharmaceutical companies. This is where privacy rules become an issue.
“Another enormous challenge in genome research is generating and sharing data that result in impactful discoveries without compromising the confidentiality and rights of patients,” say Shannon Behrman, PhD, and Jessica Mazerik, PhD, in their “Exploring The Cancer Genome” paper on the National Cancer Institute’s website. “This issue is further complicated because each country has its own laws for patient protection, informed consent, and institutional review board (IRB) approval processes.
“[The International Cancer Genome Consortium] has taken into account these legal and regulatory differences and developed suggested guidelines for informed consent, data access, and ethical oversight that minimize the risk of individual patient identification without impeding important research opportunities.”
The privacy rule established by the Health Insurance Portability and Accountability Act (HIPAA) restricts how health data that can identify a patient or client may be used and shared. That data can be used more freely once it has gone through a process called de-identification (or anonymization), in which identifying information is removed. Typically, this information consists of names, addresses, Social Security numbers, phone numbers, account numbers, and anything else that could be used to discover a person’s identity.
The problem with de-identifying genetic data is that the genetic data itself is the very essence of a person’s identity. Researchers have developed programs that use genetic information to predict what a person’s face looks like, and the predictions can closely resemble that person’s actual face.
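To make the idea concrete, here is a minimal sketch of field-level de-identification, assuming a patient record stored as a Python dictionary. The field names and the identifier set are hypothetical and cover only the examples named above; HIPAA's Safe Harbor method enumerates more identifier categories, and this is not a compliance tool.

```python
# Hypothetical field names for illustration only; a real identifier list is longer.
DIRECT_IDENTIFIERS = frozenset({"name", "address", "ssn", "phone", "account_number"})


def deidentify(record: dict) -> dict:
    """Return a copy of `record` without the direct-identifier fields."""
    return {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}


# Toy record with made-up values.
patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "phone": "555-0100",
    "diagnosis": "type 2 diabetes",
    "genotype": {"marker_1": "AG", "marker_2": "CC"},
}

print(deidentify(patient))
# {'diagnosis': 'type 2 diabetes', 'genotype': {'marker_1': 'AG', 'marker_2': 'CC'}}
```

Note that the genotype passes through untouched: removing names and numbers does nothing about the identifying power of the genetic data itself.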
“Regardless of the methods, there is always a possibility of re-identification,” write consultant Barbara L. Filkins et al. in “Privacy And Security In The Era Of Digital Health: What Should Translational Researchers Know And Do About It?” on the National Institutes of Health website. “Identifiable markers can be used to determine the presence of an individual in a dataset, even without explicit personal information or when the genomic data has been aggregated.”
In other words, anonymized genetic data can be re-identified by comparing it with public datasets and other accessible sources of information. Genome analytics organizations work continually to anticipate and prevent these threats so that genomics can keep delivering productive insight to researchers and product developers.
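The comparison described above is often called a linkage attack. The sketch below shows the basic idea under simplifying assumptions: genotypes are stored as dictionaries keyed by hypothetical marker names, genotype strings are already normalized (for example, alleles sorted alphabetically), and all names and values are made-up toy data. Real re-identification techniques rely on far more markers and statistical inference.

```python
def match_score(a: dict[str, str], b: dict[str, str], min_shared: int = 3) -> float:
    """Fraction of shared markers on which two genotype profiles agree."""
    shared = a.keys() & b.keys()
    if len(shared) < min_shared:
        return 0.0  # too few markers in common to support a match
    return sum(a[m] == b[m] for m in shared) / len(shared)


def link_records(anonymized: dict, public: dict, threshold: float = 1.0) -> dict:
    """For each anonymized sample, list public profiles whose markers all agree."""
    return {
        anon_id: [name for name, profile in public.items()
                  if match_score(genotype, profile) >= threshold]
        for anon_id, genotype in anonymized.items()
    }


# Toy "de-identified" research data: no names, only sample IDs and genotypes.
anonymized = {
    "sample_001": {"m1": "AG", "m2": "CC", "m3": "TT", "m4": "AA"},
    "sample_002": {"m1": "GG", "m2": "CT", "m3": "TT", "m4": "AG"},
}
# Toy public data, such as a genealogy profile posted under a real name.
public = {
    "Alice Example": {"m1": "AG", "m2": "CC", "m3": "TT"},
    "Bob Example": {"m1": "GG", "m2": "CT", "m3": "CT"},
}

print(link_records(anonymized, public))
# {'sample_001': ['Alice Example'], 'sample_002': []}
```

Even with every direct identifier stripped, sample_001 is tied back to a name because its genotype is itself the identifier.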
Maryville University’s Master’s Degree In Business Data Analytics
The demand for business analytics experts lies at the heart of Maryville University’s online Master of Science in Business Data Analytics degree. Graduates of this online program are prepared to enter the workforce as statisticians, data scientists, data analysts, or genome analytics specialists.
At Maryville University, students learn how to handle datasets, orchestrate multiple infrastructures, monetize data, and make decisions based on analytics insights. They receive the training they need to combine business operational data with the latest analytical tools, a combination of skills that employers value highly.
Sources
Mahni Ghorashi and Gaurav Garg, “The Genomics Intelligence Revolution,” TechCrunch
Joe Schwartz, “Healthy Diet? That Depends On Your Genes,” ScienceDaily
Sherri Brown, “Plant Science And Agri-Genomics: The Importance Of Collaboration”