HIPAA, Big Data, And De-Identification

The healthcare industry has been slow to adopt business analytics as a research tool. A major reason for this caution is the Health Insurance Portability and Accountability Act (HIPAA), which strictly protects any identifying details in patient-related data from being made available to the public.

To overcome this hurdle and benefit from the valuable insights made possible through data and business analytics, industry leaders are now leaning toward a process called de-identification. If successfully implemented, de-identification would make healthcare data usable for analytics purposes while stripping it of identifying characteristics.

If the healthcare industry is going to succeed at securing its data, IT personnel with a master’s in business analytics or a master’s in data analytics will be needed. The privacy of healthcare data is complicated and will continually evolve as new vulnerabilities are discovered and repaired.

Healthcare And The Cloud

Organizations that handle large amounts of data are familiar with the availability of “as-a-service” cloud technology vendors. These cloud-based services include everything from simple web-based email clients to entire infrastructures for large organizations.

Software-as-a-service (SaaS) is common among healthcare businesses and is typically used for email communication and Electronic Health Records (EHRs). Platform-as-a-service (PaaS) gives the healthcare institution a platform on which to build and run a custom data management application, rather than relying solely on a vendor’s browser-based software to access cloud features.

Finally, infrastructure-as-a-service (IaaS) provides the underlying business infrastructure, including data storage and networking, while leaving control of the operating system and other custom details to the client’s IT team. IaaS is mostly found in hospitals and large medical groups.

Problems arise where cloud services coordinate with outside third-party vendors. “It is not uncommon for cloud providers to offer tools in collaboration with other vendors,” notes HIT Infrastructure in its blog post “Understanding HIPAA-Compliant Cloud Options For Health IT.” “But the primary vendor’s HIPAA-compliance does not necessarily extend to the other vendor.”

Covering Every Legal Angle

Some organizations that are required to be HIPAA-compliant stand to benefit greatly from Big Data, which could ultimately result in safer, more effective medical treatments, faster response times in epidemic scenarios, and better Medicare/Medicaid coverage. For many of them, however, the risk of data breaches and HIPAA violations still outweighs the appeal of moving forward with data analytics.

“The many potential benefits from data analytics for the health care system and to the health of individuals must be balanced with protecting the privacy of individuals whose health information is used in those analytics,” states law firm Arnall Golden Gregory LLP in “Big Data Analytics Under HIPAA,” a 2016 legal alert published on the firm’s blog. “Data anonymization tools such as de-identification are useful, but cannot eliminate risks to re-identification.”

All the same, the healthcare and Big Data industries are moving forward with efforts to develop de-identification into a legitimate privacy protection.

The Need For De-Identification

One of the biggest problems facing healthcare data analytics is that data must be shared between multiple sources, compared and contrasted, and analyzed for trends, data clusters, and other useful insights. Privacy vulnerabilities are particularly pronounced in data sharing between healthcare and non-healthcare sources.

To combat these vulnerabilities, data scientists must de-identify information in such a way that it remains valuable for data analytics without revealing patient identity. For most medical data, de-identification processes are reasonably secure, but the rise of genomics has added a new level of complexity to the issue.
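For conventional tabular records, the de-identification step described above can be sketched in a few lines of Python. This is only an illustration, not a compliance-ready implementation: the field names (name, ssn, zip, birth_date, admitted) and the deidentify function are hypothetical, and the generalization rules are loosely modeled on the kind of coarsening used in HIPAA's Safe Harbor approach, which covers more identifier types than are shown here.

```python
from datetime import date

# Hypothetical field names; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "street_address"}


def deidentify(record: dict, as_of: date) -> dict:
    """Return a copy of `record` with direct identifiers dropped and a few
    quasi-identifiers coarsened. Illustrative sketch only."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Keep only the first three ZIP digits. (Real rules also depend on
    # how many people live in that three-digit area.)
    if "zip" in out:
        out["zip"] = str(out["zip"])[:3] + "**"

    # Replace the exact birth date with a ten-year age band,
    # grouping everyone aged 90 or older together.
    if "birth_date" in out:
        born = out.pop("birth_date")
        had_birthday = (as_of.month, as_of.day) >= (born.month, born.day)
        age = as_of.year - born.year - (0 if had_birthday else 1)
        decade = age // 10 * 10
        out["age_band"] = "90+" if age >= 90 else f"{decade}-{decade + 9}"

    # Reduce an exact admission date to the year only.
    if "admitted" in out:
        out["admitted_year"] = out.pop("admitted").year

    return out


if __name__ == "__main__":
    patient = {
        "name": "Jane Doe",          # made-up example values
        "ssn": "000-00-0000",
        "zip": "63141",
        "birth_date": date(1980, 6, 15),
        "admitted": date(2023, 3, 9),
        "diagnosis": "J45",
    }
    print(deidentify(patient, as_of=date(2024, 1, 1)))
```

The clinical field (diagnosis) survives untouched, which is what keeps the record useful for analytics, while the name and Social Security number are dropped and the ZIP code, birth date, and admission date are coarsened. Field-by-field rules like these do not address genomic data, which raises the complication discussed next.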

“Unlike a blood type or a cholesterol test result, an individual’s DNA sequence codes for unique combinations of physical traits may, collectively, create a fully or partially identifying profile,” write legal researchers Jennifer Kulynych and Henry T. Greely in their 2016 research paper, “Clinical Genomics, Big Data, And Electronic Medical Records: Reconciling Patient Rights With Research When Privacy And Science Collide” in the Journal of Law and the Biosciences. “The more scientists learn about genetic profiling, the more this profiling re-identification risk will escalate.”

Kulynych and Greely refer to de-identification as a moving target. De-identification methods would therefore have to be re-evaluated and adjusted regularly to meet the newest threats. On top of this already multifaceted problem, quasi-identifiers must also be considered.

“While individual fields [in a data set] may not be identifying by themselves, the contents of several fields in combination may be sufficient to result in identification,” says biotech expert Sujay Jadhav in his article “Is HIPAA A Barrier To Big Data In Biomedical Research?” “An example of quasi-identifier could be a collection of fields taken together, such as gender, age, ethnic group, marital status, geography.” Each additional field narrows the list of possible individuals, up to the point where identification might become possible.
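The narrowing effect of quasi-identifiers can be shown with a toy example. The sketch below uses a handful of made-up records (no real patient data) and two hypothetical helper functions to count how many records share each combination of field values. The smallest such group is the "k" in the standard privacy notion of k-anonymity, a term the quoted article does not use but which describes the same idea.

```python
from collections import Counter

# Made-up records for illustration only.
RECORDS = [
    {"gender": "F", "age": 34, "marital": "married", "region": "North"},
    {"gender": "F", "age": 34, "marital": "single", "region": "North"},
    {"gender": "F", "age": 51, "marital": "married", "region": "South"},
    {"gender": "F", "age": 51, "marital": "married", "region": "South"},
    {"gender": "M", "age": 34, "marital": "married", "region": "North"},
    {"gender": "M", "age": 34, "marital": "married", "region": "South"},
    {"gender": "M", "age": 51, "marital": "single", "region": "South"},
    {"gender": "M", "age": 51, "marital": "single", "region": "North"},
]


def group_sizes(records, fields):
    """Count how many records share each combination of values in `fields`."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def smallest_group(records, fields):
    """Size of the smallest group of records that look identical on `fields`."""
    return min(group_sizes(records, fields).values())


def unique_records(records, fields):
    """Number of records that are the only one with their combination."""
    return sum(1 for n in group_sizes(records, fields).values() if n == 1)


if __name__ == "__main__":
    fields = []
    for field in ["gender", "age", "marital", "region"]:
        fields.append(field)
        print(fields, smallest_group(RECORDS, fields), unique_records(RECORDS, fields))
```

With this toy data, gender alone leaves every record hidden in a group of four, and gender plus age leaves groups of two. Adding marital status makes two of the eight records unique, and adding region makes six of them unique, even though no single field identifies anyone.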

More Obstacles On The Horizon

The increasing popularity of Internet of Things (IoT) devices, especially those that track vital signs and real-time medical data, is going to increase the difficulty of maintaining HIPAA compliance in healthcare data analytics.

“The data collected from [IoT] devices now includes information classed as protected health information (PHI). While the data collected by HIPAA-covered entities must be protected from unauthorized access under the HIPAA Privacy and Security Rules, those rules only apply to healthcare providers, health plans, healthcare clearinghouses and business associates of covered entities,” reports HIPAA Journal in its “New Report Published On Privacy Risks Of Personal Health Wearable Devices” blog post. “Non-covered entities are not required to implement the safeguards demanded by HIPAA rules to keep ‘PHI’ secure.”

Business data analytics professionals will have their hands full addressing every potential HIPAA-related vulnerability in Big Data in the years to come. As new technologies emerge and existing ones improve, adjustments and security patches will have to be applied promptly to avoid security breaches.

Maryville University’s Master’s Degree In Business Data Analytics

The demand for business analytics experts lies at the heart of Maryville University’s online Master of Science in Business Data Analytics degree. Graduates of this online degree program can gain the skills to enter the workforce as statisticians, data scientists, data analysts, or actuaries.

At Maryville University, students can learn how to handle data sets, orchestrate multiple infrastructures, monetize data, and make decisions based on valuable analytics insights. Students receive the training and knowledge they will need to combine business operational data with the latest analytical tools, making them invaluable to employers.


Understanding HIPAA-Compliant Cloud Options For Health IT
Big Data Analytics Under HIPAA
Clinical Genomics, Big Data, And Electronic Medical Records: Reconciling Patient Rights With Research When Privacy And Science Collide
Is HIPAA A Barrier To Big Data In Biomedical Research?
New Report Published On Privacy Risks Of Personal Health Wearable Devices