Data Science Ethics: Issues and Strategies

Organizations worldwide are recognizing the tremendous potential of data science. When professional services firm Deloitte surveyed more than 120 CEOs in 2022, it found that 91% planned to invest in artificial intelligence (AI) during the next year. Among those surveyed, 63% viewed AI as a tool to speed intelligent insight, and 53% believed AI could offer clarity to strengthen decision-making.

As the field of data science continues to expand its reach, ethical considerations are becoming more critical. Issues arising in areas such as data privacy and bias in data analysis have demonstrated a need for data science ethics. In approaching their work, data scientists and the organizations that employ them need to adhere to ethical principles.

Aspiring data scientists who may be considering enrolling in an online Master of Science in Data Science program can benefit from learning about critical ethical considerations in the field, as well as how to promote ethics in the work they conduct.

Defining Data Science Ethics

Data scientists need to have a firm understanding of the concept of data science ethics and the importance of upholding ethical principles.

What Is Data Science Ethics?

Definitions of data science ethics can vary, but they tend to focus on avoiding harm and building trust.

Data ethics involves evaluating the practices used to generate, collect, analyze, and disseminate data when those practices could adversely affect people and society. It asks whether conduct related to data is right or wrong, transparent, and defensible. Data ethics also seeks to maintain the trust of parties such as consumers, users, clients, patients, employees, and partners.

Principles of Data Ethics

Working with data ethically involves adhering to certain principles. For example, the Federal Data Strategy has developed the Data Ethics Framework, which requires individuals in federal government agencies who work with data to:

Act with integrity, humility, and honesty
Hold themselves and others accountable
Promote transparency
Remain informed about developments in data science and data management
Respect confidentiality and privacy
Respect individuals, communities, and the public
Uphold applicable ethical standards, professional practices, statutes, and regulations

In another example, the World Economic Forum has outlined ethical principles for AI systems in areas such as:

Accountability. All stakeholders of AI systems are responsible for their use.
Data privacy. Individuals have the right to manage their data if an AI system uses it.
Compliance and lawfulness. All stakeholders of AI systems need to comply with laws and regulations.
Safety. AI systems shouldn’t compromise humans’ mental integrity or physical safety.
Fairness. AI systems should respect the individuals behind the data the systems use, and they shouldn’t discriminate or show favoritism.
Explainability. The decision-making and predictive capabilities of AI systems should be explainable.

Benefits of Data Ethics

While the benefits of ethics in data science might seem obvious, certain business advantages of adhering to ethical principles also exist. For example, businesses that follow ethical principles can:

Build trust and goodwill with their customers
Reduce the risk of unintended bias and show that their decision-making is fair
Ensure that they comply with legal requirements

Businesses that follow ethical principles can also:

Attract high-quality employees
Boost customer loyalty and profitability

Laws Related to the Protection of Data, Algorithms, and AI

Data scientists should be aware of legislation that makes adherence to ethical principles of data science even more important.

A 2023 Reuters report deemed 2023 a pivotal year in data privacy law. While data privacy laws before 2023 focused on preventing harm, laws taking effect in 2023 focus on the rights of individuals whose data an organization uses.
A 2022 report by Bloomberg Law noted a significant increase in proposed laws and regulations to prevent discrimination in algorithms.
Between 2017 and 2022, 60 countries enacted laws and regulations related to AI. In addition, the European Union’s General Data Protection Regulation and ePrivacy Directive are influencing ethical data science in many countries.

Efforts by Data Scientists to Refine Ethics

Data science professionals and their professional organizations have put forth their own approaches to infusing ethics into data science.

The United States Data Science Institute (USDSI), which offers certifications in data science, has established a code of ethics and standards.
The Digital Analytics Association (DAA), an organization for analytics professionals, has proposed a web analyst code of ethics.
The Analytics Certification Board (ACB) has published a code of ethics for individuals who earn certification as a Certified Analytics Professional or as an Associate Certified Analytics Professional.

Specific Considerations for Ethics and Data Science

In practice, integrating ethics and data science involves several important considerations. Examples of those considerations are highlighted below.

Protecting Data Privacy

Protecting data privacy is critical because it helps to ensure the dignity and safety of the people who provided that data. The National Institute of Standards and Technology (NIST) has developed a framework for ensuring data privacy that incorporates actions such as:

Identifying all data in an organization’s possession and assessing risks to data privacy
Becoming aware of legal obligations regarding data privacy
Establishing governance over data privacy
Implementing controls to preserve data privacy
Developing approaches for communicating with internal and external stakeholders regarding data privacy
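As one illustration of the "implementing controls" step, the sketch below pseudonymizes a direct identifier with a keyed hash (HMAC-SHA256 from Python's standard library), so records can still be linked without exposing the raw value. It is a minimal sketch under stated assumptions: the key, the `email` field, and the record layout are hypothetical, and the NIST framework does not prescribe this particular technique.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would live in a secrets manager,
# stored separately from the pseudonymized data set.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of a normalized identifier."""
    normalized = identifier.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records, id_field="email"):
    """Replace the identifier field in each record with its pseudonym."""
    out = []
    for record in records:
        cleaned = dict(record)  # copy so the caller's data is not mutated
        cleaned[id_field] = pseudonymize(cleaned[id_field])
        out.append(cleaned)
    return out


# Hypothetical records: the same person entered with different formatting.
records = [
    {"email": "Ana@example.com", "purchases": 3},
    {"email": "ana@example.com ", "purchases": 1},
]
for row in pseudonymize_records(records):
    print(row)
```

Both sample records map to the same pseudonym because identifiers are normalized before hashing. Because the key is needed to recompute a pseudonym, it should be stored apart from the data. Pseudonymized data can still count as personal data under laws such as the GDPR, so this control reduces risk rather than removing privacy obligations.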

Ensuring Data Justice and Fairness

The Global Partnership on Artificial Intelligence (GPAI) explains that the concept of data justice moves beyond individual privacy rights into the realm of social justice. Specifically, data justice encompasses ideas of:

Fairness, equity, diversity, and parity
Adequate representation
Sharing in the benefits of data
Nondiscrimination and the ability to challenge bias

Data justice need not be applied narrowly. In a broad sense, it can apply to how data affects and interacts with the ways in which humans flourish.

Promoting Transparency

Transparency is a cornerstone of building customer trust. To promote transparency in using data, organizations need to be clear about the data they collect, how they secure the data, and how they’ll use the data. Transparency also extends to disclosing information about data-sharing activities, the sale of data, and the algorithms on which organizations rely.

Controlling Confidential Data

While it may be natural to confuse privacy with confidentiality, data scientists need to know the distinction. Privacy can be viewed as encompassing individuals’ rights to control how their data is used. In contrast, confidentiality spans both the safeguarding of data and the proper classification of data within an organization.

Confidentiality applies to both consumer data and an organization’s own data. For example, in addition to collecting data from customers, an organization may have data on internal processes or strategic planning that it considers confidential. Properly classifying all data is the first step in controlling confidentiality.
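A classification scheme can be expressed as ordered sensitivity levels plus an inventory that assigns each data asset a level. The sketch below assumes a hypothetical four-level scheme (public, internal, confidential, restricted) and hypothetical asset names; it denies access by default to anything that has not been classified.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical inventory covering both customer data and internal data.
DATA_INVENTORY = {
    "press_releases": Classification.PUBLIC,
    "process_metrics": Classification.INTERNAL,
    "strategic_plan": Classification.CONFIDENTIAL,
    "customer_payment_data": Classification.RESTRICTED,
}


def can_access(clearance: Classification, asset: str) -> bool:
    """Allow access only if the user's clearance meets the asset's level.

    Assets missing from the inventory are treated as RESTRICTED, so
    unclassified data is denied by default.
    """
    level = DATA_INVENTORY.get(asset, Classification.RESTRICTED)
    return clearance >= level


print(can_access(Classification.INTERNAL, "process_metrics"))     # True
print(can_access(Classification.INTERNAL, "strategic_plan"))      # False
print(can_access(Classification.CONFIDENTIAL, "unknown_table"))   # False
```

Treating unclassified assets as the most sensitive level is one design choice that reflects classification as the first step: data cannot be used until someone has decided how confidential it is.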

Ensuring the Accuracy of Data

The importance of data accuracy and quality in data science is clear: Reductions in accuracy and quality impair an organization’s ability to make good decisions based on data science. Therefore, ethical data science practices encompass assessing data accuracy and quality.

Data scientists can monitor the quality of data by examining data attributes such as:

Completeness
Timeliness
Validity
Integrity
Uniqueness
Consistency
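The attributes above can be turned into simple, repeatable metrics. The following is a minimal sketch in plain Python that assumes hypothetical customer records, a fixed reference date, and one illustrative rule per attribute (for example, a 365-day freshness window for timeliness and a lookup table for referential integrity). Real thresholds would depend on the data and its intended use.

```python
from datetime import date

# Hypothetical reference table and records, used only for illustration.
VALID_REGIONS = {"R1", "R2"}
AS_OF = date(2024, 6, 30)  # fixed reference date so results are reproducible

records = [
    {"id": 1, "email": "a@example.com", "age": 34, "region": "R1",
     "signup": date(2020, 3, 1), "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "age": 29, "region": "R2",
     "signup": date(2019, 7, 9), "updated": date(2021, 1, 15)},
    {"id": 2, "email": "b@example.com", "age": -4, "region": "R9",
     "signup": date(2024, 6, 10), "updated": date(2024, 6, 3)},
    {"id": 3, "email": "c@example.com", "age": 51, "region": "R1",
     "signup": date(2022, 2, 2), "updated": date(2023, 1, 2)},
]


def share(flags):
    """Fraction of True values (0.0 for an empty input)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def quality_report(rows):
    n = len(rows)
    return {
        # Completeness: a required field is present.
        "completeness": share(r["email"] is not None for r in rows),
        # Timeliness: updated within 365 days of the reference date.
        "timeliness": share((AS_OF - r["updated"]).days <= 365 for r in rows),
        # Validity: value falls in a plausible range.
        "validity": share(0 <= r["age"] <= 120 for r in rows),
        # Integrity: the region code points at a known reference value.
        "integrity": share(r["region"] in VALID_REGIONS for r in rows),
        # Uniqueness: distinct IDs relative to the number of rows.
        "uniqueness": len({r["id"] for r in rows}) / n if n else 0.0,
        # Consistency: related fields agree (signup cannot follow last update).
        "consistency": share(r["signup"] <= r["updated"] for r in rows),
    }


for metric, value in quality_report(records).items():
    print(f"{metric:12s} {value:.2f}")
```

With these sample records, timeliness scores 0.50 and every other metric scores 0.75. Tracking such scores over time can show when a data source is degrading before it affects decisions.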

Remaining Accountable

Data scientists are responsible for their work and how it affects the world. Ethical data scientists demonstrate their accountability in several ways. For example, they:

Maintain records of their design processes
Maintain records of their decision-making
Learn about and comply with organizational requirements, regulations, and laws related to their work
Understand the limits of their responsibility
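Keeping records of design decisions can be as simple as an append-only log. The sketch below assumes a hypothetical JSON Lines file and a hypothetical set of fields (decision, rationale, alternatives considered, author, UTC timestamp); teams might instead use a wiki, a ticketing system, or version-controlled documents.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class DecisionRecord:
    """One entry in a hypothetical project decision log."""
    project: str
    decision: str
    rationale: str
    alternatives: List[str]
    author: str
    # Filled in automatically (UTC) when the record is created.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_decision(path: str, record: DecisionRecord) -> None:
    """Append one record as a JSON line; existing entries are left untouched."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def read_decisions(path: str) -> List[dict]:
    """Load every logged decision, oldest first."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Example: log one hypothetical design decision to a temporary file.
log_path = os.path.join(tempfile.mkdtemp(), "decisions.jsonl")
append_decision(log_path, DecisionRecord(
    project="churn-model",
    decision="Exclude ZIP code as a model feature",
    rationale="ZIP code may act as a proxy for protected attributes",
    alternatives=["Keep ZIP code", "Aggregate to state level"],
    author="analyst-01",
))
print(read_decisions(log_path)[0]["decision"])
```

Recording the alternatives that were rejected, not just the final choice, makes it easier to explain later why a model behaves the way it does.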

Examples of Data Ethics Issues

Examining examples of data ethics issues helps to reinforce the importance of ethics in the field. Knowing what can go wrong when ethical lapses occur can be informative, as the examples below illustrate.

Bias in Data Science

Bias can emerge when errors in data are complex or simply overlooked. Types of bias include the following:

Confirmation bias, in which data scientists erroneously favor data that aligns with their personal opinions or views
Availability bias, in which data scientists draw conclusions based only on the most readily available or recent data
Survivorship bias, in which data scientists rely on distorted data that includes only success stories and no examples of failures
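Survivorship bias is easy to demonstrate with a small calculation. The sketch below uses hypothetical annual returns for ten investment funds, three of which closed and would be missing from a data set of currently active funds; averaging only the survivors paints a much rosier picture than averaging every fund.

```python
from statistics import mean

# Hypothetical annual returns (%) for ten funds over the same period.
funds = {
    "A": 12.0, "B": 8.5, "C": -15.0, "D": 6.0, "E": -22.0,
    "F": 9.5, "G": -30.0, "H": 4.0, "I": 11.0, "J": -8.0,
}
closed = {"C", "E", "G"}  # funds that failed and dropped out of current listings

survivors = [r for name, r in funds.items() if name not in closed]
everyone = list(funds.values())

print(f"Survivors only: {mean(survivors):.2f}%")  # Survivors only: 6.14%
print(f"All funds:      {mean(everyone):.2f}%")   # All funds:      -2.40%
```

The same pattern applies whenever failures are filtered out before analysis, such as studying only companies still in business or only customers who did not cancel.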

Misuse of Data

Misuse of data occurs when individuals or organizations use data in ways that differ from its intended use. The theft of data and its subsequent use could be viewed as an extreme form of misuse, but misuse also encompasses other situations, such as the following:

An organization collects data from users for a specific purpose, then later uses that data for a different purpose. An example of this could be collecting data for research, then selling it to another company for marketing purposes.
An individual has authorized access to data but misuses the data for personal gain.
An individual who has authorized access to data copies that data to an unsecured laptop for ease of use, then cyber criminals access the data on the laptop.

The Effect of Predictive Analytics on Privacy

Ethical data science is also critical in predictive analytics, which has particular potential to impair privacy. For example, predictive analytics has been used to:

Identify pregnant customers based on their purchasing behavior, then use that information for targeted advertising
Predict illnesses such as depression or diabetes based on individuals’ social media posts

A person’s privacy can be violated if sensitive information about that individual is predicted without their knowledge or consent.

Overrepresentation or Underrepresentation of Populations in Data Sets

Data scientists must also be attentive to the risks of overrepresentation or underrepresentation in the data they use. The potential for underrepresentation of ethnic groups in health data leads to concerns that:

Analysis conducted on that data might not equitably benefit individuals who were underrepresented in the data
Algorithms could incorrectly interpret racial inequality in data as biological fact rather than as the result of racism
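One way to check for over- or underrepresentation is to compare each group's share of a data set with its share of the population the analysis is meant to serve. The sketch below uses hypothetical group counts and population shares, and the 0.8 to 1.25 tolerance band is an illustrative assumption rather than a standard.

```python
# Hypothetical group counts in a health data set and hypothetical shares of
# the same groups in the reference population the analysis should serve.
sample_counts = {"Group A": 720, "Group B": 180, "Group C": 60, "Group D": 40}
population_share = {"Group A": 0.60, "Group B": 0.18, "Group C": 0.13, "Group D": 0.09}


def representation_ratios(counts, reference):
    """Ratio of each group's share in the sample to its population share.

    1.0 means proportional; below 1.0 means underrepresented.
    """
    total = sum(counts.values())
    return {g: (counts.get(g, 0) / total) / share for g, share in reference.items()}


def flag_groups(ratios, low=0.8, high=1.25):
    """Flag groups outside a tolerance band (the band is an assumption)."""
    return {
        g: ("under" if r < low else "over")
        for g, r in ratios.items()
        if r < low or r > high
    }


ratios = representation_ratios(sample_counts, population_share)
for group, r in ratios.items():
    print(f"{group}: {r:.2f}")
print(flag_groups(ratios))
```

Here Groups C and D make up well under their population shares, which signals that results derived from this data may not serve those groups equitably.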

Strategies for Ensuring the Application of Ethics in Data Science

As outlined below, organizations and data scientists can pursue numerous strategies to practice ethics in data science.

Communicate Organizational Values

Organizations should communicate their values regarding data both internally and externally. This can include actions such as:

Posting information regarding company data science ethics throughout the workplace
Holding discussions regarding data ethics with various departments and tailoring those discussions to the ways in which departments use data
Posting information regarding company data science ethics publicly on its website

Include Data Champions in the C-Suite

Having an executive in the C-suite who champions ethical data science is beneficial in reinforcing the importance of ethics. In addition, the organization’s top executives and board members must be kept in the decision-making loop on matters of ethics. Educating board members about the significance of ethics in data science allows them to monitor whether an organization adheres to its values.

Establish Data Use Rules

Forming rules for data use is fundamental to achieving ethical data science and enforcing accountability. Data use rules specify the parties in an organization who are authorized to access various types of data, the purposes for which they can use that data, and where the data will be stored. Organizations need to place limits on the length of time data can be stored and develop specific processes for deleting data.
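Data use rules can be written down in a machine-checkable form. The sketch below assumes a hypothetical rule structure (allowed roles, allowed purposes, storage location, retention period) and hypothetical names; it refuses any use that no rule explicitly permits and lists records that have outlived their retention period.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataUseRule:
    """Hypothetical rule: who may use a data type, for what, where, how long."""
    data_type: str
    allowed_roles: frozenset
    allowed_purposes: frozenset
    storage_location: str
    retention_days: int


RULES = {
    "survey_responses": DataUseRule(
        data_type="survey_responses",
        allowed_roles=frozenset({"researcher"}),
        allowed_purposes=frozenset({"research"}),
        storage_location="encrypted-research-store",
        retention_days=730,
    ),
}


def is_use_permitted(data_type, role, purpose):
    """Deny by default: unknown data types, roles, or purposes are refused."""
    rule = RULES.get(data_type)
    return (
        rule is not None
        and role in rule.allowed_roles
        and purpose in rule.allowed_purposes
    )


def records_due_for_deletion(data_type, collected_dates, today):
    """Return collection dates older than the rule's retention period."""
    limit = timedelta(days=RULES[data_type].retention_days)
    return [d for d in collected_dates if today - d > limit]


print(is_use_permitted("survey_responses", "researcher", "research"))  # True
print(is_use_permitted("survey_responses", "marketer", "marketing"))   # False
print(records_due_for_deletion(
    "survey_responses",
    [date(2021, 1, 10), date(2023, 9, 1)],
    today=date(2024, 6, 30),
))
```

A purpose check like this one targets the kind of misuse described earlier, in which data collected for research is later repurposed for marketing.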

Ensure That Teams Include Members with Knowledge of Data Ethics

Embedding individuals with data ethics expertise into operational teams can help enforce an organization’s data science ethics. These individuals could be employees of the organization or external experts in the field. McKinsey & Co. cites an example in which a company asked an expert from academia to participate on an operational team to provide insight on the environmental effects of certain types of data use.

Examine Data and Algorithms for Bias and Employ Bias Safeguards

Organizations that actively review data and algorithms for bias improve their ability to conduct ethical data science. Implementing a safeguard to regularly review the data used in AI models can help ensure that the data remains timely and relevant.

The Data & Trust Alliance, an organization whose members include a diverse range of large corporations, has detailed safeguards to defend against algorithmic bias in vendors’ workforce decision-making. These safeguards include questions to ask when examining a vendor’s practices and assessing the risk of bias.
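One common way to review an algorithm’s outputs for bias is to compare selection rates across groups. The sketch below computes each group’s selection rate and its ratio to the highest-rate group, flagging ratios below 0.8. That cutoff follows the "four-fifths" rule of thumb from U.S. employee selection guidelines and is used here only as a screening heuristic, not a definitive fairness test. The decisions and group labels are hypothetical, and this is not the Data & Trust Alliance’s own method.

```python
# Hypothetical model decisions: (group, selected) pairs from a screening model.
decisions = [
    ("A", True), ("A", True), ("A", False), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False), ("B", True),
]


def selection_rates(pairs):
    """Share of positive outcomes per group."""
    totals, positives = {}, {}
    for group, selected in pairs:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(selected)
    return {g: positives[g] / totals[g] for g in totals}


def impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    best = max(rates.values())
    return {g: (r / best if best else 0.0) for g, r in rates.items()}


rates = selection_rates(decisions)
ratios = impact_ratios(rates)
flagged = [g for g, r in ratios.items() if r < 0.8]  # four-fifths heuristic
print(rates)    # {'A': 0.6, 'B': 0.4}
print(flagged)  # ['B']
```

A flagged group is a prompt for closer investigation of the data and model, not proof of discrimination; likewise, passing this check does not establish that a model is fair.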

Use Ethics Checklists in Data Science Projects

Data scientists can use ethics checklists when conducting data science projects. For example, DrivenData, which hosts data science challenges on behalf of other organizations, has devised an ethics checklist that data scientists can use to help ensure that they’ve followed ethical best practices. The checklist is a list of questions that helps determine whether a data scientist has considered various aspects of ethics.
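A checklist becomes easier to enforce when open items are tracked explicitly. The sketch below assumes a hypothetical checklist structure; the questions are illustrative paraphrases of common themes and are not DrivenData’s actual wording or tooling.

```python
# Illustrative checklist items (paraphrased themes, not DrivenData's wording).
CHECKLIST = {
    "data_collection": [
        "Did the people whose data is used give informed consent?",
        "Have we minimized exposure of personally identifiable information?",
    ],
    "modeling": [
        "Have we tested model results for fairness across relevant groups?",
        "Can we explain the model's decisions in understandable terms?",
    ],
    "deployment": [
        "Is there a plan to monitor the model and roll it back if needed?",
    ],
}


def open_items(checklist, answers):
    """List (section, question) pairs not yet marked as addressed."""
    return [
        (section, q)
        for section, questions in checklist.items()
        for q in questions
        if not answers.get(q, False)
    ]


# Example: only the data collection questions have been addressed so far.
answers = {q: True for q in CHECKLIST["data_collection"]}
for section, question in open_items(CHECKLIST, answers):
    print(f"[ ] {section}: {question}")
```

Running a check like this before a project milestone makes unanswered questions visible instead of leaving them to memory.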

Make a Commitment to Follow Ethical Practices and Protocols in Machine Learning

The Institute for Ethical AI & Machine Learning has developed a set of responsible machine learning principles that professionals in the field can elect to follow. These principles call on professionals to:

Assess the effect of incorrect predictions from machine learning
Monitor machine learning for bias
Improve transparency in machine learning
Create machine learning that’s reproducible
Mitigate the displacement of workers due to machine learning
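Reproducibility starts with recording what a run depended on. The sketch below, under hypothetical names and toy data, fixes the random seed for a train/test split and stores a SHA-256 fingerprint of the input data so a later run can confirm it used identical inputs. Python’s `random` module does not guarantee identical `shuffle` results across all Python versions, so recording the interpreter and library versions alongside the seed is prudent.

```python
import hashlib
import json
import random


def fingerprint(rows):
    """Hash the data so a later run can confirm it used identical inputs."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def split(rows, test_fraction, seed):
    """Deterministic shuffle-and-split driven by an explicit seed."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data and run configuration.
rows = [{"x": i, "y": i % 2} for i in range(10)]
run_config = {"seed": 42, "test_fraction": 0.2, "data_sha256": fingerprint(rows)}

train_a, test_a = split(rows, run_config["test_fraction"], run_config["seed"])
train_b, test_b = split(rows, run_config["test_fraction"], run_config["seed"])
print(train_a == train_b and test_a == test_b)  # True: same seed, same split
print(len(train_a), len(test_a))                # 8 2
```

Saving `run_config` with the model’s results gives reviewers what they need to rerun the experiment and check for bias or errors later.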

Follow a Data Code of Ethics

Organizations such as the USDSI, DAA, and ACB have crafted formal codes of ethics for data scientists to follow. Individual businesses and organizations can also develop codes of ethics that are tailored to their activities.

Organizational codes of ethics can:

Encourage discussions of ethics
Empower employees to address ethical dilemmas
Clarify an organization’s values
Serve as a reference document on where to locate other resources related to ethics

How Educational Programs in Data Science Can Promote Ethics

Educational programs in data science can provide a good foundation for ethical practice in the field. For example, Master of Science (MS) in Data Science degree programs offer the following:

Courses in predictive modeling that can offer students the opportunity to choose models to address real-world business problems while adhering to ethical best practices
Courses in machine learning and data mining that can provide students with expertise in avoiding bias and conducting ethical data science
The chance to complete capstone projects that enable students to apply data science to real-world scenarios while ensuring that their projects meet ethical best practices

Data science educational programs also offer students opportunities to enhance their skills in collaboration and communication. This gives students the chance to discuss ethical issues in data science with their peers and deepen their understanding of the importance of ethics.

Working on data science projects as part of an educational program represents an opportunity for students to strengthen their skills in ensuring data privacy, data justice, data accuracy, transparency, and accountability.

Charting an Ethical Path in Data Science

When based on a foundation of solid ethics, data science has much to offer the world. Professionals in the field who have knowledge of and consistently apply data science ethics can help organizations realize their objectives in a responsible manner.

Individuals who have an interest in data science can explore Maryville University’s online MS in Data Science degree program to learn how it can help them pursue their professional ambitions. Offering in-demand skills that prepare students for jobs in data science, Maryville’s program could put you on the path to a rewarding career. Take the first brave step on your career path today.
