
Artificial intelligence (AI), personal data and consent

Training AI systems is just the latest use for users’ personal data that companies collect online. But information on how data is used, what consent is needed, or how it will be regulated is not always clear. There have already been strong concerns raised about data privacy and consent.
by Usercentrics
Aug 18, 2023

Artificial intelligence (AI) seems to be everywhere, and has been getting almost as much investment funding as media attention. Is it the latest tech buzzword or is it changing—and will continue to change—nearly everything about how we create and work? Who owns the input data and the results?

 

AI development has been said to rest on the pillars of algorithms, hardware, and data. Data is the pillar that is the least “solved”, and user consent is an important part of the question.

 

The rapid advancement of AI training and of the technology’s uses has raised concerns about user consent and the ethical implications of using personal data. If users’ data is used to train AI, do they have rights to the outputs? Should organizations that need AI training data have to obtain consent for data already published online? For how many granular purposes should providers of AI tools or services have to obtain explicit consent from users?

What is artificial intelligence (AI)?

AI refers to the development of machines that can perform tasks that typically require human intelligence. This includes areas such as text or speech recognition, problem solving, and decision-making. Developing AI often requires input of large amounts of data to help the systems “learn”.

What is Machine Learning (ML)?

 

Machine learning is a subset of AI that focuses on developing algorithms and models that enable computers to learn from data and make predictions or decisions without being explicitly programmed. It’s a way for computers to “learn” from examples and improve their performance over time.
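The idea can be sketched in a few lines of illustrative Python: given example (input, output) pairs, a single model parameter is nudged repeatedly to reduce prediction error, with the rule itself never programmed in. The data and hyperparameters here are invented for illustration.

```python
# Illustrative sketch: "learn" the rule behind (x, y) examples by nudging a
# single weight w to reduce mean squared error (plain gradient descent).
def train(examples, steps=200, lr=0.01):
    w = 0.0
    for _ in range(steps):
        # Gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in examples) / len(examples)
        w -= lr * grad  # adjust w in the direction that reduces error
    return w

# The examples implicitly encode the rule y = 2x; it is never programmed in.
data = [(1, 2), (2, 4), (3, 6), (4, 8)]
w = train(data)
print(round(w, 2))  # converges close to 2.0
```

More examples, rather than more rules, are what improve the model’s performance over time.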

What are Large Language Models (LLM)?

 

Large Language Models are a recent breakthrough in AI research, designed to understand and generate human-like language. ChatGPT from OpenAI and Bard from Google are examples of publicly accessible LLMs. Some tools developed using them can be used for SEO, marketing content, and other business purposes.

 

The purpose of training an LLM is to enable it to understand the structure, meaning, and context of human language, which, for one use, enables more accurate responses when queried by people.
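As a toy illustration (nothing like a real LLM in scale or sophistication), the core training goal of predicting likely continuations can be sketched by counting which word follows which in a training text. The text and word choices here are invented.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, how often every other word follows it."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Predict the most frequently observed successor of `word`."""
    return follows[word].most_common(1)[0][0]

model = train_bigrams("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # "cat" (follows "the" twice, vs. "mat" once)
```

A real LLM learns vastly richer statistics over vastly more text, which is exactly why the provenance of, and consent for, that text matters.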

 

LLMs are trained on vast amounts of text from books, articles, websites, and other sources. To date there have been data privacy issues with content being scraped and analyzed without the creators’ or owners’ consent. There is the possibility that data accessed could be sensitive, in addition to having been used without consent.

What is AI training?

AI training, also known as machine learning training, refers to the process of teaching an AI system to learn patterns and make predictions or decisions based on data provided to it. Training is crucial to developing AI systems that can perform specific tasks, recognize patterns, provide accurate information, or make informed judgments.

 

The training process goes through a number of steps. In short, these begin with procuring relevant data and preparing it for use, move on to selecting what the model is meant to do with the AI training data sets, and then to data input and analysis. Next comes working to make outputs or predictions match actual outcomes or improve in accuracy, and ensuring the AI model works well on any data set, including real-world data, not just the training data. AI models have to pass all of these steps before they’re ready for broader use.

Ambiguities in uses of AI training data sets

 

Organizations could raise questions about what defines “use” of personal data. How much change renders it no longer personal data? For example, to get the data into a format the training model can use, it may have to be transformed from the format in which it was collected. Additionally, should an organization need to get consent to use data to train AI models even if it’s for research only and not commercial purposes? Perhaps no one but the researchers will ever have access to it.

What data is AI trained on?

 

AI can be trained on many kinds of data. What the trainers need may depend on what the system is meant to be able to do, e.g. answer questions, make decisions, generate graphics or text, etc.

Some common types of training data for AI include:

  • text – e.g. from books, articles, websites, or social media; used for translation, sentiment analysis, chatbot development, etc.
  • images – from large numbers of labeled images; used for image recognition, object detection, and image generation
  • audio – e.g. from spoken words, sounds, or acoustic patterns; used for speech recognition, voice assistants, and audio analysis models
  • video data – from video sequences; used in video analysis, surveillance, video generation, and to learn temporal patterns
  • gaming data – from gameplay data and interactions; used to develop game play and strategy
  • structured data – e.g. from databases or spreadsheets; used for predictive analytics, recommendation systems, or fraud detection
  • sensor data – from cameras, lidar, radar, etc.; used for autonomous vehicle systems, industrial automation, etc.
  • healthcare data – from medical imaging like x-rays and MRIs, patient records, and clinic data; used for assistance in diagnoses, treatment, and research
  • financial data – from existing financial data from markets and transaction records; used for stock price prediction, credit scoring, and fraud detection
  • genomic data – from DNA sequences, genetic markers, and other related biological data; used for personalized medicine and improving understanding of genetics
  • simulation data – from data generated by simulations; used for learning how systems behave under different conditions

 

Many of these kinds of AI training data are explicitly referenced in data privacy laws. Many are types of personal data, and some are personally identifiable information (PII). Some of these types of data are also categorized under privacy laws as sensitive, meaning they could do greater harm if accessed or used without authorization.

 

Healthcare, genomic, and financial information are particularly significant examples of sensitive personal data. Sensitive data usually requires user consent to collect or use under data privacy law, while data that is personal, but not sensitive, sometimes only requires consent before being sold or used for targeted advertising, profiling, etc.

 

It’s also important to note that not all batches of training data are equal. Quality, quantity, diversity, and permission for use can vary widely. That can have a significant impact on the “learning” and performance of the systems. It could also mean consent is required to use some types of data in the training batch, but not for others. Poorly balanced or non-diverse data can also produce skewed results, sometimes with offensive or legally precarious output, like systems that produce discriminatory recommendations or inaccurate identification.
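As a sketch of one basic quality check, a heavily skewed label distribution in a training batch can be flagged before training begins, since it is one warning sign of the biased outputs described above. The 80% threshold and label names here are illustrative choices, not a standard.

```python
from collections import Counter

def label_distribution(labels):
    """Share of the batch carried by each label."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

def is_imbalanced(labels, max_share=0.8):
    """Flag the batch if any single label dominates it."""
    return max(label_distribution(labels).values()) > max_share

batch = ["approve"] * 90 + ["deny"] * 10
print(is_imbalanced(batch))  # True: 90% of examples share one label
```

Checks like this address balance only; permission to use each record at all is a separate, consent-related question.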

 

Under many privacy laws, data subjects have the right to have their data corrected by the entity that has collected it, if it’s incomplete or inaccurate. What about if their data is correct, but used to produce inaccurate results? What are their rights then? Uses of these technologies pose many complex questions for regulators that include the ethics of automation.

AI and data protection

Research firm Gartner has predicted that by the end of 2023, 65% of the world’s population will have its personal data protected by data privacy regulations. By 2024 they predict the number will be 75%. The only things changing faster than privacy regulation coverage are technology itself and demands for data. It helps drive everything from scientific breakthroughs to marketing campaigns.

 

But data isn’t like air. It doesn’t just freely exist for anyone to use. A lot of the data that exists, and that organizations want access to, is generated by people, who therefore have rights regarding protection of and access to it. Consumers are increasingly savvy these days about data privacy and their rights where their personal data is concerned, even if they may not understand in detail how AI systems and other functions work.

 

With the passing of more privacy legislation around the world, organizations need to be increasingly careful about meeting their data privacy responsibilities. Potentially huge fines, like some of those levied under the European Union’s General Data Protection Regulation (GDPR), also highlight the importance of taking privacy regulations and consumers’ rights seriously.

Does it matter where AI training data sets come from?

 

There are ever more potential sources of user data, especially online, such as social platforms and apps. It can also be tricky for companies to determine their data privacy responsibilities when the company is headquartered in one place but potentially has users around the world. This can make an organization responsible for complying with multiple different privacy regulations. Many such laws are extraterritorial, in which case what matters for rights and protections is where the users are located, not where the companies are.

 

A lot of consumers don’t focus too much on just how much data they create on a daily basis, who might have access to it, and how it could be used. Children may not pay attention or fully understand user data generation or processing at all, even though most data privacy laws require extra protections and consent for access to their data. That consent must typically be obtained from a parent or legal guardian if the child is under a certain age threshold determined by the specific law.

 

A number of data privacy laws do not cover personal data that people make publicly available, which could include that generated on social platforms. Perhaps posts, comments, and photos are not a big privacy concern to some. But what about private messages or chats? Those could contain far more sensitive material.

 

Once data has been collected, ideally with user consent, people should know what happens to it. It’s a condition of most privacy laws that the controller—the entity responsible for collecting and using the data—notify users about what data will be collected and for what purposes. If those purposes change, under many privacy laws the controller must notify users and get new consent. With AI training, this could require a lot of granular detail, and could change often.

 

Because AI systems are often still experimental and their results unpredictable, some data privacy requirements can be tricky to meet. Organizations can notify users about what they want to use data for, but what the data actually gets used for, how it may be changed, or the results it produces may turn out to be different.

 

While users are supposed to be notified before any new purpose is put in place, those doing the work may not know of the change until it’s happened. If data is being analyzed in vast quantities in real time, traditional mechanisms for obtaining user consent, like cookie banners, may not be fast or granular enough, or otherwise sufficient.

 

User-facing AI systems can be potentially manipulative, resulting in users providing information they didn’t anticipate. Systems may also surface more sophisticated and nebulous connections between data points, enabling identification and profiling at a level we have not seen before. This could potentially turn just about any data into personally identifiable or sensitive data. Current consent requirements may not adequately address this.

 

While manipulative user interface and user experience functions, commonly known as dark patterns, are increasingly frowned upon and in some cases have been regulated against, those rules tend to focus on tactics that are already familiar. AI-driven responsive design could enable the development of new and more sophisticated ways of manipulating users.

The EU Artificial Intelligence (AI) Act

In December 2023, the European Commission, the Council of the European Union, and the European Parliament reached a political agreement on the AI Act, a proposal initially released in April 2021. Finalized legal text and translations (the EU has 24 official languages) of the Act’s contents still need to be drafted, and the EU Parliament and Council will need to formally adopt the AI Act for it to become EU law.

 

European Commission President Ursula von der Leyen noted the Act’s historic and global potential, “Our AI Act will make a substantial contribution to the development of global rules and principles for human-centric AI.”

 

The primary goals of the AI Act are two-fold: to respect and protect the fundamental rights of EU citizens, while also boosting innovation. Parliamentarians agreed that how the Act is implemented will be of key importance in achieving these goals.

What is the EU AI Act?

 

The EU AI Act is a law on artificial intelligence (AI) proposed by the European Commission. It is the world’s first comprehensive law to regulate AI. The aim is to balance positive uses of the technology while mitigating negative ones and codifying rights. There is also a goal to clarify many current and future questions about AI development and make the Act a global standard, as the GDPR has become.

 

The law would assign applications of AI technology to one of several categories:

 

Unacceptable risk – AI with unacceptable risks would be banned entirely, e.g. the Chinese government’s social scoring tool

 

High risk – AI with potential risks, permitted subject to compliance with AI requirements and a prior conformity assessment, e.g. a tool that ranks job applicants by scanning resumes

 

Limited risk – AI with specific transparency obligations, permitted but subject to information requirements, e.g. bots that can be used for impersonation

 

Minimal or no risk – AI with no notable risks, permitted without restrictions

Political agreement on AI Act rules

 

The parties, including the Council of the European Union and European Parliament, have agreed on several main rule categories:

  • safeguards regarding general purpose artificial intelligence
  • limitations on law enforcement’s use of biometric identification systems
  • social scoring using AI is banned
  • manipulation or exploitation of users’ vulnerabilities using AI is banned
  • consumers have the right to launch complaints and receive meaningful responses

Banned AI applications

The legislators have agreed on banning certain applications of AI by corporations, governments, law enforcement, etc., with some exceptions, based on recognized potential threats to the rights of citizens and democracy more generally.

  • biometric categorization systems that use sensitive characteristics, aka sensitive data (e.g. political, religious or philosophical beliefs, sexual orientation, race, etc.)
  • untargeted scraping of facial images from the internet or closed-circuit television (CCTV) footage to create facial recognition databases (remote biometric identification)
  • emotion recognition in the workplace and educational institutions
  • social scoring based on social behavior or personal characteristics
  • AI systems that manipulate human behavior to circumvent their free will
  • AI used to exploit the vulnerabilities of people (due to age, disability, social or economic situation, etc.)

General purpose AI (GPAI), risks and obligations

General purpose AI includes tools and applications that tend to be widely available to academia, business, and consumers, e.g. ChatGPT and similar tools. There are further safeguards for more powerful AI models that pose greater systemic risks, including:

  • additional risk management obligations
  • monitoring of serious incidents
  • evaluation of models/modeling
  • red teaming (adopting an adversarial approach to rigorously challenge plans, policies, systems, etc.)

Codes of practice around these new requirements will be jointly developed by industry, the scientific community, the public, and others.
It is understood that GPAI systems can do a wide variety of tasks and analysis, and such systems’ capabilities are rapidly expanding. As a result, certain “guardrails” have been agreed upon as control mechanisms:

  • transparency requirements making clear what the systems are designed to do, how, with what data, and for what purposes
  • detailed summaries about content used to train AI systems will need to be disseminated
  • adherence to EU copyright law
  • comprehensive technical documentation

GPAI models with potential high impact and systemic risks will have additional and more stringent requirements:

  • conducting model/modeling evaluations
  • assessing and mitigating systemic risks
  • conducting adversarial testing
  • reporting to the European Commission on serious incidents
  • ensuring strong cybersecurity
  • reporting on energy efficiency
  • reliance on codes of practice for regulatory compliance (until harmonized EU standards are published)

Transparency requirements for AI systems and use

The agreed-upon rules for general purpose AI include requirements for transparency about data sources, purposes, etc. But transparency will be a requirement for many systems and uses of AI. Users will have to be informed if they are interacting with a chatbot, for example. Digitally generated or edited content (deepfakes) must be labeled. And if biometrics categorization or emotion recognition systems are in use, users who may be affected must be informed.

 

Exemptions for AI use by law enforcement

AI-powered tools and systems can be extremely useful to law enforcement, but risks to personal privacy and human rights are also recognized. So a series of safeguards and well-defined exemptions have been agreed upon with regards to the use of real-time biometric identification systems, such as facial recognition, by law enforcement in public spaces.

 

Such access will require prior judicial authorization, and will be limited to strictly defined lists of crimes. Use of biometric identification systems after the fact, e.g. reviewing footage and analysis, would be done only in the case of a targeted search for a person who has been convicted of a serious crime or is suspected of having committed one.

 

Use of real-time biometric identification systems would be limited by time and location, and for the following purposes:

  • targeted searches of victims (e.g. abduction, trafficking, sexual exploitation)
  • prevention of a specific and present terrorist threat
  • localization or identification of a person suspected of having committed one of the specific crimes mentioned in the regulation (e.g. terrorism, trafficking, sexual exploitation, murder, kidnapping, rape, armed robbery, participation in a criminal organization, environmental crime)

High-risk AI systems, obligations and restrictions

A wide variety of industries, systems and tools can and will be identified as high risk under the Act, including medical devices, critical infrastructure, and administration of justice and democratic processes.

 

Where AI is used in these areas, risk mitigation will have to be implemented or bolstered, datasets used will have to be of confirmed high quality, documentation and logging will have to be detailed, there will have to be human oversight, information for users will need to be clear, and strong cybersecurity measures will need to be taken and maintained. Regulatory sandboxes will be used where authorities can facilitate testing of organizations’ systems.

 

The classification of “high risk” for some AI systems means that they pose significant potential harm to health, safety, fundamental human rights, democracy, the rule of law, and/or the environment. The high-risk designation can be applied to AI use in various sectors, including law enforcement, banking, and insurance; AI that can be used to influence voter behavior and election outcomes is also included.

 

As a result, specific obligations have been agreed upon for AI systems that have had the high risk classification applied, like mandatory fundamental rights impact assessments. Individuals will be able to launch complaints about AI systems and have the right to receive explanations about decisions based on high-risk AI system activities that may impact their rights.

 

Support for innovation and SMEs with AI solutions

The parties understand that AI tools and systems can be strong drivers of innovation in business, and do not want companies, especially SMEs, to be hamstrung by excessive regulation, or be pressured by industry giants with outsized industry influence.
To help mitigate these possibilities, the agreement under the Act promotes the use of regulatory “sandboxes” for development, as well as real-world testing for innovations. National authorities will establish these environments and initiatives to develop and train AI before it is launched to the market.

AI Act governance

 

An AI Office will be established at the EU level, within the European Commission. It will work to coordinate national governance among member countries and supervise enforcement of general purpose AI rules. National authorities within the EU will govern the Act more directly, using qualified market surveillance.

AI Act enforcement and fines

 

Under the Act there will be multiple levels of fines based on risk and severity of the violation. There are caps on potential fines for startups and SMEs.

  • €7.5 million or 1.5 percent of global annual turnover, whichever is higher, for supplying inaccurate information
  • €15 million or 3 percent of global annual turnover, whichever is higher, for violations relating to high-risk systems
  • €35 million or 7 percent of global annual turnover, whichever is higher, for violations involving banned (unacceptable risk) applications
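The “whichever is higher” structure of these tiers can be sketched as follows, using the published figures. The tier key names and the turnover values are illustrative; amounts are in euros.

```python
# Fine tiers as (share of global annual turnover, fixed amount in EUR)
FINE_TIERS = {
    "inaccurate_information": (0.015, 7_500_000),
    "high_risk_violation": (0.03, 15_000_000),
    "unacceptable_risk": (0.07, 35_000_000),
}

def max_fine(violation, global_annual_turnover):
    rate, fixed_amount = FINE_TIERS[violation]
    # The applicable cap is whichever of the two figures is higher
    return max(rate * global_annual_turnover, fixed_amount)

# A company with EUR 2 billion turnover: 7% (EUR 140 million) exceeds EUR 35 million
print(max_fine("unacceptable_risk", 2_000_000_000))
# A smaller company: the EUR 7.5 million fixed amount exceeds 1.5% of turnover
print(max_fine("inaccurate_information", 100_000_000))
```

The separate caps for startups and SMEs mentioned above would sit on top of this logic.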

 

The AI Act is currently in a draft state, and may change prior to becoming law. At present user consent and data privacy and protection are addressed in its statutes on a number of fronts:

 

High Risk – explicit consent is required for use of high-risk AI systems, e.g. critical infrastructure, employment, healthcare, and law enforcement.

 

Transparency – AI providers must provide clear information about systems’ intended purpose, capabilities, and limitations to ensure users are informed to make decisions and understand potential impacts on their rights.

 

Right to Explanation – users have the right to obtain meaningful explanations of AI systems’ decisions.

 

Right to User Control – users should have the ability to opt out, disable, or uninstall AI systems, particularly when fundamental rights or interests are at stake (under some privacy laws users have the right to opt out of “automated decision-making”).

 

Data Protection and Privacy – the AI Act emphasizes the need for data minimization, purpose limitation, and safeguards to protect personal data when using AI systems, and aligns with existing data privacy regulations like the GDPR.

Companies that acquire data for AI training or other uses can and should ensure that consent was obtained from the sources or users. In some cases it may be a requirement for doing business with partners or vendors.

 

Consent is also becoming important to monetization strategy. For example, increasingly, premium advertisers are insisting on proof of consent for collection of user data before partnering with app developers.

 

Companies that collect user data from their own platforms and users for AI training or other uses have direct responsibility for obtaining valid consent and complying with data protection laws. There are a number of ways companies can achieve compliance and valid consent.

 

Transparency – Privacy laws require clear, accessible notifications, and companies should provide understandable information to users about how user data will be used and processed, including for AI training. As the uses for personal data change, companies need to update their privacy notices, inform users, and, under many privacy laws, get new consent for the new uses of personal data.

 

Granular consent – Users must be able to accept or decline the collection and processing of their personal data, but they should be able to do it at a detailed level, e.g. approving some kinds of processing, like targeted advertising or AI training, but not others, like sale of the data. This also helps ensure people are informed, which is a requirement for consent to be valid under most privacy laws.
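A minimal sketch of what granular, per-purpose consent storage might look like. The purpose names and fields are hypothetical; a production consent management platform would also record timestamps, policy versions, and proof of consent.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRecord:
    user_id: str
    purposes: dict = field(default_factory=dict)  # purpose name -> True/False

    def grant(self, purpose):
        self.purposes[purpose] = True

    def decline(self, purpose):
        self.purposes[purpose] = False

    def allows(self, purpose):
        # Opt-in model: a purpose never presented to the user defaults to "no"
        return self.purposes.get(purpose, False)

record = ConsentRecord(user_id="user-123")
record.grant("targeted_advertising")
record.decline("ai_training")
print(record.allows("targeted_advertising"))  # True
print(record.allows("ai_training"))           # False
print(record.allows("sale_of_data"))          # False: never asked, so no consent
```

The key design point is that each purpose is accepted or declined independently, rather than bundled under one blanket “I agree”.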

 

User-friendly mechanisms – Just as notifications must be clear and accessible, the way users accept or decline consent must be easy to understand and access. Information to inform users about data processing must be available there as well as the ability to consent or decline at a granular level. It must also be as easy to decline consent as it is to accept, and under many privacy laws users must also be able to easily change their consent preferences.

 

Regulatory familiarity – Different jurisdictions have different privacy laws with different requirements and consent models. It’s important for companies to know which laws they need to comply with, and how to do so. It can be important to consult with or appoint qualified legal counsel or a privacy expert, e.g. a data protection officer (DPO), which is also required by some privacy laws. Such a role helps to establish guidelines and processes, update operations, and manage security for data and processing.

What rights do users of online platforms have over their data?

Consumers’ rights regarding their personal data depend on a number of factors, including where the user lives and what privacy laws are in place, what the platform is for and what data the user is providing to or generating on it, and what the platform’s terms of service are.

 

In the European Union, companies collecting and processing personal data must obtain user consent before doing so. This applies equally to social media platforms, a blog, a government website, or an ecommerce store. Users’ data may be collected to learn how people use a site and improve how it works. Or to enable fulfillment when they buy something online, or to show them ads, or to train AI models.

 

Platforms around the world that are used for financial activities or healthcare have stronger requirements for privacy and security under multiple regulations because of the kinds of information they handle.

 

In some jurisdictions, it is still permitted to display a cookie banner stating that by continuing to use the site or service, you consent to the collection and use of your personal data. But in the EU and other jurisdictions, this is not acceptable, and granular consent is required.

AI and cookies

Use of cookies online has been declining as there are newer and better technologies to accomplish what cookies are used for. The question today and going forward is less how AI uses cookies, or may do so, and more how AI could accelerate the replacement of cookies.

 

Apple and Mozilla have blocked third-party cookies, and Google plans to deprecate them entirely. New tools and methods also enable better data privacy and consent, and can result in higher quality user data.

 

Current cookie consent models may not be sufficient to cover AI use, since AI systems may analyze large amounts of data in real time, rather than analyzing data from active cookies over time. For consent to be obtained before data collection or use begins, current pop-ups would have to bombard the user with consent banners faster and more often than a human could process them.

 

AI models can enable more effective ads or personalized user experiences without relying on collection of personally identifiable information, as they can analyze large amounts of data very quickly to group people into audiences based on behaviors. If the system doesn’t need to collect user data, then consent may not be needed, at least for the data collection.

 

Laws and best practices would likely still require users to be notified of how their behaviors could be tracked and analyzed, and what that analysis could be used for, e.g. personalized ads or shopping experiences. But people’s personal data couldn’t be sold if it was never collected.

AI technology is here to stay. Its capabilities and potential use cases will continue to evolve rapidly. This is a challenge for regulation, as creating and updating laws tends to happen much more slowly than the speed at which technologies develop.

 

However, users should not be faced with “caveat emptor”, especially online, with regards to new uses for their personal data and challenges to their privacy. Regulators must craft and update laws that are clear and comprehensive, but flexible enough to be interpretable and enforceable today and in the future.

 

Organizations need to be clear on what privacy regulations they must comply with, what those regulations stipulate, and what that means for their operations. This needs to be regularly reviewed as operations change, and clearly communicated. Trying to sneak in changes to terms of use, or using collected data for new purposes without getting new user consent, is an easy way to damage brand reputation. It is also illegal in many jurisdictions. As consumers get even more savvy about their data and privacy, companies will have to be more clear, not less, about data collection and use.

 

Companies should also implement best practices like privacy by design to ensure they are respecting people—the source of their data—and complying with the law. This will also help ensure that consent is obtained and data collection and use limited to legal allowances for all operations, whether fulfilling ecommerce orders or training new AI models.

 

AI is just the latest technology to bring new challenges to consumers, companies, and regulators alike. It’s not the first and won’t be the last. But best practices for achieving compliance, building trust with users, and successfully growing businesses (or doing science) continue to be the same and serve both organizations and consumers well.

 

To learn more, talk to our experts today.

Frequently Asked Questions

What is artificial intelligence (AI)?

Artificial intelligence is the development of machines that can perform tasks that typically require human intelligence. This includes areas such as text or speech recognition, problem solving, and decision-making. Developing AI often requires input of large amounts of data to help the systems “learn”.

What is machine learning (ML)?

Machine learning is a subset of AI that focuses on developing algorithms and models that enable computers to learn from data and make predictions or decisions without being explicitly programmed. It’s a way for computers to “learn” from examples and improve their performance over time.

What are Large Language Models (LLMs)?

Large Language Models are a recent breakthrough in AI research, designed to understand and generate human-like language. ChatGPT from OpenAI and Bard from Google are examples of publicly accessible LLMs. Some tools developed using them can be used for SEO, marketing content, and other business purposes.

How are AI systems trained?

AI training is the process of teaching an AI system to learn patterns and make predictions or decisions based on data provided to it. Training is crucial to developing AI systems that can perform specific tasks, recognize patterns, provide accurate information, or make informed judgments.

Here’s a breakdown of the AI training process:

  1. Data Collection: The first step involves collecting relevant and representative data. This data serves as the input for training the AI model. The quality and diversity of the data have a direct impact on the model’s performance.
  2. Data Preprocessing: Raw data often requires cleaning, transformation, and structuring to be suitable for training. This step involves removing noise, handling missing values, and standardizing the data.
  3. Feature Engineering: Feature engineering involves selecting and transforming the relevant attributes (features) in the data that the model will use to make predictions. Effective feature engineering can significantly influence the model’s performance.
  4. Model Selection: Depending on the problem, a suitable machine learning algorithm or model is chosen. Different models have different capabilities and are better suited for specific types of tasks, such as regression, classification, or clustering.
  5. Training: This is the heart of the process. During training, the model is presented with the input data along with the corresponding desired outputs. The model adjusts its internal parameters iteratively to minimize the difference between its predictions and the actual outcomes.
  6. Loss Function: A loss function is used to quantify how well the model’s predictions match the actual outcomes. The goal of training is to minimize this loss function, essentially teaching the model to make better predictions over time.
  7. Optimization: Optimization techniques, such as gradient descent, are employed to fine-tune the model’s parameters in a way that minimizes the loss function.
  8. Validation: To ensure that the trained model generalizes well to new, unseen data, a separate validation dataset is used to assess its performance. This step helps prevent overfitting, where the model performs well on the training data but poorly on new data.
  9. Hyperparameter Tuning: Many models have hyperparameters, which are settings that influence the learning process. These need to be adjusted to find the optimal balance between underfitting and overfitting.
  10. Testing and Deployment: Once the model performs well on both the training and validation data, it can be tested on a separate test dataset to assess its real-world performance. If the results are satisfactory, the model can be deployed for use.

The AI training process involves a combination of data, algorithms, and iterative optimization to create a model that can make accurate predictions or decisions. It’s important to note that training an AI model requires expertise, careful evaluation, and an understanding of the domain-specific problem to ensure effective and reliable results.
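The steps above can be sketched in miniature. The toy example below fits a straight line to synthetic data with gradient descent, using a mean squared error loss and a held-out validation set. The data, learning rate, and step count are illustrative choices, not taken from any real system.

```python
# Minimal sketch of the training loop described above: fit y = w*x + b
# to toy data with gradient descent, checking quality on held-out data.

# Steps 1-2: data collection and preprocessing (here: synthetic, already clean)
data = [(x, 2.0 * x + 1.0) for x in range(20)]
train, val = data[:15], data[15:]  # step 8: hold out data for validation

# Step 4: model selection -- a line with two adjustable parameters
w, b = 0.0, 0.0

def mse(params, samples):
    """Step 6: loss function -- mean squared error of predictions vs. targets."""
    pw, pb = params
    return sum((pw * x + pb - y) ** 2 for x, y in samples) / len(samples)

# Steps 5 and 7: iterative training via gradient descent on the training set
lr = 0.01  # step 9: a hyperparameter (the learning rate)
for step in range(5000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
    grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
    w, b = w - lr * grad_w, b - lr * grad_b  # nudge parameters to reduce loss

# Step 8: validation on unseen data; converges to approximately w=2.0, b=1.0
print(round(w, 2), round(b, 2), round(mse((w, b), val), 6))
```

Real AI systems use far larger models and datasets, but the shape of the process is the same: present data, measure error with a loss function, and iteratively adjust parameters to reduce it.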

What personal data are AI systems trained on?

AI can be trained on many kinds of data, depending on what the system is meant to be able to do, e.g. answer questions, make decisions, generate graphics or text, etc.

Some common types of training data for AI include:

  • text – e.g. from books, articles, websites, or social media; used for translation, sentiment analysis, chatbot development, etc.
  • images – from large numbers of labeled images; used for image recognition, object detection, and image generation
  • audio – e.g. from spoken words, sounds, or acoustic patterns; used for speech recognition, voice assistants, and audio analysis models
  • video data – from video sequences; used in video analysis, surveillance, video generation, and to learn temporal patterns
  • gaming data – from gameplay records and player interactions; used to train agents that play games and develop strategies
  • structured data – e.g. from databases or spreadsheets; used for predictive analytics, recommendation systems, or fraud detection
  • sensor data – from cameras, lidar, radar, etc.; used for autonomous vehicle systems, industrial automation, etc.
  • healthcare data – from medical imaging like x-rays and MRIs, patient records, and clinic data; used for assistance in diagnoses, treatment, and research
  • financial data – from existing financial data from markets and transaction records; used for stock price prediction, credit scoring, and fraud detection
  • genomic data – from DNA sequences, genetic markers, and other related biological data; used for personalized medicine and improving understanding of genetics
  • simulation data – from data generated by simulations; used for learning how systems behave under different conditions

What are the issues with using personal data to train AI?

The most fundamental concern with using personal data in AI training sets is whether consent has been obtained from the people the data belongs to. Personal data varies in type and sensitivity. Some can be used to identify an individual, and some can be harmful if misused.

Healthcare and financial information are particularly significant examples of sensitive personal data. Sensitive data usually requires user consent to collect or use under data privacy law, while data that is personal, but not sensitive, sometimes only requires consent before being sold or used for targeted advertising, profiling, etc.

Not all batches of training data are equal. Quality, quantity, diversity, and permission for use can vary widely. That can have a significant impact on the “learning” and performance of the systems. Poorly balanced or non-diverse data can also produce skewed results, sometimes with offensive or legally precarious output, like systems that produce discriminatory recommendations or inaccurate identification.

What user consent is required to use personal data for AI training?

There are a number of factors that determine whether user consent is needed to use personal data for AI training. As the Zoom controversy showed, it can depend on whether AI training is included in a company’s terms of service. If so, it’s possible that additional consent is not needed. However, in some jurisdictions, such as the EU under the GDPR, this would not be enough. There, explicit consent would need to be obtained to use personal data in AI training sets, and users would have to be informed about that use before their data was collected for it.

Companies need to know where their customers and users are located, be familiar with the privacy laws protecting those people, and update their data privacy operations accordingly. A company may already obtain consent for personal data collection, but under many privacy laws it can’t simply add AI training as a new purpose for that data without first updating its privacy notice and obtaining consent for the new use. In many jurisdictions users must also be able to opt out of uses of their data at a granular level, which could include AI training.

A number of data privacy laws do not cover personal data that people make publicly available, which could include that generated on social platforms. But it is not fully clear yet how that would affect personal data use for AI training. Posts, comments, photos, etc. would be more likely to be considered public than private messages, for example.

Can user consent be obtained for AI use?

AI systems are often still experimental and their results unpredictable. Organizations can notify users about what they intend to use data for, which typically must happen in advance, but what the data is actually used for, how it is transformed, or the results it produces may turn out to differ from what was disclosed.

If data is being analyzed in vast quantities in real time, traditional mechanisms for obtaining user consent, like cookie banners, may not be fast or granular enough, or otherwise sufficient.

Can AI systems cause data privacy issues?

User-facing AI systems can be potentially manipulative, resulting in users providing information they didn’t anticipate. Systems may also surface more sophisticated and nebulous connections between data points, enabling identification and profiling at a level we have not seen before. This could potentially turn just about any data into personally identifiable or sensitive data. Current consent requirements may not adequately address this.

While manipulative user interface and user experience functions, commonly known as dark patterns, are increasingly frowned upon and in some cases have been regulated against, those rules tend to focus on tactics that are already familiar. AI-driven design could enable new and more sophisticated ways of manipulating users.

Does AI training affect cookie consent?

AI usage may actually help speed up the end of cookies, especially third-party cookies, as it can offer functions that provide better results and do not necessarily require collection of personal data.

Current cookie consent models may not be sufficient to cover AI use, since AI systems may analyze large amounts of data in real-time, rather than tools analyzing data from active cookies over time. For consent to be obtained before data collection or use begins, with current pop-ups the user would have to be bombarded with consent banners faster and more often than a human could process them.

How should companies obtain consent for AI training?

Companies that collect user data from their own platforms and users for AI training or other purposes have direct responsibility for obtaining valid consent and complying with data protection laws. Best practices for obtaining consent for AI training are the same as general data privacy compliance best practices.

  • Provide clear and accessible notification to users in advance about how data will be used and obtain new consent if purposes change
  • Ensure users can accept or decline consent at a granular level, i.e. for all uses or just for some. Ensure it’s as easy to decline as to accept, and that users can change their consent preferences or withdraw consent easily in the future.
  • Be familiar with relevant data privacy laws and companies’ responsibilities. Review data collection and processing regularly to ensure notifications and consent information are up to date.
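The practices above can be made concrete in code. The sketch below is a hypothetical, minimal consent record supporting granular, per-purpose choices and easy withdrawal; the purpose names, fields, and methods are invented for illustration and do not reflect any specific law’s requirements or any vendor’s schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical, illustrative purpose list -- not a legal taxonomy
PURPOSES = ("analytics", "marketing", "ai_training")

@dataclass
class ConsentRecord:
    """Minimal sketch of a per-user, per-purpose consent record."""
    user_id: str
    # Each purpose is accepted or declined individually; default is "no consent"
    choices: dict = field(default_factory=lambda: {p: False for p in PURPOSES})
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def set_choice(self, purpose: str, granted: bool) -> None:
        if purpose not in self.choices:
            raise ValueError(f"Unknown purpose: {purpose}")
        self.choices[purpose] = granted
        self.updated_at = datetime.now(timezone.utc)  # track when consent changed

    def withdraw_all(self) -> None:
        """Withdrawing consent must be as easy as giving it."""
        for purpose in self.choices:
            self.choices[purpose] = False
        self.updated_at = datetime.now(timezone.utc)

record = ConsentRecord(user_id="u-123")
record.set_choice("analytics", True)   # user accepts analytics only
print(record.choices["ai_training"])   # prints False: no consent by default
```

The key design choice this sketch illustrates: a purpose like AI training defaults to "not consented", and adding a new purpose later never silently reuses an earlier consent.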

Does the GDPR cover artificial intelligence and consent?

The General Data Protection Regulation does not explicitly mention artificial intelligence, but like a number of other data privacy laws, references “automated decision-making”, which can include AI systems.

AI training would be treated like any other use of personal data: users would need to be notified about the requested use before their personal data was collected for it, and consent would need to be obtained for that use before any collection or processing could occur.

What is the EU AI Act?

The AI Act is a law on artificial intelligence (AI) proposed by the European Commission. The aims of the law are to:

  • balance positive uses of the technology with risks
  • mitigate current and future risks and negative uses of the technology
  • codify consumers’ rights
  • clarify current and future questions about AI development
  • make the Act a global standard (like the GDPR)

The law would assign applications of AI technology to one of several categories:

  • Unacceptable risk – full ban on use
  • High risk – use allowed subject to assessment and compliance
  • Limited risk – use allowed subject to meeting transparency obligations
  • Minimal or no risk – permitted without restrictions if no notable risks are identified

Usercentrics does not provide legal advice, and information is provided for educational purposes only. We always recommend engaging qualified legal counsel or privacy specialists regarding data privacy and protection issues and operations.
