Article | 23 May 2023
AI vs GDPR: the privacy implications of pharmaceutical companies using AI tools
As most people will be aware, AI (artificial intelligence) tools are developing far more quickly than many would have anticipated. But what are the privacy implications? Setterwalls associates Jonatan Blomqvist and Karolina Jivebäck Pap examine what this means for businesses processing large amounts of sensitive data, such as pharmaceutical companies.
Background
Spring 2023 has seen the emergence of advanced AI tools, with the technology developing more rapidly than many would have anticipated. The most talked-about AI tool is ChatGPT, developed by OpenAI, whose website describes it as follows:
“We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.”[1]
OpenAI has developed an AI chatbot that can answer a wide range of questions and follow-up questions. But how is the tool capable of this? ChatGPT uses ‘generative AI’, meaning it generates new text in response to a prompt rather than retrieving pre-written answers, which is why it can produce an answer to almost any question. It was fine-tuned on sample answers written by humans, and humans then rewarded or penalised its outputs to steer its behaviour, a technique known as reinforcement learning from human feedback (RLHF). The underlying training was done as part of the tool’s development and essentially involved ‘scraping’ (collecting data from) the internet for all kinds of information, including personal information.
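To make the RLHF idea concrete, below is a deliberately simplified Python sketch. This is our own illustration, not OpenAI’s actual pipeline: a hand-written scoring function stands in for a reward model trained on human rankings of answers, and is used to prefer helpful answers over inappropriate ones. All function names and scores are hypothetical.

```python
# Illustrative sketch only: a toy view of how human feedback can steer a model.
# The scoring function below stands in for a reward model trained on human
# rankings; real RLHF pipelines are far more involved (and not public in detail).

def generate_candidates(prompt: str) -> list[str]:
    """Stand-in for a language model producing several draft answers."""
    return [
        f"{prompt}? I don't know.",
        f"{prompt}? Here is a step-by-step explanation...",
        f"{prompt}? (rude answer)",
    ]

def reward(answer: str) -> float:
    """Stand-in for a reward model: in RLHF this is trained on human
    rankings of answers, so it encodes human preferences."""
    score = 0.0
    if "step-by-step" in answer:
        score += 1.0   # humans tend to prefer helpful, structured answers
    if "rude" in answer:
        score -= 1.0   # ...and to penalise inappropriate ones
    return score

def best_answer(prompt: str) -> str:
    """Pick the candidate the human-preference reward scores highest.
    Real RLHF goes further and updates the model's weights toward such answers."""
    return max(generate_candidates(prompt), key=reward)

print(best_answer("How does the GDPR define personal data"))
```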
Moreover, it should be borne in mind that anything used as input, i.e. anything you provide the AI with when asking a question, could also end up being used as output. In other words, the AI tool could potentially use the data its users provide for development and learning, making it smarter and enabling it, for example, to answer similar questions in the future.
These new AI tools can help all kinds of businesses to boost their efficiency by completing certain tasks and answering certain questions much faster than an employee could. So it’s not surprising that they are the subject of much discussion and are already being widely used around the world. However, despite their popularity, they carry certain risks, including privacy risks, and there is always a danger of them providing biased or even entirely incorrect information. One such example is the AI tool used by UnitedHealth to evaluate patients’ care needs. The tool was neutral with regard to patients’ skin colour and was not intentionally discriminatory or biased, but the algorithm was based on the cost of care for patients rather than on their medical condition. For socioeconomic reasons, and due to the high costs of the insurance-based US healthcare system, white patients sought care at a higher rate and thus had a higher cost of care. Since less money was spent on black patients with the same level of need, the AI incorrectly concluded that black patients were healthier than equally sick white patients.[2]
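The cost-as-proxy effect described above can be shown with a toy simulation. To be clear, this is not UnitedHealth’s actual algorithm or data, just an illustration of the mechanism: two simulated groups have identical underlying medical need, but one seeks care at a lower rate, so an algorithm that ranks patients by cost wrongly ranks that group as healthier.

```python
import random

random.seed(0)

# Toy simulation of the proxy problem (illustrative only): two groups of
# patients with the SAME underlying medical need, but one group incurs a
# lower cost of care because its members seek care at a lower rate.

def simulate_patient(group: str) -> dict:
    need = random.uniform(0, 10)                      # true medical need, equal across groups
    access = 1.0 if group == "A" else 0.6             # group B seeks care less often
    cost = need * access * random.uniform(0.8, 1.2)   # observed cost of care
    return {"group": group, "need": need, "cost": cost}

patients = [simulate_patient(g) for g in ("A", "B") for _ in range(10_000)]

# An algorithm that ranks 'sickness' by cost (the proxy) instead of need
# will systematically rank group B as healthier, despite identical need.
for g in ("A", "B"):
    members = [p for p in patients if p["group"] == g]
    avg_need = sum(p["need"] for p in members) / len(members)
    avg_cost = sum(p["cost"] for p in members) / len(members)
    print(f"group {g}: average true need {avg_need:.2f}, average cost {avg_cost:.2f}")

# The output shows near-identical average need but much lower average cost
# for group B: a cost-based threshold would wrongly treat group B as less sick.
```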
There is also a risk that trade secrets will be disclosed when using publicly available AI tools, just as personal data might be. In short, just as companies should not share personal data, and especially not sensitive personal data, with an AI chatbot, nor should they share trade secrets with it.
Some of the privacy implications
The GDPR is based on several legal principles, one of which is data minimisation. Under this principle, only the categories of personal data necessary for the purpose for which they are collected may be collected; gathering extra categories of personal data ‘just in case’ is not permitted. Of course, AI tools such as ChatGPT that scrape the internet for information about anything and anyone do not conform to this data minimisation principle, and this basic contradiction is likely to remain an issue in the development of AI.
Another significant issue is the lack of privacy information for users of some AI tools, ChatGPT being one such example. OpenAI’s privacy policy is not entirely clear about the basis on which it collects personal data for training its AI. The policy states only that OpenAI may base processing on user-generated content (i.e. any kind of information that users input into the tool) as part of its legitimate interest in developing, improving or promoting its services.
Consequently, a related issue is the lack of transparency for users regarding what data is actually used to train the AI. It is also unclear how long OpenAI will store personal information, especially personal data that may have been collected when the AI tool scraped the internet during its training. This is difficult to reconcile with the GDPR’s storage limitation principle.
Sensitive data and AI
Special categories of personal data, such as health data, may not be processed unless an exemption under the GDPR applies (see Article 9 of the GDPR). This means that pharmaceutical companies and other companies in the life sciences industry, such as medtech companies, which process large amounts of health data, should be extra careful about how they process personal data and who they share it with.
When processing personal data, it is vital that all employees understand what they are and are not allowed to do, especially when trying out interesting new technical solutions such as AI tools. Although AI tools offer efficiency gains, there is no transparency about what the data entered into them might be used for. So if a pharmaceutical company employee provides an AI tool with personal data in the form of health data, there is a risk that the data will be used to train the AI, and there is no way of ensuring that the health data will not appear as output when other users ask related questions.
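By way of illustration, a company might place a simple pre-submission filter between employees and any external AI tool. The Python sketch below is a minimal, hypothetical example of our own: the patterns (a Swedish-style personal identity number and an email address) and helper names are assumptions, and regex redaction alone would fall well short of full GDPR compliance.

```python
import re

# Illustrative sketch only: a minimal pre-submission filter that a company
# might place between employees and an external AI tool. The two patterns
# below are examples, not a complete de-identification solution.

REDACTION_PATTERNS = {
    "personal identity number": re.compile(r"\b\d{6,8}[-+]?\d{4}\b"),
    "email address": re.compile(r"\b[\w.%+-]+@[\w.-]+\.[A-Za-z]{2,}\b"),
}

def redact(prompt: str) -> str:
    """Replace obvious personal identifiers before the prompt leaves the company."""
    for label, pattern in REDACTION_PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED {label}]", prompt)
    return prompt

prompt = "Summarise the journal entry for patient anna.svensson@example.com, 19850101-1234"
print(redact(prompt))
# -> "Summarise the journal entry for patient [REDACTED email address], [REDACTED personal identity number]"
```

In practice, such filtering would complement, not replace, the contractual safeguards, internal policies and employee training discussed below.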
It is therefore essential that pharmaceutical companies ensure that their IT and privacy policies are up to date to include possible restrictions regarding the use of AI tools such as ChatGPT. It is also important to provide regular employee training. Companies could perhaps even regard the development of AI as an opportunity to update their general GDPR training.
Finally, it should be mentioned that the EU AI Act is currently under development, and its core principles include transparency and accountability. The Act is currently undergoing last-minute changes in the European Parliament due to the rapid development of generative AI such as ChatGPT. For example, an addition has been made requiring generative AI providers to disclose any copyrighted material used to develop their systems. It will be interesting to see the final version of the Act and how it will apply to generative AI. Setterwalls is following this matter closely.
[1] https://openai.com/blog/chatgpt.
[2] ‘New York insurance regulator to probe Optum algorithm for racial bias’, Fierce Healthcare.