Article | 28 Nov 2024
Scraping the Surface of Web Scraping – What Fintech Companies Should Consider When Harvesting the Web
In the highly competitive Fintech industry, staying ahead requires vigilant monitoring of competitors’ activities, customer behaviour, and up-to-date government regulations. Consequently, Fintech companies must have swift and accurate access to data, whether for optimizing investment strategies, understanding market trends, or enhancing customer experiences. Data collection and analysis are thus pivotal in this industry. This article delves into some of the regulatory requirements surrounding web scraping—a term often associated with the automated extraction of data from websites, despite its lack of a precise legal definition. While web scraping can provide valuable insights and benefits to Fintech companies, it also presents legal challenges that must be navigated carefully.
Use of scraped data
Web scraping, also known as web data extraction or web harvesting, refers to the automated process of collecting data from websites. Scraping tools vary in sophistication, from tools that capture the entire content of web pages to those designed to extract specific data elements. The process often involves large-scale, indiscriminate collection of data, and the scraped data can provide valuable insights that are particularly beneficial to Fintech companies in areas such as:
- Identification of customer whims and market rhythms;
- Financial information and other economic indicators;
- KYC and AML data;
- Training of artificial intelligence (“AI”) tools and algorithms; and
- Market insight and social listening.
The utilization of scraping tools on websites that are frequently visited by individuals or that contain valuable intellectual property rights (“IPR”) can present several legal challenges. These challenges may include unlawful processing of personal data, unauthorized copying of information protected by IPR, and potential violation of agreements governing use of the website. Below we will investigate each of these legal hurdles more closely.
Processing of personal data
Personal data does not lose its protection under the GDPR simply by being published online, which means that the requirements of the GDPR must be complied with when web scraping. The European Data Protection Supervisor and several national data protection authorities have recently issued guidelines addressing the data protection risks associated with web scraping, particularly in the context of generative AI. Key concerns include the unlawfulness of the processing as well as a lack of compliance with the principles of data minimisation and transparency towards the data subjects.
For the processing of personal data to be lawful, there must be a legal basis under article 6 of the GDPR, and the legal basis generally available for web scraping is legitimate interest. However, relying on legitimate interest requires a balancing of the rights and interests at issue, and it is the data controller who must be able to demonstrate that this assessment has been performed. In some cases, a data protection impact assessment also needs to be completed. In this respect, it can be noted that the Dutch Data Protection Authority has concluded that only targeted scraping (i.e. scraping that is very limited in terms of sources and purposes) is compatible with the GDPR.[1]
Furthermore, the principle of data minimisation in article 5 of the GDPR requires the data controller not to process more personal data than necessary. This requirement can be met by, for example, defining precise collection criteria, ensuring that certain data categories are not collected or that certain sources are excluded from data collection, and adopting measures to delete or anonymise personal data.[2] In this regard, it should be noted that the Swedish Authority for Privacy Protection (IMY) has categorized web scraping as a high-risk method from a data protection standpoint due to the vast volumes of data processed when using such a method.[3]
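As an illustration only, the minimisation measures mentioned above (precise collection criteria, excluded sources, and removal of personal data) could be sketched in a scraping pipeline roughly as follows. The field names, the excluded domain and the function `minimise` are hypothetical and chosen purely for the example. The e-mail redaction is a simple pattern match and should not be assumed to amount to anonymisation in the legal sense.

```python
import re
from urllib.parse import urlparse

# Hypothetical collection criteria: only these fields are retained.
ALLOWED_FIELDS = {"company_name", "product", "price", "published_at"}

# Hypothetical sources that are excluded from collection altogether.
EXCLUDED_DOMAINS = {"social.example.com"}

# Simple pattern for e-mail addresses appearing inside free text.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def minimise(record, source_url):
    """Return a reduced copy of a scraped record, or None if the source is excluded."""
    host = urlparse(source_url).hostname or ""
    if host in EXCLUDED_DOMAINS:
        return None

    reduced = {}
    for key, value in record.items():
        # Drop every field that is not part of the defined collection criteria.
        if key not in ALLOWED_FIELDS:
            continue
        # Redact e-mail addresses that appear in retained text fields.
        if isinstance(value, str):
            value = EMAIL_PATTERN.sub("[redacted]", value)
        reduced[key] = value
    return reduced
```

Applying the filter before anything is stored, rather than cleaning the data afterwards, keeps unneeded personal data out of the dataset from the start. Whether a given set of criteria is sufficient remains a legal assessment for the data controller.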
Moreover, the transparency requirements towards data subjects in articles 13-15 of the GDPR mean that, as a general rule, the data controller must inform the data subjects about the processing of their personal data when the personal data is collected and when the data subject requests such information. However, a data controller that scrapes large quantities of personal data may find it challenging to appropriately inform data subjects accordingly. It may therefore in certain instances be justified to provide a privacy notice only via public means in accordance with article 14.5 of the GDPR.[4]
IPR protection
Photos, texts and other materials on websites may be protected by copyright. Under Swedish copyright law, which is harmonised to a certain extent through EU acts, no formal requirements need to be met for a work to enjoy such protection. It is instead sufficient for the work to exhibit some level of originality and be the result of the creator’s own efforts. This grants the creator the exclusive right to reproduce, modify, and distribute the protected work to the public. Thus, if web scraping software makes unauthorized copies of protected works, such scraping likely infringes the copyright in those works. For instance, copies are created when data is collected for processing, such as aggregation and compilation, meaning that a reproduction takes place. The same data is made available to the general public when it is included in a new product available to the public or posted on a website accessible to others.
In some instances, websites may also be protected by the database right, meaning that web scraping may also infringe the rights of database producers.[5]
User agreements
Under Swedish contract law, a website user can, under certain circumstances, be bound by a website's terms of use merely by engaging with the website. This means that such terms may apply to scraping operations conducted on the website. The CJEU has confirmed that the holder of a publicly accessible database is free to impose contractual conditions on the use of its database, including provisions against scraping.[6] It is also common for platforms such as Facebook, LinkedIn and Bloomberg to prohibit scraping and other automated information collection on their websites via their terms of use.
In essence, if a website’s terms of use prohibit data extraction, using a scraping tool in violation of such terms risks breaching the contract, which could lead to damages, injunctions preventing the use of the data, or other consequences. For example, Facebook has deleted accounts, apps, and pages from foreign companies that provided analytical services in violation of Facebook’s terms of use. Accordingly, it is advisable to investigate whether the website from which information is to be extracted offers compliant ways of doing so, such as through specific integrations and APIs.
Summary and conclusion
Web scraping may offer benefits for Fintech companies, including the ability to optimize strategies, understand market trends, and enhance customer experiences. However, it comes with complex legal challenges. To leverage its advantages while avoiding legal pitfalls, it is crucial for companies to ensure compliance with relevant regulations. This includes conducting thorough legal audits, staying updated with changes in laws and regulations, and implementing robust data protection measures.
[1] The Dutch Data Protection Authority, Guide to scraping by private individuals and private organisations (in Dutch), May 2024.
[2] EDPB, Report of the work undertaken by the ChatGPT Taskforce, 23 May 2024.
[3] IMY report 2021:1.
[4] EDPB, Report of the work undertaken by the ChatGPT Taskforce, 23 May 2024.
[5] CJEU, Innoweb BV v. Wegener (C-202/12).
[6] CJEU, Ryanair Ltd v PR Aviation BV (C-30/14).