Data Mining and the Training of Artificial Intelligence Models in Brazil

20 Jun 2025 | Newsletter

Marcos Chucralla Moherdaui Blasi, Lobo de Rizzo, Brazil
Danilo Martins Braga, Lobo de Rizzo, Brazil
Eduardo Medeiros Sampaio, Lobo de Rizzo, Brazil
  1. INTRODUCTION

The rapid development of artificial intelligence (AI) and its influence on content creation have prompted extensive legal debate and regulatory efforts worldwide. A central point of discussion is the use of data mining for AI model training and its potential implications for copyright protection – a topic that intertwines technical and legal considerations. Data mining involves gathering and processing large datasets, which may include copyrighted works. This practice raises questions about the potentially unauthorized use of copyright-protected materials.

While some legal frameworks offer specific exceptions for research and innovation, others impose stricter regulations, creating legal uncertainty for businesses, developers, and content creators alike. Understanding how various countries approach this issue is crucial for striking a balance between fostering innovation and ensuring legal clarity and protection.

  2. LEGAL CONTROVERSY

The primary legal concern surrounding data mining for AI training lies in the automated collection and processing of vast datasets without verifying the copyright status or usage rights of the content involved. Because data mining extracts patterns and insights from large volumes of data, it may process third-party copyright-protected materials – such as text, images, and videos – without prior authorization. Depending on the jurisdiction, such use can expose developers and organizations to liability.

This adds a new layer to the ongoing discussion about the balance between the exclusive rights granted by intellectual property law and the evolving needs of technological innovation. In the United States, data mining for AI could be analyzed under the fair use doctrine, particularly when applied to research and innovation. As noted by Samberg, Vollmer, and Teremi (2024)[1], fair use permits limited use of copyrighted material without prior authorization under certain conditions, such as for educational purposes. However, because fair use is inherently flexible and determined on a case-by-case basis, its application can vary widely depending on factors such as the purpose of the use, the nature of the work, the portion of the work used, and the potential impact on the original work’s market value.

In the UK, the fair dealing doctrine is stricter, allowing unauthorized use only for specified purposes such as criticism, news reporting, or private study. As Owen (2015)[2] explains, commercial use of copyrighted works without permission is typically not allowed, which, in our view, could restrict the training of AI models.

High-profile cases such as Getty Images v. Stability AI and The New York Times v. OpenAI & Microsoft highlight the legal and regulatory challenges surrounding AI training and copyright. Similarly, controversies such as the use of Studio Ghibli’s distinctive artistic style by generative AI models have drawn further attention to the issue. These disputes reflect the complex debate about the boundaries of IP rights in relation to AI and the proper line to be drawn between inspiration, fair use, and infringement.

  3. NATIONAL REGULATION

In Brazil, Articles 46 to 48 of the Brazilian Copyright Law (Federal Law 9.610/98) outline exceptions that allow the use of copyrighted works under certain circumstances, such as the reproduction of small excerpts for specific purposes and the so-called freedom of panorama, which allows the use of works permanently located in public spaces. However, these provisions do not specifically address AI training through data mining, which has led to discussions about the need for legal updates.

The Brazilian Congress has been debating AI regulation since 2020, with Bill 2338/2023 – dubbed the “AI Legal Framework” – currently under review. Approved by the Brazilian Senate in December 2024, it aims to promote innovation while addressing copyright concerns in the AI context. However, in our view, the bill’s provisions on copyright are vague and insufficient to address the matter properly.

The Brazilian Copyright Institute (Instituto Brasileiro de Direitos Autorais – IBDAutoral), in a study published in November 2024, highlighted significant gaps in the bill and submitted its findings to the Brazilian Senate in December. Key concerns include the lack of provisions to: (i) ensure broad access to legitimate sources for scientific research, and (ii) mandate transparency from AI developers regarding the datasets used.

In this context, the Brazilian National Group (ABPI) is playing a critical role, both by promoting forums to discuss the matter and by actively participating in the legislative debates in order to strike a proper balance. It has already been admitted as a participant in the public hearing on the interface between AI and copyright, which is to be scheduled by the AI Special Commission of the House of Representatives.

  4. CONCLUSION

The training of AI systems through data mining presents significant legal and regulatory challenges, particularly concerning the use of copyrighted content. Bill 2338/2023 is under discussion, and the current Brazilian Copyright Law provides general exceptions – such as the use of small excerpts and the freedom of panorama – that could, in theory, serve as a legal basis to justify certain uses of protected works in AI training. Even so, there is still no specific provision or clear legal interpretation addressing this practice.

A modern regulatory framework must strike a balance between fostering technological innovation and safeguarding authors’ rights. Legal gaps should not hinder AI development – but transparency, protection, and fair compensation for creators must be guaranteed. Detailed, up-to-date regulation is essential to create an environment in which innovation and creators’ rights coexist harmoniously.

[1] SAMBERG, Rachael; VOLLMER, Tim; TEREMI, Samantha. Fair use rights to conduct text and data mining and use artificial intelligence tools are essential for UC research and teaching. Office of Scholarly Communication – University of California, California, 12 Mar. 2024. Available at: https://osc.universityofcalifornia.edu/2024/03/fair-use-tdm-ai-restrictive-agreements/. Accessed on: 07 Apr. 2025.

[2] OWEN, Lynette. Fair dealing: a concept in UK copyright law. Learned Publishing, United Kingdom, v. 28, n. 3, p. 229–231, 01 July 2015. Available at: https://onlinelibrary.wiley.com/doi/epdf/10.1087/20150309. Accessed on: 07 Apr. 2025.