Who's Harry Potter?," explores a groundbreaking technique that allows large language models to selectively forget information. Microsoft researchers Mark Russinovich and Ronen Eldan demonstrated that AI models can be modified to erase knowledge of the Harry Potter books—characters, plots, and all—while maintaining their overall decision-making and analytical abilities. The choice of Harry Potter was deliberate, as the series is universally familiar, making it easy for researchers to evaluate the effectiveness of the technique.
The AI industry faces challenges regarding copyrighted material and problematic content in the vast datasets used to train large language models, which power AI chatbots. This has led to legal issues and public scrutiny for some AI companies. The research by Russinovich and Eldan suggests a potential solution by showing that AI models can unlearn specific content without compromising their functionality.
Another study, by researchers from the University of Washington, the University of California at Berkeley, and the Allen Institute for AI, introduces a language model named Silo. This model aims to reduce the legal risks associated with training data by selectively removing information. However, the researchers found that Silo's performance suffered when it was trained solely on low-risk text.
To investigate further, they turned to the Harry Potter books to analyze how individual pieces of text affect an AI system's performance. The researchers created multiple datastores: one including all published books except the first Harry Potter book, another including all books in the series except the second, and so on. Removing the Harry Potter books from the datastore resulted in a decline in the model's accuracy.
Read more on livemint.com