Atlantic reporter Alex Reisner recently uncovered four datasets of music being used to train AI models and made them fully searchable for the general public. Two of the sets are enormous, at 12 million and 9 million tracks. The other two are much smaller, but still represent a significant amount of training data at over 100,000 songs each.
According to Reisner, the sets have been downloaded hundreds of times and, while it’s impossible to know exactly who has used them, Google and Stability have both confirmed in research papers that they have. Some of the sources, like the Free Music Archive dataset, are free to stream for personal use but require licensing for commercial applications.
While the datasets are freely available on the internet in theory, using them as training data is not as simple as downloading a ZIP file and feeding it to an AI model. As Reisner explains:
Three of the datasets I found are distributed as a list of links to songs on YouTube or Spotify. AI developers download the actual audio using tools that automate the job, some of which allow developers to bypass logins, advertisements, and mechanisms that may earn money or subscribers for creators. Such tools violate the terms of service of these platforms.
