Lawsuit Accuses Anna's Archive of Hacking WorldCat, Stealing 2.2 TB Data

ancuuiqter@lemmy.world · 2 years ago

MotoAsh@lemmy.world · 2 days ago

deleted by creator

Snot Flickerman@lemmy.blahaj.zone · 2 years ago

https://annas-blog.org/worldcat-scrape.html

WorldCat

That is when we set our sights on the largest book database in the world: WorldCat. This is a proprietary database by the non-profit OCLC, which aggregates metadata records from libraries all over the world, in exchange for giving those libraries access to the full dataset, and having them show up in end-users’ search results.

Even though OCLC is a non-profit, their business model requires protecting their database. Well, we’re sorry to say, friends at OCLC, we’re giving it all away. :-)

Over the past year, we’ve meticulously scraped all WorldCat records. At first, we hit a lucky break. WorldCat was just rolling out their complete website redesign (in Aug 2022). This included a substantial overhaul of their backend systems, introducing many security flaws. We immediately seized the opportunity, and were able to scrape hundreds of millions (!) of records in mere days.

After that, security flaws were slowly fixed one by one, until the final one we found was patched about a month ago. By that time we had pretty much all records, and were only going for slightly higher quality records. So we felt it was time to release!

MotoAsh@lemmy.world · 2 days ago

deleted by creator

Dkarma@lemmy.world · 2 years ago

“AI models are technically theft: they weren’t licensed to commercially profit off of 99.99%”

This is simply a lie. There is no license like what you describe. You never need a license to view or learn from something given away completely free on the internet. You guys keep pretending there’s a law that says otherwise. There isn’t, or you’d post it.

Copyright does not cover viewing or experiencing a piece.

MotoAsh@lemmy.world · 2 days ago

deleted by creator

BearOfaTime@lemm.ee · 2 years ago

Honest question: if you connect to, say, an FTP server, and there’s no dialog presenting a EULA, would you be bound by one?

I don’t know how they got the data, but the whole EULA thing would rely on there being proof Anna agreed to one, right? That seems a bit tricky. As for “unauthorized access”, if a path is available, and Anna used it, again with no warnings, where’s the legal line?

I’ve been in civil court a few times, and judges will ask people “do you have a document proving there was an agreement?” about any claim that could be misconstrued or that rests only on a verbal agreement.

No document, and the verbal claim is dismissed, unless the other party admits to it in court, in front of the judge.

Just seems to me EULAs are terribly hard to enforce.

Again, I’m more thinking out loud. I have no idea how these cases tend to proceed.

FigMcLargeHuge@sh.itjust.works · 2 years ago

That is going to depend on what type of access the FTP server allows. If it’s anonymous, then I would argue that, no, you cannot be bound by a EULA if no dialog is presented. But the article mentions “In addition to harvesting data from WorldCat.org, the defendants are also accused of obtaining and using credentials of a member library to access WorldCat Discovery Services.” Now, it’s just my speculation, but if they used someone else’s ID to scrape the data, then WorldCat can just produce any agreements that ID’s owner accepted, and those will apply here. Sounds like they done goofed.

Snot Flickerman@lemmy.blahaj.zone · 2 years ago

You are generally required to put up unauthorized access warnings.

Similar to how you have to post “no trespassing” signs if you don’t want to be trespassed.

WarmApplePieShrek@lemmy.dbzer0.com · 2 years ago

That’s not true. Trespass works like that because big corporations don’t get trespassed much, but they lobbied for copyright to be automatic.

MotoAsh@lemmy.world · 2 days ago

deleted by creator

Lawsuit Accuses Anna's Archive of Hacking WorldCat, Stealing 2.2 TB Data * TorrentFreak