Is anyone aware of an existing project that can do something like this:
- Access an RSS feed.
- Parse the contents of the items in the feed, and fetch linked images.
- Take the new feed elements and add them to previously fetched elements.
- Store all of the content in a merged RSS/XML file, or something like a SQLite DB.
Context: I’d like to automatically archive the Mastodon posts of an account. I’d prefer a script or binary I could run on Linux, as I’d likely throw it in a GitHub Action and save the resulting output in the git repo.
I could probably whip something together, but I’m lazy and I’d prefer to use something that already exists.
No, but I have an indirect answer (a method?) for you. There are many open source projects that do this type of work, for example NewsBlur. Maybe you can find a few of these projects in the language you want to use and see how they’re handling it. I would expect there to be some common libraries shared between them.
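If nothing off the shelf fits, the four steps in the question can be roughed out with only the Python standard library. This is a minimal sketch, not a tested archiver. The feed URL, the table layout, and the function names (`parse_items`, `merge_items`, `download_images`, `run`) are all made up for illustration. It assumes a plain RSS 2.0 feed whose items have a `guid` or `link`, and that images show up either as `<img>` tags in the description or as `enclosure` / media `content` elements carrying a `url` attribute.

```python
import os
import re
import sqlite3
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical feed URL; replace with the real account feed.
FEED_URL = "https://mastodon.example/@someuser.rss"

# Naive pattern for <img src="..."> inside the item's HTML description.
IMG_SRC = re.compile(r'<img[^>]+src="([^"]+)"')


def parse_items(xml_text):
    """Turn RSS XML text into a list of dicts, one per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        guid = item.findtext("guid") or link  # fall back to link as the key
        description = item.findtext("description", default="")
        images = IMG_SRC.findall(description)
        for child in item:
            # Plain <enclosure url=...> or a namespaced <...:content url=...>
            if child.tag == "enclosure" or child.tag.endswith("}content"):
                url = child.get("url")
                if url:
                    images.append(url)
        items.append({
            "guid": guid,
            "link": link,
            "published": item.findtext("pubDate", default=""),
            "description": description,
            "images": images,
        })
    return items


def merge_items(conn, items):
    """Insert items not seen before; return how many rows were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "guid TEXT PRIMARY KEY, link TEXT, published TEXT, "
        "description TEXT, images TEXT)"
    )
    before = conn.total_changes
    for it in items:
        # The primary key on guid makes repeated runs skip known items.
        conn.execute(
            "INSERT OR IGNORE INTO items VALUES (?, ?, ?, ?, ?)",
            (it["guid"], it["link"], it["published"],
             it["description"], "\n".join(it["images"])),
        )
    conn.commit()
    return conn.total_changes - before


def download_images(urls, dest_dir):
    """Save each image into dest_dir unless a file of that name exists."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        name = os.path.basename(urllib.parse.urlparse(url).path) or "image"
        path = os.path.join(dest_dir, name)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)


def run(feed_url=FEED_URL, db_path="archive.db", image_dir="images"):
    """Fetch the feed, store new items, and download their images."""
    with urllib.request.urlopen(feed_url) as resp:
        items = parse_items(resp.read())
    conn = sqlite3.connect(db_path)
    added = merge_items(conn, items)
    conn.close()
    for it in items:
        download_images(it["images"], image_dir)
    return added
```

Calling `run()` on a schedule (for example from a GitHub Action) and committing `archive.db` plus the `images` directory would cover the workflow described in the question. Keying on `guid` is what makes repeated runs merge rather than duplicate. Image file names are taken from the URL path, so two images with the same name would collide; a real tool would need something sturdier.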