NOTE: This is a follow-up to my post "Getting an archive of blog posts".
The state so far:
- TalkDigger (blog) is also struggling with the need for an archive, as I wrote in Getting an archive of blog posts.
- reBlogger is struggling with the same problem as us. They may also consider building a crawler.
- Feedshow are also looking into building a crawler.
- I know of another website that is also looking for an archive.
So there are four companies wanting this… so far. I wonder if all these Web 2.0 voting sites, which depend on user submissions, would be interested in getting up-to-date data from a reasonably priced service? I think so. Technorati is… Technorati, because its archive is so large and its crawl is so complete. That's a huge advantage over everyone else.
Of course, no matter who provides the data, the cost of buying it will be the choke point, but we'll see what transpires. Maybe we'll build something and make it available to everyone else? Could be fun!
What a shame that Alexa doesn't see this need and fill it for us. (hint hint!) But a quick search in their web service returns only 68K posts, which is not much compared with what is out there. Their focus is different: they want to power search engines.
Update (5 April 2006):
- Feedshow have said they can provide a crawler; we're talking. Price will be the issue.
- Steve at Bitshop (our host) is very keen to help with hardware for our own crawling (so he can have access to an archive).
Stay tuned. 😀