RSS crawler and archive

NOTE: This is a follow-up to my "Getting an archive of blog posts" post.

The state so far:

So there are four companies wanting this… so far. I wonder whether all these 2.0 voting sites, which depend on user submissions, would be interested in getting up-to-date data from a reasonably priced service. I think so. Technorati is… technorati, because it has such a large archive and its crawl is so complete. It's a huge advantage over everyone else.

Of course, whoever provides the data, the cost of buying it will be the choke point, but we'll see what transpires. Maybe we'll build something and make it available to everyone else? Could be fun!

What a shame that Alexa doesn't see this need and fill it for us. (Hint, hint!) A quick search in their web service system returns only 68K posts, which is not much compared to what is out there. Their focus is different: they want to power search engines.

Update (5 April 2006):

  1. Feedshow have said they can provide a crawler – we're talking. Price will be the issue.
  2. Steve at Bitshop (our host) is very keen to help with hardware to do our own crawling (so he can have access to an archive).

Stay tuned. 😀



