Do click-throughs indicate relevance and value?

Have you noticed that since the reBlogger 3.1 update (on the 22nd of Feb) our reBlogger has been getting a LOT of posts every day? About 49 per day – a whole page!

reBlogger.com will have this problem AND MORE, so I was thinking more about how to encourage relevance – how to find the signal in the noise, how to find the needle in the haystack.

Of course, since RB will not by default display the newest daily content (there is too much anyway) it will ask for a search term first and THEN display the newest content – AJAX style. :)

You can then add more search terms and exclude words… and if you want, you can group those search terms together under a label (keyword/category) for easy storing. To store, so you can come back, you make a login. So that is an easy solution to having too much information thrown at you.

However, I have been thinking, we can count the number of people who clicked through to the original blog. I think when you have clicked through to the author, you have decided that in fact this author knows their stuff. You're voting for the author. So we can infer "authority" based on number of click throughs.

After applying the search terms and exclude terms, if we then order the content based on authority (not on time) we will be displaying the best content at the top!
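Here is a rough sketch of that ordering idea. It is only an illustration, not how reBlogger actually works: the `Post` class, the `click_throughs` field and the `rank_by_authority` function are names made up for this example, and it assumes that every include term has to appear in a post.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    text: str
    click_throughs: int  # assumed: how many readers clicked through to the original blog


def matches(post, include, exclude):
    """True if the post contains every include term and none of the exclude terms."""
    words = set(post.text.lower().split())
    has_all = all(term.lower() in words for term in include)
    has_excluded = any(term.lower() in words for term in exclude)
    return has_all and not has_excluded


def rank_by_authority(posts, include, exclude=()):
    """Filter by search terms, then order by click-throughs instead of by time."""
    selected = [p for p in posts if matches(p, include, exclude)]
    return sorted(selected, key=lambda p: p.click_throughs, reverse=True)


posts = [
    Post("A", "seo contest update", 3),
    Post("B", "seo contest results and spam", 40),
    Post("C", "seo contest winners", 12),
]
for post in rank_by_authority(posts, include=["seo", "contest"], exclude=["spam"]):
    print(post.title, post.click_throughs)
```

Post B has the most click-throughs but is dropped by the exclude term, so filtering happens first and authority only decides the order of what is left.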

built_with_reblogger2006.gif

Web 2.0 Open API thoughts

How To Roll Out An Open API.

I now have full access to your APIs. Total time taken: 2 minutes tops.

If it takes 15 minutes to fill out the form and then two business days for a human to “approve the request”, you’ve already lost a huge percentage of developers. Think of it like this: you want the developers to advertise your service, carry your product, or pitch your full service to their bosses. Why would you make it hard for them to do this?

Nat sets the record straight and shares excellent perspectives….

Making it fun to explore

Flickr's goal is to make photos interesting. So I checked out their calendar. Click a day and you see pics from that day. It's interactive. It's quick. I love it. How does this differ from how I want to be able to explore the best that blogs have to offer? Not at all. Now look at the context in which they present a photo – tags, related items, scrollable thumbs, a photostream and more. I absolutely love the ability to add notes to photos. Could we add notes to blog posts?

So they are really making photos more interesting! It's fun to explore well-designed photo collections (and it looks fun to store my own photos – had I not already invested MASSES of time in storing my photos here at textamerica.com)

Now look at Yahoo Avatars. By playing with this (absolutely amazing Ajax) application I've told Yahoo a phenomenal amount about myself – my taste in clothes, sports preferences and hair etc. That's interesting, but what really captures me about this application is the way I explored and built up my data. How is that different to what I want to be able to do when I explore blogs?

I'm drifting from the user interface we designed for reBlogger where:

  1. step 1 is to define and include keyword and then
  2. begin to define exclude keywords and gradually build more includes
  3. until I have so many includes that I really should define a category… which is essentially the name of my "view".

But when I look at Flickr and Yahoo Avatars, I'm wondering if we could invent an entirely new and better visual way of exploring literally millions of blog posts to find just what I want?

As I wrote those words "add notes to blog posts" I suddenly remembered Ivan's idea of building a layer OVER blogs, so casual visitors can add meaning. I can smell the idea breakthrough… it's coming soon, sooooon.

Filter + popularity /= exploring

Ivan is right on the money in saying we must build exploring software rather than searching software. You can read blogs and their contents on bloglines but can you find and extrapolate what you're searching for?

I'm convinced we must target a company like Microsoft and give them a way to find and compare data contained in blogs. On a corporate intranet which I saw, they were clearly wanting to collate data from blogs, but it was useless.

I'll be honest and say that reBlogger is good. It uses inclusion and exclusion of keywords to ensure it's always on signal (there's no interference) but what I've come to see is that when I'm trying to find lots of info on a particular topic, there is still so much info on there that it's hard to collect exactly what I want side by side.

So we're always on topic and on signal – but the signal waxes and wanes so much. We need a way to bring good content to the fore and drop bad content to the back. Yahoo knows this and is buying up social applications left, right and center, because people are identifying what content is good and what is bad.

Technorati also sees the need to find what sources are better than others. They have an authority slider. Interesting perspective here. But it's a filter. It reminds me of this awesome Amazon AJAX Diamond search.

No, I really think filtering and popularity is not far enough up the tree. We need to go higher.


Ten Things To Think About

Guys, as we build our system, let's keep these in mind. (Thanks to Dion Hinchcliffe) What a tremendously insightful list of guidelines! Rachel Cunliffe simply calls it "me first". Both posts are excellent reading.

Encourage Social Contributions With **Individual Benefit**

Make Content Editable Whenever Possible

Encourage Unintended Uses

Provide Continuous, Interactive User Experiences

Make Sure Your Site Offers Its Content as Feeds and/or Web Services

Let Users Establish and Build On Their Reputations

Allow Low-Friction Enrichment of Your Information

Give Users the Right To Remix

Reuse Other Services Aggressively

Build Small Pieces, Loosely Joined

I think the key point that Dion is making, which underpins all the rest, is this:

The idea is that most people will not spend the time to contribute content or enrichment to a web site unless they are getting something out of it. With social bookmarking, it's the fact that your bookmarks are uniquely valuable to you personally, regardless of whether they are socially shared.

No matter how great our ideas are, how flexible, how WS integrated, how small, how layered, how whatever… the contributor must receive real and immediate value. Only after that can we expect submissions, and only after that can we expect our social software to kick in and begin mashing or enhancing the value of having brought information together.

blog content ownership and control

What if a website doesn't want Google to index it and redisplay their content in the index? What do they do? They set up a robots.txt of course.
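For anyone who has not looked at one, here is a small sketch of how a well-behaved bot reads a robots.txt, using Python's standard `urllib.robotparser`. The robots.txt contents, the bot name `ExampleBot` and the example.com URLs are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that keeps every bot out of /private/
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A polite crawler asks before fetching each URL
allowed_blog = parser.can_fetch("ExampleBot", "http://www.example.com/blog/post.html")
allowed_private = parser.can_fetch("ExampleBot", "http://www.example.com/private/notes.html")
print(allowed_blog, allowed_private)
```

The point is that the site owner states the boundary once, in a known place, and the software on the other end checks it before doing anything.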

RSS/ATOM/RDF feeds were initially set up so people could use them for free. It was free beer (meaning: at no cost) and free speech (meaning: freedom) all rolled into one. But now all this free content is being used cleverly by companies to earn money. So my free beer in my RSS feed is becoming beer for sale on your website. No wonder people cry 'no fair'.

What's the solution? I think there are three ways to go about this:

  • Bloggers include YPN/Google (or AmazonSense) ads in their posts and then set the posts free to show up everywhere, on any website, as long as each post continues to contain the blogger's publishing ad (the tax for displaying the content)
  • Bloggers force people to click through to their own website by stating somewhere that commercial publishers can only use the first X chars of the post
  • Bloggers state somewhere that no one (not reBlogger, not Technorati, not bloglines) can reprint their content.

OK, so I've said this before – perhaps not as clearly. I italicized "stating somewhere" because that is the key thing.

Setting boundaries:

  • If you have a website, you set boundaries (go/no-go areas) for bots with a robots.txt
  • If you write software you have a terms of use (free, shareware, etc.)

But how do you set boundaries and publishing rights for your blog or news feed?

On this page, which is a list of posts by the blogger called "Search Views"
http://www.seodata.com/Keyword-weight-density/blogger/re-76_SearchViews.aspx
we must develop something like 'this is my feed, I want to remove it' which leads to a page that says something like:

Remove my feed

We recognize that the author owns their content and we are taking steps to enable the author to protect their content. If this is your blog feed and you do not want it to appear here or on any other website which aggregates content, you need to insert xyz into your rss feed and on the next fetch by our software, we will automatically remove your feed and remove all pages related to your feed. Inserting xyz into your rss feed is the same as placing a robots.txt onto your site, to tell robots you do not want to be crawled. We apologize for the inconvenience caused to you.

There is perhaps an attribute which you can set in the element XYZ in your feed:

  • XYZ 1 – full feed syndication (it is assumed that the displaying website does not remove your adverts from Google or YPN or other, and a link back to your site is provided)
  • XYZ 2 – Partial syndication (In RSS 2.0 the website must use the shortened description, not the full feed, in other feeds the website only displays the first X chars and the reader is forced to click through to the originating website)
  • XYZ 3 – No syndication (this removes the feed and all posts from reBlogger, bloglines, Technorati etc.)
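To make the three options concrete, here is a rough sketch of how an aggregator could honor them. Everything specific in it is an assumption for illustration: the literal element name `xyz`, the `level` attribute, the 200 character limit, and the choice to treat a feed with no marker as full syndication are not part of any standard.

```python
import xml.etree.ElementTree as ET

FULL, PARTIAL, NONE = 1, 2, 3  # the three XYZ options listed above


def syndication_level(feed_xml, default=FULL):
    """Read the publisher's wish from a hypothetical <xyz level="..."/> channel element."""
    channel = ET.fromstring(feed_xml).find("channel")
    marker = channel.find("xyz") if channel is not None else None
    if marker is None:
        return default  # assumption: a feed with no marker is treated as full syndication
    return int(marker.get("level", default))


def text_to_display(description, level, max_chars=200):
    """Return what an aggregator may show for one post, or None for no syndication."""
    if level == NONE:
        return None
    if level == PARTIAL:
        return description[:max_chars]
    return description


feed = """<rss version="2.0"><channel>
<title>Search Views</title>
<xyz level="2"/>
</channel></rss>"""
level = syndication_level(feed)
print(level, text_to_display("A long post about keyword density...", level, max_chars=11))
```

The check happens on every fetch, so a blogger who changes the marker gets the new behavior the next time the feed is read.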

The XYZ element is a tricky thing in XML. It is straightforward to do, but getting it right is important.

These are just my ideas. I don't know what it should be – but it should be extensible. Clearly it is something that will eventually need to be submitted to a standards body. But this solution will right away get their stuff (which they do own and should control) off reBlogger (and other aggregation) websites.

Aggregation is the way of the future, because society always mashes things together to form new super-structures, but only when the rights of the person/company providing the various contributing aspects are protected.

Syndication, mashing, aggregation will become a tidal wave… so we have to find ways to protect the authors. If only we had done this for painters and sculptors – they sell the content once and can never collect a usage tax ever again. :(


1 billion blogs by 2010… exploring the tree

1 billion blogs by 2010? How did this happen?

Truly the next killer app of the internet is 'self expression'.

Email and IM were the first killer apps. Why? Humans are social creatures and therefore making contact with others is vitally important. In terms of technology, that's the low hanging fruit. People connected to the internet to get in touch with each other.

When they connected, I guess we all thought the same thing: what do I do now? Higher up the tree is not just connecting to other people, but interacting with them. So then we saw podcasting, blogging, discussion lists, owning your own website, leaving comments after reading an article someone else posted. These are all further up the tree and a little more interactive, dependent on other people already. A term called prosumer (consumer + producer) would best describe these types of people: they are very active.

I think we're about to see something newer. (I'm late to this game, I can't claim any credit for this. I'm simply standing on everyone else's shoulders who have gone before.) The next level up the tree is all about expressing yourself and showing your world view and how you piece life together.

Clearly blogs which are used for self expression are a form of this. Podcasting is the epitome of the prosumer, the highest that level will get. Podcasts are linear, from the beginning to the end. The viewer is a passive consumer, they are not as interactive, they are more of a throwback to TV. Portable TV.

Podcasts have "normal" humor (did you hear the one about…) while blogs can invent new kinds of interactive jokes. I'm saying Podcasts are the end of the previous trend.

Blogs tend to be more about self-expression, interactive, non-linear – and self-expression will only pick up steam because there are so many things we can express about ourselves. One of these things which we are always trying to express in every conversation we have is: our world view. How many times have you said "well, I don't see it that way" or "I respect your opinion and you have to respect mine too!".

Self expression is currently about writing out your thoughts. That's cool. Self expression can also be about tying together many many other people's comments and bringing them together in a way which defines your view on life.

While everyone else is simply tagging the river of news as it flows by, we will allow you to make your own streams and rivulets and redirect them where you think they should go.

It's the whole tomato, tomato thing. I see the word stream and think river. You may see stream and think "Standard Tensioned Replenishment Alongside Method", or "Stratosphere-Troposphere Experiments by Aircraft Measurements", or "Stream Transport and Agricultural Runoff of Pesticides for Exposure Assessment Methodology", which all share the acronym STREAM.

The news is the same, but my context is different to yours. You only have to look at the whole cartoon debacle to see that people view the same information completely differently and with different intensities!

Tracking memes… and our world views

This is an interesting article called 'Rating the Meme Trackers – Memeorandum still tops, but Topix and TailRank up there too'. Gabe Rivera is interviewed about Memeorandum. The TechCrunch full list of meme trackers is here.

My fav quote is:

The space is clearly hot, with both funded and unfunded companies rushing to release products. The goal? Leverage all of the great edge blog content out there, figure out what’s hot at any given time by analyzing who’s linking to who (as well as other tools) and presenting that hot content to users.

I'm looking at the list and I know we've got something different here guys. :) We're not going to present one view on the world of information, forcing all the visitors to read the content through that "lens". We're going to let users and visitors create their own views.

Why should the website be the "newsmaster" when it's the users who have entirely different points of view and opinions? Why should I view news items that are – to me – off topic, because you included those items? I'd like to have my world view, my lens/radar, my own river of news.

Go to digg.com and you'll be presented with their view on the world. Wouldn't you like to see all the world's information presented in YOUR own view? Including your keywords and excluding the keywords you have no interest in?

Maybe personalbee is heading in our direction, so we'd better get on with the coding! :)

What's so cool is that our technology is already built: reBlogger.

Tim Bray agrees with me

… or… since he’s Tim… I agree with him? He says: “XHTML + Microformats · If you’re delivering information to humans over the Web, even if you don’t think of it as “Web Pages”, it’s almost certainly insane not to use XHTML.”  Found via Microformats

Wikigate

How can we add meaning?

What I want right now is to be able to choose some tags and collate them together into a keyword which I choose. So if I was tracking the congress-wikipedia debacle then I’d choose some inclusion tags (+wikipedia, +congress) and choose some exclusion tags (-bush) so that I exclude off topic posts. Then I’ll save these tags to my chosen view “Congress caught red handed” or a more cool new folksonomy (user-invented word) like: “wikigate”.

Now when I visit the site I can follow my wikigate story. As the drama unfolds, I can add more tags (like subpoena and 401k) (hehehe!) that I’m watching for, and they will filter up into my wikigate view. I may add more views that reflect how I track my world around me. All my views could be collected onto a world view page – a page that uniquely reflects me.

Maybe someday I’ll see a view that someone else created to track all Microsoft employee blogs (like Mini-MSFT) and I’ll like that view, but decide that I’d change a few things, exclude some topics and generally do a cleanup. So I import that view and edit it and republish it! Now users can try that other view and my view and enjoy the one which most closely fits their world view.
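A saved view like this can be sketched as a small data structure. This is only an illustration under a few assumptions: the `View` class and its `derive` method are hypothetical names, a post is assumed to arrive as a plain list of tags, and a post is assumed to need every inclusion tag to qualify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    name: str
    include: frozenset
    exclude: frozenset = frozenset()

    def matches(self, post_tags):
        """True if the post carries every inclusion tag and no exclusion tag."""
        tags = {t.lower() for t in post_tags}
        return self.include <= tags and not (self.exclude & tags)

    def derive(self, name, add_include=(), add_exclude=()):
        """Import this view under a new name with extra tags; the original is untouched."""
        return View(
            name,
            self.include | frozenset(add_include),
            self.exclude | frozenset(add_exclude),
        )


# +wikipedia, +congress, -bush saved under a user-invented name
wikigate = View("wikigate", frozenset({"wikipedia", "congress"}), frozenset({"bush"}))

# Someone else imports the view, tightens it and republishes it
mine = wikigate.derive("wikigate-subpoena", add_include={"subpoena"})

print(wikigate.matches(["Wikipedia", "Congress", "edits"]))
print(mine.matches(["wikipedia", "congress", "edits"]))
```

Keeping views immutable means an imported and edited view never changes the one it came from, so both can sit side by side for other users to try.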

That’s where I think it’s going… content will be ubiquitous… the real social value will be in expressing ourselves, our views on life and expressing our individuality, our way. And it will be easy, free and fun.

Content + meaning = Web 2.0

We will see vast proliferation of content, repeatedly repeated, over and over. Links back and forth. Relationships forming and breaking everywhere.

I pity the engines, Google in particular, because they are predisposed to deciding what constitutes good content by analysing inbound links or internal linking or theming. But in web 2.0 content will be everywhere, constantly duplicating, being mashed into this format, that format, assembled into this mash-up, that folksonomy (user-generated word, a word invented to serve a purpose).

Engines will have to get smart about context. There is often a difference between the meaning that I have when I say something, and the meaning you perceive when the words enter your mind.

So if the same content is available in 20 different places, then which one should I be sent to? Google will have to figure out what context I am coming from and then “remember” which contexts the various 20 copies of the content are being used in, and send me to the copy of the content which most closely matches the meaning I am looking for.

I hope that makes sense, because IMO that is what social software is building up to: adding meaning. I take your content and add MY meaning to it. You take my RSS feed from this site and mash it into YOUR site, giving it a whole new meaning. You’ve added to my content, by mashing it into your site.

Social software is not there yet, we’re not yet adding meaning. Del.icio.us doesn’t add meaning, it just collates and exposes in a more useful way. Technorati also doesn’t add meaning, it collates and exposes. Social software is not yet mature, it’s in its infancy.

However, I still own my content, you can’t tamper with it, and if I include YPN in it, you must honor my YPN and allow me to keep earning from my content indefinitely. I put my blood and sweat into writing this, so I should keep earning from it. You can use it, but you must honor my ownership, for example provide a link back to me etc.

Content theft or revenue generation?

Ah yes… here we come to the emotional part: are bloglines, Technorati etc. stealing content? I've been mulling the moral issues for a while and here are some solutions that I have come up with.

Before I get to the solutions, I should clarify a few things:

  • We must accept that the web 2.0 is not the web 1.0
  • The web 2.0 is NOT about Ajax. It IS about mashing (mash-ups, compiling new things out of several old things).
  • It's not about content theft, it is about syndication – hence the acronym RSS (Really Simple Syndication – implying you actually want to be syndicated!).

So there are three solutions that any content creator can implement, in my opinion:

  • The teaser strategy: keep your content on your own website and only publish the first 20 words of your writing in your RSS feed. You then earn income from people clicking through to your site. If your teaser is not attractive enough, they won't click through. If you don't have keywords in your teaser the software won't pick it up and people won't read it.
  • The free for all strategy: publish all of your content and put YPN in the post. By publishing all of your content in your RSS feed, you maximise the chances that your post will be substantially replicated around the various sites. By including YPN in your post, you ensure that the more your excellent post is copied and read and copied and read, the more you earn. And why shouldn't you? You wrote it and it's good enough to be copied! IMO you should earn for as long as the post is read. I have compassion for artists who only earn once and never earn again as the painting is looked at for decades and sold over and over at inflated prices. Not so for the blogger who puts YPN in their post.
  • For the love of it: this person doesn't want income and doesn't want syndication, which is fair enough… it's their content! I suggest inserting something like a "robots.txt" into your feed, explicitly saying "do not crawl me". Then software like ours and Technorati and bloglines will know to keep out.
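The teaser strategy is simple enough to sketch. The function name `teaser` and the trailing "..." are choices made for this example only; the 20 word default comes from the first bullet above.

```python
def teaser(text, max_words=20):
    """Return the first max_words words of a post, marking any cut with '...'."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


post = (
    "Microformats are a way of adding simple meaning to plain old HTML "
    "so that people and software can both make sense of what you publish "
    "without anyone having to learn a new markup language first"
)
print(teaser(post))
```

Since the software only sees the teaser, the keywords you want to be found for have to land inside those first 20 words.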

Ok, so… is that it? The whole content theft issue is resolved just like that?  :)

Nomenclature is so… yesterday. Folksonomy!

I'd love to be able to go to a site and define my own nomenclature (wikipedia says it's a system of naming things).

But wait, if I'm taking a bunch of existing words, grouping them together and calling the group whatever I want to call it, then that is Folksonomy. Wikipedia says folksonomies are user-generated and disorganised… but they will eventually result in the semantic web – I couldn't have said it better myself.
Do you see where I am going with this?

Yahoo 360 says you should subscribe to a bunch of feeds and read them, but the problem with this is that only 1 in 20 of the posts is actually interesting to me. All noise, very little signal.

Technorati is fabulous at finding tags and related tags in posts… but they don't exclude other tags and don't group tags for me, I can't import other people's tags and improve upon them. They have missed the social aspect of things.

We're going to build something so cool that when you see it, you'll have fun. It's user-centric, sharable, it adds value to the original post (without changing the original post).

Keeping up with the news… social software style

Whenever I find items I want to follow, I dash into reBlogger (our product we're building) and write up a keyword and add the tags that I'm searching for. I've made one here for Microformats and tags which collects posts on 2 terms from 100+ blogs and collates only the posts that relate to the terms I'm looking for – which then feeds up to the keyword (like a category) which I have defined.

If you're reading this, you're wondering if this is just a plug for our software. It's 10% plug and 90% me getting excited about where this is all going. Read on…

I also enjoy following the various seo contests and so I created a "seo contest" keyword for that too. Now in one place I can view what people are writing about with respect to my two favourite hot topics right now.

Actually, I am fascinated by the whole wikipedia and US Government/Congress thing that is going on. Today I read that the entire congress IP address range is banned. It's hilarious! But unfortunately I can't make a keyword with sub-search terms to collate on our website… because our website is focussed on SEO and this stuff would be off-topic.

Where can I go to track a bunch of related keywords in one central location – and maybe track several of them… and build up a world view… my own view on the world? Sigh, there isn't such a place. But wait, we built reBlogger, so let's build this new free service too!

So, that's how our new free service on the web was born. Cool huh?

I realised that this ability to search and collect and display in a convenient way that suits only me, is only available on my site. It's not on yours and it's not freely available for everyone in the way that digg.com or bloglines.com is available.


RDF… nope. XML… hmmm. Social software – yes!

I (like so many others) really thought that RDF had a chance. After all, machine readable sounds so cool… say it with me “machine readable”. Surely if we could delegate understanding content to a machine then they would do it better than us. What was the problem then? As an author of a book on XML (which sold 18,000 copies and paid for several great holidays) I was fairly confident I’d be able to read RDF. I looked at it, was stumped and gave up.

“Ah, XML you old beast, you’ll do the job” I said. “You’ll make the web structured by letting us publish and mark our content up. What’s this, competing schemas? Repositories? WSDL(s)? No matter, Microsoft Biztalk will help translate between the different markups.”

Hmmm… it just didn’t sit right. Maybe that will work for rich banks, governments and Big Co’s, but not for me, not for the individual.

I sort of gave up and didn’t do anything. For years. I read Kurt Cagle’s stuff, I admired his passion to keep banging away on his drum – even if nothing changed. I watched my friends from yesteryear go forward, onward and upward, but I don’t think bigger was better. There had to be a better way. Something for the rest of us.

I’ve discovered social software and Microformats. I’ve seen the light. The individual is driving the agenda once again (just like Dave Winer said many many times… or at least I think he has said it, or thinks it). Google and the engines will battle to keep up!

This is how it should be. I’m writing. I’m having fun again.

HTML wins (AKA the little rel- that could!)

I found the rel-nofollow a while back but didn’t “get it” until yesterday when I found rel-tag and then all the other microformats related to the rel HTML attribute. The lights came on for me. RDF wasn’t going to do it. XML got lost in politics and while the semantic emperors fiddled, social software came of age and began to solve the burning problem!
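To show how little it takes, here is a small sketch that pulls rel-tag tags out of ordinary HTML with Python's standard `html.parser`. In rel-tag the tag is the last segment of the link's URL, not the link text. The class name `RelTagParser`, the helper `extract_rel_tags` and the sample links are made up for this example.

```python
from html.parser import HTMLParser


class RelTagParser(HTMLParser):
    """Collects the tags from <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "tag" in rel_values and href:
            # The tag is the last path segment of the href
            self.tags.append(href.rstrip("/").rsplit("/", 1)[-1])


def extract_rel_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags


sample = (
    '<a href="http://technorati.com/tag/microformats" rel="tag">Microformats</a> '
    '<a href="http://www.example.com/some-page" rel="nofollow">a link</a>'
)
print(extract_rel_tags(sample))
```

No new schema and no new parser are needed: the meaning rides along in markup that every HTML author already knows how to write.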

Introducing Mark

The team will introduce themselves in future posts.

I am busting to share some of the things I find fascinating about what I see going on in the web. Before I get cranking with my thoughts I figure the polite thing to do is introduce myself. Back in 1999 I began to write a book called XML Programming with VB and ASP in which I wrote about some of the ideas I thought the web could achieve.

Yesterday I literally bumped into some concepts which I recognised. Since then I’ve realised that while most of us thought the XML (or RDF) crowd would revolutionize the world of data, in fact – through sheer weight of numbers – the good old scrappy and disorganised HTML editors, writers and users will be the ones who categorize the data on the web.

Ah but I’m getting ahead of myself. Hehehe. I’m meant to be introducing myself, not talking tech. Well… not just yet. Not until the end of this post.