DRM in RSS 2, OPML 2 and ATOM 1

DRM, Digital Rights Management

DRM handles the description, layering, analysis, valuation, trading, monitoring and enforcement of the usage restrictions that accompany a specific instance of a digital work.

Any blog post is a "digital work" and as a creator of aggregation software, the rights of the author are important to me. But I'll go beyond that and say that I want authors to earn from their work and I want our software to handle that for them. I'll go even further and say that our software can encourage and enforce correct usage restrictions too.

To what extent do the existing web feed specs make provision for DRM? Or… to put it another way, if our software were to implement DRM on behalf of the authors, which specs have the features we need and which ones don't?

I've posted about the need for extensions to RSS in order to safeguard the content creators' rights. As a creator of an aggregation product, I think it's a very important topic. Here are some of my posts that contain practical suggestions:

Although these posts are not in the same focus area (rights protection) as my own posts, I've found quite a few comments about the limitations of RSS and the inability to influence the "owners" of the spec who will take charge in dealing with the changes that are needed. Here are some of the posts I have found:

Clearly we need some changes, otherwise companies (Microsoft?) and people will just begin to implement their own changes as they see fit, on behalf of their customers. Another wild west scenario.

OPML

Dave might be onto some of the things I am looking for:

I'm leading a lunch discussion today about Identity in RSS and OPML, particularly OPML 2.0, which has an <ownerId> element for the author's identity. It's specified in 2.0 as a URL, and should plug into the work being done in this community.

The OPML 2.0 spec has some really useful information in the <head> area.

<dateCreated> is a date-time, indicating when the document was created.
<dateModified> is a date-time, indicating when the document was last modified.
<ownerName> is a string, the owner of the document.
<ownerEmail> is a string, the email address of the owner of the document.
<ownerId> is the http address of a web page that contains an HTML form that allows a human reader to communicate with the author of the document via email or other means.
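As a rough sketch of how an aggregator might read those fields, the Python below pulls the ownership-related elements out of an OPML 2.0 <head> using only the standard library. The sample document, the owner details and the function name read_opml_owner are invented for illustration; they are not taken from the OPML spec or from any real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the names and addresses are invented.
SAMPLE_OPML = """<opml version="2.0">
  <head>
    <title>Example subscriptions</title>
    <dateCreated>Mon, 03 Apr 2006 10:00:00 GMT</dateCreated>
    <ownerName>Jane Example</ownerName>
    <ownerEmail>jane@example.com</ownerEmail>
    <ownerId>http://example.com/contact</ownerId>
  </head>
  <body>
    <outline text="Example feed" type="rss" xmlUrl="http://example.com/feed.xml"/>
  </body>
</opml>"""

# The <head> elements discussed above.
OWNER_FIELDS = ("dateCreated", "dateModified", "ownerName", "ownerEmail", "ownerId")


def read_opml_owner(opml_text):
    """Return a dict of the ownership-related <head> fields that are present."""
    root = ET.fromstring(opml_text)
    head = root.find("head")
    if head is None:
        return {}
    found = {}
    for name in OWNER_FIELDS:
        value = head.findtext(name)
        if value is not None:
            found[name] = value.strip()
    return found


info = read_opml_owner(SAMPLE_OPML)
print(info)
```

All of these fields are optional, so the function simply leaves out whatever the document does not supply rather than guessing.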

Dave is clearly interested in taking the long view by including this element:

<docs> is the http address of documentation for the format used in the OPML file. It's probably a pointer to this page for people who might stumble across the file on a web server 25 years from now and wonder what it is.

But OPML is not designed to contain content, but rather to link to content – and perhaps to link to the content which is linked to by that content (recursively). It's very good and useful at that. OPML is not what I'm looking for.

RSS

The RSS 2.0 spec contains only one author-related element for an item, and it's an email address:

An item's author element provides the e-mail address of the person who wrote the item (optional).

I don't think it's sufficient because email addresses change over time. So RSS would not provide enough information for the protection of the rights of the author.

ATOM

The IETF Atom format spec (RFC 4287, not Atom 0.3) has far more useful information than either RSS or OPML in terms of tracking the lifetime of the "item" (content) and in always being able to find the original author. Atom even has a "rights" element. No wonder entire sites are converting to ATOM.

The "atom:author" element is a Person construct that indicates the author of the entry or feed.

The "atom:contributor" element is a Person construct that indicates a person or other entity who contributed to the entry or feed.

The "atom:id" element conveys a permanent, universally unique identifier for an entry or feed.

The "atom:published" element is a Date construct indicating an instant in time associated with an event early in the life cycle of the entry.

The "atom:updated" element is a Date construct indicating the most recent instant in time when an entry or feed was modified in a way the publisher considers significant. Therefore, not all modifications necessarily result in a changed atom:updated value.

The "atom:rights" element is a Text construct that conveys information about rights held in and over an entry or feed.

I really like the foresight of this next element!

If an atom:entry element does not contain an atom:rights element, then the atom:rights element of the containing atom:feed element, if present, is considered to apply to the entry.
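Here is a minimal sketch of that fallback rule in Python, using the standard library XML parser. The sample feed, the entry ids and the function name effective_rights are made up for illustration, and the sketch assumes atom:rights holds plain text (the spec also allows html and xhtml Text constructs, which this does not handle).

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed: the first entry has no rights element, the second has its own.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <id>tag:example.com,2006:feed</id>
  <updated>2006-04-04T12:00:00Z</updated>
  <rights>Copyright (c) 2006 Jane Example</rights>
  <entry>
    <title>First post</title>
    <id>tag:example.com,2006:post-1</id>
    <updated>2006-04-04T12:00:00Z</updated>
  </entry>
  <entry>
    <title>Second post</title>
    <id>tag:example.com,2006:post-2</id>
    <updated>2006-04-04T12:00:00Z</updated>
    <rights>Some rights reserved, see the post</rights>
  </entry>
</feed>"""


def effective_rights(feed_text):
    """Map each entry id to the rights text that applies to it."""
    root = ET.fromstring(feed_text)
    # Rights declared on the feed itself (direct child only), or None.
    feed_rights = root.findtext("atom:rights", default=None, namespaces=ATOM_NS)
    result = {}
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        entry_rights = entry.findtext("atom:rights", default=None, namespaces=ATOM_NS)
        # An entry without its own rights falls back to the feed's rights.
        result[entry_id] = entry_rights if entry_rights is not None else feed_rights
    return result


rights = effective_rights(SAMPLE_FEED)
for entry_id, text in rights.items():
    print(entry_id, "->", text)
```

Note that this only tells the aggregator what the rights statement says. Atom does not define a machine-readable vocabulary for it, so enforcing anything would still need an agreed convention on top.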

Atom does a far better job of giving the elements that can be used to protect the authors of the content. In the two specs above, the main author element is intended to contain an email address. But email addresses change over time – and in this way an author could lose touch with the ways in which their content is being used.

Atom uses this word "person" throughout the spec. What is a "person" in Atom?

A Person construct is an element that describes a person, corporation, or similar entity (hereafter, 'person'). This specification assigns no significance to the order of appearance of the child elements in a Person construct. Person constructs allow extension Metadata elements.

The "atom:name" element's content conveys a human-readable name for the person. The content of atom:name is Language-Sensitive.

The "atom:uri" element's content conveys an IRI associated with the person. Person constructs MAY contain an atom:uri element, but MUST NOT contain more than one.

The "atom:email" element's content conveys an e-mail address associated with the person. Person constructs MAY contain an atom:email element, but MUST NOT contain more than one.
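To make the Person construct concrete, here is a small Python sketch that reads an atom:author element into a simple record. The Person class, the parse_person helper and the sample entry are hypothetical names and data invented for this example.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


@dataclass
class Person:
    """One Atom Person construct: a name plus an optional uri and email."""
    name: str
    uri: Optional[str] = None
    email: Optional[str] = None


def parse_person(element):
    """Build a Person from an atom:author or atom:contributor element."""
    return Person(
        name=element.findtext(ATOM + "name", default=""),
        uri=element.findtext(ATOM + "uri"),      # None when absent
        email=element.findtext(ATOM + "email"),  # None when absent
    )


# Hypothetical entry; the author details are invented.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example post</title>
  <author>
    <name>Jane Example</name>
    <uri>http://example.com/jane</uri>
  </author>
</entry>"""

entry = ET.fromstring(SAMPLE_ENTRY)
author = parse_person(entry.find(ATOM + "author"))
print(author)
```

The uri is the interesting part for rights purposes: it gives the aggregator a way to reach the author that can outlive any one email address.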

Overall I can imagine Atom providing us with enough elements to be able to implement some form of protection for the rights of the initial author.

What is the issue here?

If we don't take action now, we will have a situation where people earn off content in the same way as people earn from paintings. If I paint a wonderful piece of art, I sell it – and that's the end of my revenue. The artwork can be resold 20 times and increase in value 100 times… but I make nothing. Speculators make everything, I get nothing.

Without protecting the author and providing them with income, we really cannot expect to see the emergence of professional authors who create great content over the long term.

Specs

Here are links to the specs:

This is an important issue to me because we're building the reBlogger website-based aggregator and I want to honor the digital rights of the author… but I can't programmatically determine what their rights are!


UGC – User Generated Content (redux)

Redux: revisited (yet) again.

Is UGC (user generated content) big? I think it hasn't even begun to impact things. Take a look at this Google graph (below) and read the comments on the Google Zeitgeist page about wikis

Wikis

I used to think that UGC was all about blogging, merely creating content. Then I suggested that in fact we might be on the edge of more than just content creation in Semantic mashup artistes. I think that some users want to own and build things. Some people are scripters who build extensions (think Firefox) and some are authors (think blogs). We can create spaces that cater for both.

Take a moment to read 20 Types of Blog Post. The author posits that the types are:

  • Instructional
  • Informational
  • Reviews
  • Lists
  • Interviews
  • Case Studies
  • Profiles
  • Link Posts
  • ‘Problem’ Posts
  • Contrasting two options
  • Rant
  • Inspirational
  • Research
  • Collation Posts
  • Prediction and Review Posts
  • Critique Posts
  • Debate
  • Hypothetical Posts
  • Satirical
  • Memes and Projects

You might wonder why I list these? How does this relate to voting? Currently voting is DIGG style, you vote something up. Some sites allow voting down. Slashdot does this well.

I'm thinking about a different kind of vote. I daydream about a site which allows voting on the type of post this is, and then the system uses the post differently based on the extent to which the post has been categorized. So the UGC interaction has an effect on the way the post is used.

So the impact that UGC might have on the steps involved in searching could be:

  1. Voting for the content being viewed (UGC)
  2. Searching for content of a type (find information with a particular content type)
  3. Viewing the content, which is laid out differently depending on type (UGC)
  4. Provide links from the current position across to other related posts (horizontally to content in the same type or vertically into adjacent types) (UGC)
  5. Allow the user to drag and drop the types to influence which types are adjacent to other types (UGC)
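The first step, voting on the type of a post, could be sketched like this in Python. Everything here is an assumption made for illustration: the function name classify_post, the minimum of five votes and the 50% agreement threshold are arbitrary choices, not part of any existing system.

```python
from collections import Counter


def classify_post(votes, min_votes=5, min_share=0.5):
    """Return the post type most readers agree on, or None if there is no agreement yet.

    votes is a list of type names chosen by readers, e.g. "Review" or "Rant".
    """
    if len(votes) < min_votes:
        return None  # too few votes to say anything
    post_type, count = Counter(votes).most_common(1)[0]
    # Only categorize the post once enough of the voters agree on one type.
    if count / len(votes) >= min_share:
        return post_type
    return None


agreed = classify_post(["Review"] * 4 + ["Rant"] * 2)
too_few = classify_post(["Review", "Rant"])
split = classify_post(["Review", "Rant", "Lists", "Debate", "Research", "Review"])
print(agreed, too_few, split)
```

Once a post has an agreed type, the search and layout steps can key off that value; until then the post would be treated as uncategorized.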

While viewing a post – such as this one here – do you think it's a case study or a review or research? While doing a search for case studies, should this post come up as a search result? If it does, should it be presented in a different context?

Ultimately I really like the idea that humorous posts look, behave and are presented in different contexts to authoritative research posts.

Tags /= keywords. Making a tag search site!

Every now and then I come across a blog that has really great information. Tagsonomy is one of those blogs and in particular I am really impressed by this post: The year in tags. Having all these items together in one post really reinforces the changes that are happening at the moment.

IMHO what is missing is the ability for tags to be compared and matched, so if something is tagged in Amazon, Yahoo and del.icio.us then searching for a tag on wordpress.com reveals content on those other systems – because the various tags are known to be the equivalent of each other.

I'm not sure if that was clear enough. You have probably seen opmlsearch.com right? You can search for a keyword there and find the opml files that contain that keyword. Cool. Is there such a thing as a tag search?

Update (Tues 4 April 2006): I found a tag search site! KeoTag

If I go to a tag search site and look up a tag, will it only return the posts that have that exact tag? What about related tags? What about similar tags? If you're only finding that exact tag, how is that different to a keyword?

Tags are not keywords. Content can be tagged differently on different sites. How we choose to tag content can reveal much about the content and also about our own thoughts.

Tags help to reveal two things:

  1. It reveals my perception of the context of the post I am reading. It's about the post ("I think this is about search", "It's about memes", "It's definitely about tags" or "What a great post about keywords"). A tag is in the eye of the beholder. (Or is that beauty?!)
  2. It reveals my perception of the thing *I* am looking for. It's about me. I am looking for "food", "meat", "a recipe", "research" or whatever.

It's the same piece of content, but my choice of tags can reveal my intention (about me) and can reveal my perception of what I am looking at (the object).

So a tag search site would not function like a keyword search engine (simply looking for a keyword), it will have a deep understanding of core root or stem words and ontology in order to understand what the implied context is (of the post) from the reader's perspective.

When the tag search system can correctly understand the various tags that have been applied, then it can far better understand the post.

So here are the parts of what is needed for a tag search site:

  1. We need to have access to all tags on all systems (APIs from Amazon, del.icio.us, technorati, google etc.)
  2. We know the intention/perception of the person who tagged the content (relatively easy!)
  3. We then need to understand the intention/perception of the searcher (hmmm?!)
  4. Match the two together for a perfect results set containing content which matches my intention/perception
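The simplest piece of this, treating differently written tags as equivalent, can be sketched in a few lines of Python. This is only a toy under loud assumptions: the EQUIVALENTS table is hand-written and hypothetical, the plural stripping is deliberately naive, and nothing here attempts the hard part (understanding the intention of the tagger or the searcher).

```python
import re

# Hypothetical equivalence table: normalized tag -> canonical tag.
# A real system would need stemming and an ontology instead of a hand-made dict.
EQUIVALENTS = {
    "tag": "tagging",
    "tagging": "tagging",
    "folksonomy": "tagging",
}


def normalize(tag):
    """Lowercase, drop punctuation, strip a trailing plural 's', then map to a canonical tag."""
    cleaned = re.sub(r"[^a-z0-9]+", "", tag.lower())
    if cleaned.endswith("s") and len(cleaned) > 3:
        cleaned = cleaned[:-1]  # naive plural handling, wrong for many words
    return EQUIVALENTS.get(cleaned, cleaned)


def search(query_tag, tagged_items):
    """Return the URLs whose tags are equivalent to the query tag."""
    wanted = normalize(query_tag)
    return [url for url, tags in tagged_items
            if wanted in {normalize(t) for t in tags}]


# Invented items, as if collected from several tagging systems.
ITEMS = [
    ("http://example.com/a", ["Tags", "search"]),
    ("http://example.com/b", ["recipes"]),
    ("http://example.com/c", ["Folksonomy"]),
]

print(search("tagging", ITEMS))
print(search("recipe", ITEMS))
```

Even this toy version shows the difference from a keyword search: a query for "tagging" finds items tagged "Tags" and "Folksonomy", neither of which contains the query string.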

This would result in a far better search engine than what we have today.

Update (Tues 4 April 2006): This is a good blog post about tagging: Folksonomies – Tidying up Tags?


Semantic web for real – ZACK

This is a post about the next generation use-driven application. I've written a lot within the company about how I want our products to evolve – particularly limiting myself to a vision which is within reach of what I see around the web (otherwise no one can relate to it). I've been cautious about speaking in a limited way, but this post is a radical call to something vastly different. I'm not the inventor, I'm just standing on the shoulders of great people and commenting on my perception of the future.

This new environment and approach I will call: ZACK. I need a name that can protect the names of the innocent… oh, and to avoid lawsuits. 😉

I am writing as if I am speaking to the inventor of ZACK.

Procedural? Class? Conversational and organic!

In blogs we all persist with a conversational style of marketing. But ZACK will not happen in a spontaneous combustion of conversation. ZACK will not come out of endless conversations, even though its structure is conversational. That is medium/message thinking. No, I think it will happen out of being used. All you have to do is start letting people use it!

Rather than creating YAPLCL (yet another programming language and a class library) and making people use the "improved" class library and having them add on to it, you've done something entirely different. You've made the beginnings of an environment in which people can extend language, and meta-sets of words can construct visual things, movement, time, spatial orientation. If you don't have these things, they will come in time because they all grow out of the language – just like in the real world.

I don't know HOW you built ZACK, but I do see why if enough OSS people got excited by ZACK, it would be extended out of sight.

As far as I can think about it, you've built something that is made out of actual language. All languages have certain core words out of which all the others are built. Wherever ZACK is introduced, it will grow from there organically – sort of like how a fractal can keep expanding from one starting point on a page. So bothering existing organizations is not a good investment IMHO, just let people start using it.

Needs centric

The reason this is so different is that while Gates thinks it's about being "user-centric" (read: User-centric part #1) it is in fact about being needs-centric. For some people that's research, some need a market (market-centric) and some have other needs. In a system where the meaning of a word is contained within the word, all the various knowledge needs of a real person would be met by a system like this.
Or in other words, it's about being like eBay (focussed on helping people do things) or being like that big marketplace company which is allowing its users to write their own code and upload it so that they can sell services to each other. Cool.

But ZACK goes beyond those walled systems. ZACK is community, a market and it is conversation driven from the ground up. It was built using words and it will be extended by extending the words it uses into more meta-words. The more ZACK is used, the more it's extended. The more ZACK gets extended the more it's used. Viral to the max.

I can't imagine how this gets built… I am not a coder. 🙂

I don't know how you'd express the relationships between things in order to indicate context, but my simplistic guess is that the relationships between words (the object) are basically a new kind of "inheritance" between words, so the context is expressed as words. Words feeding off words feeding off words. Words are only useful when surrounded by other words. Words are unintelligible unless surrounded by other words. Words placed together become phrases. Words are the building blocks of ZACK.

For those reasons, I don't see how you avoided an infinite loop, or gobbling up far too much processing power – a high level word (meta-word or combination word) could "consume" or "depend" on thousands of words. heheheh. The mind boggles.

Making money

I won't tell investors they can make money off it (other than with IP and licensing – which will alienate the very people who will make it grow). Why won't it make the inventors money? Because ZACK is free: all a person needs is a server to store it on, and then data can flow freely from it to the other ZACK servers. Perhaps you can sell ZACK identities and the locations of words (which are really objects like hyperlinks, or names, or DNS) and ICANN would indeed be the people who could implement that.

Sure some people will make money. Of course. Amazon. People who sell things. That's normal. And unless Microsoft invents this or buys the company owning it, they should be afraid. The only people who will love this are the people who have learned to live in the OSS world of consulting.

IBM will be big winners. They jumped onto Java. They will jump onto this. They will advise British Airways on how to build words that people can use in their writing so that airline details show up right there in the document. (I can hear Microsofties complain that this is "smart tags" but it's totally better and different). At first British Airways will tell the ZACK (the architecture) to protect the component, IOW don't let anyone grow it. But in this day and age of mashups, see Semantic mashup artistes, they will either open up or be pushed aside by the hordes of "word growers".

What we have seen in Firefox through the creation of extensions will pale by comparison. And if building an extension is as simple as understanding the words, and even better if I already know some of those words, then anyone can contribute extensions. It will explode. People may prepare for meetings by defining the words they use and synching their partners' applications beforehand.

Controlling the meaning of words

Controlling the meaning of words will more than ever before be the way to control society. The US government will be excited. They have not been able to control the internet for a long time and now the internet IS the mainstream (sure it's inefficient, sure it's largely just a platform waiting for a truly good application – but it's still the mainstream at the moment). But if the US Govt can regulate ZACK by law then they can at least HAVE A HOPE of regulating thought again – by controlling the meaning of words used.

For a while FOX NEWS TV was the mainstream – they redefined the meaning of words on a daily basis. But now the mainstream (for word invention and meaning) is blogs – they are wild, woolly and uncontrollable.

If ZACK channels the mainstream, and if the meaning of words is *in* ZACK (not just is ZACK, but is also contained by ZACK, that is to say, through constant use and recombination language actually evolves within ZACK) then they (the Govt) have a chance of re-influencing the mainstream again. They have one place – a pressure point – through which meaning can be controlled: ZACK.

Diversity won't protect it. Even if too many people create alternate words and meanings, society still drives us all to collect behind one or two political parties, one cool event, the coolest RSS icon – the others languish fighting over the leftovers. It happens over and over again.

The dynamics will be interesting. ZACK will no doubt allow me to replace one meta-word with another meta-word, because the new one performs the same function, but because of a different meaning, it gives me new options and the end result is a different output than what I had before. ZACK will offer me other words that other people have created that appear to fit in. Perhaps I will be able to try before I buy… but in the end like with Firefox, most extensions will be free.

But even in a free-for-all medium like the internet, only a handful of websites end up being dominant, and the same may be true of ZACK, because even though the internet is not walled off (like AOL was) we still only visit a few places. Surfing ZACK will be different because you're always IN the ZACK application, but the information will be more recursive, more like Dave Winer's vision for OPML, where you're always in the OPML browser, but RSS feeds and websites and OPML entries all blur together and you just keep drilling down down down as you surf more and more, finding what you want.

However ZACK is different, you don't surf for interest. It's market driven, needs driven, want driven, interest driven. Right now you surf to find something to use. When I surfed for the first time on a green screen at a business I worked for almost 15 years ago, I wondered what to surf for. I was told there were TONS of things to see. I wasn't motivated. ZACK will let you surf things to USE them. If I can't find one, I can make one – and republish it.

Everything is a compound word or a meta-word

But back to the money side of things – ZACK will offer to sell my meta-word or compound-word for me, but in the end the mystery that is Linux, Firefox extensions and OSS will gradually destroy that market. "For sale" components will need to move further and further up the tree as the OSS free-as-in-beer crowd gobbles the low hanging fruit and moves up the tree.

In ZACK, every"thing" is like a hyperlink – except the thing *is* the thing, it's not a representation of the thing. A hyperlink can be edited and improved and republished. The other users can choose to be notified of upgrades to hyperlinks. Since ZACK is simply a compound of invented words (with only the very core root or stem words being protected) the number of notifications of possible upgrades expands astronomically as the words are upgraded through the larger number of visitors. Words are upgraded by inheriting from other words. The number of notifications would rise to become a real problem. The wiki and Slashdot approach could help here: users contribute by voting on the usefulness of words, and notifications below a certain level don't display. See, Relevancy! Relevancy! Devel… err… Relevancy!

Growing language by using and extending language

Or, I am only notified of the alternatives when I want to extend something. Or even better what if ZACK could track my behaviour and my goals and try to help me achieve my goal. If it was that functional (in terms of having functional "words") then it could bring to my attention recently published words that suit what I am trying to build.

That requires very high level behaviour words. The word "behaviour" itself is a compound word and once it is "invented" in ZACK, it can be used and "grown". Most words will be grown by users. Once grown they can be used. ZACK should track my behaviour and experiment with words to find the one that matches my need. The words that must have been grown in order for this to happen include: need, goal and action. Each are compound or meta-words, so many other words need to be grown first. ZACK will ship with a basic set of root or stem words and users will grow it from there.

It won't push any one major language since users will quickly port each word across to their own language, in the same way that the magnificent wiki project is gradually growing (through its contributors) into many major languages.

The impact

ZACK has the potential to impact

  • knowledge gathering and storing and use
  • application development
  • the diversification of language away from English
  • the rapid expansion of the number of words in use
  • rapid expansion of IP owned by individuals rather than companies
  • the identity-ownership relationship will be maintained even while the IP is being replicated thousands of times around the ZACK-web (by comparison to how bloggers lose their ownership of their posts, see blog content ownership and control)

Hard to beat

ZACK will be very hard to compete with, because:

  • people have a tendency to gravitate to one solution, especially if that solution provides everything they want
  • the better ZACK "grows" words, the faster it will consume existing knowledge and language and extend it
  • even if the initial "growing" process is cumbersome, the "growers" (the users, the Firefox extension builders) will grow a better growing process. The better the growing process, the more people it's accessible to. The more people, the more contributions and the more growth.

I'd like to hear thoughts and feedback.

Semantic mashup artistes

Someone makes the car, another person engineers it, another builds the car show to display that year's best cars, another makes a car magazine, another offers to hotrod your car – and so on.

Is writing the blog post the end of the usefulness of that blog? No. Someone writes the blog, another person makes the blogging software, another (YPN) provides ads, another (FeedFlare) makes widgets for feeds. There is a whole ecosystem.

We all know about mashups for using applications (via their APIs) to merge them in new and interesting ways… but in the world of blogs… what about semantic mashups.

As a programmer/project manager/business analyst I discovered there were two kinds of programmers: builders and designers. The builders were the C++ types, they built the widgets (VBX, ActiveX etc.) that the designers (Visual Basic types) would use. You hardly ever found a VB programmer writing a grid, they simply used a grid someone else had already built (usually in C++). It's anathema to a VB coder to write a widget, they use widgets!

This corresponds to the person who builds a chair and the interior decorator who uses the chair beautifully in a room.

This is why in a world of content, the 2.0 thing is to do a mashup. The builders have provided tons of apps for the designers to use in a mashup. PageFlakes shows that this concept is already very mature. Mashup camp shows there is already a mashup ecosystem growing well.

After cars came car shows. After C++ programmers came Visual Basic programmers. When you have enough variety of chairs, styles, colors and shapes – then come the interior house decorators.

Shortly to enter center-left: the blogosphere designers… semantic mashup artistes. The question on their mind is: all this great content lying around in the blogosphere – what can we do with it?