Who Are You? Whose Baby Is That? Didn’t I Write That? March 30, 2006

About a decade ago, just after the public Internet made its debut, cyberpunk author Neal Stephenson claimed in one of his novels that copyright violations would run rampant on the Internet, and there’d be nothing we could do about it.

Stephenson’s prophetic words have come true, especially in the blogosphere, where people lift content in its entirety from one website and post it onto their own. These lifters come in two categories, however: those who give credit and those who don’t. By credit, I mean including the copyright notice or, at the very least, a link back to the original article.

Since links are the currency of the Internet, especially the blogosphere, not having at least a link back would make most bloggers upset. Jeremie Miller, founder of Jabber, is hoping to prevent these situations with MicroID.

Miller’s Jabber is the open-source instant messaging (IM) technology behind services such as GTalk. With his new MicroID, he’s hoping to help identify authors of web content. His intent is that authors use MicroID each time they create a new web page. Web services like Google, Technorati, Yahoo, etc., would then use MicroID to determine the true owner of duplicate content.

The idea is simplicity itself: create a unique hash code (alphanumeric string) using two authority codes such as your email address and the root URL of your website where you want to post something. The MicroID that is generated would then be included in a tag on the web page containing your new content. A similar MicroID string could be used for verifying user memberships or validating user feedback for comment moderation.
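To make that concrete, here is a minimal sketch in Python. It assumes the scheme works by hashing each of the two identifiers with SHA-1, joining the two hex digests, and hashing the result once more; the choice of SHA-1, the `mailto:` prefix, the function names, and the example address and URL are all assumptions for illustration, not something spelled out above.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of text as a 40-character hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def microid(identity_uri: str, page_uri: str) -> str:
    """Combine two identifiers into one hash (assumed MicroID-style scheme).

    Each identifier is hashed on its own, the two hex digests are
    concatenated, and the concatenation is hashed again.
    """
    return sha1_hex(sha1_hex(identity_uri) + sha1_hex(page_uri))


# Hypothetical author address and site root, for illustration only.
example_id = microid("mailto:author@example.com", "http://example.com/")
print(example_id)
```

Anyone who knows both the email address and the URL can recompute the same string and compare it with the one published on the page, while the address itself never appears in the page source.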

All very interesting. But flawed. I can see this method of digital identification working for verifying user membership and validating user feedback. It’ll also work for sites that require you to embed a unique code into all of your web pages so that they can track it for you. And when you post new content, providing that search engines index the corresponding web page in a timely fashion, you’ll be authenticated as the original owner of the content.

In fact, if the engines implement a MicroID system for authenticating ownership, it’ll be crucial that they index new content almost immediately. Either that, or they need a system whereby an author sends a packet of information to a “please index” queue. This packet would contain two MicroIDs: one for authentication and one for the actual content. (Okay, just a hash value.)
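A rough sketch of what such a packet might hold, in Python. Nothing here is an existing search engine API: the field names, the `build_index_request` helper, the use of SHA-1, and the example values are all hypothetical, and the packet is only assembled, not sent anywhere.

```python
import hashlib
import json


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of text as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_index_request(email: str, page_url: str, content: str) -> dict:
    """Assemble a hypothetical 'please index' packet for one new article."""
    return {
        "page": page_url,
        # Identifies the author: hash of (hashed email + hashed page URL).
        "microid": sha1_hex(sha1_hex("mailto:" + email) + sha1_hex(page_url)),
        # Identifies the article text itself.
        "content_hash": sha1_hex(content),
    }


packet = build_index_request(
    "author@example.com",
    "http://example.com/new-post",
    "Full text of the new article goes here.",
)
print(json.dumps(packet, indent=2))
```

An engine receiving this could note the arrival time against the content hash, so a copy that turns up later under a different MicroID would at least be flagged as the later claim.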

Otherwise, a lifter could subscribe to the web feed (RSS, Atom) of a weblog or website, find out when there’s new content, and somehow manage to lift an article and get it indexed with their own MicroID before the real owner. Unless I’m dense and I’m missing something.

But MicroID isn’t going to stop actual copyright infringement. Because how hard is it for someone to do a “view source” in their web browser and cut and paste the portion they want? The future algorithms of engines might mean that un-MicroIDed pages may not get indexed. But lifters will still infringe, unless they have either explicit or implicit permission to republish. And who’s going to successfully stop them?


Comments»

1. Jonathan Bailey - March 30, 2006

While I agree that Stephenson was correct that copyright infringement is rampant on the Web, I disagree with the idea that there’s nothing we can do about it. There’s a big difference between nothing being done and not being able to do something.

Simply put, most Webmasters don’t take any real action against those that steal their content and then lament that it happens. They don’t take the time to research and learn what their rights are and what they can do and then invest the time to develop a plan and handle it.

While I won’t say that I stop every case of plagiarism of my work, I do stop at least 95%. Though not perfect, I’m a long, long way from saying that there’s nothing I can do about it.

Regardless, you might also want to look into Numly numbers (http://www.numly.com). Chris has developed a system for tagging data and fingerprinting media files to verify ownership. While it doesn’t prevent content theft, it’s a huge aid in stopping plagiarism and it’s a content management system that is fully functional, with verification, today.

Anyway, thanks for the link and the interesting read!

2. rdash - March 30, 2006

Jonathan: You’re absolutely right. Unfortunately, most of humanity will complain but do nothing. It’s the reason why many terrible politicians win elections. But here’s what I didn’t say: how do you effectively stop people? If they’ve copied your work, how much time and money are you willing to spend to protect your copyright? Some people don’t even know where to start.

How have you forced lifters to remove your content? Threatening to sue may work, if the person can be tracked down. The Berne Convention (first adopted in 1886 and last revised in 1971) is supposed to protect authors in every signatory country. But most people don’t know that. And it still doesn’t make things easier.

What really disturbs me is the increasing trend towards the need for each of us to be digitally registered somewhere, just to protect ourselves. If there’s a way around it, I’d prefer that over biometrics or similar solutions.

3. Jonathan Bailey - March 30, 2006

rdash: Actually, I have stopping plagiarism down to a science. I spend an average of about fifteen minutes per case, and much less on most, since that average includes the small percentage that are stubborn and require more attention.

You actually have a variety of tools in your hands: cease and desist letters, DMCA notices (to hosts and to search engines), shame forums and so forth. I actually have a whole series about that on my site. It’s called “Stopping Internet Plagiarism” if you’re interested.

The good news is that, right now, you don’t have to register anywhere. Registering does help to prove ownership, though, and I don’t see a Numly number as being intrusive. Still, I’ve never based a “win” on any kind of registering I’ve done and I’ve shut down over 300 plagiarists.

It’s all a matter of knowing your rights and knowing how to exercise them. It does take time, but not as much as some fear and it’s far less time than it would take to prove ownership of a work once it spreads like wildfire under a thousand different names.

To borrow from Dennis Miller, but that’s just my opinion, I could be wrong.

4. rdash - March 31, 2006

Jonathan: Very cool. Kudos to you. Seems like you have had a lot of success. I’ll check out your site. Maybe interview you? :) How do you feel about those bloggers/ writers who take the viral route and use article marketing to spread their content? What’s your feeling regarding partial-text vs full-text web feeds?

PS. Miller rocks, but he’s just too smart for most people to follow.

5. Jonathan Bailey - April 5, 2006

Sorry about not getting back in touch sooner. My main computer was, ahem, out of commission for a few days and I just got back around to checking everything.

I’ve had a lot of success, but that comes from trial and error. I learned the hard way and made some very embarrassing mistakes.

If you decide you want to interview me, just send me an email to let me know. I’m available most days during the evening hours.

There’s nothing wrong with the viral content strategy so long as it’s done in a way that ensures the attribution is attached. I think Blogburst has a good approach there and a few other sites are doing similar things. Personally, I love Creative Commons License and copyleft ideals, but I know that they aren’t for everyone.

Truncated feeds are good right now; they defend against about 99% of all content scraping. But that’s about to change. New scrapers are going to pull from the permalink and get the full text that way, thus removing footers and copyright info as well. New techniques will have to be developed to beat that, but that’s at least six months off, if not much longer.

And yes, Miller rocks, but he is over most people’s heads. I refer to him as the “thinking dude’s” comedian.

6. Hans Gerwitz - May 26, 2006

MicroID is about neither “help[ing] identify authors of web content” nor “authenticating ownership.” It is, rather, intended to verify authorship. For identifying authors we need something like PersonCode, and authentication will require signing with something akin to S/MIME. Theft prevention is deeper still and might be insurmountable; just ask the RIAA about that one.

7. rdash - May 26, 2006

Hans, thank you for your comment. I’ll have to check out PersonCode.