Thursday, October 11, 2007

YouTube for documents poses a risk to data security

The YouTube problem faced by content producers (e.g. television networks, record companies) has largely been a non-issue for most organisations. It's a big problem for the producers, however. Articles all over the place go on about the billions of dollars in revenue being lost because it's so easy for people to post and watch things on YouTube. Some have given up and embraced YouTube as a place to promote their artists. For example, RCA Records has a YouTube channel where you can watch all their latest music videos...and many old ones too.

The thing about YouTube is that it makes things you want to watch really easy to find. Just search for it. That's the real power (that and they were first to market and are now owned by Google, but these facts don't help with what I'm trying to say). It's a lot easier than asking your friends via email, instant messaging or social networking sites if they have certain files. YouTube also doesn't limit your "search network" to just your friends. You can search for videos posted by millions of people you don't know and will likely never meet.

Peer-to-peer networks, although related to the problem at hand, are another issue altogether. They expose the same types of risks, but when it comes to corporate networks they are not as big a problem as one might think, compared with what I'm about to outline. Allow me to explain.

In a corporate environment, it's relatively easy to control the network traffic and applications that your users are running. With the right tools in place, you can prevent users from installing and running peer-to-peer applications or block the network connections these applications need to function. This is a MUST. Imagine the whole worldwide peer-to-peer network potentially being able to search for proprietary and sensitive information held on your corporate network. Users aren't trying to be malicious most of the time, but they aren't security people either, so they inadvertently leave great big holes in your organisation.

Imagine KFC's list of 11 secret herbs and spices being hosted on a KFC server somewhere and exposed to a peer-to-peer network! They probably wouldn't go out of business, but whoever got their hands on it could put a serious dent in their revenue. Or say you're a retailer and one of your employees accidentally leaves a file full of customer financial details unencrypted in a folder that the peer-to-peer software can access. That's a great, big, giant hole that is going to cost the organisation lots and lots and lots and lots of money, not to mention the intangibles that cannot be measured in dollars (e.g. customer confidence, damage to the corporate brand). I think you get the idea. Peer-to-peer network on corporate network = bad idea. So lock that down.

Back to YouTube. "But it doesn't host documents," you say. Yep. True. Which is why YouTube is not really a problem for those of us not in the music, television and movie industries. But what if there were a YouTube for documents? Quasi-YouTube-like repositories have always existed. They're called online file servers. But they only store stuff...and most of the time it is not public. It's just there for users to store their own stuff. And even if someone made documents public, they weren't easily found. So they're not really YouTube for documents. Even so, you should probably be blocking users from uploading sensitive files to these sites. The risk profile isn't quite as high, however, because of the lack of decent search capabilities. Put decent search and the power of tagging next to a documents-only site, and you get YouTube for documents.

I bring this up because lately there have been quite a few start-ups doing just this. One of the early ones was Scribd, which actually markets itself as YouTube for documents. Recently I've also come across docstoc, which does pretty much the same thing. To a lesser extent, there are also sites that are essentially an online desktop or bulletin board for you to throw things onto: photos, notes, music, videos...and documents. These can also be made public. I'm not sure how decent their search capabilities are, but they are there. Recent examples are Stixy and WIXI (as an aside, who the heck comes up with these names? They sound like something my 5-year-old cousin could have come up with).

So now we all have the same problem as the content providers who despise everything YouTube represents...except corporations have more to lose. Why? Because it can potentially cost billions. That's right. BILLIONS. In fines for losing customer information, for not being PCI compliant, for not passing the many, many audits organisations are subjected to nowadays, and so on. And there are also the potential billions that could be lost if your "11 secret herbs and spices" get out there.

Previously, the most convenient way for your data to leave via the network was email (or web-mail); now it can be hosted on multiple searchable document-sharing websites. When you email a sensitive document to a list of people, it can only spread as fast as the recipients can press the forward button, and even then it's limited to their address books. This spread is exponential, by the way, so it's by no means a non-issue. But it's still slower than having the sensitive document immediately available to a whole community of users! It's like aggregating all your contacts, the contacts of your contacts, the contacts of their contacts' contacts and so on (try saying that quickly) and blasting the document to all of them at once.
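To put rough numbers on the forwarding argument, here is a minimal Python sketch. It assumes (hypothetically) that every recipient forwards the document to the same number of new people and that address books never overlap, so it gives an upper bound rather than a model of real email behaviour.

```python
def email_reach(fanout: int, rounds: int) -> int:
    """Total recipients after `rounds` of forwarding.

    Assumption: each recipient forwards to `fanout` people who have not
    seen the document yet (no overlapping address books).
    """
    total = 0
    current = 1  # start with the original sender
    for _ in range(rounds):
        current *= fanout  # everyone reached in the last round forwards it on
        total += current
    return total


for r in range(1, 6):
    print(r, email_reach(5, r))
# 1 5
# 2 30
# 3 155
# 4 780
# 5 3905
```

Even with this generous assumption, email needs several rounds of people pressing "forward" to reach a few thousand readers. A public, searchable document site skips the rounds entirely: the document is in front of its whole user base from the moment it is uploaded.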

I've got an account on both Scribd and docstoc. I don't really use them at the moment; I just wanted to check them out. My first few searches on each already turned up documents that I doubt the companies concerned want out there. But hey, they're freely available. You just need to sign up!

So how do you stop this? While you can conceivably and justifiably block most peer-to-peer applications, you can hardly block users from using the Internet! Sure, you can block Scribd, but then docstoc comes along. Then you block docstoc, and another competitor comes along. It may never stop. The same can be said about peer-to-peer clients, but it's a heck of a lot harder to build a new peer-to-peer client than it is to build a new document sharing website. Blocking specific sites or applications is only a temporary fix. There'll always be the next site or application that comes along.

The key is to protect the information and control the many ways it can leave and under what circumstances. There's obviously the whole issue of information identification and classification as the up-front step. But that's for another day :)
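One way to picture "protect the information" as opposed to "block the site" is to make the decision depend on the document's classification rather than on a list of banned destinations. The sketch below is a hypothetical illustration only, not any product's or standard's API: the classification levels, the destination names and the `MAX_ALLOWED` table are all assumptions.

```python
from enum import IntEnum


class Classification(IntEnum):
    # Hypothetical levels; higher value means more sensitive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical policy: the most sensitive classification each destination may receive.
MAX_ALLOWED = {
    "internal_share": Classification.RESTRICTED,
    "partner_email": Classification.CONFIDENTIAL,
    "public_web": Classification.PUBLIC,
}


def may_leave(doc_class: Classification, destination: str) -> bool:
    """Allow a transfer only if the destination is cleared for this classification.

    Unknown destinations (e.g. the next new document-sharing site) fall back
    to the public-web rule instead of needing their own entry.
    """
    limit = MAX_ALLOWED.get(destination, MAX_ALLOWED["public_web"])
    return doc_class <= limit


print(may_leave(Classification.CONFIDENTIAL, "partner_email"))           # True
print(may_leave(Classification.CONFIDENTIAL, "brand_new_sharing_site"))  # False
```

The point of keying on the document's label is that when the next Scribd or docstoc comes along, nothing needs to change: anything not classified as public is refused by default. Of course, this only works if the up-front identification and classification step has actually been done.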

Just be aware of the escalated risks that YouTube-for-documents sites pose to your corporate data. They make information far more readily accessible, so data loss and leakage will happen a heck of a lot more quickly than in the past.


Jason Nazar said...

Ian, you make valid points. It's also important to note that there are mechanisms in place to prevent such abuse: 1) all users can flag a document, and 2) Creative Commons licensing. On docstoc, when you select the most restrictive license, users can only preview docs; downloads are prevented.

That said, I agree some abuse will still happen. We will be diligent about fixing it, and I believe it will be outweighed by the benefit of helping people easily find content that is meaningful to them.

Ian said...

Thanks for the comment, Jason. And for leaving your name so I can put your comment into context (Jason is docstoc's CEO, for those scratching their heads).

I don't really see how flagging a document or only allowing previews gets around the problem I'm talking about. Fact is, the document's on there and viewable even though a download of it may not be allowed.

It's great to hear that docstoc's going to be diligent about abuse. But while it's relatively easy to pick up documents that infringe copyrights (e.g. someone uploads a digital version of a book that is not free), it's not quite so easy for you to figure out if a corporate document is allowed to be on there. The biggest problem you're going to have is that people who put documents up with the best of intentions aren't typically aware that certain documents shouldn't be made public in the first place. In these cases, you wouldn't know either...and neither would the organisations that own these documents. That's my point. No one is abusing anything nor can you do much about it because it's just a difficult problem to solve once information gets on there. It's simply plain ignorance of security policies on the user's part.

Of course, it's really the organisation's own fault for not educating their employees about their data security policies and not putting the preventative measures and controls in place. That's my main point.

I agree that sites like docstoc, Scribd and any other competitors of yours that may come along can become extremely useful for the purposes you designed them for. If I need a document for something, hopefully I can find it there, and it will save me a heck of a lot of work because I won't have to reinvent the wheel.

I'm just pointing out to organisations that the best intentions of their employees trying to be model citizens of the world can lead to sensitive or proprietary information leaving the walls of organisations. And that is what companies should be vigilant about, rather than just flat-out blocking everything.

Manoj Ranaweera said...

Ian, thanks for covering a much-discussed topic. edocr has similar functionality, but our aim is corporate public-facing documents. However, the service is open to anyone, and we will also have our fair share of copyright breaches to deal with in the future.

We are currently in Alpha - since last month. All of us will offer similar functionality in one sense or another, but application of this functionality makes many of us different. I accept there will always be some overlap.

It would be great if you also open an account at edocr and be part of our journey. Any feedback is greatly appreciated.

Happy to respond to any queries you might have.

Many thanks and best regards, Manoj

PS: wrt names, how about yuuguu? You might also want to look at wrt breaches of copyrights, etc.

Jason Nazar said...


I agree with you. That may be an issue, especially on our site since we're focused on the professional market.

It will be interesting to see how it plays out. We'll certainly be keeping an eye on it and are open to any good recommendations.