2007-02-08

Peer-to-peer Collaboration Tools

I gave this presentation at an internal company conference last week. Large corporations suffer from different collaboration issues than the open source world, but we also have much in common. My hope is to show folks at my company some tricks that we can learn from the open source world. Open source development is almost certain to be globally distributed and to rely on on-line tools for almost all of its communication.

[Introduction]

It is rare that I get to speak about a topic in which I have such great interest and which I know will have such a great impact. The scale of the impact is on par with the emergence of e-mail as a technology. In fact, I don’t believe it will be very long before peer-to-peer-supported blog-like technology replaces e-mail as the primary communication mechanism over the Internet and corporate networks: perhaps 3-5 years.

The title of my presentation is "overview of peer-to-peer collaboration technologies" and its subtitle is "managing communications with a global workforce". I chose this subtitle to emphasize that these technologies will help you address some of the many challenges of working in global teams. Communication is a fundamental problem of business that is complicated by globalization. I hope to show you here what is happening now to meet those communication requirements.

Purpose of this presentation
The purpose of this presentation is to illustrate that the emergence of peer-to-peer-based collaboration software is inevitable, show why [our company] should embrace peer-to-peer software architectures, and give an overview of some current peer-to-peer, metadata, and collaboration technologies. Some peer-to-peer collaboration architectures, such as Microsoft Office Groove 2007, will quickly and significantly alter the client-server architecture used for most existing content management systems, such as SharePoint, TWiki, and [our proprietary systems].

These peer-to-peer tools will allow the elimination of many IT dependencies. IT will obviously have significant roles to play, but those roles will change. With the emergence of these tools, team productivity can start almost immediately. Additional productivity and stability can then follow with more formalized IT involvement. For example, someone needs to create bridges between all the products and protocols. The various tools need to be combined into a single product with a consistent look and feel. The costs of deploying a collaboration solution can be optimized. Additional points of access and different accessibility features can be provided. My suggestion, however, is that checkpoints requiring resource allocation can be skipped until there is a business benefit for IT involvement.

This isn't as much a choice I'm suggesting we make as it is the recognition that existing forces will drive acceptance of peer-to-peer related technology. On-the-go collaboration is a critical requirement of global business communication tools. Information locked into what can be called "walled gardens" or server silos cannot benefit all of the necessary people at the necessary times. This is certainly true in a global workforce of partnerships, third-parties, outsourcing, and ODMs. While business communications regulations, such as Sarbanes-Oxley and other E-Discovery requirements, will make the process more complex, they make it no less inevitable. The required technologies will survive by natural selection.

Tasks, e-mails, notes, and calendar items are mostly the same: they are a bit of collateral communication along with some specific metadata to allow the tool to know how to advise the user. There should be a tight association between the collateral, or content, or microcontent, and the metadata. These digital communications must be preserved in context. New metadata types, such as test case and product requirement associations, must also be supported. Making special tools to handle these bits of metadata is a formula for disaster.
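To make the "one container, many metadata types" idea concrete, here is a minimal Python sketch. The `MicrocontentItem` type and `find` helper are hypothetical names invented for illustration, as are the REQ-12 and TC-7 identifiers. The point is that a task, a note, and an e-mail share one structure and differ only in their metadata, so a single generic query can gather everything tied to a product requirement without a special-purpose tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class MicrocontentItem:
    """Hypothetical container: collateral content plus the metadata kept with it."""
    body: str                       # the content itself (note text, e-mail body, task...)
    kind: str                       # illustrative type label: "task", "email", "note", ...
    author: str
    created: datetime
    metadata: Dict[str, str] = field(default_factory=dict)  # open-ended key/value pairs


def find(items: List[MicrocontentItem], **criteria: str) -> List[MicrocontentItem]:
    """Return items whose kind or metadata match every key=value pair given."""
    def matches(item: MicrocontentItem) -> bool:
        for key, value in criteria.items():
            actual = item.kind if key == "kind" else item.metadata.get(key)
            if actual != value:
                return False
        return True
    return [item for item in items if matches(item)]


# Illustrative data only: new metadata types (requirement, test_case) need no new tool.
inbox = [
    MicrocontentItem("Fix volume ramp", "task", "alice", datetime(2007, 2, 1),
                     {"requirement": "REQ-12", "due": "2007-02-15"}),
    MicrocontentItem("Volume ramp verified", "note", "bob", datetime(2007, 2, 2),
                     {"requirement": "REQ-12", "test_case": "TC-7"}),
    MicrocontentItem("Lunch?", "email", "carol", datetime(2007, 2, 2)),
]

print([item.kind for item in find(inbox, requirement="REQ-12")])  # ['task', 'note']
```

Because the metadata is an open dictionary rather than a fixed schema, adding a new association type is a data change, not a new application.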

The madness of creating new centralized walled gardens of highly specialized data repositories must be replaced with a vision of building new interoperable data definitions, widely accessible visualization tools, and improvements to existing communications infrastructure.

In this presentation, I will cover:

  • how I got interested in this problem,
  • why we need collaboration tools,
  • why collaboration will go peer-to-peer,
  • what a peer-to-peer collaboration tool might look like,
  • why [our company] should continue to evaluate peer-to-peer collaboration technologies, and
  • how you can get involved.

How did I get interested in this problem?
[Our company] entered the portable media player market in 1999. This market largely grew out of the popular usage of peer-to-peer file sharing applications, such as the old Napster. These peer-to-peer file sharing applications provided a source for content that established distributors weren't yet willing to provide. Learning about the architecture of these systems convinced me that they were useful for far more productive arenas than the sharing of pirated music.

The most critical aspect of making both peer-to-peer file sharing networks and portable media players work for people is the management of metadata. Metadata is the information associated with the content, such as the artist, title, genre, format, or rating. With all of the possible content available for download and playback, user satisfaction is driven by the ability to find desirable content quickly and easily using metadata. Apple's combination of the iPod scroll-wheel and the iTunes Music Store, along with a bit of slick advertising, showed people where to get the music they wanted quickly. It led them to great success, without necessarily relying on peer-to-peer file swapping. Still, file swapping networks remain the quickest and easiest way to get to some content, and they are still in wide use.

Failed attempts to defeat the networks convinced me that the technology will continue to be widely adopted, despite organized attempts to the contrary. The file swapping networks that are still in existence survived because they use peer-to-peer architectures. In a peer-to-peer application, there isn't a central server that can be shut down to disable the network. Peers directly share metadata with each other, providing a path for sharing content. To build a peer-to-peer network, you only need peers speaking the same protocols and a willingness to participate.
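As a rough illustration of peers sharing metadata with no central server, here is a toy in-process simulation of query flooding in the style of the original Gnutella network. It is a sketch, not a real protocol: the `Peer` class, its `search` method, and the hop limit (`ttl`) are assumptions chosen for clarity. Each peer only knows its own library and its direct neighbors, and a metadata query spreads outward hop by hop.

```python
from typing import Dict, List, Set, Tuple


class Peer:
    """Toy peer: keeps its own metadata index and a list of direct neighbors."""

    def __init__(self, name: str, library: Dict[str, Dict[str, str]]):
        self.name = name
        self.library = library               # file name -> metadata (artist, genre, ...)
        self.neighbors: List["Peer"] = []

    def connect(self, other: "Peer") -> None:
        # Links are symmetric; no server keeps a directory of who has what.
        self.neighbors.append(other)
        other.neighbors.append(self)

    def search(self, key: str, value: str, ttl: int = 3) -> List[Tuple[str, str]]:
        """Flood a metadata query up to `ttl` hops away; return (peer, file) hits."""
        hits: List[Tuple[str, str]] = []
        seen: Set[str] = set()
        frontier = [self]
        while frontier and ttl >= 0:
            next_frontier = []
            for peer in frontier:
                if peer.name in seen:        # a peer may be reached by two paths
                    continue
                seen.add(peer.name)
                for filename, meta in peer.library.items():
                    if meta.get(key) == value:
                        hits.append((peer.name, filename))
                next_frontier.extend(n for n in peer.neighbors if n.name not in seen)
            frontier = next_frontier
            ttl -= 1
        return hits


a = Peer("a", {})
b = Peer("b", {"song1.mp3": {"artist": "X", "genre": "jazz"}})
c = Peer("c", {"song2.mp3": {"genre": "jazz"}})
a.connect(b)
b.connect(c)

print(a.search("genre", "jazz"))         # [('b', 'song1.mp3'), ('c', 'song2.mp3')]
print(a.search("genre", "jazz", ttl=1))  # [('b', 'song1.mp3')]
```

There is no single node whose removal stops every search: if `c` disappears, `a` can still find what `b` holds, which is the resilience property described above.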

The thing to remember here is that intelligent managing of metadata is how content is found, no matter what the platform.

Another formative experience came when our team went global. My job quickly shifted from a focus on media player technology to a focus on information management. I spent my time worrying about version control, status reports, bug tracking, requirements management, and portfolio management. The communication was less-than-efficient and the results were less-than-desired. This had nothing to do with the talent of the team, but had much to do with the communication mechanisms and processes that were in place. An 11-hour time-zone difference is literally half a world away.

I saw that my experience in peer-to-peer technologies and metadata management could be applied to solve my new information management problems. Just like getting to media content quickly and easily helps media consumers, the product developers need quick and easy access to information.

Why do we need collaboration tools?
Communication in a global team cannot be handled entirely face-to-face. No matter how many worldwide face-to-face meetings you hold, when you break up and get back to work, there is always something left unsaid. Frequent teleconferences help close communication gaps, but they disrupt the work-flow of team members and never provide the depth of conversation required to establish a cohesive team.

While there is no substitute for strong individual communication skills, collaboration tools provide a path for competing with otherwise more convenient information sources and distractions. On-line tools provide opportunities for collaboration between people where distance, time, cultural, language, experience, and ability barriers exist.

You can see this in the success of open source projects like Firefox, which is a web browser that competes with Microsoft's Internet Explorer. Contributions to open source projects come from people all over the world in many different situations. To make open source projects work, the participants make extensive use of on-line collaboration tools.

Also note, the communication in these tools isn't one-way. The messages being communicated aren't dictates from a single expert telling everyone how to solve the problems of the whole. Requirements and solutions come from every member of the team and beyond.

Why will collaboration go peer-to-peer?
First, what do I mean by peer-to-peer? Back in 2001, Daniel Bricklin gave a speech at the O'Reilly P2P Conference on "The Cornucopia of the Commons". That speech was later printed in a great O'Reilly book. He quotes Garrett Hardin's 1968 essay "The Tragedy of the Commons", which summarizes a commonly expressed problem:

Therein is the tragedy. Each man is locked into a system that compels him to increase his herd without limit--in a world that is limited. Ruin is the destination toward which all men rush, each pursuing his own best interest in a society that believes in the freedom of the commons. Freedom in a commons brings ruin to all.

That isn't such a pleasant or welcome idea, but we can all see some degree of reality to that view.

Dan goes on to offer another view in the light of successful peer-to-peer architectures and their failings:

In the case of certain ingeniously planned services, we find a contrasting cornucopia of the commons: use brings overflowing abundance. Peer-to-peer architectures and technologies may have their benefits, but I think the historical lesson is clear: concentrate on what you can get from users, and use whatever protocol can maximize their voluntary contributions. That seems to be where the greatest promise lies for the new kinds of collaborative environments.

That is the thinking that launched the Web 2.0 explosion with businesses like Flickr, MySpace, and YouTube. Similarly, when I'm referring to peer-to-peer technologies, I'm talking about creating protocols that maximize the contributions from the edges. Those contributions generate a wealth of information and content highly valued by users.

The elimination of a required centralized server is one common technique applied in peer-to-peer architectures. The benefits that approach provides to IT may be a bit counter-intuitive. I've briefly examined four areas in which collaboration tools benefit from eliminating a required centralized server: security, reliability, interactivity, and efficiency.

The elimination of a centralized server is sometimes required for security purposes. We occasionally enter contracts with other corporations that give very explicit rules about who can have certain data on their hard drives. A centralized server would violate that policy, whereas it may still be acceptable to share the data with specific peers. Ultimately, security is best provided if information managers control the access to data, rather than IT.

There is also the occasional need to utilize modern tools that aren't yet available on our internal servers. Some of those tools are available on external servers, but it isn't secure to place [our company's] private data on those servers. A tool that does not rely on a centralized server could be more easily deployed. This doesn't eliminate the need for security audits on the tool, but it does eliminate the overhead of evaluating the impact of that tool on other tools running on a common server.

Despite on-going improvements to server reliability, there doesn't ever seem to be an end to the reasons why a server must occasionally be shut down or relocated. With an architecture that doesn't rely on a single centralized server, reliability is certain to increase. An application network that requires a high degree of reliability should still make use of high-reliability servers, but it makes sense to architect those networks to not rely on them exclusively when possible.

Efficiency benefits from eliminating requirements on a central server come from the ability to rapidly deploy a new peer-to-peer network. A business could deploy a new network right away to gain the benefits of collaboration without waiting for a high-reliability server to come on-line. Such a server could then later be deployed to increase the reliability of the network without creating downtime. Further, an information manager within the business can be directly responsible for adding and removing users, machines, and services on the network.

Efficiency benefits also come from more general aspects of peer-to-peer architecture. When users can organize their own data, they can access it more efficiently. In most cases, there isn't one ideal solution to data organization. A peer-to-peer architecture allows everyone to organize their own data and allows others to benefit from that organization.

By interactivity, I mean both the possibility of working with other tools and the way the tool works for its users. Interoperability and connectivity to other tools don't necessarily require the elimination of a central server, but tools that don't attempt to create a centralized repository for information are more likely to support the standards required for interoperability. By creating protocols for use in a peer-to-peer network, some of the requirements for interacting with other tools are necessarily met.

One particular interactivity benefit of eliminating requirements on a central server is the possibility of providing "on-the-go" or disconnected collaboration. You experience this today with Outlook when you are on a plane. You can read your e-mail that is cached locally, create your responses off-line, and synchronize your mail once you are connected to the network again.
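The read-locally, write-offline, synchronize-later pattern can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: `OfflineMailbox` is a hypothetical class, and `send` stands in for whatever transport actually delivers messages once the network is available; it is not any real mail API.

```python
from typing import Callable, List


class OfflineMailbox:
    """Hypothetical sketch of on-the-go collaboration: work locally, sync later."""

    def __init__(self, send: Callable[[str], None]):
        self.send = send              # stand-in transport used only when online
        self.cache: List[str] = []    # messages already received, readable off-line
        self.outbox: List[str] = []   # replies written while disconnected
        self.online = False

    def receive(self, message: str) -> None:
        self.cache.append(message)

    def reply(self, text: str) -> None:
        # Replies are always queued first, so nothing is lost if the network is down.
        self.outbox.append(text)
        if self.online:
            self.flush()

    def flush(self) -> int:
        """Deliver queued replies in order; return how many were sent."""
        sent = 0
        while self.outbox:
            self.send(self.outbox[0])   # if send raises, the message stays queued
            self.outbox.pop(0)
            sent += 1
        return sent

    def set_online(self, online: bool) -> int:
        self.online = online
        return self.flush() if online else 0


delivered: List[str] = []
box = OfflineMailbox(delivered.append)
box.receive("Agenda for Monday")      # cached while connected earlier
box.reply("Looks good")               # written on the plane
print(delivered)                      # []
print(box.set_online(True))           # 1
print(delivered)                      # ['Looks good']
```

Queueing every write, rather than only the ones made while disconnected, keeps a single code path for both cases.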

E-mail must be the most common on-line tool for collaboration. It is simple, universal, "on-the-go", searchable, you know who is on both ends, you get notification, and for the most part it just works.

Being such an effective collaboration tool, e-mail has many peer-to-peer architecture characteristics. Foremost, it is distributed and resilient. It relies on DNS, the Domain Name System that is used to look up the address of servers on the Internet. DNS is distributed and resilient, allowing for server failures at many points in the system. Further, SMTP, the protocol used to forward e-mail messages, can be run on just about any computer. You could almost decide to run SMTP on your own desktop machine, but you'd quickly find that getting the DNS record to point to your machine is a bit of a hassle when you don't leave your machine running all of the time. This doesn't matter that much, though, since [our company] provides you with a high-reliability server to collect your e-mail and clients that cache the e-mail for off-line use.
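One concrete piece of that resilience is the DNS MX record: a domain can list several mail exchangers with preference values, and a sending server tries the lowest value first, falling back to the others if it fails. The Python sketch below shows only that ordering logic. It assumes the MX records have already been looked up, and `try_host` is a hypothetical stand-in for an actual SMTP delivery attempt; the host names are placeholders.

```python
from typing import Callable, List, Tuple


def deliver_with_fallback(
    mx_records: List[Tuple[int, str]],
    try_host: Callable[[str], bool],
) -> str:
    """Try mail exchangers in MX preference order (lowest value first).

    mx_records: (preference, host) pairs, assumed to come from a DNS MX lookup.
    try_host:   hypothetical delivery attempt; returns True on success.
    """
    for _preference, host in sorted(mx_records):
        if try_host(host):
            return host
    # A real mail server would queue the message and retry later.
    raise RuntimeError("all mail exchangers failed")


records = [(20, "backup.example.com"), (10, "mail.example.com")]
down = {"mail.example.com"}           # pretend the primary is being relocated
print(deliver_with_fallback(records, lambda host: host not in down))
# backup.example.com
```

For simplicity, `sorted` breaks ties between equal preferences alphabetically; the SMTP specification instead recommends randomizing among equal-preference hosts to spread load.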

So, what is wrong with e-mail as a collaboration tool? First of all, the information is trapped in little personal "silos" that no one else on your team can search or access. Certainly you don't want to share all of the information provided to you by e-mail, but some of it you do. Some of it you just want other people to know you have, but not necessarily give it to them without your approval. You can forward e-mail to individuals or mailing lists. You can archive e-mail sent to mailing lists on a website. You can even create shared mailboxes, though those are a bit complex for many people to handle. In the end, you are left with a large number of small silos of data that can't be organized as part of a larger body of information.

E-mail is not secure. Sure, there are tools for encrypting e-mail, but they are practically only ever used on the most sensitive data. E-mail encryption tools are simply too difficult to use and you cannot yet create encrypted messages for the vast majority of your e-mail address book and expect the recipients to be able to perform the decryption.

Monitoring e-mail for sensitive data is virtually impossible. E-mail is sent from all levels of the organization all across the world, without any approval of information managers. When inappropriate e-mail is detected, there is no realistic mechanism for confirming retraction of that data from recipients. Outlook has a "recall" feature, but it often fails and cannot be confirmed outside of our organization. E-mail must be the most important and most dangerous of all the collaboration tools available today.

Perhaps the worst aspect of e-mail is the lack of efficiency. Information coming in e-mails cannot be easily categorized. Creation of e-mail filters often makes the situation worse by creating yet more silos of data. Fields that would allow for some categorization, such as priority, action required flags, and deadlines, are typically never used and are often misused. Misuse is sometimes the result of spam, which is a problem that cannot easily be avoided. According to one study, 94% of e-mail last month was spam.

What about TWiki or SharePoint? These tools are often called "content management systems". They can be quite effective in improving communication within a global team, but they have their own issues. Perhaps the best summary comes from a success story taken from the TWiki website from a company that deployed TWiki to improve support to field engineers:

People in the field were used to using email for communicating with the factory. Email is a one to one communication, a mailing list a one to many. The problem with email is that useful information does not reach everybody, email is not easy to search and email gets lost over time. Collaborating the Wiki way solves these problems, however changing habits is a difficult issue that needed to be coached.

Initially we also had a chicken-and-egg problem, i.e. voices like "why should I use this collaboration tool, the content is so limited". The solution was to assign a support engineer who monitored the mailing lists and entered useful information into TWiki.

Successful deployment took over 6 month[s], [which was] longer [than] expected. But now everybody is used to browse, search, collaborate and document the Wiki way.

The result was that customer satisfaction with the field support improved. The effort was a real success.

This sort of dedicated information management may not be something we can easily commit to in our environment. If deployment took 6 months, how are we supposed to keep up with frequent release cycles? How do we convince managers to commit resources to TWiki when my own searches today often return my own weekly reports, rather than something of real value? With so little attention, even the top-level structure of the TWiki sites today can't keep up with our organizational structure. I believe in TWiki, but we can only overcome this "chicken-and-egg" problem in each team by strongly evangelizing its use during the painful learning stages.

SharePoint has similar problems, but is a bit of a different beast. SharePoint is particularly well-suited for collaborating on Microsoft Office documents. It uses a standard protocol called WebDAV that allows for folder views in Windows Explorer. Most importantly, [our company] supports a mechanism for giving customers and partners outside the firewall access to SharePoint sites.

The biggest problem with SharePoint, beyond the problems it shares with TWiki, is its complexity. The user permissions tables are extremely convoluted. Editing the content of any one page requires extensive knowledge of the overall system, rather than simply clicking an "edit" button and changing some text. I’m not saying that TWiki markup is trivial, but it doesn’t require learning specialized tools or extensive training. The help system is built right into the tool. Also, the SharePoint version control system is somewhat less than reliable because it allows overwriting and elimination of old document revisions.

We are still learning about how best to use TWiki and SharePoint on our projects, and the best standard practices are not obvious with either tool. Neither, on its own, provides great search solutions for the data you need, especially if the data is in mixed formats. Instead of providing a complete knowledge picture, the combined usage of e-mail, TWiki, and SharePoint creates islands of information that must each be explored separately.

[Description of an internal collaboration tool]

Efforts like this should continue, but it is best to break up the platform into interoperable components. Consider interaction with a tool such as Microsoft Live Clipboard. Live Clipboard is a mechanism for users to initiate sharing of data between websites without requiring development of web services scripts or other complicated programming. By supporting such a feature in all of our collaboration tools, the islands of information can be bridged.

The same guy at Microsoft who dreamed up Live Clipboard, Ray Ozzie, has also brought us one of the more compelling peer-to-peer collaboration tools already available, Microsoft Office Groove 2007. Recently we made use of Groove on one of our projects. The project spanned two partner companies and two contractors. Some folks on the team were able to start collaborating on the very first day by simply installing the tool. It took a few more days for others to overcome some minor installation headaches that were likely related to the tool being in beta. The product will be released with the 2007 version of Microsoft Office.

Groove works across firewalls, provides account management, secures communications, provides synchronization for off-line usage, includes instant messaging with some voice capability, and has some limited integration with Office applications and SharePoint. When I talk about the integration being limited, however, it needs some emphasis. All of your important e-mail, calendar, and contact information isn't easily shared in Groove with something as simple as a single click on a category. To get that information into Groove, a user must jump through many hoops. A major concern for the team was lack of support for maintaining old versions of documents. Groove did a great job of ensuring everyone had a copy of the latest version of a document, but SharePoint was required to maintain historical copies.

Groove also lacks a client for any platform besides Windows, and the client can be a bit slow because it consumes a large amount of memory at times. There is no way to see the data in Groove by simply logging into a web page. You can synchronize a SharePoint site with Groove, but it is a tool separate from the other tools in Groove, which all seem to act as more islands unto themselves.

Ultimately, usage of Groove suffers for many of the same reasons as the web-based content management tools. Some folks wouldn't use it regularly, instead using familiar tools such as e-mail. It never became part of the team's everyday work flow, partially because other tools were required to author rich documents and manage code. The client tool was seen as painful to start up or leave running. In the end, nothing was pushing users to actively communicate using Groove.

Certainly there are some dangers with this going unchecked, primarily related to it being difficult for IT to log file exchanges. Exchanges over SSL secured websites or, to a lesser degree, encrypted e-mails offer similar challenges. Ultimately, there are always ways for employees to circumvent security and, in some cases, the risk of not progressing business is worse than the risk of compromising security.

What might a peer-to-peer collaboration tool look like?
Groove offers a good starting point for describing the peer-to-peer collaboration tools of the future, but it is not alone in its class. What I'd like now is to describe for you some of the building blocks for creating a tool like Groove and some of the building blocks that could be used to make a better tool. I won't draw you a complete picture of the ideal peer-to-peer collaboration tool, but I hope to point you in that direction.

I'll finish typing this up when I get back from vacation next week. I need to scrub and upload the pictures...

2007-02-07

Ending walled gardens with microformatted microcontent

Forgive me for stating the obvious today, but I'm surprised how little I seem to see this point repeated.

I continue to be surprised by the steady stream of Web 2.0 companies putting up walled gardens to try to hold onto users. I think MySpace is the current prototypical example. They could use a lesson from AOL on how likely they are to be successful over the long term. When AOL offered ease-of-use, content, and community features that simply weren't available on the Internet at the time, AOL grew at enormous rates. Once the Internet was well established outside of AOL's walls, AOL quickly tumbled.

You can expect the same results with MySpace. From outside MySpace today, it is possible to link to MySpace, be linked by MySpace pages, and consume MySpace blog entries using an RSS feed. Interaction with the broader Internet is roughly limited to just those features. Once there is broader access to video/photo/blog/music hosting, friend linking, trusted commenting, shared calendars, templates/widgets, and all of the other MySpace features, people will start wondering why they put up with all of the advertising and spam on MySpace. MySpace is "free", but there are many degrees of freedom. Person A might even want to control how Person B's sites appear when Person A reads them. This freedom is established with microformatted microcontent over RSS/Atom feeds and aggregated feed readers.

Microcontent is simply each of the entries you make on your MySpace pages (or other people's pages) today, but placed into small, consistent containers. Your preferences could be stored in a microcontent article. Certainly each of your blog entries, friend relationships, and everything else you communicate on MySpace could also be placed into microcontent articles once the right microformats are defined. Those microformats provide the protocol for interoperability in whatever RSS reader you use.
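Here is a minimal sketch of what one such microcontent article could look like, using Python's standard `xml.etree.ElementTree`. It wraps a shared calendar item in an Atom entry whose XHTML content carries the existing hCalendar microformat (`vevent`, `summary`, `dtstart` class names), then shows how a smarter feed reader might pull the event back out. The function names, the tag URI, and the event data are illustrative assumptions, not part of any particular site's feed.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM_NS = "http://www.w3.org/2005/Atom"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", ATOM_NS)   # serialize Atom as the default namespace


def event_entry(entry_id: str, title: str, updated: str,
                summary: str, dtstart: str) -> ET.Element:
    """Build an Atom <entry> whose XHTML content carries an hCalendar event."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "xhtml"})
    wrapper = ET.SubElement(content, f"{{{XHTML_NS}}}div")   # Atom requires one div
    vevent = ET.SubElement(wrapper, f"{{{XHTML_NS}}}div", {"class": "vevent"})
    ET.SubElement(vevent, f"{{{XHTML_NS}}}span", {"class": "summary"}).text = summary
    ET.SubElement(vevent, f"{{{XHTML_NS}}}abbr",
                  {"class": "dtstart", "title": dtstart}).text = dtstart
    return entry


def read_events(entry: ET.Element) -> List[Tuple[str, str]]:
    """Feed-reader side: extract (summary, dtstart) pairs from hCalendar markup."""
    events = []
    for el in entry.iter(f"{{{XHTML_NS}}}div"):
        if el.get("class") == "vevent":
            summary = el.find(f"{{{XHTML_NS}}}span[@class='summary']")
            start = el.find(f"{{{XHTML_NS}}}abbr[@class='dtstart']")
            events.append((summary.text, start.get("title")))
    return events


entry = event_entry("tag:example.com,2007:event-1", "Release review",
                    "2007-02-07T12:00:00Z", "Release review", "2007-02-15")
xml_text = ET.tostring(entry, encoding="unicode")
print(read_events(ET.fromstring(xml_text)))   # [('Release review', '2007-02-15')]
```

Because the event travels as ordinary Atom plus agreed-upon class names, any reader that understands hCalendar can display or import it, regardless of which site published it.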

Of course, the RSS readers need to be smarter than they are today. Bloglines and Google Reader are fantastic compared to what we had before they existed. They are, however, still significantly limited, providing almost no microformat or customization support at all. This will change quickly, however, and the garden walls of social networking will fall before people start trying to build walls around something totally new.

So, Web 2.0 companies, I guess you can't help but build your pretty little walled gardens while you can. If I were a VC, I wouldn't give you a dime unless you had an exit plan.