Subclock: May 2006

Saturday, May 27, 2006

The future of software.

One of the best things about my job is the opportunity it gives me to talk to interesting people building surprising applications. I have been lucky enough to do a lot of that just lately. As a result, I have become convinced that the way we build, deploy and manage applications today is wrong, and that future systems are going to look and work very differently.

I wrote about this recently in my post on mashups, but I didn't think the thing all the way through there. This post is the first of two in which I'll develop the idea more completely.

The network is the computer

John Gage at Sun Microsystems dreamt up the catchphrase "The network is the computer" long before it was true. The vision then was that the world would be covered with computing power tied together with ubiquitous communications. The internet from space [1] still shows big dark holes, but it's clear that it's only a matter of time, now. If you're reading these words, then in most of the places you go, you have easy access to broadband and cycles.

Obviously, if the network is the computer, then the software you use is going to run on the network, and not necessarily on the collection of wires and chips underneath your desk. Gage was looking a long way out, but he saw the future clearly.

The timeshare generation

It's true today that applications run on the network, and not on your personal computer. Every time you fire up your Web browser or email client, you're running a distributed application. The client software on your local machine talks to server software running remotely so that you can read the news, shop for good deals on travel and keep in touch with your family.

Important business applications are moving in this direction as well. Before we sold Sleepycat to Oracle, we used Trinet as our outsourced HR and payroll provider, and Upshot (since purchased, serially, by Siebel and Oracle) for sales force automation. These hosted apps allowed us to work from anywhere in the world, to cooperate with one another and to rely on a central service to manage day-to-day operations of information technology that would have been a lot of trouble to run ourselves.

These services, and others like them, are useful and valuable, and I am glad that they were available to us. They are not, however, very interesting. They do not make good use of the network as the computer.

Essentially, these services are exactly like timesharing systems in the 1960s and 1970s. Instead of buying and running a large and expensive computer system for yourself, you contract with a specialist who builds and operates that system for you. You have the illusion that you are the only user of the system, but in order to realize economies of scale, the specialist provider is really sharing the same computers and software with lots of other people.

Hosted apps are the same monolithic standalone software packages that we used to have to manage on our own. We get better reliability and lower cost by centralizing them and spreading the maintenance cost across many users. Fundamentally, though, we are doing the same old thing on the brave new platform.

The IC revolution

A close historical analogue to this situation is the invention of the transistor in 1947 and its adoption through the 1950s and 1960s. When it was first invented, the transistor was widely viewed as an excellent substitute for the vacuum tube in electronics -- it was smaller, much more reliable and vastly cheaper. Vacuum tube systems were rapidly replaced by transistor systems, and radios could suddenly fit in your shirt pocket.

The real power of transistors wasn't unlocked until the advent of digital systems [2], and especially the invention of integrated circuits (ICs) by Bob Noyce and others. ICs are not transistors doing the work of vacuum tubes better -- they are transistors doing something that vacuum tubes never could [3].

Today's hosted applications are nothing more than better vacuum tubes. They are an old idea -- timeshare computing -- copied to a new medium -- ubiquitous networked processor cycles. Hosted apps, like portable radios, are merely better. They are not different.

What will change

The next ten years in technology will see more and faster processing and networking. The change in quantity will drive qualitative change. We will begin to build applications that are different in kind from the ones we use today.

Applications of the future will not be monolithic systems centralized to simplify their management. Instead, they will be composed of small cooperating components, each specialized in a particular task, tied together on demand to do a specific job. Pieces of the application will run in different administrative domains: IBM may get some data analysis from Microsoft in order to tune its Yahoo! ad keyword selection based on the clickstream it observes among shoppers on Dell's e-commerce site.

You can already see examples of systems like these. Mashups are a halting first step. Sun offers compute cycles for hire. Amazon is selling cheap online storage via S3. Internally, Amazon is building its core technology platform in exactly this way. Hard-core technology companies like Sun and Amazon are several standard deviations out on the high end of the curve, but over time this architecture will become commonplace. One day, ordinary non-technical consumers will not only use network computing apps like this; they will be able to program them themselves, easily tying information and analysis together to answer questions. They will not concern themselves with what work is done where.

The hard part

Software engineers have long ridden on the backs of hardware engineers. Computer programs are fast and sophisticated today mostly because the people with the soldering irons have made chips so fast and memories so big that we can be profligate when we program them. To some extent, we can follow the same strategy here. The technical trend toward ubiquitous computing is almost irresistible.

There are, however, critical problems we have to solve to make this new kind of application work.

When we reach across the boundaries of organizations effortlessly, and stitch together applications from all over the place, how can we trust the answers we get? How can IBM be certain that Microsoft got the right answers when it analyzed the Dell clickstream? Was that clickstream correct?

Just as importantly, how can we be certain these applications will run at all? Systems made of many small pieces have many places to fail. Any single component failure, or the failure of any connection among the components, can freeze the application as a whole. When we build distributed systems, even out of simple and reliable pieces, we introduce complexity. Complexity is a crushing weight that eventually guarantees failure. How can we manage that risk?
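A rough way to see the arithmetic behind that worry, offered as a sketch and not as a model of any real system: assume every component is up independently with the same probability, and that the application works only when all of them are up. The function name and the 99.9% figure are assumptions made for illustration.

```python
def composite_availability(per_component: float, components: int) -> float:
    """Probability that all `components` independent parts are up at once.

    Assumes each part is up with probability `per_component`, that failures
    are independent, and that one failed part stops the whole application.
    """
    if not 0.0 <= per_component <= 1.0:
        raise ValueError("per_component must be a probability between 0 and 1")
    if components < 0:
        raise ValueError("components must not be negative")
    return per_component ** components


if __name__ == "__main__":
    # Each piece is individually very reliable (99.9% is an assumed figure).
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} components: {composite_availability(0.999, n):.3f}")
```

Under these assumptions, a thousand pieces that are each up 99.9% of the time are all up together only a little more than a third of the time. Real systems can do better with redundancy and graceful degradation, which is exactly the kind of engineering the question calls for.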

Those problems are hard ones -- too hard to explore here. I'll write more about them later.

Notes

[1] Eick's is one of several very cool maps digested by CNET. See in particular the colorizations by Bill Cheswick. Cheswick runs Lumeta, which specializes in building and rendering these maps. They don't show geography -- they show a deeper truth.

[2] Digital systems do not actually exist -- transistors are really just analog devices with very steep transfer curves. I have not mentioned it to anyone, though, because I do not want to undermine the global market for digital technology.

[3] I am not ignoring early work on tube-based computers. ICs are devices that could never have been built on vacuum tube technology.

Tuesday, May 16, 2006

Links for 16 May 2006.

Go on. Click through. You'll be a better person for it.

Wonder where that spam came from? Paste the SMTP headers in the message into this way cool maps mashup, and see for yourself.
Jon Udell's got a useful set of bookmarklets.
Google Translate is just SYSTRAN for now, but the site promises to be driven off statistical data from large bodies of translated text one day. Fast, too.
And while we're at the First Church of Mountain View: Trends lets you find out what was hot, when.
There's a fascinating post by the OpEd editor at the Omaha World Herald showing the distribution of religious sects across America.

Monday, May 15, 2006

Friday is Bike to Work day.

I will be setting a bad example, but I'm in favor nevertheless. Friday, May 19, is Bike to Work day.

Sunday, May 14, 2006

Links for 14 May 2006: Special Wee Beastie Edition.

That dang Internet just keeps making more stuff to read:

Schrödinger's cat is so last millennium. Quantum puppies have entered the light cone.
Origins of the Black Death: In defense of gerbils.
Emergent behavior: Cockroaches act democratically.
Some remarkable pictures among the winners of the National Wildlife Magazine photo contest (Google cache version is here, in case the NWM site failure is persistent).
Schneier's got a wicked squid thing going.

Saturday, May 13, 2006

All mashed up.

I must have been looking sideways just lately -- I have been busy! -- because I was surprised this week by several stories about a new idea in databases. Coté over at Redmonk posted links to several stories on his del.icio.us linkroll. Daniel Druker and Robert Rich wrote an article about it in DB2 Magazine, and Bill Snyder piled on in a story he wrote for TheStreet.com.

This new idea is called Master Data Management, and if you buy the momentum stories, it's the Next Big Thing.

I don't think that the emperor is entirely naked, here, but his mother ought not to have let him leave the house dressed that way. Master Data Management is an old idea in database systems, and all our experience so far says that it's unbelievably hard to do well.

First, though, some context.

Of all the ideas that are currently burbling around in the Web 2.0 cauldron, I personally find mashups to be the most compelling. The best mashups that I have seen so far are based on the Google Maps API. People are building sites that show the locations of their first kiss, local public libraries, sex offenders living in the area and more. The idea is to use simple, standard web-based interfaces to combine data from one site with map data from another. Once you internalize this idea, you realize that there are lots of different data sources out there that you'd like to tie together.

The old-fashioned name for this discipline among database researchers is federated databases. The idea is to take a collection of databases, created and maintained by different organizations for different purposes, and combine the information that they store in interesting ways. Much research money, and some investment capital, has been plowed into this idea, with (so far) no big bang. Those efforts have not been a complete bust, but in more than a quarter century of work, no single general-purpose technique has been discovered that works well.

The problem is that the different groups who build and maintain these databases collect and store information with different assumptions. Is my first name "Mike" or "Michael"? Are the prices you publish in euros or yen? Are dates represented in American or European format? The answers make a difference if you're combining records from different sources.
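The date question is easy to make concrete. The sketch below is only an illustration: the helper name, the convention labels and the sample string are all assumptions, and it uses nothing beyond Python's standard datetime module. The same text yields two different dates depending on which convention the source database used.

```python
from datetime import date, datetime

# Hypothetical labels for the two conventions mentioned above.
FORMATS = {
    "american": "%m/%d/%Y",  # month first
    "european": "%d/%m/%Y",  # day first
}


def parse_date(text: str, convention: str) -> date:
    """Parse `text` under the named convention; raises KeyError if unknown."""
    return datetime.strptime(text, FORMATS[convention]).date()


if __name__ == "__main__":
    raw = "03/04/2006"  # assumed sample value, ambiguous on its own
    as_american = parse_date(raw, "american")
    as_european = parse_date(raw, "european")
    print("American reading:", as_american.isoformat())
    print("European reading:", as_european.isoformat())
    print("Days apart:", abs((as_european - as_american).days))
```

Nothing in the string itself says which reading is right. That knowledge lives with the people who built each database, which is why combining sources takes curation and not just code.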

Worse, the combined data is generally less reliable than the data in any single database. If my phone number is wrong in one database, and my age is wrong in another, then the combination is wrong in two particulars, not just one.

While inaccuracies like that may seem unimportant, they can matter a great deal. One of the example stories for the success of MDM is the casino that recognized a card cheat by a match on his telephone number with a different casino's employee database. Think about it: Do you know anyone who has written your telephone number down wrong? Would you want the companies you do business with to make decisions about you based on information that may be wrong, and that you can't review and correct?

It's absolutely possible to handle these issues, especially for single companies combining data that's all under their control. The established database vendors all offer products that do this, but they require careful analysis and considerable effort on installation. The information they operate on needs curation.

Mashups are much too powerful an idea to constrain to mapping apps. We'll see more, and more interesting, examples. Some will certainly tie together legacy data from a variety of sources, including relational database systems. This is an old technique, though, with a lot of practical experience highlighting problems in the field. Web 2.0 apps that use the technique but ignore the experience are going to deliver wrong answers.

Don't believe everything you read on the internet.

Sunday, May 07, 2006

The Invention of Silicon Valley.

I've just finished The Man Behind the Microchip: Robert Noyce and the Invention of Silicon Valley, by Leslie Berlin. It's a remarkable book. Living and working in Silicon Valley, you can sometimes forget that there was a time before integrated circuits and venture capitalists. Berlin does an excellent job of documenting the creation of the Valley and the emergence of an industry.

Her work on Noyce's boyhood and very early career is fascinating. He grew up in the Midwest. As a boy, he designed and built technical toys, like an aircraft powerful enough to lift him and his twelve-year-old brother when it was pulled behind a neighbor's car. Through a very good high school teacher, he learned of the Bell Labs work on transistors almost immediately after it was published. His fascination with the device led him to MIT, and into industry.

He quickly discovered an entrepreneurial streak in himself. That led him to Santa Clara, still a tiny town in the midst of apricot and orange groves. He worked for William Shockley, but strong personal differences drove him and his colleagues out, and together they formed Fairchild Semiconductor. Several years of success at Fairchild led Noyce and others to found a company called Integrated Electronics, shortened to Intel.

Besides Shockley, the book is loaded with names that industry veterans will recognize: Eugene Kleiner, Andy Grove, Gordon Moore, Arthur Rock and many others. There is the obligatory collection of Steve Jobs stories, in which the unkempt and ill-mannered teenager invites himself into a central role in the Valley and the industry. Noyce even rubbed shoulders regularly with Warren Buffett, as both were on the board of Grinnell College in Iowa. Intel is one of the very few technology investments that Buffett is on record as endorsing, but his endorsement carries an important qualification: "We were betting on the jockey, not the horse."

It's hard for an entrepreneur to read the book without identifying with Noyce -- his mix of passion and pragmatism, and even his professional failings, will be familiar to many who have started companies.

As much as it's a history of Noyce, though, the book is a history of the Valley, and the business and economic forces that shaped it. Berlin documents this wonderfully here:

In the same way that [Tandem, Atari, Genentech and others] built on the previous generation's technical advances, they also took advantage of the network of suppliers, venture capitalists, equipment vendors, specialized law and public relations firms, contract fabs, and customers that had sprung up in the past decade to support high-tech entrepreneurs in Silicon Valley. By 1983, more than 3,000 small consulting firms in Santa Clara County provided new companies with startup expertise and continuing help over the early years of operation. Many of the chip designers, glass blowers, fab houses, and die cutters that catered to Silicon Valley high-tech entrepreneurs were themselves small privately-held firms. This "supply chain," most often mentioned for its support of small companies, is itself an entrepreneurial phenomenon.

You read a great deal these days about the emergence of high tech economies in China, Eastern Europe and elsewhere. Berlin's observation here is critical: You can't create an entrepreneurial powerhouse without a substrate of small, entrepreneur-driven companies competing to provide consulting and services to the tech companies. It simply isn't possible to direct a hundred-million-dollar firehose of capital into a region and drive fast economic growth and innovation. It takes time for the ecosystem to produce a diverse collection of suppliers and consumers. In Silicon Valley, these companies evolved simultaneously with the tech companies they served.

Of course, entrepreneurs today have an important advantage over Noyce: The global Internet, built on the integrated circuit that Noyce and his colleagues invented at Fairchild, allows companies to reach across large distances to share work and products with others. In that sense, the Silicon Valley that Noyce invented has become a global phenomenon.
