IT shops are being bombarded by mixed and incorrect messages about the legal aspects of open source software and the current status of grid and virtualization technologies, says Donald Becker, co-founder of the Beowulf Project and founder and chief scientist of San Francisco-based Scyld Software, a subsidiary of Penguin Computing.
Becker sounds off on these subjects in this excerpt from our interviews during and after the recent LinuxWorld Conference & Exposition in San Francisco.
What do you think of the controversy over Linus Torvalds' attempt to enforce his trademark for the Linux name?
Becker: The controversy shows the increasing sensitivity in the open source and free software community. The SCO lawsuit really brought those things into focus for people developing open source, open standard software.
The legal scene for open source and Linux is playing out exactly the way you'd expect a marketplace to evolve naturally.
In the past, the marketplace found a way to produce proprietary software and successfully make a business and create a commercial marketplace out of it.
The evolution now shows that with Linux you can make a business and create a commercial marketplace, too. So, now that that's been shown, there's a need to make it clear that the licenses, trademarks and copyrights, while being open, have boundaries. Naturally, these boundaries will be tested until everyone understands the nature of the new marketplace that's emerged.
In the open source community, is there great concern about open source software being usurped by proprietary vendors?
Becker: That's one of the interesting developments. If you keep software proprietary, intellectual property is something you have to defend well.
In the proprietary software world, sometimes the software was used in another proprietary product and no one was the wiser because the code was hidden. There are many instances in which no one found out about the copied code for many years, and often it came out only after the (offending) company was acquired.
Just because we publish our source code for open source software doesn't mean it's public domain. If software is more open, you have much clearer boundaries. There are clear things that are allowed and specified. Sure, it's easy for people to deliberately misinterpret things, figuring that if the software is published they can take it and put it into their proprietary product, or copy from it without giving attribution to the original author. The license clearly says that this is not acceptable.
If your code is open you have the means to be more vigilant. It's harder for the thief to hide his actions.
With all the attention on this since SCO's suit, there's a better understanding today of what open source software is.
Aren't software vendors worried that open source software will commoditize all software? Is this the way it will play out?
Becker: Open source software will be the dominant model for infrastructure software: software that everyone needs to run, needs to interoperate with and, where security is involved, needs to validate. So, yes, infrastructure software will be a commodity software item.
The Linux distribution is a good example of this. Essentially, every Linux distribution has parts that are almost completely identical internally to those of other distributions. One operates essentially the same as another, except for some added-on features. The infrastructure is common and consistent.
As for the future of software vendors, the central players will be the ones who deliver useful innovation in a layer on top of that infrastructure, who make software that users can easily and effectively use.
One great opportunity for innovation lies in application verticals, where software is customized for a relatively small marketplace. That software won't be commoditized. That's where companies can deliver real value on top of the infrastructure level.
Companies that try to differentiate themselves by taking the infrastructure layer and trying to change it aren't going to successfully build a new marketplace.
Continuing with the subject of confusing messages, are IT shops getting the real picture of the capabilities of grid computing?
Becker: There's a lot of confusion out there. People often can't distinguish between tools, such as schedulers, that are for clusters and those for grids. The big issue is the confusion between what's a grid and what's a cluster.
So, what is the difference between grids and clusters?
Becker: Grid computing is where you have separate administrative domains over a wide area, machines used by different people trying to cooperate.
Clustering is where you have a set of machines in one place, and you're trying to make them look like a single machine.
The two differ both from the user's standpoint, when running jobs, and from the administrator's side, when doing things like updating the software.
Where does utility computing fit into the scene?
Becker: Grid and clustering are two different ends of the spectrum. In between is utility computing, generally a pay-per-hour computing service that sits somewhere between the grid (although it's often called a grid) and custom clusters. The service handles the administrative effort to make sure the environments are consistent. The user just plugs in and uses the provider's computing system.
When do you use one model and not the other?
Becker: If all the machines are in one place on a single network, it's much better to treat your computing resources as a cluster. There's much more predictable performance and much easier administration, and it's also easier to write and update applications there.
So, where does the concept of virtualization fit?
Becker: In the grid approach, the virtualization layer has an API (application program interface) that the programmer writes to. That can be different with different grids.
In a cluster, usually some aspects are virtualized to have single unified processes across all the machines. Although some of the application files may be different on different machines, there's only a single point of administration where you update libraries and executables.
Virtualizing some aspects of a cluster is easy to do. With a grid, there are often different operating systems running under different administrative domains, so it can be difficult to have a clean virtualization model.
Virtualization, right now, only seems to play well in a cluster arena and perhaps in the utility computing space.
That's right now, but what about the future of virtualization on grids?
Becker: It's easy to imagine a day when a grid might provide an entire virtual machine for the application. That would reduce the burden of updating the environment, because the application would actually be running on its own virtual machine.
As a clustering pioneer, you naturally see advantages to clustering. What's the bottom-line difference between grid and clustering today?
Becker: Cluster software is, at this point, mature and well tested. Grid software, on the other hand, is very new: still in the research environment, still being worked out in the deployment environment. The companies that provide grid services are generally in a pre-product phase. They're still in a deployment phase that's too early for production.
I would say grid is not for commercial enterprises at this point. It works today where you have research groups, such as an academic research group, that need to share data sets.