Sistina execs: Linux-based SANs spell success

Scalability used to be the straw that broke Linux's back. That's not the case anymore, according to Matt O'Keefe, CTO of Sistina Software Inc., in Minneapolis, Minn. "With a Linux cluster, an IT shop can design the system from the get-go so that it can handle very large amounts of data," he said. From 1990 to May 2000, O'Keefe taught and performed research in the areas of storage systems and parallel simulation software at the University of Minnesota. Unable to find a storage solution for the complex data they were gathering, he decided to create one himself using Linux clusters and storage area network (SAN) technologies. He founded Sistina, a storage infrastructure company, in 1997. In this interview, O'Keefe and Joaquin Ruiz, Sistina's marketing and product management vice president, discuss why Linux is a viable and scalable enterprise platform and describe the strengths that resulted from pairing Linux and SANs.

Why do you think that Linux is gaining acceptance as an enterprise-level platform?
Look at technologies making enterprise Linux really happen. The Linux 2.4 release is a base-level server operating system that can cover a fair number of bases in the enterprise. Also, Linux and Intel have become a great team. In Intel PC server hardware, the industry continues to offer bigger and better processors. That's why recent studies have shown that Intel server revenues are now exceeding revenues of customized Unix systems built on RISC systems.

That doesn't mean that people have to keep paying more for those bigger and better processors. There's a different part of Moore's law that's working to users' advantage: if people feel they don't need absolutely the latest, best performance machine, they can go down the price [curve] and not pay top dollar for their hardware. Linux is part of that trend of not buying more than you need.

Why are Linux and SANs a good match?
The combination of SANs and Linux is the key to efficient storage resource management. If you have lots of servers, it's impossible to manage them as a general pool unless you have shared storage. If you end up with lots of isolated islands of storage, your management costs just balloon. There's a huge amount of overhead in backing up, securing, and managing all these separate islands of storage.

Today, IT shops can create a very robust implementation of SANs on Linux. Moving to this kind of architecture can reduce their costs and give them a better architecture to build their data center on.

What challenges does Linux face in gaining greater acceptance?
The challenge right now is having more customers understand that the scalability and robustness of Linux are here. IT shops may not realize that they don't have to use the same headroom model anymore. With Linux, they can use the incremental compute model to add storage, I/O and/or compute capacity without having to shelve the architecture.

The good news is that many developers and application providers are either porting or have ported already to Linux and are using Linux as one of their main platforms. Now, all those applications have to be certified, and ISVs and system providers have to cooperate with each other; and that's happening to a large extent.

What industries and/or applications are ripe for Linux clustering and SANs?
Look at Wall Street, where people are doing analytics. They'll have particular financial models that they need to compute in real time. A lot of them have already started looking at data sharing. But, in most cases, the companies on Wall Street are building large clusters without any data sharing. When they do that, the cluster isn't a general compute resource that can run more than a small number of jobs. Instead, they have clusters that are specialized to run a particular task. That's inefficient. What you really want is a general compute resource.

What mistakes do IT shops make in server utilization today?
They plug servers in to solve spot problems. They don't sit back and think about re-architecting their data centers around commodity technologies that provide great price performance and can be widely used in the enterprise.

It's a mistake to wait. Linux is reliable and robust. Linux is here now. It's got what it needs to take off. You have a chance to get ahead of your competitors and ahead of the curve with the price performance you can get with these technologies.

What industries and/or applications are ripe for Linux clustering and SANs?
Also, we've seen several large U.S. and non-U.S. telcos move or consider moving away from Unix-based equipment to Linux clusters with SANs behind them. This helps them streamline their network management and better utilize IT bandwidth going across their IT backbone.

Data sharing allows you to more efficiently and effectively manage a system and run a lot of different kinds of compute jobs on the same clusters.

What are the challenges inherent in moving from direct-attached storage to a Linux/SAN cluster?
The challenge is going from a direct-attached model to a network-attached model where you use a block transfer mechanism like Fibre Channel. The challenges there have been interoperability: making sure that all the HBAs, the HBA drivers, the Fibre Channel switch fabric and the storage all speak the same protocol, all use the same name services, all use the same level of revision of drivers, and so on.

A lot of that has been worked out over the last two years. Now, it's much easier to deploy SANs. The interoperability challenge is not behind us but, to a large extent, it's been mitigated.

Do IT shops have misconceptions about what clustering can and can't do?
A lot of IT professionals, especially those in Unix shops, know clustering more from an application failover environment. That is significantly different from, yet complementary to, Sistina's approach, which is the ability to have shared resources in a cluster. That's a slightly newer topic to the Linux crowd. They may think that clustering is difficult and expensive. That's the misconception that we have to overcome. On Linux, the expense goes way down. The difficulty goes down when you think big and, from the beginning, provide the storage infrastructure software that allows you to do data sharing. That's a significant improvement in overcoming the challenges of the complexity and expense that IT shops have perceived.

What would a SAN bring to the Linux stronghold of Web serving?
In Web server clusters, the standard architecture is typically non-shared storage. So, there are a lot of issues around replicating the same data across multiple servers or, if it's dynamic data, actually getting the files as quickly as they're needed. Data sharing would be a better option.

Do IT shops have misconceptions about what clustering can and can't do?
That's not to say that there is not a level of complexity and expense involved. Yes, there is. However, the long-term benefits are significant. Once the system is running, it is running. From our experience, we do not hear back from our customers' IT staff once the system has been integrated and is up and running.


