Become.com -- a new shopping search engine -- is the creation of Michael Yang and Yeogirl Yun, a pair that knows a thing or two about building search engines from the ground up. Yang and Yun founded the shopping comparison site mySimon and later sold it to CNET Networks for $700 million. Yun also founded the search engine WiseNut, which he later sold to LookSmart.
Become.com, the pair's latest project, which launched in February, uses a search algorithm called Affinity Index Ranking to recognize product reviews, articles and consumer reports of specific sites.
According to Jongkeun Park, network/IT engineer for Become.com, one of the key components of a successful site launch is a solid IT infrastructure.
A startup tech company needs technologies that are highly scalable, easy to administer, based on open standards and, of course, low cost, he said. This type of flexibility can require a startup to consider several factors.
The outsourced component
Become.com launched from an outsourced data center in Mountain View, Calif., managed by New York City-based AboveNet, allowing the site to launch more quickly.
"Building our own data center facility would have meant committing massive resources and delaying launch of our search engine," Park said in an e-mail interview. "Outsourcing has allowed us to fully support the company's goal of creating the world's largest search engine for shopping in less than one year."
Yun, Become.com's chief technology officer, evaluated five or six providers before deciding AboveNet was the most effective option.
According to Park, a tier 1 network and a great amount of bandwidth are required to support a massive crawler-based search engine (currently visiting and indexing over 3.2 billion Web pages, half of Google's index). And in terms of cooling and power, Park quoted the AboveNet Web site for features including:
Park is the sole member of the Become.com staff who manages the 100-plus servers at the data center. If Become.com has a disk failure, Park goes to the AboveNet facility to replace the hardware. According to Yun, this arrangement has worked because the systems have been quite stable, the hardware is kept to a minimum and a single disk failure does not require immediate replacement.
The hardware aspect
Become.com uses off-the-shelf Dell servers with 500 GB of hard disk attached. According to Park, based on internal testing, as well as price quotes from other vendors, Dell provided the strongest combination of reliability and price for performance.
Yun called the Dell servers robust, and said there was little maintenance involved with operating them. According to Yun, the server farm may experience a hard disk failure every other week, in which case Park replaces the machine.
"We're quite different from Google because we don't need as many servers," Yun said. "It's a different philosophy. Google has maybe 100,000 servers -- cheap hardware. The problem with that approach is that there is so much manpower required to maintain that. It's great that you have a low cost overall, but you need a lot of people to maintain the equipment. Overall cost is really high if you have that number of servers."
As the lone IT engineer, Park spends a surprisingly small amount of time managing the data center hardware. He was initially spending over 90% of his time installing and managing the server farm. But today, he spends only 30% of his time managing servers and the network. The rest of his time is split between developing and working with systems management tools (20%), internal IT -- e-mail, PCs, telephone system -- purchasing (20%), internal support (20%) and researching new technologies and products (10%).
Open source crawler
The other component of Yun's startup plan is open source. Become.com runs SuSE and White Box Enterprise Linux platforms, and is experimenting with other versions. Yun is also playing around with Solaris 10.
"Linux provides a cost-effective and enterprise-ready OS. We believe strongly in the benefits of open source, and Linux has been a great choice for us," Park said.
Park also said the company hadn't taken a serious interest in any new computing structures, such as virtualization or grid computing.
"We had very aggressive time frames and development goals," Park said. "We also had very high stability requirements. I took an approach that met our needs, minimized risk and ensured cost containment. I have not, however, ruled out exploring these technologies in the future."
Yun hopes to increase the servers to handle 5 billion pages six months from now, nearly double the current amount. When that happens, 60 servers will arrive in the data center, and five or six engineers will come to unpack and put the servers in the rack. But after that, Yun said a lot of the process is automated.
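As a rough check on those figures, the planned order of 60 servers is close to what simple linear scaling would suggest. The sketch below is a back-of-the-envelope estimate only: it assumes exactly 100 servers today (the article says "100-plus"), that every server holds an equal share of the index, and that capacity grows linearly with server count. None of these assumptions comes from Become.com, and the function name is made up for illustration.

```python
import math

# Figures quoted in the article.
CURRENT_PAGES = 3_200_000_000   # pages indexed today
TARGET_PAGES = 5_000_000_000    # pages Yun hopes to handle

# Assumption: "100-plus servers" is treated as exactly 100.
CURRENT_SERVERS = 100


def servers_needed(target_pages, current_pages, current_servers):
    """Estimate total servers for target_pages, assuming linear scaling.

    Hypothetical helper: assumes each server carries an equal share of
    the index and that this per-server share stays constant.
    """
    pages_per_server = current_pages / current_servers
    return math.ceil(target_pages / pages_per_server)


total = servers_needed(TARGET_PAGES, CURRENT_PAGES, CURRENT_SERVERS)
additional = total - CURRENT_SERVERS
growth = TARGET_PAGES / CURRENT_PAGES

print(f"Growth factor: {growth:.2f}x")
print(f"Estimated total servers: {total}")
print(f"Estimated additional servers: {additional}")
```

Under these assumptions the estimate comes to 57 additional servers for a 1.56x increase in indexed pages, which is in line with the 60 servers Yun mentions.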
"We only have 13 engineers, but every single one is like a super-engineer," Yun said. "Our one IT engineer [Park] handles all of the servers. It's unbelievable the amount of talent each individual has. That's why we're able with only 24 people in this company to crawl billions of Web pages and put together a really strong search engine."
Yun has been in the search industry for 10 years and knows that it's very important to reduce costs. He knows the number of CPUs it takes to handle 500,000 queries versus 1 million, and how to plan in terms of management. And it's a big part of the reason he's vying to hit the dot-com lottery for a third time.
Let us know what you think about the story; e-mail: Matt Stansberry, News Editor