
Become.com: Search engine data center fast track

By Matt Stansberry, News Editor
02 May 2005 | SearchDataCenter.com


Become.com -- a new shopping search engine -- is the creation of Michael Yang and Yeogirl Yun, a pair that knows a thing or two about building search engines from the ground up. Yang and Yun founded the shopping comparison site mySimon and later sold it to CNET Networks for $700 million. Yun also founded the search engine WiseNut, which he later sold to LookSmart.

Become.com, the pair's latest project, which launched in February, uses a search algorithm called Affinity Index Ranking to recognize product reviews, articles and consumer reports of specific sites.

According to Jongkeun Park, network/IT engineer for Become.com, one of the key components of a successful site launch is a solid IT infrastructure.

A startup tech company needs technologies that are highly scalable, easy to administer, based on open standards and, of course, low cost, he said. Meeting all of those requirements forces a startup to weigh several factors.

The outsourced component

Become.com launched from an outsourced data center in Mountain View, Calif., operated by New York City-based AboveNet, an arrangement that let the company get the site up more quickly.

"Building our own data center facility would have meant committing massive resources and delaying launch of our search engine," Park said in an e-mail interview. "Outsourcing has allowed us to fully support the company's goal of creating the world's largest search engine for shopping in less than one year."

Yun, Become.com's chief technology officer, evaluated five or six providers before deciding that AboveNet offered the best solution.

According to Park, a massive crawler-based search engine requires a tier 1 network and a large amount of bandwidth; Become.com's crawler currently visits and indexes more than 3.2 billion Web pages, half the size of Google's index. For cooling and power, Park pointed to features listed on AboveNet's Web site, including:

  • Mechanical systems with multiple levels of redundancy.
  • Cooling systems that ensure ambient temperatures do not affect computing power.
  • Advanced continuous power supply and distribution systems that protect against commercial power grid fluctuations and service interruptions.
  • Very early smoke detection alarm (VESDA) that constantly samples the air for dangerous particles.
  • Biometric authentication and round-the-clock surveillance.

Park is the only Become.com staff member managing the 100-plus servers at the data center. If a disk fails, Park goes to the AboveNet facility to replace the hardware. According to Yun, this arrangement works because the systems have been quite stable, the hardware is kept to a minimum and a single disk failure does not require immediate replacement.

The hardware aspect

Become.com uses off-the-shelf Dell servers with 500 GB of attached disk storage. According to Park, internal testing and price quotes from other vendors showed that Dell offered the strongest combination of reliability and price/performance.

Yun called the Dell servers robust and said they require little maintenance. According to Yun, the server farm may see a hard disk failure every other week, in which case Park replaces the machine.

"We're quite different from Google because we don't need as many servers," Yun said. "It's a different philosophy. Google has maybe 100,000 servers -- cheap hardware. The problem with that approach is that there is so much manpower required to maintain that. It's great that you have a low cost overall, but you need a lot of people to maintain the equipment. Overall cost is really high if you have that number of servers."

As the lone IT engineer, Park spends a surprisingly small amount of time managing the data center hardware. He initially spent more than 90% of his time installing and managing the server farm; today he spends only 30% of his time managing servers and the network. The rest is split among developing and working with systems management tools (20%), internal IT purchasing, covering e-mail, PCs and the telephone system (20%), internal support (20%) and researching new technologies and products (10%).
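The percentages above can be checked with a few lines of Python. This is only a bookkeeping sketch: the short category labels are paraphrased, and the article's "over 90%" is treated as a lower bound of 90.

```python
# Park's current time split, in percent of his working time.
# Figures come from the article; the short category labels are paraphrased.
current_split = {
    "managing servers and the network": 30,
    "systems management tools": 20,
    "internal IT purchasing": 20,
    "internal support": 20,
    "researching new technologies and products": 10,
}

# The article says he initially spent "over 90%" of his time on the server
# farm; 90 is used here as a lower bound (an assumption for illustration).
initial_server_share = 90

total = sum(current_split.values())
freed = initial_server_share - current_split["managing servers and the network"]

print(f"Total allocated: {total}%")
print(f"Shift away from server work: more than {freed} percentage points")
```

The categories account for all of his time, and server work has dropped by more than 60 percentage points since the initial build-out.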

Open source crawler

The other component of Yun's startup plan is open source. Become.com runs SuSE and White Box Enterprise Linux platforms, and is experimenting with other Linux distributions. Yun is also trying out Solaris 10.

"Linux provides a cost-effective and enterprise-ready OS. We believe strongly in the benefits of open source, and Linux has been a great choice for us," Park said.

Park also said the company hasn't taken a serious interest in newer computing models such as virtualization or grid computing.

"We had very aggressive time frames and development goals," Park said. "We also had very high stability requirements. I took an approach that met our needs, minimized risk and ensured cost containment. I have not, however, ruled out exploring these technologies in the future."

Expansion

Yun hopes to expand the server farm within six months so that it can handle 5 billion pages, more than 50% above the current 3.2 billion. When that happens, 60 new servers will arrive at the data center, and five or six engineers will come in to unpack them and mount them in racks. After that, Yun said, much of the process is automated.
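A rough way to sanity-check the expansion plan is to compare pages per server before and after the new hardware arrives. The sketch below is a back-of-the-envelope calculation, not Become.com's planning method: it assumes the "100-plus" servers are exactly 100 and that index capacity scales linearly with server count.

```python
# Back-of-the-envelope check of the planned expansion.
# Assumptions (not from the article): exactly 100 servers today, and index
# capacity that scales linearly with the number of servers.
CURRENT_PAGES = 3_200_000_000   # pages indexed today (article)
TARGET_PAGES = 5_000_000_000    # six-month goal (article)
CURRENT_SERVERS = 100           # hypothetical: the article says "100-plus"
NEW_SERVERS = 60                # servers due to arrive (article)


def pages_per_server(pages: int, servers: int) -> float:
    """Average number of indexed pages each server is responsible for."""
    return pages / servers


growth = TARGET_PAGES / CURRENT_PAGES
before = pages_per_server(CURRENT_PAGES, CURRENT_SERVERS)
after = pages_per_server(TARGET_PAGES, CURRENT_SERVERS + NEW_SERVERS)

print(f"Page growth:        {growth:.2f}x")
print(f"Pages/server today: {before / 1e6:.1f} million")
print(f"Pages/server after: {after / 1e6:.1f} million")
```

Under those assumptions each server would carry roughly the same load after the expansion (about 31 million pages) as today (32 million), so 60 additional servers is roughly proportional to the page growth. Real per-server capacity also depends on document size, compression and replication.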

"We only have 13 engineers, but every single one is like a super-engineer," Yun said. "Our one IT engineer [Park] handles all of the servers. It's unbelievable the amount of talent each individual has. That's why we're able with only 24 people in this company to crawl billions of Web pages and put together a really strong search engine."

Yun has been in the search industry for 10 years and knows how important it is to keep costs down. He knows how many CPUs it takes to handle 500,000 queries versus 1 million, and how to plan management around that. That experience is a big part of why he is vying to hit the dot-com lottery for a third time.
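Yun's point about knowing how many CPUs a given query volume requires can be illustrated with a simple linear capacity model. Every parameter in the sketch below is hypothetical and chosen only to make the arithmetic concrete: the peak-to-average traffic ratio, per-CPU throughput, the utilization ceiling, and the assumption that the article's query counts are per day.

```python
# Hypothetical capacity-planning parameters; the article gives none of these.
SECONDS_PER_DAY = 86_400
PEAK_TO_AVERAGE = 3        # hypothetical: peak traffic is 3x the daily average
QUERIES_PER_CPU_SEC = 5    # hypothetical: sustained queries one CPU can serve
TARGET_UTIL_PCT = 60       # hypothetical: keep CPUs at or below 60% busy


def cpus_needed(queries_per_day: int) -> int:
    """Whole CPUs needed to absorb peak load with headroom (rounded up).

    Peak load (queries/sec) = queries_per_day / SECONDS_PER_DAY * PEAK_TO_AVERAGE
    Usable capacity per CPU = QUERIES_PER_CPU_SEC * TARGET_UTIL_PCT / 100
    Integer arithmetic keeps the rounding exact.
    """
    numerator = queries_per_day * PEAK_TO_AVERAGE * 100
    denominator = SECONDS_PER_DAY * QUERIES_PER_CPU_SEC * TARGET_UTIL_PCT
    return -(-numerator // denominator)  # ceiling division


for load in (500_000, 1_000_000):
    print(f"{load:>9,} queries/day -> {cpus_needed(load)} CPUs")
```

In this linear model, doubling daily queries roughly doubles the CPU count; in practice caching, query mix and index partitioning would change the curve.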

All Rights Reserved, Copyright 2005 - 2009, TechTarget