Working with IBM pSeries servers running Linux and a $900,000 grant from the National Science Foundation, three major universities will build a portal for the Library of Congress that will ultimately become the world's largest repository for digital moving images.
This will be the first centralized online catalog of film, television and digital video images, culled from libraries, national archives, museums and broadcasting companies, that is accessible to anyone via the Web. Expected to be completed in early 2004 and available to the public toward the end of that year, it will work much like an Internet search engine, except that it will locate only moving images.
A matter of preservation
The project, known as the Moving Images Collection (MIC), was originally commissioned by the Association of Moving Image Archivists (AMIA) through a grant from the National Film Preservation Board of the Library of Congress.
Until now, the only available catalogs of film and broadcast images have come from individual private collections or museums, said Jim DeRoest, assistant director of computing and communications at the University of Washington in Seattle, one of the project's developers. DeRoest said creating a single source for all these images has been a goal of researchers for some time, particularly as a matter of preservation.
According to DeRoest, a search for the Apollo 13 mission, for example, could turn up links to a number of sites, including broadcast footage from CNN, information from NASA archives and lectures from research institutes. Not all the hits will lead to digital footage, however. Some will direct users to places where they can view or purchase the video.
Platform of choice
One might expect a mainframe to be the server of choice for a project of this magnitude, but according to Chuck Bryan, director of Linux marketing for IBM eServer pSeries, the three universities (the University of Washington, Rutgers University Libraries and the Georgia Institute of Technology Interactive Media Technology Center) will use IBM pSeries servers, which are built on IBM's POWER4 processors.
The University of Washington and Rutgers University will use IBM pSeries systems running Linux to design and develop the directory and catalog databases of digital images. The Georgia Institute of Technology will use the pSeries systems to develop the Web portal through which users will access the information and enter their search terms, said Bryan.
DeRoest said the pSeries was chosen for its low cost and, more important, for its ability to scale. The universities selected the Linux operating system because of its openness and flexibility. The servers run Linux natively on the POWER4 processor, and the platform can scale up to 32-way configurations. The pSeries systems can also be clustered to provide additional processing power for the project.
"There was real interest from the Library of Congress in trying to do open source as much as we could," said DeRoest. "We also had a budget to work within, and Linux gave us a low cost entry."
All three universities were already using pSeries servers, although they were running AIX, so sticking with the pSeries was an obvious choice, said DeRoest. "We were familiar and satisfied with the hardware."
More important, the Library of Congress also uses pSeries servers, which matters because the project will be handed off to the Library to maintain once it is complete.
The MIC databases and Web portal will be powered by two IBM eServer p630 and two IBM eServer p610 models running SuSE Linux Enterprise Server 8 (SLES 8) and IBM Directory Server.
The eServer p630 and p610 systems will serve as the gateway to the database, allowing users to search for and locate moving images.