News Stay informed about the latest enterprise technology news and product updates.

# Searching tables for reference data

## Robert Crawford explains some methods he's used to search a table for reference data.

This column originally appeared on TechTarget's Expert Answer Center as a post in Robert Crawford's blog. Robert served as the on-demand expert on the Expert Answer Center for two weeks in January to February 2006, during which he was available to quickly answer questions on CICS application performance and design, as well as to write daily blog entries. Keep an eye on the Expert Answer Center for topics that could help your IT shop.

There seem to be a few things that all programmers must do. One of those is search a table for some reference data. There are lots of ways to go through tables, some of which are more efficient than others. In this blog entry I'll talk about some of the methods I've used.

First, of course, there's the sequential search from the top to the bottom. This works well enough for small tables or every-once-in-a-while searches. About the only improvement you can make to this scheme is to sort the table entries by the search data element or key. That way, your search loop ends either when you find your row or a table entry whose key is lower than the search data.

Next comes the binary search. For this to work the table entries must be laid in order in contiguous storage. It's kind of like trying to guess a number between 0 and 100 by asking, "Is it 50?" and then halving the range of numbers at each subsequent try. With this method you use an index to address the row you're interested in. As you go through the loop you must also track the upper and lower bounds of your search. At initialization, the upper bound is the last row in the table while the lower bound is the first. You add the two together and divide by two to get the index of the middle entry. If that's your entry, you're done. If your data is lower, set the higher bound to the current index. If the data is higher, you set the lower bound to the current entry. At the top of the loop you add the bounds together and try again. You know the row isn't in the table when the upper and lower bounds are the same.

The last option is probably the fastest, although it tends to use a lot of storage. It's called hashing and it involves transforming the table's key into an offset of a hash table. The address in a hash table points to entry. The number of entries in the hash table depends on all the possible outcomes from the hashing routine. However, since hashing algorithms aren't perfect, you may have to deal with a "synonym chain," which links together all the table keys that hash to the same offset. As an example, I wrote a program that hashed a four-byte transaction ID by dividing it with the number of table entries. Since I used the remainder from the division as my offset into the hash table, my hash table had as many entries as the main table. Another method uses the PACK instruction to compress the data by eliminating some nibbles (byte halves). With the offset from the hashing routing, you pick up the address of the table entry from the hash table. If the entry matches, you're done. If it didn't, you have to run the synonym chain, if it exists. If you don't find the entry you were looking for at the end of the chain, you know the row wasn't in the table.

Once again, some of these algorithms are easier to implement in some languages than others. The programmer also has to be comfortable with using data as numbers and doing math with pointers. However, there are performance gains especially when performing repetitive searches or looking through long tables.

#### Start the conversation

Send me notifications when other members comment.

## SearchWindowsServer

• ### Azure PowerShell cmdlets monitor, manage VMs

PowerShell gives administrators a way to customize reports that hone in on the details that matter to the business, such as the ...

• ### Windows out-of-band patches overshadow April Patch Tuesday

Microsoft delivered out-of-band security patches to address the Total Meltdown and malware engine exploits as a precursor to its ...

• ### Microsoft Project Honolulu shows promise but needs work

Microsoft Project Honolulu is still in the technical preview stage, but it needs to resolve a number of issues before it's a ...

## SearchServerVirtualization

• ### Virtual private cloud vs. private cloud differences explained

Virtual private clouds and private clouds differ in terms of architecture, the provider and tenants, and resource delivery. ...

• ### Hybrid cloud vs. private cloud: What's the difference?

Private clouds, which are owned and operated by the business, provide security, but they have certain limitations. Hybrid clouds ...

Hybrid cloud is touted as providing the best of both private and public clouds, but this type of infrastructure comes with its ...

## SearchCloudComputing

• ### Factor performance into an application modernization strategy

Dev teams increasingly use cloud, containers and microservices to modernize apps, but these technologies bring little value ...

• ### Users demand more from containers in cloud

Technology is always advancing, but it needs to march in the direction that users want. Containers are maturing, but plenty of IT...

• ### IaaS and PaaS blurred lines increase lock-in risks

There are three distinct cloud service categories: IaaS, PaaS and SaaS. However, IaaS and PaaS are getting a little too close, ...

Close