News Stay informed about the latest enterprise technology news and product updates.

# Searching tables for reference data

## Robert Crawford explains some methods he's used to search a table for reference data.

This column originally appeared on TechTarget's Expert Answer Center as a post in Robert Crawford's blog. Robert served as the on-demand expert on the Expert Answer Center for two weeks in January to February 2006, during which he was available to quickly answer questions on CICS application performance and design, as well as to write daily blog entries. Keep an eye on the Expert Answer Center for topics that could help your IT shop.

There seem to be a few things that all programmers must do. One of those is search a table for some reference data. There are lots of ways to go through tables, some of which are more efficient than others. In this blog entry I'll talk about some of the methods I've used.

First, of course, there's the sequential search from the top to the bottom. This works well enough for small tables or every-once-in-a-while searches. About the only improvement you can make to this scheme is to sort the table entries by the search data element or key. That way, your search loop ends either when you find your row or a table entry whose key is lower than the search data.

Next comes the binary search. For this to work the table entries must be laid in order in contiguous storage. It's kind of like trying to guess a number between 0 and 100 by asking, "Is it 50?" and then halving the range of numbers at each subsequent try. With this method you use an index to address the row you're interested in. As you go through the loop you must also track the upper and lower bounds of your search. At initialization, the upper bound is the last row in the table while the lower bound is the first. You add the two together and divide by two to get the index of the middle entry. If that's your entry, you're done. If your data is lower, set the higher bound to the current index. If the data is higher, you set the lower bound to the current entry. At the top of the loop you add the bounds together and try again. You know the row isn't in the table when the upper and lower bounds are the same.

The last option is probably the fastest, although it tends to use a lot of storage. It's called hashing and it involves transforming the table's key into an offset of a hash table. The address in a hash table points to entry. The number of entries in the hash table depends on all the possible outcomes from the hashing routine. However, since hashing algorithms aren't perfect, you may have to deal with a "synonym chain," which links together all the table keys that hash to the same offset. As an example, I wrote a program that hashed a four-byte transaction ID by dividing it with the number of table entries. Since I used the remainder from the division as my offset into the hash table, my hash table had as many entries as the main table. Another method uses the PACK instruction to compress the data by eliminating some nibbles (byte halves). With the offset from the hashing routing, you pick up the address of the table entry from the hash table. If the entry matches, you're done. If it didn't, you have to run the synonym chain, if it exists. If you don't find the entry you were looking for at the end of the chain, you know the row wasn't in the table.

Once again, some of these algorithms are easier to implement in some languages than others. The programmer also has to be comfortable with using data as numbers and doing math with pointers. However, there are performance gains especially when performing repetitive searches or looking through long tables.

## Content

Find more PRO+ content and other member only offers, here.

#### Start the conversation

Send me notifications when other members comment.

## SearchWindowsServer

• ### Server Core management remains a challenge for some

Server Core, the minimal Windows Server deployment, removes some admins from their GUI comfort zone, but its benefits reduce some...

• ### Meltdown and Spectre vulnerabilities dominate January Patch Tuesday

Complications surrounding the fix for the Meltdown and Spectre microprocessor architecture flaws will make the patching process ...

• ### Windows Server hardening still weighs heavily on admins

Windows Server hardening procedures drew renewed interest following the rash of ransomware outbreaks this year. See what tips on ...

## SearchServerVirtualization

• ### Copy files from the host to a Hyper-V machine

There are many ways to copy files from the host to a Hyper-V machine, including Enhanced Session Mode and the Copy-VMFile ...

• ### VMware VVOLs adoption now poised to grow after slow start

VVOLs' low adoption rates lie in overhyped expectations and the risk-averse nature of enterprise IT. The problem it solves is ...

• ### Migrate cloud applications back to on-premises applications

In the cloud frenzy, it's easy to lose sight of what to do if the cloud doesn't work. With time, patience and cost planning, ...

## SearchCloudComputing

• ### Google Cloud Dedicated Interconnect offers VPN alternative

Google's Dedicated Interconnect enables an enterprise to privately connect its data center to the public cloud. Here's a ...

• ### Chip bugs hit cloud computing usage less than first feared

IT shops expected their cloud usage to flag due to recent chip bugs, but most environments survived the patches unscathed.

• ### Providers continue to push hybrid cloud technologies in 2018

The hybrid cloud market changes rapidly, as major cloud providers release new services to bridge private and public platforms, ...

Close