News Stay informed about the latest enterprise technology news and product updates.

Searching tables for reference data

Robert Crawford explains some methods he's used to search a table for reference data.

This column originally appeared on TechTarget's Expert Answer Center as a post in Robert Crawford's blog. Robert served as the on-demand expert on the Expert Answer Center for two weeks in January to February 2006, during which he was available to quickly answer questions on CICS application performance and design, as well as to write daily blog entries. Keep an eye on the Expert Answer Center for topics that could help your IT shop.

There seem to be a few things that all programmers must do. One of those is search a table for some reference data. There are lots of ways to go through tables, some of which are more efficient than others. In this blog entry I'll talk about some of the methods I've used.

First, of course, there's the sequential search from the top to the bottom. This works well enough for small tables or every-once-in-a-while searches. About the only improvement you can make to this scheme is to sort the table entries by the search data element or key. That way, your search loop ends either when you find your row or a table entry whose key is lower than the search data.

Next comes the binary search. For this to work the table entries must be laid in order in contiguous storage. It's kind of like trying to guess a number between 0 and 100 by asking, "Is it 50?" and then halving the range of numbers at each subsequent try. With this method you use an index to address the row you're interested in. As you go through the loop you must also track the upper and lower bounds of your search. At initialization, the upper bound is the last row in the table while the lower bound is the first. You add the two together and divide by two to get the index of the middle entry. If that's your entry, you're done. If your data is lower, set the higher bound to the current index. If the data is higher, you set the lower bound to the current entry. At the top of the loop you add the bounds together and try again. You know the row isn't in the table when the upper and lower bounds are the same.

The last option is probably the fastest, although it tends to use a lot of storage. It's called hashing and it involves transforming the table's key into an offset of a hash table. The address in a hash table points to entry. The number of entries in the hash table depends on all the possible outcomes from the hashing routine. However, since hashing algorithms aren't perfect, you may have to deal with a "synonym chain," which links together all the table keys that hash to the same offset. As an example, I wrote a program that hashed a four-byte transaction ID by dividing it with the number of table entries. Since I used the remainder from the division as my offset into the hash table, my hash table had as many entries as the main table. Another method uses the PACK instruction to compress the data by eliminating some nibbles (byte halves). With the offset from the hashing routing, you pick up the address of the table entry from the hash table. If the entry matches, you're done. If it didn't, you have to run the synonym chain, if it exists. If you don't find the entry you were looking for at the end of the chain, you know the row wasn't in the table.

Once again, some of these algorithms are easier to implement in some languages than others. The programmer also has to be comfortable with using data as numbers and doing math with pointers. However, there are performance gains especially when performing repetitive searches or looking through long tables.

Start the conversation

Send me notifications when other members comment.

SearchWindowsServer

• Battle lines over Windows Server 2008 migration drawn

Microsoft and AWS stepped up their ongoing battle promoting a variety of Windows Server 2008 migration tools in hopes of luring ...

• Build a PowerShell logging function for troubleshooting

A basic administrative skill is checking over logs to find out why something broke. Learn how to build a proper logging mechanism...

• January Patch Tuesday fixes cryptography bug found by NSA

The U.S. National Security Agency shared information with Microsoft about a significant spoofing vulnerability in Windows that ...

SearchServerVirtualization

• Explore native and third-party VM automation tools' features

VMware and Microsoft offer native tools that provide VM automation capabilities, but admins should also look at third-party ...

• Use multiple Dockerfiles for complex application configuration

Dockerfiles can help with application configurations, but there are some questions you must consider, such as how many ...

• An introduction to VMware vRealize Operations Cloud

VMware is set to release a cloud-based management software as a service in 2020, which uses predictive analytics and AI features ...

SearchCloudComputing

• 5 cloud database comparison tips to guide your data strategy

Catch up on these tips that compare the strengths, weakness and available integrations of popular public cloud database and ...

• Reduce cloud latency for remote employees and offices

Latency remains an issue for cloud users with remote facilities. See how SD-WAN and satellites can improve network performance ...

• AWS multi-account management best practices with Control Tower

With the help of AWS Control Tower, organizations who own and operate multiple cloud accounts can manage them all under one roof ...

Close