Date: Thu, 19 Jun 1997 10:48:28 -0400 (EDT) From: "Art S. Kagel" To: Timothy Robb Cc: djgpp AT delorie DOT com Subject: Re: Hash Tables In-Reply-To: Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Precedence: bulk On Wed, 18 Jun 1997, Timothy Robb wrote: > I've just recently finished designing a hash table class for strings > and was wondering if I could get some critism on it's design and > suggestions for improving it. > > You can download it from > http://www.geocities.com/SiliconValley/Vista/7489/hash.tar.gz and it is > only 6344 bytes in size. I'll see if I can find time to look at this. > > One thing I would like to improve is the hash function. > > One more thing, can someone explain the idea behind a Double Prime Hash > Table, http://www.cris.com/~Ttwang/tech/primehash.htm > OK. For a hash table to work the hash function must hash to one of a set of numbers (at least) relatively prime to the size of the table. The best, and most common, approach is to use an absolute prime number to guarantee relative prime-ness. When multiple record keys hash to the same value, as can happen with any hash function, there are two basic ways to deal with this. One is a hash table with a single entry, or fixed number of entries, per bucket with overflow buckets, when a bucket is full placing the clashing key in the first free overflow bucket (this can be a second, dedicated overflow table with its own hash function or a "skip N buckets if full" where one keeps skipping until a free bucket is found). The second, which Mr. Wang uses is to make each bucket an expandable linked list. He points out that a hash function poorly designed, without consideration of the size of the hash table, can cause a situation where all, or a large percentage, of keys hash to one or a few buckets causing long clash searches and degrading toward a linear search. To reduce the problem hash tables are sometimes resized when the clash chains become too long. Mr. Wang makes two suggestions to relieve the problem. First, Mr. Wang suggests that a table size which is itself prime, in part, solves this problem. Additionally, Mr. Wang discusses expandable hash tables and suggests that one expand the table by allocating the second, larger, table and only migrating items from one table to the other as they are added, or I suppose, re-added (updated), such that for an indeterminate period there are two tables to search. He asserts that this reduces the time required to resize the table as one need not rehash all existing keys from the original table to the new table at once. This scheme, he claims, requires 1.5 hash searches to find a key. The actual average number of searches is one plus the result of the number of entries in the second table searched divided by the total number of entries so that one could optimize such a system by keeping track of which table currently contains the most elements and searching it first. I hope this is what you need. Art S. Kagel, kagel AT bloomberg DOT com