Mail Archives: djgpp/1997/06/19/10:48:58

Date: Thu, 19 Jun 1997 10:48:28 -0400 (EDT)
From: "Art S. Kagel" <kagel AT ns1 DOT bloomberg DOT com>
To: Timothy Robb <trobb AT neutrino DOT phys DOT laurentian DOT ca>
Cc: djgpp AT delorie DOT com
Subject: Re: Hash Tables
In-Reply-To: <Pine.SGI.3.91.970618170624.26442A-100000@neutrino.phys.laurentian.ca>
Message-Id: <Pine.D-G.3.91.970619100254.26510A-100000@dg1>
Mime-Version: 1.0

On Wed, 18 Jun 1997, Timothy Robb wrote:

> I've just recently finished designing a hash table class for strings
> and was wondering if I could get some criticism on its design and
> suggestions for improving it.
> 
> You can download it from 
> http://www.geocities.com/SiliconValley/Vista/7489/hash.tar.gz and it is
> only 6344 bytes in size.

I'll see if I can find time to look at this.

> 
> One thing I would like to improve is the hash function.
> 
> One more thing, can someone explain the idea behind a  Double Prime Hash 
> Table, http://www.cris.com/~Ttwang/tech/primehash.htm
> 

OK.  For a hash table to work, the hash function must hash to one of a
set of numbers that is (at least) relatively prime to the size of the
table.  The best, and most common, approach is to make the table size
itself an absolute prime, which guarantees relative primeness.
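To illustrate the point above, here is a minimal sketch (not from Mr.
Robb's class -- the names and the multiplier are my own assumptions) of
a string hash reduced modulo a prime table size, so that the final
modulus spreads keys across all buckets:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// TABLE_SIZE is a prime, per the advice above; 31 is a commonly
// used multiplier for string hashing, chosen here for illustration.
const std::size_t TABLE_SIZE = 101;

std::size_t hash_string(const std::string &key) {
    std::size_t h = 0;
    for (std::size_t i = 0; i < key.size(); ++i)
        h = h * 31 + static_cast<unsigned char>(key[i]);
    return h % TABLE_SIZE;  // prime modulus guarantees relative primeness
}
```

Any reasonable mixing function works here; the point is only that the
final reduction is taken modulo a prime.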

When multiple record keys hash to the same value, as can happen with any
hash function, there are two basic ways to deal with the clash.  The
first is a table with a single entry, or a fixed number of entries, per
bucket plus overflow: when a bucket is full, the clashing key is placed
in the first free overflow bucket.  The overflow area can be a second,
dedicated overflow table with its own hash function, or a "skip N
buckets if full" scheme in which one keeps skipping until a free bucket
is found.  The second way, which Mr. Wang uses, is to make each bucket
an expandable linked list.  He points out that a poorly designed hash
function, built without consideration of the size of the hash table, can
cause all, or a large percentage, of the keys to hash to one or a few
buckets, producing long clash chains and degrading lookups toward a
linear search.  To reduce the problem, hash tables are sometimes resized
when the clash chains become too long.  Mr. Wang makes two suggestions
to relieve the problem.
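The chaining approach described above can be sketched as follows.  This
is only an illustration under my own naming -- it is not Mr. Robb's
class nor Mr. Wang's code -- with each bucket an expandable linked list
that a lookup walks on a clash:

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Minimal chained hash table sketch: clashing keys extend the
// bucket's linked list rather than overflowing into other buckets.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t size) : buckets_(size) {}

    void insert(const std::string &key) {
        buckets_[hash(key)].push_back(key);
    }

    bool contains(const std::string &key) const {
        const std::list<std::string> &chain = buckets_[hash(key)];
        for (std::list<std::string>::const_iterator it = chain.begin();
             it != chain.end(); ++it)
            if (*it == key)   // walk the clash chain
                return true;
        return false;
    }

private:
    std::size_t hash(const std::string &key) const {
        std::size_t h = 0;
        for (std::size_t i = 0; i < key.size(); ++i)
            h = h * 31 + static_cast<unsigned char>(key[i]);
        return h % buckets_.size();  // size should be prime, as above
    }

    std::vector<std::list<std::string> > buckets_;
};
```

With a good hash function the chains stay short; with a bad one, every
lookup degenerates into the linear chain walk Mr. Wang warns about.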

First, Mr. Wang suggests that a table size which is itself prime solves
part of this problem.  Second, he discusses expandable hash tables and
suggests expanding by allocating the second, larger table and migrating
items from the old table to the new one only as they are added (or, I
suppose, re-added when updated), so that for an indeterminate period
there are two tables to search.  He asserts that this reduces the time
required to resize the table, since one need not rehash all existing
keys from the original table to the new table at once.  This scheme, he
claims, requires an average of 1.5 hash searches to find a key.  More
precisely, the average number of searches is one plus the number of
entries in the table searched second divided by the total number of
entries, so one could optimize such a system by keeping track of which
table currently contains the most elements and searching it first.
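The arithmetic above can be written out directly.  If n1 keys sit in
the table searched first and n2 in the table searched second, a random
lookup costs one search plus a second one whenever the key lives in the
second table, i.e. with probability n2 / (n1 + n2):

```cpp
// Expected number of hash searches when keys are split across two
// tables and the n1-entry table is searched first (illustrative
// helper, not part of any published code).
double expected_searches(double n1, double n2) {
    return 1.0 + n2 / (n1 + n2);
}
```

At a 50/50 split this gives exactly the 1.5 figure Mr. Wang quotes, and
it shows why searching the fuller table first is the better order: 1 +
25/100 = 1.25 searches beats 1 + 75/100 = 1.75.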

I hope this is what you need.

Art S. Kagel, kagel AT bloomberg DOT com
