1: /***************************************************************************
2: * File: HashedArrayTree.java
3: * Author: Keith Schwarz (htiek@cs.stanford.edu)
4: *
5: * An implementation of the List abstraction backed by a hashed array tree
6: * (HAT), a data structure supporting amortized O(1) lookup, append, and
7: * last-element removal. In this sense it is akin to a standard dynamic
8: * array implementation. However, a hashed array tree also has the advantage
9: * that its memory overhead is only O(sqrt(n)) rather than the typical O(n)
10: * found in dynamic arrays and their variants.
11: *
12: * Internally, the hashed array tree is implemented as an array of pointers
13: * that optionally point to an array of elements. The topmost array and each
14: * element always have the same size, which is always a power of two. In
15: * this sense, the hashed array tree is essentially a two-dimensional array
16: * of elements. However, the advantage of the hashed array tree is that the
17: * topmost array pointers are all initially null and only filled in when space
18: * is needed. This means that the maximum overhead of the structure is the
19: * size of the topmost array, plus the number of unused elements in the current
20: * block. Here is one sample HAT:
21: *
22: * [ ] [ ] [ ] [ ]
23: * | | |
24: * v v v
25: * [0] [4] [8]
26: * [1] [5] [9]
27: * [2] [6] [ ]
28: * [3] [7] [ ]
29: *
30: * Here, the topmost array of pointers has three pointers in use, each of which
31: * point to an array of the corresponding number of elements.
32: *
33: * Whenever an element is added to a hashed array tree, one of three cases
34: * must hold:
35: *
36: * 1. There is extra space at the end of the final subarray (for example, in
37: * the top picture). In that case, the element is added to that position.
38: * 2. The final subarray is full, but space for another subarray exists in the
39: * topmost array. In that case, a new array is allocated and the element
40: * is added to that array.
41: * 3. The final subarray is full and no new arrays remain open. In that case,
42: * since the topmost array has size 2^n and each array has size 2^n, there
43: * must be a total of 2^(2n) elements in the hashed array tree. We
44: * next double the size of the topmost array to 2^(n + 1), then allocate
45: * 2^(n - 1) subarrays of size 2^(n + 1) elements for a total of 2^(2n)
46: * elements' worth of space. The elements from the old HAT are then copied
47: * over, a new array is allocated for the new element, and it is added as
48: * the first element of that array.
49: *
50: * Let's now talk about the performance and memory usage of this structure.
51: * First, we note that we can perform lookups in O(1), assuming that the
52: * machine is transdichotomous (meaning that a single machine word can hold
53: * the size of any array). This can be done by breaking the input index in
54: * half, then using the first half to choose which array to look into (the
55: * "hashed" part of HAT) and the second half to choose which index to select.
56: * This trick is similar to the trick used to represent a two-dimensional
57: * array using a linear structure.
58: *
59: * Next, let's think about how much time it takes to do an append operation.
60: * Each append when space remains takes O(1) to look up the proper position
61: * in the hashed array tree for writing, but appends can be much more expensive
62: * when the HAT needs to be resized. Fortunately, this does not happen very
63: * often. Whenever the HAT doubles in size, its capacity grows from 2^(2n) to
64: * 2^(2n + 2), meaning that four times as many elements must be inserted before
65: * the next copy operation. If we define a potential function as twice the
66: * number of filled-in elements in the HAT above half capacity (i.e. the
67: * number of elements in the arrays in the second half), then we can prove an
68: * amortized O(1) time for append. Consider any series of appends. If the
69: * append does not expand the HAT, then it takes O(1) time and increases the
70: * potential by 1/2. If the append does expand the HAT, and the HAT's topmost
71: * array has size 2^n, then there must be 2^(n-1) elements in the latter half
72: * of the HAT, so the potential is 2^(2n). The time required to move each
73: * of the 2^(2n) elements is 2^(2n), and in the new HAT there are no elements
74: * in the latter half of the array. Consequently, the new potential is zero.
75: * The actual time required to perform the append is thus 2^2n + O(1), and
76: * the decrease in potential is -2^2n, so the amortized cost is O(1) as
77: * expected.
78: *
79: * Last, let's talk about the cost to do a remove. This is similar to the
80: * append case - we delete the last element of the last array, removing the
81: * array from the topmost array if it becomes empty. We also compact the
82: * HAT if it becomes too sparse by shrinking from a HAT of size 2^(2n + 2) to
83: * a HAT of size 2^(2n) if the HAT becomes one-eighth full. A similar
84: * potential method can be used to show that this operation runs in amortized
85: * O(1).
86: *
87: * Finally, let's consider the memory overhead of the HAT. For any HAT of
88: * topmost array size 8 or more, since the HAT is always at least one-eighth
89: * full, there must be at least one full array. This array has size equal to
90: * the size of the topmost array (call this k), and so if every array were to
91: * be filled in to capacity, there would be a total of k arrays of size k,
92: * for a total of k^2 elements. Of this capacity, we know that at least an
93: * eighth of them are filled in, so k^2 must be at most 8n, and so
94: * k = O(sqrt(n)). To finish the analysis, the overhead of the structure is
95: * at most the overhead of this top-level array, plus potentially k - 1
96: * unused elements in some array. This is a total of O(k) = O(sqrt(n))
97: * overhead, which is what we originally desired.
98: */
99: import java.util.*; // For AbstractList
100:
101: @SuppressWarnings("unchecked")
102: public final class HashedArrayTree<T> extends AbstractList<T> {
103: /* To simplify the implementation, we enforce that the size of the topmost
104: * array never drops below 2. This prevents weirdness when we try to
105: * allocate 2^(n-1) arrays during a doubling and find that n = 0.
106: */
107: private static final int kMinArraySize = 2;
108:
109: /* The topmost array of elements; initially of size two. */
110: private T[][] mArrays = (T[][]) new Object[kMinArraySize][];
111:
112: /* Number of elements, initially zero since the HAT is created empty. */
113: private int mSize = 0;
114:
115: /* A constant containing lg2 of the topmost array size. This enables some
116: * cute bit-twiddling tricks to improve efficiency.
117: */
118: private int mLgSize = 1;
119:
120: /**
121: * Returns the number of elements in the HashedArrayTree.
122: *
123: * @return The number of elements in the HashedArrayTree.
124: */
125: @Override
126: public int size() {
127: return mSize;
128: }
129:
130: /**
131: * Adds a new element to the HashedArrayTree.
132: *
133: * @param elem The element to add.
134: * @return true
135: */
136: @Override
137: public boolean add(T elem) {
138: /* First, check if we're completely out of space. If so, do a resize
139: * to ensure we do indeed have room.
140: */
141: if (size() == mArrays.length * mArrays.length)
142: grow();
143:
144: /* Compute the (arr, index) pair for the next position. The next
145: * position is at the location indicated by size(), but we know that
146: * space exists from the previous call.
147: */
148: final int offset = computeOffset(size());
149: final int index = computeIndex(size());
150:
151: /* Check if an array exists here. If not, make one up. */
152: if (mArrays[offset] == null)
153: mArrays[offset] = (T[]) new Object[mArrays.length];
154:
155: /* Write the element to its location. */
156: mArrays[offset][index] = elem;
157:
158: /* Update the element count. */
159: ++mSize;
160:
161: /* Per the Collections contract, return true to signal a successful
162: * add.
163: */
164: return true;
165: }
166:
167: /**
168: * Sets the element at the specified position to the indicated value.
169: * If the index is out of bounds, throws an IndexOutOfBounds exception.
170: *
171: * @param index The index at which to set the value.
172: * @param elem The element to store at that position.
173: * @return The value initially at that location.
174: * @throws IndexOutOfBoundsException If index is invalid.
175: */
176: @Override
177: public T set(int index, T elem) {
178: /* Find out where to look. */
179: final int offset = computeOffset(index);
180: final int arrIndex = computeIndex(index);
181:
182: /* Cache the value there and write the new one. */
183: T result = mArrays[offset][arrIndex];
184: mArrays[offset][arrIndex] = elem;
185:
186: /* Hand back the old value. */
187: return result;
188: }
189:
190: /**
191: * Returns the value of the element at the specified position.
192: *
193: * @param index The index at which to query.
194: * @return The value of the element at that position.
195: * @throws IndexOutOfBoundsException If the index is invalid.
196: */
197: @Override
198: public T get(int index) {
199: /* Check that this is a valid index. */
200: if (index < 0 || index >= size())
201: throw new IndexOutOfBoundsException("Index " + index + ", size " + size());
202:
203: /* Look up the element. */
204: return mArrays[computeOffset(index)][computeIndex(index)];
205: }
206:
207: /**
208: * Adds the specified element at the position just before the specified
209: * index.
210: *
211: * @param index The index just before which to insert.
212: * @param elem The value to insert
213: * @throws IndexOutOfBoundsException if the index is invalid.
214: */
215: @Override
216: public void add(int index, T elem) {
217: /* Confirm the validity of the index. */
218: if (index < 0 || index >= size())
219: throw new IndexOutOfBoundsException("Index " + index + ", size " + size());
220:
221: /* Add a dummy element to ensure that everything resizes correctly.
222: * There's no reason to repeat the logic.
223: */
224: add(null);
225:
226: /* Next, we need to shuffle down every element that appears after
227: * the inserted element. We'll do this using our own public interface.
228: */
229: for (int i = size(); i > index; ++i)
230: set(i, get(i - 1));
231:
232: /* Finally, write the element. */
233: set(index, elem);
234: }
235:
236: /**
237: * Removes the element at the specified position from the HashedArrayTree.
238: *
239: * @param index The index of the element to remove.
240: * @return The value of the element at that position.
241: * @throws IndexOutOfBoundsException If the index is invalid.
242: */
243: @Override
244: public T remove(int index) {
245: /* Cache the value at the indicated position; this also does the bounds
246: * check.
247: */
248: T result = get(index);
249:
250: /* Use a naive shuffle-down algorithm to reposition elements after
251: * the removed one.
252: */
253: for (int i = index + 1; i < size(); ++i)
254: set(i - 1, get(i));
255:
256: /* Clobber the last element to play nicely with the garbage collector. */
257: set(size() - 1, null);
258:
259: /* Decrement our size. */
260: --mSize;
261:
262: /* If we are now at 1/8 total capacity, shrink the structure. */
263: if (size() * 8 <= mArrays.length * mArrays.length)
264: shrink();
265: /* Otherwise, if the size is now an even multiple of the array size,
266: * we can drop the very last array. This is the array whose offset
267: * is one after the end of the elements.
268: */
269: else if (size() % mArrays.length == 0)
270: mArrays[computeOffset(size())] = null;
271:
272: return result;
273: }
274:
275: /**
276: * Given an index, returns the offset into the master array at which the
277: * element with that index can be found.
278: *
279: * @return The index into the topmost array where the given element can
280: * be found.
281: */
282: private int computeOffset(int index) {
283: /* This can be computed by dividing the index by the index by the
284: * topmost array. However, if we want to be very clever, we can do
285: * this efficiently by bit-shifting downard by the lg2 of the size
286: * of the topmost array.
287: */
288: return index >> mLgSize;
289: }
290:
291: /**
292: * Given an index, returns the offset into the appropriate subarray in
293: * which the element with that index can be found.
294: *
295: * @return The index into the subarray array where the given element can
296: * be found.
297: */
298: private int computeIndex(int index) {
299: /* This can be computed by modding the index by the index by the
300: * topmost array. But we can do this more efficiently with a different
301: * tactic. Since the array size is a perfect power of two, it must
302: * look like this:
303: *
304: * 00..010..0
305: *
306: * Subtracting one yields
307: *
308: * 00..001..1
309: *
310: * ANDing this with the index produces the value we're looking for.
311: */
312: return index & (mArrays.length - 1);
313: }
314:
315: /**
316: * Grows the internal representation by doubling the size of the topmost
317: * array and copying the appropriate number of elements over.
318: */
319: private void grow() {
320: /* Double the size of the topmost array. */
321: T[][] newArrays = (T[][]) new Object[mArrays.length * 2][];
322:
323: /* The new arrays each have size 2^(n + 1). We need 2^(n - 1) of them
324: * to hold the old elements. Allocate those here and copy everything
325: * over.
326: */
327: for (int i = 0; i < mArrays.length; i += 2) {
328: /* Allocate the array. */
329: newArrays[i / 2] = (T[]) new Object[newArrays.length];
330:
331: /* Use System.arraycopy to move everything over. */
332: System.arraycopy(mArrays[i], 0, newArrays[i / 2], 0, mArrays.length);
333: System.arraycopy(mArrays[i + 1], 0, newArrays[i / 2], mArrays.length, mArrays.length);
334:
335: /* Null out the old arrays to be nice to the GC during this
336: * potentially stressful time.
337: */
338: mArrays[i] = mArrays[i + 1] = null;
339: }
340:
341: /* Switch out this new array for the old. */
342: mArrays = newArrays;
343:
344: /* Bump up lg2 of the size. */
345: ++mLgSize;
346: }
347:
348: /**
349: * Decreases the size of the HAT by shrinking into a better fit.
350: */
351: private void shrink() {
352: /* If the size of the topmost array is at its minimum, don't do
353: * anything. This doesn't change the asymptotic memory usage because
354: * we only do this for small arrays.
355: */
356: if (mArrays.length == kMinArraySize) return;
357:
358: /* Otherwise, we currently have 2^(2n) / 8 = 2^(2n - 3) elements.
359: * We're about to shrink into a grid of 2^(2n - 2) elements, and so
360: * we'll fill in half of the elements.
361: */
362: T[][] newArrays = (T[][]) new Object[mArrays.length / 2][];
363:
364: /* Copy everything over. We'll need half as many arrays as before. */
365: for (int i = 0; i < newArrays.length / 2; ++i) {
366: /* Create the arrays. */
367: newArrays[i] = (T[]) new Object[newArrays.length];
368:
369: /* Move everything into it. If this is an odd array, it comes
370: * from the upper half of the old array; otherwise it comes from
371: * the lower half.
372: */
373: System.arraycopy(mArrays[i / 2], (i % 2 == 0)? 0 : newArrays.length,
374: newArrays[i], 0, newArrays.length);
375:
376: /* Play nice with the GC. If this is an odd-numbered array, we
377: * just copied over everything we needed and can clear out the
378: * old array.
379: */
380: if (i % 2 == 1)
381: mArrays[i / 2] = null;
382: }
383:
384: /* Copy the arrays over. */
385: mArrays = newArrays;
386:
387: /* Drop the lg2 of the size. */
388: --mLgSize;
389: }
390: }