Binary Search over a Non-Uniform Distribution using the Geometric Mean

Introduction

Suppose you are given a list of n elements in an array, A. You want to search A for some value x. If you cannot make any assumptions about A then you must search all elements. Most people would start at the first element and proceed to the next. You could try a different search strategy, but if the contents of A are truly random then the linear search will require n/2 comparisons on average (assuming x is always present) and Θ(n) in general.

If A contains some structure then we can search much faster than Θ(n). For example, if A is sorted then we can begin at the center and recursively branch left or right based on whether the midpoint is greater than or less than x. This binary search (or bisection search) will complete in Θ(log₂ n) time.

Can we do even better than Θ(log₂ n)? If we add even more assumptions about A then the answer is yes. For example, suppose A contains only odd numbers. Is 2 ∈ A? No, obviously.

In this experiment we will search a list that is not uniformly distributed. You can create any generating function you wish to create A and x. You could think of this as a probability density function. I recommend using a power of 2 for n. Branching by arithmetic mean, m = (i + j) / 2, will always complete in exactly log₂ n steps. Branching by the geometric mean, m = √(ij) = e^{(ln i + ln j)/2}, can return a result slightly faster if x is small. This is not very exciting, but it is kind of neat to branch by an alternative midpoint for a binary search.

Experiment

Random number function:

Search random x in random A:

Result

x is present:

Position of x:

Steps with arithmetic mean:

Steps with geometric mean:

Binary Search over a Non-Uniform Distribution using the Geometric Mean

Introduction

Experiment

Result

Data