ljinLab
6 min readMar 6, 2023

--

Image by Foodie Factor from Pixabay

If hello world represents everyone's first line of code, the binary search is probably the first piece of algorithm for most people. The binary search is a simple yet elegant algorithm that is used to find the value given a sorted list. In this article, I explain the basic structure of the algorithm as well as the problems with the usual way of learning binary search.

Working Principle

Binary search works by repeatedly dividing the search intervals in half, and narrowing down the search until the desired element is found. This algorithm represents a simple yet powerful concept of Divide and Conquer. The algorithm starts by checking the middle of the list. If the middle value is equal to the element that we are searching for, the target, then the search is over. However, the world is rarely that generous. If the middle value is not equal to the target, then we start the search over. Not from the complete beginning, as that would be incredibly wasteful and pointless. If the middle value is less than the target value, then we can assume that the target value, if it exists, can be found in the right half of the list. This idea is very powerful, but if you’re just getting into learning algorithms, you may be confused. How do we know that the element is in the right half in this situation? It’s because of the condition that, for the binary search to work properly, the list must already be sorted. Let’s look at it with an example.

target = 14
some_list = [1,3,4,5,6,7,8,9,10,11,14,23]

Because the list is short, we can visually verify that the list is already sorted. Therefore, we can use the binary search to check if a value is present in the list.

The total length of the list is 12. In other words, the list’s indices range from 0 to 11, inclusively. Now let’s think back to our elementary school math. How do you find the middle of two numbers? You add the two up, and divide by two. However, if the list has even elements, like the one shown in the example, the middle is a fraction. To make things simpler, we round the resulting fraction to a smaller number. In the example’s context, we would get 11 by adding 0 to 11; get 5.5 by dividing 11 by 2; and get 5 by rounding 5.5 to the smaller integer.

The 5th element of the list is 6. Because 6 is smaller than 14, our target, we assume that the target must be in the right half of the list.

Now, how do we tell our program that the element is in the right half? We can do so by changing our lower bounds. Initially, the bounds of the problem were 0, our beginning index, and 11, our last index. However, if we change the lower bound to 5, our previous middle value, the computer will search only the right half of the original list. If the middle value were larger than the target, we could assume that the target can be found in the left half, using the same logic.

We simplified the problem to be half as long, right off the bat. The binary search continues this operation until we find the target or until we run out of elements to search. Now that we’ve looked at it in English, let’s look at it in Python.

Implementation

Let’s define a function called the binary_search that takes two arguments, the sorted list and a target value. The function will search the list and return the index of the target if the target is present in the list. To make our outputs consistent, let's return -1 if the value does not exist within our list.

def binary_search(arr: list, target) -> int:

In the body of the function, let’s set some initial conditions. We need the lower bound, upper bound, and the middle index to start.

low = 0
high = len(arr) - 1
mid = (high + low) // 2

As you’ve noticed, I used two slashes to denote a division. In Python, // makes sure that the output of the division is an integer. Now that we have our bounds, let's implement the core logic.

while low <= high:
mid = (low + high) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
low = mid + 1
else:
high = mid - 1

The while condition may throw you off. Once low has become larger than high, we will have run out of elements to search. Therefore, the condition is stating that we would like to repeat the below instructions until we exhaust the entire list or we hit a return statement.

The logic is as I’ve explained above. If the middle value is smaller than the target, we look at the right half of the remaining list by updating the lower bound. If the middle value is larger than the target, then we look at the left half of the remaining list by updating the upper bound. In each iteration, we reduce the array by half, making this an effective algorithm.

If we put it all together, the code looks like this.

def binary_search(arr: list, target) -> int:
low = 0
high = len(arr) - 1
mid = (high + low) // 2
while low <= high:
mid = (low + high) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
low = mid + 1
else:
high = mid - 1
return -1

Once the lower bound grows larger than the upper bound, it means that the value we’re looking for is not in our list. Therefore, we return -1 as we've decided earlier.

Problem

If you’ve read up until this point and have a solid math background, you will think to yourself, “this algorithm looks fine.” However, if you’re dealing with computers, even if you think you’ve considered every edge case, there will be a new one. In this case, the breaking point is the problem of Integer Overflow.

Integer Overflow

As humans, smaller numbers are more comfortable to work with. That’s why we shrink larger numbers to be expressed as we do with scientific notations. The integer overflow occurs when the result of an arithmetic operation on integers is too large (or too small) to be represented by the integer data type being used. In other words, the result exceeds the maximum or minimum value that can be stored in the data type.

For example, if we try to add two integers, both close to the maximum value that can be represented by a 32-bit integer (2,147,483,647 or 2³¹ -1), the result will exceed this maximum value and overflow. Different programming languages deal with overflow in different ways. Most typical workarounds are wraparound and errors.

Overflow in Binary Search

For the binary search algorithm, we need to find the middle. In the progress of doing so, I introduced a familiar equation: (high + low) // 2. However, the problem arises when both high and low values are near the right end of the maximum limit. We run into a problem where we have to add two very large numbers. This problem has not been noticed until 2006.

The newly proposed equation is the following: low + (high - low) / 2. This way, by slowly working towards the right end of the range, we can avoid the integer overflow problem. If we were to write the binary search using the new equation would be as shown below.

def binary_search(arr: list, target) -> int:
low = 0
high = len(arr) - 1
mid = 0 # it doesn't matter what mid is here
while low <= high:
mid = low + ((high - low) // 2)
if arr[mid] == target:
return mid
elif arr[mid] < target:
low = mid + 1
else:
high = mid - 1
return -1

Conclusion

The key to writing code is to be humble. There will always be broken implementations in our code. This is why we have to keep a willing-to-learn attitude as developers. Jon Bentley wrote an entire book on programming and was able to share his published mistakes in front of students at CMU. If the binary search is your first step into algorithms, you’re still probably a little bit confused. However, it is critical that you remember that it’s okay for you to be confused and wrong. To quote a great wizard, “every great [developer] was a student just like us. If they can do it; why not us.”

--

--

ljinLab

Programmer, GIS Enthusiast, Entrepreneur, Life long student. (personal website under construction)