Searching and Sorting

June 10, 2020 Java 9 min

Sorting

Sorting is the process of arranging a set of data into an ordered sequence. Most commonly, this is done with numbers in either ascending or descending order, but can also be done on other types of data such as strings. In the real world sorting is used in a variety of tasks, from attendance lists, to databases, and even to improve searching speed (something we will cover in the second part of this tutorial).

Throughout this tutorial, we will be primarily sorting numbers because they are easier to work with. We will also be sorting them in ascending order, as shown in the example below:

{5, 4, 2, 1, 3} in to {1, 2, 3, 4, 5}

Bubble Sort

While many different sorting algorithms exist1, one of the most commonly taught is Bubble Sort. The reason for this choice is it's extreme simplicity. In fact, let's take a look at the entire code required for bubble sorting the array in the previous example, then go over it piece by piece later on.

int[] arr = {5, 4, 2, 1, 3};

for (int i = 0; i < arr.length; i ++) {
    for (int j = 1; j < arr.length - j; j ++) {
        if (arr[j] > arr[j-1]) {
            int temp = arr[j-1];
            arr[j-1] = arr[j];
            arr[j] = temp;
        }
    }
}

That's right, you can sort an array of any integers in under 10 lines of code2.

In simplest terms, bubble sort works by repeatedly swapping adjacent pairs of integers until the array is sorted. To better understand this, let's draw out a diagram of the array in it's initial, unsorted position.

Unsorted array

Suppose we would like to move the largest element to the end, into its final position. The simplest way to do so is the following:

  1. Compare the first and second element.
  2. If the first element is greater than the second element, swap the elements.
  3. Repeat step 1 offset by one element (comparing second and third, then third and fourth, etc.)

Let's visualize this transition.

And now the code that does this:

// Step 3 in following example.
for (int j = 1; j < arr.length - j; j ++) {
    if (arr[j] > arr[j-1]) { // Step 1
        int temp = arr[j-1]; // Step 2
        arr[j-1] = arr[j];   // ...
        arr[j] = temp;       // ...
    }
}

Now that we have one element in the correct place, we can repeat these steps for remaining elements. This is the outer loop of the program, as seen below:

for (int i = 0; i < arr.length; i ++) {
    ...
}

Let's visualize that below:

And there you have it, a sorted array.

Other Sorts

Unfortunately, as simple as bubble sort is to write, it's incredibly slow. From the nested for loop we can see that this sorting operation will access each element as many times as there are elements before it. When rounded, this gives us a big O notation3 or time complexity of O(n2). I won't get into the more complex aspects of Big O Notation here, but all you need to know is that the larger the value in the brackets, the slower it is.

With that understood, let's take a look at a much faster sorting algorithm.

Array.sort(arr);

That's right, a single line of code allows you to sort an array of basically any type of object. Internally, the sorting implementation that Java uses is:

a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.


Searching


Often, once you have a set of data, it is necessary to find a specific item in that data. For example, a bank may have a list of hundreds of thousands of accounts and when you login to yours, you want to see the information for your specific account. So, the banking application or backend needs to search through the database of all accounts and select yours.

Sequential Search

The simplest searching algorithm is know as sequential search and involves going through the entire set of data and comparing the value that one is looking for with the current value and returning it if they match. Assuming we use the same array as before and would like to find the index of the value 1, let's write a sequential searching algorithm for this.

int[] arr = {5, 4, 2, 1, 3};

for (int i = 0; i < arr.length; i ++) {
    if (arr[i] == 1) {
        return i;
    }
}

If this snippet were inside a method, it would return the value 3, because the element with value 2 is at index 3.

Binary Search

Unfortunately, the same way that bubble sort was slow, sequential search is also extremely inefficient. It requires going through every element in the array until the correct value is found. At worst, this can take a time of O(n) where n is the number of elements in the array. This is the case because at the worst case, every value needs to be accessed once.

A much better solution is to use Binary Search — but it has one downside: it requires the array to be sorted initially. The way it works is by starting at the middle of the array and checking whether the value being looked for is larger or smaller than that value (or if it's equal, just return it). Assuming the array is sorted least to greatest, if the value is larger then the entire right side of the array is cut off and the same process of choosing the middle is done on the left half. If the opposite is true, with the value at the middle being smaller, then the left side of the array is cut off. With this process, it is possible to find the value in O(nlogn) time. While I won't go into what this means, one can tell that this search is faster because we can only divide the array in half so many times. For example, an array of 64 elements can only be halved 6 times so the element must be found in, at most 6 steps, much better than the 64 required in a sequential search. So with it explained, let's visualize it below.


All visualization were made with algorithm-visualizer. You can view the bubble sort visualization here and the binary search visualization here.


  1. Wikipedia has a comparison of many sorting algorithms here.

  2. Excluding the array declaration and initialization. Yes, you can convert this into a single line.

  3. Big O Notation is far more complicated than this. More information available in this Wikipedia article.