Big O Primer

This is what happens when you don’t know Big O

Introduction

Big O is all about saving time and saving space, two resources that computer algorithms consume to do repetitive jobs (like sorting strings, calculating sums, or finding primes in haystacks). Big O analysis is required when data points grow very numerous. If you need to sort thousands, millions, billions, or trillions of data points then Big O will save you time and/or space, help you pick the right algorithm, guide you on making the right trade-offs, and ultimately be enable you to be more responsive to users. Imagine if Google Search told users to grab a cup of office while searching for the best coffee shop… madness would break out!

You could argue that Big O analysis is not required when filling a few dozen rows with fetched data in a web view. But it’s a good idea to understand why a naive user experience can create a backend scaling nightmare. Product Managers, Designers, and Frontend Devs will want to understand and be able to discuss why fetching a fresh list of every customer from the database for every page view a bad idea. Even with caching and CDNs Big O leads to better system design choices.

Properties

There are three points of view on algorithm complexity analysis:

  • Big O: upper bounds, worst case estimate.
  • Big Ω: lower bounds, best case estimate.
  • Big Theta: lower and upper, tightest estimate.

The distinctions are literally academic and generally when we let’s do a Big O analysis of algorithm complexity we mean let’s find the tightest, most likely estimate.

There are two types of algorithm complexity analysis:

  • Time complexity: an estimation of how much time source code will take to do its job.
  • Space complexity: an estimation of how much space source code will take to do its job.

Most of the time we worry most about time complexity (performance) and we will happily trade memory (space) to save time. Generally, there is an inverse relationship between speed of execution and the amount of working space available. The faster an operation can be performed the less local space your source code has to work with to do it.

CPUs have memory registers that they can access quickly but only for small amounts of data (bytes and words). CPUs also have access to increasing slower but larger caches for storing data (L1, L2, L3). You, as a coder, usually don’t worry about CPUs, registers, and level 1, 2, or 3 caches. Your programming language environment (virtual machine, compiler) will optimize all this very low-level time and space management on your behalf.

Modern software programs, which I will call source code, live in a virtual world where code magically runs and memory is dutifully managed. As a coder you still occasionally have to step in and personally manage memory, understand what data is allocated to the stack versus the heap, and make sure the objects your source code has created have been destroyed when no longer needed.

Modern software applications, which I will also call source code, live in a virtual world where apps magically run and storage is autoscaled. As a system designer you’ll have to choose your virtual machines and storage APIs carefully—or not if you let managed app services and containerization manage all your code and data as a service.

Usage

Big O will help you help you make wise choices with program design and system design. Big O, like other dialects of mathematics, is abstract and applies to wide range of software phenomenon. Using Big O to optimize your source code means faster web pages, apps, and games that are easier to maintain, more responsive to the users, and less costly to operate.

Big O reminds me of Calculus and Probability! It’s all about estimating the rate of change in how your source code processes data over time. Will that rate of change get slower with more data? That answer is almost always yes! Big O helps you identify bottlenecks in your source code, wasted work, and needless duplication. Big O does this through a formal system of estimation, showing you what your source code will do with a given algorithm and a given set of data points.

There are even more bottlenecks that will slow down your app even if your algorithms are optimized using Big O. The speed of your CPU, the number of cores it contains, the number servers, the ability to load balance, the number of database connections, and the bandwidth and latency of your network are the usual suspects when it comes to poor performance in system design. Big O can help with these bottlenecks as well so remember to include these conditions in your circle of concern!

Big O estimated are used to:

  • Estimate how much CPU or memory source code requires, over time, to do its job given the amount of data anticipated.
  • Evaluate algorithms for bottlenecks.

Big O helps engineers find ways to improve scalability and performance of source code:

  • Cache results so they are not repeated, trading off one resource (memory) for another more limited resource (compute).
  • Use the right algorithm and/or data structure for the job! (Hash Table, Binary Tree, almost never Bubble Sort).
  • Divide the work across many servers (or cores) and filter the results into a workable data set (Map Reduce).

Notation

Big O notation is expressed using a kind of mathematical shorthand that focuses on what is important in the execution of repeated operations in a loop. The large number should be really big, thousands, millions, billions, or trillions, for Big O to help.

Big O looks like this O(n log n)

  • O( … ) tell us there is going to be a complexity estimate between the parentheses.
  • n log n is a particular formula that expresses the complexity estimate. It’s a very concise and simple set of terms that explain how much the algorithm, as written in source code, will most likely cost over time.
  • n represents the number of data points. You should represent unrelated, but significant, data points with their own letters. O(n log n + x^2) is an algorithm that works on two independent sets of data (and it’s probably a very slow algorithm).

Big O notation ignores:

  • Constants: values that don’t change over time or in space (memory).
  • Small terms: values that don’t contribute much to the accelerating amount of time or space required for processing.

Big O uses logarithms to express change over time:

  • A logarithm is a ratio-number used to express the rate of change over time for a given value.
  • An easy way to think of rate-of-change is to image a grid with an x-axis and a y-axis with the origin (x = 0, y = 0) in the lower left. If you plot a line that moves up and to the right, the rate of change is the relation between the x-value and the y-value as the line moves across the grid.
  • A logarithm expresses the ratio between x and y as a single number so you can apply that ration to each operation that processes points of data.
  • The current value of a logarithm depends on the previous value.
  • The Big O term log n means for each step in this operation the data point n will change logarithmically.
  • log n is a lot easier to write than n – (n * 1/2) for each step!
  • Graphing logarithmic series makes the efficiency of an algorithm obvious. The steeper the line the slower the performance (or the more space the process will require) over time.

Values

Common Big O values (from inefficient to efficient):

  • O(n!): Factorial time!
    Near vertical acceleration over a short period of time. This is code that slows down (or eats up a lot of space) almost immediately and continues to get slower rapidly. Our CPUs feel like they are stuck. Our memory chips feel bloated!
  • O(x^n): Exponential time!
    Similar to O(n!) only you have a bit more time or space before your source code hits the wall. (A tiny bit.)
  • O(n^x): Algebraic time!
    Creates a more gradual increase in acceleration of complexity… more gradual than O(x^n) but still very slow. Our CPUs are smoking and our memory chips are packed to the gills.
  • O(n^2): Quadratic time!
    Yields gradual increase in complexity by the square of n. We’re getting better but our CPUs are out of breath and our memory chips need to go on a diet.
  • O(n log n): Quasilinear time!
    Our first logarithm appears. The increase in complexity is a little more gradual increase but still unacceptable in the long run. The n means one operation for every data point. The log n means diminishing increments of additional work for each data point over time. n log n means the n is multiplied by log n for each step.
  • O(n): Linear time!
    Not too shabby. For every point of data we have one operation, which results in a very gradual increase in complexity, like a train starting out slow and building up steam.
  • O(n + x): Multiple Unrelated Terms
    Sometimes terms don’t have a relationship and you can’t reduce them. O(n + x) means O(n) + O(x). There will be many permutations of non-reducible terms in the real world: O(n + x!), O(n + x log x), etc….
  • O(log n): Logarithmic time!
    Very Nice. Like O(n log n) but without the n term so it a very slow buildup of complexity. The log n means the work get 50% smaller with each step.
  • O(1): Constant time!
    This is the best kind of time. No matter what the value is of n is, the time or space it takes to perform an operation remains the same. Very predictable!

Analysis (discovering terms, reducing terms, discarding terms):

  • Look for a loop! “For each n, do something” creates work or uses space for each data point. This potentially creates O(n).
  • Look for adjacent loops! “For each n, do something; For each x, do something” potentially creates O(n + x).
  • Look for nested loops! “For each n, do something for each x”. This creates layers of work for each data point. This potentially creates O(n^2)
  • Look for shifted loops! “For each decreasing value of n, do something for each x”. This creates layers of work that get 50% smaller over time as the input is divided in half with each step. This potentially creates O(log n).