September 23, 2014

Recently I’ve stumbeled upon a few blogposts and Internet discussions involving the complexity of popular algorihms and operations of well known data structures.
In these posts, one usually can’t miss the `Big-O`

notation - including it’s pitfalls, of which there are a few.
Therefore I would like to try to clear things up.

Let’s start head on. Wikipedia says:

Let $f(x)$ and $g(x)$ be two functions defined on some subset of the real numbers. One writes: $$f(x) = O(g(x)) \text{ as } x \to \infty$$ if and only if there is a positive constant $M$ such that for all sufficiently large values of $x$, $f(x)$ is at most $M$ multiplied by $g(x)$ in absolute value. That is, $f(x) = O(g(x))$ if and only if there exists a positive real number $M$ and a real number $x_0$ such that $$|f(x)|\leq M|g(x)| \text{ for all } x > x_0.$$

Pretty straight forward? No? Ok, let’s try to break it down.

We have two functions, $f(n)$ and $g(n)$. $f(n)$ is the actual running time of our algorithm, and $g(n)$ is whats inside the $O()$. Lets set $f(n) = 2n^2+12n+61$, and $g(n) = n^2$ so we have some good ol’ nubmers to look at. Now we could write:

$$f(n) = O(g(n)) \implies 2n^2+12n+61 = O(n^2).$$

If we simplyfy, our claim is that when $n$ gets large, the two functions are almost equal. This means that if we need $f(999999)$, a pretty good aproximation would be $g(999999) = 999998000001$.

Note the ‘*pretty*’! $f(999999)$ is actually $ 2000008000051 $, which is **twice** as much.
Now, look again at $f(n)$, and take a guess why it’s twice, and not any other multiple.
So far so good. This probably isn’t news for anyone having seen the `big-O`

notation before; however, remember that we simplyfied things a little.

When working with algorithms, one usualy is interested in getting the fastest.
Why spend two seconds sorting a list of numbers when you could use only one second?
Having now learnt the magic of `big-O`

, we find a table of popular sorting algorimths, and their average case runtime;
we laugh when we realize the author of the table has included multiple $O(n^2)$ algorithms, when we
see multiple $O(n \log{n})$, and even some $O(n)$ algorithms.
“Why would you even do something like that?” we say to ourselves, and shake our heads.

Later we decide to learn the almighty `quicksort`

.
We look up an implementation in our favorite language, but we are startled when the first lines (maybe) looks something like this:

```
def quicksort(array):
if array.length < 10:
insertionsort(array)
# ...
```

We rush back to the runtime table, and finds `insertion sort`

: $O(n^2)$?!
Surely something must be wrong here; why else would one use a slower algorithm?
We decide this implementation is far from optimal, and find another one. Which turns out to be exacly the same.
So what is going one here? `big-O`

promised us that $O(n\log{n})$ is better than $O(n^2)$, so how come
quicksort wants to use a $O(n^2)$ algorithm?
This is when our simplification comes back to bite us in the rear.

We said *“If we simplyfy, our claim is that when $n$ gets large, the two functions are almost equal.”*
So what about when $n$ isn’t very large?
Looking back to the Wikipedia definition, there was something about an $x_0$.
It turns out `Big-O`

got the definition of “large” covered;
Large is if $x\gt x_0$. This doesn’t say much, though.
But here’s the kicker: **we get to choose $x_0$**.

Consider the following graph:

If we choose $x_0=a$ we can’t really say much; the graphs are switching between being on top and bottom. However, if we choose $x_0=b$ we can see that (for all we know, and which is probable) $f(x)$ is the larger function, and hence the one with the longer run time. In this case, $f(x)=x^2+8 = O(n^2)$ and $g(x) = 8x = O(n)$. Given that, we would say that $g(x)$ is the faster of the two. But what if the only possible values for $n$ is between $a$ and $b$? Then, clearly, $f(x)$ is faster, as seen from the graph, even though $O(n^2) \gt O(n)$.

Now we understand that the `Big-O`

notation doesn’t really say anything about actual runtime, just the runtime when your input is really large.
This is exacly what happends in `quicksort`

- the `insertion sort`

algorithm, though pretty slow on large input, performs really well on smaller inputs.
Even `timsort`

, the standard sorting algorithm in `Python`

, `Java SE 7`

, and the `Android`

platform, uses `insertion sort`

^{1}.

It’s often easier to grasp a concept when you don’t look at the general case, but rather a example.
Let’s say one of our functions have a runnning time of $f(n) = 2n^2 + 9n +120$.
A start would be to set $f(n) = O(2n^2 + 8n + 120)$.
Figuring out the asymptotic runitme of a function isn’t excacly straight forward.
The short story here is you can forget all parts without the highest exponential^{2}, which in this case is the $2n^2$ .
Then we’re left with $O(2n^2)$.
We can also forget about all constants; now there’s really nothing left for us to remove, as we’ve got $O(n^2)$.

How can we know this is correct? Let’s just look at the following graph:

We see $M=3$, so $Mg(n)=3n^2$, and $f(n)=2n^2+8n+120$. We can see in the beginning, $f(n)$ is the slower one (remember, the one on top is the slower!), but $3n^2$ catches up pretty fast, and it only goes one way from there. From the definition, we now see that:

$$f(n) = 2n^2 + 9n + 120 = O(n^2)$$

because $3n^2$ was always larger than $f(n)$ from the intersection point.

Again, we don’t know excacly where the graphs intersect, so just given the graphs we couldn’t say what $x_0$ could be,
but that doesn’t matter. The intersection point could be a gazillion, and it would be OK.
It could take the function a gazillion years to compute with the input $n=x_0$, but it wouldn’t matter,
because *asymptotically* $Mg(n)$ would be the larger function.

So we’ve seen a few cool graphs, and some math written in $\LaTeX$, but what does it all mean? Why should you care? Let’s say you’re writing a cool program with a function you know will be called a lot. Maybe it looks a little bit like this:

```
for (int i = 0; i < n; i++){
for (int j = 0; j < n; j++){
for (int k = 0; k < n; k++){
// Lots of cool stuff
}
}
}
```

After what we’ve learned you could easily see that this code runs at least in $O(n^3)$, and being familiar with quite a few algorithms, you realize that’s qutie a bit.^{3}
You could now consider rewriting it, making it more efficient, even without ever running it! Sounds too good to be true? Well, it kind of is.

Remember that the actual running time and *asymptotic running time* is **not** the same thing.
This means if this part of your code is crucial to your application, you should use a *profiler* to find out which of your algorithms that have the shortest *actual* running time.
If it is not crucial, you should probably leave it be. Remember kids: premature optimization is very bad!
Having said that, the asymptotic running time is *usually* a good indication.

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

— Donald E. Knuth

- What about $\Omega()$ and $\Theta()$?
- I have decided not to say anything about neither of those, because
`big-O`

is the one people usually run into, and it is definitely the most used - at least outside computer science circles (you could even say outside academia). - You said one shouldn’t optimize without profiling anyway. Doesn’t that make this whole mess kind of useless?
- A lot of CS students jump to this conclusion when realizing you shouldn’t go on an optimizing spree as soon as you have learned algrorithm analysis. And it isn’t completely wrong; you probably did fine before reading this text, and you would probably continue to do so without all this. This is just another tool in your programmer toolbox. But the same way a musician should understand his instrument, a programmer should understand his algorithms and data structures - you wouldn’t care to listen to a guitarist who had no idea where all the sound came from, would you?

Time to see what you’ve learned.

- When finding the runnning time of $f(n) = O(2n^2 + 8n + 120)$, I said you could ignore constants. Why?
- How come I can say $2n^2 + 4 = O(n^4)$ and still be right?
- How would you know which algorithm is faster:
`merge-sort`

or`heap-sort`

, both of which are $O(n \log n)$?

This is by no means a complete guide to algorithm analysis.
In fact, this is only about the `big-O`

notation, on which it even isn’t a really thorough guide - this is only a gentle introduction.

If this was interesting I strongly recommend reading more. This is just the tip of the iceberg.
Great resources are CLRS^{4} if you think math is OK, or Sedgewick^{5} if you don’t.
If you actually love algorithm analysis, and just read this for fun, I recommend Knuths TAOCP^{6}. Pick any volume, although I suggest you start with volume 1 if you want to understand any of the code.^{7}
If you don’t like reading, why did you even read this seciton? $\blacksquare$

- http://en.wikipedia.org/wiki/Timsort.
^{[return]} - This is extremely simplified, and will only work on polynomials!
^{[return]} - There are good algorithms with a running time of $O(n^3)$; for instance Floyd-Warshall runs in $O(V^3)$.
^{[return]} *Introduction to Algorithms*by Cormen, Leiserson, Rivest, Stein. Amazon.com^{[return]}*Algorithms*by Sedgewick and Wayne. Amazon.com^{[return]}*The Art of Computer Programming*by Donald E. Knuth.^{[return]}- In addition, the current editions of volume 1-3 uses MIX as the target machine, although volume 4 uses MMIX, a newer and more modern version. This means if you want to read volume 4 you should also buy Volume 1 fascicle 1, as this teaches MMIX instead of MIX.
^{[return]}