14 numbers every developer should know

Cosmin
Cosmin Negruseri
26 martie 2013

Jeff Dean , a famous Google engineer, popularized a list of latency numbers everyone should know. The list is a great resource for designing large scale infrastructure systems.

Algorithms and their complexity often occur in critical parts of computer systems, but I find that few engineers have a good understanding of how a O(n!) algorithm compares to a O(n5) one.

In the coding contest world, competitors think about these tradeoffs all the time. No wonder, there's a set of numbers every algorithm designer should know.

The table below shows the limits that can be reached in a few seconds by algorithms of different complexities, n being the input size. I've added some algorithms and data structure examples for each complexity class.

maximum ncomplexityalgorithmsdata structures
1,000,000,000 and higherlog n, sqrt nbinary search, ternary search, fast exponentiation, euclid algorithm 
10,000,000n, n log log n, n log* nset intersection, Eratosthenes sieve, radix sort, KMP, topological sort, Euler tour, strongly connected components, 2satdisjoint sets, tries, hash_map, rolling hash deque
1,000,000n log nsorting, divide and conquer, sweep line, Kruskal, Dijkstrasegment trees, range trees, heaps, treaps, binary indexed trees, suffix arrays
100,000n log2 ndivide and conquer2d range trees
50,000n1.585, n sqrt nKaratsuba, square root tricktwo level tree
1000 - 10,000n2largest empty rectangle, Dijkstra, Prim (on dense graphs) 
300-500n3all pairs shortest paths, largest sum submatrix, naive matrix multiplication, matrix chain multiplication, gaussian elimination, network flow 
30-50n4, n5, n6  
25 - 403n/2, 2n/2meet in the middlehash tables (for set intersection)
15 - 242nsubset enumeration, brute force, dynamic programming with exponential states 
15 - 20n2 2ndynamic programming with exponential statesbitsets, hash_map
13-173ndynamic programming with exponential stateshash_map (to store the states)
11n!brute force, backtracking, next_permutation 
8nnbrute force, cartesian product 

These numbers aren't very precise, they assume in memory operations and some varying constant factors, but they do give a good starting point in your search for a solution that fits your problem and your data size.

Let's go through an example.

Suppose you work for a GPS company and your project is to improve their directions feature. In school you've learned about using Dijkstra's algorithm to find the shortest path between two nodes in a graph. Knowing these numbers you will understand that it will take seconds to process a graph with millions of edges given that Dijkstra implementations have m log n time complexity (where m is the number of edges and n the number of nodes).

Now you face a few questions:

How fast do you want your code to be? seconds? hundreds of milliseconds?

A response on the web feels fast if it takes less then 500 milliseconds. So let's pick half a second.

How big is the graph? Do you want to solve the problem for a city, a coutry or even a continent?

Each is about a magnitude larger than the other and will be solved by different approaches. Let's say we want to solve the problem for the whole of Europe.

Here are sizes for a few input sets:

inputEuropeUSA/CANUSA (Tiger)
#nodes18 029 72118 741 70524 278 285
#directed edges42 199 58747 244 84958 213 192
#road categories13134

Since we chose half a second to be our execution time and the size of our problem to be about 40 million edges it's clear from our table that m log n is too slow. So pure Dijkstra won't do. We need to look at how other algorithms like A star search or one based on Highway hierarchies behave for this problem.

Categorii:
remote content