14 numbers every developer should know
26 March 2013
Algorithms and their complexity often occur in critical parts of computer systems, but I find that few engineers have a good understanding of how an O(n!) algorithm compares to an O(n^5) one.
In the coding contest world, competitors think about these tradeoffs all the time. No wonder, then, that there's a set of numbers every algorithm designer should know.
The table below shows the limits that can be reached in a few seconds by algorithms of different complexities, with n being the input size. I've added some algorithm and data structure examples for each complexity class; a short code sketch of one of the entries follows the table.
| maximum n | complexity | algorithms | data structures |
|---|---|---|---|
| 1,000,000,000 and higher | log n, sqrt n | binary search, ternary search, fast exponentiation, Euclid's algorithm | |
| 10,000,000 | n, n log log n, n log* n | set intersection, sieve of Eratosthenes, radix sort, KMP, topological sort, Euler tour, strongly connected components, 2-SAT | disjoint sets, tries, hash_map, rolling hash, deque |
| 1,000,000 | n log n | sorting, divide and conquer, sweep line, Kruskal, Dijkstra | segment trees, range trees, heaps, treaps, binary indexed trees, suffix arrays |
| 100,000 | n log^2 n | divide and conquer | 2D range trees |
| 50,000 | n^1.585, n sqrt n | Karatsuba, square root trick | two-level tree |
| 1,000 - 10,000 | n^2 | largest empty rectangle, Dijkstra, Prim (on dense graphs) | |
| 300 - 500 | n^3 | all pairs shortest paths, largest sum submatrix, naive matrix multiplication, matrix chain multiplication, Gaussian elimination, network flow | |
| 30 - 50 | n^4, n^5, n^6 | | |
| 25 - 40 | 3^(n/2), 2^(n/2) | meet in the middle | hash tables (for set intersection) |
| 15 - 24 | 2^n | subset enumeration, brute force, dynamic programming with exponential states | |
| 15 - 20 | n^2 2^n | dynamic programming with exponential states | bitsets, hash_map |
| 13 - 17 | 3^n | dynamic programming with exponential states | hash_map (to store the states) |
| 11 | n! | brute force, backtracking, next_permutation | |
| 8 | n^n | brute force, Cartesian product | |
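To give one of these entries some substance, here is a minimal sketch of fast exponentiation from the first row: it computes base^exp with O(log exp) multiplications by repeated squaring. The modular-arithmetic framing is a choice made for this example, and it assumes the modulus fits in 32 bits so the 64-bit products below can't overflow.

```cpp
#include <cstdint>

// Fast (binary) exponentiation: computes base^exp mod m using O(log exp)
// multiplications, by squaring the base and halving the exponent.
// Assumes m fits in 32 bits so the 64-bit products below can't overflow.
uint64_t power_mod(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1 % m;   // the "% m" handles the edge case m == 1
    base %= m;
    while (exp > 0) {
        if (exp & 1)                 // odd exponent: fold one factor in
            result = result * base % m;
        base = base * base % m;      // square the base
        exp >>= 1;                   // halve the exponent
    }
    return result;
}
```

Because the exponent is halved at every step, even exponents near 10^18 need only about 60 loop iterations, which is why the limit on n in the first row is bounded by your integer type rather than by running time.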
These numbers aren't very precise: they assume in-memory operations and constant factors that vary from one implementation to another. But they do give a good starting point in your search for a solution that fits your problem and your data size.
Let's go through an example.
Suppose you work for a GPS company and your project is to improve their directions feature. In school you learned to use Dijkstra's algorithm to find the shortest path between two nodes in a graph. Knowing these numbers, you can tell that processing a graph with millions of edges will take a few seconds, given that heap-based Dijkstra implementations run in O(m log n) time (where m is the number of edges and n the number of nodes).
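Here is a minimal sketch of that heap-based variant, using adjacency lists and std::priority_queue; the graph representation is a choice made for this example rather than anything GPS-specific.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using namespace std;

const int64_t INF = INT64_MAX;

// Heap-based Dijkstra, the O(m log n) variant: the graph is an adjacency
// list of (neighbor, edge weight) pairs. Returns the shortest distance
// from source to every node, or INF where a node is unreachable.
vector<int64_t> dijkstra(const vector<vector<pair<int, int>>>& graph, int source) {
    int n = graph.size();
    vector<int64_t> dist(n, INF);
    // min-heap of (distance so far, node)
    priority_queue<pair<int64_t, int>, vector<pair<int64_t, int>>,
                   greater<pair<int64_t, int>>> heap;
    dist[source] = 0;
    heap.push({0, source});
    while (!heap.empty()) {
        auto [d, u] = heap.top();
        heap.pop();
        if (d > dist[u]) continue;        // stale heap entry, skip it
        for (auto [v, w] : graph[u]) {
            if (dist[u] + w < dist[v]) {  // relax the edge u -> v
                dist[v] = dist[u] + w;
                heap.push({dist[v], v});
            }
        }
    }
    return dist;
}
```

Each successful edge relaxation pushes one entry onto the heap, and every heap operation costs a logarithmic factor, which is where the O(m log n) bound comes from.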
Now you face a few questions:
How fast do you want your code to be? Seconds? Hundreds of milliseconds?
A response on the web feels fast if it takes less than 500 milliseconds. So let's pick half a second.
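To turn that into an operation budget: assuming, as a rough ballpark rather than a measurement, that a program performs on the order of 10^8 simple in-memory operations per second, half a second buys about 0.5 × 10^8 = 5 × 10^7 operations per query.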
How big is the graph? Do you want to solve the problem for a city, a country, or even a continent?
Each is about an order of magnitude larger than the previous one and will be solved by different approaches. Let's say we want to solve the problem for the whole of Europe.
Here are sizes for a few input sets:
| #nodes | #directed edges |
|---|---|
| 18,029,721 | 42,199,587 |
| 18,741,705 | 47,244,849 |
| 24,278,285 | 58,213,192 |
Since we chose half a second as our execution time and our problem has about 40 million edges, the table makes it clear that m log n is too slow: with m ≈ 4 × 10^7 and n ≈ 2 × 10^7, m log n comes to roughly 10^9 elementary operations, well over our budget of 5 × 10^7. So pure Dijkstra won't do. We need to look at how other algorithms, like A* search or ones based on highway hierarchies, behave on this problem.
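As a sketch of the first idea, here is a minimal A*: Dijkstra ordered by distance so far plus a lower bound on the distance still to go. It assumes every node has planar coordinates and that no edge is shorter than the straight line between its endpoints, which makes the Euclidean distance to the target an admissible heuristic; both are assumptions of this sketch, not properties of the real data. Highway hierarchies rely on heavy preprocessing and don't fit in a short snippet.

```cpp
#include <cmath>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using namespace std;

// A* search: Dijkstra ordered by (distance so far + heuristic), where the
// heuristic must be a lower bound on the remaining distance to the target.
// Here it is the straight-line distance computed from planar node
// coordinates, which is only valid if no edge is shorter than the straight
// line between its endpoints (an assumption of this sketch).
double a_star(const vector<vector<pair<int, double>>>& graph,
              const vector<pair<double, double>>& coord,
              int source, int target) {
    const double INF = 1e18;
    auto heuristic = [&](int u) {
        double dx = coord[u].first - coord[target].first;
        double dy = coord[u].second - coord[target].second;
        return sqrt(dx * dx + dy * dy);
    };
    vector<double> dist(graph.size(), INF);
    // min-heap of (distance so far + heuristic, node)
    priority_queue<pair<double, int>, vector<pair<double, int>>,
                   greater<pair<double, int>>> heap;
    dist[source] = 0;
    heap.push({heuristic(source), source});
    while (!heap.empty()) {
        auto [f, u] = heap.top();
        heap.pop();
        if (u == target) return dist[u];  // first pop of target is optimal
        if (f > dist[u] + heuristic(u)) continue;  // stale heap entry
        for (auto [v, w] : graph[u]) {
            if (dist[u] + w < dist[v]) {  // relax the edge u -> v
                dist[v] = dist[u] + w;
                heap.push({dist[v] + heuristic(v), v});
            }
        }
    }
    return INF;                           // target unreachable
}
```

The only differences from Dijkstra are the heuristic added to the heap key and the early exit: A* explores the same graph but steers the search toward the target, so on road networks it typically visits far fewer nodes.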