Interview puzzle: Count distinct (2)

Cosmin Negruseri
24 februarie 2017

Here's a fun question from Marius Dumitran.

Website visit counter: Given n log lines from a website, design a datastructure that can answer count_distinct queries efficiently. Log lines are given in sorted order and contain a timestamp (int64) and a user_id (int_64). count_distinct queries have a two int64 parameters: start_time and end_time. For each query you have to return the number of distinct users that have visited the website in the time range [start_time, end_time].

For example given the log lines
(0, 1)
(1, 2)
(2, 2)
(3, 1)
(4, 3)
(5, 1)
(6, 2)
(7, 3)
And the query count_distinct(1, 5), you should return 3, since we have user 2 visiting twice in that time range, user 1 visiting twice as well and user 3 visiting only once.

remote content