Interview puzzle: Count distinct (2)
24 februarie 2017
Here's a fun question from Marius Dumitran.
Website visit counter: Given n log lines from a website, design a datastructure that can answer count_distinct queries efficiently. Log lines are given in sorted order and contain a timestamp (int64) and a user_id (int_64). count_distinct queries have a two int64 parameters: start_time and end_time. For each query you have to return the number of distinct users that have visited the website in the time range [start_time, end_time].
For example given the log lines
And the query count_distinct(1, 5), you should return 3, since we have user 2 visiting twice in that time range, user 1 visiting twice as well and user 3 visiting only once.