
I know that projects like http://druid.io use HLL to do time-series analytics. Can someone explain, or link to an explanation of, how counting set cardinality can be used for time-series analysis? (Do I need to add that I haven't found a satisfactory one myself? :)

Naively, I think it could be done by factoring a timestamp into the things being counted:

  PFADD SOMEHLLVAR "#{timestamp1}#{event1}" "#{timestamp1}#{event2}"
  PFADD SOMEHLLVAR "#{timestamp2}#{event3}" "#{timestamp2}#{event2}"

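  # Pseudocode: Redis PFCOUNT takes HLL keys, not elements, so these per-element lookups are hypothetical
  PFCOUNT "#{timestamp1}#{event1}" # -> 1
  PFCOUNT "#{timestamp2}#{event1}" # -> 0
etc.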

The problem with doing that is that you would need to iterate through each second in a given range to find out the count of specific events. It also seems like a pretty wasteful way of encoding time. <edit> Thinking about it, it is probably not wasteful in the sense that the HLL size should stay the same, but it is probably sub-optimal in some other way.
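For what it's worth, one alternative to putting the timestamp inside each element is to keep one HLL per time bucket and merge the buckets covering the range you care about. HLLs can be unioned by taking the register-wise maximum, so a range query becomes a merge followed by a single count. Below is a rough Python sketch of that idea. The `TinyHLL` class, the `record` and `distinct_in_range` helpers, and the `hour:NN` bucket names are all made up for illustration; this is a toy HLL, not Redis's implementation, and I can't say it is what Druid does internally.

```python
import hashlib
import math


class TinyHLL:
    """Toy HyperLogLog for illustration only (hypothetical, not Redis's code)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash of the item; the top p bits pick a register.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the first 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        merged = TinyHLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


buckets = {}  # assumed layout: one sketch per time bucket


def record(bucket, user_id):
    buckets.setdefault(bucket, TinyHLL()).add(user_id)


def distinct_in_range(bucket_names):
    # Merge the sketches for the requested buckets, then count once.
    total = TinyHLL()
    for name in bucket_names:
        if name in buckets:
            total = total.merge(buckets[name])
    return total.count()


# Hour 10 sees users 0..599, hour 11 sees users 400..999 (200 users overlap).
for i in range(600):
    record("hour:10", f"user{i}")
for i in range(400, 1000):
    record("hour:11", f"user{i}")

print(round(distinct_in_range(["hour:10"])))             # approximately 600
print(round(distinct_in_range(["hour:10", "hour:11"])))  # approximately 1000
```

In Redis terms this would correspond to a PFADD against one key per bucket, then either PFCOUNT with several keys (which returns the approximate cardinality of their union) or PFMERGE into a destination key. The trade-off is that you pick the bucket granularity up front, but you only touch one sketch per bucket in the range rather than one lookup per second per event.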


