Version: 0.11

The stats namespace

The stats namespace contains aggregate functions that compute statistical measures over the events in a window.

Size

When using stats aggregate functions, memory size becomes an important factor for capacity planning. The exact size of a window using aggregates depends on three main factors:

  • The size of the dimension identifier. For example, if the window is identified by the string "window", it requires memory proportional to that string. If it is identified by an array of 10,000 elements, each reading "window", it uses about 10,000 times as much.
  • The unit size of each aggregate used in the window. We give an estimate of the size of each aggregate below, but these estimates are not always exact, as they can depend on the data the aggregate holds.
  • The number of groups, if grouping is configured. Each group maintains a separate window of data.

For each aggregate we provide an order-of-magnitude size and, where applicable, a growth rate.

For example, "Fixed, 10 bytes" indicates that the size does not grow and is on the order of tens of bytes. We give pessimistic estimates where possible.
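The three factors above can be combined into a back-of-the-envelope estimate. The following Python sketch assumes, as a simplification, that sizes add up linearly: every group pays for its identifier plus the state of each aggregate, and any fixed overhead of the window itself is ignored. The `estimate_window_bytes` helper is hypothetical and not part of Tremor.

```python
def estimate_window_bytes(id_bytes: int, aggregate_bytes: list, groups: int = 1) -> int:
    """Hypothetical rough estimate of a window's memory footprint.

    Assumes each group stores its identifier plus the state of every
    aggregate; window bookkeeping overhead is ignored.
    """
    return groups * (id_bytes + sum(aggregate_bytes))


# 1,000 groups, each identified by a 6-byte string such as "window",
# using count (~10 B), mean (~100 B) and hdr (~100 KB, taken as 100_000 B).
total = estimate_window_bytes(len("window"), [10, 100, 100_000], groups=1_000)
print(f"{total:,} bytes")  # 100,116,000 bytes
```

In this example the hdr aggregate dominates: roughly 100 MB in total, almost all of it histogram state, which is why the choice of aggregate matters more than the identifier size once grouping multiplies it.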

Functions

aggr::stats::count() -> int

  • size: Fixed, 10 bytes

Counts the number of events aggregated in the current windowed operation.

aggr::stats::count() # number of items in the window

aggr::stats::min(int|float) -> int|float

  • size: Fixed, 10 bytes

Determines the smallest event value in the current windowed operation.

aggr::stats::min(event.value)

aggr::stats::max(int|float) -> int|float

  • size: Fixed, 10 bytes

Determines the largest event value in the current windowed operation.

aggr::stats::max(event.value)

aggr::stats::sum(int|float) -> int|float

  • size: Fixed, 10 bytes

Determines the arithmetic sum of event values in the current windowed operation.

aggr::stats::sum(event.value)
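count, min, max and sum are listed as fixed size because each needs only a constant amount of state, however many events pass through the window. A minimal Python sketch of that idea follows; the `BasicAggregates` class is hypothetical and not Tremor's implementation.

```python
class BasicAggregates:
    """Hypothetical constant-size state for count, min, max and sum."""

    def __init__(self) -> None:
        self.count = 0
        self.sum = 0      # stays an int while only ints are pushed
        self.min = None   # None until the first event arrives
        self.max = None

    def push(self, x) -> None:
        # Each event updates four fields in place; nothing is retained per event.
        self.count += 1
        self.sum += x
        self.min = x if self.min is None else min(self.min, x)
        self.max = x if self.max is None else max(self.max, x)


agg = BasicAggregates()
for value in [3, 1, 4, 1, 5]:
    agg.push(value)
print(agg.count, agg.sum, agg.min, agg.max)  # 5 14 1 5
```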

aggr::stats::var(int|float) -> float

  • size: Fixed, 100 bytes

Calculates the sample variance of event values in the current windowed operation.

aggr::stats::var(event.value)

aggr::stats::stdev(int|float) -> float

  • size: Fixed, 100 bytes

Calculates the sample standard deviation of event values in the current windowed operation.

aggr::stats::stdev(event.value)

aggr::stats::mean(int|float) -> float

  • size: Fixed, 100 bytes

Calculates the arithmetic mean of the event values in the current windowed operation.

aggr::stats::mean(event.value)
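var, stdev and mean can all be computed in constant memory with an online algorithm. The Python sketch below uses Welford's algorithm to keep a running mean and a running sum of squared deviations; it illustrates the technique and is not necessarily how Tremor computes these aggregates. The `RunningStats` class is hypothetical, and returning 0.0 for fewer than two values is a choice made for this sketch only.

```python
import math


class RunningStats:
    """Hypothetical constant-size accumulator for mean, var and stdev."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def push(self, x: float) -> None:
        # Welford's update: adjust the mean, then accumulate the squared
        # deviation using both the old and the new mean.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def var(self) -> float:
        # Sample variance divides by n - 1; this sketch returns 0.0 for n < 2.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def stdev(self) -> float:
        return math.sqrt(self.var())


stats = RunningStats()
for v in range(1, 101):
    stats.push(v)
print(stats.mean, stats.var(), stats.stdev())
```

For the values 1 to 100 the mean is 50.5 and the sample variance is n(n+1)/12, about 841.67; dividing by n instead of n - 1 would give the population variance, 833.25.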

aggr::stats::hdr(int|float) -> record

  • size: Fixed, 100 kilobytes (note: this strongly depends on configuration and can be estimated more accurately with this formula)

Uses a High Dynamic Range (HDR) Histogram to calculate all basic statistics of the event values in the current windowed operation. The function additionally calculates percentiles (quantiles) based on a configuration specification passed in as the second argument to the aggregate function.

The HDR Histogram trades memory utilisation against accuracy and is configured internally to limit precision to 2 significant decimal digits.

aggr::stats::hdr(event.value, ["0.5","0.9","0.95","0.99","0.999","0.9999"])

Example output:

{
  "min": 1,
  "max": 100,
  "count": 100,
  "mean": 50.5,
  "stdev": 28.866_070_047_722_12,
  "var": 833.25,
  "percentiles": {
    "0.5": 50,
    "0.9": 90,
    "0.95": 95,
    "0.99": 99,
    "0.999": 100,
    "0.9999": 100
  }
}
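To show how a fixed number of significant digits bounds memory, here is a simplified Python sketch of HDR-style bucketing. The `SimpleHdr` class and `bucket_floor` helper are hypothetical: they accept non-negative integers only, and a percentile is reported as the lower bound of its bucket, whereas a real HDR Histogram implementation uses a more elaborate bucket layout and may report a different representative value. Two significant decimal digits need at least 2 × 10² = 200 sub-buckets; rounding up to a power of two gives 256.

```python
import math
from collections import Counter
from fractions import Fraction

# 256 sub-buckets = 8 bits of resolution, enough for 2 significant decimal digits.
SUB_BUCKET_BITS = 8


def bucket_floor(value: int) -> int:
    """Lowest value that lands in the same bucket as `value`.

    Values below 256 are kept exactly; larger values keep only their top
    8 bits, so the stored value is within 1/128 (under 1%) of the original.
    """
    shift = max(0, value.bit_length() - SUB_BUCKET_BITS)
    return (value >> shift) << shift


class SimpleHdr:
    def __init__(self) -> None:
        self.counts: Counter = Counter()  # bucket floor -> number of values
        self.total = 0

    def record(self, value: int) -> None:
        if value < 0:
            raise ValueError("this sketch only supports non-negative integers")
        self.counts[bucket_floor(value)] += 1
        self.total += 1

    def percentile(self, p: str) -> int:
        """Lower bound of the bucket holding the value at percentile `p`."""
        # Parse p exactly, so "0.9" of 100 values is rank 90, not 90.000...01.
        rank = max(1, math.ceil(Fraction(p) * self.total))
        seen = 0
        for lower in sorted(self.counts):
            seen += self.counts[lower]
            if seen >= rank:
                return lower
        raise ValueError("histogram is empty")


hist = SimpleHdr()
for v in range(1, 101):
    hist.record(v)
print({p: hist.percentile(p) for p in ["0.5", "0.9", "0.95", "0.99", "0.999", "0.9999"]})
```

For the values 1 to 100 every value falls into its own bucket, so the percentiles match those in the example output above; larger values share buckets and are reported to within 1%.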

aggr::stats::dds(int|float) -> record

  • size: Fixed, 10 kilobytes (estimate based on this paper)

Uses a Distributed data-stream Sketch (DDS) histogram (paper) to calculate count, sum, min, max, mean and percentiles, with a guaranteed relative error on each percentile over the full range of values in the histogram. The DDS histogram trades a small, bounded relative error for compact memory use and, unlike HDR histograms, does not need value bounds to be specified.

aggr::stats::dds(event.value, ["0.5","0.9","0.95","0.99","0.999","0.9999"])

Example output:

{
  "count": 100,
  "sum": 5050.0,
  "min": 1.0,
  "max": 100.0,
  "mean": 50.5,
  "percentiles": {
    "0.5": 50.0,
    "0.9": 89.2,
    "0.95": 94.7,
    "0.99": 98.6,
    "0.999": 98.6,
    "0.9999": 98.6
  }
}
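The relative-error guarantee comes from logarithmically sized buckets. The Python sketch below follows the bucketing scheme from the DDSketch paper: with relative accuracy α, bucket i covers (γ^(i-1), γ^i] where γ = (1+α)/(1-α), and every value in that bucket is estimated as 2γ^i/(γ+1), which is within α of any value in it. The `DDSketch` class is a hypothetical illustration, not Tremor's implementation: it has no bucket limit and handles strictly positive values only.

```python
import math
from collections import Counter


class DDSketch:
    """Hypothetical minimal DDSketch-style quantile sketch (positive values only)."""

    def __init__(self, relative_accuracy: float = 0.01) -> None:
        self.alpha = relative_accuracy
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.buckets: Counter = Counter()  # bucket index -> number of values
        self.count = 0
        self.sum = 0.0
        self.min = math.inf
        self.max = -math.inf

    def add(self, x: float) -> None:
        if x <= 0:
            raise ValueError("this sketch only supports positive values")
        # Bucket i covers the interval (gamma**(i - 1), gamma**i].
        self.buckets[math.ceil(math.log(x) / self.log_gamma)] += 1
        self.count += 1
        self.sum += x
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    def quantile(self, q: float) -> float:
        rank = q * (self.count - 1)
        seen = 0
        for i in sorted(self.buckets):
            seen += self.buckets[i]
            if seen > rank:
                # Within alpha (relative) of every value in bucket i.
                return 2 * self.gamma ** i / (self.gamma + 1)
        raise ValueError("sketch is empty")

    def summary(self, percentiles: list) -> dict:
        return {
            "count": self.count,
            "sum": self.sum,
            "min": self.min,
            "max": self.max,
            "mean": self.sum / self.count,
            "percentiles": {p: self.quantile(float(p)) for p in percentiles},
        }


sketch = DDSketch(relative_accuracy=0.01)
for v in range(1, 101):
    sketch.add(v)
print(sketch.summary(["0.5", "0.9", "0.99"]))
```

Memory grows with the number of distinct buckets, that is with the logarithm of the value range, rather than with the number of events, which is why no bounds need to be configured up front.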