- Implementing Splunk(Second Edition)
- Vincent Bumgarner James D. Miller
- 817字
- 2025-04-04 20:51:35
Using stats to aggregate values
While top
is very convenient, stats
is extremely versatile. The basic structure of a stats
statement is:
stats functions by fields
Many of the functions available in stats
mimic similar functions in SQL or Excel, but there are many functions unique to Splunk. The simplest stats
function is count
. Given the following query, the results will contain exactly one row, with a value for the field count:
sourcetype=tm1* error | stats count
Using the by
clause, stats
will produce a row per unique value for each field listed, which is similar to the behavior of top
. Run the following query:
sourcetype=tm1* error | stats count by date_month date_wday
It will produce a table like that shown in the following screenshot:

There are a few things to notice about these results:
- The results are sorted against the values of the by fields, in this case date_month followed by date_wday. Unlike
top
, the largest value will not necessarily be at the top of the list. You can sort in the GUI simply by clicking on the field names at thetop
of the table, or by using thesort
command. - There is no limit to the number of rows produced. The number of rows will equal all possible combinations of field values.
- The function results are displayed last. In the next example, we will add a few more functions, and this will become more obvious.
Using stats
, you can add as many by fields or functions as you want into a single statement. Let's run this query:
sourcetype=tm1* error | stats count avg(linecount) max(linecount) as "Slowest Time" by date_month date_wday
The results look like those in the following screenshot:

Let's step through every part of this query, just to be clear:
sourcetype=tm1* error
is the query itself| stats
starts thestats
commandcount
will return the number of eventsavg(linecount)
produces an average value of thelinecount
fieldmax(linecount)
asSlowest Time
finds the maximum value of thelinecount
field and places the value in a field calledSlowest Time
The quotes are necessary because the field name contains a space.
by
indicates that we are done listing functions and want to list the fields to slice the data by. If the data does not need to be sliced, by and the fields following it can be omitted.date_month
anddate_wday
are our fields for slicing the data. All functions are actually run against each set of data produced per each possible combination ofdate_month
anddate_user
.
If an event is missing a field that is referenced in a stats
command, you may not see the results you are expecting. For instance, when computing an average, you may wish for events missing a field to count as zeros in the average. Also, for events that do not contain a field listed in the by
fields, the event will simply be ignored.
To deal with both of these cases, you can use the fillnull
command to make sure that the fields you want exist. We will cover this in Chapter 6, Advanced Search Examples.
Let's look at another example, using a time-based function and a little trick. Let's say we wanted to know the most recent time at which a particular user saw an error each day.
We can use the following query:
sourcetype=tm1* Error TheUser="Admin" | stats count first(date_wday) max(_time) as _time by source
This query produces the following table:

Let's step through this example:
sourcetype=tm1* Error TheUser="Admin"
is the query that will find all errors logged by the user"Admin"
.| stats
is our command.count
shows how many times this user saw an error each day.first(date_wday)
gives us the weekday that was most recently logged for this user. This will be the most recent event, since results are returned in the order of the most recent first.max(_time)
as_time
returns the time at which the user most recently saw an error that day. This takes advantage of three aspects of time in Splunk:_time
is always present in raw events. As discussed in Chapter 2, Understanding Search, the value is the number of seconds since January 1, 1970, UTC._time
is stored as a number and can be treated as such.- If there is a field called
_time
in the results, Splunk will always display the value as the first column of a table in the time zone selected by the user.
by source
is our field to split the results against and in this example it is by datasource
or the error log file that the error(s) were found in.
We have only seen a few functions in stats
. There are dozens of functions and some advanced syntax that we will touch upon in later chapters. The simplest way to find the full listing is to search with your favorite search engine for the Splunk stats
functions.