How to filter and breakdown arrays with HogQL
Oct 05, 2023
Arrays (AKA lists) are a useful way to store multiple values related to each other under the same key. Although arrays can be a bit tricky to utilize with standard PostHog filters, HogQL expressions unlock the ability to make full use of them.
This tutorial shows you how to access arrays in your data, use them in breakdowns, and set up filters with and for them.
Fun fact: A group of hedgehogs is known as an array. This is an inspiration for our name and mascot.
Getting or creating arrays
Before you do anything, you need access to the arrays in your data. There are a few ways to do this in insights:
Select a property that is already an array like
properties.$active_feature_flags
.Split a string with
splitByChar
. For example, splitting the autocapture element chain withsplitByChar('.', elements_chain)
.Extract an array from JSON with
JSONExtractArrayRaw
orJSONExtractKeys
.
Once you can access the array, you can breakdown or filter using it.
Breaking down arrays
Once we have our array, we can break it down further. To do this, create an insight, click "Add breakdown" under "Breakdown by," and select HogQL. Here we can write our HogQL expressions to break down arrays.
The most useful breakdown with arrays is arrayJoin
. This is a special expression that "unfolds" an array into multiple rows. and helps us access each value from within an array. For example, to get a count of the usage of different $active_feature_flags
, you can use this HogQL expression breakdown:
arrayJoin(JSONExtractArrayRaw(properties.$active_feature_flags ?? '[]'))
Adding
?? '[]'
is a "nullish coalescing operator" that provides a default value to your properties. In this case, it provides an empty array ifproperties.$active_feature_flags
is null or undefined.
More useful array breakdowns include:
arrayUniq
for counting the number of unique values in the array.arrayDistinct
which returns an array containing only the distinct elements. You can combine this witharrayJoin
to get counts for distinct values within arrays.arrayElement(arr, 1)
for breaking down the first elements in the array.arrayElement(arr, -1)
for breaking down the last elements in the array.
Once you get individual number or string values from the breakdown, you can modify these values further too. Read our HogQL expression docs for more useful functions for doing this.
Filtering arrays
Once you break down your array, you might find it contains values you don't care about. You can use filters to remove them. Filtering arrays is relatively similar to breaking down, except you must provide values you want to filter out.
To add a filter, click the filter dropdown next to your data series, click "Add filter," select HogQL, and add your expression. You can also use filters in the events explorer tab.
To start, you can remove empty arrays with a notEmpty()
check. For example, to remove empty arrays from the $active_feature_flags
by filtering for the HogQL expression:
notEmpty(JSONExtractArrayRaw(properties.$active_feature_flags ?? '[]'))
If you want to filter arrays for specific values you can use has
, hasAll
, and hasAny
:
has()
filters for arrays containing an element.has(arr, 'test')
matches arrays containing'test'
.hasAll()
filters for arrays containing all of another array of elements.hasAll(arr, ['test', 'cool'])
matches arrays containing both'test'
and'cool'
.hasAny()
filters for arrays containing any of another array of elements.hasAny(arr, ['test', 'cool'])
matches arrays containing either'test'
or'cool'
.
Notes:
If you get the error "Nested type Array(String) cannot be inside Nullable type" (or similar), you can wrap your property value in a
assumeNotNull
expression likehas(assumeNotNull(arr), 'test')
.When selecting a number, you need to wrap it in single quotes. For example,
has(arr, '1')
matches arrays containing the number1
.
If you want to filter arrays by the first, last, or specific index value, you can use arrayElement(arr, n)
again too.
Further reading
- Using HogQL for advanced time and date filters
- Using HogQL for advanced breakdowns
- How to do time-based breakdowns (hour, minute, real time)