❗️ This guide is intended only for development of PostHog itself. If you're looking for documentation on writing HogQL queries, go to HogQL.
HogQL is our layer on top of ClickHouse SQL which provides nice features such as:
- Automatic person/group/etc property joins depending on the team/context
- Customisable database schema per team
- Flexible AST-powered templating language for building queries.
Query templates
HogQL queries are built up from AST (Abstract Syntax Tree) nodes.
You can build the nodes yourself, or use the helpers parse_expr
and parse_select
to convert HogQL strings into AST nodes:
from posthog.hogql import astfrom posthog.hogql.query import execute_hogql_queryfrom posthog.hogql.parser import parse_expr, parse_selectnum_last_days = 2stmt = parse_select("select event, timestamp from events where {where} limit 100",{'where': parse_expr('timestamp > interval {days} days',{ 'days': ast.Constant(value=num_last_days) })})query_result = execute_hogql_query(query=stmt, team=team, query_type="used in logs")query_result.results == [...]query_result.columns == ['event', 'timestamp'] # might be useful if you select '*'
Few things to note:
parse_select
parses fullSELECT
queries whileparse_expr
parses any expression (1+1
orevent
or even a subquery(select 1)
). It's not possible to parse parts of a select query, such aslimit 10
.- Placeholders like
{where}
are just nodes of typeast.Placeholder(field='where')
. You can leave them in, and callstmt = replace_placeholders(stmt, { where: parse_expr('1') })
later. - We wrote one AST node ourselves:
ast.Constant(value=num_last_days)
. We did it to santise the value by make sure it's treated as a constant. We might simplify constants further (e.g.parse_const
or just{days: 2}
), but we're not there yet.
AST nodes
If you want more control, you can build the AST nodes directly. The same query above can be written as:
from posthog.hogql import astfrom posthog.hogql.query import execute_hogql_queryfrom posthog.hogql.parser import parse_exprnum_last_days = 2stmt = ast.SelectQuery(select=[ast.Field(chain="event"), ast.Field(chain="timestamp")],select_from=ast.JoinExpr(table=ast.Field(chain=["events"])),where=parse_expr("timestamp > interval {days} day",{ 'days': ast.Constant(value=num_last_days) }),limit=ast.Constant(value=100),)query_result = execute_hogql_query(query=stmt, team=team, query_type="used in logs")query_result.results == [...]query_result.columns == ['event', 'timestamp'] # might be useful if you select '*'
You can mix and match parse_expr
and ast
nodes as you please. The example above still took a shortcut for the where clause because it was easier to write.
Database schema and features
The HogQL database schema is in flux. You will soon be able to explore it in the PostHog app itself.
The most up to date resource is hogql/database.py on Github. At the time of writing, these tables were available:
class Database(BaseModel):# Users can query from the tables belowevents: EventsTable = EventsTable()persons: PersonsTable = PersonsTable()person_distinct_ids: PersonDistinctIdTable = PersonDistinctIdTable()session_recording_events: SessionRecordingEvents = SessionRecordingEvents()cohort_people: CohortPeople = CohortPeople()static_cohort_people: StaticCohortPeople = StaticCohortPeople()
Some tables have some fields that are actually "lazy tables". When accessed they will add a join to the table. The events table is such an example:
class EventsTable(Table):uuid: StringDatabaseField = StringDatabaseField(name="uuid")event: StringDatabaseField = StringDatabaseField(name="event")properties: StringJSONDatabaseField = StringJSONDatabaseField(name="properties")timestamp: DateTimeDatabaseField = DateTimeDatabaseField(name="timestamp")team_id: IntegerDatabaseField = IntegerDatabaseField(name="team_id")distinct_id: StringDatabaseField = StringDatabaseField(name="distinct_id")elements_chain: StringDatabaseField = StringDatabaseField(name="elements_chain")created_at: DateTimeDatabaseField = DateTimeDatabaseField(name="created_at")# lazy table that adds a join to the persons tablepdi: LazyTable = LazyTable(from_field="distinct_id", table=PersonDistinctIdTable(), join_function=join_with_max_person_distinct_id_table)# person fields on the event itselfpoe: EventsPersonSubTable = EventsPersonSubTable()# These are swapped out if the user has PoE enabledperson: FieldTraverser = FieldTraverser(chain=["pdi", "person"])person_id: FieldTraverser = FieldTraverser(chain=["pdi", "person_id"])
If you access pdi.person.properties.$browser
, we make a join via persons
(this is a HogQL table name, not ClickHouse name). We do a bunch of argmax
magic in the join, and inline all accessed properties within the subquery for performance. For the user, it looks just like simple property access.
If you access poe.properties.$browser
, we will actually access the field person_properties
on the events table.
In practice, you should avoid both and access person.properties.$browser
, which will choose the right approach for you.
Add new tables and fields as needed! Just make sure each table has a team_id
column.