Overview
Pendo Reseller Adopt End Users (AEUs) may want to utilize the API in order to push data from their Reseller Adopt Accounts to another source.
Only Reseller Adopt Subscription Admins have the ability to create API keys. They can create API keys by navigating to Settings > Integrations. If the Integrations tab is unavailable, the API has not been enabled for your Account. Please contact your Pendo Reseller Adopt Partner to inquire about API access.
Note: If you sign into your account from somewhere other than adopt.pendo.io you will need to update the endpoint URLs to match the custom hostname.
Aggregation API Documentation
The Aggregation API can be reached via the following endpoint:
https://adopt.pendo.io/api/v1/aggregation
Aggregations are a query language for accessing your data. They take sources of data and apply operators to do computations, and are written in JSON.
It is worth noting, the aggregations endpoint is not intended to be a bulk export feature, so breaking up aggregations by different time ranges may be necessary if you hit sizing or timeout limits.
Data querying is performed using a flexible aggregation pipeline modeled after MongoDB. Each step of the pipeline takes rows as its input and output zero or more rows. The basic unit is a row which contains named fields (possible nested) fields. At a high level, an aggregation consists of
- A row source and time specification
- A pipeline for row processing
All input data rows for the specified time period are fed into the aggregation pipeline. The rows which those pipeline yields are the result of the aggregation; a pipeline may result in no rows or in many rows depending on the pipeline and the input.
If the time specification expands into 24 separate time periods (one for each hour of a day, perhaps) then 24 independent pipelines are run, and the aggregation will have 24 sets of results. Each of those results in independent of the others.
Nested Fields
Example structure
{ "name": { "first": "bugs", "last": "bunny" } }
Wherever a field name is needed, a dotted list of field names can be used to specify nested objects.
Nested specifiers can be used to create new fields as well; any missing intermediate objects will be initialized.
In the example to the right, the field name.last
has the value bunny.
Data Stores
Data Stores | |
---|---|
Visitors | General information on all visitors ever seen. |
Events | Each interaction with your application by a visitor. This includes click, page load, metadata, and guide events. The definitions of pages and guides. |
Expressions
A number of pipeline operators support expressions. Expressions follow basic C expression syntax along with function calls. Supported operators are ==
, !=
, &&
, ||
, <
, >
, <=
, >=
, +
, -
, /
, *
, %
, and !
.
All numeric value are converted to float's for computational purposes, and C order of operation is used.
Parenthesis ()
may be used for grouping expressions.
The comparison operators can be used to compare strings, the logical operators and comparison work on boolean values, and the equality operators can compare two lists.
For the ||
and &&
operators, null
is treated equivalently to false
.
The ==
and !=
operators treat null
as a distinct value (so null == null
but null != false
) and any other operator which has null
as a parameter returns null
as its value (so 1 + null == null
). Looking up a field that does not exist in a row results in null
.
Values may be extracted from list by using normal JavaScript style indexes such as listField[index]
; sublists can be created using JavaScript style slices such as listField[start:end]
. Additionally negative indexes can be used to look up items from the end; listField[-2]
will return the next to last item in the list. This follows Python style syntax.
Arrays of values are supported, such as ["a","b","c"]
or [1,1+2,sum(a)]
. All array elements must be of the same type. You may select an element from the array using normal Javascript style indexes (i.e. ["a","b","c"][2]
) as defined in the preceding paragraph.
Expression primitives are numbers (with or without a decimal point), strings enclosed in double quotation marks, true
, false
, and field names (using dotted notation to specify nested fields). The currently supported functions are listed below.
Functions | |
---|---|
ceil(val) |
returns the ceiling of numeric value val null if val == null error if val is non-numeric/non-null |
contains(listValue, item) |
returns true if item is in the specified list |
contains(haystackString, needleString) |
returns true if the needle is contained in the haystack (both as strings) |
floor(val) |
returns the floor of numeric value val null if val == null error if val is non-numeric/non-null |
formatTime(goFormatString, timeInMillis) |
formats the given time using the go-style go format string examples. |
hash(string) |
returns a 32 bit hash value for the given string |
if(testExpression, ifTrue, [ifFalse]) |
returns the ifTrue value if testExpression is true returns the ifFalse value if testExpression is false
|
isNull(field) |
returns true if field is null or does not exist returns false otherwise |
len(listField) |
returns the length of the specified list |
list(field1, field2, field3) |
returns a list containing the named fields |
listInRange(listField, minValue, maxValue) |
returns a list containing the elements in listField which are between minValue and maxValue (inclusive) |
map() |
explained in greater detail below |
now() |
returns the current time in milliseconds past the epoch |
reverse(list) |
returns the specified list , reversed |
round(val) |
returns the the numeric value val rounded to the nearest integer returns null if val == null returns error if val is non-numeric/non-null |
split(s, sep) |
splits the string s into a list of strings using string sep as the field separator |
startOfPeriod(periodName, timeStamp) |
returns the first timestamp in the period which includes the passed timestamp period can be one of "hourly", "daily", "weekly", or "monthly" |
startsWith(haystack, needle) |
returns true if the string haystack begins with the string needle returns false if haystack is nil
|
sum(listValue) |
returns the sum of the items in the specified list, ignoring non-numeric elements |
The map()
function
Two parameter function
map(users, users.name)
When applied to the row
{
"users": [
{ "id": 5, "name": "mickey" },
{ "id": "2", "name": "minnie" }
]
}
Example response
["mickey", "minnie"]
The map()
function comes in two variants, which take either two or three parameters. The simpler form is map(listField, expression)
, and it applies the given expression to each item in the specified list, and returns a list of the results.
The listField
must specify a list of objects. This means a list of integers will not work.
> Three parameter function
map(x, names, len(x))
When applied to the row
{ "names": ["bugs", "daffy"] }
Example response
[ 4, 5 ]
The three parameter form of map()
is more general and works on lists of any type. While it works on any type, it's important to note that it processes lists of objects slower than the two parameter form.
The first parameter is an identifier which the third parameter (the expression) can use to reference the particular value being mapped.
Both versions will return null if the list parameter is null.
Source specification
Definition
{
"source": {
"sourceType": {SOURCE PARAMS}
},
"timeSeries": {
"first": timeInMsOrExpression,
"count": numberOfPeriods,
"period": periodName
}
}
The source
key must be specified as the first step in the pipeline unless spawn
is used; then no source
can exist. This tells the aggregation what data to feed into the pipeline.
Details | |
---|---|
sourceType | Source of data for the aggregation. |
Row source specification
Think of row sources as tables, to borrow a term from relational databases. Once you understand what’s available in the row sources, you’ll be well on your way to knowing what’s possible overall.
When you return row source data, you can slice and dice it in a number of ways.
The short answer to “what can we access with the API” is: all the major entities, day/hour summaries for page usage, and event level data for guide activity and polls.
The sourceType
specifies what kind of data is being aggregated. Each specific type has specific parameters. If no parameters are provided, the empty {}
can be replaced with null
.
Row sources specify what types of rows should go into the pipeline, and are normally parameterized. The time period for the rows is not specified here; it is part of the time series specification instead.
Sources | |
---|---|
events |
Returns summary data for all events on the system in a time period. Accepts an event class or list of event classes { "eventClass" : "web"|"ios"|["web","ios"] } to return, "web" by default. |
guideEvents |
Returns all guide events (guideSeen /dismissed /advanced ) for requested time period. Can be limited by guideId or guideId and guideStepId . |
guidesSeen |
Returns firstSeenAt , lastSeenAt , lastState , and seenCount for each visitor that saw the requested guide(s) in a time period. Parameter must be { "guideId" : guideId, "guideStepId" : guideStepId } ; leaving out guide (blank/null/unsupplied) shows rows for all steps on all guides, and leaving out step shows all steps on the requested guide. |
pageEvents |
Covers all pages unless pageId is provided to narrow the results to a single pageId. Returns results for requested time period. Accepts an event class or list of event classes { "eventClass" : "web"|"ios"|["web","ios"] } to return, "web" by default. |
pollEvents |
Returns all poll responses for requested time period. Can be limited by guideId or guideId and pollId . |
pollsSeen |
Returns poll responses in a time period. If visitor has multiple responses to the same poll, only the most recent is returned. Parameter must be { "guideId" : guideId, "pollId" : pollId }
|
visitors |
All visitors with their metadata. Parameter may be { "identified" : true } to eliminate anonymous visitors |
Time Series
Example time series for July 10, 2015 through July 12, 2015
{
"timeSeries": {
"period": "dayRange",
"first": 1436553419000,
"count": 3
}
}
Example time series for three hours ago to (and including) the current hour.
{
"timeSeries": {
"period": "hourRange",
"first": "now() - 3 * 60601000",
"last": "now()"
}
}
The timeSeries
consists of the length of each time period and a timestamp for the first period in milliseconds since the epoch. If first
is a string it will be evaluated as an expression to derive a timestamp.
It also requires either a count of the number of periods to include in the aggregation or an expression which evaluates to the last time period to include in the aggregation.
Details | |
---|---|
period |
dayRange or hourRange
|
first | The timestamp in milliseconds after the epoch for the start of period. |
count | Number of periods to include in the aggregation |
Full time periods are always evaluated. Even if the first
time does not coincide with the beginning of a period, all of the data for the period which contains time first
will be used.
For the dayRange
period, the day aligns with the time zone for your account. This does mean that there will occasionally by dayRange
periods which include 23 or 25 hours of data because of the switch to and from daylight savings time. If this is undesirable, hourRange
may be used, though there will be a significant performance impact when large amounts of data are being evaluated.
Events - grouped
The following row sources require a date range (relative or absolute) and period (day). They return a row of data for each unique combination of day/hour, visitorId, accountId, server name, and IP address.
Attributes | |
---|---|
events |
All recorded click and pageview events (tagged or untagged) |
pageEvents |
All recorded pageviews matching tagged pages |
The events
row source
Definition
{ "source": { "events": null, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } }
Example request
curl -X POST
https://adopt.pendo.io/api/v1/aggregation
-H 'content-type: application/json'
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"events":null,"timeSeries":{"first":"1506977216000","count":-30,"period":"dayRange"}}}]}}'
Example response
{ "results": [ { "accountId": "Kappa Trust", "visitorId": "59", "numEvents": 12, "numMinutes": 1, "server": "www.some-domain.com", "remoteIp": "59.89.251.103", "parameters": null, "userAgent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36", "day": 1506830400000, "pageId": "allevents" } ], "startTime": 1504324800000 }
- Returns a row for each unique combination of day/hour, visitorId, accountId, server name, IP address, and user agent.
- Encompasses all events, whether tagged or untagged (note the pageId “allevents”)
- Useful for measuring total time, days active, etc.
- Optionally pass an
eventClass
ofweb
oriOS
to specify which event bucket to return.
The pageEvents
row source
Definition
{ "source": { "pageEvents": { "pageId": "{pageId}" }, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } }
Example request
curl -X POST
https://adopt.pendo.io/api/v1/aggregation
-H 'content-type: application/json'
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pageEvents":{"pageId":"aJ0KOADQZVL5h9zQhiQ0q26PNTM"},"timeSeries":{"first":"1506977216000","count":-30,"period":"dayRange"}}}]}}'
Example response
{ "results": [ { "accountId": "Acme Corp", "visitorId": "10", "numEvents": 1, "numMinutes": 1, "server": "www.some-domain.com", "remoteIp": "110.227.245.175", "parameters": null, "userAgent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36", "day": 1506484800000, "pageId": "aJ0KOADQZVL5h9zQhiQ0q26PNTM" } ] }
- Includes events which can be matched to tagged page rules
- Returns a row for each unique combination of page ID, day/hour, visitorId, accountId, server name, IP address, and user agent.
- Accepted parameters are
null
for all page data orpageId:PAGE_ID
for a specific page.
Events - ungrouped
For some events, we expose ungrouped events. These are visible for a specified time period.
Sources | |
---|---|
guideEvents |
Interactions your visitors have with guides |
guidesSeen |
Summary of when visitors see guides |
guidesSeenEver |
Summary of when visitors see guides |
pollEvents |
Interactions your visitors have with polls |
pollsSeen |
Summary of when and how visitors respond to polls |
pollsSeenEver |
Summary of when and how visitors respond to polls |
The guideEvents
row source
Definition
{ "source": { "guideEvents": null, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } }
Example request
curl -X POST
https://adopt.pendo.io/api/v1/aggregation
-H 'content-type: application/json'
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"guideEvents":null,"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}'
Example response
{ "accountIds": [ "Acme" ], "browserTime": 1462796392383, "country": "US", "type": "guideAdvanced", "guideId": "He3BLNt44i0xwO5AEJVIijEswro", "guideStepId": "4TaMV6TLY1JmKIFkprqOcQllU8k", "latitude": 35.823483, "loadTime": 0, "longitude": -78.825562, "region": "nc", "remoteIp": "209.136.209.34", "serverName": "your.app.com", "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/601.5.17 (KHTML, like Gecko) Version/9.1 Safari/601.5.17", "visitorId": "johndoe@acme.com", "accountId": "Acme" }
- Returns one event row for every guide event that occurs. This is not a summary or bucket structure like previous examples.
- Type field values include
guideAdvanced
,guideSeen
, andguideDismissed
- Accepted parameters include
guideId
andguideStepId
to drill down into a specific guide or step. -
pollEvents
behaves similiarly and provides one row for every poll response that occurs. You can specifypollId
as a parameter to that source.
The guideSeen
row source
Definition
{ "source": { "guidesSeen": { "guideId": "{guideId}", "guideStepId": "{guideStepId}" }, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } }
Exmaple request
curl -X POST
https://adopt.pendo.io/api/v1/aggregation
-H 'content-type: application/json'
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"guidesSeen":{"guideId":"nWtgylnMqWeB4DynO5duF9ako-M","guideStepId":"mHO55y6WJvJ0CnsmUgAtiGtgobA"},"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}'
Example response
{ "firstSeenAt": 1489673759222, "guideId": "nWtgylnMqWeB4DynO5duF9ako-M", "guideStepId": "mHO55y6WJvJ0CnsmUgAtiGtgobA", "lastAdvancedAutoAt": 0, "lastDismissedAutoAt": 0, "lastSeenAt": 1489674347693, "lastState": "advanced", "seenCount": 5, "visitorId": "avery" }
- Returns a row for each unique combination of Visitor ID, guide ID, and step ID
- You can additionally supply
guideId
andguideStepId
to drill into a specific guide or step. - The
guidesSeenEver
row source can also be used for the same output, but is time insensitive and does not require a timeSeries.
The pollEvents
row source
Definition
{ "source": { "pollEvents": null, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } }
Example request
curl -X POST
https://adopt.pendo.io/api/v1/aggregation
-H 'content-type: application/json'
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pollEvents":null,"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}'
Example response
{ "accountIds": [ "Acme" ], "browserTime": 1462796392383, "country": "US", "type": "pollResponse", "guideId": "He3BLNt44i0xwO5A201IijEswro", "latitude": 35.823483, "loadTime": 0, "longitude": -78.825562, "pollId": "z7y94y62v3c", "region": "nc", "remoteIp": "209.136.209.34", "serverName": "your.app.com", "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36", "visitorId": "user@acme.com", "accountId": "Acme", "pollResponse": "You guys rock!" }
- Returns one row for every poll response that occurs.
- Accepted parameters include
guideId
andpollId
to drill down into a specific poll.
The pollsSeen
row source
Definition
{ "source": { "pollsSeen": { "guideId": "{guideId}", "pollId": "{pollId}" }, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } }
Example request
curl -X POST
https://adopt.pendo.io/api/v1/aggregation
-H 'content-type: application/json'
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pollsSeen":{"guideId":"1wz09ZEAFp-HVl78Iw8oxpwbRZA","pollId":"3isny1qp7t2"},"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}'
Example response
{ "accountId": "1", "guideId": "1wz09ZEAFp-HVl78Iw8oxpwbRZA", "pollId": "3isny1qp7t2", "response": 0, "time": 1500305598629, "visitorId": "1" }
- Returns poll responses in a time period
- Only the most recent poll response for a visitor is displayed i.e. if a visitor has responded to the same poll multiple times,
pollsSeen
will only return most recent response. To return all responses, usepollEvents
. - You must supply
guideId
andpollId
as parameters to drill into specific poll events - The
pollsSeenEver
row source can also be used for the same output, but is time insensitive and does not require a timeSeries.
Entities
Entities are the nouns of Aggregations.
Entities | |
---|---|
guides |
In-app messages you show your visitors |
pages |
Sets of rules that identify individual pages in your product (defined by a URL rule) |
visitors |
Your product’s users |
The guides
row source
Definition
{ "source": { "guides": null } }
Example request
curl -X POST
https://adopt.pendo.io/api/v1/aggregation
-H 'content-type: application/json'
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"guides":null}}]}}'
Example response
{ "attributes": { ... }, "audience": [ ... ], "createdAt": 1440449676899, "createdByUser": { "first": "Someone", "id": "5269565705551872", "last": "Lastname", "role": 8, "username": "someone@yourcompany.com" }, "expiresAfter": 1440472740000, "id": "BF_p40WvYWC2o81pvsdt1kjcIkI", "isMultiStep": false, "lastUpdatedAt": 1460734696429, "lastUpdatedByUser": { "first": "", "id": "4985730428305408", "last": "", "role": 8, "username": "someone@yourcompany.com" }, "launchMethod": "auto", "name": "2015-08-24 - Maintenance tonight.", "state": "disabled", "stateHistory": [ ... ], "steps": [ { "advanceMethod": "button", "attributes": { "height": 395, "width": 440 }, "content": "", "createdByUser": { "first": "Someone", "id": "5269565705551872", "last": "Lastname", "role": 8, "username": "someone@yourcompany.com" }, "elementPathRule": "", "guideId": "BF_p40WvYWC2o81pvsdt1kjcIkI", "id": "Itfe6aDRNtgx9J1XisQwaCxDS-A", "lastUpdatedAt": 1460734696429, "lastUpdatedByUser": { "first": "", "id": "4985730428305408", "last": "", "role": 8, "username": "someone@yourcompany.com" }, "name": "", "rank": 5000000, "resetAt": 0, "type": "lightbox" } ] }
steps
include a single item for lightboxes, tooltips, banners, etc. and multiple items for walkthroughs.
The pages
row source
Definition
{ "source": { "pages": null } }
Example request
curl -X POST
https://adopt.pendo.io/api/v1/aggregation
-H 'content-type: application/json'
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pages":null}}]}}'
Example response
{ "results": [ { "color": "gray", "createdAt": 1505938241596, "createdByUser": { "first": "Adam", "id": "5197072786522112", "last": "Lohner", "role": 8, "username": "somePerson@email.com" }, "dirty": false, "group": { "color": ".groupColor01", "createdAt": 1505937447295, "createdByUser": { "first": "", "id": "", "last": "", "role": 0, "username": "" }, "description": "", "id": "DEFAULTGROUP_01_", "items": [ ], "lastUpdatedAt": 1505938327653, "lastUpdatedByUser": { "first": "Adam", "id": "5197072786522112", "last": "Lohner", "role": 8, "username": "somePerson@email.com" }, "length": 6, "name": "Dashboard" }, "id": "aJ0KOADQZVL5h9zQhiQ0q26PNTM", "kind": "Page", "lastUpdatedAt": 1505938241596, "lastUpdatedByUser": { "first": "Adam", "id": "5197072786522112", "last": "Lohner", "role": 8, "username": "somePerson@email.com" }, "name": "Dashboard", "rootVersionId": "aJ0KOADQZVL5h9zQhiQ0q26PNTM", "rules": [ { "designerHint": "http://some-domain.com/", "parsedRule": "^https?://[^/]/?(?:;[^#])?(?:\?[^#])?(?:#.)?$", "rule": "//*" } ], "rulesjson": "", "stableVersionId": "aJ0KOADQZVL5h9zQhiQ0q26PNTM-20170920201041.596687476", "validThrough": 1506981600000 } ] }
Tagged pages. Returns one row for each tagged page.
Visitors
Definition
{ "source": { "visitors": null } }
Example request
curl -X POST
https://adopt.pendo.io/api/v1/aggregation
-H 'content-type: application/json'
-H 'x-pendo-integration-key: [add_your_integration_key_here]'
-d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"visitors":null}}]}}'
Example response
{ "results": [ { "metadata": { "agent": { "email": "ac@Cras.co.uk", "ipaddress": "184.169.45.4", "language": "en_US" }, "auto": { "accountid": "Epsilon united", "accountids": [ "Epsilon united" ], "firstvisit": 1506965912312, "id": "84", "idhash": 2012371633, "lastbrowsername": "Chrome", "lastbrowserversion": "61.0.3163", "lastoperatingsystem": "Mac OS X", "lastservername": "some-domain.com", "lastupdated": 1506976690418, "lastuseragent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36", "lastvisit": 1506976690112, "nid": 84 } }, "visitorId": "84" } ] }
The visitors
source returns one row per visitor ever seen and contains various metadata about the visitor. Each visitor has a unique visitorId
.
Accepted parameters are null
or {"identified":true}
to eliminate anonymous visitors.
Metadata | |
---|---|
auto | data automatically calculated |
custom | custom field data |
salesforce | data synced from Salesforce |
segment | data synced from Segment |
Pipeline
Example sytnax #1
{ "cat": null }
Example sytnax #2
{ "field": "firstName", "operator": "==", "value": "John" }
Each step is of the form { "name" : parameterObject }
.
The source rows are transformed into result rows by the pipeline; the final rows are the result of the aggregation step. Pipelines can be arbitrarily long, and are represented as an ordered list of individual steps.
Operators
Operators | |
---|---|
cat |
Takes no parameters, and returns each row without changing it. |
count |
Counts the number of input rows and returns a single row of the form { "count" : 456 }
|
accumulate |
Keeps a running total of more than one field, and includes the current value of the total in each row. Parameters are of the forum { "totalField1" : "sourceField1" }
|
eval |
Inserts new fields based on one or more expressions. Format is an { "fieldName1" : "expression" }
|
identified |
Eliminates rows whose named field looks like an anonymous visitorId { "identified" : "visitorId" } where visitorId is the field name to check. |
limit |
Returns the first N rows and discards the rest { "limit" : 5 }
|
select |
Builds a new row (while discarding the old one) containing one or more fields. Format is an { "fieldName1" : "expression" }
|
set |
Sets fields to constant values { "set" : { "field1" : 5, "field2.nested" : 20 } }
|
sort |
Sorts the rows by the specified field list { "sort" : [ "fieldName", "anotherFieldName" ] } Prepend a - to the front of a field name to reverse sort by that field. |
unmarshal |
Expands an embedded JSON string into sub-objects. |
useragent |
Expands a user agent string into a parsed set of strings. |
The spawn
operation
Definition
{ "spawn": [ [ { "source": { "events": null, "timeSeries ": { "period": "dayRange", "first": 1404745200000, "count": 7 } } }, { "cat": null } ], [ { "source": { "events": null, "timeSeries ": { "period": "month", "first": 1404745200000, "count": 7 } } }, { "count": null } ] ] }
The spawn
operator behaves like fork
, but each nested pipeline must specify its own source. This allows multiple sources to be used to create a single stream of rows which are processed by the remainder of the pipe.
Only sources which cause a single pipeline to be instantiated may be used; this means dayRange
, hourRange
, or any source whose count
is 1 is available.
The timeSeries
for the source is part of the source specification for the spawn
pipeline.
The switch
operation
Definition
{ "switch": { "newField": { "existingField": [ { "value": "small", "<": 10 }, { "value": "medium", ">=": 10, "<": 20 }, { "value": "large", ">=": 20, "<": 50 } ] } } }
The switch operation allows a field to checked against any number of conditions, and have a new field set in the row based on which condition the original field matches.
For each row the existingField
will be checked to see which set of conditions its value matches.
It must match all of the conditions for newField
to be set to the value indicated. If no conditions match, or if existingField
is not set at all, the row is skipped.
For the above input, the row { "existingField" : 15 }
will become { "existingField" : 15, "newField" : "medium" }
The filter
operation
Definition
{ "filter": "fieldName == 5 || len(anotherField) > 3" }
The filter
pipeline operator evaluates an expression and eliminates rows that for which the expression is false.
The fork
operation
The following calculates the total time for each
accountId
andvisitorId
{ "fork": [ [ { "group": { "group": [ "visitorId" ], "fields": [ { "time": { "sum": "numMinutes" } } ] } } ], [ { "group": { "group": [ "accountId" ], "fields": [ { "time": { "sum": "numMinutes" } } ] } } ] ] }
The fork
operator allows the pipeline to be split into multiple pipelines, with each of the child pipelines receiving all of the rows for that step.
Note that the same row is sent to all of the child pipelines; it is not a copy of the row and any changes to the row made in one pipeline will be visible in all pipelines.
The rows which result from each pipeline are all sent through the remainder of the parent pipeline. Forks may be nested, however this has not been well-tested (aka - no promises).
Empty pipelines (which would pass through the rows without modifying them) are not permitted, but a pipeline with a single cat
operation performs identically.
The join
operation
Definition
{ "join": { "fields": [ "firstName", "lastNme" ], "width": 3 } }
The join
operator merges multiple rows together based on one or more keys. If there are field conflicts, the values from rows encountered later in the join will win. The optional width
argument allows the user to tell join
how many rows are expected to make up the final result. Once that many rows with the same join key have been seen, the merged row is passed to the next operation in the pipeline. If more rows are later seen with that join key, they are treated as a new join operation. Any join keys which have had less than width
rows are processed once all input rows have been seen.
The group
operation
Example request to calculate how much time each identified visitor has spent on the site over 30 days
{ "requestId": "last30daysVisitorTime", "timeSeries": { "period": "dayRange", "count": 30, "first": 1404745200000 }, "source": { "events": null }, "pipeline": [ { "identified": "visitorId" }, { "group": { "group": [ "visitorId" ], "fields": [ { "minutes": { "sum": "numMinutes" } } ] } } ] }
NOTE: The count value of 180 has to be a long enough period to cover the entire time the guide has been in existence.
The group
operator supports flexible grouping (similar to GROUP BY
in SQL) and simple aggregations of fields. It supports the following parameters:
Parameters | |
---|---|
group |
List of fields whose values match in each grouping |
fields |
Set of fields whose values are aggregated across the group |
stringSort |
Result rows will be sorted by the group key. The group key is treated as a string for sort purposes (so "2" > "10"). This provides a stable ordering for the result. |
Each resulting row consists of all of the fields specified by both the group and fields specification.
Aggregation fields consist of field name to use in the output row, an aggregation name, and a field name from the input row. For example, to count the rows in a group use { "outputField" : { "count" : "inputField" } }
will count the number of rows which match (here the field name must be specified, but is irrelevant). The following row aggregators are supported:
Aggregators | |
---|---|
count |
Counts the number of rows in the group. If a field name is it counts the unique occurrences of that field. |
listAverage |
Average the individual values in a list arguement and return a list of those averages. |
max |
Returns the largest value of the specified (numeric) field. |
min |
Returns the smallest value of the specified (numeric) field. |
sum |
Adds up the values of the specified field. |
first |
Returns the first value of the specified field (note: "first" is only defined as deterministically as the input order is). |
Count the number of unique
visitorIds
use the following pipeline:
[ { "group": { "group": [ "visitorId" ] } }, { "visitorCount": { "count": null } } ]
To get the number of events for each
visitorId
use:
[ { "group": { "group": [ "visitorId" ], "fields": { "eventCount": { "count": "event" } } } } ]
The reduce
operation
Count the number of rows and add up the minutes fields in the rows:
[ { "reduce": { "eventCount": { "count": null }, "totalMinutes": { "sum": "numMinutes" } } } ]
The reduce
operator collects all rows into a single row, and performs aggregations on fields. The aggregators and format are identical to the fields
parameter for the group
pipeline operation.
Example request to calculate the total number of events and the total number of minutes for a thirty day period.
{ "requestId": "last30DaysEventAndMinuteTotals", "timeSeries": { "period": "dayRange", "count": 30, "first": 1404745200000 }, "source": { "events": null }, "pipeline": [ { "reduce": [ { "events": { "sum": "numEvents" } }, { "minutes": { "sum": "numMinutes" } } ] } ] }
Count the number of unique visitors in a dataset
{ "requestId": "numberOfUniqueVisitors", "timeSeries": { "period": "dayRange", "count": 30, "first": 1404745200000 }, "source": { "events": null }, "pipeline": [ { "reduce": [ { "reduce": { "visitors": { "count": "visitorId" } } } ] } ] }
Example request using
group
andreduce
to count the number of visitors in a set of events and the total number of minutes used by those visitor:
{ "requestId": "lastDaysEventAndMinuteTotals", "timeSeries": { "period": "dayRange", "count": 30, "first": 1404745200000 }, "source": { "events": null }, "pipeline": [ { "group": { "group": [ "visitorId" ], "fields": { "visitorMinutes": { "sum": "numMinutes" } } }, "reduce": { "visitorCount": { "count": "numVisitors" }, "totalMinutes": { "sum": "visitorMinutes" } } } ] }
The segment
operation
Definition
{ "segment": { "id": "id" } }
Segments are used to preselect groups of visitors. Preexisting segments filter rows via the segment
operator.
An input row is passed if the visitorId
in the input row match an item in the segment.
The bulkExpand
operation
Expand visitors into a subdocument called
info
{ "bulkExpand": { "info": { "visitor": "visitorId" } } }
The following data types support expand: account
and visitor
.
The unwind
operation
To
unwind
the following:
{
"name": "fred",
"items": [
{ "v": "first" },
{ "v": "second" },
{ "v": "third" }
]
}
Try:
{
"unwind": {
"field": "items",
"index": "itemIndex"
}
}
Response would yield:
[{
"name": "fred",
"items": { "v": "first" },
"itemIndex": 0
}, {
"name": "fred",
"items": { "v": "second" },
"itemIndex": 1
}, {
"name": "fred",
"items": { "v": "third" },
"itemIndex": 2
}]
The unwind
operator expands a list of items into separate rows. All other items are preserved, and an index may optionally be added.
Add
"prefix" : true
and you will get:
[
{
"name": "fred",
"items": [{ "v": "first" }],
"itemIndex": 0
}, {
"name": "fred",
"items": [{ "v": "first" }, { "v": "second" }],
"itemIndex": 1
}, {
"name": "fred",
"items": [{ "v": "first" }, { "v": "second" }, { "v": "third" }],
"itemIndex": 2
}
]
Alternatively, if everything to the left of each items is important, unwind
can preserve those items.
If the list to unwind is empty then no rows will be emitted unless { "keepEmpty" : true }
is specified, in which case the original row will be preserved.
The useragent
operation
Definition
{ "useragent": { "output": "input" } }
The useragent
operation parses the user agent string in the field input
and renders it into a object in output
.
Output | |
---|---|
name |
the browser's name (e.g. "MSIE", "Chrome", etc.) |
version |
the version of the browser |
os |
the operating system (e.g. "Windows", "Mac OS X", "GNU/Linux", etc.) |
type |
the type of user agent (e.g. "Browser", "Crawler", etc.) |
Multi-App
Using the
pages
source as an example, without querying for a specific app, thesource
looks like this:
{ "source": { "pages": null } }
To execute Aggregations specific to an app, you'll need to change the source
value within the aggregation from null
- which is an indicator to return all records for all apps associated with the sub - to include an appId
value.
Including the appId
scopes the results to the data associated with just that application within the subscription.
For all multi-app Aggregations, use the primary app's Integration API key for the value for x-pendo-integration-key
in the Header.
appId
To query on a specific app,
source
will look like this:
{ "source": { "pages": { "appId": 5629499534213120 } } }