# Aggregation API Documentation The Aggregation API can be reached via the following endpoint: ```plaintext https://{WHITE_LABEL_URL}/api/v1/aggregation ``` Aggregations are a query language for accessing your data. They take sources of data and apply operators to do computations, and are written in JSON. It is worth noting, the aggregations endpoint is not intended to be a bulk export feature, so breaking up aggregations by different time ranges may be necessary if you hit sizing or timeout limits. Data querying is performed using a flexible aggregation pipeline modeled after MongoDB. Each step of the pipeline takes rows as its input and output zero or more rows. The basic unit is a row which contains named fields (possible nested) fields. At a high level, an aggregation consists of - A row source and time specification - A pipeline for row processing All input data rows for the specified time period are fed into the aggregation pipeline. The rows which those pipeline yields are the result of the aggregation; a pipeline may result in no rows or in many rows depending on the pipeline and the input. If the time specification expands into 24 separate time periods (one for each hour of a day, perhaps) then 24 independent pipelines are run, and the aggregation will have 24 sets of results. Each of those results in independent of the others. ## Nested Fields > Example structure ```json { "name": { "first": "bugs", "last": "bunny" } } ``` Wherever a field name is needed, a dotted list of field names can be used to specify nested objects. Nested specifiers can be used to create new fields as well; any missing intermediate objects will be initialized. In the example to the right, the field `name.last` has the value bunny. ## Data Stores | Data Stores | | | ----------- | ------- | | Visitors | General information on all visitors ever seen. | | Events | Each interaction with your application by a visitor. This includes click, page load, metadata, and guide events. The definitions of **pages** and **guides**. | ## Expressions A number of pipeline operators support expressions. Expressions follow basic C expression syntax along with function calls. Supported operators are `==`, `!=`, `&&`, `||`, `<`, `>`, `<=`, `>=`, `+`, `-`, `/`, `*`, `%`, and `!`. All numeric value are converted to float's for computational purposes, and C order of operation is used. Parenthesis `()` may be used for grouping expressions. The comparison operators can be used to compare strings, the logical operators and comparison work on boolean values, and the equality operators can compare two lists. For the `||` and `&&` operators, `null` is treated equivalently to `false`. The `==` and `!=` operators treat `null` as a distinct value (so `null == null` but `null != false`) and any other operator which has `null` as a parameter returns `null` as its value (so `1 + null == null`). Looking up a field that does not exist in a row results in `null`. Values may be extracted from list by using normal JavaScript style indexes such as `listField[index]`; sublists can be created using JavaScript style slices such as `listField[start:end]`. Additionally negative indexes can be used to look up items from the end; `listField[-2]` will return the next to last item in the list. This follows Python style syntax. Arrays of values are supported, such as `["a","b","c"]` or `[1,1+2,sum(a)]`. All array elements must be of the same type. You may select an element from the array using normal Javascript style indexes (i.e. `["a","b","c"][2]`) as defined in the preceding paragraph. Expression primitives are numbers (with or without a decimal point), strings enclosed in double quotation marks, `true`, `false`, and field names (using dotted notation to specify nested fields). The currently supported functions are listed below. | Functions | | | --------- | ----------- | | `ceil(val)` | returns the ceiling of numeric value val
`null` if `val == null`
`error` if `val` is non-numeric/non-null | | `contains(listValue, item)` | returns `true` if `item` is in the specified list | | `contains(haystackString, needleString)` | returns `true` if the `needle` is contained in the haystack (both as strings) | | `floor(val)` | returns the floor of numeric value `val`
`null` if `val == null`
error if `val` is non-numeric/non-null | | `formatTime(goFormatString, timeInMillis)` | formats the given time using the go-style go format string examples. | | `hash(string)` | returns a 32 bit hash value for the given string | | `if(testExpression, ifTrue, [ifFalse])` | returns the `ifTrue` value if `testExpression` is `true`
returns the `ifFalse` value if `testExpression` is `false` | | `isNull(field)` | returns `true` if `field` is null or does not exist
returns `false` otherwise | | `len(listField)` | returns the length of the specified list | | `list(field1, field2, field3)` | returns a list containing the named fields | | `listInRange(listField, minValue, maxValue)` | returns a list containing the elements in `listField` which are between `minValue` and `maxValue` (inclusive) | | `map()` | explained in greater detail below | | `now()` | returns the current time in milliseconds past the epoch | | `reverse(list)` | returns the specified `list`, reversed | | `round(val)` | returns the the numeric value `val` rounded to the nearest integer
returns `null` if `val == null`
returns `error` if `val` is non-numeric/non-null | | `split(s, sep)` | splits the string `s` into a list of strings using string `sep` as the field separator | | `startOfPeriod(periodName, timeStamp)` | returns the first `timestamp` in the period which includes the passed `timestamp`
`period` can be one of "hourly", "daily", "weekly", or "monthly" | | `startsWith(haystack, needle)` | returns `true` if the string `haystack` begins with the string `needle`
returns `false` if `haystack` is `nil` | | `sum(listValue)` | returns the `sum` of the items in the specified list, ignoring non-numeric elements | ### The `map()` function > Two parameter function ```js map(users, users.name) ``` > When applied to the row ```json { "users": [{ "id": 5, "name": "mickey" }, { "id": "2", "name": "minnie" }] } ``` > Example response ```json ["mickey", "minnie"] ``` The `map()` function comes in two variants, which take either two or three parameters. The simpler form is `map(listField, expression)`, and it applies the given expression to each item in the specified list, and returns a list of the results. The `listField` must specify a list of objects. This means a list of integers will not work.
> Three parameter function ```js map(x, names, len(x)) ``` > When applied to the row ```json { "names": ["bugs", "daffy"] } ``` > Example response ```json [ 4, 5 ] ``` The three parameter form of `map()` is more general and works on lists of any type. While it works on any type, it's important to note that it processes lists of objects slower than the two parameter form. The first parameter is an identifier which the third parameter (the expression) can use to reference the particular value being mapped. Both versions will return null if the list parameter is null. ## Source specification > Definition ```json { "source": { "sourceType": {SOURCE PARAMS} }, "timeSeries": { "first": timeInMsOrExpression, "count": numberOfPeriods, "period": periodName } } ``` The `source` key must be specified as the first step in the pipeline unless `spawn` is used; then no `source` can exist. This tells the aggregation what data to feed into the pipeline. | Details | | | ------- | ------- | | sourceType | Source of data for the aggregation. | ### Row source specification Think of row sources as tables, to borrow a term from relational databases. Once you understand what’s available in the row sources, you’ll be well on your way to knowing what’s possible overall. When you return row source data, you can slice and dice it in a number of ways. The short answer to “what can we access with the API” is: all the major entities, day/hour summaries for page usage, and event level data for guide activity and polls. The `sourceType` specifies what kind of data is being aggregated. Each specific type has specific parameters. If no parameters are provided, the empty `{}` can be replaced with `null`. Row sources specify what types of rows should go into the pipeline, and are normally parameterized. The time period for the rows is not specified here; it is part of the time series specification instead. | Sources | | | ----------- | -------- | | `events` | Returns summary data for all events on the system in a time period. Accepts an event class or list of event classes `{ "eventClass" : "web"|"ios"|["web","ios"] }` to return, "web" by default. | | `guideEvents` | Returns all guide events (`guideSeen`/`dismissed`/`advanced`) for requested time period. Can be limited by `guideId` or `guideId` and `guideStepId`. | | `guidesSeen` | Returns `firstSeenAt`, `lastSeenAt`, `lastState`, and `seenCount` for each visitor that saw the requested guide(s) in a time period. Parameter must be `{ "guideId" : guideId, "guideStepId" : guideStepId }`; leaving out guide (blank/null/unsupplied) shows rows for all steps on all guides, and leaving out step shows all steps on the requested guide. | | `pageEvents` | Covers all pages unless `pageId` is provided to narrow the results to a single pageId. Returns results for requested time period. Accepts an event class or list of event classes `{ "eventClass" : "web"|"ios"|["web","ios"] }` to return, "web" by default. | | `pollEvents` | Returns all poll responses for requested time period. Can be limited by `guideId` or `guideId` and `pollId`. | | `pollsSeen` | Returns poll responses in a time period. If visitor has multiple responses to the same poll, only the **most recent** is returned. Parameter must be `{ "guideId" : guideId, "pollId" : pollId }` | | `visitors` | All visitors with their metadata. Parameter may be `{ "identified" : true }` to eliminate anonymous visitors | | `visitorTags` | Parameter may be `{ "tagId" : tagId }` to limit results to a single `visitorTag`. Returns `{ "visitorId" : visitorId , "visitorTags": { tagId : true , ... }}` for each visitor with the requested tag (listing only the requested tag), or if none specified, any currently valid `visitorTag` (listing all tags on each). | ## Time Series > Example time series for July 10, 2015 through July 12, 2015 ```json { "timeSeries": { "period": "dayRange", "first": 1436553419000, "count": 3 } } ``` > Example time series for three hours ago to (and including) the current hour. ```json { "timeSeries": { "period": "hourRange", "first": "now() - 3 * 60*60*1000", "last": "now()" } } ``` The `timeSeries` consists of the length of each time period and a timestamp for the first period in milliseconds since the [epoch](https://www.epochconverter.com/). If `first` is a string it will be evaluated as an expression to derive a timestamp. It also requires either a count of the number of periods to include in the aggregation or an expression which evaluates to the last time period to include in the aggregation. | Details | | | ------- | ------- | | period | `dayRange` or `hourRange` | | first | The timestamp in milliseconds after the [epoch](https://www.epochconverter.com/) for the start of period. | | count | Number of periods to include in the aggregation | Full time periods are always evaluated. Even if the `first` time does not coincide with the beginning of a period, all of the data for the period which contains time `first` will be used. For the `dayRange` period, the day aligns with the time zone for your account. This does mean that there will occasionally by `dayRange` periods which include 23 or 25 hours of data because of the switch to and from daylight savings time. If this is undesirable, `hourRange` may be used, though there will be a significant performance impact when large amounts of data are being evaluated. ## Events - grouped The following row sources require a date range (relative or absolute) and period (day). They return a row of data for each unique combination of day/hour, visitorId, accountId, server name, and IP address. | Attributes | | | -------------- | ----------- | | `events` | All recorded click and pageview events (tagged or untagged) | | `pageEvents` | All recorded pageviews matching tagged pages | ### The `events` row source > Definition ```json { "source": { "events": null, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } } ``` > Example request ```bash curl -X POST \ https://{WHITE_LABEL_URL}/api/v1/aggregation \ -H 'content-type: application/json' \ -H 'x-api-integration-key: [add_your_integration_key_here]' \ -d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"events":null,"timeSeries":{"first":"1506977216000","count":-30,"period":"dayRange"}}}]}}' ``` > Example response ```json { "results": [{ "accountId": "Kappa Trust", "visitorId": "59", "numEvents": 12, "numMinutes": 1, "server": "www.some-domain.com", "remoteIp": "59.89.251.103", "parameters": null, "userAgent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36", "day": 1506830400000, "pageId": "allevents" }], "startTime": 1504324800000 } ``` - Returns a row for each unique combination of day/hour, visitorId, accountId, server name, IP address, and user agent. - Encompasses all events, whether tagged or untagged (note the pageId “allevents”) - Useful for measuring total time, days active, etc. - Optionally pass an `eventClass` of `web` or `iOS` to specify which event bucket to return. ### The `pageEvents` row source > Definition ```json { "source": { "pageEvents": { "pageId": "{pageId}" }, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } } ``` > Example request ```bash curl -X POST \ https://{WHITE_LABEL_URL}/api/v1/aggregation \ -H 'content-type: application/json' \ -H 'x-api-integration-key: [add_your_integration_key_here]' \ -d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pageEvents":{"pageId":"aJ0KOADQZVL5h9zQhiQ0q26PNTM"},"timeSeries":{"first":"1506977216000","count":-30,"period":"dayRange"}}}]}}' ``` > Example response ```json { "results": [{ "accountId": "Acme Corp", "visitorId": "10", "numEvents": 1, "numMinutes": 1, "server": "www.some-domain.com", "remoteIp": "110.227.245.175", "parameters": null, "userAgent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36", "day": 1506484800000, "pageId": "aJ0KOADQZVL5h9zQhiQ0q26PNTM" } ] } ``` - Includes events which can be matched to tagged page rules - Returns a row for each unique combination of page ID, day/hour, visitorId, accountId, server name, IP address, and user agent. - Accepted parameters are `null` for all page data or `pageId:PAGE_ID` for a specific page. - Remember `expand` if you need additional page data ## Events - ungrouped For some events, we expose ungrouped events. These are visible for a specified time period. | Sources | | | ----------- | -------- | | `guideEvents` | Interactions your visitors have with guides | | `guidesSeen` | Summary of when visitors see guides | | `guidesSeenEver` | Summary of when visitors see guides | | `pollEvents` | Interactions your visitors have with polls | | `pollsSeen` | Summary of when and how visitors respond to polls | | `pollsSeenEver` | Summary of when and how visitors respond to polls | ### The `guideEvents` row source > Definition ```json { "source": { "guideEvents": null, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } } ``` > Example request ```bash curl -X POST \ https://{WHITE_LABEL_URL}/api/v1/aggregation \ -H 'content-type: application/json' \ -H 'x-api-integration-key: [add_your_integration_key_here]' \ -d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"guideEvents":null,"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}' ``` > Example response ```js { "accountIds": [ "Acme" ], "browserTime": 1462796392383, "country": "US", "type": "guideAdvanced", "guideId": "He3BLNt44i0xwO5AEJVIijEswro", "guideStepId": "4TaMV6TLY1JmKIFkprqOcQllU8k", "latitude": 35.823483, "loadTime": 0, "longitude": -78.825562, "region": "nc", "remoteIp": "209.136.209.34", "serverName": "your.app.com", "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/601.5.17 (KHTML, like Gecko) Version/9.1 Safari/601.5.17", "visitorId": "johndoe@acme.com", "accountId": "Acme" } ``` - Returns one event row for **every** guide event that occurs. This is not a summary or bucket structure like previous examples. - Type field values include `guideAdvanced`, `guideSeen`, and `guideDismissed` - Accepted parameters include `guideId` and `guideStepId` to drill down into a specific guide or step. - `pollEvents` behaves similiarly and provides one row for every poll response that occurs. You can specify `pollId` as a parameter to that source. ### The `guideSeen` row source > Definition ```json { "source": { "guidesSeen": { "guideId": "{guideId}", "guideStepId": "{guideStepId}" }, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } } ``` > Exmaple request ```bash curl -X POST \ https://{WHITE_LABEL_URL}/api/v1/aggregation \ -H 'content-type: application/json' \ -H 'x-api-integration-key: [add_your_integration_key_here]' \ -d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"guidesSeen":{"guideId":"nWtgylnMqWeB4DynO5duF9ako-M","guideStepId":"mHO55y6WJvJ0CnsmUgAtiGtgobA"},"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}' ``` > Example response ```json { "firstSeenAt": 1489673759222, "guideId": "nWtgylnMqWeB4DynO5duF9ako-M", "guideStepId": "mHO55y6WJvJ0CnsmUgAtiGtgobA", "lastAdvancedAutoAt": 0, "lastDismissedAutoAt": 0, "lastSeenAt": 1489674347693, "lastState": "advanced", "seenCount": 5, "visitorId": "avery" } ``` - Returns a row for each unique combination of Visitor ID, guide ID, and step ID - You can additionally supply `guideId` and `guideStepId` to drill into a specific guide or step. - The `guidesSeenEver` row source can also be used for the same output, but is time insensitive and does not require a timeSeries. ### The `pollEvents` row source > Definition ```json { "source": { "pollEvents": null, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } } ``` > Example request ```bash curl -X POST \ https://{WHITE_LABEL_URL}/api/v1/aggregation \ -H 'content-type: application/json' \ -H 'x-api-integration-key: [add_your_integration_key_here]' \ -d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pollEvents":null,"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}' ``` > Example response ```js { "accountIds": [ "Acme" ], "browserTime": 1462796392383, "country": "US", "type": "pollResponse", "guideId": "He3BLNt44i0xwO5A201IijEswro", "latitude": 35.823483, "loadTime": 0, "longitude": -78.825562, "pollId": "z7y94y62v3c", "region": "nc", "remoteIp": "209.136.209.34", "serverName": "your.app.com", "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36", "visitorId": "user@acme.com", "accountId": "Acme", "pollResponse": "You guys rock!" } ``` - Returns one row for **every** poll response that occurs. - Accepted parameters include `guideId` and `pollId` to drill down into a specific poll. ### The `pollsSeen` row source > Definition ```json { "source": { "pollsSeen": { "guideId": "{guideId}", "pollId": "{pollId}" }, "timeSeries": { "first": "{timeInMsOrExpression}", "count": "{numberOfPeriods}", "period": "{periodName}" } } } ``` > Example request ```bash curl -X POST \ https://{WHITE_LABEL_URL}/api/v1/aggregation \ -H 'content-type: application/json' \ -H 'x-api-integration-key: [add_your_integration_key_here]' \ -d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pollsSeen":{"guideId":"1wz09ZEAFp-HVl78Iw8oxpwbRZA","pollId":"3isny1qp7t2"},"timeSeries":{"first":"1506977216000","count":-10,"period":"dayRange"}}}]}}' ``` > Example response ```json { "accountId": "1", "guideId": "1wz09ZEAFp-HVl78Iw8oxpwbRZA", "pollId": "3isny1qp7t2", "response": 0, "time": 1500305598629, "visitorId": "1" } ``` - Returns poll responses in a time period - Only the **most recent** poll response for a visitor is displayed i.e. if a visitor has responded to the same poll multiple times, `pollsSeen` will only return most recent response. To return **all** responses, use `pollEvents`. - You must supply `guideId` and `pollId` as parameters to drill into specific poll events - The `pollsSeenEver` row source can also be used for the same output, but is time insensitive and does not require a timeSeries. ## Entities Entities are the nouns of Aggregations. | Entities | | | -------- | -------- | | `guides` | In-app messages you show your visitors | | `pages` | Sets of rules that identify individual pages in your product (defined by a URL rule) | | `visitors` | Your product’s users | ### The `guides` row source > Definition ```json { "source": { "guides": null } } ``` > Example request ```bash curl -X POST \ https://{WHITE_LABEL_URL}/api/v1/aggregation \ -H 'content-type: application/json' \ -H 'x-api-integration-key: [add_your_integration_key_here]' \ -d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"guides":null}}]}}' ``` > Example response ```json { "attributes": { ... }, "audience": [ ... ], "createdAt": 1440449676899, "createdByUser": { "first": "Someone", "id": "5269565705551872", "last": "Lastname", "role": 8, "username": "someone@yourcompany.com" }, "expiresAfter": 1440472740000, "id": "BF_p40WvYWC2o81pvsdt1kjcIkI", "isMultiStep": false, "lastUpdatedAt": 1460734696429, "lastUpdatedByUser": { "first": "", "id": "4985730428305408", "last": "", "role": 8, "username": "someone@yourcompany.com" }, "launchMethod": "auto", "name": "2015-08-24 - Maintenance tonight.", "state": "disabled", "stateHistory": [ ... ], "steps": [{ "advanceMethod": "button", "attributes": { "height": 395, "width": 440 }, "content": "", "createdByUser": { "first": "Someone", "id": "5269565705551872", "last": "Lastname", "role": 8, "username": "someone@yourcompany.com" }, "elementPathRule": "", "guideId": "BF_p40WvYWC2o81pvsdt1kjcIkI", "id": "Itfe6aDRNtgx9J1XisQwaCxDS-A", "lastUpdatedAt": 1460734696429, "lastUpdatedByUser": { "first": "", "id": "4985730428305408", "last": "", "role": 8, "username": "someone@yourcompany.com" }, "name": "", "rank": 5000000, "resetAt": 0, "type": "lightbox" }] } ``` `steps` include a single item for lightboxes, tooltips, banners, etc. and multiple items for walkthroughs. ### The `pages` row source > Definition ```json { "source": { "pages": null } } ``` > Example request ```bash curl -X POST \ https://{WHITE_LABEL_URL}/api/v1/aggregation \ -H 'content-type: application/json' \ -H 'x-api-integration-key: [add_your_integration_key_here]' \ -d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"pages":null}}]}}' ``` > Example response ```json { "results": [{ "color": "gray", "createdAt": 1505938241596, "createdByUser": { "first": "Adam", "id": "5197072786522112", "last": "Lohner", "role": 8, "username": "somePerson@email.com" }, "dirty": false, "group": { "color": ".groupColor01", "createdAt": 1505937447295, "createdByUser": { "first": "", "id": "", "last": "", "role": 0, "username": "" }, "description": "", "id": "DEFAULTGROUP_01_", "items": [], "lastUpdatedAt": 1505938327653, "lastUpdatedByUser": { "first": "Adam", "id": "5197072786522112", "last": "Lohner", "role": 8, "username": "somePerson@email.com" }, "length": 6, "name": "Dashboard" }, "id": "aJ0KOADQZVL5h9zQhiQ0q26PNTM", "kind": "Page", "lastUpdatedAt": 1505938241596, "lastUpdatedByUser": { "first": "Adam", "id": "5197072786522112", "last": "Lohner", "role": 8, "username": "somePerson@email.com" }, "name": "Dashboard", "rootVersionId": "aJ0KOADQZVL5h9zQhiQ0q26PNTM", "rules": [{ "designerHint": "http://some-domain.com/", "parsedRule": "^https?://[^/]*/?(?:;[^#]*)?(?:\\?[^#]*)?(?:#.*)?$", "rule": "//*" }], "rulesjson": "", "stableVersionId": "aJ0KOADQZVL5h9zQhiQ0q26PNTM-20170920201041.596687476", "validThrough": 1506981600000 }] } ``` Tagged pages. Returns one row for each tagged page. > Definition using `expand` ```json [{ "source": { "pages": null } }, { "expand": { "page": { "page": "pageId" } } } ] ``` Although pages can be accessed directly as a source, for convenience we recommend accessing page entity data using the `expand` pipeline map operator. This is handy for including page data when using other sources (like `pageEvents`). ### Visitors > Definition ```json { "source": { "visitors": null } } ``` > Example request ```bash curl -X POST \ https://{WHITE_LABEL_URL}/api/v1/aggregation \ -H 'content-type: application/json' \ -H 'x-api-integration-key: [add_your_integration_key_here]' \ -d '{"response":{"mimeType":"application/json"},"request":{"pipeline":[{"source":{"visitors":null}}]}}' ``` > Example response ```json { "results": [{ "metadata": { "agent": { "email": "ac@Cras.co.uk", "ipaddress": "184.169.45.4", "language": "en_US" }, "auto": { "accountid": "Epsilon united", "accountids": [ "Epsilon united" ], "firstvisit": 1506965912312, "id": "84", "idhash": 2012371633, "lastbrowsername": "Chrome", "lastbrowserversion": "61.0.3163", "lastoperatingsystem": "Mac OS X", "lastservername": "some-domain.com", "lastupdated": 1506976690418, "lastuseragent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36", "lastvisit": 1506976690112, "nid": 84 } }, "visitorId": "84" }] } ``` The `visitors` source returns one row per visitor ever seen and contains various metadata about the visitor. Each visitor has a unique `visitorId`. Accepted parameters are `null` or `{"identified":true}` to eliminate anonymous visitors. | Metadata | | | -------- | ----------- | | auto | data automatically calculated | | custom | custom field data | | salesforce | data synced from Salesforce | | segment | data synced from Segment | > Definition using `expand` ```json [{ "source": { "pages": null } }, { "expand": { "visitor": { "visitor": "visitorId" } } } ] ``` Visitor data can also be accessed using the `expand` pipeline map operator. This is handy for including visitor data when using other sources (e.g. `pageEvents` sources). ## Pipeline > Example sytnax #1 ```json { "cat": null } ``` > Example sytnax #2 ```json { "field": "firstName", "operator": "==", "value": "John" } ``` Each step is of the form `{ "name" : parameterObject }`. The source rows are transformed into result rows by the pipeline; the final rows are the result of the aggregation step. Pipelines can be arbitrarily long, and are represented as an ordered list of individual steps. ## Operators | Operators | | | --------- | ------- | | `cat` | Takes no parameters, and returns each row without changing it. | | `count` | Counts the number of input rows and returns a single row of the form `{ "count" : 456 }` | | `accumulate` | Keeps a running total of more than one field, and includes the current value of the total in each row. Parameters are of the forum `{ "totalField1" : "sourceField1" }` | | `eval` | Inserts new fields based on one or more expressions. Format is an `{ "fieldName1" : "expression" }` | | `identified` | Eliminates rows whose named field looks like an anonymous visitorId `{ "identified" : "visitorId" }` where visitorId is the field name to check. | | `limit` | Returns the first N rows and discards the rest `{ "limit" : 5 }` | | `select` | Builds a new row (while discarding the old one) containing one or more fields. Format is an `{ "fieldName1" : "expression" }` | `set` | Sets fields to constant values `{ "set" : { "field1" : 5, "field2.nested" : 20 } }` | | `sort` | Sorts the rows by the specified field list `{ "sort" : [ "fieldName", "anotherFieldName" ] }` Prepend `a -` to the front of a field name to reverse sort by that field. | | `unmarshal` | Expands an embedded JSON string into sub-objects. | | `useragent` | Expands a user agent string into a parsed set of strings. | ### The `spawn` operation > Definition ```json { "spawn": [ [{ "source": { "events": null, "timeSeries ": { "period": "dayRange", "first": 1404745200000, "count": 7 } } }, { "cat": null } ], [{ "source": { "events": null, "timeSeries ": { "period": "month", "first": 1404745200000, "count": 7 } } }, { "count": null } ] ] } ``` The `spawn` operator behaves like `fork`, but each nested pipeline must specify its own source. This allows multiple sources to be used to create a single stream of rows which are processed by the remainder of the pipe. Only sources which cause a single pipeline to be instantiated may be used; this means `dayRange`, `hourRange`, or any source whose `count` is 1 is available. The `timeSeries` for the source is part of the source specification for the `spawn` pipeline. ### The `switch` operation > Definition ```json { "switch": { "newField": { "existingField": [{ "value": "small", "<": 10 }, { "value": "medium", ">=": 10, "<": 20 }, { "value": "large", ">=": 20, "<": 50 } ] } } } ``` The switch operation allows a field to checked against any number of conditions, and have a new field set in the row based on which condition the original field matches. For each row the `existingField` will be checked to see which set of conditions its value matches. It must match all of the conditions for `newField` to be set to the value indicated. If no conditions match, or if `existingField` is not set at all, the row is skipped. For the above input, the row `{ "existingField" : 15 }` will become `{ "existingField" : 15, "newField" : "medium" }` ### The `filter` operation > Definition ```json { "filter": "fieldName == 5 || len(anotherField) > 3" } ``` The `filter` pipeline operator evaluates an expression and eliminates rows that for which the expression is false. ### The `fork` operation > The following calculates the total time for each `accountId` and `visitorId` ```json { "fork": [ [{ "group": { "group": [ "visitorId" ], "fields": [{ "time": { "sum": "numMinutes" } }] } }], [{ "group": { "group": [ "accountId" ], "fields": [{ "time": { "sum": "numMinutes" } }] } }] ] } ``` The `fork` operator allows the pipeline to be split into multiple pipelines, with each of the child pipelines receiving all of the rows for that step. Note that the same row is sent to all of the child pipelines; it is not a copy of the row and any changes to the row made in one pipeline will be visible in all pipelines. The rows which result from each pipeline are all sent through the remainder of the parent pipeline. Forks may be nested, however this has not been well-tested (aka - no promises). Empty pipelines (which would pass through the rows without modifying them) are not permitted, but a pipeline with a single `cat` operation performs identically. ### The `join` operation > Definition ```json { "join": { "fields": [ "firstName", "lastNme" ], "width": 3 } } ``` The `join` operator merges multiple rows together based on one or more keys. If there are field conflicts, the values from rows encountered later in the join will win. The optional `width` argument allows the user to tell `join` how many rows are expected to make up the final result. Once that many rows with the same join key have been seen, the merged row is passed to the next operation in the pipeline. If more rows are later seen with that join key, they are treated as a new join operation. Any join keys which have had less than `width` rows are processed once all input rows have been seen. ### The `group` operation > Example request to calculate how much time each identified visitor has spent on the site over 30 days ```json { "requestId": "last30daysVisitorTime", "timeSeries": { "period": "dayRange", "count": 30, "first": 1404745200000 }, "source": { "events": null }, "pipeline": [{ "identified": "visitorId" }, { "group": { "group": [ "visitorId" ], "fields": [{ "minutes": { "sum": "numMinutes" } }] } } ] } ``` > **NOTE:** The count value of 180 has to be a long enough period to cover the entire time the guide has been in existence. The `group` operator supports flexible grouping (similar to `GROUP BY` in SQL) and simple aggregations of fields. It supports the following parameters: | Parameters | | | --------- | ------- | | `group` | List of fields whose values match in each grouping | | `fields` | Set of fields whose values are aggregated across the group | | `stringSort` | Result rows will be sorted by the group key. The group key is treated as a string for sort purposes (so "2" > "10"). This provides a stable ordering for the result. | Each resulting row consists of all of the fields specified by both the group and fields specification. Aggregation fields consist of field name to use in the output row, an aggregation name, and a field name from the input row. For example, to count the rows in a group use `{ "outputField" : { "count" : "inputField" } }` will count the number of rows which match (here the field name must be specified, but is irrelevant). The following row aggregators are supported: | Aggregators | | | ----------- | --- | | `count` | Counts the number of rows in the group. If a field name is it counts the unique occurrences of that field. | | `listAverage` | Average the individual values in a list arguement and return a list of those averages. | | `max` | Returns the largest value of the specified (numeric) field. | | `min` | Returns the smallest value of the specified (numeric) field. | | `sum` | Adds up the values of the specified field. | | `first` | Returns the first value of the specified field (note: "first" is only defined as deterministically as the input order is). | > Count the number of unique `visitorIds` use the following pipeline: ```json [{ "group": { "group": [ "visitorId" ] } }, { "visitorCount": { "count": null } } ] ``` > To get the number of events for each `visitorId`use: ```json [{ "group": { "group": [ "visitorId" ], "fields": { "eventCount": { "count": "event" } } } }] ``` ### The `reduce` operation > Count the number of rows and add up the minutes fields in the rows: ```json [{ "reduce": { "eventCount": { "count": null }, "totalMinutes": { "sum": "numMinutes" } } }] ``` The `reduce` operator collects all rows into a single row, and performs aggregations on fields. The aggregators and format are identical to the `fields` parameter for the `group` pipeline operation. > Example request to calculate the total number of events and the total number of minutes for a thirty day period. ```json { "requestId": "last30DaysEventAndMinuteTotals", "timeSeries": { "period": "dayRange", "count": 30, "first": 1404745200000 }, "source": { "events": null }, "pipeline": [{ "reduce": [{ "events": { "sum": "numEvents" } }, { "minutes": { "sum": "numMinutes" } } ] }] } ``` > Count the number of unique visitors in a dataset ```json { "requestId": "numberOfUniqueVisitors", "timeSeries": { "period": "dayRange", "count": 30, "first": 1404745200000 }, "source": { "events": null }, "pipeline": [{ "reduce": [{ "reduce": { "visitors": { "count": "visitorId" } } }] }] } ``` > Example request using `group` and `reduce` to count the number of visitors in a set of events and the total number of minutes used by those visitor: ```json { "requestId": "lastDaysEventAndMinuteTotals", "timeSeries": { "period": "dayRange", "count": 30, "first": 1404745200000 }, "source": { "events": null }, "pipeline": [{ "group": { "group": [ "visitorId" ], "fields": { "visitorMinutes": { "sum": "numMinutes" } } }, "reduce": { "visitorCount": { "count": "numVisitors" }, "totalMinutes": { "sum": "visitorMinutes" } } }] } ``` ### The `segment` operation > Definition ```json { "segment": { "id": "id" } } ``` Segments are used to preselect groups of visitors. Preexisting segments filter rows via the `segment` operator. An input row is passed if the `visitorId` in the input row match an item in the segment. ### The `expand` operation > Expand visitors into a subdocument called `info` ```json { "expand": { "info": { "visitor": "visitorId" } } } ``` **NOTE:** The implementation of expand is not well optimized and should not be used for large numbers of expansions. See `bulkExpand` below. The `expand` operator does as its name implies - it expands fields into subdocuments based on an id. The step is of the following form: `{ expand : { outputFieldName : { dataType : idFieldName } } }`. The following data types support expand: `page`, `visitor`. ### The `bulkExpand` operation The `bulkExpand` operator behaves identically to `expand`, but is optimized for large numbers of visitors and accounts. The following data types support expand: `account` and `visitor`. ### The `unwind` operation > To `unwind` the following: ```json { "name": "fred", "items": [{ "v": "first" }, { "v": "second" }, { "v": "third" } ] } ``` > Try: ```json { "unwind": { "field": "items", "index": "itemIndex" } } ``` > Response would yield: ```json { "name": "fred", "items": { "v": "first" }, "itemIndex": 0 } { "name": "fred", "items": { "v": "second" }, "itemIndex": 1 } { "name": "fred", "items": { "v": "third" }, "itemIndex": 2 } ``` The `unwind` operator expands a list of items into separate rows. All other items are preserved, and an index may optionally be added. > Add `"prefix" : true` and you will get: ```json [{ "name": "fred", "items": [{ "v": "first" }], "itemIndex": 0 }, { "name": "fred", "items": [{ "v": "first" }, { "v": "second" }], "itemIndex": 1 }, { "name": "fred", "items": [{ "v": "first" }, { "v": "second" }, { "v": "third" }], "itemIndex": 2 }] ``` Alternatively, if everything to the left of each items is important, `unwind` can preserve those items. If the list to unwind is empty then no rows will be emitted unless `{ "keepEmpty" : true }` is specified, in which case the original row will be preserved. ### The `useragent` operation > Definition ```json { "useragent": { "output": "input" } } ``` The `useragent` operation parses the user agent string in the field `input` and renders it into a object in `output`. | Output | | | ------ | -------- | | `name` | the browser's name (e.g. "MSIE", "Chrome", etc.) | | `version` | the version of the browser | | `os` | the operating system (e.g. "Windows", "Mac OS X", "GNU/Linux", etc.) | | `type` | the type of user agent (e.g. "Browser", "Crawler", etc.) | ## Multi-App > Using the `pages` source as an example, without querying for a specific app, the `source` looks like this: ```json { "source": { "pages": null } } ``` To execute Aggregations specific to an app, you'll need to change the `source` value within the aggregation from `null` - which is an indicator to return all records for all apps associated with the sub - to include an `appId` value. Including the `appId` scopes the results to the data associated with just that application within the subscription. For all multi-app Aggregations, use the primary app's Integration API key for the value for `x-api-integration-key` in the Header. ### appId > To query on a specific app, `source` will look like this: ```json { "source": { "pages": { "appId": 5629499534213120 } } } ```