Metrics in CosmosDB (DocumentDB-Focus)

Update 19.11.2018: Checkout my experimental implementation of a Metrics-Client Library for CosmosDB on github.

With Azure Monitor you have a single way to access metrics for a lot of supported azure resources. My personal guess on how the implemented Azure Monitor is that each Azure Resource provider (like Microsoft.DocumentDB) built an addon-provider „microsoft.insights“ that is used by Azure Monitor.

Azure Monitor only supports these metrics for CosmosDB. The Azure Monitor REST API documentation to query metrics describes the input and output values.

Of these I found these most relevant for the DocumentDB-API:
– Consumed request units: TotalRequestUnits (Unit:Count, Aggregation: count/total)
– Number of requests made: TotalRequests (Unit:Count, Aggregation: count/total)

which can be filtered by
– DatabaseName
– CollectionName
– Region
– StatusCode

Update: During development of the sample I noticed that the documentation is quite outdated. The new Version „2018-01-01“ supports additional metrics that are not documented on the page above. Here the DocumentDB relevant ones:
– AvailableStorage
– DataUsage
– DocumentCount
– DocumentQuota
– IndexUsage
– ReplicationLatency
– ServiceAvailability

(Un)Fortunately CosmosDB provides other interesting metrics as well that cannot be retrieved by Azure Montior. You might have noticed additional metric data in the metrics blade of CosmosDB like:

  • Available Storage
  • Data Size
  • Index Size
  • Max RUs per Second (filterd on a collections partition in a specific region)
  • Observed Read Latency
  • Observed Write Latency

These and more metrics can be retrieved directly from the CosmosDB Azure Resource Provider. This was/is the „old way“ to retrieve metrics before the arrival of Azure Monitor, if i got that right. The reference chapter describes all the various metrics you can consume from CosmosDB Resource Provider.

While Azure Monitor provides a nice nuget package (enable preview!) that you can use to access Azure Monitor metrics, you need to work with the REST-API to access the other metrics.

In this article I will focus on the DocumentDB metrics retrieved by REST (also the Azure Monitor ones). You can find a Azure Monitor Sample with .NET Core here.

Structure of Uri(s) for the Metric REST interfaces

The {resourceUri} is actually a path to the requested Azure resource. Azure Monitor basically always uses the path to the CosmosDB account. If you work directly with the Azure Resource Provider of CosmosDB you need the other paths to:

Resource Uri – Definition {resourceUri}

Resource Uri -> Database Account

This path is basically used whenever we work with Azure Monitor REST-API
– subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.DocumentDb/databaseAccounts/{cosmosDBAccountName}

Resource Uri -> DocumentDB Database
– subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.DocumentDb/databaseAccounts/{cosmosDBAccountName}/databases/{databaseResourceId}

Resource Uri -> DocumentDB Collection
(Mostly used in Azure resource Metric queries)
– subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.DocumentDb/databaseAccounts/{cosmosDBAccountName}/databases/{databaseResourceId}/collections/{collectionResourceId}

Resource Uri -> DocumentDB Collection Partition in a specific region
– subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.DocumentDb/databaseAccounts/{cosmosDBAccountName}/region/{regionName}/databases/{databaseResourceId}/collections/{collectionResourceId}/partitions

Region Names {regionName}

You can find out which regions your CosmosDB is available by querying the ARM REST API of the CosmosDB Account Azure Resource. Use the „Resources Get REST API“. For CosmosDB you find the documentation on how to retrieve the details of the CosmosDB Account Resource here.

The documentations misses out additional values in „properties“. While „enableAutomaticFailover“ and „enableMultipleWriteLocations“ (multimaster) is quite easy to guess I have no idea what „capabilities“ and „configurationOverrides“ will contain (maybe other API’s?):
– capabilites: []
– configurationOverrides: {}
– enableAutomaticFailover: false
– enableMultipleWriteLocations: false

Here a non exhaustive list of potential regions:

  • North Europe
  • West Europe

Request Examples

CosmosDB Resource Provider GET metrics sample

splitted in multiple rows for better reading
This request will fetch the „Available Storage“, „Data Size“, „Index Size“ in the time frame:
2018-10-10T06:55:00.000Z to 2018-10-10T07:55:00.000Z with a 5 minute interval (PT5M). Since the resource uri path points to a specific collection, only the data of this collection is retrieved!

    https://management.azure.com/
    subscriptions/12a34456-bc78-9d0e-fa1b-c2d3e4567890/resourceGroups/demoRG/
    providers/Microsoft.DocumentDb/databaseAccounts/demodocs/
    databases/6XAQAA==/collections/6XAQAITyPQA=/metrics?
    api-version=2014-04-01
    &$filter=
        (
        name.value eq 'Available Storage' or 
        name.value eq 'Data Size' or 
        name.value eq 'Index Size'
        ) and 
        startTime eq 2018-10-10T06%3A55%3A00.000Z and 
        endTime eq 2018-10-10T07%3A55%3A00.000Z and 
        timeGrain eq duration'PT5M'

Azure Monitor GET metrics sample

splitted in multiple rows for better reading

This request will fetch the „TotalRequests“-metric within the timespan from: 10.Oct. 2018 07:57 to 10.Oct. 2018 08:57 (one hour). The result will be delivered in 1 Minute invervals (PT1M). In this case we want all Databases, Collections, Regions and StatusCodes.

    https://management.azure.com/
    subscriptions/12a34456-bc78-9d0e-fa1b-c2d3e4567890/resourceGroups/demoRG/
    providers/Microsoft.DocumentDb/databaseAccounts/demodocs/
    providers/microsoft.insights/metrics?
    timespan=2018-10-10T07:57:00.000Z/2018-10-10T08:57:00.000Z
    &interval=PT1M
    &metric=TotalRequests
    &aggregation=total
    &$filter=DatabaseName eq '*' and CollectionName eq '*' and Region eq '*' and StatusCode eq '*'
    &api-version=2017-05-01-preview

The azure portal on the CosmosDB metrics blade currently uses this API-Version: 2017-05-01-preview. There is a more recent one „2018-01-01“. To get the supported API-Versions send in a wrong one :-).

Note that the new version requires „metricNames“ instead of „metric“!

    https://management.azure.com/
    subscriptions/12a34456-bc78-9d0e-fa1b-c2d3e4567890/resourceGroups/demoRG/
    providers/Microsoft.DocumentDb/databaseAccounts/demodocs/
    providers/microsoft.insights/metrics?
    timespan=2018-10-10T07:57:00.000Z/2018-10-10T08:57:00.000Z
    &interval=PT1M
    &metricNames=TotalRequests
    &aggregation=total
    &$filter=DatabaseName eq '*' and CollectionName eq '*' and Region eq '*' and StatusCode eq '*'
    &api-version=2018-01-01

Other intervals:
– PT1M (1 minute)
– PT5M (5 minutes)
– PT1H (1 hour)
– PT1D (1 day)
– P7D (7 days)

Azure ComosDB requests the Azure Portal uses for the metrics blade

Overview TAB

Number of requests (aggregated over 1 minute interval)

  • TYPE: Azure Monitor (microsoft.insights provider)
  • ResourceUri -> Database Account
  • API-Version -> 2017-05-01-preview
  • timespan -> 2018-10-10T07%3A57%3A00.000Z/2018-10-10T07%3A58%3A00.000Z
  • metric -> TotalRequests
  • aggregation -> total
  • interval -> PT1M
  • $filter ->
    DatabaseName eq 'databaseName' and 
    CollectionName eq '*' and 
    Region eq '*' and 
    ddStatusCode eq '*'

Number of requests (counted over 1 minute interval)

  • TYPE: Azure Monitor (microsoft.insights provider)
  • ResourceUri -> Database Account
  • API-Version -> 2017-05-01-preview
  • timespan-> 2018-10-10T06%3A58%3A00.000Z/2018-10-10T07%3A58%3A00.000Z
  • metric-> TotalRequests
  • aggregation-> count
  • interval-> PT1M
  • $filter ->
    DatabaseName eq 'databaseName' and 
    CollectionName eq 'colName' and 
    StatusCode eq '*'

Data and Index storage consumed

  • TYPE: Azure Resource Provider Metric
  • ResourceUri -> DocumentDB Collection
  • API-Version -> 2014-04-01
  • $filter->
    (
    name.value eq 'Available Storage' or 
    name.value eq 'Data Size' or 
    name.value eq 'Index Size'
    ) and 
    endTime eq 2018-10-10T07%3A55%3A00.000Z and 
    startTime eq 2018-10-10T06%3A55%3A00.000Z and 
    timeGrain eq duration'PT5M'

Documentation for fetching metrics from the Collection:

Max consumed RU/s per partition key range

  • TYPE: Azure Resource Provider Metric
  • ResourceUri -> DocumentDB Collection
  • API-Version -> 2014-04-01
  • $filter->
    (
        name.value eq 'Max RUs Per Second'
    ) and 
    endTime eq 2018-10-10T07%3A58%3A00.000Z and 
    startTime eq 2018-10-10T06%3A58%3A00.000Z and 
    timeGrain eq duration'PT1M'

Depending on the given resourceUri path the result will vary. The portal uses these three combinations of ResourceUri(s):

  • DocumentDB Database
  • DocumentDB Collection
  • DocumentDB Collection Partition in a specific region

You can find the respective documentation here:

For the „DocumentDB Collection Partition in a specific region“ I missed out the documented „partitionId“-value in my results. I got only „partitionKeyRangeId“. I also got a „region“-value for each entry in my value array. The portal uses the MAX value of all retrieved metric values to display the MAX-RUs for a partition.

Throughput TAB

Number of requests (aggregated over 1 minute interval)

see next section, uses the results from the same query

Number of requests exceeded capacity (aggregated over 1 minute interval) Status:429

This request had been used in the „Overview-Tab“ as well. The result is basically grouped by Database, Collection and Statuscode. So we can filter the 429 requests to get result we need.

  • TYPE: Azure Monitor (microsoft.insights provider)
  • ResourceUri -> Database Account
  • API-Version -> 2017-05-01-preview
  • timespan-> 2018-10-10T08%3A30%3A00.000Z/2018-10-10T09%3A30%3A00.000Z
  • metric-> TotalRequests
  • aggregation-> count
  • interval-> PT1M
  • $filter ->
    DatabaseName eq '*' and 
    CollectionName eq '*' and 
    StatusCode eq '*'

The generic result structure is documented here.

Within the the value the Metric ID will be „/subscriptions/12a34456-bc78-9d0e-fa1b-c2d3e4567890/resourceGroups/myRG/providers/Microsoft.DocumentDb/databaseAccounts/cosmosDBAccount/providers/Microsoft.Insights/metrics/TotalRequests“.

The timeseries array contains entries that basically represent the groups (by DBName, CollectionName and Status Code). Each group contains (in this case 60 items, one per minute PT1M) all the values for that group. The metadatavalues.name will be one of the following:

  • collectionname
  • databasename
  • statuscode

End to end observed read latency at the 99th percentile – metrics – Azure Resource

Latency for a 1KB document lookup operation observed in North Europe in the 99th percentile

  • TYPE: Azure Resource Provider Metric
  • ResourceUri -> Database Account
  • API-Version -> 2014-04-01
  • $filter->
    (
        name.value eq 'Observed Read Latency' or 
        name.value eq 'Observed Write Latency'
    ) and 
    endTime eq 2018-10-10T15%3A00%3A00.000Z and 
    startTime eq 2018-10-10T14%3A00%3A00.000Z and 
    timeGrain eq duration'PT5M'

I was really missing out a single page that describes all the metric possibilities with CosmosDB. I hope that this fills the gap.

Enjoy and have a great day!

AndiP