Metrics in CosmosDB (DocumentDB-Focus)

Update 19.11.2018: Check out my experimental implementation of a Metrics-Client Library for CosmosDB on GitHub.

With Azure Monitor you have a single way to access metrics for a lot of supported Azure resources. My personal guess on how Azure Monitor is implemented is that each Azure resource provider (like Microsoft.DocumentDB) built an add-on provider „microsoft.insights“ that is used by Azure Monitor.

Azure Monitor supports only these metrics for CosmosDB. The Azure Monitor REST API documentation for querying metrics describes the input and output values.

Of these I found the following most relevant for the DocumentDB API:
– Consumed request units: TotalRequestUnits (Unit:Count, Aggregation: count/total)
– Number of requests made: TotalRequests (Unit:Count, Aggregation: count/total)

which can be filtered by
– DatabaseName
– CollectionName
– Region
– StatusCode

Update: During development of the sample I noticed that the documentation is quite outdated. The new version „2018-01-01“ supports additional metrics that are not documented on the page above. Here are the DocumentDB-relevant ones:
– AvailableStorage
– DataUsage
– DocumentCount
– DocumentQuota
– IndexUsage
– ReplicationLatency
– ServiceAvailability

(Un)Fortunately CosmosDB provides other interesting metrics as well that cannot be retrieved via Azure Monitor. You might have noticed additional metric data in the metrics blade of CosmosDB, like:

  • Available Storage
  • Data Size
  • Index Size
  • Max RUs per Second (filtered on a collection's partition in a specific region)
  • Observed Read Latency
  • Observed Write Latency

These and more metrics can be retrieved directly from the CosmosDB Azure Resource Provider. This was/is the „old way“ to retrieve metrics before the arrival of Azure Monitor, if I got that right. The reference chapter describes all the various metrics you can consume from the CosmosDB Resource Provider.

While Azure Monitor provides a nice NuGet package (enable preview!) that you can use to access Azure Monitor metrics, you need to work with the REST API to access the other metrics.

In this article I will focus on the DocumentDB metrics retrieved via REST (including the Azure Monitor ones). You can find an Azure Monitor sample with .NET Core here.
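If you prefer the NuGet route, here is a minimal sketch (assuming the preview package Microsoft.Azure.Management.Monitor and a service principal with read access to the resource; the placeholders and parameter details are illustrative, not verified against every package version):

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Management.Monitor;
using Microsoft.Rest.Azure.Authentication;

public static class MonitorMetricsSample
{
    public static async Task QueryTotalRequestsAsync()
    {
        // Authenticate with a service principal (placeholder values)
        var credentials = await ApplicationTokenProvider.LoginSilentAsync(
            "<tenantId>", "<clientId>", "<clientSecret>");

        var client = new MonitorManagementClient(credentials);

        // {resourceUri}: path to the CosmosDB account (see the next section)
        string resourceUri =
            "subscriptions/<subscriptionId>/resourceGroups/<rgName>/" +
            "providers/Microsoft.DocumentDb/databaseAccounts/<accountName>";

        // Same query as the REST sample below: TotalRequests at 1-minute grain
        var response = await client.Metrics.ListAsync(
            resourceUri,
            timespan: "2018-10-10T07:57:00Z/2018-10-10T08:57:00Z",
            interval: TimeSpan.FromMinutes(1),
            metricnames: "TotalRequests",
            aggregation: "Total");

        foreach (var metric in response.Value)
            Console.WriteLine(metric.Name.Value);
    }
}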

Structure of Uri(s) for the Metric REST interfaces

The {resourceUri} is actually a path to the requested Azure resource. Azure Monitor basically always uses the path to the CosmosDB account. If you work directly with the Azure Resource Provider of CosmosDB, you need one of the other paths:

Resource Uri – Definition {resourceUri}

Resource Uri -> Database Account

This path is basically used whenever we work with the Azure Monitor REST API:
– subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.DocumentDb/databaseAccounts/{cosmosDBAccountName}

Resource Uri -> DocumentDB Database
– subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.DocumentDb/databaseAccounts/{cosmosDBAccountName}/databases/{databaseResourceId}

Resource Uri -> DocumentDB Collection
(Mostly used in Azure resource Metric queries)
– subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.DocumentDb/databaseAccounts/{cosmosDBAccountName}/databases/{databaseResourceId}/collections/{collectionResourceId}

Resource Uri -> DocumentDB Collection Partition in a specific region
– subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.DocumentDb/databaseAccounts/{cosmosDBAccountName}/region/{regionName}/databases/{databaseResourceId}/collections/{collectionResourceId}/partitions

Region Names {regionName}

You can find out in which regions your CosmosDB is available by querying the ARM REST API for the CosmosDB account resource. Use the „Resources Get“ REST API. For CosmosDB you find the documentation on how to retrieve the details of the CosmosDB account resource here.

The documentation misses additional values in „properties“. While „enableAutomaticFailover“ and „enableMultipleWriteLocations“ (multi-master) are quite easy to guess, I have no idea what „capabilities“ and „configurationOverrides“ will contain (maybe other APIs?):
– capabilities: []
– configurationOverrides: {}
– enableAutomaticFailover: false
– enableMultipleWriteLocations: false

Here is a non-exhaustive list of potential regions:

  • North Europe
  • West Europe

Request Examples

CosmosDB Resource Provider GET metrics sample

split across multiple lines for better readability
This request will fetch „Available Storage“, „Data Size“ and „Index Size“ in the time frame 2018-10-10T06:55:00.000Z to 2018-10-10T07:55:00.000Z with a 5-minute interval (PT5M). Since the resource URI path points to a specific collection, only the data of this collection is retrieved!

    https://management.azure.com/
    subscriptions/12a34456-bc78-9d0e-fa1b-c2d3e4567890/resourceGroups/demoRG/
    providers/Microsoft.DocumentDb/databaseAccounts/demodocs/
    databases/6XAQAA==/collections/6XAQAITyPQA=/metrics?
    api-version=2014-04-01
    &$filter=
        (
        name.value eq 'Available Storage' or 
        name.value eq 'Data Size' or 
        name.value eq 'Index Size'
        ) and 
        startTime eq 2018-10-10T06%3A55%3A00.000Z and 
        endTime eq 2018-10-10T07%3A55%3A00.000Z and 
        timeGrain eq duration'PT5M'

Azure Monitor GET metrics sample

split across multiple lines for better readability

This request will fetch the „TotalRequests“ metric within the timespan from 10 Oct 2018 07:57 to 10 Oct 2018 08:57 (one hour). The result will be delivered in 1-minute intervals (PT1M). In this case we want all databases, collections, regions and status codes.

    https://management.azure.com/
    subscriptions/12a34456-bc78-9d0e-fa1b-c2d3e4567890/resourceGroups/demoRG/
    providers/Microsoft.DocumentDb/databaseAccounts/demodocs/
    providers/microsoft.insights/metrics?
    timespan=2018-10-10T07:57:00.000Z/2018-10-10T08:57:00.000Z
    &interval=PT1M
    &metric=TotalRequests
    &aggregation=total
    &$filter=DatabaseName eq '*' and CollectionName eq '*' and Region eq '*' and StatusCode eq '*'
    &api-version=2017-05-01-preview

The Azure Portal currently uses API version 2017-05-01-preview on the CosmosDB metrics blade. There is a more recent one: „2018-01-01“. To get the supported API versions, send in a wrong one :-).

Note that the new version requires „metricNames“ instead of „metric“!

    https://management.azure.com/
    subscriptions/12a34456-bc78-9d0e-fa1b-c2d3e4567890/resourceGroups/demoRG/
    providers/Microsoft.DocumentDb/databaseAccounts/demodocs/
    providers/microsoft.insights/metrics?
    timespan=2018-10-10T07:57:00.000Z/2018-10-10T08:57:00.000Z
    &interval=PT1M
    &metricNames=TotalRequests
    &aggregation=total
    &$filter=DatabaseName eq '*' and CollectionName eq '*' and Region eq '*' and StatusCode eq '*'
    &api-version=2018-01-01

Other intervals:
– PT1M (1 minute)
– PT5M (5 minutes)
– PT1H (1 hour)
– PT1D (1 day)
– P7D (7 days)

Azure CosmosDB requests the Azure Portal uses for the metrics blade

Overview TAB

Number of requests (aggregated over 1 minute interval)

  • TYPE: Azure Monitor (microsoft.insights provider)
  • ResourceUri -> Database Account
  • API-Version -> 2017-05-01-preview
  • timespan -> 2018-10-10T07%3A57%3A00.000Z/2018-10-10T07%3A58%3A00.000Z
  • metric -> TotalRequests
  • aggregation -> total
  • interval -> PT1M
  • $filter ->
    DatabaseName eq 'databaseName' and 
    CollectionName eq '*' and 
    Region eq '*' and 
    StatusCode eq '*'

Number of requests (counted over 1 minute interval)

  • TYPE: Azure Monitor (microsoft.insights provider)
  • ResourceUri -> Database Account
  • API-Version -> 2017-05-01-preview
  • timespan-> 2018-10-10T06%3A58%3A00.000Z/2018-10-10T07%3A58%3A00.000Z
  • metric-> TotalRequests
  • aggregation-> count
  • interval-> PT1M
  • $filter ->
    DatabaseName eq 'databaseName' and 
    CollectionName eq 'colName' and 
    StatusCode eq '*'

Data and Index storage consumed

  • TYPE: Azure Resource Provider Metric
  • ResourceUri -> DocumentDB Collection
  • API-Version -> 2014-04-01
  • $filter->
    (
    name.value eq 'Available Storage' or 
    name.value eq 'Data Size' or 
    name.value eq 'Index Size'
    ) and 
    endTime eq 2018-10-10T07%3A55%3A00.000Z and 
    startTime eq 2018-10-10T06%3A55%3A00.000Z and 
    timeGrain eq duration'PT5M'

The documentation for fetching metrics from the collection can be found here.

Max consumed RU/s per partition key range

  • TYPE: Azure Resource Provider Metric
  • ResourceUri -> DocumentDB Collection
  • API-Version -> 2014-04-01
  • $filter->
    (
        name.value eq 'Max RUs Per Second'
    ) and 
    endTime eq 2018-10-10T07%3A58%3A00.000Z and 
    startTime eq 2018-10-10T06%3A58%3A00.000Z and 
    timeGrain eq duration'PT1M'

Depending on the given resourceUri path the result will vary. The portal uses these three combinations of ResourceUri(s):

  • DocumentDB Database
  • DocumentDB Collection
  • DocumentDB Collection Partition in a specific region

You can find the respective documentation here.

For the „DocumentDB Collection Partition in a specific region“ case I missed the documented „partitionId“ value in my results; I only got „partitionKeyRangeId“. I also got a „region“ value for each entry in my value array. The portal uses the MAX of all retrieved metric values to display the max RUs for a partition.
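Assembled as a full request against the partition path, such a query could look like this (a sketch: I am assuming the same /metrics suffix and filter syntax as for the collection path above):

    https://management.azure.com/
    subscriptions/12a34456-bc78-9d0e-fa1b-c2d3e4567890/resourceGroups/demoRG/
    providers/Microsoft.DocumentDb/databaseAccounts/demodocs/
    region/North%20Europe/
    databases/6XAQAA==/collections/6XAQAITyPQA=/partitions/metrics?
    api-version=2014-04-01
    &$filter=
        (
        name.value eq 'Max RUs Per Second'
        ) and 
        startTime eq 2018-10-10T06%3A58%3A00.000Z and 
        endTime eq 2018-10-10T07%3A58%3A00.000Z and 
        timeGrain eq duration'PT1M'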

Throughput TAB

Number of requests (aggregated over 1 minute interval)

See the next section; it uses the results from the same query.

Number of requests exceeded capacity (aggregated over 1 minute interval) Status:429

This request is used in the „Overview“ tab as well. The result is basically grouped by database, collection and status code, so we can filter for the 429 requests to get the result we need.

  • TYPE: Azure Monitor (microsoft.insights provider)
  • ResourceUri -> Database Account
  • API-Version -> 2017-05-01-preview
  • timespan-> 2018-10-10T08%3A30%3A00.000Z/2018-10-10T09%3A30%3A00.000Z
  • metric-> TotalRequests
  • aggregation-> count
  • interval-> PT1M
  • $filter ->
    DatabaseName eq '*' and 
    CollectionName eq '*' and 
    StatusCode eq '*'

The generic result structure is documented here.

Within the value, the metric ID will be „/subscriptions/12a34456-bc78-9d0e-fa1b-c2d3e4567890/resourceGroups/myRG/providers/Microsoft.DocumentDb/databaseAccounts/cosmosDBAccount/providers/Microsoft.Insights/metrics/TotalRequests“.

The timeseries array contains entries that basically represent the groups (by database name, collection name and status code). Each group contains all the values for that group (in this case 60 items, one per minute at PT1M). The metadatavalues.name will be one of the following (a trimmed sample follows the list):

  • collectionname
  • databasename
  • statuscode
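To give you an idea, a single timeseries group looks roughly like this (trimmed, with placeholder values; the exact shape is defined in the documentation linked above):

{
  "timeseries": [
    {
      "metadatavalues": [
        { "name": { "value": "databasename" }, "value": "demodb" },
        { "name": { "value": "collectionname" }, "value": "democol" },
        { "name": { "value": "statuscode" }, "value": "200" }
      ],
      "data": [
        { "timeStamp": "2018-10-10T08:30:00Z", "count": 42.0 },
        { "timeStamp": "2018-10-10T08:31:00Z", "count": 17.0 }
      ]
    }
  ]
}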

End to end observed read latency at the 99th percentile – metrics – Azure Resource

Latency for a 1KB document lookup operation observed in North Europe in the 99th percentile

  • TYPE: Azure Resource Provider Metric
  • ResourceUri -> Database Account
  • API-Version -> 2014-04-01
  • $filter->
    (
        name.value eq 'Observed Read Latency' or 
        name.value eq 'Observed Write Latency'
    ) and 
    endTime eq 2018-10-10T15%3A00%3A00.000Z and 
    startTime eq 2018-10-10T14%3A00%3A00.000Z and 
    timeGrain eq duration'PT5M'

I was really missing a single page that describes all the metric possibilities of CosmosDB. I hope this fills the gap.

Enjoy and have a great day!

AndiP

Testing with Azure CosmosDB Emulator in Azure DevOps CI/CD Pipeline with ASP.NET Core

During local development I often use the Azure CosmosDB emulator instead of a running instance in the cloud. Naturally, my unit tests also use the emulator.

Since our gated check-in requires all unit tests to pass, we need a way to run all tests in Azure DevOps.

Luckily there is a handy pipeline task from the Visual Studio Marketplace at hand!

In case you have never heard of Azure DevOps (formerly known as VSTS) or Azure Pipelines, you can get a good overview here. You can sign up for Azure DevOps for free here.

Setting up an ASP.NET Core sample

For our purpose I have created a new ASP.NET Core 2.1 MVC web application in Visual Studio, including an additional .NET Core MSTest project ‚EmulatorDemoTests‘ which will contain my unit tests.

In the sample code I use the slightly adapted class DocumentDBRepository from the ToDo-App CosmosDB sample. We add the following NuGet packages:
– „Microsoft.Azure.DocumentDB.Core“
– „System.Configuration.ConfigurationManager“

For the test project I created a file „.runsettings“, which is required to configure access to our local CosmosDB emulator instance.

<RunSettings>
  <TestRunParameters>
    <!-- Path to the local CosmosDB emulator instance -->
    <Parameter name="endpoint" value="https://localhost:8081" />
    <!-- Well-known secret to access the emulator instance -->
    <Parameter name="authKey" value="C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==" />
    <!-- Database and collection name -->
    <Parameter name="database" value="demodb" />
    <Parameter name="collection" value="democol" />
  </TestRunParameters>
</RunSettings>

In order to have Visual Studio pick up that file, select it via the menu:
– Test – Test Settings – Select Test Settings File (select your .runsettings file)

In the „CosmosDBTest“ class I ensure that our DocumentDBRepository is initialized properly with the settings from the .runsettings file:

[TestInitialize()]
public void CosmosDBInitialize()
{
    this._EndpointUrl = TestContext.Properties["endpoint"].ToString();
    this._AuthKey = TestContext.Properties["authKey"].ToString();
    this._DatabaseName = TestContext.Properties["database"].ToString();
    this._CollectionName = TestContext.Properties["collection"].ToString();
    DocumentDBRepository<Person>.Initialize(
            this._EndpointUrl, 
            this._AuthKey, 
            this._DatabaseName, 
            this._CollectionName);
}
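One detail that is easy to miss: TestContext.Properties is only populated if MSTest can inject the test context, so the test class needs a public TestContext property:

// MSTest injects the runner context here; required for TestContext.Properties above.
public TestContext TestContext { get; set; }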

I have written a simple test case which will suffice for our little sample.

[TestMethod]
public async Task TestInsertDocuments()
{
    var document = await DocumentDBRepository<Person>.CreateItemAsync(new Person
    {
        Age = 38,
        FirstName = "Andreas",
        LastName = "Pollak"
    });
    Assert.IsNotNull(document);
    Assert.IsFalse(string.IsNullOrEmpty(document.Id));

    var person = (await DocumentDBRepository<Person>.GetItemsAsync(
        p => p.LastName == "Pollak")).FirstOrDefault();

    Assert.IsNotNull(person);
    Assert.IsTrue(person.FirstName == "Andreas");
    Assert.IsTrue(person.LastName == "Pollak");
    Assert.IsTrue(person.Age == 38);
    await DocumentDBRepository<Person>.DeleteItemAsync(document.Id);
}

You can find the complete code on GitHub.

Setting up CI/CD pipeline in Azure DevOps

First of all you need an Azure DevOps account. You can sign up for free for Azure DevOps here. After you have created a new DevOps project (in this case with a Git repository), you can add your source to the Git repository like I did below:
Source Code

To enable the Azure CosmosDB emulator in your CI/CD pipelines you need to install the Azure DevOps pipeline task.
Navigate to the Azure CosmosDB Emulator task in your browser and click the „Get it free“ button. After authenticating with either your organizational or Microsoft account you can choose the DevOps account you want to install this task to. In case your DevOps account is located at https://youraccount.visualstudio.com, your account will be listed as „youraccount“.

Install to location

Click „Install“ and, after the successful installation, „Proceed to organization“. Select your DevOps project.

Click the Pipelines menu and create a new pipeline by clicking on the „New pipeline“ button in the center of the screen.

Select Pipelines

First select your pipeline source. You have a variety of options to host your code, including Azure Repos, TFVC, GitHub, GitHub Enterprise, Subversion, Bitbucket or any external Git repository. In this case just use „Azure Repos Git“, as the source is stored in the DevOps project.

Select Source

Next choose from the many available templates (which also allow us to build Python, Java, … code). Select „ASP.NET Core“ as our template:

Select AspNet Core Template

The initial pipeline looks like this. Since no emulator is running in the DevOps environment, the tests will fail.
Initial pipeline

And sure enough they fail:
Failed test run

To enable the Azure CosmosDB emulator, add an additional task to our DevOps pipeline. This task runs a Windows docker container with an emulator instance. Since the agent host is Linux by default, you need to switch the build agent from Linux to Windows:
Change Pipeline to Windows Agent

Now you can select the + sign in the agent job section and add a new task „Azure CosmosDB emulator“ from the „Test“ category. Use drag and drop to move it between the tasks „Build“ and „Test“.

AddPipelineTask

It is important to know that the CosmosDB emulator task exports a pipeline variable „$(CosmosDbEmulator.Endpoint)“ which contains the endpoint where the CosmosDB instance will be available.

You can configure your emulator instance as you like, for example the consistency model or the number of partitions to allocate.

Now you need to configure the „Test .NET Core“ task so that the unit tests use the endpoint of the emulator you just created. While you can configure a runsettings file, there is currently no way to override parameters (see this open GitHub issue).

Therefore you need to work around this limitation. First of all, configure the test task to use a runsettings file that does not yet exist. Right now there is only a „.runsettings“ file in that folder.

--configuration $(BuildConfiguration) --settings "$(Build.SourcesDirectory)\EmulatorDemo\EmulatorDemoTests\test.runsettings"

Configure NetCore Test Task

Next use a small PowerShell script task to create this file dynamically. Click on the + icon in the „Agent job“ section and find the task „PowerShell“ under the category „Utility“. Place that task between „Run Azure CosmosDB Emulator“ and „Test .NET Core“.

Now you need to pass two pipeline parameters as environment variables into the script. Open the „Environment variables“ section and configure those values. Attention: environment variable names must not contain a „.“.

CosmosDbEmulatorEndpoint  = $(CosmosDbEmulator.Endpoint)
BuildSourcesDirectory     = $(Build.SourcesDirectory)

Set EnvVariables For PowerShell Script

Add your little script: make sure to select Type = Inline and copy the following script into the script text field.

# Create test.runsettings from .runsettings, replacing the endpoint
# with the one exported by the CosmosDB emulator task
Write-Host CosmosDB Endpoint: $env:CosmosDbEmulatorEndpoint
Write-Host Source Path: $env:BuildSourcesDirectory

$sourceConfig = $env:BuildSourcesDirectory+"\EmulatorDemo\EmulatorDemoTests\.runsettings"

# Locate the "endpoint" parameter node and overwrite its value
$parameter = Select-Xml -Path $sourceConfig -XPath '//Parameter[@name="endpoint"]'
$parameter.Node.value = $env:CosmosDbEmulatorEndpoint
# Save the patched document as test.runsettings next to the original
$newFileName = $parameter.Path.Replace(".runsettings","test.runsettings")
$parameter.Node.OwnerDocument.Save($newFileName)

Add Script To Power Shell Task

And you are ready to roll: the test run succeeds:

Test run succeeds

Have fun with Azure DevOps AndiP

Adding Changefeed support to the CosmosDB Data Migration Tool

In a production environment I would usually use Azure Functions (which use the Change Feed Processor library internally) to continuously push changes in my CosmosDB to other destination(s).

However, for some small testing scenarios, demos and also some coding fun, I decided to add ChangeFeed support to the Azure CosmosDB data migration tool (link to original version).

So with this special version of the Azure CosmosDB Data Migration Tool you have additional options for the DocumentDB source available under „Advanced Options“:

Once you check „Use change feed of the collection“ you get the following options:

You can select where you want to start reading from the change feed: either at the creation time of the collection or at a specific date. I admit I could have added a DateTime picker :-P.

At „File to read/store continuation tokens“ you can provide a file name to store the continuation tokens. If you re-run the wizard and provide the file again, only new and updated documents will be processed.

Last but not least you need to choose whether you want to update the provided continuation token file with the new continuation tokens, which in most situations is what you want.
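Under the hood this is plain change feed reading via the DocumentDB SDK. Here is a minimal sketch of the mechanism (assuming an initialized DocumentClient client and the demo database/collection names; the real tool additionally persists one continuation token per partition key range):

// Enumerate the partition key ranges of the collection
Uri collectionUri = UriFactory.CreateDocumentCollectionUri("demodb", "democol");
FeedResponse<PartitionKeyRange> ranges =
    await client.ReadPartitionKeyRangeFeedAsync(collectionUri);

foreach (PartitionKeyRange range in ranges)
{
    // Read the change feed of this range, starting at a given point in time
    var query = client.CreateDocumentChangeFeedQuery(
        collectionUri.ToString(),
        new ChangeFeedOptions
        {
            PartitionKeyRangeId = range.Id,
            StartTime = new DateTime(2018, 1, 1, 0, 0, 0, DateTimeKind.Utc),
            MaxItemCount = 100
        });

    while (query.HasMoreResults)
    {
        FeedResponse<dynamic> changes = await query.ExecuteNextAsync<dynamic>();
        // changes.ResponseContinuation is the token to persist for re-runs
        foreach (var document in changes)
            Console.WriteLine(document.id);
    }
}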

Thnx

AndiP

Cosmos DB access control with users and permissions

Overview

I have written a small lab on using user permissions with CosmosDB. This article dwells a bit on the scenarios where to use permissions instead of the master key, and on how you can use a set of permissions to grant access to multiple documents, collections or partitions at once.

Scenarios for user/permissions resource keys

Before we dive into the details of how to work with users and permissions to access data in Cosmos DB, let’s first discuss some scenarios where they can be applied.

In an ideal scenario we ensure that our authenticated user can access our services only through a limited set of APIs, as shown in this simplified architecture:

In this scenario we have two options.
Option A: Use the master key and ensure access to the right data in the data services depending on the user's permissions.
Option B: Depending on the authenticated user, create a set of user permissions (correlating it either directly 1:1 to a Cosmos DB user, or using a user per tenant) and use these to connect to Cosmos DB in your data services.

In the following scenarios we want to enable direct access to Cosmos DB from applications and services that are not under our full control. We need to ensure that those clients can only access the data they are supposed to.

Let’s say Vendor B needs very fast regional access to the data stored in Cosmos DB and wants to avoid the additional latency of constantly going through the business services provided by Vendor A.

In this case Vendor A can grant access to Vendor B's specific data in Cosmos DB by limiting access to a single document, a single-tenant collection or a specific partition within a collection.

Vendor B can use that token/permission to access the database directly without having access to the data of other tenants.

Another scenario might be a mobile application that should be able to fetch the user's profile/data quickly and directly from the globally distributed Cosmos DB.

In this case the application service could provide a resource token/permission to access the database directly. I personally do not like this scenario and would rather use a globally distributed API governed by API Management, because that way I have more control over how the user accesses the data (metrics, throttling, additional protection mechanisms). Such a scenario is described with a Xamarin.Forms application in the documentation, including sample code.

Implementing Users/Permissions

Users are stored within the context of a database in Cosmos DB. Each user has a set of uniquely named permissions. To learn more about the structure within CosmosDB read this.

Each permission object consists of

  • Token (string)… access token to access Cosmos DB
  • Id (string)… unique permission id (per user)
  • PermissionMode… either Read or All
  • ResourceId (string)… id of the resource the permission applies to
  • ResourceLink (string)… self-link of the resource the permission applies to
  • ResourcePartitionKey (string)… partition key of the resource the permission applies to
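Creating a user and granting a permission with the master-key client could look like this (a sketch with made-up names; ResourceLink and ResourcePartitionKey are the fields from the list above):

// Create a user in the database (requires a DocumentClient using the master key)
await client.CreateUserAsync(
    UriFactory.CreateDatabaseUri("demodb"),
    new User { Id = "apollak" });

// Grant read access to one collection, restricted to a single partition key value
Permission dataPermission = await client.CreatePermissionAsync(
    UriFactory.CreateUserUri("demodb", "apollak"),
    new Permission
    {
        Id = "mydata",
        PermissionMode = PermissionMode.Read,
        ResourceLink = UriFactory.CreateDocumentCollectionUri("demodb", "democol").ToString(),
        ResourcePartitionKey = new PartitionKey("apollak")
    });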

Once you have acquired a permission you only need to transfer the token to the client that should access Cosmos DB. A token is represented as a string.

// token: "type=resource&ver=1&sig=uobSDos7JRdEUfj ... w==;"
string tokenToTransfer = permission.Token;

The client can create a connection to Cosmos DB using that token.

DocumentClient userClient = new DocumentClient(
    new Uri(docDBEndPoint), 
    transferedToken);

!Important! The token stays valid until it expires even if you delete the permission in Cosmos DB. The expiration time can be configured!

You can access multiple resources by providing a set of permissions; use the DocumentClient constructor that accepts a list of permissions.
Serializing and deserializing those permissions as JSON strings is a bit painful:

Serialization:

// Serializing to JSON
MemoryStream memStream = new MemoryStream();
somePermission.SaveTo(memStream);
memStream.Position = 0L;
StreamReader sr = new StreamReader(memStream);
string jsonPermission = sr.ReadToEnd();

The serialized permission looks like this:

{
    "permissionMode":"Read",
    "resource":"dbs/q0dwAA==/colls/q0dwAIDfWAU=/",
    "resourcePartitionKey":["apollak"],
    "id":"mydata",
    "_rid":"q0dwAHOyMQBzifcvvscGAA==",
    "_self":"dbs/q0dwAA==/users/q0dwAHOyMQA=/permissions/q0dw ... AA==/",
    "_etag":"\"00001500-0000-0000-0000-5aaa4a6b0000\"",
    "_token":"type=resource&ver=1&sig=uobSD ... 2kWYxA==;2L9WD ... Yxw==;",
    "_ts":1521109611
}

De-serialization:

memStream = new MemoryStream();
StreamWriter sw = new StreamWriter(memStream);
sw.Write(jsonPermission);
sw.Flush();
memStream.Position = 0L;
Permission somePermission = Permission.LoadFrom<Permission>(memStream);
sw.Close();

By adding multiple permissions to a list you can create a document client with access to multiple resources (e.g. 1..n documents, 1..n collections, 1..n partitions).

List<Permission> permList = new List<Permission>();
// adding two permissions
permList.Add(Permission.LoadFrom<Permission>(memStream));
permList.Add(lpollakDataPermission);
DocumentClient userClient = new DocumentClient(
    new Uri(docDBEndPoint,UriKind.Absolute), 
    permList);

Important things to know:
– If you restrict the partition with a permission, you MUST always set the partition key when accessing CosmosDB!
– Permission IDs must be unique for each user and must not be longer than 255 characters.
– Tokens expire after an hour by default. You can set an expiration from 10 minutes up to 24 hours, passed within the RequestOptions in seconds.
– Each time you read the permission from the permission feed of a user, a new token gets generated.

Example to customize the expiration time:

lpollakDataPermission = await client.UpsertPermissionAsync(
    UriFactory.CreateUserUri(dbName, apollakUserid), 
    lpollakDataPermission, 
    new RequestOptions() { ResourceTokenExpirySeconds = 600});

Simple sample implementation as LAB

I have written a small lab that you can use with the CosmosDB emulator to play with the permissions. There is a student version to start from and also a finished version. In the project folder you will find a „ReadMe.md“ file describing the steps. Download Cosmos DB Security Lab

Learn more about securing CosmosDB

Azure Functions CosmosDB Output binding with Visual Studio 2017

Azure Functions CosmosDB Output binding

Create a new Azure Function in Visual Studio 2017

  1. Start Visual Studio 2017
  2. CTRL/STRG + Q
    a. Type „extensions“
    b. In the extensions and updates dialog search under Online for „Azure Functions and WebJobs Tools“
    c. Install extension
    d. Restart Visual Studio
  3. Create a new Cloud/Azure Functions Project in your „Work“-Folder.
    a. Name = „CBFuncAppLab“
  4. Right click the project and select „Manage NuGet Packages“
    a. Upgrade Microsoft.NET.Sdk.Functions to 1.0.6
    b. Search the online package „Microsoft.Azure.WebJobs.Extensions.DocumentDB“ – 1.1.0-beta4
    c. DO NOT INSTALL „Microsoft.Azure.DocumentDB“! This will result in a binding issue, as the WebJobs SDK uses a different instance of the „Document“ class (v. 1.13.2)!

  5. CTRL+SHIFT+A to add a new file – select „Azure Function“ (CTRL+E – function) under „Visual C# Items“
    a. Name the class „CosmosDBUpsertFunction.cs“
    b. Choose „Generic Webhook“ and click „OK“

Let’s upsert a single document to the collection

For this we need to modify the code like this:

namespace CBFuncAppLab
{
    public static class CosmosDBUpsertFunction
    {
        [FunctionName("CosmosDBUpsertFunction")]
        public static object Run(
            [HttpTrigger(WebHookType = "genericJson")]
            HttpRequestMessage req,
            TraceWriter log,
            [DocumentDB("demodb", "democol",
                        CreateIfNotExists =true, 
                        ConnectionStringSetting = "myCosmosDBConnection")] 
                        out dynamic document)
        {
            log.Info($"Webhook was triggered!");
            var task = req.Content.ReadAsStringAsync();
            task.Wait();
            string jsonContent = task.Result;
            dynamic data = JsonConvert.DeserializeObject(jsonContent);
            if (data != null)
            {
                document = data;
                return req.CreateResponse(HttpStatusCode.OK, new
                {
                    greeting = $"Will upsert document!"
                });
            }
            else
            {
                // out parameters must be assigned on every code path
                document = null;
                return req.CreateResponse(HttpStatusCode.BadRequest, new
                {
                    error = "Document was empty!"
                });
            }
        }
    }
}

Make sure you have declared all using statements:

using System.Net;
using Newtonsoft.Json;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;

Open ‚local.settings.json‘ and configure your connection string:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "AzureWebJobsDashboard": "",
    "myCosmosDBConnection": "AccountEndpoint=https://YOURACCOUNT.documents.azure.com:443/;AccountKey=MASTERKEY-READWRITE;"
  }
}

Either in the Azure Portal or with a tool like „DocumentDB Studio“, create a new collection „democol“ within a „demodb“ database in your CosmosDB account.

Let’s try it!

Start your Azure Function App in the debugger and fire up a tool like „Postman“.
In the output console you will find the URL of your webhook.

Use Postman to do a POST request on this URL with a JSON document as body, e.g.:

{
    "id":"apollak",
    "Name":"Andreas"
}

Verify the output in the console window and check whether the document got written to your document collection.

Let’s upsert multiple documents to the collection

Now let's output multiple documents in one call. For this we use a parameter with the IAsyncCollector interface. For synchronous methods you can use ICollector instead. Alter the code like so:

[FunctionName("CosmosDBUpsertFunction")]
public static async Task<object> Run(
    [HttpTrigger(WebHookType = "genericJson")]
    HttpRequestMessage req, TraceWriter log,
    [DocumentDB("demodb", "Items", 
    CreateIfNotExists = true, 
    ConnectionStringSetting = "myCosmosDBConnection")] 
    IAsyncCollector<dynamic> documents)
{
    log.Info($"Webhook was triggered!");

    string jsonContent = await req.Content.ReadAsStringAsync();
    dynamic data = JsonConvert.DeserializeObject(jsonContent);
    if (data!=null)
    {
        foreach(var document in data)
            await documents.AddAsync(document);

        return req.CreateResponse(HttpStatusCode.OK, new
        {
            greeting = $"Will upsert documents!"
        });
    }
    else
        return req.CreateResponse(HttpStatusCode.BadRequest, new
        {
            error = "No documents where provided!"
        });
}

Run this sample again and post an array of JSON documents instead. Verify that the function works and that the documents have been upserted to the CosmosDB collection.

[{
    "id":"apollak",
    "Name":"Andreas",
    "Likes":"Pizza"
},
{
    "id":"smueller",
    "Name":"Susi"
}]

Thnx and Enjoy
AndiP

Digging into CosmosDB storage

**updated section ‘Will it work the other way around’ on 25.07.2017 **

Well, not how the data is stored internally, but rather how CosmosDB seems to handle data that is stored and accessed via the Graph, Table or MongoDB API. Each of the collections/graphs that have been created with the newly available APIs can still be accessed with the “native” DocumentDB SQL API. To date it remains a mystery to me whether the CosmosDB team just uses the “classic” API to provide all these alternate APIs on top, or if there is some magic behind it.

Please note that while accessing CosmosDB Graph/Table/MongoDB data with DocumentDB SQL is quite interesting, it is not something to use in production and is probably not supported by Microsoft. Microsoft might change the way this data is stored in CosmosDB at any time, and your code might break.

The first three sections describe the Graph, Table and MongoDB APIs. The 4th section explains how you can change the visual portal experience in Azure, and in the 5th section I try to create documents with the DocumentDB API and query them with the Graph API.

Graph-API

To illustrate this on the Graph API, I created a simple graph by executing a series of commands inside the Graph Explorer. For this article I only use two vertices (one with an additional property) and one edge which connects both vertices (see the upper two vertices in the image):

  • g.addV('place').property('name','Rivendell');
  • g.addV('place').property('name','Elronds Haus').property('FoodRating','Delicious');
  • g.V().has('name','Rivendell').addE('path').to(V().has('name','Elronds Haus')).property('weight',1.0).property('level','difficult');

In parallel I use Azure DocumentDB Studio to connect to my graph with the regular DocumentDB SQL API. If we select “Elronds Haus” in Graph Explorer, we can see the vertex ID and the properties of this vertex.

In Azure DocDB Studio we can now issue a query on the collection to reveal the vertex for “Elronds Haus”. To reduce complexity I removed the internal fields like _ts, _etag, _self, … in the images.

  • select * from c where c.id="ae5668e1-0e29-4412-b3ea-a84b2eb68104"

 

The id and label of the vertex are just stored as normal JSON fields, but the properties are stored as a combination of _value and a unique property id field. Interestingly, the edge stores its properties differently and in a way that is easier to query with DocDB SQL.


While we can find all paths with a weight of 1 easily with

  • select * from c where c.label="path" and c.weight=1

we need to issue a more complex query to find a specific vertex by a property value. I'm not sure why they decided to store these properties as an array, but maybe this is required for some graph functionality I am not aware of yet.

  • SELECT * FROM c WHERE c.label = "place" AND c.name[0]._value = "Elronds Haus"
  • SELECT v FROM v JOIN c IN v.name WHERE v.label = "place" AND c._value = "Elronds Haus"

The edges themselves can be easily discovered by querying the “_isEdge” field. The linked vertices for the edge are stored in the following fields:

  • _sink, _sinkLabel… id and label of the IN-vertex
  • _vertexId, _vertexLabel… id and label of the OUT-vertex
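Put together, a vertex and an edge document as stored look roughly like this (ids shortened, internal fields removed):

{
  "id": "ae5668e1-...",
  "label": "place",
  "name": [ { "_value": "Elronds Haus", "id": "..." } ],
  "FoodRating": [ { "_value": "Delicious", "id": "..." } ]
}

{
  "id": "...",
  "label": "path",
  "_isEdge": true,
  "weight": 1,
  "level": "difficult",
  "_vertexId": "...",
  "_vertexLabel": "place",
  "_sink": "ae5668e1-...",
  "_sinkLabel": "place"
}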

In this video Rimma Nehme (@rimmanehme) mentioned at 37:20 that SQL extensions will be available at a later point in time, enabling you to query the graph with SQL instead of Gremlin.

Table API

In this case I use the new Table API of CosmosDB. While you have the same namespace and the same API in .NET, you need to replace the old Storage NuGet package with a new one to make it work.

I created three instances of PersonEntity that derive from TableEntity, as you would expect with the Storage Table API. They store the race of the person as the partition key and a combination of first and last name as the row key. As soon as you create a new table with the CreateIfNotExistsAsync() method, a new database “TablesDB” will be created in CosmosDB with a collection named after your table.

Keep in mind that the API will create a new collection for every table! It might be better from a cost perspective to store various kinds of documents in one collection!

As soon as we add the entities to the table we can see them in DocumentDB Studio. Because we used a combination of first and last name as the row key, we can see that the row key represents the “id” field of the CosmosDB entry.

While you can now query more properties than just RowKey and PartitionKey, always use the PartitionKey to avoid costly partition scans! You could do a LINQ query like the one sketched below.

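The original screenshot is gone, but a query along these lines shows the idea (a sketch, assuming a CloudTable instance named table and the PersonEntity class mentioned above):

// Filter on PartitionKey first, then on further properties
var query = from p in table.CreateQuery<PersonEntity>()
            where p.PartitionKey == "Hobbit" && p.FirstName == "Bilbo"
            select p;

foreach (PersonEntity person in query)
    Console.WriteLine(person.RowKey);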

Now let's load one document in DocumentDB Studio and examine how the data is stored for the Table API. Again I removed all the internal properties like _ts, _rid, …

What instantly pops into the eye is the use of the $ sign, which will cause some trouble when constructing DocDB SQL statements, as we will see. Like in the Graph API we have multiple fields defining a property. I find this naming more approachable, as it reduces the size of the document (“_value” vs “$v”). The partition key is stored as $pk.

CosmosDB stores the type of each property within its $t field, where 2=string, 16=integer and 1=double.

To query for Bilbo we need to escape the $ character in our query:

  • SELECT * from p where p['FirstName']['$v'] = 'Bilbo' and p['$pk'] = 'Hobbit'

To query the document with LINQ you need to build the entity like in the image (a hypothetical reconstruction follows below). Then you create a typed LINQ query:

  • var queryable2 = client.CreateDocumentQuery<PersonT>(collection.SelfLink, feedOptions).Where(doc => (doc.FirstName.v == "Bilbo"));
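The entity from the lost image could have looked roughly like this (a hypothetical reconstruction; the JsonProperty attributes map the $-prefixed fields):

public class TableValue
{
    // Type marker: 2=string, 16=integer, 1=double
    [JsonProperty("$t")]
    public int t { get; set; }

    // The actual value
    [JsonProperty("$v")]
    public string v { get; set; }
}

public class PersonT
{
    [JsonProperty("id")]
    public string Id { get; set; }

    public TableValue FirstName { get; set; }
}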

 

MongoDB

First we create two simple entries with the native MongoDB client. The second document also uses an ISODate type for date/time. You can see that MongoDB also stores the ID as an ObjectId type.


There seem to be some issues with other BSON types though. For example, there is an article mentioning compatibility issues between SiteCore and the CosmosDB MongoDB API, and I believe it is related to the yet unsupported BSON type BsonInt32. As far as I have seen (I lost the article on the web), currently only ObjectId and ISODate are supported types in the CosmosDB MongoDB API.

Again, if we now examine those two documents in Azure DocumentDB Studio, we can see that the id is stored twice: first as “id” and second as [“_id”][“$oid”], another way to declare the data type of fields. The field of type ISODate is stored as an epoch value.

Switching the portal experience

This works with all APIs except MongoDB; the reason for this might be legacy. If you take a look at the ARM templates to create Graph API, Table API, Mongo API and DocumentDB API accounts, you will notice that while CosmosDB with the Mongo API has its “kind” property set to “MongoDB”, all the others have it set to “GlobalDocumentDB”.


All other APIs rely on the tags collection within the resource definition. So to change the Table experience to the Graph experience, remove the “defaultExperience:Table” tag, add a new “defaultExperience:Graph” tag in the portal and reload the page.
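A trimmed fragment of such a resource definition might look like this (a sketch; the properties are omitted):

{
  "type": "Microsoft.DocumentDB/databaseAccounts",
  "name": "demobuilddocs",
  "apiVersion": "2015-04-08",
  "kind": "GlobalDocumentDB",
  "tags": {
    "defaultExperience": "Graph"
  },
  "properties": { }
}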


Will it work the other way around? ** updated 25.07.2017 **

Now that we have figured that out, I wonder if I can take a “normal” DocumentDB API collection and fill it with data that looks like what we created with the Graph API, then change the experience to Graph API and see if we can access the data via Gremlin.

For that purpose I have set up a brand new CosmosDB with DocumentDB API “demobuilddocs” in the portal. I am again using DocumentDB Studio to create a new collection and add three documents to it (you can download the documents here!).

Expressed in gremlin this would be (25.07.2017: replaced ö,ü with oe and ue):

  • g.addV('place').property('name','Hobbithoehle');
  • g.addV('place').property('name','Gasthaus Zum gruenen Drachen');
  • g.V().has('name','Hobbithoehle').addE('path').to(V().has('name','Gasthaus Zum gruenen Drachen')).property('weight',2.0);

In DocumentDB Studio I create a new single-partitioned collection “democol” with 400 RU/s for “demobuilddocs”. Then I create the three documents with CTRL+N (Create document in the context menu of the collection). So that's it.

Finally we change the defaultExperience tag for “demobuilddocs” in the portal to “Graph”:

Refresh the portal page and navigate to the Data Explorer (preview). Et voilà:


Next I try that with GraphExplorer, and it works fine as well.


Now let's try that with the cloud bash and the gremlin client. (Spoiler: it will break! – No, it won't. It would break if we used ö, ü, … in the JSON.) First I copy my hobbit.yaml connection configuration to doccb.yaml and edit it with nano to point to the graph URL “demobuilddocs.graphs.azure.com”. Please note that GraphExplorer, which uses the .NET CosmosDB SDK, connects to “demobuilddocs.documents.azure.com”. Then I add the path to my collection and the primary key as password (I have truncated it for security reasons).


Now I run my gremlin client (which I installed with a tarball in my cloud bash) and connect to my graph database:


And let's query for edges and see how that works.


**updated 25.07.2017**
Now when we read the vertices we will get the correct result.


Now let's try that with vertices. If our JSON contains mutated vowels (Umlaute), the decoder will fail with an exception complaining about a missing close marker for OBJECT:


A .NET implementation like GraphExplorer, by the way, can handle the mutated vowels without any problem. But you might want to look out for this error in a Java client. If you examine the exception, you can see that the resulting JSON is missing some closing brackets.


This will need some further inspection, but I am closing out my article for today. Stay tuned,…

P.S.: If you create a new vertex in the gremlin console, you can query it without problems. But if you replace that document by just changing the value of a property with DocumentDB Studio, you get the same error when you query the vertex with the gremlin console. Obviously Microsoft is storing more data than meets the eye. On the other hand, it is interesting to see the .NET client SDK work.

Keep digging and have a great weekend

AndiP

CosmosDB – Build 2017 Recap Vienna

At the //BUILD on Tour event yesterday at Microsoft Österreich I had the pleasure of giving a talk on “CosmosDB”.

You can download the slides of my CosmosDB talk here. On GitHub you will find my source code for the CosmosDB talk, and here the Azure CosmosDB Graph Explorer sample.

Here are also the Gremlin queries for the example from The Hobbit.

Oh, and since it caused some confusion: in CosmosDB there are only Request Units (RU) per second/minute and no orcs. Although that would have been something.

Best quote from an attendee in the UWP apps talk: “Helene Fischer? Well, without sound it's bearable” *haha*

Have fun

AndiP