Digging into CosmosDB storage

**updated section ‘Will it work the other way around’ on 25.07.2017**

Well, not how the data is stored internally, but rather how CosmosDB seems to handle data that is stored and accessed via the Graph, Table or MongoDB API. Each of the collections/graphs created with the new APIs can still be accessed with the “native” DocumentDB SQL API. To date it remains a mystery to me whether the CosmosDB team just uses the “classic” API to provide all these alternate APIs on top, or whether some other magic is at work behind the scenes.

Please note that while accessing CosmosDB Graph/Table/MongoDB data with the DocumentDB SQL API is quite interesting, it is not something to use in production and is probably not supported by Microsoft. Microsoft might change the way this data is stored in CosmosDB at any time, and your code might break.

The first three sections describe the Graph API, the Table API and the MongoDB API. The fourth section explains how you can change the visual portal experience in Azure, and in the fifth section I try to create documents with the DocumentDB API and query them with the Graph API.

Graph-API

To illustrate this with the Graph API, I created a simple graph by executing a series of commands inside the Graph Explorer. For this article I only use two vertices (one with an additional property) and one edge which connects both vertices:

  • g.addV('place').property('name','Rivendell');
  • g.addV('place').property('name','Elronds Haus').property('FoodRating','Delicious');
  • g.V().has('name','Rivendell').addE('path').to(V().has('name','Elronds Haus')).property('weight',1.0).property('level','difficult');

In parallel I use Azure DocumentDB Studio to connect to my graph with the regular DocumentDB SQL API. If we select “Elronds Haus” in Graph Explorer we can see the vertex id and the properties of this vertex.

In Azure DocDB Studio we can now issue a query on the collection to reveal the vertex for “Elronds Haus”. To reduce complexity I removed the internal fields like _ts, _etag,_self,… in the images.

  • select * from c where c.id = "ae5668e1-0e29-4412-b3ea-a84b2eb68104"

 

Id and label of the vertex are just stored as normal JSON fields, but the properties are stored as an array combining a _value field with a unique property id field. The edge, interestingly, stores its properties differently and in a way that is easier to query with DocDB SQL.
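For illustration, the stored vertex for “Elronds Haus” then looks roughly like this (a sketch: internal fields are omitted and the inner property ids are placeholders, the exact values will differ):

{
  "id": "ae5668e1-0e29-4412-b3ea-a84b2eb68104",
  "label": "place",
  "name": [ { "_value": "Elronds Haus", "id": "<a GUID>" } ],
  "FoodRating": [ { "_value": "Delicious", "id": "<another GUID>" } ]
}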


While we can easily find all paths with a weight of 1 with

  • select * from c where c.label = "path" and c.weight = 1

we need to issue a more complex query to find a specific vertex by a property value. I’m not sure why they decided to store these properties as an array, but maybe this is required for some graph functionality I am not aware of yet.

  • SELECT * FROM c WHERE c.label = "place" AND c.name[0]._value = "Elronds Haus"
  • SELECT v FROM v JOIN c IN v.name WHERE v.label = "place" AND c._value = "Elronds Haus"
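Not from the original post, but for completeness: such a query can also be issued from .NET with the DocumentDB SDK. A minimal sketch (account, key, database and collection names are placeholders) could look like this:

using System;
using Microsoft.Azure.Documents.Client;

var client = new DocumentClient(new Uri("https://<your-account>.documents.azure.com:443/"), "<your-key>");

// run the same SQL statement against the graph collection
var query = client.CreateDocumentQuery<dynamic>(
    UriFactory.CreateDocumentCollectionUri("<database>", "<collection>"),
    "SELECT * FROM c WHERE c.label = 'place' AND c.name[0]._value = 'Elronds Haus'",
    new FeedOptions { EnableCrossPartitionQuery = true });

foreach (var vertex in query)
    Console.WriteLine(vertex);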

The edges themselves can be easily discovered by querying the “_isEdge” field. The linked vertices for the edge are stored in the following fields:

  • _sink, _sinkLabel… id and label of the IN-vertex
  • _vertexId, _vertexLabel… id and label of the OUT-vertex
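For example, a query along these lines should list all edges (my own sketch, assuming _isEdge is stored as a boolean value):

  • SELECT * FROM c WHERE c._isEdge = true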

In this video Rimma Nehme (@rimmanehme) mentioned at 37:20 that the SQL extensions will be available at a later point in time, enabling you to query the graph with SQL instead of Gremlin.

Table API

In this case I use the new Table API of CosmosDB. While you have the same namespace and same API in .NET, you need to replace the old Storage NuGet package with a new one to make it work.

I created three instances of PersonEntity that derive from TableEntity, as you would expect with the Storage Table API. They store the race of the person as partition key and a combination of first and last name as row key. As soon as you create a new table with the CreateIfNotExistsAsync() method, a new database “TablesDB” is created in CosmosDB with a collection named after your table.
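A minimal sketch of what such an entity and the table setup could look like (class, table and property names are my own illustration, not necessarily those of the original sample):

public class PersonEntity : TableEntity
{
    public PersonEntity() { }

    public PersonEntity(string race, string firstName, string lastName)
    {
        PartitionKey = race;              // the race is used as partition key
        RowKey = firstName + lastName;    // first + last name as row key
        FirstName = firstName;
        LastName = lastName;
    }

    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// creating the table triggers the creation of the "TablesDB" database in CosmosDB
CloudStorageAccount account = CloudStorageAccount.Parse("<your CosmosDB Table API connection string>");
CloudTable table = account.CreateCloudTableClient().GetTableReference("person");
await table.CreateIfNotExistsAsync();
await table.ExecuteAsync(TableOperation.Insert(new PersonEntity("Hobbit", "Bilbo", "Baggins")));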

Keep in mind that the API will create a new collection for every table! It might be better from a cost perspective to store various kinds of documents in one collection!

As soon as we add the entities to the table we can see them in DocumentDB Studio. Because we used a combination of first and last name as row key, we can see that the row key represents the “id” field of the CosmosDB entry.

While you can now query more properties than just RowKey and PartitionKey, always use the PartitionKey to avoid costly partition scans! You could do a LINQ query like the one sketched below.
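A rough sketch of such a query, assuming the PersonEntity class and table object from above (the original post only showed this as a screenshot):

// filter on the partition key (race) first, then on an additional property
var bilbo = table.CreateQuery<PersonEntity>()
    .Where(p => p.PartitionKey == "Hobbit" && p.FirstName == "Bilbo")
    .ToList();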


Now let’s load one document in DocumentDB Studio and examine how the data is stored for the Table API. Again I removed all the internal properties like _ts, _rid, …

What instantly catches the eye is the use of the $ sign, which will cause some trouble when constructing DocDB SQL statements, as we will see. Like in the Graph API we have multiple fields defining a property. I actually like this naming better, as it reduces the size of the document (“_value” vs. “$v”). The partition key is stored as $pk.

CosmosDB stores the type of each property in its $t field, where 2 = string, 16 = integer and 1 = double.
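Putting it together, a stored entity therefore looks roughly like this (my own sketch based on the fields described above; internal properties omitted):

{
  "$pk": "Hobbit",
  "id": "BilboBaggins",
  "FirstName": { "$t": 2, "$v": "Bilbo" },
  "LastName": { "$t": 2, "$v": "Baggins" }
}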

To query for Bilbo we need to escape the $ character in our query:

  • SELECT * from p where p['FirstName']['$v'] = 'Bilbo' and p['$pk'] = 'Hobbit'

To query the document with LINQ you need to build a matching entity class (see the sketch after the query below). Then you create a typed LINQ query:

  • var queryable2 = client.CreateDocumentQuery<PersonT>(collection.SelfLink, feedOptions).Where(doc => (doc.FirstName.v == "Bilbo"));
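The entity class from the screenshot is not reproduced here; a rough sketch of what it could look like, using Json.NET attributes to map the $-prefixed fields (class and member names are my own):

public class TableValue
{
    [JsonProperty("$t")]
    public int t { get; set; }      // type code (2 = string, 16 = integer, 1 = double)

    [JsonProperty("$v")]
    public string v { get; set; }   // the actual value
}

public class PersonT
{
    [JsonProperty("id")]
    public string Id { get; set; }

    [JsonProperty("$pk")]
    public string PartitionKey { get; set; }

    public TableValue FirstName { get; set; }
    public TableValue LastName { get; set; }
}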

 

MongoDB

First we create two simple entries with the native MongoDB client, where I add two documents. The second document also uses an ISODate type for date/time. You can see that MongoDB also stores the ID as an ObjectId type.
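The inserts were only shown as a screenshot in the original post; for illustration, roughly equivalent inserts with the .NET MongoDB driver could look like this (database, collection and field names are my own):

using System;
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("<your CosmosDB MongoDB connection string>");
var places = client.GetDatabase("hobbitdb").GetCollection<BsonDocument>("places");

// first document: the _id is generated as an ObjectId
places.InsertOne(new BsonDocument { { "name", "Hobbithoehle" } });

// second document with a date/time value (stored as ISODate in MongoDB)
places.InsertOne(new BsonDocument
{
    { "name", "Gasthaus Zum gruenen Drachen" },
    { "visited", new BsonDateTime(DateTime.UtcNow) }
});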


There seem to be some issues with other BSON types though. For example, there is an article mentioning some compatibility issues between SiteCore and the CosmosDB MongoDB API, and I believe it is related to the yet unsupported BSON type BsonInt32. As far as I have seen (I lost the article on the web), currently only ObjectId and ISODate are supported types in the CosmosDB MongoDB API.

Again, if we now examine those two documents in Azure DocumentDB Studio, we can see that the id is stored twice: first as “id” and second as [“_id”][“$oid”], which is yet another way to declare the data type of fields. The field of type ISODate is stored as an epoch value.

 

 

Switching the portal experience

This works with all APIs except MongoDB. The reason for this might be legacy. If you take a look at the ARM templates to create Graph API, Table API, Mongo API and DocumentDB API accounts, you will notice that while CosmosDB with the Mongo API has the “kind” property set to MongoDB, all others have it set to GlobalDocumentDB.


All other APIs rely on the tags collection within the resource definition. So to change the Table experience to the Graph experience, remove the “defaultExperience:Table” tag, add a new “defaultExperience:Graph” tag in the portal and reload the page.
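In an ARM template the relevant pieces look roughly like this (a trimmed sketch showing only the properties discussed here):

{
  "type": "Microsoft.DocumentDB/databaseAccounts",
  "kind": "GlobalDocumentDB",
  "tags": {
    "defaultExperience": "Graph"
  }
}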


Will it work the other way around? ** updated 25.07.2017 **

Now that we have figured that out, I wonder if I can take a “normal” DocumentDB API collection and fill it with data that looks like what we created with the Graph API, then change the experience to Graph and see if we can access the data via Gremlin.

For that purpose I have set up a brand new CosmosDB account with the DocumentDB API, “demobuilddocs”, in the portal. I am again using DocumentDB Studio to create a new collection and add three documents to it (you can download the documents here!).

Expressed in gremlin this would be (25.07.2017: replaced ö,ü with oe and ue):

  • g.addV('place').property('name','Hobbithoehle');
  • g.addV('place').property('name','Gasthaus Zum gruenen Drachen');
  • g.V().has('name','Hobbithoehle').addE('path').to(V().has('name','Gasthaus Zum gruenen Drachen')).property('weight',2.0);
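The documents I create by hand simply mirror the storage format from the first section, so one of the vertices looks roughly like this (a sketch; the id values are arbitrary GUIDs):

{
  "id": "<some GUID>",
  "label": "place",
  "name": [ { "_value": "Hobbithoehle", "id": "<another GUID>" } ]
}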

In DocumentDB Studio I create a new single-partitioned collection “democol” with 400 RU/s for “demobuilddocs”. Then I create the three documents with CTRL+N (Create document in the context menu of the collection). That’s it.

Finally we change the defaultExperience tag for “demobuilddocs” in the portal to “Graph”.

Refresh the portal page and navigate to the Data Explorer (Preview). Et voilà!


Next I try that with GraphExplorer, and it works fine as well.


Now let’s try that with the cloud bash and the gremlin client. (Spoiler: it will break! No, it won’t; it would only break if we used ö, ü, … in the JSON.) First I copy my hobbit.yaml connection configuration to doccb.yaml and edit it with nano to point to the graph URL “demobuilddocs.graphs.azure.com”. Please note that GraphExplorer, which uses the .NET CosmosDB SDK, connects to “demobuilddocs.documents.azure.com”. Then I add the path to my collection and the primary key as password (I have truncated that for security reasons).


Now I run my gremlin client (I have installed that with a tar ball in my cloud bash) and connect to my graph database:


And let’s query for edges and see how that works.


**updated 25.07.2017**
Now when we read the vertices we will get the correct result.


If however our JSON contains umlauts (mutated vowels), reading the vertices breaks with a decoder exception complaining about some strange missing close marker for OBJECT:


A .NET implementation like GraphExplorer, by the way, can handle the umlauts without any problem, but you might want to look out for this error in a Java client. If you examine the exception you can see that the resulting JSON is missing some closing brackets.


This will need some further inspection, but I am closing out my article for today. Stay tuned,…

P.S.: If you create a new vertex in the gremlin console, you can query it with no problems. But if you replace that document by just changing the value of a property in DocumentDB Studio, you get the same error when you query the vertex with the gremlin console. Obviously Microsoft is storing more data than meets the eye. On the other hand it is interesting to see that the .NET client SDK works.

Keep digging and have a great weekend

AndiP

CosmosDB – Build 2017 Recap Vienna

At the //BUILD on Tour event yesterday at Microsoft Austria I had the opportunity to give a talk on “CosmosDB”.

You can download the slides for my CosmosDB talk here. On GitHub you will find my source code for the CosmosDB talk, and here the Azure CosmosDB Graph Explorer sample.

Here are also the Gremlin queries for the example from The Hobbit.

Oh, and since this caused some confusion: in CosmosDB there are only Request Units (RU) per second/minute, and no orcs. Although that would have a certain appeal.

Best quote from an attendee in the talk on UWP apps: “Helene Fischer? Well, it’s fine without the sound” *haha*

Have fun

AndiP

Global Azure Bootcamp 2017 Recap

 

(Image: Wissensturm in Linz 2017, (c) Andreas Pollak) Like every year, the Global Azure Bootcamp for Austria took place in the Wissensturm in Linz. As always, thanks to Rainer Stropek and Karin Huber, it was excellently organized and featured exciting talks.

This time I contributed two talks, on “Azure API Management” and “Azure DocumentDB”.

(Speaker photo (c) Rainer Stropek) Below you will find the links to the presentation slides. There you can also download the code samples from GitHub.

 

Slidedecks & Source

Have fun and see you soon

AndiP

Direct Methods and MQTT Box in action

This is for anyone who wants to connect MQTT-Box to Azure IoT-Hub. In my last post I changed the Microsoft.Azure.Devices.Client implementation to enable Direct Methods. In this article I show how to set up MQTT-Box to receive Direct Method calls there.

First and foremost you need to create a new IoT-Hub instance with the Device Management preview enabled. If you struggle with this, have a look at this post. To create the shared access signatures we use Device Explorer, which you can get here. After you have set up the IoT-Hub and downloaded MQTT-Box and Device Explorer, start up those tools.

Copy the connection string from the IOT-Hub SAS-policy “iothubowner” to the connection info in Device Explorer:


Switch to the Management tab and click “Create…” to create a new device. Let’s name it “myDeviceId”. Leave the suggested keys and click “Create”. Then click the “SAS Token…” button, select your device, set the time to live (TTL) to 1 and click “Generate”.


Copy the value of the SharedAccessSignature (everything after “SharedAccessSignature=”). The result should look like this:

SharedAccessSignature sr=<yourIOTHubName>.azure-devices.net%2Fdevices%2F<yourDeviceId>&sig=<signature>&se=<se>

Then start MQTT-Box and create a new MQTT Client


Copy the SAS into the password field. Set the following fields and settings:

MQTT Client Name = Something
MQTT Client Id = <your Device ID>
Append timestamp to MQTT client Id? = NO
Protocol = mqtts / tls
Host = <yourIOTHubName>.azure-devices.net:8883
UserName = <yourIOTHubName>.azure-devices.net/<your Device ID>/DeviceClientType=azure-iot-device%2F1.1.0-dtpreview&api-version=2016-09-30-preview

Click “Save” – You should get a green “Connected”-Button


Finally to receive the messages set the “Topic to subscribe” in the yellow box to:

  • $iothub/methods/POST/#

Then hit the “Subscribe” button. Now you need to start the node sample from this article to send a direct method call to your device on your IoT-Hub. As a result you will receive the message in your MQTT-Box.
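If you prefer C# to the node sample, the direct method can also be invoked from the service side with the Microsoft.Azure.Devices service SDK; a minimal sketch (connection string, method name and payload are placeholders of my own) might look like this:

using System;
using Microsoft.Azure.Devices;

var serviceClient = ServiceClient.CreateFromConnectionString("<iothubowner connection string>");

var method = new CloudToDeviceMethod("writeLine") { ResponseTimeout = TimeSpan.FromSeconds(30) };
method.SetPayloadJson("{ \"input\": \"Hello from the cloud\" }");

// invoke the direct method on the device we created above
CloudToDeviceMethodResult result = await serviceClient.InvokeDeviceMethodAsync("myDeviceId", method);
Console.WriteLine($"Status: {result.Status}, Payload: {result.GetPayloadAsJson()}");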

Of course, since we do not reply here, the node client writes out a timeout error after a while. If however you want to send a successful response, have a look at the “Topics to publish” section in MQTT-Box.

Prepare the “Topic to publish”: $iothub/methods/res/200/?$rid=
Payload Type: Strings /JSON / XML /Characters
Payload: “Hello”

Now use the node client to send a direct message again. Have a look at the results in the orange box and quickly copy the number after “$rid=”. After the second call it should be “2”. In the image above it is “1”.

Add this RID-Number to your Topic to publish. In this case: $iothub/methods/res/200/?$rid=2

Hit “Publish”. The message should pop up below:

In your node window you will get the result:

Enjoy
AndiP

Enable IOT Hub DeviceManagement Preview

I do not know if it is just my portal that is missing the “Enable Device Management PREVIEW” checkbox when creating a new IoT-Hub. It is still described in the article “Tutorial: Get started with device management” by Juan Perez.

You can still create it with an ARM template by setting the “Feature” field to “DeviceManagement”.
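A trimmed sketch of what the relevant resource definition might look like (the exact schema and api-version may differ; the important part is the features value):

{
  "type": "Microsoft.Devices/IotHubs",
  "name": "<yourIotHubName>",
  "apiVersion": "2016-02-03",
  "location": "North Europe",
  "sku": { "name": "S1", "tier": "Standard", "capacity": 1 },
  "properties": {
    "features": "DeviceManagement"
  }
}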

I have written an ARM template for an IoT-Hub with Device Management which you can download here. Be aware though that this only works in certain regions like NorthEurope (but not WestEurope).

Enjoy
AndiP

Direct Methods with IOTHub in C#

There is a new preview in town. With this you can invoke a direct method on a device. Currently only MQTT devices are supported in this scenario. There is a nice article with some NodeJS samples. When Roman Kiss posted on the Azure Forum that he would like to write his simulated device in C#, I thought this might be a nice opportunity to figure out why this does not work.

Well the answer is pretty simple: It is not yet implemented in the C# SDK.

But being me, I decided to make the “impossible” possible (for the fun of it). First I pulled the complete preview of the Azure IoT SDKs from GitHub. Then I spent some time figuring out what the NodeJS implementation does. I love debugging JavaScript *sigh*.

And then I quickly modded (aka hacked) Microsoft.Azure.Devices.Client (be aware that this is not an optimal solution). These are the changes I made:

Microsoft.Azure.Devices.Client – MqttIotHubAdapter

sealed class MqttIotHubAdapter : ChannelHandlerAdapter
{
    ...
    const string TelemetryTopicFormat = "devices/{0}/messages/events/";
    // ADDED =>
    const string MethodTopicFilterFormat = "$iothub/methods/POST/#";
    const string MethodTopicFormat = "$iothub/methods/res/{0}/?$rid={1}";

Microsoft.Azure.Devices.Client – MqttIotHubAdapter – Connect Function

This was the most difficult part to figure out, because I did not expect this “hack”. Expect the unexpected!
async void Connect(IChannelHandlerContext context)
{
    ...
    var connectPacket = new ConnectPacket
    {
        ClientId = this.deviceId,
        HasUsername = true,
        // CHANGED => You need to add this weird suffix to make it work!
        Username = this.iotHubHostName + "/" + this.deviceId + "/DeviceClientType=azure-iot-device%2F1.1.0-dtpreview&api-version=2016-09-30-preview",
        HasPassword = !string.IsNullOrEmpty(this.password),

Microsoft.Azure.Devices.Client – MqttIotHubAdapter – SubscribeAsync Function
Here I added the method topic subscription!

async Task SubscribeAsync(IChannelHandlerContext context)
{
    if (this.IsInState(StateFlags.Receiving) || this.IsInState(StateFlags.Subscribing))
    {
        return;
    }

    this.stateFlags |= StateFlags.Subscribing;

    this.subscribeCompletion = new TaskCompletionSource();
    string topicFilter = CommandTopicFilterFormat.FormatInvariant(this.deviceId);
    var subscribePacket = new SubscribePacket(Util.GetNextPacketId(), new SubscriptionRequest(topicFilter, this.mqttTransportSettings.ReceivingQoS));
    System.Diagnostics.Debug.WriteLine($"Topic filter: {topicFilter}");
    await Util.WriteMessageAsync(context, subscribePacket, ShutdownOnWriteErrorHandler);
    await this.subscribeCompletion.Task;

    // ADDED => We are using the const I declared earlier to construct the topicFilter
    this.subscribeCompletion = new TaskCompletionSource();
    topicFilter = MethodTopicFilterFormat.FormatInvariant(this.deviceId);
    System.Diagnostics.Debug.WriteLine($"Topic filter: {topicFilter}");
    subscribePacket = new SubscribePacket(Util.GetNextPacketId(), new SubscriptionRequest(topicFilter, this.mqttTransportSettings.ReceivingQoS/*QualityOfService.AtMostOnce*/));
    await Util.WriteMessageAsync(context, subscribePacket, ShutdownOnWriteErrorHandler);
    await this.subscribeCompletion.Task;
    // <= ADDED
}

Microsoft.Azure.Devices.Client – MqttIotHubAdapter – SendMessageAsync Function
Since we want to acknowledge the arrival of the method, we need to modify this too:
async Task SendMessageAsync(IChannelHandlerContext context, Message message)
{
    // CHANGED => For our publish message we need to send to a different topic
    string topicName = null;
    if (message.Properties.ContainsKey("methodName"))
        topicName = string.Format(MethodTopicFormat, message.Properties["status"], message.Properties["requestID"]);
    else
        topicName = string.Format(TelemetryTopicFormat, this.deviceId);
    // <= CHANGED

    PublishPacket packet = await Util.ComposePublishPacketAsync(context, message, this.mqttTransportSettings.PublishToServerQoS, topicName);
    ...

Microsoft.Azure.Devices.Client – MqttTransportHandler – ReceiveAsync Function
Since we do not get a lockToken with the method call, we should not enqueue the null in our completionQueue:
public override async Task<Message> ReceiveAsync(TimeSpan timeout)
{
    ...
    Message message;
    lock (this.syncRoot)
    {
        this.messageQueue.TryDequeue(out message);
        message.LockToken = message.LockToken;
        // Changed line below to exclude LockTokens that are null #HACK better check if it is a Method message
        if ((message.LockToken != null) && (this.qos == QualityOfService.AtLeastOnce))
        {
            this.completionQueue.Enqueue(message.LockToken);
        }
        ...

Microsoft.Azure.Devices.Client – Util – ComposePublishPacketAsync
A little change here to prevent this method from “destroying” the topic name we carefully constructed earlier.
public static async Task<PublishPacket> ComposePublishPacketAsync(IChannelHandlerContext context, Message message, QualityOfService qos, string topicName)
{
    var packet = new PublishPacket(qos, false, false);

    // MODIFIED ==>
    if (message.Properties.ContainsKey("methodName"))
        packet.TopicName = topicName; // Make sure to keep our Topic Name
    else
        packet.TopicName = PopulateMessagePropertiesFromMessage(topicName, message);
    // <== MODIFIED
    ...

Microsoft.Azure.Devices.Client – Util – PopulateMessagePropertiesFromPacket
And finally we need to populate our method messages with properties like requestID, methodName, …
public static void PopulateMessagePropertiesFromPacket(Message message, PublishPacket publish)
{
    message.LockToken = publish.QualityOfService == QualityOfService.AtLeastOnce ? publish.PacketId.ToString() : null;

    // MODIFIED ==>
    Dictionary<string, string> properties = null;
    if (publish.TopicName.StartsWith("$iothub/methods"))
    {
        var segments = publish.TopicName.Split('/');
        properties = UrlEncodedDictionarySerializer.Deserialize(segments[4].Replace("?$rid", "requestID"), 0);
        properties.Add("methodName", segments[3]);
        properties.Add("verb", segments[2]);
    }
    else
        properties = UrlEncodedDictionarySerializer.Deserialize(publish.TopicName, publish.TopicName.NthIndexOf('/', 0, 4) + 1);
    // <== MODIFIED

    foreach (KeyValuePair<string, string> property in properties)
    {
        ...

Building the simulated device with the modded Microsoft.Azure.Devices.Client SDK
Just create a new console application and reference the modded SDK:
 
using Microsoft.Azure.Devices.Client;
using System;
using System.Collections.Generic;
using System.Text;

namespace DeviceClientCS
{
    class Program
    {
        private static async void ReceiveCloudToDeviceMessageAsync(DeviceClient client, string theDeviceID)
        {
            Console.WriteLine($"Receiving messages from Cloud for device {theDeviceID}");
            while (true)
            {
                Message receivedMessage = await client.ReceiveAsync();
                if (receivedMessage == null) continue;

                Console.ForegroundColor = ConsoleColor.Yellow;
                Console.WriteLine($"Received method ({receivedMessage.Properties["methodName"]}): {Encoding.ASCII.GetString(receivedMessage.GetBytes())} for device {theDeviceID} - Verb: {receivedMessage.Properties["verb"]}");
                Console.ResetColor();

                // ACKNOWLEDGE the method call and include a small payload in the response
                byte[] msg = Encoding.ASCII.GetBytes("Input was written to log.");
                Message respondMethodMessage = new Message(msg);
                foreach (KeyValuePair<string, string> kv in receivedMessage.Properties)
                    respondMethodMessage.Properties.Add(kv.Key, kv.Value);
                respondMethodMessage.Properties.Add("status", "200");
                await client.SendEventAsync(respondMethodMessage);
            }
        }

        static void Main(string[] args)
        {
            string deviceID = "myDeviceId";
            string connectionString = "<Your device connection string goes here>";

            DeviceClient client = DeviceClient.CreateFromConnectionString(connectionString, deviceID, TransportType.Mqtt);
            ReceiveCloudToDeviceMessageAsync(client, deviceID);
            Console.ReadLine();
        }
    }
}
 
And here is a final screenshot of my results:
Cheers
AndiP