Digging into CosmosDB storage

**updated section ‘Will it work the other way around’ on 25.07.2017 **

Well, not how the data is stored internally, but rather how CosmosDB seems to handle data that is stored and accessed via Graph, Table or MongoDB API. Each of these collections/graphs that have been created with the new available APIs can be still accessed with the “native” DocumentDB SQL API.  To date it remains a mystery for me if the CosmosDB Team just uses the “classic” api to provide all these alternate APIs on top or if uses some magic behind.

Please note that while accessing CosmosDB Graph/Table/MongoDB data  with DocumentDB SQL is quite interesting it is not something to use in production and probably not supported by Microsoft. Microsoft might at any time change their way of storing this data in CosmosDB and your code might break.

The first three sections will describe “Graph-API”, “Table-API” and “MongoDB-API”. The 4th section explains how you can change the visual portal experience in Azure and in the 5th section I try to do create documents with DocumentDB API and try to query them with the graph API.

Graph-API

imageTo illustrate this on the Graph API, I created a simple graph by executing a series of commands inside the Graph Explorer.  For this article I only use two vertices (one with an additional property) and one edge which connects both vertices (see the upper two vertices in the image):

  • g.addV(‚place‘).property(’name‘,’Rivendell‘);
  • g.addV(‚place‘).property(’name‘,’Elronds Haus‘).property(‚FoodRating‘,’Delicious’);
  • g.V().has(’name‘,’Rivendell‘).addE(‚path‘).to(V().has(’name‘,’Elronds Haus‘)).property(‚weight‘,1.0).property(‚level‘,’difficult‘);

imageIn parallel I use Azure DocumentDB Studio to connect to my graph with the regularly DocumentDB SQL API. If we select “Elronds Haus” in Graph Explorer we can see the Vertex-ID and the properties to this vertex.

In Azure DocDB Studio we can now issue a query on the collection to reveal the vertex for “Elronds Haus”. To reduce complexity I removed the internal fields like _ts, _etag,_self,… in the images.

  • select * from c where c.id=“ae5668e1-0e29-4412-b3ea-a84b2eb68104″

 

imageId and Label of the vertex is just stored as normal JSON fields, but the properties are stored as a combination of _value and some unique property id field. The edge interestingly stores it’s properties different and more easy to query with DocDB SQL.

image

Where we can find all paths with a weight of 1 easy with

  • select * from c where c.label=“path“ and c.weight=1

we need to issue a more complex query to find a specific vertex by a property value. I’m not sure why they decided to store these properties as array, but maybe this is required for some graph functionality I am not aware of yet.

  • SELECT * FROM c WHERE c.label = „place“  AND c.name[0]._value=“Elronds Haus“
  • SELECT v FROM v JOIN c IN v.name WHERE v.label = „place“  AND c._value = „Elronds Haus“

The edges themselves can be easily discovered by querying the “_isEdge” field. The linked vertices for the edge are stored in the following fields:

  • _sink, _sinkLabel… Id and Lable of the IN-Vertex
  • _vertexId, _vertexLabel… Id and lable of the OUT-Vertex

In this video Rimma Nehme (@rimmanehme) mentioned at 37:20 that the SQL-Extensions will be available at a later point in time enabling you to query the graph with SQL instead of gremlin.

Table API

In this case I use the new Table API of CosmosDB. While you have the same namespace and same API in .NET you need to replace the old Storage nugget package with a new one to have it work.

imageI created three instances of PersonEntity that derive from TableEntity as you would expect with the Storage Table API. The store the race of the person as partitionkey and a combination of first and last name as row key. As soon as you create a new table with the CreateIfNotExistsAsync() method a new database “TablesDB” will be created in CosmosDB with a collection named after your table.

imageKeep in mind that the API will create a new collection for every table! It might be better from a cost perspective to store various kinds of documents into one collection!

As soon as we add the entities to the table we can see them in DocumentDB Studio. Because we used a combination of first and last name as rowkey we can see that the rowkey repesents the “id” field of the CosmosDB entry.

While you can now query more properties than just RowKey and ParitionKey always use the PartitionKey to avoid costly partition scans! You could do a LINQ query like this:

image

imageNow lets load one document in DocumentDB Studio and examine how the data is stored for TableAPI. Again I removed all the internal properties like _ts,_rid,…

What instantly pops into our eye is the use of the $ sign which will cause some trouble constructing DocDB SQL statements as we will see.  Like in the graph API we have multiple fields defining a property. I find this more approachable as this naming reduces the size of the document. (“_value” vs “$v”). The partitionkey is stored as $pk.

CosmosDB stores the type of the properties within it’s $t field. Where 2=string, 16=integer, 1=double.

To query for Bilbo we need to escape the $ character in our query:

  • SELECT * from p where p[‚FirstName‘][‚$v‘] = ‚Bilbo‘ and p[‚$pk‘]=’Hobbit‘

imageTo query the document with LINQ you need to build the entity like in the image. Then you create a typed LINQ query:

  • var queryable2 = client.CreateDocumentQuery<PersonT>(collection.SelfLink, feedOptions).Where( doc => (doc.FirstName.v==“Bilbo“) );

 

MongoDB

First we will create two simple entries with the native MongoDB Client where I add two documents. The second document also uses an ISOData type for date/time. You can see that MongoDB also stores the ID as ObjectId type.

image

There seem to be some issues with other BSON Types though. For example there is an article mentioning some compatibility issues with SiteCore and CosmosDB MongoDB API and I believe it is related to the yet unsupported BSON Type BsonInt32. As far as I have seen (I lost the article in the web Sad smile) currently only ObjectId and ISODate are supported types in CosmosDB MongoDB API.

imageAgain if we now examine those two documents in Azure DocumentDB Studio we can see that id is stored twice. First as “id” and second as [“_id”][“$oid”]. Another way to declare the data type of fields. The field of type ISODate is stored as EPOCH value.

 

 

Switching the portal experience

This will with all APIs except with MongoDB. The reason for this might be legacy Smile. If you take a look at the ARM-Template to create GraphAPI, TableAPI, MongoAPI and DocumentDB API you will notice that while as CosmosDB with MongoAPI has set “kind” property to MongoDB, all others have set it to GlobalDocumentDB.

imageimage

imageAll other APIS rely on the  tags collection within the resource definition. So to change the Table Experience to Graph Experience is to remote the “defaultExperience:Table” tag  and add a new “defaultExperience:Graph” in the portal and reload the page.

image

Will it work the other way around? ** updated 25.07.2017 **

Now we have figured that out, I wonder if I can take a “normal” DocumentDB API collection, fill it with data that looks like what we have created with GraphAPI. Then change the experience to GraphAPI and see if we can access the data via Gremlin.

For that purpose I have set up a brand new CosmosDB with DocumentDB API “demobuilddocs” in the portal. I am again using the Document DB Studio to create a new collection and three documents to it (You can download the documents here!).

Expressed in gremlin this would be (25.07.2017: replaced ö,ü with oe and ue):

  • g.addV(‚place‘).property(’name‘,’Hobbithoehle‘); 
  • g.addV(‚place‘).property(’name‘,’Gasthaus Zum gruenen Drachen‘); 
  • g.V().has(’name‘,’Hobbithöhle‘).addE(‚path‘).to(V().has(’name‘,’Gasthaus Zum gruenen Drachen‘)).property(‚weight‘,2.0);

imageIn DocumentDB Studio I create a new single partitioned collection “democol” with 400 RU/s for imagedemobuilddocs”. Then I create the three documents with CTRL+N (Create document in context menu of collection). So that’s it.

Finally we will change the defaultExperience tag for “demobuilddocs” in the portal to “Graph”:image

Refresh the portal page and navigate to the Data Explorer (Preview). Et voila Smile:

image

Next try that with GraphExplorer and it works all fine as well.

image 

imageNow lets try that with the cloud bash and gremlin client. (Spoiler: Will break! – No it won’t. It would break if we used ö,ü,… in the JSON). First I copy my hobbit.yaml connection configuration to doccb.yaml and edit it with nano to point to  the graph url “demobuilddocs.graphs.azure.com”. Please note that GraphExplorer which uses the .NET CosmosDB SDK will connect to “demobuilddocs.documents.azure.com”. Then I add the path to my collection and the primary key as password (I have truncated that for security reasons).

image

Now I run my gremlin client (I have installed that with a tar ball in my cloud bash) and connect to my graph database:

image

And lets query for edges and see how that works.

image

**updated 25.07.2017**
Now when we read the vertices we will get the correct result.

image

Now lets try that with vertices and see it break with an Decoder Exception. It is missing some strange close marker for OBJECT.
If however our JSON contains mutated vowels (Umlaute) the the decode will fail with an exception:

image

A .NET implementation btw like GraphExplore can handle the mutated vowels without any problem. But you might want to look out for this error in a Java Client. If you examine the exception you can see that the resulting json misses some closing brackets.

image

This will need some further inspection, but I am closing out my article for today. Stay tuned,…

P.S: If you create a new vertex in the gremlin console, you can query that with no problems. But if you replace that document by just changing the value of a property with DocumentDB Studio you have the same error if you query the vertex with the gremlin console. Obviously Microsoft is storing more data than meets the eye Smile. On the other hand it is interesting to see the .NET Client SDK to work.

Keep digging and have a great weekend

AndiP

CosmosDB – Build 2017 Recap Vienna

AzureCosmosDBViennaSpeakerAuf der Veranstaltung //BUILD on Tour gestern bei Microsoft Österreich durfte ich einen Vortrag zum Thema “CosmosDB” halten.

Die Slides zu meinem CosmosDB Vortrag könnt ihr hier herunterladen. Auf GitHub findet ihr meinen Source Code zum CosmosDB Vortrag und hier das Azure CosmosDB Graph Explorer Beispiel.

Hier auch die Gremlin Queries für das Beispiel aus The Hobbit.

Aja, nachdem es für einige Verwirrung gesorgt hat. In CosmosDB gibt es nur Request Units (RU) pro Sekunde/Minute und keine Orks Smile. Obwohl es hätte was.

Bester Sager eines Teilnehmers in dem Vortrag zu den UWP Apps: “Helene Fischer? Naja ohne Ton gehts” *haha*

Viel Spaß

AndiP