Introducing blobMetaDb

Recently I have been working on an idea for an open source .NET library for Windows Azure. Check out blobmetadb on GitHub.

blobmetadb watches your application’s requests to Azure Blob Storage (Microsoft’s counterpart to S3, in case you’re not familiar) and keeps a record of the blobs you upload. Because it keeps blob metadata in a local database (Redis), you can enumerate blobs and compute the total size of groups of blobs very quickly. As an example use case, assume you have a Dropbox-like application that stores files uploaded by your users in Azure Blob Storage. To compute the total disk space a user consumes, you can query the local database instead of making a call to the cloud.
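
To make the use case concrete, here is a minimal sketch of the per-user size query. It assumes blobs are named with a per-user prefix such as `users/alice/`, and it uses a plain dictionary as a stand-in for the local metadata store; `UsageSketch` and `TotalBytes` are hypothetical names for illustration, not part of blobmetadb.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UsageSketch
{
    // Sums the sizes of all blobs whose name starts with the given prefix.
    // `blobSizes` stands in for the local metadata store (blob name -> size in bytes);
    // it is an in-memory dictionary here, not blobmetadb's real Redis-backed store.
    public static long TotalBytes(IReadOnlyDictionary<string, long> blobSizes, string prefix)
    {
        return blobSizes
            .Where(entry => entry.Key.StartsWith(prefix, StringComparison.Ordinal))
            .Sum(entry => entry.Value);
    }

    public static void Main()
    {
        var blobSizes = new Dictionary<string, long>
        {
            ["users/alice/photo.jpg"] = 2048,
            ["users/alice/notes.txt"] = 512,
            ["users/bob/video.mp4"] = 10000,
        };

        // No request to the cloud is made; everything is answered locally.
        Console.WriteLine(TotalBytes(blobSizes, "users/alice/")); // prints 2560
    }
}
```

The same lookup against the cloud would need a List Blobs request (possibly several, for large containers) before any sizes could be added up.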

@ahmetalpbalkan oooh. Does it handle file sizes for uploaded stuff? Could theoretically use this to track user account storage limits.
— Justin Williams (@justin) March 12, 2014

This way, your requests are served lightning fast (about 300x faster in the benchmark below), and you are not billed for requests like Blob Exists, Container Exists, List Blobs, etc. This reduces your costs if you make such queries at high volume.

Check out the README and wiki on the GitHub repository to get a better idea of it. Basically, instead of calling the bare Storage Client like:

client.GetContainerReference(name).Create();

you use the operationContext parameter to plug blobmetadb in and let it monitor the API traffic:

OperationContext context = new OperationContext();
context.ResponseReceived += blobTracker.UseReceivedResponseAsync;
client.GetContainerReference(name).Create(operationContext: context);

How it works

While building this, I was inspired by mimicdb, which does the same thing for boto, the Python S3 client. It has a neat implementation, since it simply overrides methods of the boto classes. I couldn’t do the same thing because the Azure Storage Client classes are sealed by design. However, the hooks Azure provides for monitoring API requests are sufficient for this use case, since we are only collecting metadata, which is available in the request/response URI and headers.
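
As a rough illustration of why the URI and headers are enough, the sketch below derives the container name, blob name, and size of a single-shot upload from nothing but the HTTP method, the request URI, and the `Content-Length` header. `BlobMeta`, `MetaSketch`, and `FromUpload` are hypothetical names, and this is not how blobmetadb itself is implemented; it also ignores details such as block uploads and the storage emulator’s path-style URIs.

```csharp
using System;
using System.Collections.Generic;

public sealed class BlobMeta
{
    public string Container { get; set; }
    public string Name { get; set; }
    public long Size { get; set; }
}

public static class MetaSketch
{
    // Hypothetical helper (not part of blobmetadb or the Azure SDK): builds metadata
    // for a single-shot blob upload from the method, URI and headers alone.
    // Returns null when the request does not look like such an upload.
    public static BlobMeta FromUpload(string method, Uri uri, IDictionary<string, string> headers)
    {
        if (!string.Equals(method, "PUT", StringComparison.OrdinalIgnoreCase)) return null;

        // Skip sub-operations such as Put Block or Set Metadata (?comp=...).
        if (uri.Query.Contains("comp=")) return null;

        // Blob URIs look like https://<account>.blob.core.windows.net/<container>/<blob name>
        string path = Uri.UnescapeDataString(uri.AbsolutePath).TrimStart('/');
        int slash = path.IndexOf('/');
        if (slash <= 0 || slash == path.Length - 1) return null; // container-level request

        string rawLength;
        long size;
        if (!headers.TryGetValue("Content-Length", out rawLength) || !long.TryParse(rawLength, out size))
            return null;

        return new BlobMeta
        {
            Container = path.Substring(0, slash),
            Name = path.Substring(slash + 1),
            Size = size,
        };
    }

    public static void Main()
    {
        var uri = new Uri("https://myaccount.blob.core.windows.net/photos/2014/cat.jpg");
        var headers = new Dictionary<string, string> { { "Content-Length", "2048" } };

        BlobMeta meta = FromUpload("PUT", uri, headers);
        Console.WriteLine(meta.Container + " / " + meta.Name + " / " + meta.Size);
        // prints: photos / 2014/cat.jpg / 2048
    }
}
```

A record like this is all the local store needs in order to answer existence, listing, and size queries later.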

Performance

Here is a quick benchmark showing how much faster the “List Blobs in a Container” operation is than the equivalent API request:

If you use the bare Storage Client to list blobs in a container of 20,000 blobs:

var account = new CloudStorageAccount (new StorageCredentials ("ACCOUNT", "KEY"), true);
var client = account.CreateCloudBlobClient ();
var blobs = client.GetContainerReference ("containerName").ListBlobs ();
>>> Took 15.701 seconds.

but if you use the Storage Client with blobmetadb:

var blobTracker = new BlobMetaTracker (DiscoveryMode.AllRequests, new RedisBlobMetadataStore (), account);
var blobs = blobTracker.ListBlobsAsync("containerName").Result;
>>> Took 0.0491 seconds.

So it’s really about 300x faster (15.701 / 0.0491 ≈ 320)!

There are strategies in place to reduce the read/write load on the blob metadata store the library works with (e.g. Redis).

More info

Hit me up by email or on Twitter if you like the idea and would like to discuss it; I would be glad to. You’re welcome to submit pull requests and contribute in other ways.

Disclaimer

Please note that this project has nothing to do with Microsoft, nor is it supported or endorsed by Microsoft Windows Azure. This is just a side project of mine.
