An instance metadata service is a server available to virtual machines hosted on cloud providers (often at http://169.254.169.254/). It provides useful information about the VM itself and its environment, which the VM typically would not otherwise have access to.

It is often used in scripts to configure VM instances and tell them apart, and it helps a great deal in bootstrapping cluster orchestrators such as Kubernetes and Mesos.

In a nutshell, the metadata server works like this:

$ curl http://169.254.169.254/latest/meta-data/public-ipv4
54.165.97.141
$ curl http://169.254.169.254/latest/meta-data/instance-type
t2.micro

In this article, I look at the metadata service offerings of AWS EC2, Google Compute Engine and DigitalOcean and compare them. At the time of writing, Microsoft Azure does not provide a metadata service similar to these.

Table of Contents

  1. DigitalOcean: Highlights
  2. AWS EC2: Highlights
  3. Google Compute Engine: Highlights
  4. Feature Comparison Chart
  5. Performance Benchmarks
  6. Conclusion

1. DigitalOcean: Highlights

Documentation: https://developers.digitalocean.com/…/metadata/

Although DigitalOcean is not a big player or a full-blown cloud provider, their VPS offering is widely adopted and their lean approach to cloud instances (droplets) is very practical to use.

Good:

  • It is minimalist, simply because the environment is. It provides user-data (cloud-init), public IP, region etc.
  • The directory queries can be retrieved as JSON (instead of plain text) if you append .json to the URL.

Bad:

  • It could’ve made a DigitalOcean API token available on the metadata service to automate certain operations (such as scale up) within the droplet.
  • No dynamic metadata. user-data cannot be changed after the droplet is created.
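
To make the two request styles concrete, here is a minimal sketch in shell. The function and variable names are made up for this example and are not part of any DigitalOcean tooling; the /metadata/v1 prefix is the one used in the benchmark later in this article, and the sketch assumes that appending .json to that prefix returns the whole tree as JSON.

```shell
# Hypothetical helpers; names are illustrative only.
DO_METADATA_BASE="http://169.254.169.254/metadata/v1"

# Print the URL for a single plain-text key, e.g. "id" or "region".
do_metadata_url() {
  printf '%s/%s\n' "$DO_METADATA_BASE" "$1"
}

# Print the URL for the JSON form: the base URL with ".json" appended.
do_metadata_json_url() {
  printf '%s.json\n' "$DO_METADATA_BASE"
}

# Fetch a key with curl. This only works from inside a droplet.
do_metadata() {
  curl -s "$(do_metadata_url "$1")"
}
```

From inside a droplet, do_metadata region should print the region, and curl -s "$(do_metadata_json_url)" should return the JSON document.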

2. AWS EC2: Highlights

Documentation: https://docs.aws.amazon.com/…/ec2-instance-metadata.html

Amazon Web Services was pretty much the first player in the cloud market. In fact, they might as well be the ones who invented the whole concept of an “instance metadata service” and the convention of serving it at the IP address 169.254.169.254.

Although it is very much the de facto standard of metadata services, I found it dated and not really dynamic.

Good:

  • The ami-launch-index field numbers the instances (0, 1, 2, …) when multiple instances of the same AMI are launched. This is only marginally useful.
  • If there are IAM roles associated with the instance, security credentials are available on the metadata service, which rotates them automatically.
    • However, the metadata service does not take certain measures to protect them (read on for what GCE does).
  • AWS allows you to disable the metadata service for a VM (a fairly reasonable requirement for security and such).

Bad:

  • Tags provided for the instance on EC2 Management Console are not available on the metadata service. I wonder why.
  • AWS CLI does not automatically authenticate even though credentials are perhaps available in the metadata service. (EDIT: turns out this was my incompetence getting IAM roles right, it actually works)
  • Confusing versioning: there are version numbers like 1.0 and 2015-01-05, with no obvious way to tell which one is the newest. Luckily you can just say latest in the URL.
  • It seems like there is a JSON endpoint instance-identity/document, but it looks like a soup rather than a well-organized JSON document.
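
As a rough illustration of the versioned URL layout and the endpoints mentioned above, the following shell sketch builds EC2 metadata URLs. The function names are hypothetical, and the sketch assumes the instance accepts plain GET requests, as in the curl examples at the top of this article.

```shell
# Hypothetical helpers; names are illustrative only.
# $1 = path under the version root, $2 = API version (defaults to "latest").
ec2_metadata_url() {
  printf 'http://169.254.169.254/%s/%s\n' "${2:-latest}" "$1"
}

# Fetch a path with curl. This only works from inside an EC2 instance.
ec2_metadata() {
  curl -s "$(ec2_metadata_url "$@")"
}
```

From inside an instance, ec2_metadata meta-data/iam/security-credentials/ should list the attached role name (appending that name to the path returns the temporary credentials), ec2_metadata dynamic/instance-identity/document returns the JSON identity document, and ec2_metadata meta-data/instance-id 1.0 pins the request to version 1.0 instead of latest.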

3. Google Compute Engine: Highlights

Documentation: https://cloud.google.com/compute/docs/metadata

Maybe it’s the advantage of being the last one joining the party, but GCE’s metadata service is just perfect. It provides a great deal of flexibility, it is very dynamic and yet still not rocket science.

Good:

  • Google allows you to set dynamic project-wide metadata (key-value pairs, up to 32k). Any project metadata is available to all VMs within the project. Imagine this as the shared metadata among members of a machine cluster.
  • Also, you can set custom instance metadata (k/v pairs and tags) on the instance and these will be available to the VM within 10 seconds. The “dynamic” aspect is a key differentiator.
  • The gcloud command-line tool automatically authenticates and works out of the box when the VM is provisioned (for instance, you can delete the VM you are currently on). This is very neat.
  • Speaking of dynamic metadata, if you provide ?wait_for_change=true, the metadata service holds your request open and returns a response only when something changes (such as a new tag being added or the VM migration policy being changed), although I could not get it working with external IP changes.
  • The metadata service makes transparent maintenance notices available when your VM is about to get rebooted or migrated. You can subscribe to these using ?wait_for_change=true.
  • Like AWS EC2, GCE also makes service credentials available on the metadata service (such as Storage, BigQuery) and it rotates these keys automatically.
  • As a security measure, to prevent accidental proxied access to the metadata service, it refuses to respond to queries containing the X-Forwarded-For header. I think it is a nice touch.
  • As with DigitalOcean, you can get a JSON response, here by adding ?recursive=true to your request (although this does not work for tokens in instance/service-accounts/).

Bad:

  • You have to provide the Metadata-Flavor: Google header on every request. I am not sure why this is needed.
  • There is an instance/virtualClock endpoint that is not documented. No big deal.
  • The VM description is available on the metadata service, but the disk description is not.
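
The request patterns described in this section can be sketched in shell as follows. The function and variable names are made up for this example. The host name is the one used in the benchmark later in this article, and the loop assumes that curl exits with a non-zero status only when the request itself fails.

```shell
# Hypothetical helpers; names are illustrative only.
GCE_METADATA_BASE="http://metadata.google.internal/computeMetadata/v1"

# $1 = path such as "instance/tags", $2 = optional query string.
gce_metadata_url() {
  printf '%s/%s%s\n' "$GCE_METADATA_BASE" "$1" "${2:+?$2}"
}

# Fetch with curl, always sending the mandatory header.
# This only works from inside a GCE VM.
gce_metadata() {
  curl -s -H 'Metadata-Flavor: Google' "$(gce_metadata_url "$@")"
}

# Block until the value at $1 changes, print the new value, and repeat.
# The loop ends when curl itself fails.
gce_watch() {
  while value="$(gce_metadata "$1" 'wait_for_change=true')"; do
    printf '%s\n' "$value"
  done
}
```

From inside a VM, gce_watch instance/tags should print the tag list each time it changes, gce_watch instance/maintenance-event does the same for maintenance notices, and gce_metadata instance/ 'recursive=true' returns the instance metadata as one JSON document. Note that a change landing between two requests could be missed by a loop this simple.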

4. Feature Comparison Chart

Feature                   DO    AWS   GCE
cloud-init                Yes   Yes   Yes
External IP               Yes   Yes   Yes
SSH Public Keys           Yes   Yes   Yes
Region/Zone               Yes   Yes   Yes
Disks                     N/A   Yes   Yes
Machine type/size         No    Yes   Yes
Dynamic custom metadata   No    No    Yes
Watch for changes         No    No    Yes
Security credentials      No    Yes   Yes
JSON response format      Yes   Meh   Yes
Ability to disable        No    Yes   No

5. Performance Benchmarks

Metadata services are often meant to be used only once to bootstrap things, or maybe a few times a day, so you don’t really care about performance. However, out of curiosity, I tested the performance of these metadata services by sending each one 10,000 requests (100 in parallel) to see how it performed.

DigitalOcean applied some form of throttling (presumably based on an undocumented rate limit) in some test runs, but it usually recovered quickly afterwards.

$ ./boom -c 100 -n 10000 http://169.254.169.254/metadata/v1/id

Summary:
  Total:	4.1009 secs.
  Slowest:	0.2282 secs.
  Fastest:	0.0086 secs.
  Average:	0.0406 secs.
  Requests/sec:	2438.4929
  Total Data Received:	70000 bytes.
  Response Size per Request:	7 bytes.

Status code distribution:
  [200]	10000 responses

Response time histogram:
  0.009 [1]   |
  0.031 [2931]|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.053 [5418]|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.075 [1132]|∎∎∎∎∎∎∎∎
  0.096 [339] |∎∎
  0.118 [78]  |
  0.140 [6]   |
  0.162 [4]   |
  0.184 [20]  |
  0.206 [58]  |
  0.228 [13]  |

Google Compute Engine performs really well at this concurrency level. When I bump up the load and the concurrency, a long tail starts to show up and the server gets slower, as expected. I observed no explicit throttling.

$ ./boom -c 100 -n 10000 -h 'X-Google-Metadata-Request:True' 'http://metadata.google.internal/computeMetadata/v1/instance/id'

Summary:
  Total:	1.7962 secs.
  Slowest:	0.2097 secs.
  Fastest:	0.0045 secs.
  Average:	0.0178 secs.
  Requests/sec:	5567.3540
  Total Data Received:	200000 bytes.
  Response Size per Request:	20 bytes.

Status code distribution:
  [200]	10000 responses

Response time histogram:
  0.005 [1]   |
  0.025 [9387]|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.046 [610] |∎∎
  0.066 [0]   |
  0.087 [0]   |
  0.107 [0]   |
  0.128 [0]   |
  0.148 [0]   |
  0.169 [0]   |
  0.189 [0]   |
  0.210 [2]   |

The AWS EC2 Instance Metadata Service performed far worse than the others under load and frequently returned HTTP 429 Too Many Requests responses. I only managed to get a fully successful run once I lowered the concurrency level to below 10.

$ ./boom -c 100 -n 10000 http://169.254.169.254/latest/meta-data/instance-id

Summary:
  Total:	45.6048 secs.
  Slowest:	7.4325 secs.
  Fastest:	0.0006 secs.
  Average:	0.4474 secs.
  Requests/sec:	218.1568
  Total Data Received:	2859403 bytes.
  Response Size per Request:	287 bytes.

Status code distribution:
  [200]	2086 responses
  [429]	7863 responses

Response time histogram:
  0.001 [1]   |
  0.744 [6570]|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  1.487 [3068]|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  2.230 [51]  |
  2.973 [8]   |
  3.717 [198] |∎
  4.460 [6]   |
  5.203 [0]   |
  5.946 [0]   |
  6.689 [2]   |
  7.433 [45]  |

Error distribution:
  [51]	Get http://169.254.169.254/latest/meta-data/instance-id: EOF
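
The AWS summary is easier to read with a little arithmetic. The sketch below uses a made-up helper, rps, to divide a request count by the total run time from the output above. The three counts are all 10,000 requests sent, the 9,949 that received any response (2086 + 7863), and the 2086 that received a 200.

```shell
# Hypothetical helper: requests per second from a count and a duration in seconds.
rps() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

rps 10000 45.6048   # every request sent
rps 9949 45.6048    # requests that got any HTTP response
rps 2086 45.6048    # requests that got a 200
```

This prints roughly 219, 218 and 46 requests per second. The reported Requests/sec figure appears to match the second number, so the throughput of successful requests was only around 46 per second.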

6. Conclusion

It’s clear that the Google Compute Engine instance metadata service is well thought out and carefully designed. I can see it being useful in many scenarios such as cluster bootstrapping.

AWS EC2 and DigitalOcean do not support custom metadata and are not very dynamic, which has been a big turn-off for me.

I appreciate any comments, discussion and possibly comparisons with other environments such as OpenStack Nova.


Update: Made several fixes to the article based on Alex Yukhanov’s comments.