An instance metadata service is a server available to virtual machines hosted on cloud providers (often at http://169.254.169.254/). It provides useful information about the VM itself and its environment that the VM typically would not otherwise have access to.
It is often used in scripts to configure VM instances and distinguish them from each other, and it helps a great deal in bootstrapping cluster orchestrators such as Kubernetes and Mesos.
In a nutshell, the metadata server works like:
    $ curl http://169.254.169.254/latest/meta-data/public-ipv4
    126.96.36.199
    $ curl http://169.254.169.254/latest/meta-data/instance-type
    t2.micro
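The same lookup can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the helper names `metadata_url` and `fetch_metadata` are made up for this example, and the key path is the EC2-style one from the curl commands above.

```python
import urllib.request

# Link-local address used in the curl examples of this article.
METADATA_BASE = "http://169.254.169.254"


def metadata_url(path, base=METADATA_BASE):
    """Join the base URL and a key path such as 'latest/meta-data/instance-type'."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def fetch_metadata(path, base=METADATA_BASE, timeout=2.0):
    """Return the plain-text value stored under `path`.

    The short timeout keeps the call from hanging for long when the code
    runs somewhere that has no metadata server (a laptop, for instance).
    """
    with urllib.request.urlopen(metadata_url(path, base), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

On a VM, `fetch_metadata("latest/meta-data/instance-type")` would be the equivalent of the second curl command.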
In this article, I compare the metadata service offerings of AWS EC2, Google Compute Engine and DigitalOcean. At the time of writing, Microsoft Azure does not provide a metadata service similar to these.
Table of Contents
- DigitalOcean: Highlights
- AWS EC2: Highlights
- Google Compute Engine: Highlights
- Feature Comparison Chart
- Performance benchmarks
1. DigitalOcean: Highlights
Although DigitalOcean is not a big player or a full-blown cloud provider, their VPS offering is widely adopted and their lean approach to cloud instances (droplets) is very practical to use.
- It is minimalist, simply because the environment is. It provides user-data (cloud-init), public IP, region etc.
- The directory queries can be retrieved as JSON (instead of plain text) if you append .json to the URL.
- It could’ve made a DigitalOcean API token available on the metadata service to automate certain operations (such as scale up) within the droplet.
- No dynamic metadata. user-data cannot be changed after the droplet is created.
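The JSON variant mentioned above could be consumed along these lines. This is a sketch under assumptions: the root path `metadata/v1` is taken from the benchmark command later in this article, and the function names are hypothetical.

```python
import json
import urllib.request

# Root of the droplet metadata tree, as used in the benchmark further down.
DO_METADATA_ROOT = "http://169.254.169.254/metadata/v1"


def json_url(directory_url):
    """Append '.json' to a directory URL, tolerating a trailing slash."""
    return directory_url.rstrip("/") + ".json"


def parse_directory(body):
    """Decode a JSON directory listing into a Python dict."""
    doc = json.loads(body)
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object at the directory level")
    return doc


def fetch_directory(directory_url=DO_METADATA_ROOT, timeout=2.0):
    """Fetch a whole directory in one request instead of one key at a time."""
    with urllib.request.urlopen(json_url(directory_url), timeout=timeout) as resp:
        return parse_directory(resp.read().decode("utf-8"))
```

Getting the whole tree as one document saves a round trip per key, which is handy in a bootstrap script that needs several values.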
2. AWS EC2: Highlights
Amazon Web Services was pretty much the first player in the cloud market, in fact they might as well be the ones who invented the whole concept of “instance metadata service” and the IP address 169.254.169.254.
Although it is very much the de facto standard of metadata services, I found it neither modern enough nor really dynamic.
- It provides an index field (goes on like 0, 1, 2, …) when multiple instances of the same AMI are launched. This is only slightly useful.
- If there are IAM roles associated with the instance, security credentials are available on the metadata service and it rotates them automatically.
- However the metadata service does not take certain measures to protect them (read on for what GCE does).
- AWS allows you to disable metadata service for a VM (fairly reasonable requirement for security and such).
- Tags provided for the instance on EC2 Management Console are not available on the metadata service. I wonder why.
- AWS CLI does not automatically authenticate even though credentials are perhaps available in the metadata service. (EDIT: turns out this was my incompetence getting IAM roles right, it actually works)
- Confusing versioning: they have version numbers like 2015-01-05, with no way to tell which one is the newest. Luckily you can just say latest in the URL.
- It seems like there is a JSON endpoint, but it looks like a soup rather than a well-organized JSON document.
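To make the IAM role credentials bullet more concrete, here is a rough sketch of reading the rotating credentials. The path and the field names (`AccessKeyId`, `SecretAccessKey`, `Token`, `Expiration`) are not from this article; they follow the EC2 documentation as I understand it, so treat them as assumptions. The helper names and the five-minute margin are made up for illustration, and the sketch deliberately leaves the HTTP call out.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumed location of role credentials below the metadata root.
CREDENTIALS_PATH = "latest/meta-data/iam/security-credentials/"


def credentials_url(role_name, base="http://169.254.169.254"):
    """Build the URL of the credentials document for one IAM role."""
    return base.rstrip("/") + "/" + CREDENTIALS_PATH + role_name


def parse_credentials(body):
    """Pick the useful fields out of the credentials JSON document."""
    doc = json.loads(body)
    expiration = datetime.strptime(
        doc["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "access_key_id": doc["AccessKeyId"],
        "secret_access_key": doc["SecretAccessKey"],
        "token": doc["Token"],
        "expiration": expiration,
    }


def needs_refresh(creds, now=None, margin=timedelta(minutes=5)):
    """True when the credentials expire within `margin` and should be re-read."""
    now = now or datetime.now(timezone.utc)
    return creds["expiration"] - now <= margin
```

Because the service rotates the keys, a long-running process would re-read the document whenever `needs_refresh` returns true instead of caching the keys forever.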
3. Google Compute Engine: Highlights
Maybe it’s the advantage of being the last one joining the party, but GCE’s metadata service is just perfect. It provides a great deal of flexibility, it is very dynamic and yet still not rocket science.
- Google allows you to set dynamic project-wide metadata (key-value pairs, up to 32k). Any project metadata is available to all VMs within the project. Imagine this as the shared metadata among members of a machine cluster.
- Also, you can set custom instance metadata (k/v pairs and tags) on the instance and these will be available to the VM within 10 seconds. The “dynamic” aspect is a key differentiator.
- The gcloud command-line tool automatically authenticates and works out of the box when the VM is provisioned (for instance, you can delete the VM you are currently on). This is very neat.
- Speaking of dynamic metadata, if you provide
, the metadata service holds your request open and returns a response only when something changes (such as a new tag being added or the VM migration policy being changed), although I could not get it working with external IP changes.
- The metadata service makes transparent maintenance notices available when your VM is about to get rebooted or migrated. You can subscribe to these using
- Like AWS EC2, GCE also makes service credentials available on the metadata service (such as Storage, BigQuery) and it rotates these keys automatically.
- As a security measure, to prevent accidental proxied access to the metadata service, it refuses to respond to queries containing the
header. I think it is a nice touch.
- Like DigitalOcean, you can get a JSON response by adding
to your request (although this does not work for tokens in
- You have to provide
header all the time. I am not sure why this is needed.
- There is an
endpoint that is not documented. No big deal.
- The VM description is available on the metadata service, but the disk description is not.
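The "wait until something changes" behaviour described in the list above could be used roughly as follows. This is a sketch under assumptions: the `wait_for_change=true` query parameter is my reading of the GCE documentation and does not appear in this article, the header is the one from the benchmark command further down, and `build_watch_request` and `watch` are hypothetical helper names.

```python
import urllib.parse
import urllib.request

GCE_BASE = "http://metadata.google.internal/computeMetadata/v1"
# Header copied from the benchmark command in this article; pass a different
# dict via `headers` if your environment expects another one.
ARTICLE_HEADER = {"X-Google-Metadata-Request": "True"}


def build_watch_request(path, wait_for_change=True, base=GCE_BASE, headers=None):
    """Build a GET request that (optionally) blocks until the value changes."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    if wait_for_change:
        # Assumed parameter name, see the note above.
        url += "?" + urllib.parse.urlencode({"wait_for_change": "true"})
    return urllib.request.Request(url, headers=dict(headers or ARTICLE_HEADER))


def watch(path, handle, max_events=None):
    """Call handle(new_value) each time the server answers the hanging GET."""
    seen = 0
    while max_events is None or seen < max_events:
        # No timeout on purpose: the server holds the request until a change.
        with urllib.request.urlopen(build_watch_request(path)) as resp:
            handle(resp.read().decode("utf-8"))
        seen += 1
```

A script on the VM could call `watch("instance/tags", print)` to react to tag changes without polling.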
4. Feature Comparison Chart
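| Feature | DigitalOcean | AWS EC2 | Google Compute Engine |
|---|---|---|---|
| SSH Public Keys | Yes | Yes | Yes |
| Dynamic custom metadata | No | No | Yes |
| Watch for changes | No | No | Yes |
| JSON response format | Yes | Meh | Yes |
| Ability to disable | No | Yes | No |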
5. Performance benchmarks
Metadata services are often meant to be used only once to bootstrap things, or maybe a few times a day, so you don’t really care about performance. However, out of curiosity, I tested the performance of these metadata services by sending 10,000 requests (100 requests in parallel) to see how they perform.
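The measurements below were taken with boom. For readers who want to reproduce the idea without it, here is a rough Python sketch of the same kind of test (a fixed number of requests issued by a fixed number of parallel workers). It is not how boom works internally, and `http_status` and `run_load` are names invented for this example.

```python
import http.client
import time
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET, or 'error' on transport failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # non-2xx answers still count as responses
    except (OSError, http.client.HTTPException):
        return "error"


def run_load(fetch, total, concurrency):
    """Call fetch() `total` times from `concurrency` threads and summarize."""

    def timed_call(_):
        start = time.perf_counter()
        status = fetch()
        return status, time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total)))
    wall = time.perf_counter() - wall_start

    latencies = [elapsed for _, elapsed in results]
    return {
        "total_secs": wall,
        "requests_per_sec": total / wall if wall > 0 else float("inf"),
        "fastest": min(latencies),
        "slowest": max(latencies),
        "average": sum(latencies) / len(latencies),
        "status_counts": Counter(status for status, _ in results),
    }
```

For example, `run_load(lambda: http_status("http://169.254.169.254/metadata/v1/id"), total=10000, concurrency=100)` mirrors the DigitalOcean run. Python threads add overhead of their own, so absolute numbers will not match boom's.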
DigitalOcean applied some form of throttling (presumably based on an undocumented rate limit) in some test runs, but it usually recovered quickly afterwards.
    $ ./boom -c 100 -n 10000 http://169.254.169.254/metadata/v1/id
    Summary:
      Total: 4.1009 secs.
      Slowest: 0.2282 secs.
      Fastest: 0.0086 secs.
      Average: 0.0406 secs.
      Requests/sec: 2438.4929
      Total Data Received: 70000 bytes.
      Response Size per Request: 7 bytes.
    Status code distribution:
      10000 responses
    Response time histogram:
      0.009 |
      0.031 |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      0.053 |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      0.075 |∎∎∎∎∎∎∎∎
      0.096 |∎∎
      0.118 |
      0.140 |
      0.162 |
      0.184 |
      0.206 |
      0.228 |
Google Compute Engine performs really well at this concurrency level. When I bump up the load and the concurrency, a long tail starts to show up and the server gets slower, as expected. I observed no explicit throttling.
    $ ./boom -c 100 -n 10000 -h 'X-Google-Metadata-Request:True' 'http://metadata.google.internal/computeMetadata/v1/instance/id'
    Summary:
      Total: 1.7962 secs.
      Slowest: 0.2097 secs.
      Fastest: 0.0045 secs.
      Average: 0.0178 secs.
      Requests/sec: 5567.3540
      Total Data Received: 200000 bytes.
      Response Size per Request: 20 bytes.
    Status code distribution:
      10000 responses
    Response time histogram:
      0.005 |
      0.025 |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      0.046 |∎∎
      0.066 |
      0.087 |
      0.107 |
      0.128 |
      0.148 |
      0.169 |
      0.189 |
      0.210 |
The AWS EC2 instance metadata service performed far worse than the others under load and frequently returned HTTP 409 Conflict responses. I only managed to get a fully successful run once I lowered the concurrency level to below 10.
    $ ./boom -c 100 -n 10000 http://169.254.169.254/latest/meta-data/instance-id
    Summary:
      Total: 45.6048 secs.
      Slowest: 7.4325 secs.
      Fastest: 0.0006 secs.
      Average: 0.4474 secs.
      Requests/sec: 218.1568
      Total Data Received: 2859403 bytes.
      Response Size per Request: 287 bytes.
    Status code distribution:
      2086 responses
      7863 responses
    Response time histogram:
      0.001 |
      0.744 |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      1.487 |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
      2.230 |
      2.973 |
      3.717 |∎
      4.460 |
      5.203 [0] |
      5.946 |
      6.689 |
      7.433 |
    Error distribution:
      Get http://169.254.169.254/latest/meta-data/instance-id: EOF
It’s clear that the Google Compute Engine instance metadata service is well thought out and carefully designed. I can see it being useful in many scenarios such as cluster bootstrapping.
AWS EC2 and DigitalOcean do not support custom metadata and they are not very dynamic, which has been a big turn-off for me.
I appreciate any comments, discussion and possibly comparisons with other environments such as OpenStack Nova.
Update: Made several fixes to the article based on Alex Yukhanov’s comments.