Parsing Docker image references

For a recent project I needed to extract the registry domain, repository name, and tag from a human-friendly Docker image reference. This turned out to be surprisingly difficult, as existing tools routinely failed for images hosted outside of Docker Hub.

The simplest valid image reference consists of only a repository name, like nginx. However, if you run docker pull nginx, the image actually pulled is library/nginx:latest. This is because some references get special treatment, which is not really documented but can be extracted from the Docker source code. For comparison, a full image reference looks like registry.company.com:2000/projects/sample:v2, optionally followed by a digest. It can be broken down into the parts [<registry>/]<repository>[:<tag>][@<digest>]. This seems simple enough, but the repository may itself contain forward slashes (e.g. projects/sample), which makes it difficult to separate the registry address from the repository name. Docker takes a pragmatic approach to this problem:

1. Look for the first forward slash in the reference.
  1. If there is none, the registry defaults to registry-1.docker.io and library/ is prepended to the reference.
  2. If a slash is found, only look at the part left of it.
    1. If this part is localhost or contains a dot (.) or colon (:) it is the registry’s address and gets removed from the reference.
    2. Otherwise, it is part of the repository name and the registry is registry-1.docker.io.
2. If the reference contains an at symbol (@) everything right of it is treated as a digest value and removed from the reference.
3. If the reference contains a colon (:) everything right of it is treated as a tag and removed from the reference.
4. The remainder is the repository name.
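
The registry check (localhost, a dot, or a colon) is the part that decides everything else, so it may help to look at it in isolation first. The sketch below only mirrors the rule as described above. The helper name looks_like_registry is made up for this illustration and is not taken from the Docker source.

```python
def looks_like_registry(component):
    # Hypothetical helper: applies the rule described above to the part
    # of the reference that sits left of the first forward slash.
    return component == "localhost" or "." in component or ":" in component


for ref in ("registry.company.com:2000/projects/sample:v2",
            "localhost/sample",
            "projects/sample"):
    # Only the text before the first slash is inspected
    first = ref.split("/", 1)[0]
    print(first, looks_like_registry(first))
```

The first two references start with something that counts as a registry address. For projects/sample the check fails, so projects stays part of the repository name and the default registry applies.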

Below is a Python implementation of this algorithm.

def get_parts(reference):
    """Split a Docker image reference into (registry, repository, tag, digest)."""
    # Docker default values
    registry = "registry-1.docker.io"
    repository = reference
    tag = "latest"
    digest = None
    # Parse domain part, if any
    if "/" in reference:
        domain, remainder = reference.split("/", 1)
        if domain == "localhost" or "." in domain or ":" in domain:
            registry = domain
            repository = remainder
    # Separate image reference and digest
    if "@" in repository:
        repository, digest = repository.split("@", 1)
    # See if image contains a tag
    if ":" in repository:
        repository, tag = repository.split(":", 1)
    # Handle "familiar" Docker references
    if registry == "registry-1.docker.io" and "/" not in repository:
        repository = "library/" + repository
    return (registry, repository, tag, digest)
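
Going the other way can be useful too, for example to log or compare references in their fully spelled-out form. Here is a minimal sketch that assumes the four parts come in the order returned above. The name format_reference is my own invention, and the output layout simply follows the [<registry>/]<repository>[:tag][@digest] pattern from earlier.

```python
def format_reference(registry, repository, tag="latest", digest=None):
    # Hypothetical helper, not part of Docker: joins the parsed parts back
    # into registry/repository:tag, with @digest appended when present.
    reference = f"{registry}/{repository}:{tag}"
    if digest is not None:
        reference += "@" + digest
    return reference


print(format_reference("registry-1.docker.io", "library/nginx", "latest", None))
# prints registry-1.docker.io/library/nginx:latest
print(format_reference("registry.company.com:2000", "projects/sample", "v2"))
# prints registry.company.com:2000/projects/sample:v2
```

Since the parser fills in latest even when only a digest was given, the rebuilt string will carry that default tag as well. Whether that is desirable depends on what the output is used for.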