For a recent project I needed to extract the registry domain, repository name, and tag from a human-friendly Docker image reference. This turned out to be surprisingly difficult, as existing tools routinely failed for images hosted outside the central Docker Hub.
The simplest valid image reference consists of only a repository name, like nginx. However, if you run docker pull nginx, the image actually pulled is library/nginx:latest. This is because Docker gives such short references special treatment, which is not really documented but can be worked out from the Docker source code.
For comparison, a full image reference looks like registry.company.com:2000/projects/sample:v2, optionally followed by a digest. It can be broken down into the parts [<registry>/]<repository>[:<tag>][@<digest>]. This seems simple enough, but the repository may itself contain forward slashes (e.g. projects/sample), which makes it difficult to separate the registry address from the repository name. Docker takes a pragmatic approach to this problem: the part before the first slash is treated as a registry only if it is localhost or contains a dot or a colon; otherwise the whole reference is taken to be a repository on Docker Hub.
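The registry check can be sketched in a few lines of Python. The helper name has_registry_prefix is hypothetical, and the rule encoded in it is inferred from the Docker source rather than from documented behavior.

```python
def has_registry_prefix(reference):
    """Return True if the part before the first "/" looks like a registry host.

    Hypothetical helper: a sketch of the heuristic, not Docker's own code.
    """
    if "/" not in reference:
        # A bare name such as "nginx" cannot carry a registry prefix
        return False
    first = reference.split("/", 1)[0]
    # Hostnames contain a dot, host:port pairs contain a colon,
    # and "localhost" is special-cased
    return first == "localhost" or "." in first or ":" in first


print(has_registry_prefix("nginx"))                                         # False
print(has_registry_prefix("projects/sample"))                               # False
print(has_registry_prefix("localhost/sample"))                              # True
print(has_registry_prefix("registry.company.com:2000/projects/sample:v2"))  # True
```

Note that projects/sample is not mistaken for a registry called projects, because projects contains neither a dot nor a colon.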
Below is a Python implementation of this algorithm.
def get_parts(reference):
    # Docker default values
    registry = "registry-1.docker.io"
    repository = reference
    tag = "latest"
    digest = None

    # Parse domain part, if any
    if "/" in reference:
        domain, remainder = reference.split("/", 1)
        if domain == "localhost" or "." in domain or ":" in domain:
            registry = domain
            repository = remainder

    # Separate image reference and digest
    if "@" in repository:
        repository, digest = repository.split("@", 1)

    # See if image contains a tag
    if ":" in repository:
        repository, tag = repository.split(":", 1)

    # Handle "familiar" Docker references
    if registry == "registry-1.docker.io" and "/" not in repository:
        repository = "library/" + repository

    return (registry, repository, tag, digest)
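As a quick sanity check, the snippet below runs the parser on the references mentioned earlier. So that it runs on its own, it restates the same logic in condensed form using str.partition; the example list, the localhost/sample:dev reference, and the shortened sha256 value are made up for illustration and do not refer to real images.

```python
def get_parts(reference):
    # Same logic as the function above, restated with str.partition
    # so that this snippet runs on its own
    registry, repository, tag, digest = "registry-1.docker.io", reference, "latest", None
    domain, slash, remainder = reference.partition("/")
    if slash and (domain == "localhost" or "." in domain or ":" in domain):
        registry, repository = domain, remainder
    repository, at, rest = repository.partition("@")
    if at:
        digest = rest
    repository, colon, rest = repository.partition(":")
    if colon:
        tag = rest
    if registry == "registry-1.docker.io" and "/" not in repository:
        repository = "library/" + repository
    return (registry, repository, tag, digest)


# Illustrative inputs; the digest is a shortened placeholder, not a real one
examples = [
    "nginx",
    "projects/sample",
    "localhost/sample:dev",
    "registry.company.com:2000/projects/sample:v2",
    "nginx@sha256:0123abcd",
]
for reference in examples:
    print(reference, "->", get_parts(reference))

# Prints:
# nginx -> ('registry-1.docker.io', 'library/nginx', 'latest', None)
# projects/sample -> ('registry-1.docker.io', 'projects/sample', 'latest', None)
# localhost/sample:dev -> ('localhost', 'sample', 'dev', None)
# registry.company.com:2000/projects/sample:v2 -> ('registry.company.com:2000', 'projects/sample', 'v2', None)
# nginx@sha256:0123abcd -> ('registry-1.docker.io', 'library/nginx', 'latest', 'sha256:0123abcd')
```

Only the bare nginx references pick up the library/ prefix, and the tag stays at its default of latest even when a digest is given.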