licence-normaliser

Comprehensive license normalisation with a three-level hierarchy.

licence-normaliser is a comprehensive license normalisation library that maps any license representation (SPDX tokens, URLs, prose descriptions) to a canonical three-level hierarchy.

Features

  • Three-level hierarchy - LicenseFamily → LicenseName → LicenseVersion.

  • Wide format support - SPDX tokens, URLs, prose descriptions.

  • Creative Commons support - Full CC family with versions and IGO variants.

  • Publisher-specific licenses - Springer, Nature, Elsevier, Wiley, ACS, and more.

  • File-driven data - Add aliases, URLs, and patterns by editing JSON files. No Python code changes required for new synonyms.

  • Pluggable parsers - Drop in a new parser class to ingest any external license registry. Parsers implement plugin interfaces (RegistryPlugin, URLPlugin, etc.).

  • Strict mode - Raise LicenseNotFoundError instead of silently returning "unknown".

  • Caching - LRU caching for performance.

  • CLI - Command-line interface with --strict and --explain support.

Hierarchy

The library uses a three-level hierarchy:

  1. LicenseFamily - broad bucket: "cc", "osi", "copyleft", "publisher-tdm", …

  2. LicenseName - version-free: "cc-by", "cc-by-nc-nd", "mit", "wiley-tdm"

  3. LicenseVersion - fully resolved: "cc-by-3.0", "cc-by-nc-nd-4.0"
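The three levels can be pictured as nested value objects, each pointing at its parent. The following is a minimal sketch of that idea only: the class names SketchFamily, SketchName and SketchVersion are hypothetical and are not the library's actual classes, whose internals may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchFamily:
    """Broad bucket, e.g. "cc" (hypothetical stand-in for LicenseFamily)."""
    key: str

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class SketchName:
    """Version-free name, e.g. "cc-by-nc-nd" (stand-in for LicenseName)."""
    key: str
    family: SketchFamily

    def __str__(self) -> str:
        return self.key


@dataclass(frozen=True)
class SketchVersion:
    """Fully resolved key, e.g. "cc-by-nc-nd-4.0" (stand-in for LicenseVersion)."""
    key: str
    license: SketchName

    def __str__(self) -> str:
        return self.key


# Build one chain by hand and walk from the most specific level upwards.
cc = SketchFamily("cc")
cc_by_nc_nd = SketchName("cc-by-nc-nd", cc)
v = SketchVersion("cc-by-nc-nd-4.0", cc_by_nc_nd)

print(v, v.license, v.license.family)
```

Because each level holds a reference to the level above it, a caller holding the most specific object can always recover the name and the family without a second lookup.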

Installation

With uv:

uv pip install licence-normaliser

Or with pip:

pip install licence-normaliser

Quick start

from licence_normaliser import normalise_license

v = normalise_license("CC BY-NC-ND 4.0")
str(v)                  # "cc-by-nc-nd-4.0"   ← LicenseVersion
str(v.license)          # "cc-by-nc-nd"       ← LicenseName
str(v.license.family)   # "cc"                ← LicenseFamily

Strict mode

By default, unresolvable inputs return an "unknown" result. Pass strict=True to raise LicenseNotFoundError instead:

from licence_normaliser import normalise_license
from licence_normaliser.exceptions import LicenseNotFoundError

# Silent fallback (default)
v = normalise_license("some-unknown-string")
v.family.key  # "unknown"

# Strict: raises on unresolvable input
try:
    v = normalise_license("some-unknown-string", strict=True)
except LicenseNotFoundError as exc:
    print(exc.raw)      # original input
    print(exc.cleaned)  # cleaned form that failed lookup

Trace / Explain

Set ENABLE_LICENCE_NORMALISER_TRACE=1 or pass trace=True to get resolution traces showing how the license was matched:

from licence_normaliser import normalise_license

# Via function
v = normalise_license("cc by-nc-nd 3.0 igo", trace=True)
print(v.explain())

# Via class
from licence_normaliser import LicenseNormaliser
ln = LicenseNormaliser(trace=True)
v = ln.normalise_license("MIT")
print(v.explain())

Output shows the resolution pipeline (alias → registry → url → prose → fallback) and which source file + line matched:

Input: 'cc by-nc-nd 3.0 igo' → 'cc by-nc-nd 3.0 igo'
  [✓] alias: 'cc by-nc-nd 3.0 igo' → 'cc-by-nc-nd-3.0-igo' (line 139 in aliases.json)

Result:
  version_key: 'cc-by-nc-nd-3.0-igo'
  name_key: 'cc-by-nc-nd'
  family_key: 'cc'

The trace can also be accessed via v._trace for programmatic use.
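To illustrate how an ordered pipeline with a trace might work, here is a simplified sketch. It is not the library's implementation: the lookup tables, the whitespace and lowercase cleaning step, the resolve function and the trace line format are all assumptions made for the example, and only two of the stages are modelled.

```python
# Toy lookup tables (assumed); the real library loads its data from JSON files.
ALIASES = {"cc by-nc-nd 3.0 igo": "cc-by-nc-nd-3.0-igo"}
REGISTRY = {"mit": "mit"}

# Stages are tried in order; the first one that returns a key wins.
STAGES = [
    ("alias", ALIASES.get),
    ("registry", REGISTRY.get),
]


def resolve(raw):
    """Return (key, trace) for a raw license string."""
    # Assumed cleaning step: lowercase and collapse runs of whitespace.
    cleaned = " ".join(raw.lower().split())
    trace = []
    for stage_name, lookup in STAGES:
        hit = lookup(cleaned)
        if hit is not None:
            trace.append(f"[hit] {stage_name}: {cleaned!r} -> {hit!r}")
            return hit, trace
        trace.append(f"[miss] {stage_name}: {cleaned!r}")
    # No stage matched, so fall back to the "unknown" key.
    trace.append("[hit] fallback: 'unknown'")
    return "unknown", trace


key, steps = resolve("CC  BY-NC-ND 3.0 IGO")
print(key)
print("\n".join(steps))
```

Recording a line for every stage that was tried, not only the one that matched, is what makes such a trace useful when an input resolves through an unexpected stage.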

Batch normalisation

from licence_normaliser import normalise_licenses

results = normalise_licenses(["MIT", "Apache-2.0", "CC BY 4.0"])
for r in results:
    print(r.key)

# Strict batch - raises on first unresolvable
results = normalise_licenses(["MIT", "Apache-2.0"], strict=True)
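The difference between the default and strict batch behaviour can be sketched without the library. Everything below is hypothetical (SketchNotFoundError, normalise_one, normalise_many and the KNOWN table are invented for illustration); it only models the documented behaviour that strict mode raises on the first unresolvable input, while the default returns an "unknown" result.

```python
class SketchNotFoundError(Exception):
    """Hypothetical stand-in for LicenseNotFoundError."""

    def __init__(self, raw, cleaned):
        super().__init__(f"license not found: {raw!r}")
        self.raw = raw          # original input
        self.cleaned = cleaned  # cleaned form that failed lookup


KNOWN = {"mit": "mit", "apache-2.0": "apache-2.0"}  # toy data (assumed)


def normalise_one(raw, strict=False):
    cleaned = raw.strip().lower()  # assumed cleaning step
    if cleaned in KNOWN:
        return KNOWN[cleaned]
    if strict:
        raise SketchNotFoundError(raw, cleaned)
    return "unknown"


def normalise_many(raws, strict=False):
    # The comprehension stops at the first exception, so strict mode
    # fails fast and yields no partial results.
    return [normalise_one(raw, strict=strict) for raw in raws]


print(normalise_many(["MIT", "nope"]))  # default: silent fallback
try:
    normalise_many([" Nope ", "MIT"], strict=True)
except SketchNotFoundError as exc:
    print(exc.raw, exc.cleaned)
```

If partial results matter, one option is to call the single-item function in a loop and collect failures, rather than relying on the strict batch call.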

Custom plugins

The LicenseNormaliser class lets you inject custom plugin classes for specialised use cases:

from licence_normaliser import LicenseNormaliser
from licence_normaliser.parsers.alias import AliasParser
from licence_normaliser.parsers.spdx import SPDXParser

# Use only SPDX + Alias plugins (no CC, no publisher URLs)
ln = LicenseNormaliser(
    registry=[SPDXParser],
    alias=[AliasParser],
    family=[AliasParser],
    name=[AliasParser],
    cache=True,
    cache_maxsize=8192,
)

# MIT resolves via SPDX parser
assert str(ln.normalise_license("MIT")) == "mit"

# CC BY resolves via Alias
assert str(ln.normalise_license("CC BY-NC-ND 4.0")) == "cc-by-nc-nd-4.0"

Note

Explicit plugin passing is optional: LicenseNormaliser() loads the default plugins automatically. Use the pattern above only if you need custom plugins or want to reduce the number of plugins loaded.

For caching, LicenseNormaliser wraps the resolution method with lru_cache. Disable it by passing cache=False for debugging:

from licence_normaliser import LicenseNormaliser

ln = LicenseNormaliser(cache=False)
result = ln.normalise_license("MIT")
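The effect of the cache flag can be illustrated with functools.lru_cache from the standard library. The class below is a hypothetical stand-in, not the library's LicenseNormaliser: the _resolve method, the calls counter and the trivial lowercase "resolution" are assumptions chosen to make the caching visible.

```python
from functools import lru_cache


class SketchNormaliser:
    """Hypothetical stand-in showing per-instance LRU caching."""

    def __init__(self, cache=True, cache_maxsize=8192):
        self.calls = 0  # counts uncached resolutions, for demonstration only
        if cache:
            # Wrap the bound method so each instance owns its own cache.
            self._resolve = lru_cache(maxsize=cache_maxsize)(self._resolve)

    def _resolve(self, raw):
        self.calls += 1
        return raw.strip().lower()  # placeholder for the real resolution work

    def normalise(self, raw):
        return self._resolve(raw)


cached = SketchNormaliser()
cached.normalise("MIT")
cached.normalise("MIT")  # served from the cache

uncached = SketchNormaliser(cache=False)
uncached.normalise("MIT")
uncached.normalise("MIT")  # resolved again

print(cached.calls, uncached.calls)
```

With the cache off, every call runs the full resolution path, which is why disabling it helps when stepping through resolution in a debugger.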

Update data (CLI)

licence-normaliser update-data --force
# Fetches fresh SPDX, OpenDefinition, OSI, CreativeCommons, and ScanCode JSONs

Integration tests (public API only)

All integration tests live in src/licence_normaliser/tests/test_integration.py and only import the public API.

CLI usage

Normalise a single license:

licence-normaliser normalise "MIT"
# Output: mit

licence-normaliser normalise --full "CC BY 4.0"
# Output:
# Key: cc-by-4.0
# URL: https://creativecommons.org/licenses/by/4.0/
# License: cc-by
# Family: cc

licence-normaliser normalise --strict "totally-unknown"
# Exits with code 1 and prints an error

Batch normalise:

licence-normaliser batch MIT "Apache-2.0" "CC BY 4.0"
licence-normaliser batch --strict MIT "Apache-2.0"

Exceptions

from licence_normaliser.exceptions import (
    LicenseNormaliserError,   # base class
    LicenseNotFoundError,     # raised by strict mode
)

Testing

All tests run inside Docker:

make test

To test a specific Python version:

make test-env ENV=py312

License

MIT

Author

Artur Barseghyan <artur.barseghyan@gmail.com>