
Main

Standardizes residue ID numbering in PDB files.

run_unify_numbering(pdb_path, output_path=None, reset_initial_resid=True)

Unifies the atom and residue numbering within a PDB file, ensuring sequential and consistent IDs.

This function reads a PDB file and renumbers the atom serial numbers and residue IDs sequentially. Atom serial numbers start at 1; residue IDs start at 1 for the first residue encountered unless reset_initial_resid is False. The function also handles chain identifiers, incrementing them at each "TER" record when reset_initial_resid is True. Duplicate "TER" records are removed, and each "ENDMDL" record resets the atom and residue numbering as well as the chain identifier.
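Renumbering operates on the fixed-width layout of PDB ATOM/HETATM records: the atom serial number occupies columns 7-11, the chain identifier column 22, and the residue sequence number columns 23-26. The helper below is a minimal, hypothetical sketch of rewriting those fields on one line. `renumber_record` is not part of this package, and it assumes a well-formed record at least 26 characters long.

```python
def renumber_record(line: str, serial: int, chain_id: str, resid: int) -> str:
    """Overwrite the serial, chain ID and residue number of one ATOM/HETATM line.

    Hypothetical helper for illustration; assumes the standard fixed-column
    PDB layout and a line at least 26 characters long.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        return line  # TER, ENDMDL, REMARK, ... are passed through unchanged
    # PDB columns are 1-indexed; Python slices are 0-indexed and end-exclusive.
    return (
        line[:6]                 # cols 1-6: record name
        + f"{serial:>5d}"        # cols 7-11: atom serial number
        + line[11:21]            # cols 12-21: atom name, altLoc, residue name
        + (chain_id[:1] or " ")  # col 22: chain identifier
        + f"{resid:>4d}"         # cols 23-26: residue sequence number
        + line[26:]              # col 27 onward: insertion code, coordinates, ...
    )
```

Because the fields are fixed-width, a serial above 99999 or a residue number above 9999 would overflow its columns; this sketch does not guard against that.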

PARAMETER DESCRIPTION
pdb_path

The path to the input PDB file.

TYPE: str

output_path

The path to save the new PDB file with unified numbering. If None, no file is written. In either case, the modified PDB lines are returned. Defaults to None.

TYPE: str | None DEFAULT: None

reset_initial_resid

If True (default), the residue numbering will start from 1 for the first residue in each chain. If False, the initial residue ID will be based on the original numbering in the PDB file for the first chain, and subsequent chains will continue sequentially.

TYPE: bool DEFAULT: True

RETURNS DESCRIPTION
Iterable[str]

An iterable of strings, where each string is a line from the PDB file with the unified atom and residue numbering.

RAISES DESCRIPTION
FileNotFoundError

If the specified pdb_path does not exist.

IOError

If there is an error reading the PDB file or writing to the output file.

Notes
  • The function iterates through the PDB lines, tracking the current residue and chain IDs.
  • When a "TER" record is encountered, it signifies the end of a chain, and the chain ID is incremented if reset_initial_resid is True.
  • "ENDMDL" records indicate the start of a new model, and all numbering is reset.
  • Atom serial numbers are simply incremented sequentially.
  • Residue IDs are unified within each chain and restart at 1 at the start of each new chain when reset_initial_resid is True.
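The rules in these Notes can be illustrated as a single pass over the lines. The generator below is a hypothetical, simplified re-implementation for illustration only, not this package's code. Its assumptions are: chain IDs are reassigned starting at "A"; a residue is identified by columns 22-27 (chain ID, residue number, insertion code); and a "TER" counts as a duplicate when no ATOM/HETATM record appears between it and the previous "TER".

```python
from typing import Iterable, Iterator, Optional


def unify_numbering_sketch(
    lines: Iterable[str], reset_initial_resid: bool = True
) -> Iterator[str]:
    """Hypothetical single-pass renumbering following the Notes above.

    Assumptions (not taken from the real implementation): chain IDs are
    reassigned starting at "A", a residue is identified by columns 22-27
    (chain ID, resSeq, insertion code), and a TER is a duplicate when no
    ATOM/HETATM record appears between it and the previous TER.
    """
    serial = 0
    chain = "A"
    resid: Optional[int] = None  # None: no residue seen yet in this chain
    prev_key: Optional[str] = None  # original chain+resSeq+iCode of last residue
    seen_atom = False  # any ATOM/HETATM since the last TER?

    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            key = line[21:27]
            if key != prev_key:  # a new residue starts here
                if resid is None:
                    resid = 1 if reset_initial_resid else int(line[22:26])
                else:
                    resid += 1
                prev_key = key
            serial += 1  # atom serials are simply incremented
            seen_atom = True
            # Rewrite cols 7-11 (serial), 22 (chain ID) and 23-26 (resSeq).
            yield f"{line[:6]}{serial:>5d}{line[11:21]}{chain}{resid:>4d}{line[26:]}"
        elif record == "TER":
            if not seen_atom:
                continue  # duplicate TER: drop it
            seen_atom = False
            prev_key = None
            if reset_initial_resid:
                # As documented: next chain ID, and residues restart at 1.
                # Simplification: no wrap-around after "Z".
                chain = chr(ord(chain) + 1)
                resid = None
            yield line
        elif record == "ENDMDL":
            # New model: reset atom, residue and chain numbering.
            serial, chain, resid, prev_key, seen_atom = 0, "A", None, None, False
            yield line
        else:
            yield line  # MODEL, REMARK, END, ... pass through unchanged
```

With reset_initial_resid=False, this sketch keeps the first chain's original starting residue number and continues counting across chains without changing the chain letter, which is one literal reading of the parameter description above.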

Examples:

To unify the numbering in "input.pdb" and save it to "output.pdb":

>>> unified_lines = run_unify_numbering("input.pdb", output_path="output.pdb")

To unify the numbering but keep the initial residue ID of the first chain:

>>> unified_lines = run_unify_numbering(
...     "input.pdb", output_path="output.pdb", reset_initial_resid=False
... )

To unify the numbering and only get the lines without saving to a file:

>>> unified_lines = run_unify_numbering("input.pdb")
>>> for line in unified_lines:
...     print(line.strip())