Main
Standardizes residue ID numbering in PDB files.
run_unify_numbering(pdb_path, output_path=None, reset_initial_resid=True)
Unifies the atom and residue numbering within a PDB file, ensuring sequential and consistent IDs.
This function reads a PDB file and renumbers the atom serial numbers and residue IDs
to be sequential, starting from 1 for the first atom and the first residue encountered.
It also handles chain identifiers, incrementing them upon encountering a "TER" record
if reset_initial_resid is True. Duplicate "TER" records are removed, and "ENDMDL"
records trigger a reset of the atom and residue numbering, as well as the chain identifier.
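PDB coordinate records are fixed-width, so renumbering means overwriting specific columns rather than splitting on whitespace. The sketch below assumes the standard PDB layout (atom serial number in columns 7 to 11, residue sequence number in columns 23 to 26); renumber_atom_line is a hypothetical helper for illustration, not part of this package:

```python
def renumber_atom_line(line: str, serial: int, res_seq: int) -> str:
    """Hypothetical helper: rewrite the atom serial and residue sequence number
    of one fixed-width ATOM/HETATM record, leaving every other column intact."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    if not (0 < serial <= 99999 and 0 < res_seq <= 9999):
        raise ValueError("value does not fit its fixed-width column")
    line = line.rstrip("\n").ljust(27)  # pad short lines so all sliced columns exist
    # 0-based slices: [0:6] record name, [6:11] serial (cols 7-11),
    # [11:22] atom name through chain ID (cols 12-22), [22:26] resSeq (cols 23-26)
    return f"{line[:6]}{serial:>5d}{line[11:22]}{res_seq:>4d}{line[26:]}"
```

Rewriting by column keeps the atom name, chain ID, insertion code (column 27), and coordinates untouched, whereas whitespace splitting can fail when adjacent fields run together.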
PARAMETER | DESCRIPTION |
---|---|
pdb_path | The path to the input PDB file. |
output_path | The path to save the new PDB file with unified numbering. If … |
reset_initial_resid | If … |
RETURNS | DESCRIPTION |
---|---|
Iterable[str] | An iterable of strings, where each string is a line from the PDB file with the unified atom and residue numbering. |
RAISES | DESCRIPTION |
---|---|
FileNotFoundError | If the specified … |
IOError | If there is an error reading the PDB file or writing to the output file. |
Notes
- The function iterates through the PDB lines, tracking the current residue and chain IDs.
- When a "TER" record is encountered, it signifies the end of a chain, and the chain ID is incremented if reset_initial_resid is True.
- "ENDMDL" records mark the end of a model, so all numbering is reset before the next model begins.
- Atom serial numbers are simply incremented sequentially.
- Residue IDs are unified within each chain, potentially resetting to 1 at the start of a new chain.
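The rules in these notes can be traced with a small state machine over the input lines. This is a minimal sketch, not the package's implementation: unify_numbering_sketch and its increment_chain flag are hypothetical (in run_unify_numbering this behavior is tied to reset_initial_resid), and it assumes that residue numbering restarts at 1 after each TER, that new chain IDs cycle through A to Z, and that a new residue begins whenever the original chain ID, residue number, or insertion code changes.

```python
from string import ascii_uppercase
from typing import Iterable, Iterator


def unify_numbering_sketch(
    lines: Iterable[str], increment_chain: bool = True
) -> Iterator[str]:
    """Hypothetical sketch of the renumbering rules; not the library's code."""
    atom_serial = 0    # last atom serial emitted in the current model
    res_id = 0         # last residue ID emitted in the current chain
    chain_idx = 0      # index into A-Z (assumption: chain IDs cycle A..Z)
    prev_key = None    # original (chain, resSeq, iCode) of the previous atom
    prev_record = None
    for raw in lines:
        line = raw.rstrip("\n")
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            line = line.ljust(27)  # guarantee the fixed columns exist
            key = (line[21], line[22:26], line[26])
            if key != prev_key:  # a new residue starts
                res_id += 1
                prev_key = key
            atom_serial += 1
            chain = ascii_uppercase[chain_idx % 26] if increment_chain else line[21]
            # Columns 7-11 serial, 22 chain ID, 23-26 resSeq (0-based slices)
            line = (
                f"{line[:6]}{atom_serial:>5d}{line[11:21]}"
                f"{chain}{res_id:>4d}{line[26:]}"
            )
        elif record == "TER":
            if prev_record == "TER":
                continue  # drop duplicate TER records
            res_id = 0  # assumption: residues restart at 1 in the next chain
            prev_key = None
            chain_idx += 1
        elif record == "ENDMDL":
            # End of a model: reset atom, residue, and chain numbering
            atom_serial = res_id = chain_idx = 0
            prev_key = None
        prev_record = record
        yield line
```

Writing it as a generator yields renumbered lines lazily, which matches the Iterable[str] return type documented for run_unify_numbering.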
Examples:
To unify the numbering in "input.pdb" and save it to "output.pdb":
>>> unified_lines = run_unify_numbering("input.pdb", output_path="output.pdb")
To unify the numbering but keep the initial residue ID of the first chain:
>>> unified_lines = run_unify_numbering(
... "input.pdb", output_path="output.pdb", reset_initial_resid=False
... )
To unify the numbering and only get the lines without saving to a file:
>>> unified_lines = run_unify_numbering("input.pdb")