Names

Module for modifying lines within PDB files, including functionalities for replacing atom and residue names.

This module provides a set of utility functions to manipulate the content of Protein Data Bank (PDB) files. It includes functions to perform general line modifications based on filtering criteria, as well as specific functions for replacing atom names and residue names within the ATOM and HETATM records of a PDB file. Additionally, it offers a function to standardize the atom names of water molecules.

modify_lines(pdb_lines, fn_process, fn_args, fn_filter=None, include=None, exclude=None)

General function to modify specific lines in a PDB file based on filtering.

This function iterates through a list of PDB lines and applies a processing function (fn_process) to lines that meet certain criteria defined by an optional filter function (fn_filter) and inclusion/exclusion lists.

PARAMETER DESCRIPTION
pdb_lines

An iterable of strings, where each string represents a line from a PDB file.

TYPE: Iterable[str]

fn_process

A callable function that takes a PDB line as its first argument, followed by the elements of fn_args, and returns a modified PDB line. This function is responsible for the actual modification of the line.

TYPE: Callable[[str, str, str, int | None, int], str]

fn_args

An iterable containing additional arguments to be passed to the fn_process function after the PDB line itself.

TYPE: Iterable[Any]

fn_filter

An optional callable function that takes a PDB line as input and returns a string. This string is then used to check against the include and exclude lists. If None, all ATOM and HETATM lines are processed.

TYPE: Callable[[str], str] | None DEFAULT: None

include

An optional list of strings. If fn_filter is provided, only lines for which the result of fn_filter is present in this list will be processed by fn_process.

TYPE: list[str] | None DEFAULT: None

exclude

An optional list of strings. If fn_filter is provided, lines for which the result of fn_filter is present in this list will not be processed by fn_process.

TYPE: list[str] | None DEFAULT: None

RETURNS DESCRIPTION
list[str]

A list of modified PDB lines. Lines that did not meet the filtering criteria or were not ATOM or HETATM records are returned unchanged.

Notes
  • The fn_filter function should be designed to extract a specific piece of information from the PDB line (e.g., residue name, atom name) that can be used for inclusion or exclusion.
  • If a filtered value appears in both include and exclude, include wins and the line is processed. If only exclude is provided, every line whose filtered value is not in exclude is processed.
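The selection rule described in the notes above can be sketched as a small predicate. This is only an illustration of the documented behavior, not the module's actual code: the helper names should_process and get_resname are hypothetical, and the residue name is read from the fixed columns 18-20 of the standard PDB layout.

```python
def should_process(line, fn_filter=None, include=None, exclude=None):
    """Hypothetical helper: would this line be handed to fn_process?"""
    # Only ATOM and HETATM records are ever candidates.
    if not line.startswith(("ATOM", "HETATM")):
        return False
    # Without a filter, every ATOM/HETATM record is processed.
    if fn_filter is None:
        return True
    value = fn_filter(line)
    # Assumption: when include is given, membership in it decides the outcome,
    # even if the same value also appears in exclude.
    if include is not None:
        return value in include
    # Assumption: with only exclude, everything not listed is processed.
    if exclude is not None:
        return value not in exclude
    return True


def get_resname(line):
    """Residue name from columns 18-20 (0-based slice 17:20)."""
    return line[17:20].strip()


line = "ATOM      1  CA  GLY A   1       ..."
print(should_process(line, get_resname, include=["GLY"]))                   # True
print(should_process(line, get_resname, exclude=["GLY"]))                   # False
print(should_process(line, get_resname, include=["GLY"], exclude=["GLY"]))  # True
```

Keeping the decision in one predicate makes the precedence between include and exclude easy to test in isolation.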

Examples:

To replace "CA" atom names with "CB" only in residues named "GLY":

>>> pdb_lines = [
...     "ATOM      1  CA  GLY A   1       ...",
...     "ATOM      2  CB  ALA A   2       ...",
... ]
>>> def get_resname(line):
...     return parse_resname(line).strip()
>>> modified = modify_lines(
...     pdb_lines,
...     replace_in_pdb_line,
...     ("CA ", "CB ", 13, 17),
...     fn_filter=get_resname,
...     include=["GLY"],
... )
>>> for line in modified:
...     print(line)
ATOM      1  CB  GLY A   1       ...
ATOM      2  CB  ALA A   2       ...

replace_atom_names(pdb_lines, orig_atom_name, new_atom_name)

Replaces all occurrences of a specified original atom name with a new atom name in a list of PDB lines.

For each ATOM or HETATM record in the provided PDB lines, this function compares the atom name with orig_atom_name and, on a match, replaces it with new_atom_name. Atom names are stripped of leading/trailing whitespace and left-justified to a length of 4 characters so that the fixed-width columns of the PDB file stay aligned.
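The normalization step mentioned above (strip, then left-justify to 4 characters) can be illustrated with plain string methods. The helper name normalize_atom_name is hypothetical and only mirrors the description; the module may implement this differently.

```python
def normalize_atom_name(name):
    """Hypothetical helper: strip whitespace, then pad on the right to 4 characters."""
    return name.strip().ljust(4)


print(repr(normalize_atom_name("CA")))    # 'CA  '
print(repr(normalize_atom_name(" CA ")))  # 'CA  '
print(repr(normalize_atom_name("HW1")))   # 'HW1 '
```

Because both sides are normalized the same way, a caller can pass "CA" without worrying about the padding used in the file.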

PARAMETER DESCRIPTION
pdb_lines

An iterable of strings, where each string represents a line from a PDB file.

TYPE: Iterable[str]

orig_atom_name

The original atom name to be replaced.

TYPE: str

new_atom_name

The new atom name to replace the original one.

TYPE: str

RETURNS DESCRIPTION
list[str]

A list of PDB lines with the specified atom names replaced.

Examples:

>>> pdb_lines = [
...     "ATOM      1  CA  ALA A   1       ...",
...     "ATOM      2  CB  ALA A   1       ...",
... ]
>>> modified_lines = replace_atom_names(pdb_lines, "CA", "CB")
>>> for line in modified_lines:
...     print(line)
ATOM      1  CB  ALA A   1       ...
ATOM      2  CB  ALA A   1       ...

replace_residue_names(pdb_lines, orig_resname, new_resname, fn_filter=None, include=None, exclude=None)

Replaces all occurrences of a specified original residue name with a new residue name in a list of PDB lines.

This function iterates through the provided PDB lines and, for each ATOM or HETATM record, it checks if the residue name matches the orig_resname. If it does, the residue name is replaced with the new_resname. The residue names are stripped of leading/trailing whitespace and left-justified to a length of 4 characters to ensure proper formatting in the PDB file. Optionally, a filter function and inclusion/exclusion lists can be used to control which lines are processed.

PARAMETER DESCRIPTION
pdb_lines

An iterable of strings, where each string represents a line from a PDB file.

TYPE: Iterable[str]

orig_resname

The original residue name to be replaced.

TYPE: str

new_resname

The new residue name to replace the original one.

TYPE: str

fn_filter

An optional callable function that takes a PDB line as input and returns a string (e.g., residue ID) for filtering.

TYPE: Callable[[str], str] | None DEFAULT: None

include

An optional list of strings. Only lines where the result of fn_filter is in this list will have their residue names replaced.

TYPE: list[str] | None DEFAULT: None

exclude

An optional list of strings. Lines where the result of fn_filter is in this list will not have their residue names replaced.

TYPE: list[str] | None DEFAULT: None

RETURNS DESCRIPTION
list[str]

A list of PDB lines with the specified residue names replaced, subject to any provided filtering.

Examples:

To replace all "MET" residues with "ALA":

>>> pdb_lines = [
...     "ATOM      1  N   MET A   1       ...",
...     "ATOM      2  CA  MET A   1       ...",
... ]
>>> modified_lines = replace_residue_names(pdb_lines, "MET", "ALA")
>>> for line in modified_lines:
...     print(line)
ATOM      1  N   ALA A   1       ...
ATOM      2  CA  ALA A   1       ...

To replace "MET" with "ALA" only in residue ID "1":

>>> pdb_lines = [
...     "ATOM      1  N   MET A   1       ...",
...     "ATOM      2  CA  MET A   1       ...",
...     "ATOM      3  C   MET A   2       ...",
... ]
>>> def get_resid(line):
...     return parse_resid(line).strip()
>>> modified_lines = replace_residue_names(
...     pdb_lines, "MET", "ALA", fn_filter=get_resid, include=["1"]
... )
>>> for line in modified_lines:
...     print(line)
ATOM      1  N   ALA A   1       ...
ATOM      2  CA  ALA A   1       ...
ATOM      3  C   MET A   2       ...
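The examples above call parse_resid (and, earlier, parse_resname), which are not defined on this page. For readers who want to try the filters without those helpers, the following stand-ins read the same fields from the fixed columns of a standard ATOM/HETATM record. The function names are hypothetical, and the module's own parsers may behave differently.

```python
def parse_resname_sketch(line):
    """Residue name: columns 18-20 of the record (0-based slice 17:20)."""
    return line[17:20]


def parse_chain_sketch(line):
    """Chain identifier: column 22 (0-based index 21)."""
    return line[21:22]


def parse_resid_sketch(line):
    """Residue sequence number: columns 23-26 (0-based slice 22:26)."""
    return line[22:26]


line = "ATOM      3  C   MET A   2       ..."
print(parse_resname_sketch(line).strip())  # MET
print(parse_chain_sketch(line))            # A
print(parse_resid_sketch(line).strip())    # 2
```

Any of these can be wrapped in a function that strips the result and passed as fn_filter.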

run_replace_resnames(pdb_path, resname_map, output_path=None, fn_filter=None, include=None, exclude=None)

Replaces multiple residue names in a PDB file based on a provided mapping.

This function reads a PDB file, iterates through a dictionary that maps original residue names to new residue names, and applies the replacement using the replace_residue_names function for each mapping. The modified PDB lines are then either returned or written to a new file. Optional filtering based on a function and inclusion/exclusion lists can be applied during the replacement process for each residue name in the map.
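One consequence of applying the mappings one after another is that they can chain: if a new name is also a key later in the map, lines renamed by the first pass are renamed again. The sketch below demonstrates this with hypothetical stand-ins (replace_resname_sketch, apply_resname_map_sketch) that skip file I/O and write into the three-character residue name field of the standard format, both of which are assumptions of this example rather than the module's behavior.

```python
def replace_resname_sketch(lines, orig, new):
    """Stand-in for replace_residue_names: swap the name in columns 18-20."""
    out = []
    for line in lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[17:20].strip() == orig:
            # Pad or cut the new name to exactly three characters.
            line = line[:17] + new.ljust(3)[:3] + line[20:]
        out.append(line)
    return out


def apply_resname_map_sketch(lines, resname_map):
    """Apply each mapping in turn, in dictionary order."""
    lines = list(lines)
    for orig, new in resname_map.items():
        lines = replace_resname_sketch(lines, orig, new)
    return lines


lines = [
    "ATOM      1  N   MET A   1       ...",
    "ATOM      2  N   GLU A   2       ...",
]
result = apply_resname_map_sketch(lines, {"MET": "GLU", "GLU": "ASP"})
# Both residues end up as ASP: the first pass turns MET into GLU,
# and the second pass renames every GLU, including the new one.
print([line[17:20] for line in result])  # ['ASP', 'ASP']
```

If chaining is not wanted, choose a map whose values never appear among its keys.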

PARAMETER DESCRIPTION
pdb_path

The path to the input PDB file.

TYPE: str

resname_map

A dictionary where the keys are the original residue names to be replaced, and the values are the corresponding new residue names.

TYPE: dict[str, str]

output_path

The path to save the new PDB file with the replaced residue names. If None, no file is written, and the modified PDB lines are returned.

TYPE: str | None DEFAULT: None

fn_filter

An optional callable function that takes a PDB line as input and returns a string for filtering during each residue name replacement.

TYPE: Callable[[str], str] | None DEFAULT: None

include

An optional list of strings. Only lines where the result of fn_filter is in this list will have their residue names replaced for each mapping in resname_map.

TYPE: list[str] | None DEFAULT: None

exclude

An optional list of strings. Lines where the result of fn_filter is in this list will not have their residue names replaced for each mapping in resname_map.

TYPE: list[str] | None DEFAULT: None

RETURNS DESCRIPTION
list[str]

A list of PDB lines with the residue names replaced according to the resname_map, subject to any provided filtering.

RAISES DESCRIPTION
FileNotFoundError

If the specified pdb_path does not exist.

IOError

If there is an error reading the PDB file or writing to the output file.

Examples:

To replace all "MET" residues with "ALA" and all "GLU" residues with "ASP" in "input.pdb" and save the result to "output.pdb":

>>> resname_mapping = {"MET": "ALA", "GLU": "ASP"}
>>> modified_lines = run_replace_resnames(
...     "input.pdb", resname_mapping, output_path="output.pdb"
... )

To perform the same replacement but only for residues with ID "1":

>>> def get_resid(line):
...     return parse_resid(line).strip()
>>> resname_mapping = {"MET": "ALA", "GLU": "ASP"}
>>> modified_lines = run_replace_resnames(
...     "input.pdb",
...     resname_mapping,
...     output_path="filtered_output.pdb",
...     fn_filter=get_resid,
...     include=["1"],
... )

run_unify_water_labels(pdb_path, atom_map=None, water_resname='WAT', water_atomnames=None, output_path=None)

Ensures that water molecule atom names are consistently labeled as 'O', 'H1', and 'H2'.

This function processes a PDB file to standardize the atom names of water molecules. It identifies water residues based on the water_resname and then renames their atoms to 'O' for oxygen and 'H1' and 'H2' for the two hydrogen atoms. The hydrogen atoms are assigned 'H1' and 'H2' based on their sequential appearance within each water residue in the PDB file.
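The order-of-appearance rule for hydrogens can be sketched on simplified records. The function name unify_water_names_sketch, the (resname, resid, atom_name) tuples, and the prefix matching against the original names are all assumptions made for this illustration. The module works on PDB lines and may match names differently.

```python
def unify_water_names_sketch(
    atoms, water_resname="WAT", oxygen_names=("OW",), hydrogen_names=("HW",)
):
    """Rename water atoms to O, H1, H2 by order of appearance per residue."""
    renamed = []
    current_resid = None
    n_hydrogens = 0
    for resname, resid, atom_name in atoms:
        if resname != water_resname:
            # Non-water atoms pass through and end the current water residue.
            current_resid = None
            renamed.append((resname, resid, atom_name))
            continue
        # Restart hydrogen numbering whenever a new water residue begins.
        if resid != current_resid:
            current_resid = resid
            n_hydrogens = 0
        if atom_name.startswith(tuple(oxygen_names)):
            atom_name = "O"
        elif atom_name.startswith(tuple(hydrogen_names)):
            n_hydrogens += 1
            atom_name = f"H{n_hydrogens}"
        renamed.append((resname, resid, atom_name))
    return renamed


atoms = [
    ("WAT", "1", "OW"), ("WAT", "1", "HW1"), ("WAT", "1", "HW2"),
    ("WAT", "2", "OW"), ("WAT", "2", "HW2"), ("WAT", "2", "HW1"),
]
for atom in unify_water_names_sketch(atoms):
    print(atom)
```

In the second water, HW2 comes first and therefore becomes H1: the label follows position in the file, not the original numbering.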

PARAMETER DESCRIPTION
pdb_path

The path to the input PDB file.

TYPE: str

atom_map

A dictionary mapping the standard water atom names ('O', 'H1', 'H2') to the desired output names. If None, it defaults to {'O': 'O', 'H1': 'H1', 'H2': 'H2'}, which leaves the standard names unchanged.

TYPE: dict[str, str] | None DEFAULT: None

water_resname

The residue name used to identify water molecules in the PDB file.

TYPE: str DEFAULT: 'WAT'

water_atomnames

A dictionary specifying the original atom names that should be considered as oxygen and hydrogen atoms of water. The keys should be 'O' and 'H', and the values should be iterables of possible atom names. If None, it defaults to {'O': ['OW'], 'H': ['HW']}.

TYPE: dict[str, Iterable[str]] | None DEFAULT: None

output_path

The path to save the new PDB file with the unified water atom labels. If None, no file is written, and the modified PDB lines are returned.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
Iterable[str]

An iterable of PDB lines with the unified water atom labels.

RAISES DESCRIPTION
FileNotFoundError

If the specified pdb_path does not exist.

IOError

If there is an error reading the PDB file or writing to the output file.

Warning

This function has not been thoroughly tested and might not handle all edge cases correctly. Use with caution.

Examples:

To unify water atom labels in "input.pdb" using the default settings and save to "unified_water.pdb":

>>> modified_lines = run_unify_water_labels(
...     "input.pdb", output_path="unified_water.pdb"
... )

To specify a different water residue name and atom name mapping:

>>> atom_mapping = {"O": "OXT", "H1": "HT1", "H2": "HT2"}
>>> original_water_names = {"O": ["SOL"], "H": ["HY"]}
>>> modified_lines = run_unify_water_labels(
...     "input.pdb",
...     atom_map=atom_mapping,
...     water_resname="SOL",
...     water_atomnames=original_water_names,
...     output_path="custom_water.pdb",
... )