Names
Module for modifying lines within PDB files, including functions for replacing atom and residue names.
This module provides a set of utility functions to manipulate the content of Protein Data Bank (PDB) files. It includes functions to perform general line modifications based on filtering criteria, as well as specific functions for replacing atom names and residue names within the ATOM and HETATM records of a PDB file. Additionally, it offers a function to standardize the atom names of water molecules.
modify_lines(pdb_lines, fn_process, fn_args, fn_filter=None, include=None, exclude=None)
General function to modify specific lines in a PDB file based on filtering.
This function iterates through a list of PDB lines and applies a processing function (fn_process) to lines that meet criteria defined by an optional filter function (fn_filter) and inclusion/exclusion lists.
PARAMETER | DESCRIPTION |
---|---|
pdb_lines | An iterable of strings, where each string represents a line from a PDB file. |
fn_process | A callable that takes a PDB line as its first argument, followed by the elements of fn_args. |
fn_args | An iterable containing additional arguments to be passed to fn_process. |
fn_filter | An optional callable that takes a PDB line as input and returns a string. This string is then checked against the include and exclude lists. |
include | An optional list of strings. If provided, only lines whose fn_filter result is in this list are processed. |
exclude | An optional list of strings. If provided without include, lines whose fn_filter result is in this list are left unchanged. |
RETURNS | DESCRIPTION |
---|---|
list[str] | A list of modified PDB lines. Lines that did not meet the filtering criteria or were not ATOM or HETATM records are returned unchanged. |
Notes
- The fn_filter function should extract a specific piece of information from the PDB line (e.g., residue name or atom name) that can be used for inclusion or exclusion.
- If both include and exclude are provided and a filtered value is present in both, the line is processed, because include takes precedence. If only exclude is provided, lines whose filtered value appears in it are skipped.
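The filtering rules in these notes can be illustrated with a small, self-contained sketch. It is a hypothetical re-implementation, not this module's code: modify_lines_sketch, should_process, get_resname, and add_tag are illustrative names, record detection by the "ATOM"/"HETATM" prefix is an assumption, and the residue name is read from PDB columns 18-20.

```python
from typing import Callable, Iterable, List, Optional, Sequence

# Hypothetical re-implementation of the filtering rules described in the Notes;
# the real modify_lines may differ in details.


def should_process(
    line: str,
    fn_filter: Optional[Callable[[str], str]] = None,
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> bool:
    # Only coordinate records are candidates; everything else passes through.
    if not line.startswith(("ATOM", "HETATM")):
        return False
    # Without a filter function every ATOM/HETATM line is processed.
    if fn_filter is None:
        return True
    value = fn_filter(line)
    # When include is given it decides alone, so a value listed in both
    # include and exclude is still processed.
    if include is not None:
        return value in include
    # With only exclude, listed values are skipped.
    if exclude is not None:
        return value not in exclude
    return True


def modify_lines_sketch(
    pdb_lines: Iterable[str],
    fn_process: Callable[..., str],
    fn_args: Sequence,
    fn_filter: Optional[Callable[[str], str]] = None,
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> List[str]:
    # Apply fn_process(line, *fn_args) only to lines that pass the filter.
    return [
        fn_process(line, *fn_args)
        if should_process(line, fn_filter, include, exclude)
        else line
        for line in pdb_lines
    ]


def get_resname(line: str) -> str:
    # Residue name: PDB columns 18-20 (0-based slice 17:20).
    return line[17:20].strip()


def add_tag(line: str, tag: str) -> str:
    return line + tag


sample = [
    "ATOM      1  CA  GLY A   1",
    "ATOM      2  CA  ALA A   2",
    "REMARK   1 GLY note",
]
tagged = modify_lines_sketch(sample, add_tag, ("  <-",), fn_filter=get_resname, include=["GLY"])
for line in tagged:
    print(line)
```

Only the GLY coordinate line is tagged; the ALA line fails the include check and the REMARK line is never a candidate.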
Examples:
To replace "CA" atom names with "CB" only in residues named "GLY":
>>> pdb_lines = [
... "ATOM 1 CA GLY A 1 ...",
... "ATOM 2 CB ALA A 2 ...",
... ]
>>> def get_resname(line):
... return parse_resname(line).strip()
>>> modified = modify_lines(
... pdb_lines,
... replace_in_pdb_line,
... ("CA ", "CB ", 13, 17),
... fn_filter=get_resname,
... include=["GLY"],
... )
>>> for line in modified:
... print(line)
ATOM 1 CB GLY A 1 ...
ATOM 2 CB ALA A 2 ...
replace_atom_names(pdb_lines, orig_atom_name, new_atom_name)
Replaces all occurrences of a specified original atom name with a new atom name in a list of PDB lines.
This function iterates through the provided PDB lines and, for each ATOM or HETATM record, checks whether the atom name matches orig_atom_name. If it does, the atom name is replaced with new_atom_name. The atom names are stripped of leading/trailing whitespace and left-justified to a length of 4 characters to ensure proper formatting in the PDB file.
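A minimal sketch of the match-and-pad step described above, assuming the atom name occupies PDB columns 13-16 (0-based slice 12:16). replace_atom_names_sketch is a hypothetical stand-in, not this module's function, and the length check is an added safeguard that the original description does not mention.

```python
from typing import Iterable, List

# Hypothetical stand-in for replace_atom_names, for illustration only.
ATOM_NAME = slice(12, 16)  # PDB columns 13-16 (1-based) hold the atom name


def replace_atom_names_sketch(
    pdb_lines: Iterable[str], orig_atom_name: str, new_atom_name: str
) -> List[str]:
    target = orig_atom_name.strip()
    new_name = new_atom_name.strip()
    if len(new_name) > 4:
        # A longer name would shift every column to its right.
        raise ValueError(f"atom name {new_name!r} does not fit in 4 columns")
    new_field = new_name.ljust(4)  # strip, then left-justify to 4 characters
    out = []
    for line in pdb_lines:
        is_coord = line.startswith(("ATOM", "HETATM"))
        if is_coord and line[ATOM_NAME].strip() == target:
            line = line[:12] + new_field + line[16:]
        out.append(line)
    return out


lines = [
    "ATOM      1  OW  SOL A   1",
    "ATOM      2  HW1 SOL A   1",
]
for line in replace_atom_names_sketch(lines, "OW", "O"):
    print(line)
```

Note that the wwPDB format aligns element symbols in columns 13-14, so a one-letter element such as carbon is conventionally written " CA " (starting in column 14). Parsers that infer the element from those columns may read a left-justified "CA  " as calcium.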
PARAMETER | DESCRIPTION |
---|---|
pdb_lines | An iterable of strings, where each string represents a line from a PDB file. |
orig_atom_name | The original atom name to be replaced. |
new_atom_name | The new atom name to replace the original one. |
RETURNS | DESCRIPTION |
---|---|
list[str] | A list of PDB lines with the specified atom names replaced. |
replace_residue_names(pdb_lines, orig_resname, new_resname, fn_filter=None, include=None, exclude=None)
Replaces all occurrences of a specified original residue name with a new residue name in a list of PDB lines.
This function iterates through the provided PDB lines and, for each ATOM or HETATM record, checks whether the residue name matches orig_resname. If it does, the residue name is replaced with new_resname. The residue names are stripped of leading/trailing whitespace and left-justified to a length of 4 characters to ensure proper formatting in the PDB file. Optionally, a filter function and inclusion/exclusion lists can be used to control which lines are processed.
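The combination of name matching and optional filtering can be sketched as follows. This is a hypothetical illustration, not the module's implementation: replace_residue_names_sketch and get_resid are illustrative names, the residue name is assumed to occupy columns 18-21 (the standard 3-letter field plus one spare column, matching the 4-character padding above), and the residue sequence number is read from columns 23-26.

```python
from typing import Callable, Iterable, List, Optional, Sequence

# Hypothetical sketch; column layout follows the PDB ATOM/HETATM record format.
RESNAME = slice(17, 21)  # columns 18-21: 3-letter name plus one spare column
RESSEQ = slice(22, 26)   # columns 23-26: residue sequence number


def get_resid(line: str) -> str:
    """Return the residue sequence number as a stripped string."""
    return line[RESSEQ].strip()


def replace_residue_names_sketch(
    pdb_lines: Iterable[str],
    orig_resname: str,
    new_resname: str,
    fn_filter: Optional[Callable[[str], str]] = None,
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> List[str]:
    target = orig_resname.strip()
    new_field = new_resname.strip().ljust(4)  # left-justified to 4 characters
    out = []
    for line in pdb_lines:
        matches = line.startswith(("ATOM", "HETATM")) and line[RESNAME].strip() == target
        if matches and fn_filter is not None:
            value = fn_filter(line)
            if include is not None:  # include decides when given
                matches = value in include
            elif exclude is not None:  # otherwise skip excluded values
                matches = value not in exclude
        if matches:
            line = line[:17] + new_field + line[21:]
        out.append(line)
    return out


pdb_lines = [
    "ATOM      1  N   MET A   1",
    "ATOM      2  CA  MET A   1",
    "ATOM      3  C   MET A   2",
]
for line in replace_residue_names_sketch(pdb_lines, "MET", "ALA", fn_filter=get_resid, include=["1"]):
    print(line)
```

Because the filter only runs on lines whose residue name already matches, fn_filter does not need to handle unrelated records.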
PARAMETER | DESCRIPTION |
---|---|
pdb_lines | An iterable of strings, where each string represents a line from a PDB file. |
orig_resname | The original residue name to be replaced. |
new_resname | The new residue name to replace the original one. |
fn_filter | An optional callable that takes a PDB line as input and returns a string (e.g., residue ID) for filtering. |
include | An optional list of strings. Only lines where the result of fn_filter is in this list are processed. |
exclude | An optional list of strings. Lines where the result of fn_filter is in this list are not processed. |
RETURNS | DESCRIPTION |
---|---|
list[str] | A list of PDB lines with the specified residue names replaced, subject to any provided filtering. |
Examples:
To replace all "MET" residues with "ALA":
>>> pdb_lines = [
... "ATOM 1 N MET A 1 ...",
... "ATOM 2 CA MET A 1 ...",
... ]
>>> modified_lines = replace_residue_names(pdb_lines, "MET", "ALA")
>>> for line in modified_lines:
... print(line)
ATOM 1 N ALA A 1 ...
ATOM 2 CA ALA A 1 ...
To replace "MET" with "ALA" only in residue ID "1":
>>> pdb_lines = [
... "ATOM 1 N MET A 1 ...",
... "ATOM 2 CA MET A 1 ...",
... "ATOM 3 C MET A 2 ...",
... ]
>>> def get_resid(line):
... return parse_resid(line).strip()
>>> modified_lines = replace_residue_names(
... pdb_lines, "MET", "ALA", fn_filter=get_resid, include=["1"]
... )
>>> for line in modified_lines:
... print(line)
ATOM 1 N ALA A 1 ...
ATOM 2 CA ALA A 1 ...
ATOM 3 C MET A 2 ...
run_replace_resnames(pdb_path, resname_map, output_path=None, fn_filter=None, include=None, exclude=None)
Replaces multiple residue names in a PDB file based on a provided mapping.
This function reads a PDB file, iterates through a dictionary that maps original residue names to new residue names, and applies the replacement using the replace_residue_names function for each mapping. The modified PDB lines are then either returned or written to a new file. Optional filtering based on a function and inclusion/exclusion lists can be applied during the replacement process for each residue name in the map.
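Applying the mapping one entry at a time, as described above, has a consequence worth knowing: if one entry's new name is another entry's original name, replacements chain. The sketch below is a hypothetical stand-in (run_replace_resnames_sketch and _replace_resname are illustrative names, filtering is omitted for brevity, and the column layout is assumed from the PDB format) that shows the effect with a temporary file.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Union

# Hypothetical stand-in for run_replace_resnames (no filtering, for brevity).
PathLike = Union[str, Path]


def _replace_resname(lines: List[str], old: str, new: str) -> List[str]:
    # Residue name sits in columns 18-21 (0-based slice 17:21).
    new_field = new.strip().ljust(4)
    return [
        line[:17] + new_field + line[21:]
        if line.startswith(("ATOM", "HETATM")) and line[17:21].strip() == old
        else line
        for line in lines
    ]


def run_replace_resnames_sketch(
    pdb_path: PathLike, resname_map: Dict[str, str], output_path: Optional[PathLike] = None
) -> List[str]:
    # read_text raises FileNotFoundError if pdb_path does not exist.
    lines = Path(pdb_path).read_text().splitlines()
    # Each mapping entry is applied in turn, in dict insertion order.
    for old, new in resname_map.items():
        lines = _replace_resname(lines, old, new)
    if output_path is not None:
        Path(output_path).write_text("\n".join(lines) + "\n")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.pdb"
    src.write_text("ATOM      1  CA  HID A   1\nATOM      2  CA  HIS A   2\n")
    result = run_replace_resnames_sketch(src, {"HID": "HIS", "HIS": "HIE"})

# HID is first renamed to HIS, then caught by the HIS -> HIE entry.
print([line[17:20] for line in result])  # ['HIE', 'HIE']
```

If chaining is not wanted, a single pass that looks up each line's original residue name in resname_map once would avoid it; whether the real function behaves like this sketch depends on its implementation.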
PARAMETER | DESCRIPTION |
---|---|
pdb_path | The path to the input PDB file. |
resname_map | A dictionary where the keys are the original residue names to be replaced, and the values are the corresponding new residue names. |
output_path | The path to save the new PDB file with the replaced residue names. If None, no output file is written. |
fn_filter | An optional callable that takes a PDB line as input and returns a string for filtering during each residue name replacement. |
include | An optional list of strings. Only lines where the result of fn_filter is in this list are processed. |
exclude | An optional list of strings. Lines where the result of fn_filter is in this list are not processed. |
RETURNS | DESCRIPTION |
---|---|
list[str] | A list of PDB lines with the residue names replaced according to resname_map. |
RAISES | DESCRIPTION |
---|---|
FileNotFoundError | If the specified pdb_path does not exist. |
IOError | If there is an error reading the PDB file or writing to the output file. |
Examples:
To replace all "MET" residues with "ALA" and all "GLU" residues with "ASP" in "input.pdb" and save the result to "output.pdb":
>>> resname_mapping = {"MET": "ALA", "GLU": "ASP"}
>>> modified_lines = run_replace_resnames(
... "input.pdb", resname_mapping, output_path="output.pdb"
... )
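To perform the same replacement but only for residues with ID "1":
>>> def get_resid(line):
... return parse_resid(line).strip()
>>> modified_lines = run_replace_resnames(
... "input.pdb", resname_mapping, fn_filter=get_resid, include=["1"]
... )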
run_unify_water_labels(pdb_path, atom_map=None, water_resname='WAT', water_atomnames=None, output_path=None)
Ensures that water molecule atom names are consistently labeled as 'O', 'H1', and 'H2'.
This function processes a PDB file to standardize the atom names of water molecules. It identifies water residues based on water_resname and then renames their atoms to 'O' for oxygen and 'H1' and 'H2' for the two hydrogen atoms. The hydrogen atoms are assigned 'H1' and 'H2' based on their sequential appearance within each water residue in the PDB file.
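A minimal sketch of the labeling rule described above, applied to in-memory lines. unify_water_labels_sketch is a hypothetical stand-in, and DEFAULT_WATER_ATOMNAMES is an assumed default because this reference does not list the real one. Column positions follow the PDB ATOM/HETATM layout.

```python
from typing import Dict, Iterable, List, Optional, Sequence

# Hypothetical defaults for illustration; the real function's defaults are
# not listed in this reference.
DEFAULT_WATER_ATOMNAMES = {"O": ("O", "OW", "OH2"), "H": ("H", "HW", "H1", "H2", "HW1", "HW2")}
STANDARD = {"O": "O", "H1": "H1", "H2": "H2"}


def unify_water_labels_sketch(
    pdb_lines: Iterable[str],
    water_resname: str = "WAT",
    water_atomnames: Optional[Dict[str, Sequence[str]]] = None,
    atom_map: Optional[Dict[str, str]] = None,
) -> List[str]:
    names = water_atomnames or DEFAULT_WATER_ATOMNAMES
    mapping = {**STANDARD, **(atom_map or {})}  # standard label -> output label
    out: List[str] = []
    current_key = None
    h_seen = 0
    for line in pdb_lines:
        is_water = line.startswith(("ATOM", "HETATM")) and line[17:21].strip() == water_resname
        if is_water:
            # Chain ID, residue number and insertion code identify the residue.
            key = (line[21:22], line[22:27])
            if key != current_key:  # a new water residue starts: reset the H count
                current_key, h_seen = key, 0
            atom = line[12:16].strip()
            label = None
            if atom in names["O"]:
                label = "O"
            elif atom in names["H"] and h_seen < 2:
                h_seen += 1
                label = f"H{h_seen}"  # H1, then H2, in order of appearance
            if label is not None:
                line = line[:12] + mapping[label].ljust(4) + line[16:]
        out.append(line)
    return out


waters = [
    "HETATM    1  OW  SOL A   1",
    "HETATM    2  HW  SOL A   1",
    "HETATM    3  HW  SOL A   1",
    "HETATM    4  OW  SOL A   2",
    "HETATM    5  HW  SOL A   2",
    "HETATM    6  HW  SOL A   2",
]
for line in unify_water_labels_sketch(waters, water_resname="SOL", water_atomnames={"O": ["OW"], "H": ["HW"]}):
    print(line)
```

Resetting the hydrogen counter whenever the residue changes, rather than keying a dictionary on residue number, keeps the sketch correct when large systems wrap residue numbers past 9999.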
PARAMETER | DESCRIPTION |
---|---|
pdb_path | The path to the input PDB file. |
atom_map | A dictionary mapping the standard water atom names ('O', 'H1', 'H2') to the desired names. If None, the standard names are used. |
water_resname | The residue name used to identify water molecules in the PDB file. Defaults to 'WAT'. |
water_atomnames | A dictionary specifying the original atom names that should be considered as oxygen and hydrogen atoms of water. The keys should be 'O' and 'H', and the values should be iterables of possible atom names. If None, a default set of names is used. |
output_path | The path to save the new PDB file with the unified water atom labels. If None, no output file is written. |
RETURNS | DESCRIPTION |
---|---|
Iterable[str] | An iterable of PDB lines with the unified water atom labels. |
RAISES | DESCRIPTION |
---|---|
FileNotFoundError | If the specified pdb_path does not exist. |
IOError | If there is an error reading the PDB file or writing to the output file. |
Warning
This function has not been thoroughly tested and might not handle all edge cases correctly. Use with caution.
Examples:
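To unify water atom labels in "input.pdb" using the default settings and save to "unified_water.pdb":
>>> modified_lines = run_unify_water_labels(
... "input.pdb", output_path="unified_water.pdb"
... )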
To specify a different water residue name and atom name mapping:
>>> atom_mapping = {"O": "OXT", "H1": "HT1", "H2": "HT2"}
>>> original_water_names = {"O": ["OW"], "H": ["HY"]}
>>> modified_lines = run_unify_water_labels(
... "input.pdb",
... atom_map=atom_mapping,
... water_resname="SOL",
... water_atomnames=original_water_names,
... output_path="custom_water.pdb",
... )