Utils

`cli_align_pdb()`

Command-line interface for aligning a PDB file.

This function sets up an argument parser to allow users to align PDB files from the command line. It takes arguments for the input PDB path, the output PDB path, and an optional MDAnalysis selection string to specify which atoms should be used for the alignment.

The command-line usage is as follows:

python your_script_name.py input.pdb aligned.pdb --selection "protein and name CA"

This would align the PDB file "input.pdb" to its first frame based on the alpha carbon atoms of the protein and save the aligned structure to "aligned.pdb".

RAISES	DESCRIPTION
`SystemExit`	If the command-line arguments are invalid or if help is requested.
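A minimal sketch of how such a parser could be set up with `argparse`. The function name `build_align_parser` and the destination names `pdb_path` and `out_path` are hypothetical; only the two positional arguments and the `--selection` flag come from the usage line above.

```python
import argparse


def build_align_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the usage line above (names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Align a PDB file to its first frame."
    )
    # Positional arguments: input path first, then output path.
    parser.add_argument("pdb_path", help="Path to the input PDB file.")
    parser.add_argument("out_path", help="Path for the aligned PDB file.")
    # Optional MDAnalysis selection string; None is assumed to mean "all atoms".
    parser.add_argument(
        "--selection",
        default=None,
        help="MDAnalysis selection string used for the alignment.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_align_parser().parse_args(
    ["input.pdb", "aligned.pdb", "--selection", "protein and name CA"]
)
print(args.pdb_path, args.out_path, args.selection)
```

With `argparse`, missing or malformed arguments and `-h`/`--help` both end in `SystemExit`, which is consistent with the behavior documented above.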

`cli_write_pdb()`

Command-line interface for writing a PDB file from topology and coordinate files.

This function sets up an argument parser to allow users to write PDB files from the command line. It takes arguments for the output path, input files, an optional MDAnalysis selection string, and an optional stride for writing frames.

The command-line usage is as follows:

python your_script_name.py output.pdb --files top.pdb traj.dcd --select "protein and name CA" --stride 10

This would write a PDB file named "output.pdb" containing only the alpha carbon atoms of the protein from the trajectory "traj.dcd" (with topology in "top.pdb"), taking every 10th frame.

RAISES	DESCRIPTION
`SystemExit`	If the command-line arguments are invalid or if help is requested.
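The option layout in the usage line could be reproduced with `argparse` roughly as follows. This is a sketch under assumptions: `build_write_parser` and the destination name `output_path` are hypothetical, and treating `--files` as required is a guess rather than documented behavior.

```python
import argparse


def build_write_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the usage line above (names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Write a PDB file from topology and coordinate files."
    )
    parser.add_argument("output_path", help="Path for the output PDB file.")
    # nargs="+" collects every value up to the next flag into a list.
    # Assumption: at least one input file must be given.
    parser.add_argument("--files", nargs="+", required=True, help="Input files.")
    parser.add_argument("--select", default=None, help="MDAnalysis selection.")
    parser.add_argument("--stride", type=int, default=1, help="Frame stride.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_write_parser().parse_args(
    [
        "output.pdb",
        "--files", "top.pdb", "traj.dcd",
        "--select", "protein and name CA",
        "--stride", "10",
    ]
)
print(args.output_path, args.files, args.select, args.stride)
```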

`keep_lines(lines, record_types=('ATOM', 'HETATM', 'TER', 'END', 'MODEL', 'ENDMDL'))`

Filters a list of PDB file lines, retaining only those that start with specified record types.

This function iterates through an iterable of strings, which are assumed to be lines from a PDB file. It checks whether each line begins with any of the record types in the `record_types` tuple. Only matching lines are included in the returned list.

PARAMETER	DESCRIPTION
`lines`	An iterable (e.g., a list) of strings, where each string represents a line from a PDB file. TYPE: `Iterable[str]`
`record_types`	A tuple of strings naming the PDB record types to keep. The default includes the most common record types. TYPE: `tuple[str, ...]` DEFAULT: `('ATOM', 'HETATM', 'TER', 'END', 'MODEL', 'ENDMDL')`

RETURNS	DESCRIPTION
`list[str]`	A new list containing only the lines from the input `lines` that start with one of the specified `record_types`. The order of the lines in the output list will be the same as in the input.

Examples:

>>> pdb_lines = [
...     "HEADER    TITLE                                                   04-APR-25   NONE",
...     "ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N",
...     "HELIX    1   1 MET A    1  THR A   4  1                                    4",
...     "HETATM  999  O   HOH     1      15.000  25.000  35.000  1.00 20.00           O",
...     "TER     1     MET A   1",
...     "END",
... ]
>>> kept_lines = keep_lines(
...     pdb_lines, record_types=("ATOM", "HETATM", "TER", "END")
... )
>>> for line in kept_lines:
...     print(line.strip())
ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N
HETATM  999  O   HOH     1      15.000  25.000  35.000  1.00 20.00           O
TER     1     MET A   1
END
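The prefix filtering described above can be sketched with `str.startswith`, which accepts a tuple of prefixes. The name `keep_lines_sketch` is hypothetical and this is not necessarily how the library implements `keep_lines`.

```python
from collections.abc import Iterable

DEFAULT_RECORD_TYPES = ("ATOM", "HETATM", "TER", "END", "MODEL", "ENDMDL")


def keep_lines_sketch(
    lines: Iterable[str],
    record_types: tuple[str, ...] = DEFAULT_RECORD_TYPES,
) -> list[str]:
    """Return the lines that start with one of the given prefixes, in order."""
    # str.startswith tests every prefix in the tuple in a single call.
    return [line for line in lines if line.startswith(record_types)]


sample = ["HEADER    DEMO", "ATOM      1  N   MET A   1", "ENDMDL", "END"]
print(keep_lines_sketch(sample, record_types=("ATOM", "END")))
```

Because this is plain prefix matching, the prefix `"END"` also matches `ENDMDL` lines, while `"ATOM"` does not match `HETATM` lines.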

`parse_atomname(line)`

Extracts the atom name from a standard PDB file line.

This function assumes the input line adheres to the standard PDB format for ATOM or HETATM records, where the atom name is typically located in columns 14-17 (inclusive).

PARAMETER	DESCRIPTION
`line`	A line from a PDB file that starts with either "ATOM" or "HETATM". TYPE: `str`

RETURNS	DESCRIPTION
`str`	The atom name (e.g., "N", "CA", "O") extracted from the line.

RAISES	DESCRIPTION
`IndexError`	If `line` is shorter than 17 characters, so the atom name field cannot be read.

Examples:

>>> line = "ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N"
>>> atomname = parse_atomname(line)
>>> atomname
'N'

`parse_resid(line)`

Extracts the residue ID from a standard PDB file line.

This function assumes the input line adheres to the standard PDB format for ATOM or HETATM records, where the residue ID is typically located in columns 23-30 (inclusive).

PARAMETER	DESCRIPTION
`line`	A line from a PDB file that starts with either "ATOM" or "HETATM". TYPE: `str`

RETURNS	DESCRIPTION
`str`	The residue ID extracted from the line. This will typically include the residue sequence number and optionally an insertion code.

RAISES	DESCRIPTION
`IndexError`	If `line` is shorter than 30 characters, so the residue ID field cannot be read.

Examples:

>>> line = "ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N"
>>> resid = parse_resid(line)
>>> resid
'      1'

`parse_resname(line)`

Extracts the residue name from a standard PDB file line.

This function assumes the input line adheres to the standard PDB format for ATOM or HETATM records, where the residue name is typically located in columns 18-21 (inclusive).

PARAMETER	DESCRIPTION
`line`	A line from a PDB file that starts with either "ATOM" or "HETATM". TYPE: `str`

RETURNS	DESCRIPTION
`str`	The residue name (e.g., "MET", "ALA", "HOH") extracted from the line.

RAISES	DESCRIPTION
`IndexError`	If `line` is shorter than 21 characters, so the residue name field cannot be read.

Examples:

>>> line = "ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N"
>>> resname = parse_resname(line)
>>> resname
'MET'
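The three parsers above all rely on fixed-column slicing. The sketch below illustrates the idea using the field positions from the wwPDB format specification (atom name in columns 13-16, residue name in 18-20, residue sequence number in 23-26, all 1-based and inclusive). These are narrower than the ranges quoted above, and the library's own slice boundaries may differ. `AtomFields` and `parse_atom_fields` are hypothetical names.

```python
from typing import NamedTuple


class AtomFields(NamedTuple):
    atom_name: str
    res_name: str
    res_seq: int


def parse_atom_fields(line: str) -> AtomFields:
    """Slice fixed-width fields out of an ATOM/HETATM record (wwPDB columns)."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("expected an ATOM or HETATM record")
    # Slicing past the end of a str returns a shorter string instead of
    # raising, so this sketch checks the length explicitly.
    if len(line) < 26:
        raise ValueError("line is too short to contain columns 1-26")
    return AtomFields(
        atom_name=line[12:16].strip(),  # columns 13-16
        res_name=line[17:20].strip(),   # columns 18-20
        res_seq=int(line[22:26]),       # columns 23-26
    )


line = "ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N"
print(parse_atom_fields(line))
```

Converting a 1-based inclusive column range `a-b` to a Python slice gives `line[a - 1:b]`, which is where the indices in the sketch come from.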

`replace_in_pdb_line(line, orig, new, start, stop)`

Replaces an original string with a new string within a specific portion of a PDB line.

This function searches for `orig` within a defined slice of a PDB line. If `orig` is found, it is replaced with `new`. Both the search and the replacement are confined to the segment of the line given by the `start` and `stop` indices.

PARAMETER	DESCRIPTION
`line`	The original PDB line to be examined and potentially modified. TYPE: `str`
`orig`	The original string to search for within the specified slice of the `line`. TYPE: `str`
`new`	The new string to replace the `orig` string if it is found. This string should be formatted to match the expected width of the replaced segment, including any necessary spaces. For example, to represent the residue number 42, the `new` string should be `"   42"` if the residue number field occupies 5 characters. TYPE: `str`
`start`	The starting index (inclusive) of the slice in the `line` to be searched and where the replacement will occur. If `None`, the search and replacement start from the beginning of the line. TYPE: `int | None`
`stop`	The stopping index (exclusive) of the slice in the `line` to be searched and where the replacement will occur. If `None`, the search and replacement continue to the end of the line. TYPE: `int | None`

RETURNS	DESCRIPTION
`str`	The modified PDB line where the `orig` string has been replaced by the `new` string within the specified slice, or the original line if `orig` was not found.

Examples:

>>> line = "ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N"
>>> new_line = replace_in_pdb_line(line, "MET", "ALA", 17, 20)
>>> print(new_line)
ATOM      1  N   ALA A   1      10.000  20.000  30.000  1.00 20.00           N
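One way to confine a replacement to a slice is to split the line into head, segment, and tail, then modify only the segment. The following is a sketch of that idea with a hypothetical name, `replace_in_slice`; the library's implementation may handle edge cases differently.

```python
from __future__ import annotations


def replace_in_slice(
    line: str,
    orig: str,
    new: str,
    start: int | None = None,
    stop: int | None = None,
) -> str:
    """Replace orig with new, but only inside line[start:stop]."""
    # slice.indices resolves None and negative values to concrete bounds.
    lo, hi, _ = slice(start, stop).indices(len(line))
    hi = max(lo, hi)  # guard against an inverted range
    segment = line[lo:hi]
    if orig not in segment:
        return line  # nothing to replace, return the line unchanged
    return line[:lo] + segment.replace(orig, new) + line[hi:]


line = "ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N"
print(replace_in_slice(line, "MET", "ALA", 17, 20))
```

Text outside the slice is never touched, so an `orig` that occurs elsewhere in the line is left alone.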

`run_align_pdb(pdb_path, out_path, selection_str=None)`

Aligns the structure within a PDB file to a reference configuration based on a selection of atoms.

This function loads a PDB file into an MDAnalysis Universe, selects a subset of atoms based on `selection_str`, and performs a rigid-body fit of these atoms to their positions in the first frame of the trajectory. The transformation (rotation and translation) that achieves this fit is applied to all atoms in all frames of the trajectory. The aligned trajectory is then written to a new PDB file.

PARAMETER	DESCRIPTION
`pdb_path`	The path to the input PDB file containing the structure to be aligned. TYPE: `str`
`out_path`	The path to the output PDB file where the aligned structure will be written. TYPE: `str`
`selection_str`	An MDAnalysis selection string that specifies the group of atoms to be used for the alignment. If `None`, all atoms in the structure are used for alignment. TYPE: `str | None` DEFAULT: `None`

RAISES	DESCRIPTION
`FileNotFoundError`	If the input `pdb_path` does not exist.
`IOError`	If there is an error reading the input PDB file or writing the output PDB file.
`ValueError`	If the `selection_str` does not select any atoms.

Notes

The alignment is performed against the conformation in the first frame of the input PDB file.
This function is useful for removing overall translation and rotation from a structural ensemble.

Examples:

To align a PDB file "input.pdb" to its first frame using all atoms and save the result to "aligned.pdb":

>>> run_align_pdb("input.pdb", "aligned.pdb")

To align only the backbone atoms of the protein (the MDAnalysis `backbone` keyword selects N, CA, C, and O):

>>> run_align_pdb(
...     "input.pdb",
...     "aligned_backbone.pdb",
...     selection_str="protein and backbone",
... )

`run_filter_pdb(pdb_path, output_path=None, record_types=None)`

Reads a PDB file and keeps only the lines that contain specified record types.

This function takes the path to a PDB file, reads its contents, filters the lines to retain only those that start with the record types specified in the `record_types` argument, and optionally writes the filtered lines to a new PDB file.

PARAMETER	DESCRIPTION
`pdb_path`	The path to the input PDB file. TYPE: `str`
`output_path`	The path to the output PDB file where the filtered lines will be written. If `None`, no new file is created, and the filtered lines are only returned. TYPE: `str | None` DEFAULT: `None`
`record_types`	A tuple of strings representing the PDB record types to be kept. If `None`, the default record types `("ATOM", "HETATM", "TER", "END", "MODEL", "ENDMDL")` are used. TYPE: `tuple[str, ...] | None` DEFAULT: `None`

RETURNS	DESCRIPTION
`list[str]`	A list of strings, where each string is a line from the PDB file that starts with one of the specified `record_types`.

RAISES	DESCRIPTION
`FileNotFoundError`	If the `pdb_path` does not exist.
`IOError`	If there is an error reading or writing the PDB files.

Examples:

To filter a PDB file named "input.pdb" and save the result to "output.pdb", keeping only ATOM and TER records:

>>> run_filter_pdb("input.pdb", "output.pdb", record_types=("ATOM", "TER"))

To filter a PDB file and only get the lines without writing to a new file:

>>> filtered_lines = run_filter_pdb(
...     "input.pdb", record_types=("ATOM", "HETATM")
... )
>>> for line in filtered_lines:
...     print(line.strip())
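The read, filter, and optional write steps can be sketched with the standard library alone. `filter_pdb_sketch` is a hypothetical stand-in, not the library function, and the temporary files exist only to make the example self-contained.

```python
import tempfile
from pathlib import Path

DEFAULT_RECORD_TYPES = ("ATOM", "HETATM", "TER", "END", "MODEL", "ENDMDL")


def filter_pdb_sketch(pdb_path, output_path=None, record_types=None):
    """Read a PDB file, keep lines with matching prefixes, optionally write them."""
    prefixes = DEFAULT_RECORD_TYPES if record_types is None else tuple(record_types)
    with open(pdb_path, encoding="utf-8") as handle:
        kept = [line for line in handle if line.startswith(prefixes)]
    if output_path is not None:
        # Lines keep their trailing newlines, so they can be written as-is.
        with open(output_path, "w", encoding="utf-8") as handle:
            handle.writelines(kept)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input.pdb"
    target = Path(tmp) / "output.pdb"
    source.write_text(
        "HEADER    DEMO\nATOM      1  N   MET A   1\nTER\nEND\n",
        encoding="utf-8",
    )
    kept = filter_pdb_sketch(source, target, record_types=("ATOM", "TER"))
    written = target.read_text(encoding="utf-8")

print(kept)
```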

`run_merge_pdbs(*pdb_paths, output_path=None)`

Merges multiple PDB files into a single MDAnalysis Universe object.

This function takes a variable number of PDB file paths as input. It loads the first PDB file into an MDAnalysis Universe object and then iteratively adds the atoms from the subsequent PDB files. It assumes that the residue indices are consistent across all input PDB files. The merging process attempts to add missing atom types to existing residues based on the information in the later PDB files. Duplicate atoms (based on their coordinates) are removed, and the atoms within each residue are sorted by their type. Finally, some topology attributes that might interfere with other programs are removed.

PARAMETER	DESCRIPTION
`*pdb_paths`	A variable number of strings, where each string is the path to a PDB file. The order of the paths is important, as the first PDB file sets the initial structure, and subsequent files are used to add missing atoms. TYPE: `str` DEFAULT: `()`
`output_path`	The path to save the merged PDB structure to a new file. If `None`, no file is written. TYPE: `str | None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Universe`	An MDAnalysis Universe object containing all the atoms from the input PDB files, with duplicate atoms removed and atoms within each residue sorted.

RAISES	DESCRIPTION
`FileNotFoundError`	If any of the provided `pdb_paths` do not exist.
`IOError`	If there is an error reading any of the PDB files.

Notes

This function prioritizes the atom information from the PDB files provided later in the argument list when resolving missing atom types within a residue.
The function removes the "segids" topology attribute from the merged Universe, as it can sometimes cause issues with programs like pdb4amber.

Examples:

To merge two PDB files, "file1.pdb" and "file2.pdb", and save the result to "merged.pdb":

>>> merged_universe = run_merge_pdbs(
...     "file1.pdb", "file2.pdb", output_path="merged.pdb"
... )

To merge multiple PDB files without saving to a new file:

>>> merged_universe = run_merge_pdbs("file1.pdb", "file2.pdb", "file3.pdb")

`run_write_pdb(file_paths, output_path, selection_str=None, stride=1)`

Writes a PDB file from a set of topology and coordinate files, potentially applying a selection and stride.

This function takes a list of file paths that can be read by MDAnalysis to create a Universe object. It then iterates through the trajectory of this Universe, and at each time step (optionally with a specified stride), it writes the coordinates of the selected atoms to a PDB file.

PARAMETER	DESCRIPTION
`file_paths`	An iterable of strings, where each string is the path to a topology or coordinate file that can be loaded by MDAnalysis (e.g., a topology file like a PRMTOP or a coordinate file like a TRR, DCD, or PDB). If multiple files are provided, the first is typically the topology, and the rest are coordinate files. TYPE: `Iterable[str]`
`output_path`	The path to the output PDB file that will be created or overwritten. TYPE: `str`
`selection_str`	An MDAnalysis selection string that specifies which atoms to write to the PDB file. If `None`, all atoms in the current frame are written. TYPE: `str | None` DEFAULT: `None`
`stride`	The stride for writing frames from the trajectory. Only frames whose frame number modulo `stride` is 0 are written, so a stride of 1 writes every frame. TYPE: `int` DEFAULT: `1`

RAISES	DESCRIPTION
`FileNotFoundError`	If any of the files specified in `file_paths` do not exist.
`IOError`	If there is an error reading the input files or writing the output PDB file.
`ValueError`	If the `file_paths` iterable is empty.

Examples:

To write all atoms from a TRR trajectory file "traj.trr" and topology file "top.pdb" to a PDB file "output.pdb":

>>> run_write_pdb(["top.pdb", "traj.trr"], "output.pdb")

To write only the protein atoms with a stride of 10:

>>> run_write_pdb(
...     ["top.pdb", "traj.dcd"],
...     "protein.pdb",
...     selection_str="protein",
...     stride=10,
... )

To write all atoms from a single PDB file to another PDB file (effectively copying it):

>>> run_write_pdb(["input.pdb"], "output.pdb")
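The `stride` rule (write a frame only when its frame number modulo `stride` is 0) can be illustrated without a trajectory. In this sketch `iter_strided` is a hypothetical helper, frame numbers are assumed to start at 0, and rejecting strides below 1 is an added assumption rather than documented behavior.

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def iter_strided(frames: Sequence[T], stride: int = 1) -> Iterator[T]:
    """Yield the frames whose zero-based index is a multiple of stride."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield frame


# Stand-in for a 25-frame trajectory: each "frame" is just its own index.
frames = list(range(25))
print(list(iter_strided(frames, stride=10)))
```

Under these assumptions a 25-frame trajectory with `stride=10` yields frames 0, 10, and 20, so the first frame is always included.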

`write_in_pdb_line(line, new, start, stop)`

Writes a new string into a specific portion of a PDB line.

This function takes a PDB line and replaces a segment of it with the provided `new` string. The `start` and `stop` indices control exactly which part of the line is overwritten.

PARAMETER	DESCRIPTION
`line`	The original PDB line to be modified. TYPE: `str`
`new`	The new string to be inserted into the PDB line. This string should be formatted to match the expected width of the replaced segment, including any necessary spaces. For example, to represent the number 42 in a field that typically occupies 5 characters, the `new` string should be `"   42"`. TYPE: `str`
`start`	The starting index (inclusive) of the slice in the `line` to be replaced. If `None`, the replacement starts from the beginning of the line. TYPE: `int | None`
`stop`	The stopping index (exclusive) of the slice in the `line` to be replaced. If `None`, the replacement continues to the end of the line. TYPE: `int | None`

RETURNS	DESCRIPTION
`str`	The modified PDB line with the specified segment replaced by the `new` string.

Examples:

>>> line = "ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N"
>>> new_line = write_in_pdb_line(line, "   42", 6, 11)
>>> print(new_line)
ATOM     42  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N
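Overwriting a slice of an immutable string amounts to concatenating the text before the slice, the new string, and the text after it. A minimal sketch with a hypothetical name, `write_in_slice`:

```python
from __future__ import annotations


def write_in_slice(
    line: str,
    new: str,
    start: int | None = None,
    stop: int | None = None,
) -> str:
    """Return line with line[start:stop] replaced by new."""
    # slice.indices resolves None and negative values to concrete bounds.
    lo, hi, _ = slice(start, stop).indices(len(line))
    hi = max(lo, hi)  # guard against an inverted range
    return line[:lo] + new + line[hi:]


line = "ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00           N"
new_line = write_in_slice(line, "   42", 6, 11)
print(new_line)
```

If `new` is not exactly `stop - start` characters wide, every later column shifts, which is why the `new` string must be padded to the field width.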
