Utils
cli_align_pdb()
Command-line interface for aligning a PDB file.
This function sets up an argument parser to allow users to align PDB files from the command line. It takes arguments for the input PDB path, the output PDB path, and an optional MDAnalysis selection string to specify which atoms should be used for the alignment.
The command-line usage is as follows:
This would align the PDB file "input.pdb" to its first frame based on the alpha carbon atoms of the protein and save the aligned structure to "aligned.pdb".
RAISES | DESCRIPTION |
---|---|
SystemExit | If the command-line arguments are invalid or if help is requested. |
See Also
run_align_pdb: The underlying function that performs the PDB alignment.
cli_write_pdb()
Command-line interface for writing a PDB file from topology and coordinate files.
This function sets up an argument parser to allow users to write PDB files from the command line. It takes arguments for the output path, input files, an optional MDAnalysis selection string, and an optional stride for writing frames.
The command-line usage is as follows:
python your_script_name.py output.pdb --files top.pdb traj.dcd --select "protein and name CA" --stride 10
This would write a PDB file named "output.pdb" containing only the alpha carbon atoms of the protein from the trajectory "traj.dcd" (with topology in "top.pdb"), taking every 10th frame.
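The usage line above can be reproduced with a small `argparse` parser. This is a minimal sketch, not the library's actual parser: the flag names are taken from the usage line, while the function name `build_write_pdb_parser`, the destination names, and the help strings are assumptions made for illustration.

```python
import argparse


def build_write_pdb_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that accepts the usage line shown above."""
    parser = argparse.ArgumentParser(
        description="Write a PDB file from topology and coordinate files."
    )
    # Positional argument: where the PDB file is written.
    parser.add_argument("output_path", help="Path of the PDB file to write.")
    # One or more input files; the first is typically the topology.
    parser.add_argument("--files", nargs="+", required=True, help="Input files.")
    # Optional MDAnalysis selection string.
    parser.add_argument("--select", default=None, help="Atom selection string.")
    # Optional stride; 1 means every frame.
    parser.add_argument("--stride", type=int, default=1, help="Frame stride.")
    return parser


args = build_write_pdb_parser().parse_args(
    [
        "output.pdb",
        "--files", "top.pdb", "traj.dcd",
        "--select", "protein and name CA",
        "--stride", "10",
    ]
)
print(args.output_path, args.files, args.select, args.stride)
```

With a parser like this, invalid arguments or `--help` make `argparse` exit through `SystemExit`, which matches the behavior listed in the table below.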
RAISES | DESCRIPTION |
---|---|
SystemExit | If the command-line arguments are invalid or if help is requested. |
See Also
run_write_pdb: The underlying function that performs the PDB writing.
keep_lines(lines, record_types=('ATOM', 'HETATM', 'TER', 'END', 'MODEL', 'ENDMDL'))
Filters a list of PDB file lines, retaining only those that start with specified record types.
This function iterates through a given iterable of strings, which are assumed to be lines from a PDB file. It checks if each line begins with any of the record types provided in the `record_types` tuple. Only the lines that match one of these record types are included in the returned list.
PARAMETER | DESCRIPTION |
---|---|
lines | An iterable (e.g., a list) of strings, where each string represents a line from a PDB file. |
record_types | A tuple of strings representing the PDB record types to be kept. The default value is `('ATOM', 'HETATM', 'TER', 'END', 'MODEL', 'ENDMDL')`. |
RETURNS | DESCRIPTION |
---|---|
list[str] | A new list containing only the lines from the input `lines` that start with one of the specified `record_types`. |
Examples:
>>> pdb_lines = [
... "HEADER TITLE 04-APR-25 NONE",
... "ATOM 1 N MET A 1 10.000 20.000 30.000 1.00 20.00 N",
... "HELIX 1 1 MET A 1 THR A 4 1 4",
... "HETATM 999 O HOH 1 15.000 25.000 35.000 1.00 20.00 O",
... "TER 1 MET A 1",
... "END",
... ]
>>> kept_lines = keep_lines(
... pdb_lines, record_types=("ATOM", "HETATM", "TER", "END")
... )
>>> for line in kept_lines:
... print(line.strip())
ATOM 1 N MET A 1 10.000 20.000 30.000 1.00 20.00 N
HETATM 999 O HOH 1 15.000 25.000 35.000 1.00 20.00 O
TER 1 MET A 1
END
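The prefix filtering described for this function can be sketched in a few lines. This is an illustration under assumptions, not the library's source: the name `keep_lines_sketch` is hypothetical, and the sketch relies on `str.startswith` accepting a tuple of prefixes.

```python
from collections.abc import Iterable

DEFAULT_RECORD_TYPES = ("ATOM", "HETATM", "TER", "END", "MODEL", "ENDMDL")


def keep_lines_sketch(
    lines: Iterable[str],
    record_types: tuple[str, ...] = DEFAULT_RECORD_TYPES,
) -> list[str]:
    """Keep only the lines that begin with one of the given record types."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return [line for line in lines if line.startswith(record_types)]


sample = ["HEADER    EXAMPLE", "ATOM      1  N   MET A   1", "ENDMDL", "END"]
print(keep_lines_sketch(sample, record_types=("ATOM", "END")))
```

One property of plain prefix matching in this sketch: asking for "END" also keeps "ENDMDL" lines, because "ENDMDL" starts with "END".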
parse_atomname(line)
Extracts the atom name from a standard PDB file line.
This function assumes the input `line` adheres to the standard PDB format for `ATOM` or `HETATM` records, where the atom name is typically located in columns 14-17 (inclusive).
PARAMETER | DESCRIPTION |
---|---|
line | A line from a PDB file that starts with either "ATOM" or "HETATM". |
RETURNS | DESCRIPTION |
---|---|
str | The atom name (e.g., "N", "CA", "O") extracted from the line. |
RAISES | DESCRIPTION |
---|---|
IndexError | If the input `line` is too short to contain the atom name columns. |
Examples:
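A minimal sketch of fixed-column extraction, assuming the 1-indexed, inclusive column range quoted in the description maps to the Python slice `line[first - 1:last]`. The helper `columns` is hypothetical and is not part of the documented API; whether the library strips whitespace or checks the length in this way is also an assumption.

```python
def columns(line: str, first: int, last: int) -> str:
    """Return 1-indexed, inclusive columns first..last of a line, stripped."""
    if len(line) < last:
        raise IndexError(f"line has {len(line)} characters, need at least {last}")
    # Columns are 1-indexed and inclusive; Python slices are 0-indexed
    # with an exclusive end, so only the start needs shifting.
    return line[first - 1 : last].strip()


# Build the line field by field so the column layout is explicit.
line = (
    "ATOM  "  # columns 1-6: record name
    "    1"   # columns 7-11: atom serial number
    " "       # column 12: blank
    " N  "    # columns 13-16: atom name field
    " "       # column 17: alternate location indicator
    "MET"     # columns 18-20: residue name
    " "       # column 21: blank
    "A"       # column 22: chain identifier
    "   1"    # columns 23-26: residue sequence number
    " "       # column 27: insertion code
)

print(columns(line, 14, 17))  # range quoted for the atom name
print(columns(line, 18, 21))  # range quoted for the residue name
```

The field comments follow the wwPDB layout, in which the atom name field spans columns 13-16. A four-character atom name such as "HD11" starts in column 13, so it would begin one column before the 14-17 range used here.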
parse_resid(line)
Extracts the residue ID from a standard PDB file line.
This function assumes the input `line` adheres to the standard PDB format for `ATOM` or `HETATM` records, where the residue ID is typically located in columns 23-30 (inclusive).
PARAMETER | DESCRIPTION |
---|---|
line | A line from a PDB file that starts with either "ATOM" or "HETATM". |
RETURNS | DESCRIPTION |
---|---|
str | The residue ID extracted from the line. This will typically include the residue sequence number and optionally an insertion code. |
RAISES | DESCRIPTION |
---|---|
IndexError | If the input `line` is too short to contain the residue ID columns. |
Examples:
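Because the returned residue ID may carry an insertion code, callers sometimes need to separate the two parts. The sketch below is an assumption-laden illustration: it reads columns 23-30 as the slice `line[22:30]`, and `split_resid` is a hypothetical helper that is not part of the documented API.

```python
def split_resid(resid: str) -> tuple[int, str]:
    """Split a residue ID such as '52A' into (52, 'A'); no code gives ''."""
    resid = resid.strip()
    # A trailing letter is treated as the insertion code.
    if resid and resid[-1].isalpha():
        return int(resid[:-1]), resid[-1]
    # int() raises ValueError if the field is empty or not numeric.
    return int(resid), ""


line = (
    "ATOM  "  # columns 1-6: record name
    "    1"   # columns 7-11: atom serial number
    " "       # column 12: blank
    " N  "    # columns 13-16: atom name field
    " "       # column 17: alternate location indicator
    "MET"     # columns 18-20: residue name
    " "       # column 21: blank
    "A"       # column 22: chain identifier
    "  52"    # columns 23-26: residue sequence number
    "A"       # column 27: insertion code
    "   "     # columns 28-30: blank
)

resid_field = line[22:30]  # columns 23-30, as quoted in the description
print(repr(resid_field), split_resid(resid_field))
```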
parse_resname(line)
Extracts the residue name from a standard PDB file line.
This function assumes the input `line` adheres to the standard PDB format for `ATOM` or `HETATM` records, where the residue name is typically located in columns 18-21 (inclusive).
PARAMETER | DESCRIPTION |
---|---|
line | A line from a PDB file that starts with either "ATOM" or "HETATM". |
RETURNS | DESCRIPTION |
---|---|
str | The residue name (e.g., "MET", "ALA", "HOH") extracted from the line. |
RAISES | DESCRIPTION |
---|---|
IndexError | If the input `line` is too short to contain the residue name columns. |
Examples:
replace_in_pdb_line(line, orig, new, start, stop)
General function to replace an original string with a new string within a specific portion of a PDB line.
This function searches for a specific `orig` string within a defined slice of a PDB line. If the `orig` string is found, it is replaced with the provided `new` string. The replacement is constrained to the segment of the line specified by the `start` and `stop` indices.
PARAMETER | DESCRIPTION |
---|---|
line | The original PDB line to be examined and potentially modified. |
orig | The original string to search for within the specified slice of the `line`. |
new | The new string to replace the `orig` string. |
start | The starting index (inclusive) of the slice in the `line`. |
stop | The stopping index (exclusive) of the slice in the `line`. |
RETURNS | DESCRIPTION |
---|---|
str | The modified PDB line where the `orig` string has been replaced with `new` inside the slice, if it was found there. |
Examples:
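A minimal sketch of a slice-limited replacement, assuming the behavior described above can be modeled with `str.replace` on the selected segment. The name `replace_in_slice` is hypothetical, and the library's own implementation may differ in details.

```python
def replace_in_slice(line: str, orig: str, new: str, start: int, stop: int) -> str:
    """Replace orig with new, but only inside line[start:stop]."""
    segment = line[start:stop]
    # Text before start and after stop is copied through untouched.
    return line[:start] + segment.replace(orig, new) + line[stop:]


line = (
    "ATOM  "  # indices 0-5: record name
    "    1"   # indices 6-10: atom serial number
    " "       # index 11: blank
    " N  "    # indices 12-15: atom name field
    " "       # index 16: alternate location indicator
    "HIS"     # indices 17-19: residue name
    " "       # index 20: blank
    "A"       # index 21: chain identifier
    "   1"    # indices 22-25: residue sequence number
)

# Rename the residue only within its own field.
print(replace_in_slice(line, "HIS", "HID", 17, 20))
# Change chain "A" to "B" without touching the "A" in "ATOM".
print(replace_in_slice(line, "A", "B", 21, 22))
```

If `orig` and `new` have different lengths, this sketch changes the length of the line and shifts every later column, so equal-width strings are the safe choice for fixed-column PDB records.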
run_align_pdb(pdb_path, out_path, selection_str=None)
Aligns the structure within a PDB file to a reference configuration based on a selection of atoms.
This function loads a PDB file into an MDAnalysis Universe, selects a subset of atoms based on the `selection_str`, and then performs a rigid-body fit of these atoms to their initial positions in the first frame of the trajectory. The transformation (rotation and translation) that achieves this fit is then applied to all atoms in all frames of the trajectory. The aligned trajectory is then written to a new PDB file.
PARAMETER | DESCRIPTION |
---|---|
pdb_path | The path to the input PDB file containing the structure to be aligned. |
out_path | The path to the output PDB file where the aligned structure will be written. |
selection_str | An MDAnalysis selection string that specifies the group of atoms to be used for the alignment. If |
RAISES | DESCRIPTION |
---|---|
FileNotFoundError | If the input `pdb_path` does not exist. |
IOError | If there is an error reading the input PDB file or writing the output PDB file. |
ValueError | If the |
Notes
- The alignment is performed against the conformation in the first frame of the input PDB file.
- This function is useful for removing overall translation and rotation from a structural ensemble.
Examples:
To align a PDB file "input.pdb" to its first frame using all atoms and save the result to "aligned.pdb":
To align only the backbone atoms (N, CA, C) of the protein:
run_filter_pdb(pdb_path, output_path=None, record_types=None)
Reads a PDB file and keeps only the lines that contain specified record types.
This function takes the path to a PDB file, reads its contents, filters the lines to retain only those that start with the record types specified in the `record_types` argument, and optionally writes the filtered lines to a new PDB file.
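The read, filter, and optional write steps described above can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: `filter_pdb_sketch` is a hypothetical name, and falling back to the default record types of `keep_lines` when `record_types` is `None` is a guess about the intended behavior.

```python
import tempfile
from pathlib import Path

DEFAULT_RECORD_TYPES = ("ATOM", "HETATM", "TER", "END", "MODEL", "ENDMDL")


def filter_pdb_sketch(pdb_path, output_path=None, record_types=None):
    """Read a PDB file, keep matching records, optionally write them out."""
    if record_types is None:
        # Assumed fallback; the documented default behavior is not stated.
        record_types = DEFAULT_RECORD_TYPES
    with open(pdb_path, encoding="utf-8") as handle:
        # Iterating over the file keeps each line's trailing newline.
        kept = [line for line in handle if line.startswith(tuple(record_types))]
    if output_path is not None:
        with open(output_path, "w", encoding="utf-8") as handle:
            handle.writelines(kept)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.pdb"
    dst = Path(tmp) / "output.pdb"
    src.write_text(
        "HEADER    EXAMPLE\nATOM      1  N   MET A   1\nTER\nEND\n",
        encoding="utf-8",
    )
    kept = filter_pdb_sketch(src, dst, record_types=("ATOM", "TER"))
    written = dst.read_text(encoding="utf-8")

print(kept)
```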
PARAMETER | DESCRIPTION |
---|---|
pdb_path | The path to the input PDB file. |
output_path | The path to the output PDB file where the filtered lines will be written. If `None`, no file is written. |
record_types | A tuple of strings representing the PDB record types to be kept. If |
RETURNS | DESCRIPTION |
---|---|
list[str] | A list of strings, where each string is a line from the PDB file that starts with one of the specified `record_types`. |
RAISES | DESCRIPTION |
---|---|
FileNotFoundError | If the `pdb_path` does not exist. |
IOError | If there is an error reading or writing the PDB files. |
Examples:
To filter a PDB file named "input.pdb" and save the result to "output.pdb", keeping only ATOM and TER records:
To filter a PDB file and only get the lines without writing to a new file:
run_merge_pdbs(*pdb_paths, output_path=None)
Merges multiple PDB files into a single MDAnalysis Universe object.
This function takes a variable number of PDB file paths as input. It loads the first PDB file into an MDAnalysis Universe object and then iteratively adds the atoms from the subsequent PDB files. It assumes that the residue indices are consistent across all input PDB files. The merging process attempts to add missing atom types to existing residues based on the information in the later PDB files. Duplicate atoms (based on their coordinates) are removed, and the atoms within each residue are sorted by their type. Finally, some topology attributes that might interfere with other programs are removed.
PARAMETER | DESCRIPTION |
---|---|
*pdb_paths | A variable number of strings, where each string is the path to a PDB file. The order of the paths is important, as the first PDB file sets the initial structure, and subsequent files are used to add missing atoms. |
output_path | The path to save the merged PDB structure to a new file. If `None`, no file is written. |
RETURNS | DESCRIPTION |
---|---|
Universe | An MDAnalysis Universe object containing all the atoms from the input PDB files, with duplicate atoms removed and atoms within each residue sorted. |
RAISES | DESCRIPTION |
---|---|
FileNotFoundError | If any of the provided `pdb_paths` do not exist. |
IOError | If there is an error reading any of the PDB files. |
Notes
- This function prioritizes the atom information from the PDB files provided later in the argument list when resolving missing atom types within a residue.
- The function removes the "segids" topology attribute from the merged Universe, as it can sometimes cause issues with programs like pdb4amber.
Examples:
To merge two PDB files, "file1.pdb" and "file2.pdb", and save the result to "merged.pdb":
To merge multiple PDB files without saving to a new file:
run_write_pdb(file_paths, output_path, selection_str=None, stride=1)
Writes a PDB file from a set of topology and coordinate files, potentially applying a selection and stride.
This function takes a list of file paths that can be read by MDAnalysis to create a Universe object. It then iterates through the trajectory of this Universe, and at each time step (optionally with a specified stride), it writes the coordinates of the selected atoms to a PDB file.
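The stride logic can be illustrated without MDAnalysis. The sketch below assumes that a frame is written when its zero-based frame number is divisible by the stride; `frames_to_write` is a hypothetical helper, and rejecting strides below 1 is an assumption rather than documented behavior.

```python
def frames_to_write(n_frames: int, stride: int = 1) -> list[int]:
    """Return the zero-based frame numbers kept for a given stride."""
    if stride < 1:
        # Assumed guard; a stride of 0 would otherwise divide by zero.
        raise ValueError("stride must be a positive integer")
    # Equivalent to list(range(0, n_frames, stride)).
    return [frame for frame in range(n_frames) if frame % stride == 0]


print(frames_to_write(25, stride=10))
print(frames_to_write(3))
```

With 25 frames and a stride of 10, this selects frames 0, 10, and 20. The default stride of 1 keeps every frame.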
PARAMETER | DESCRIPTION |
---|---|
file_paths | An iterable of strings, where each string is the path to a topology or coordinate file that can be loaded by MDAnalysis (e.g., a topology file like a PRMTOP or a coordinate file like a TRR, DCD, or PDB). If multiple files are provided, the first is typically the topology, and the rest are coordinate files. |
output_path | The path to the output PDB file that will be created or overwritten. |
selection_str | An MDAnalysis selection string that specifies which atoms to write to the PDB file. If |
stride | An integer specifying the stride for writing frames from the trajectory. Only frames where the frame number modulo `stride` is zero are written. |
RAISES | DESCRIPTION |
---|---|
FileNotFoundError | If any of the files specified in `file_paths` do not exist. |
IOError | If there is an error reading the input files or writing the output PDB file. |
ValueError | If the |
Examples:
To write all atoms from a TRR trajectory file "traj.trr" and topology file "top.pdb" to a PDB file "output.pdb":
To write only the protein atoms with a stride of 10:
>>> run_write_pdb(
... ["top.pdb", "traj.dcd"],
... "protein.pdb",
... selection_str="protein",
... stride=10,
... )
To write all atoms from a single PDB file to another PDB file (effectively copying it):
write_in_pdb_line(line, new, start, stop)
General function to write a new string into a specific portion of a PDB line.
This function takes a PDB line and replaces a segment of it with a provided new string. It offers precise control over which part of the line is modified using start and stop indices.
PARAMETER | DESCRIPTION |
---|---|
line | The original PDB line to be modified. |
new | The new string to be inserted into the PDB line. This string should be formatted to match the expected width of the replaced segment, including any necessary spaces. For example, to represent the number 42 in a field that typically occupies 5 characters, the `new` string should be 5 characters wide, padding included. |
start | The starting index (inclusive) of the slice in the `line`. |
stop | The stopping index (exclusive) of the slice in the `line`. |
RETURNS | DESCRIPTION |
---|---|
str | The modified PDB line with the specified segment replaced by the `new` string. |
Examples:
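A minimal sketch of overwriting a fixed-width field, assuming the function behaves like plain string slicing and concatenation. The name `write_in_slice` is hypothetical; the example writes the number 42 into a five-character serial number field, padded on the left as the parameter description suggests.

```python
def write_in_slice(line: str, new: str, start: int, stop: int) -> str:
    """Replace line[start:stop] with new, leaving the rest untouched."""
    return line[:start] + new + line[stop:]


line = (
    "ATOM  "  # indices 0-5: record name
    "    1"   # indices 6-10: atom serial number (5 characters)
    " "       # index 11: blank
    " N  "    # indices 12-15: atom name field
    " "       # index 16: alternate location indicator
    "MET"     # indices 17-19: residue name
)

# Right-justify 42 in a 5-character field so later columns do not shift.
updated = write_in_slice(line, f"{42:>5}", 6, 11)
print(updated)
```

Passing an unpadded "42" instead would shorten the line by three characters in this sketch and move every later field out of its column.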