Extract

Command-line interface for extracting atoms or frames from molecular structure files.

add_extract_subparser(subparsers)

Adds the extract subcommand to the Simlify CLI for extracting atoms or frames from molecular structure files.

This function configures an argparse subparser named 'extract' which allows users to specify the topology and coordinate files, define atom selections using MDAnalysis selection syntax, specify trajectory frames to extract, and set the output file path and overwrite options.

PARAMETER DESCRIPTION
subparsers

An argparse._SubParsersAction object where the extract subcommand will be added. This is typically obtained by calling add_subparsers() on an ArgumentParser object.

RETURNS DESCRIPTION

argparse.ArgumentParser: The configured argparse parser object for the extract subcommand.

The extract subcommand accepts the following arguments:

positional arguments

- topo: Path to the topology file (e.g., PDB, PRMTOP). This argument is optional, but at least one of topo or --coords must be provided.
- output: Path to the coordinate file where the extracted atoms or frames will be saved (e.g., output.pdb, output.nc).

optional arguments

- --select: One or more strings specifying the atom selection using the MDAnalysis selection language. Multiple strings will be joined by spaces. This option allows you to select specific atoms based on various criteria (e.g., "protein", "resid 1-10", "name CA").
- --frames: One or more integers specifying the indices of the trajectory frames to extract. If this option is not provided, all frames will be processed.
- --coords: One or more paths to coordinate files (e.g., trajectory files like TRR, DCD, XTC, NetCDF). If multiple coordinate files are provided, they will be concatenated and treated as a single trajectory. At least one coordinate file must be provided if no topology file is given.
- --overwrite: If this flag is present, the output coordinate file will be overwritten if it already exists. By default, the program will prevent overwriting existing files.

The function sets the default action for this subparser to be the cli_extract_atoms function, which will be called when the user invokes the 'extract' subcommand.
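The configuration described above can be sketched with the standard argparse module. This is a minimal illustration, not the Simlify source: declaring topo with nargs="?", the help strings, the placeholder handler body, the build_parser helper, the "demo" program name, and binding the handler through set_defaults(func=...) are all assumptions made for the example.

```python
import argparse


def cli_extract_atoms(args, parser):
    """Placeholder handler; the documented one forwards to extract_atoms."""
    return None


def add_extract_subparser(subparsers):
    """Register an 'extract' subcommand shaped like the one documented above."""
    parser = subparsers.add_parser(
        "extract", description="Extract atoms or frames from structure files."
    )
    # Assumption: nargs="?" is how the optional positional topo is declared.
    parser.add_argument("topo", nargs="?", default=None, help="Topology file.")
    parser.add_argument("output", help="Output coordinate file.")
    parser.add_argument("--select", nargs="+", help="MDAnalysis selection string(s).")
    parser.add_argument("--frames", nargs="+", type=int, help="Frame indices.")
    parser.add_argument("--coords", nargs="+", help="Coordinate file(s).")
    parser.add_argument("--overwrite", action="store_true", help="Overwrite output.")
    # Assumption: the handler is attached as a default so the caller can dispatch.
    parser.set_defaults(func=lambda args: cli_extract_atoms(args, parser))
    return parser


def build_parser():
    """Hypothetical top-level parser, used only to exercise the sketch."""
    root = argparse.ArgumentParser(prog="demo")
    add_extract_subparser(root.add_subparsers(dest="command"))
    return root


args = build_parser().parse_args(
    ["extract", "topology.prmtop", "output.pdb",
     "--select", "protein", "--frames", "0", "10", "20",
     "--coords", "trajectory.nc"]
)
print(args.topo, args.output, args.select, args.frames, args.coords, args.overwrite)
```

With this layout, a caller would dispatch by invoking args.func(args) after parsing. Whether Simlify dispatches the same way is not stated in this page.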

cli_extract_atoms(args, parser)

Command-line interface function to extract atoms or frames from molecular structures.

This function serves as the entry point when the user invokes the extract subcommand of the Simlify CLI. It receives the parsed command-line arguments and the argument parser object. It performs a basic validation to ensure that either a topology file (topo) or at least one coordinate file (coords) is provided. If neither is present, it prints the help message for the extract subcommand and exits. Otherwise, it prepares the arguments and calls the extract_atoms function from the simlify.structure module to perform the actual extraction.

PARAMETER DESCRIPTION
args

An argparse.Namespace object containing the parsed command-line arguments for the extract subcommand.

TYPE: Namespace

parser

The argparse.ArgumentParser object for the extract subcommand, used to display help messages if necessary.

TYPE: ArgumentParser

RETURNS DESCRIPTION
None

None

The function retrieves the following arguments from the args object:

- topo: Path to the topology file.
- output: Path to the output coordinate file.
- select: A list of strings representing the MDAnalysis atom selection.
- frames: A list of integers representing the trajectory frames to extract.
- coords: A list of paths to coordinate files.
- overwrite: A boolean indicating whether to overwrite the output file.

It then processes the select argument by joining the list of strings into a single selection string. Finally, it calls the extract_atoms function with the extracted and processed arguments to perform the atom or frame extraction.
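The validation and joining steps described above might look roughly like the following. This is a sketch under stated assumptions, not the Simlify implementation: the extract_atoms stand-in and its keyword names, the CALLS list used to record invocations, the exit status of 1, and passing None when no selection is given are all hypothetical.

```python
import argparse
import sys

CALLS = []  # Records what the stand-in receives, for illustration only.


def extract_atoms(topo, output, select=None, frames=None, coords=None, overwrite=False):
    """Hypothetical stand-in for simlify.structure.extract_atoms (signature assumed)."""
    CALLS.append(
        dict(topo=topo, output=output, select=select,
             frames=frames, coords=coords, overwrite=overwrite)
    )


def cli_extract_atoms(args, parser):
    """Validate the inputs, join the selection tokens, and delegate."""
    # Neither a topology nor any coordinate file: show help and stop.
    if args.topo is None and not args.coords:
        parser.print_help()
        sys.exit(1)  # Assumption: the real exit status is not documented.

    # Join the list of selection tokens into one selection string.
    # Assumption: None is passed through when --select was not given.
    select = " ".join(args.select) if args.select else None

    extract_atoms(
        args.topo, args.output, select=select, frames=args.frames,
        coords=args.coords, overwrite=args.overwrite,
    )
    return None


demo_args = argparse.Namespace(
    topo="input.pdb", output="extracted.pdb",
    select=["resid", "1", "to", "50"], frames=None, coords=None, overwrite=True,
)
cli_extract_atoms(demo_args, argparse.ArgumentParser(prog="extract"))
print(CALLS[-1]["select"])
```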

Example Usage

To extract protein atoms from frames 0, 10, and 20 of a trajectory:

simlify structure extract topology.prmtop output.pdb \
    --select protein --frames 0 10 20 --coords trajectory.nc

To extract all atoms from a specific residue range and overwrite the output file:

simlify structure extract input.pdb extracted.pdb \
    --select 'resid 1 to 50' --overwrite
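Because multiple --select strings are joined by spaces, quoting the selection in the shell and leaving it unquoted should produce the same selection string. The short sketch below illustrates this with shlex, which mimics POSIX shell word splitting. The select_tokens helper is hypothetical and only approximates how an option that accepts one or more values collects them.

```python
import shlex


def select_tokens(command):
    """Return the tokens that follow --select, up to the next --option."""
    tokens = shlex.split(command)
    start = tokens.index("--select") + 1
    collected = []
    for token in tokens[start:]:
        if token.startswith("--"):
            break
        collected.append(token)
    return collected


quoted = select_tokens(
    "simlify structure extract input.pdb extracted.pdb "
    "--select 'resid 1 to 50' --overwrite"
)
unquoted = select_tokens(
    "simlify structure extract input.pdb extracted.pdb "
    "--select resid 1 to 50 --overwrite"
)

# One token when quoted, four when unquoted, yet the joined string is identical.
print(quoted, unquoted)
print(" ".join(quoted) == " ".join(unquoted))
```

Quoting is still the safer habit, since selections containing shell metacharacters such as parentheses would otherwise be interpreted by the shell.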