Utils
Originally taken from Michael Hall's tbpore (https://github.com/mbhall88/tbpore/blob/main/tbpore/external_tools.py).
Also used by a variety of other tools (Dnaapler, Plassembler, Pharokka).
ExternalTool
Class for running external tools.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tool | str | The path to the tool to run. | required |
| input | str | The input file. | required |
| output | str | The output file. | required |
| params | str | The parameters to pass to the tool. | required |
| logdir | Path | The directory to store log files. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| command | List[str] | The command to run. |
| out_log | str | The path to the stdout log file. |
| err_log | str | The path to the stderr log file. |
Examples:
>>> tool = ExternalTool("tool", "input", "output", "params", "logdir")
>>> tool.command
["tool", "params", "output", "input"]
>>> tool.out_log
"logdir/tool_1234567890abcdef1234567890abcdef.out"
>>> tool.err_log
"logdir/tool_1234567890abcdef1234567890abcdef.err"
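The attributes above can be pictured with a minimal sketch. It assumes the command is assembled in the order shown in the example (tool, params, output, input) and that the log file name combines the tool name with a hash of the command string. The helper name `build_command_and_logs` and the use of MD5 are illustrative assumptions, not baktfold's actual code.

```python
import hashlib
import shlex
from pathlib import Path
from typing import List, Tuple


def build_command_and_logs(
    tool: str, input: str, output: str, params: str, logdir: str
) -> Tuple[List[str], str, str]:
    """Hypothetical helper mirroring the documented attributes."""
    # Order follows the documented example: tool, params, output, input.
    command = [tool, *shlex.split(params), output, input]
    # Assumption: hash the joined command to get a stable log-file suffix.
    digest = hashlib.md5(" ".join(command).encode("utf-8")).hexdigest()
    stem = f"{Path(tool).name}_{digest}"
    out_log = str(Path(logdir) / f"{stem}.out")
    err_log = str(Path(logdir) / f"{stem}.err")
    return command, out_log, err_log


command, out_log, err_log = build_command_and_logs(
    "tool", "input", "output", "params", "logdir"
)
print(command)  # ['tool', 'params', 'output', 'input']
```

Hashing the command keeps log names unique per invocation, so two runs of the same tool with different parameters would not overwrite each other's logs.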
Source code in src/baktfold/utils/external_tools.py
command_as_str: str
property
__init__(tool, input, output, params, logdir, env=None)
Initializes an ExternalTool object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tool | str | The path to the tool to run. | required |
| input | str | The input file. | required |
| output | str | The output file. | required |
| params | str | The parameters to pass to the tool. | required |
| logdir | Path | The directory to store log files. | required |
| env | Optional[Dict[str, str]] | Extra environment variables merged with os.environ for the subprocess (e.g. CUDA_VISIBLE_DEVICES for multi-GPU foldseek). None means the environment is inherited unchanged. | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| command | List[str] | The command to run. |
| out_log | str | The path to the stdout log file. |
| err_log | str | The path to the stderr log file. |
| env | Optional[Dict[str, str]] | Extra environment variables for the subprocess. |
Examples:
>>> tool = ExternalTool("tool", "input", "output", "params", "logdir")
>>> tool.command
["tool", "params", "output", "input"]
>>> tool.out_log
"logdir/tool_1234567890abcdef1234567890abcdef.out"
>>> tool.err_log
"logdir/tool_1234567890abcdef1234567890abcdef.err"
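The `env` parameter is described as extra variables merged with `os.environ`. A minimal sketch of that merge, using only the standard library, might look like the following. The helper name `run_with_env` is hypothetical and the child command is only a demonstration.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def run_with_env(command: List[str], env: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical helper: run a command with extra environment variables."""
    # None means the child inherits the parent's environment unchanged.
    merged = None if env is None else {**os.environ, **env}
    result = subprocess.run(
        command, env=merged, check=True, capture_output=True, text=True
    )
    return result.stdout


# Ask a child Python process to echo the variable it receives.
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
print(run_with_env(child, env={"CUDA_VISIBLE_DEVICES": "1"}).strip())  # 1
```

Merging rather than replacing keeps variables such as PATH available to the child process while still overriding the selected keys.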
Source code in src/baktfold/utils/external_tools.py
run()
Runs the tool.
Source code in src/baktfold/utils/external_tools.py
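As an illustration of running a command while capturing its streams in the `out_log` and `err_log` files, here is a minimal sketch. The helper name `run_logged` and the temporary log directory are assumptions for the example, not baktfold's implementation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def run_logged(command: List[str], out_log: str, err_log: str) -> None:
    """Hypothetical helper: run a command, sending its streams to log files."""
    with open(out_log, "w") as out_fh, open(err_log, "w") as err_fh:
        # check=True raises subprocess.CalledProcessError on a non-zero exit.
        subprocess.run(command, stdout=out_fh, stderr=err_fh, check=True)


logdir = Path(tempfile.mkdtemp())
run_logged(
    [sys.executable, "-c", "print('hello')"],
    str(logdir / "tool.out"),
    str(logdir / "tool.err"),
)
print((logdir / "tool.out").read_text().strip())  # hello
```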
run_download(tool, ctx=None)
staticmethod
Runs the given external tool and prints the aria2c output to the screen.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tool | ExternalTool | The external tool to run. | required |
| ctx | Optional[click.Context] | The click context to use. | None |
Returns:
| Type | Description |
|---|---|
| None | None. |
Raises:
| Type | Description |
|---|---|
| subprocess.CalledProcessError | If there is an error calling the external tool. |
Examples:
Source code in src/baktfold/utils/external_tools.py
run_stream()
Runs the tool and streams the output to the terminal.
Source code in src/baktfold/utils/external_tools.py
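Streaming a subprocess's output to the terminal as it is produced can be sketched with `subprocess.Popen`. The helper name `run_streaming` is hypothetical, and merging stderr into stdout is a simplification chosen for this example.

```python
import subprocess
import sys
from typing import List


def run_streaming(command: List[str]) -> List[str]:
    """Hypothetical helper: echo a command's output line by line as it arrives."""
    lines = []
    with subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            print(line, end="")  # forward to the terminal immediately
            lines.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
    return lines


captured = run_streaming([sys.executable, "-c", "print('step 1'); print('step 2')"])
```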
run_tool(tool, ctx=None)
staticmethod
Runs the given external tool.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tool | ExternalTool | The external tool to run. | required |
| ctx | Optional[click.Context] | The click context to use. | None |
Returns:
| Type | Description |
|---|---|
| None | None. |
Raises:
| Type | Description |
|---|---|
| subprocess.CalledProcessError | If there is an error calling the external tool. |
Source code in src/baktfold/utils/external_tools.py
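A caller may want to handle the `subprocess.CalledProcessError` listed under Raises. The sketch below shows the general pattern with plain `subprocess`; the helper name `run_or_report` is hypothetical, and the click context handling is deliberately left out.

```python
import subprocess
import sys
from typing import List


def run_or_report(command: List[str]) -> bool:
    """Hypothetical helper: return True on success, False if the command fails."""
    try:
        subprocess.run(command, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as error:
        # error.returncode holds the exit status of the failed command.
        print(f"Command failed with exit code {error.returncode}", file=sys.stderr)
        return False
    return True


ok = run_or_report([sys.executable, "-c", "raise SystemExit(0)"])
failed = run_or_report([sys.executable, "-c", "raise SystemExit(3)"])
```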
run_tools(tools_to_run, ctx=None)
staticmethod
Runs a list of tools.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tools_to_run | Tuple[ExternalTool] | The list of tools to run. | required |
| ctx | Optional[click.Context] | The click context. | None |
Examples:
>>> tool1 = ExternalTool("tool1", "input1", "output1", "params1", "logdir")
>>> tool2 = ExternalTool("tool2", "input2", "output2", "params2", "logdir")
>>> ExternalTool.run_tools((tool1, tool2))
>>> ExternalTool.run_tools((tool1, tool2), ctx)
Source code in src/baktfold/utils/external_tools.py
log_fmt = '[<green>{time:YYYY-MM-DD HH:mm:ss}</green>] <level>{level: <8}</level> | <level>{message}</level>'
module-attribute
begin and end functions
OrderedCommands
Bases: click.Group
This class will preserve the order of subcommands, which is useful when printing --help
Source code in src/baktfold/utils/util.py
list_commands(ctx)
Returns a list of subcommands in the order they were added.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ctx | click.Context | The click context. | required |
Returns:
| Type | Description |
|---|---|
| list | A list of subcommands in the order they were added. |
Source code in src/baktfold/utils/util.py
baktfold_base(rel_path)
Returns the absolute path to the given relative path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rel_path | str | The relative path to the file. | required |
Returns:
| Type | Description |
|---|---|
| str | The absolute path to the file. |
Source code in src/baktfold/utils/util.py
begin_baktfold(params, subcommand, no_log=False)
Begin baktfold process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| params | Dict[str, Any] | A dictionary of parameters for baktfold. | required |
| subcommand | str | Subcommand indicating the baktfold operation. | required |
| no_log | bool | If True, do not write a log file. | False |
Returns:
| Type | Description |
|---|---|
| int | Start time of the baktfold process. |
Source code in src/baktfold/utils/util.py
clean_up_temporary_files(output, prefix)
Clean up temporary files generated during the baktfold process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output | Path | Path to the output directory. | required |
| prefix | str | The prefix string. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
Source code in src/baktfold/utils/util.py
echo_click(msg, log=None)
Prints a message to stdout and optionally to a log file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| msg | str | The message to print. | required |
| log | str | The path to the log file. | None |
Returns:
| Type | Description |
|---|---|
| None | None |
Source code in src/baktfold/utils/util.py
end_baktfold(start_time, subcommand)
Finish baktfold process and log elapsed time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start_time | float | Start time of the process. | required |
| subcommand | str | Subcommand name indicating the baktfold operation. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
Source code in src/baktfold/utils/util.py
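The begin and end functions pair a recorded start time with an elapsed-time report. A minimal sketch of that pattern follows. The names `begin` and `end` are hypothetical stand-ins, and the printed messages are not baktfold's log format.

```python
import time


def begin(subcommand: str) -> float:
    """Hypothetical stand-in for a begin function: record the start time."""
    print(f"Starting {subcommand}")
    return time.time()


def end(start_time: float, subcommand: str) -> float:
    """Hypothetical stand-in for an end function: report elapsed seconds."""
    elapsed = time.time() - start_time
    print(f"{subcommand} finished in {elapsed:.2f} seconds")
    return elapsed


start = begin("run")
time.sleep(0.01)
elapsed = end(start, "run")
```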
get_type_rank(f)
Ranks eukaryotic features in the order gene -> mRNA -> CDS and gene -> tRNA. The ranking adjusts dynamically if 5'UTR and 3'UTR features are present.
Source code in src/baktfold/utils/util.py
get_version()
Returns the version number from the VERSION file.
Returns:
| Type | Description |
|---|---|
| str | The version number. |
print_citation()
Prints the contents of the CITATION file to stdout.
Returns:
| Type | Description |
|---|---|
| None | None |
print_splash()
Prints the splash screen to stdout.
Returns:
| Type | Description |
|---|---|
| None | None |
Source code in src/baktfold/utils/util.py
remove_directory(dir_path)
Remove a directory and all its contents if it exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dir_path | Path | Path to the directory to remove. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
Source code in src/baktfold/utils/util.py
remove_file(file_path)
Remove a file if it exists.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | Path | Path to the file to remove. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
Source code in src/baktfold/utils/util.py
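The "remove if it exists" behaviour of `remove_directory` and `remove_file` can be sketched with the standard library. The helper names below are hypothetical and the checks are one reasonable way to make removal safe to repeat, not necessarily how baktfold does it.

```python
import shutil
import tempfile
from pathlib import Path


def remove_dir_if_exists(dir_path: Path) -> None:
    """Hypothetical helper: delete a directory tree, doing nothing if absent."""
    if dir_path.is_dir():
        shutil.rmtree(dir_path)


def remove_file_if_exists(file_path: Path) -> None:
    """Hypothetical helper: delete a file, doing nothing if absent."""
    if file_path.is_file():
        file_path.unlink()


base = Path(tempfile.mkdtemp())
(base / "tmp_dir").mkdir()
(base / "tmp.txt").write_text("scratch")
remove_dir_if_exists(base / "tmp_dir")
remove_file_if_exists(base / "tmp.txt")
remove_file_if_exists(base / "tmp.txt")  # second call is a no-op
```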
replace_pipe_in_fasta(input_path)
Reads a FASTA file with Biopython, replaces '~PIPE~' with '|' in the headers, and writes the result.
Source code in src/baktfold/utils/util.py
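The header substitution itself is simple to illustrate. The sketch below swaps in plain string handling instead of Biopython so that it stays self-contained. The function name `restore_pipes` is hypothetical.

```python
from typing import Iterable, List


def restore_pipes(lines: Iterable[str]) -> List[str]:
    """Replace the '~PIPE~' placeholder with '|' in FASTA header lines only."""
    return [
        line.replace("~PIPE~", "|") if line.startswith(">") else line
        for line in lines
    ]


fasta = [">contig1~PIPE~gene1", "MKVL", ">contig1~PIPE~gene2", "MTTA"]
print(restore_pipes(fasta))
# ['>contig1|gene1', 'MKVL', '>contig1|gene2', 'MTTA']
```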
sort_euk_feature_key(f)
Sorts a feature dictionary by start, locus, type rank, and stop.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| f | dict | The feature dictionary. | required |
Returns:
| Type | Description |
|---|---|
| tuple | A tuple of the sorted values. |
Source code in src/baktfold/utils/util.py
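A sort key of this shape can be sketched as follows. The dictionary keys (`start`, `locus`, `type`, `stop`) and the fixed rank table are assumptions made for illustration; the real function adjusts ranks dynamically when UTR features are present, which this sketch does not attempt.

```python
from typing import Any, Dict, Tuple

# Assumed ranks: parents sort before children at the same start coordinate.
TYPE_RANK = {"gene": 0, "mRNA": 1, "tRNA": 1, "CDS": 2}


def feature_sort_key(f: Dict[str, Any]) -> Tuple[int, str, int, int]:
    """Hypothetical key: order by start, locus, type rank, then stop."""
    return (f["start"], f["locus"], TYPE_RANK.get(f["type"], 99), f["stop"])


features = [
    {"type": "CDS", "locus": "g1", "start": 10, "stop": 90},
    {"type": "gene", "locus": "g1", "start": 10, "stop": 100},
    {"type": "mRNA", "locus": "g1", "start": 10, "stop": 100},
]
ordered = sorted(features, key=feature_sort_key)
print([f["type"] for f in ordered])  # ['gene', 'mRNA', 'CDS']
```

Returning a tuple lets Python's built-in sort compare the fields left to right, so the type rank only breaks ties between features that share a start and locus.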
touch_file(path)
Update the access and modification times of a file to the current time, creating the file if it does not exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to the file. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
Source code in src/baktfold/utils/util.py
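The touch behaviour described above is available directly in the standard library. The following sketch assumes `pathlib.Path.touch` is sufficient; the wrapper name `touch` is hypothetical.

```python
import tempfile
from pathlib import Path


def touch(path: Path) -> None:
    """Hypothetical wrapper around Path.touch."""
    # Creates the file if it is missing. With exist_ok=True an existing file
    # keeps its contents and has its modification time set to the current time.
    path.touch(exist_ok=True)


marker = Path(tempfile.mkdtemp()) / "done.flag"
touch(marker)  # creates the file
touch(marker)  # refreshes its timestamp
```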
check_dependencies()
Checks the dependencies and versions of non-Python programs (i.e. Foldseek).
Returns:
| Type | Description |
|---|---|
| None | None |
Source code in src/baktfold/utils/validation.py
check_genbank_and_prokka(filepath, euk)
Validate that an input file is a readable GenBank file and check whether it was
annotated using Prokka. The function transparently supports compressed files
(e.g., .gz, .bz2, .xz, .zst) via xopen.
Validation steps:
- Attempts to parse the file as GenBank using Biopython.
- Logs an error and returns None if no GenBank records are found.
- Checks the COMMENT field of each record for a Prokka signature ("Annotated using prokka", case-insensitive).
- If no Prokka annotation is detected, logs a warning but continues parsing, because the file is still valid GenBank.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | Path to the GenBank or compressed GenBank file. | required |
| euk | bool | Flag indicating whether the input is eukaryotic (skips the Prokka check). | required |
Returns:
| Type | Description |
|---|---|
| list[SeqRecord] or None | A list of Biopython SeqRecord objects if parsing succeeds; None if the file is not valid GenBank or cannot be parsed. |
Source code in src/baktfold/utils/validation.py
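The case-insensitive Prokka signature check can be illustrated on plain strings. This sketch leaves out GenBank parsing and decompression entirely; the function name `has_prokka_signature` is hypothetical.

```python
def has_prokka_signature(comment: str) -> bool:
    """Return True if a COMMENT string carries the Prokka signature."""
    # Lower-casing both sides makes the match case-insensitive.
    return "annotated using prokka" in comment.lower()


print(has_prokka_signature("Annotated using prokka 1.14.6"))  # True
print(has_prokka_signature("Annotated with another tool"))    # False
```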
instantiate_dirs(output_dir, force)
Checks and instantiates the output directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_dir | Union[str, Path] | Path to the output directory. | required |
| force | bool | Force flag indicating whether to overwrite existing directory. | required |
Returns:
| Type | Description |
|---|---|
| Path | Final output directory path. |
Source code in src/baktfold/utils/validation.py
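One way to picture a force-controlled output directory check is the sketch below. The helper name `prepare_output_dir`, the choice of `FileExistsError`, and the decision to delete the old directory when forced are all assumptions for illustration; baktfold's actual behaviour may differ.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Union


def prepare_output_dir(output_dir: Union[str, Path], force: bool) -> Path:
    """Hypothetical helper: create output_dir, replacing it only when forced."""
    output_dir = Path(output_dir)
    if output_dir.exists():
        if not force:
            # Assumption: refuse to overwrite unless explicitly forced.
            raise FileExistsError(f"{output_dir} already exists; use force to overwrite")
        shutil.rmtree(output_dir)
    output_dir.mkdir(parents=True)
    return output_dir


root = Path(tempfile.mkdtemp()) / "out"
created = prepare_output_dir(root, force=False)   # fresh directory
replaced = prepare_output_dir(root, force=True)   # overwritten on request
```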
validate_outfile(outfile, force)
Checks and instantiates the output file for baktfold convert-prokka.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| outfile | Union[str, Path] | Path to the output file. | required |
| force | bool | Force flag indicating whether to overwrite existing outfile. | required |
Returns:
| Type | Description |
|---|---|
| Path | Final output file path. |