Databases
calc_md5_sum(tarball_path, buffer_size=1024 * 1024)
Calculate the MD5 checksum of the given file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tarball_path` | `Path` | The path to the file for which the MD5 checksum needs to be calculated. | required |
| `buffer_size` | `int` | The buffer size, in bytes, for reading the file. | `1024 * 1024` |
Returns:
| Name | Type | Description |
|---|---|---|
| `str` | `str` | The MD5 checksum of the file. |
Source code in src/baktfold/databases/db.py
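The role of `buffer_size` can be illustrated with a minimal sketch of chunked MD5 hashing. This is not the code in `db.py`; the function name `md5_of_file` is hypothetical, and the sketch only assumes that the file is read in blocks of `buffer_size` bytes so that large tarballs never need to fit in memory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, buffer_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Hypothetical helper for illustration; not the baktfold implementation.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(buffer_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

The digest does not depend on the buffer size; a larger buffer only reduces the number of read calls.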
check_db_installation(db_dir, foldseek_gpu)
Check if the baktfold database is installed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `db_dir` | `Path` | The directory where the database is installed. | required |
| `foldseek_gpu` | `bool` | Whether to install the foldseek-gpu compatible baktfold database. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `bool` | `bool` | True if all required files are present, False otherwise. |
Source code in src/baktfold/databases/db.py
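A check of this kind usually comes down to testing that every expected file exists in `db_dir`. The sketch below shows that idea only; the helper names `missing_files` and `is_installed` are hypothetical, and the list of required file names is supplied by the caller because the actual file list used by baktfold is not given here.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(db_dir: Path, required: Iterable[str]) -> List[str]:
    """Return the names in `required` that are not regular files in db_dir."""
    return [name for name in required if not (Path(db_dir) / name).is_file()]


def is_installed(db_dir: Path, required: Iterable[str]) -> bool:
    """True if every required file is present, False otherwise."""
    return not missing_files(db_dir, required)
```

Returning the missing names separately makes it easy to log which files triggered a reinstall.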
check_prostT5_download(model_dir, model_name)
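Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_dir` | `Path` | The directory where the model and tokenizer are stored. | required |
| `model_name` | `str` | The name of the pre-trained T5 model. | required |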
Returns:
| Name | Type | Description |
|---|---|---|
| `bool` | `bool` | Whether baktfold needs to download ProstT5. |
Source code in src/baktfold/databases/db.py
download(tarball_path, cache_dir)
Download the database from the given URL using Hugging Face (HF).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tarball_path` | `Path` | The path where the downloaded tarball should be saved. | required |
Source code in src/baktfold/databases/db.py
download_requests(db_url, tarball_path)
Downloads a file from a given URL using the requests library.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `db_url` | `str` | The URL of the file to download. | required |
| `tarball_path` | `Path` | The path to save the downloaded file. | required |
Returns:
| Type | Description |
|---|---|
| `None` | None |
Examples:
>>> download_requests("https://zenodo.org/records/17347516/files/baktfold_db.tar.gz", Path("baktfold_db.tar.gz"))
Source code in src/baktfold/databases/db.py
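Streaming a download to disk can be sketched as follows. Note the substitution: this sketch uses the standard library's `urllib.request` in place of `requests`, so that it runs without third-party dependencies. The function name `fetch_to_file` and the `chunk_size` parameter are hypothetical and are not part of baktfold.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_to_file(url: str, destination: Path, chunk_size: int = 1024 * 1024) -> None:
    """Stream the resource at `url` into `destination` in chunks.

    Hypothetical helper; uses urllib instead of requests.
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # copyfileobj reads and writes chunk_size bytes at a time, so the
        # whole file is never held in memory.
        shutil.copyfileobj(response, out, chunk_size)
```

Because the body is copied in chunks, memory use stays bounded regardless of the tarball size.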
download_zenodo_prostT5(model_dir, logdir, threads)
Download the ProstT5 model from Zenodo
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_dir` | `Path` | The directory where the model and tokenizer are stored. | required |
| `logdir` | `Path` | The directory where log files are written. | required |
| `threads` | `int` | The number of threads available. | required |
Source code in src/baktfold/databases/db.py
foldseek_makepaddedseqdb(db_dir)
Runs the Foldseek makepaddedseqdb command on a given database directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `db_dir` | `Path` | The path to the database directory. | required |
Returns:
| Type | Description |
|---|---|
| `None` | None |
Source code in src/baktfold/databases/db.py
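Running an external Foldseek command from Python is typically done through `subprocess`. The sketch below assumes that `foldseek makepaddedseqdb` takes an input database path followed by an output path. The helper names, the `db_name` parameter, and the `_gpu` output suffix are all hypothetical; the database names that baktfold actually uses are not stated here.

```python
import subprocess
from pathlib import Path
from typing import List


def build_makepaddedseqdb_cmd(db_dir: Path, db_name: str) -> List[str]:
    """Build the argument list for `foldseek makepaddedseqdb`.

    Assumes the CLI form `foldseek makepaddedseqdb <input_db> <output_db>`.
    The "_gpu" suffix for the padded database is a hypothetical choice.
    """
    source = Path(db_dir) / db_name
    padded = Path(db_dir) / f"{db_name}_gpu"
    return ["foldseek", "makepaddedseqdb", str(source), str(padded)]


def run_makepaddedseqdb(db_dir: Path, db_name: str) -> None:
    """Run the command; check=True raises CalledProcessError on failure."""
    subprocess.run(build_makepaddedseqdb_cmd(db_dir, db_name), check=True)
```

Passing the command as a list avoids shell quoting issues when `db_dir` contains spaces.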
install_database(db_dir, foldseek_gpu, threads)
Install the baktfold database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `db_dir` | `Path` | The directory where the database should be installed. | required |
| `foldseek_gpu` | `bool` | Whether to install the foldseek-gpu compatible baktfold database. | required |
| `threads` | `int` | The number of threads available (more threads make downloading faster). | required |
Source code in src/baktfold/databases/db.py
untar(tarball_path, output_path, DICT)
Extract the tarball to the output path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tarball_path` | `Path` | The path to the tarball file. | required |
| `output_path` | `Path` | The path where the contents of the tarball should be extracted. | required |
| `DICT` | `dict` | The version dictionary. | required |
Source code in src/baktfold/databases/db.py
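Extracting a tarball with the standard library can be sketched as follows. The helper name `extract_tarball` is hypothetical, and the sketch leaves out the `DICT` argument because how baktfold uses the version dictionary during extraction is not described here.

```python
import tarfile
from pathlib import Path


def extract_tarball(tarball_path: Path, output_path: Path) -> None:
    """Extract every member of a (possibly compressed) tarball into output_path.

    Hypothetical helper; not the baktfold implementation.
    """
    output_path.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
    with tarfile.open(tarball_path, "r:*") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse members that would escape output_path.
            archive.extractall(path=output_path, filter="data")
        else:
            archive.extractall(path=output_path)
```

The `data` filter is used when the running Python provides it, which guards against archive members with absolute paths or `..` components.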
validate_db(database, default_dir, foldseek_gpu)
Validate that the baktfold database is installed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `database` | `str` | The directory where the database is installed. | required |
| `default_dir` | `str` | The default database location. | required |
| `foldseek_gpu` | `bool` | Whether to install the foldseek-gpu compatible baktfold database. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `bool` | `Path` | True if all required files are present, False otherwise. |