Citation
Please cite our preprint:
George Bouras, Sung won Lim, Lindsay Durr, Sarah Vreugde, Alexander Goesmann, Robert A. Edwards, Oliver Schwengers. Baktfold: Sensitive protein functional annotation across the microbial tree of life using structural information. bioRxiv 2026.03.31.715528; doi: https://doi.org/10.64898/2026.03.31.715528
Please also cite the following core dependencies. Citing all the bioinformatics tools you use helps us, which in turn helps you get better bioinformatics tools:
- Foldseek - (https://github.com/steineggerlab/foldseek) van Kempen M, Kim S, Tumescheit C, Mirdita M, Lee J, Gilchrist C, Söding J, and Steinegger M. Fast and accurate protein structure search with Foldseek. Nature Biotechnology (2023), doi:10.1038/s41587-023-01773-0
- ProstT5 - (https://github.com/mheinzinger/ProstT5) Michael Heinzinger, Konstantin Weissenow, Joaquin Gomez Sanchez, Adrian Henkel, Martin Steinegger, Burkhard Rost. ProstT5: Bilingual language model for protein sequence and structure. NAR Genomics and Bioinformatics (2024) doi:10.1101/2023.07.23.550085
Please also consider citing these databases where relevant:
- AFDB/SwissProt - Mihaly Varadi, Damian Bertoni, Paulyna Magana, Urmila Paramval, Ivanna Pidruchna, Malarvizhi Radhakrishnan, Maxim Tsenkov, Sreenath Nair, Milot Mirdita, Jingi Yeo, Oleg Kovalevskiy, Kathryn Tunyasuvunakool, Agata Laydon, Augustin Žídek, Hamish Tomlinson, Dhavanthi Hariharan, Josh Abrahamson, Tim Green, John Jumper, Ewan Birney, Martin Steinegger, Demis Hassabis, Sameer Velankar, AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences, Nucleic Acids Research, Volume 52, Issue D1, 5 January 2024, Pages D368–D375, https://doi.org/10.1093/nar/gkad1011
- CATH - Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM. CATH--a hierarchic classification of protein domain structures. Structure. 1997 Aug 15;5(8):1093-108. doi: 10.1016/s0969-2126(97)00260-8. PMID: 9309224.
- PDB - H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne, The Protein Data Bank (2000) Nucleic Acids Research 28: 235-242 https://doi.org/10.1093/nar/28.1.235