[1] I. J. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge: MIT Press, 2016. [2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention Is All You Need,” ArXiv, vol. preprint, p. ArXiv ID:1706.03762v5, 2017. [3] W. Chen and A. L. Ferguson, “Molecular enhanced sampling with autoencoders: On-the-fly collective variable discovery and accelerated free energy landscape exploration,” Journal of Computational Chemistry, vol. 39, no. 25, pp. 2079–2102, sep 2018. [4] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recog- nition Challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015. [5] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation Applied to Handwritten Zip Code Recognition,” Neural Computa- tion, vol. 1, no. 4, pp. 541–551, dec 1989. [6] J. Schmidhuber, “Learning Complex, Extended Sequences Using the Principle of History Compression,” Neural Computation, vol. 4, no. 2, pp. 234–242, mar 1992. [7] K. Cho, B. van Merri ̈enboer, D. Bahdanau, and Y. Bengio, “On the properties of neural ma- chine translation: Encoder–decoder approaches,” Proceedings of SSST 2014 - 8th Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103–111, 2014. [8] D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” in 2nd International Con- ference on Learning Representations, ICLR 2014 - Conference Track Proceedings, no. Ml, dec 2013, p. ArXiv ID: 1312.6114. [9] A. Van Den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” arXiv, vol. preprint, p. arXiv ID:1711.00937, 2017. [10] A. Vahdat and J. Kautz, “NVAE: A deep hierarchical variational autoencoder,” arXiv, vol. preprint, p. arXiv ID:2007.03898, 2021. [11] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Networks,” arXiv, vol. preprint, p. arXiv ID:1406.2661, jun 2014. [12] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, “Adversarial Autoencoders,” arXiv, vol. preprint, p. ArXiv ID:1511.05644, 2015. [13] D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, sep 2014, pp. 1–15. [14] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” arXiv, vol. preprint, p. arXiv ID:1907.11692, 2019. [15] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv, vol. preprint, p. arXiv ID:1810.04805, 2019. [16] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language Models are Few-Shot Learners,” in Advances in Neural Informa- tion Processing Systems, may 2020. [17] Centers for Disease Control and Prevention, “Antibiotic resistance threats in the United States,” National Center for Emerging Zoonotic and Infectious Diseases (U.S.), Atlanta, Georgia, Tech. Rep., nov 2019. [18] C. L. Ventola, “The antibiotic resistance crisis: part 1: causes and threats.” P and T : a peer-reviewed journal for formulary management, vol. 40, no. 4, pp. 277–83, apr 2015. [19] K. Bush, P. Courvalin, G. Dantas, J. Davies, B. Eisenstein, P. Huovinen, G. A. Jacoby, R. Kishony, B. N. Kreiswirth, E. Kutter, S. A. Lerner, S. Levy, K. Lewis, O. Lomovskaya, J. H. Miller, S. Mobashery, L. J. V. Piddock, S. Projan, C. M. Thomas, A. Tomasz, P. M. Tulkens, T. R. Walsh, J. D. Watson, J. Witkowski, W. Witte, G. Wright, P. Yeh, and H. I. Zgurskaya, “Tackling antibiotic resistance,” Nature Reviews Microbiology, vol. 9, no. 12, pp. 894–896, dec 2011. [20] J. Li, J. J. Koh, S. Liu, R. Lakshminarayanan, C. S. Verma, and R. W. Beuerman, “Mem- brane active antimicrobial peptides: Translating mechanistic insights to design,” Frontiers in Neuroscience, vol. 11, p. Article 73, 2017. [21] M. Magana, M. Pushpanathan, A. L. Santos, L. Leanse, M. Fernandez, A. Ioannidis, M. A. Giulianotti, Y. Apidianakis, S. Bradfute, A. L. Ferguson, A. Cherkasov, M. N. Seleem, C. Pinilla, C. de la Fuente-Nunez, T. Lazaridis, T. Dai, R. A. Houghten, R. E. Hancock, and G. P. Tegos, “The value of antimicrobial peptides in the age of resistance,” The Lancet Infectious Diseases, vol. 20, no. 9, pp. e216–e230, 2020. [22] J. J. Schneider, A. Unholzer, M. Schaller, M. Sch ̈afer-Korting, and H. C. Korting, “Human defensins,” Journal of Molecular Medicine, vol. 83, no. 8, pp. 587–595, aug 2005. [23] J. Koehbach and D. J. Craik, “The Vast Structural Diversity of Antimicrobial Peptides,” Trends in Pharmacological Sciences, vol. 40, no. 7, pp. 517–528, jul 2019. [24] J. Lei, L. Sun, S. Huang, C. Zhu, P. Li, J. He, V. Mackey, D. H. Coy, and Q. He, “The antimicrobial peptides and their potential clinical applications,” Am J Transl Res, vol. 11, no. 7, pp. 3919–3931, 2019. [25] F. Guilhelmelli, N. Vilela, P. Albuquerque, L. d. S. Derengowski, I. Silva-Pereira, and C. M. Kyaw, “Antibiotic development challenges: The various mechanisms of action of antimicro- bial peptides and of bacterial resistance,” Frontiers in Microbiology, vol. 4, no. DEC, pp. 1–12, 2013. [26] A. Bin Hafeez, X. Jiang, P. J. Bergen, and Y. Zhu, “Antimicrobial Peptides: An Update on Classifications and Databases,” International Journal of Molecular Sciences, vol. 22, no. 21, p. 11691, oct 2021. [27] G. Wang, “The antimicrobial peptide database provides a platform for decoding the design principles of naturally occurring antimicrobial peptides,” Protein Science, vol. 29, no. 1, pp. 8–18, jan 2020. [28] S. P. Piotto, L. Sessa, S. Concilio, and P. Iannelli, “YADAMP: yet another database of antimi- crobial peptides,” International Journal of Antimicrobial Agents, vol. 39, no. 4, pp. 346–351, apr 2012. [29] F. H. Waghu and S. Idicula-Thomas, “Collection of antimicrobial peptides database and its derivatives: Applications and beyond,” Protein Science, vol. 29, no. 1, pp. 36–42, jan 2020. [30] P. Cramer, “AlphaFold2 and the future of structural biology,” Nature Structural and Molecu- lar Biology, vol. 28, no. 9, pp. 704–705, sep 2021. [31] D. P. Kingma and M. Welling, “An Introduction to Variational Autoencoders,” Foundations and Trends in Machine Learning, vol. 12, no. 4, pp. 307–392, jun 2019. [32] R. G ́omez-Bombarelli, J. N. Wei, D. Duvenaud, J. M. Hern ́andez-Lobato, B. S ́anchez- Lengeling, D. Sheberla, J. Aguilera-Iparraguirre, T. D. Hirzel, R. P. Adams, and A. Aspuru- Guzik, “Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules,” ACS Central Science, vol. 4, no. 2, pp. 268–276, feb 2018. [33] D. Polykovskiy, A. Zhebrak, B. Sanchez-Lengeling, S. Golovanov, O. Tatanov, S. Belyaev, R. Kurbanov, A. Artamonov, V. Aladinskiy, M. Veselov, A. Kadurin, S. Johansson, H. Chen, S. Nikolenko, A. Aspuru-Guzik, and A. Zhavoronkov, “Molecular Sets (MOSES): A Bench- marking Platform for Molecular Generation Models,” Frontiers in Pharmacology, vol. 11, pp. 1–19, 2020. [34] A. Tucs, D. P. Tran, A. Yumoto, Y. Ito, T. Uzawa, and K. Tsuda, “Generating Ampicillin- Level Antimicrobial Peptides with Activity-Aware Generative Adversarial Networks,” ACS Omega, vol. 5, no. 36, pp. 22 847–22 851, sep 2020. [35] H. Chen, O. Engkvist, Y. Wang, M. Olivecrona, and T. Blaschke, “The rise of deep learning in drug discovery,” Drug Discovery Today, vol. 23, no. 6, pp. 1241–1250, 2018. [36] M. H. Segler, T. Kogej, C. Tyrchan, and M. P. Waller, “Generating focused molecule libraries for drug discovery with recurrent neural networks,” ACS Central Science, vol. 4, no. 1, pp. 120–131, 2018. [37] P. Das, T. Sercu, K. Wadhawan, I. Padhi, S. Gehrmann, F. Cipcigan, V. Chenthamarakshan, H. Strobelt, C. dos Santos, P.-Y. Chen, Y. Y. Yang, J. P. K. Tan, J. Hedrick, J. Crain, and A. Mojsilovic, “Accelerated antimicrobial discovery via deep generative models and molec- ular dynamics simulations,” Nature Biomedical Engineering, vol. 5, no. 6, pp. 613–623, jun 2021. [38] T. Sercu, S. Gehrmann, H. Strobelt, P. Das, I. Padhi, C. D. Santos, K. Wadhawan, and V. Chenthamarakshan, “Interactive Visual Exploration of Latent Space (IVELS) for pep- tide auto-encoder model selection,” in Deep Generative Models for Highly Structured Data, DGS@ICLR 2019 Workshop. International Conference on Learning Representations, ICLR, 2019. [39] C. M. Van Oort, J. B. Ferrell, J. M. Remington, S. Wshah, and J. Li, “AMPGAN v2: Machine Learning-Guided Design of Antimicrobial Peptides,” Journal of Chemical Information and Modeling, vol. 61, no. 5, pp. 2198–2207, 2021. [40] D. Nagarajan, T. Nagarajan, N. Roy, O. Kulkarni, S. Ravichandran, M. Mishra, D. Chakra- vortty, and N. Chandra, “Computational antimicrobial peptide design and evaluation against multidrug-resistant clinical isolates of bacteria,” Journal of Biological Chemistry, vol. 293, no. 10, pp. 3492–3509, 2018. [41] A. Capecchi, X. Cai, H. Personne, T. K ̈ohler, C. van Delden, and J. L. Reymond, “Machine learning designs non-hemolytic antimicrobial peptides,” Chemical Science, vol. 12, no. 26, pp. 9221–9232, 2021. [42] O. Dollar, N. Joshi, D. A. C. Beck, and J. Pfaendtner, “Attention-based generative models for de novo molecular design,” Chemical Science, vol. 12, no. 24, pp. 8362–8372, 2021. [43] O. Prykhodko, S. V. Johansson, P.-C. Kotsias, J. Ar ́us-Pous, E. J. Bjerrum, O. Engkvist, and H. Chen, “A de novo molecular generation method using latent vector based generative adversarial network,” Journal of Cheminformatics, vol. 11, no. 1, p. 74, dec 2019. [44] M. J. Kusner, B. Paige, and J. M. Hern ́andez-Lobato, “Grammar Variational Autoencoder,” in 34th International Conference on Machine Learning, ICML 2017, mar 2017, p. ArXiv ID: 1703.01925. [45] F. Grisoni, M. Moret, R. Lingwood, and G. Schneider, “Bidirectional Molecule Generation with Recurrent Neural Networks,” Journal of Chemical Information and Modeling, vol. 60, no. 3, pp. 1175–1183, mar 2020. [46] H. Dai, Y. Tian, B. Dai, S. Skiena, and L. Song, “Syntax-directed variational autoencoder for structured data,” in 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings, 2018, p. ArXiv ID: 1802.08786. [47] B. Dai, Z. Wang, and D. Wipf, “The usual suspects? Reassessing blame for VAE posterior collapse,” arXiv, p. ArXiv ID: 1912.10702, 2019. [48] R. Chowdhury, N. Bouatta, S. Biswas, C. Floristean, A. Kharkar, K. Roy, C. Rochereau, G. Ahdritz, J. Zhang, G. M. Church, P. K. Sorger, and M. AlQuraishi, “Single-sequence pro- tein structure prediction using a language model and deep learning,” Nature Biotechnology, vol. 40, no. 11, pp. 1617–1623, nov 2022. [49] E. C. Alley, G. Khimulya, S. Biswas, M. AlQuraishi, and G. M. Church, “Unified ratio- nal protein engineering with sequence-based deep representation learning,” Nature Methods, vol. 16, no. 12, pp. 1315–1322, 2019. [50] H. J. Kim, S. E. Hong, and K. J. Cha, “seq2vec: Analyzing sequential data using multi-rank embedding vectors,” Electronic Commerce Research and Applications, vol. 43, no. August 2019, p. 101003, 2020. [51] A. Bateman, “UniProt: A worldwide hub of protein knowledge,” Nucleic Acids Research, vol. 47, no. D1, pp. D506–D515, jan 2019. [52] S. A. Pinacho-Castellanos, C. R. Garc ́ıa-Jacas, M. K. Gilson, and C. A. Brizuela, “Alignment- Free Antimicrobial Peptide Predictors: Improving Performance by a Thorough Analysis of the Largest Available Data Set,” Journal of Chemical Information and Modeling, vol. 61, no. 6, pp. 3141–3157, 2021. [53] S. Ramazi, N. Mohammadi, A. Allahverdi, E. Khalili, and P. Abdolmaleki, “A review on an- timicrobial peptides databases and the computational tools,” Database, vol. 2022, no. Febru- ary, pp. 1–17, mar 2022. [54] D. P. Kingma, T. Salimans, and M. Welling, “Variational Dropout and the Local Reparame- terization Trick,” arXiv, vol. preprint, p. ArXiv ID:1506.02557, jun 2015. [55] D. P. Kingma, T. Salimans, R. Jozefowicz, X. Chen, I. Sutskever, and M. Welling, “Improving Variational Inference with Inverse Autoregressive Flow,” in Advances in Neural Information Processing Systems, no. Nips, jun 2016, p. ArXiv ID: 1606.04934. [56] I. Tolstikhin, O. Bousquet, S. Gelly, and B. Sch ̈olkopf, “Wasserstein auto-encoders,” in 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Pro- ceedings, 2018, pp. 1–20. [57] T. Karras, M. Aittala, S. Laine, E. H ̈ark ̈onen, J. Hellsten, J. Lehtinen, and T. Aila, “Alias-Free Generative Adversarial Networks,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, no. NeurIPS, jun 2021, pp. 438–448. [58] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and Im- proving the Image Quality of StyleGAN,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, dec 2019, pp. 8107–8116. [59] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Sch ̈olkopf, and A. Smola, “A kernel two- sample test,” Journal of Machine Learning Research, vol. 13, no. 25, pp. 723–773, 2012. [60] J. Vig, “A multiscale visualization of attention in the transformer model,” in ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of System Demonstrations, 2019, pp. 37–42. [61] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes, T. Salimans, J. Ho, D. J. Fleet, and M. Norouzi, “Pho- torealistic Text-to-Image Diffusion Models with Deep Language Understanding,” arXiv, vol. preprint, p. arXiv ID:2205.11487, 2022. [62] P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, no. C, pp. 53–65, 1987. [63] P. J. Cock, T. Antao, J. T. Chang, B. A. Chapman, C. J. Cox, A. Dalke, I. Friedberg, T. Hamel- ryck, F. Kauff, B. Wilczynski, and M. J. De Hoon, “Biopython: Freely available Python tools for computational molecular biology and bioinformatics,” Bioinformatics, vol. 25, no. 11, pp. 1422–1423, 2009. [64] S. Henikoff and J. G. Henikoff, “Amino acid substitution matrices from protein blocks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 22, pp. 10 915–10 919, 1992. [65] F. Cipcigan, A. P. Carrieri, E. O. Pyzer-Knapp, R. Krishna, Y.-W. Hsiao, M. Winn, M. G. Ryadnov, C. Edge, G. Martyna, and J. Crain, “Accelerating molecular discovery through data and physical sciences: Applications to peptide-membrane interactions,” The Journal of Chemical Physics, vol. 148, no. 24, p. 241744, jun 2018. [66] H. Jeon, H. K. Ko, J. Jo, Y. Kim, and J. Seo, “Measuring and Explaining the Inter-Cluster Reliability of Multidimensional Projections,” IEEE Transactions on Visualization and Com- puter Graphics, vol. 28, no. 1, pp. 551–561, 2022. [67] R. A. Mansbach and A. L. Ferguson, “Machine learning of single molecule free energy sur- faces and the impact of chemistry and environment upon structure and dynamics,” The Jour- nal of Chemical Physics, vol. 142, no. 10, p. 105101, mar 2015. [68] M. Larralde, “Peptides,” p. github.com/althonos/peptides.py, 2021. [69] A. Ikai, “Thermostability and aliphatic index of globular proteins,” Journal of Biochemistry, vol. 88, no. 6, pp. 1895–1898, 1980. [70] H. G. Boman, “Antibacterial peptides: Basic facts and emerging concepts,” Journal of Inter- nal Medicine, vol. 254, no. 3, pp. 197–215, 2003. [71] Compiled by A. D. McNaught and A. Wilkinson., Compendium of Chemical Terminology, 2nd ed. (the ”Gold Book”). Oxford: Blackwell Scientific Publications, 1997. [72] J. Kyte and R. F. Doolittle, “A simple method for displaying the hydropathic character of a protein,” Journal of Molecular Biology, vol. 157, no. 1, pp. 105–132, may 1982. [73] K. Guruprasad, B. V. Reddy, and M. W. Pandit, “Correlation between stability of a protein and its dipeptide composition: A novel approach for predicting in vivo stability of a protein from its primary sequence,” Protein Engineering, Design and Selection, vol. 4, no. 2, pp. 155–161, 1990. [74] P. J. Mohr and B. N. Taylor, “CODATA recommended values of the fundamental physical constants: 1998,” Reviews of Modern Physics, vol. 72, no. 2, pp. 351–495, apr 2000. [75] R. Luo, L. Sun, Y. Xia, T. Qin, S. Zhang, H. Poon, and T. Y. Liu, “BioGPT: generative pre- trained transformer for biomedical text generation and mining,” arXiv, vol. preprint, p. arXiv ID:2210.10341, 2022. [76] M. Arts, V. G. Satorras, C.-W. Huang, D. Zuegner, M. Federici, C. Clementi, F. No ́e, R. Pinsler, and R. van den Berg, “Two for One: Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics,” arXiv, vol. preprint, p. arXiv ID:2302.00600, 2023. [77] G. Corso, H. St ̈ark, B. Jing, R. Barzilay, and T. Jaakkola, “DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking,” arXiv, vol. preprint, p. arXiv ID:2210.01776, 2022. [78] F. Scarselli, M. Gori, Ah Chung Tsoi, M. Hagenbuchner, and G. Monfardini, “The Graph Neural Network Model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, jan 2009.