
State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, AI for Polymer Science Research Center, Fudan University, Shanghai 200438, China
wying@fudan.edu.cn
Received: 2025-05-30
Revised: 2025-06-19
Accepted: 2025-06-25
Published online: 2025-09-10
Published in print: 2025-10-05
Wang, J. F.; Sun, Y. B.; Chen, Q. T.; Ji, F. F.; Song, Y. Y.; Ruan, M. Y.; Wang, Y. OpenPoly: a polymer database empowering benchmarking and multi-property predictions. Chinese J. Polym. Sci. 2025, 43, 1749–1760 DOI: 10.1007/s10118-025-3402-y.
Ji-Feng Wang, Yu-Bo Sun, Qiu-Tong Chen, et al. OpenPoly: A Polymer Database Empowering Benchmarking and Multi-property Predictions[J]. Chinese journal of polymer science, 2025, 43(10): 1749-1760. DOI: 10.1007/s10118-025-3402-y.
OpenPoly offers 3,985 curated polymer–property pairs across 26 properties. A unified benchmark shows XGBoost surpasses deep models in sparse data, and the database enables rapid identification of candidates for high-temperature dielectrics and fuel-cell membranes, advancing data-driven polymer discovery.
Advancing the integration of artificial intelligence and polymer science requires high-quality, open-source, and large-scale datasets. However, existing polymer databases often suffer from data sparsity, lack of polymer-property labels, and limited accessibility, hindering systematic modeling across property prediction tasks. Here, we present OpenPoly, a curated experimental polymer database derived from extensive literature mining and manual validation, comprising 3985 unique polymer-property data points spanning 26 key properties. We further develop a multi-task benchmarking framework that evaluates property prediction using four encoding methods and eight representative models. Our results highlight that the optimized degree-of-polymerization encoding coupled with Morgan fingerprints achieves an optimal trade-off between computational cost and accuracy. Under data-scarce conditions, XGBoost outperforms deep learning models on key properties such as dielectric constant, glass transition temperature, melting point, and mechanical strength, achieving R² scores of 0.65–0.87. To further showcase the practical utility of the database, we propose potential polymers for two energy-relevant applications: high-temperature polymer dielectrics and fuel cell membranes. By offering a consistent and accessible benchmark and database, OpenPoly paves the way for more accurate polymer-property modeling and fosters data-driven advances in polymer genome engineering.
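The R² values quoted above refer to the coefficient of determination, R² = 1 − SS_res / SS_tot. As a minimal sketch of how such a score is computed, the function below implements that formula in plain Python. The function name `r2_score` and the numbers are illustrative assumptions only; they are not taken from OpenPoly or from the authors' benchmark code.

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = fmean(y_true)
    # Residual sum of squares: error left over after the model's prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Made-up glass transition temperatures in kelvin (illustrative, not OpenPoly data).
measured = [350.0, 400.0, 450.0, 500.0]
predicted = [360.0, 390.0, 460.0, 490.0]
score = r2_score(measured, predicted)
```

A score of 1 means the predictions match the measurements exactly, while a score of 0 means the model does no better than always predicting the mean of the measured values.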