References
[1] Sun C, Zhang Y. A brief review of gradient methods[J]. Operations Research Transactions, 2021, 25(3): 119-132. https://doi.org/10.15960/j.cnki.issn.1007-6093.2021.03.007.
[2] Robbins H, Monro S. A stochastic approximation method[J]. The Annals of Mathematical Statistics, 1951, 22(3): 400-407. https://doi.org/10.1214/aoms/1177729586.
[3] Gower R M, Loizou N, Qian X, et al. SGD: General Analysis and Improved Rates[C]//International Conference on Machine Learning (ICML). PMLR, 2019. https://doi.org/10.48550/arXiv.1901.09401.
[4] Bottou L, Curtis F E, Nocedal J. Optimization Methods for Large-Scale Machine Learning[J]. SIAM Review, 2018, 60(2): 223-311. https://doi.org/10.1137/16M1080173.
[5] Needell D, Srebro N, Ward R. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm[J]. Advances in Neural Information Processing Systems, 2014, 27. https://doi.org/10.1007/s10107-015-0864-7.
[6] Gower R M, Loizou N, Qian X, et al. SGD: General Analysis and Improved Rates[C]//International Conference on Machine Learning (ICML). PMLR, 2019. https://doi.org/10.48550/arXiv.1901.09401.
[7] Ghadimi S, Lan G. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming[J]. SIAM Journal on Optimization, 2013, 23(4): 2341-2368. https://doi.org/10.1137/120880811.
[8] Wang X, Yuan Y X. On the Convergence of Stochastic Gradient Descent with Bandwidth-based Step Size[J]. Journal of Machine Learning Research, 2023, 24(1): 49.
[9] Kingma D, Ba J. Adam: A Method for Stochastic Optimization[J]. arXiv preprint arXiv:1412.6980, 2014. https://doi.org/10.48550/arXiv.1412.6980.
[10] Duchi J, Hazan E, Singer Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization[J]. Journal of Machine Learning Research, 2011, 12: 2121-2159.
[11] Tieleman T, Hinton G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude[R]. COURSERA: Neural Networks for Machine Learning, 2012.
[12] Zeiler M D. ADADELTA: An Adaptive Learning Rate Method[J]. arXiv preprint arXiv:1212.5701, 2012. https://doi.org/10.48550/arXiv.1212.5701.
[13] Tan C, Ma S, Dai Y H, et al. Barzilai-Borwein Step Size for Stochastic Gradient Descent[C]//Advances in Neural Information Processing Systems (NIPS). Curran Associates Inc., 2016. https://doi.org/10.48550/arXiv.1605.04131.
[14] Barzilai J, Borwein J M. Two-Point Step Size Gradient Methods[J]. IMA Journal of Numerical Analysis, 1988, 8(1): 141-148. https://doi.org/10.1093/imanum/8.1.141.
[15] Vaswani S, Laradji I, Gidel G, et al. Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates[C]//Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, Canada, December 8-14, 2019.
[16] Loizou N, Vaswani S, Laradji I H, et al. Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence[C]//International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2021.
[17] Fathi Hafshejani S, Gaur D, Hossain S, et al. A fast non-monotone line search for stochastic gradient descent[J]. Optimization and Engineering, 2024, 25(2): 1105-1124. https://doi.org/10.1007/s11081-023-09836-6.
[18] Grippo L, Lampariello F, Lucidi S. A nonmonotone line search technique for Newton's method[J]. SIAM Journal on Numerical Analysis, 1986, 23(4): 707-716. https://doi.org/10.1137/0723046.
[19] Zhang Y, Sun C. Cyclic Gradient Methods for Unconstrained Optimization[J]. Journal of the Operations Research Society of China, 2024, 12(3): 809-828. https://doi.org/10.1007/s40305-022-00432-6.
[20] Chang C C, Lin C J. LIBSVM: A library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology, 2011, 2(3): Article 27. https://doi.org/10.1145/1961189.1961199.