Abstract: [Objective] To address the renewable energy curtailment caused by the mismatch between stochastic renewable generation and time-varying load characteristics in distributed smart grids, this paper proposes an ultra-short-term source-load power forecasting model that integrates a convolutional neural network (CNN), a bidirectional gated recurrent unit (BiGRU), and a multi-head attention (MHA) mechanism. [Methods] First, a CNN was employed to extract spatiotemporal features of renewable power. Second, a BiGRU was used to capture bidirectional temporal dependencies in load sequences. Third, MHA was introduced to dynamically assign weights to critical time-step information, and a Newton-Raphson-based optimizer (NRBO) was used for automatic hyperparameter tuning to enhance the generalization capability of the model. Finally, these components were combined into the NRBO-optimized CNN-BiGRU-MHA modeling framework for accurate ultra-short-term forecasting of source-load power. [Results] Case studies demonstrated that, compared with the CNN-BiGRU and BiGRU models, the proposed CNN-BiGRU-MHA model reduced the relative error by 51.5% and 74.1%, respectively. The proposed NRBO-CNN-BiGRU-MHA model outperformed other commonly used models in prediction accuracy, with its predicted peak values closely matching the actual values in both magnitude and trend. The model excelled in handling smooth features and exhibited strong adaptability and robustness under both peak and off-peak load conditions. [Conclusion] The proposed model demonstrates more stable prediction performance under different weather conditions, verifying the effectiveness of its spatiotemporal feature mining. It provides a new approach to power forecasting in scenarios with a high proportion of renewable energy integration and has practical value for promoting the consumption of renewable energy.
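The attention step described in the Methods can be illustrated with a minimal sketch of how multi-head attention assigns weights to time steps. This is not the authors' implementation: it assumes scaled dot-product attention, omits the learned projection matrices (each head simply takes an equal slice of the feature vector), and uses hypothetical hidden-state values and function names (`attention_weights`, `multi_head_context`) chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all time steps."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def multi_head_context(query, keys, values, num_heads):
    """Attend separately in each head, then concatenate the per-head contexts.

    Assumption: learned projections are omitted, so each head works on an
    equal slice of the feature dimension.
    """
    dim = len(query)
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    context = []
    for head in range(num_heads):
        part = slice(head * head_dim, (head + 1) * head_dim)
        weights = attention_weights(query[part], [k[part] for k in keys])
        for j in range(head_dim):
            # Weighted sum over time steps for feature j of this head
            context.append(sum(w * v[part][j] for w, v in zip(weights, values)))
    return context


# Hypothetical recurrent hidden states: 3 time steps, 4 features each
hidden = [
    [0.1, 0.2, 0.0, 0.1],
    [0.9, 0.8, 0.7, 0.9],
    [0.2, 0.1, 0.3, 0.0],
]
query = hidden[-1]  # the last time step attends over the whole sequence
weights = attention_weights(query, hidden)
context = multi_head_context(query, hidden, hidden, num_heads=2)
```

Under these assumptions, `weights` sums to 1 across the time steps, and `context` is a weighted summary of the sequence with the same feature dimension as the input. In a trained model the weights would come from learned query, key, and value projections rather than from the raw hidden states.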