dc.description.abstract | A low-delay, high-gain optimal multi-hop routing path is crucial to guarantee both the latency and reliability requirements of infotainment services in the high-mobility Internet of Vehicles (IoV) subject to queue stability. High mobility in multi-hop IoV networks reduces reliability and energy efficiency, and becomes a bottleneck for computing optimal routes with classical optimization methods. To a great extent, deep reinforcement learning (DRL)-based methods are not applicable in the IoV environment because of its continuously changing topology and a state-space complexity that grows exponentially with the number of state variables as well as the number of relaying hops. In multi-hop scenarios, network reliability and latency are typically affected by mobility as well as by the average hop count, which limits vehicle-to-vehicle (V2V) link connectivity. To cope with this problem, in this paper we formulate a minimum-hop-count, delay-sensitive, buffer-aided optimization problem in a dynamic, complex multi-hop vehicular topology using a digital twin-enabled dynamic coordination graph (DCG). In particular, for the first time, a DCG-based multi-agent deep deterministic policy gradient (DCG-MADDPG) decentralized algorithm is proposed that combines the advantages of DCG and MADDPG to model the continuously changing topology and to find optimal routing solutions through cooperative learning in the aforementioned communications. The proposed DCG-MADDPG coordinated learning trains each agent toward highly reliable, low-latency optimal path decisions while maintaining queue stability and convergence to the desired state. Experimental results reveal that the proposed coordinated learning algorithm outperforms existing learning approaches in terms of energy consumption and latency at lower computational complexity. | en_US