We investigate a cross-layer relay selection scheme based on the Q-learning algorithm. For the study, we consider multi-relay adaptive decode-and-forward (DF) cooperative diversity networks over multipath time-varying Rayleigh fading channels. The proposed scheme selects the relay subset that maximizes the link-layer transmission efficiency without requiring knowledge of channel state information (CSI). Results show that the proposed scheme outperforms capacity-based cooperative transmission with the same number of reliable relays in terms of transmission efficiency. Furthermore, a Q-learning based cross-layer antenna selection scheme for multiple-antenna relay networks is proposed, where multiple antennas allow more links from the relays to the destination under time-varying Rayleigh fading channels. We study the performance of multi-antenna relay networks and compare it with the single-antenna case. Both schemes are shown to offer high bandwidth efficiency from low to high signal-to-noise ratios (SNRs). Finally, we conclude that cooperative diversity with learning improves the performance and bandwidth efficiency of the communication network.
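To make the idea of CSI-free relay subset selection concrete, the following is a minimal sketch, not the scheme evaluated in this work. It assumes a stateless simplification of Q-learning (a single state and zero discount), an epsilon-greedy policy, and a toy reward equal to delivered frames per time slot. The function names, the `mean_snr` layout, the slot-counting rule, and all parameter values are hypothetical choices made for illustration only.

```python
import itertools
import random


def simulate_efficiency(subset, mean_snr, threshold, rng):
    """Simulate one frame; return delivered frames per time slot (toy reward)."""
    def link_ok(snr):
        # Rayleigh fading: the power gain is exponentially distributed (unit mean).
        return snr * rng.expovariate(1.0) >= threshold

    # Adaptive DF: only relays that decode the source broadcast forward the frame.
    decoders = [r for r in subset if link_ok(mean_snr["sr"][r])]
    delivered = link_ok(mean_snr["sd"]) or any(
        link_ok(mean_snr["rd"][r]) for r in decoders
    )
    # Assumption: one broadcast slot plus one slot reserved per selected relay.
    slots = 1 + len(subset)
    return (1.0 if delivered else 0.0) / slots


def learn_relay_subset(n_relays, mean_snr, threshold=1.0, episodes=5000,
                       alpha=0.1, epsilon=0.1, seed=0):
    """Learn Q-values over relay subsets from observed efficiency only."""
    rng = random.Random(seed)
    # Actions are all subsets of the relays, including the empty subset.
    actions = [s for k in range(n_relays + 1)
               for s in itertools.combinations(range(n_relays), k)]
    q = {a: 0.0 for a in actions}
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.choice(actions)          # explore
        else:
            a = max(actions, key=q.get)      # exploit the current estimate
        # The learner sees only the outcome, never the channel gains.
        reward = simulate_efficiency(a, mean_snr, threshold, rng)
        q[a] += alpha * (reward - q[a])      # stateless Q-update
    return max(actions, key=q.get), q


if __name__ == "__main__":
    # Hypothetical mean SNRs (linear scale) for a three-relay network.
    snr = {"sd": 0.5, "sr": [4.0, 4.0, 0.2], "rd": [4.0, 3.0, 0.2]}
    best, q_table = learn_relay_subset(3, snr)
    print("selected relay subset:", best)
```

In this sketch the learner never reads the channel gains; it updates its estimates only from the observed per-frame efficiency, which mirrors the CSI-free property described above. The per-relay slot cost is what lets a smaller subset win when extra relays add little reliability.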