In this paper, a new recursive algorithm and two circuit architectures are presented for the computation of the two-dimensional discrete cosine transform (2-D DCT). The new algorithm allows the 2-D DCT to be computed by a simple procedure of 1-D recursive calculations involving only cosine coefficients. The recursive kernel of the proposed algorithm contains a small number of operations and requires less pre-computed data than many existing algorithms in the same category. The kernel can be easily implemented as a simple circuit block with a short critical delay path. To evaluate the performance improvement resulting from the new algorithm, an architecture for the 2-D DCT, designed by direct mapping from the computation structure of the proposed algorithm, has been implemented on an FPGA board. The results show that, compared to a system implementing a recently reported 2-D DCT recursive algorithm, the hardware consumption can be reduced by 25% and the clock frequency can be increased by 17%. To reduce the hardware further, a second architecture has been proposed for the same 2-D DCT computation. By using one recursive computation block to perform different functions, this architecture needs only approximately one half of the hardware required by the first architecture, which has been confirmed by an FPGA implementation.
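As a rough illustration of what a 1-D recursive calculation using only cosine coefficients can look like, the sketch below evaluates an unnormalized DCT-II coefficient with a standard Clenshaw-type backward recurrence and builds a 2-D DCT from it by a row-column pass. This is not the algorithm proposed in the paper, whose kernel is not specified here; the function names `dct_1d_recursive` and `dct_2d_row_column`, the unnormalized DCT-II definition, and the row-column decomposition are all assumptions made for illustration.

```python
import math


def dct_1d_recursive(x, k):
    """Unnormalized DCT-II coefficient k of x (illustrative, hypothetical helper).

    Computes sum_n x[n] * cos((2n + 1) * k * pi / (2N)) with the recurrence
    b[n] = x[n] + 2*cos(theta)*b[n+1] - b[n+2], where theta = k*pi/N.
    """
    n_points = len(x)
    theta = k * math.pi / n_points
    c = 2.0 * math.cos(theta)  # single cosine coefficient used inside the loop
    b1 = 0.0  # holds b[n+1]
    b2 = 0.0  # holds b[n+2]
    for n in range(n_points - 1, -1, -1):
        b0 = x[n] + c * b1 - b2
        b2, b1 = b1, b0
    # After the loop b1 = b[0] and b2 = b[1]; one more cosine finishes the sum.
    return math.cos(theta / 2.0) * (b1 - b2)


def dct_2d_row_column(block):
    """Unnormalized 2-D DCT-II via 1-D recursive passes over rows, then columns."""
    rows = len(block)
    cols = len(block[0])
    # Row pass: transform each row along the column index.
    row_pass = [[dct_1d_recursive(row, v) for v in range(cols)] for row in block]
    # Column pass: transform each column of the intermediate result.
    out = [[0.0] * cols for _ in range(rows)]
    for v in range(cols):
        column = [row_pass[m][v] for m in range(rows)]
        for u in range(rows):
            out[u][v] = dct_1d_recursive(column, u)
    return out
```

Each output coefficient in this sketch needs only two pre-computed cosine values, one multiply and two additions per input sample inside the loop, which gives a sense of why recursive kernels of this kind map onto small circuit blocks. The operation counts and hardware figures reported in the paper refer to its own kernel, not to this one.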