Cells have a complex mechanism to control the expression of genes so that they are capable of adapting to environmental changes or genetic perturbations. A major part of the mechanism is fulfilled by transcription factors which can regulate the expression of other genes. Transcriptional regulatory relationships between genes and their transcription factors can be represented by a network, called a transcription regulatory network. Many algorithms have been proposed to learn transcription regulatory networks from gene expression data. In particular, the module network method, a special type of Bayesian networks, has shown promising results. In a module network, a regulatory module is a set of genes that show similar expression profiles and are regulated by a shared set of transcription factors (i.e., the regulation program of the module). This method significantly decreases the number of parameters to be learned. Module network learning consists of two tasks: clustering genes into modules and inferring the regulation program for each module. This thesis concentrates on designing algorithms for the latter task. First, we introduce a regression tree-based Gibbs sampling algorithm for learning regulation programs in module networks. The novelty of this method is that a set of tree operations is defined for generating new regression trees from a given tree. We show that the set of tree operations is sufficient to generate a well mixing Gibbs sampler even for large datasets. Second, we apply linear models to infer regulation programs. Given a gene module, this method partitions all experimental conditions into two condition clusters, between which the module's genes are most differentially expressed. Consequently, the process of learning the regulation program for the module becomes one of identifying transcription factors that are also differentially expressed between these two condition clusters. Third, we explore the possibility of integrating results from different algorithms. The integration methods we select are union, intersection, and weighted rank aggregation. The experiments in a yeast dataset show that the union and weighted rank aggregation methods produce more accurate predictions than those given by individual algorithms, whereas the intersection method does not yield any improvement in the accuracy of predictions. In addition, somewhat surprisingly, the union method, which has a much lower computational cost than rank aggregation, archives comparable results as given by rank aggregation.