Abstract
Combinatorial gene regulation provides a mechanism by which relatively small numbers of transcription factors can control the expression of a much larger number of genes with finely tuned temporal and spatial patterns. This is achieved by transcription factors assembling into complexes in a combinatorial fashion, exponentially increasing the number of genes that they can target. Such an arrangement also increases the specificity and affinity for the cis-regulatory sequences required for accurate target gene expression. Superimposed on this transcription factor combinatorial arrangement is the increasing realization that histone modification marks expand the regulatory information, which is interpreted by histone readers and writers that are part of the regulatory apparatus. Here, we review the progress in these areas from the perspective of plant combinatorial gene regulation, providing examples of different regulatory solutions and comparing them to those found in metazoans. This article is part of a Special Issue entitled: Plant Gene Regulatory Mechanisms and Networks, edited by Dr. Erich Grotewold and Dr. Nathan Springer.
Introduction
Transcription is a highly regulated process controlled in large part by transcription factors (TFs) that specify when, where and how eukaryotic genes are expressed. TFs are operationally defined here as proteins that bind to DNA in a sequence-specific fashion. The number of TFs in an organism is significantly smaller than the number of genes that need to be controlled with exquisite temporal and spatial expression patterns. For example, Arabidopsis has ~28,000 protein-coding genes, but only ~2000 TFs [1], [2], [3], although it is likely that additional TF families remain to be identified in the ~8000 proteins of yet unknown function. Similarly, the maize genome encodes ~2700 TFs [4] from around 33,000 protein-coding genes [5]. Indeed, across the eukaryotes, TFs represent 5–10% of all genes [1], [6].
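As a quick check of the figures above, the following minimal sketch computes the share of protein-coding genes that encode TFs from the approximate counts quoted in the text. The function and variable names are illustrative only.

```python
# Approximate counts quoted in the text: (number of TFs, protein-coding genes).
counts = {
    "Arabidopsis": (2000, 28000),
    "maize": (2700, 33000),
}


def tf_fraction(n_tfs, n_genes):
    """Return the percentage of protein-coding genes that encode TFs."""
    return 100.0 * n_tfs / n_genes


fractions = {species: tf_fraction(*c) for species, c in counts.items()}

for species, pct in fractions.items():
    print(f"{species}: {pct:.1f}% of protein-coding genes encode TFs")
```

Both values fall inside the 5–10% range given for eukaryotes.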
Combinatorial control provides a mechanism to explain the complexity of gene expression patterns. A TF may be part of different protein complexes that determine different types of regulation of different targets. Therefore, the composition of a protein complex, and not a TF per se, represents the input that leads to a distinct gene expression pattern as the output.
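To give a feel for how quickly the number of potential inputs grows when complexes, rather than single TFs, are the regulatory unit, the sketch below counts unordered sets of distinct TFs up to a given size. It assumes, unrealistically, that any set of TFs could form a complex, so the result is only an upper bound; the function name is hypothetical.

```python
from math import comb


def n_possible_complexes(n_tfs, max_size):
    """Count unordered sets of distinct TFs with 1..max_size members.

    Assumption: every set is allowed. Real interaction constraints
    would reduce these numbers considerably.
    """
    return sum(comb(n_tfs, k) for k in range(1, max_size + 1))


# Using the ~2000 Arabidopsis TFs quoted above as the pool size.
for size in (1, 2, 3):
    print(size, n_possible_complexes(2000, size))
```

Even when restricted to monomers and pairs, the count exceeds the number of protein-coding genes by a wide margin.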
Gene function is intimately linked to when and where genes are expressed. This information is hardwired in the gene regulatory regions, formed in part by cis-regulatory elements (CREs) recognized by specific TFs [7], [8]. CREs are often located immediately upstream of the transcription start site (TSS) in what is generally known as the promoter. However, CREs are also an integral part of enhancers, and can be found in 5′ UTRs, introns [9], or 3′ of the genes they control [10]. The modular nature of gene regulatory regions is captured by the arrangement of CREs into cis-regulatory modules (CRMs), each responsible for executing a fraction of the overall gene regulation. Several TFs can come together and bind to each one of these regulatory modules. These DNA modules can function cooperatively, following rules that often resemble digital logic, with the output being the overall regulation of the gene [11]. Combinatorial control has been extensively studied from the perspective of cis-regulatory systems, i.e., how CRMs are arranged to produce distinct gene expression outputs [12]. In plants, the promoter of the viral CaMV 35S gene continues to provide one of the best examples of how CRM arrangements contribute to the expression of a gene in many plant tissues [13], [14], [15].
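The digital-logic analogy can be made concrete with a toy model. In the sketch below, each CRM is considered active only when all of its TFs are present, and the gene output combines the modules with OR and NOT. The module names, TF names and the rule itself are hypothetical and are not taken from any of the cited studies.

```python
# Hypothetical CRMs: each lists the TFs that must all be bound
# for the module to be active (an AND gate over its TFs).
CRMS = {
    "crm_root": {"TF_A", "TF_B"},
    "crm_leaf": {"TF_A", "TF_C"},
    "crm_silencer": {"TF_R"},
}


def crm_active(name, tfs_present):
    """A module is active when its required TFs are a subset of those present."""
    return CRMS[name] <= tfs_present


def gene_expressed(tfs_present):
    """Toy rule: (root OR leaf module active) AND NOT silencer active."""
    activating = (crm_active("crm_root", tfs_present)
                  or crm_active("crm_leaf", tfs_present))
    return activating and not crm_active("crm_silencer", tfs_present)


for tfs in ({"TF_A", "TF_B"}, {"TF_A"}, {"TF_A", "TF_C", "TF_R"}):
    print(sorted(tfs), gene_expressed(tfs))
```

Note that TF_A alone is not sufficient in this model: the output depends on which partners are present, which is the essence of combinatorial control.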
The identification of functionally relevant regulatory motifs starts with defining TSSs and other important gene landmarks (e.g., 3′ ends and introns). As with alternative splicing, TSS usage can be affected by genetic variation or by development, as recently shown in maize [16]. As a consequence, genes with alternative TSSs are expected to have a separate regulatory region for each TSS, which may or may not involve shared CREs and CRMs. The identification of functionally important CREs often involves investigating conservation between co-regulated genes, or across related species in what is called phylogenetic shadowing or footprinting [17], [18]. A combination of these methods was recently used to identify CREs recognized by the ethylene response factors RELATED TO APETALA2.12 (RAP2.12) and RAP2.2 in hypoxia-responsive genes [19]. Phylogenetic shadowing also permitted identification of evolutionarily conserved CREs that combine to control the expression of the GIGANTEA (GI) circadian clock protein [20].
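The idea behind conservation-based CRE discovery can be illustrated with a deliberately simplified, alignment-free sketch: find the k-mers shared by every orthologous promoter. Published footprinting and shadowing analyses rely on sequence alignments and statistical models rather than exact matching, and the sequences below are invented placeholders, not real promoters.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(promoters, k):
    """Return k-mers found in every sequence (exact match, no alignment)."""
    sets = [kmers(s, k) for s in promoters.values()]
    return set.intersection(*sets)


# Invented toy sequences standing in for orthologous promoters.
promoters = {
    "species_1": "TTGACGTGGCAAT",
    "species_2": "CCAGACGTGGCTT",
    "species_3": "AGACGTGGAACCT",
}

shared = conserved_kmers(promoters, 6)
print(sorted(shared))
```

Overlapping shared k-mers, as found here, point to a single longer conserved block that would be a candidate CRE for experimental testing.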
TFs usually bind to short (5–8 bp long) DNA sequences that correspond to a consensus sequence and are represented, for example, by position weight matrices (PWMs). Indeed, a TF can recognize a broad range of DNA sequences in vitro with varying affinities, which range from nanomolar to micromolar. Clearly, the short DNA sequences frequently recognized by a single TF in vitro are insufficient to explain the affinity and specificity of binding in vivo [21]. For example, RAP2.2 and RAP2.12 bind the 5′-ATCTA-3′ sequence in vitro, which does not correspond to the CREs required for the regulation of RAP2.2/RAP2.12 targets [19]. While recent studies suggest a significant overlap between in vitro binding of TFs identified by DNA affinity purification sequencing (DAP-Seq) and chromatin immunoprecipitation (ChIP)-based experiments [22], there are many other examples in the literature of TFs that recognize a DNA motif in vitro which is not the top identified CRE in vivo. In many instances, this could be a consequence of indirect binding (i.e., through another TF) [23]. Not surprisingly, many TF families are identified by the presence of protein-protein interaction domains (e.g., the helix-loop-helix in bHLH, and the leucine zipper in bZIP domains) that permit TFs to increase both affinity and specificity for DNA binding through the formation of homo- and heteromers. These interactions are often dynamic and central to combinatorial gene control.
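A PWM of the kind mentioned above can be sketched as a log-odds matrix built from base counts and slid along a sequence. The counts below are invented for illustration (they are not measured binding data) and simply give a 5-bp motif whose consensus is ATCTA; the uniform background, pseudocount and function names are likewise assumptions.

```python
import math

BASES = "ACGT"

# Invented toy counts (NOT measured data); columns follow BASES order.
COUNTS = [
    [17, 1, 1, 1],   # position 1, consensus A
    [1, 1, 1, 17],   # position 2, consensus T
    [1, 17, 1, 1],   # position 3, consensus C
    [1, 1, 1, 17],   # position 4, consensus T
    [17, 1, 1, 1],   # position 5, consensus A
]


def log_odds_pwm(counts, background=0.25, pseudocount=1.0):
    """Convert counts to log2(frequency / background) per position and base."""
    pwm = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        pwm.append({b: math.log2(((c + pseudocount) / total) / background)
                    for b, c in zip(BASES, row)})
    return pwm


def scan(seq, pwm):
    """Score every window of the motif width; return (start, score) pairs."""
    width = len(pwm)
    return [(i, sum(col[seq[i + j]] for j, col in enumerate(pwm)))
            for i in range(len(seq) - width + 1)]


pwm = log_odds_pwm(COUNTS)
hits = scan("GGATCTAGG", pwm)
best_pos, best_score = max(hits, key=lambda h: h[1])
print(best_pos, round(best_score, 2))
```

Because the motif is only five positions wide, high-scoring windows occur frequently by chance in a genome-sized sequence, which is one way to see why a single short motif cannot account for in vivo specificity.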
The number of TFs that can bind and participate in the regulation of any given gene appears to be gene dependent, and was proposed to be as low as 5 and as high as 50 or more [21]. The analysis of ChIP coupled with tiling array hybridization (ChIP-chip) or high-throughput sequencing (ChIP-Seq) for 27 Arabidopsis TFs showed that the larger the number of conditions in which a gene is expressed, the more TFs bind to its regulatory region [24]. From a combinatorial gene regulation perspective, this is what would be expected, as each condition is likely to involve different regulatory complexes that may or may not share particular TFs. From this partial dataset, it was already evident that most genes (63%) are recognized by two or more TFs, and some highly connected genes (‘hubs’) are recognized by up to 18 different TFs, with 1174 genes bound by eight or more TFs [24].
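Summaries of this kind (TFs per gene, the fraction of genes bound by several TFs) follow directly from a table of TF-to-target assignments. The sketch below shows the bookkeeping on a hypothetical three-TF example; the TF and gene names are placeholders and the numbers bear no relation to the dataset in [24].

```python
from collections import Counter

# Hypothetical TF -> bound-gene sets (placeholders, not data from [24]).
BOUND = {
    "TF1": {"geneA", "geneB", "geneC"},
    "TF2": {"geneA", "geneC"},
    "TF3": {"geneA", "geneD"},
}


def tfs_per_gene(bound):
    """Count, for each gene, how many TFs bind its regulatory region."""
    counts = Counter()
    for targets in bound.values():
        counts.update(targets)
    return counts


def fraction_with_at_least(counts, k):
    """Fraction of bound genes recognized by k or more TFs."""
    return sum(1 for n in counts.values() if n >= k) / len(counts)


counts = tfs_per_gene(BOUND)
multi = fraction_with_at_least(counts, 2)
print(dict(counts), multi)
```

With real ChIP-chip or ChIP-Seq peak assignments in place of the toy dictionary, the same two functions would yield the per-gene TF counts and the hub genes discussed above.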
Plant combinatorial gene regulation: emerging patterns?
The number of documented instances of plant combinatorial gene regulation has increased substantially since the subject was last reviewed [25]. Below, we describe a few examples that highlight some of the emerging characteristics of plant transcriptional combinatorial logic.
Chromatin expands the DNA code
The intricacy of the transcription process is not limited to multiprotein complexes interacting with naked DNA. Like other cellular processes including DNA repair, replication, and recombination, transcription occurs within the chromatin environment. Eukaryotic chromatin is built by wrapping 146 bp of DNA around the histone octamer, forming its basic building block, the nucleosome [95]. Transcriptional activity depends on the accessibility of the DNA for TF binding, which itself depends on the
Interactions of TFs with the components of post-transcriptional processes
Our understanding of the complexity of transcriptional complexes participating in the initiation, elongation and termination phases of transcription has expanded in the last decade. This was aided by the increasing sensitivity of proteomic studies, advances in high-throughput genomics, and the ability to capture three-dimensional (3D) aspects of chromatin structure. It is becoming increasingly clear that transcription is tightly linked to other cellular processes taking place either
Mechanisms for modulating combinatorial control
A premise of combinatorial gene regulation is that regulatory complexes associated with one set of promoters will need to disassemble and reassemble to control another set of genes. In many instances, this is controlled by posttranslational modifications [147], [148]. However, other mechanisms are becoming known that can have a significant influence on complex formation. For example, the participation of long noncoding RNAs (lncRNAs) in RNA-protein regulatory complexes has been demonstrated for more
Concluding remarks
The control of gene expression in plants involves the combinatorial arrangement of TFs and chromatin factors that contribute to interpreting a complex regulatory code provided by DNA and histone marks. This combinatorial control resembles in complexity and dynamic behavior what has been found in other eukaryotes. Accumulating large-scale information regarding protein-protein and protein-DNA TF interactions contributes to revealing novel aspects of this intricate puzzle.
Acknowledgments
Research on the control of gene expression in the Grotewold lab is funded by grants IOS-1125620 and MCB-1513807 from the National Science Foundation.
References (157)
- et al.
Histone modification: cause or cog?
Trends Genet.
(2011)
- T. Kouzarides
Chromatin modifications and their function
Cell
(2007)
- S. Glatt et al.
Recognizing and remodeling the nucleosome
Curr. Opin. Struct. Biol.
(2011)
- M. Korenjak et al.
Native E2F/RBF complexes contain Myb-interacting proteins and repress transcription of developmentally controlled E2F target genes
Cell
(2004)
- T. Lammens et al.
Atypical E2Fs: new players in the E2F transcription factor family
Trends Cell Biol.
(2009)
- C. Dubos et al.
MYB transcription factors in Arabidopsis
Trends Plant Sci.
(2010)
- G. Ditta et al.
The SEP4 gene of Arabidopsis thaliana functions in floral organ and meristem identity
Curr. Biol.
(2004)
- H. Ma et al.
The ABCs of floral evolution
Cell
(2000)
- T. Jack
Relearning our ABCs: new twists on an old model
Trends Plant Sci.
(2001)
- R.G. Immink et al.
The ‘ABC’ of MADS domain protein behaviour and interactions
Semin. Cell Dev. Biol.
(2010)
From plant gene regulatory grids to network dynamics
Biochim. Biophys. Acta
(2012)
Plant metabolic diversity: a regulatory perspective
Trends Plant Sci.
(2005)
WEREWOLF, a MYB-related protein in Arabidopsis, is a position-dependent regulator of epidermal cell patterning
Cell
(1999)
A Myb gene required for leaf trichome differentiation in Arabidopsis is expressed in stipules
Cell
(1991)
The patterning of epidermal hairs in Arabidopsis — updated
Curr. Opin. Plant Biol.
(2012)
MYB-bHLH-WD40 protein complex and the evolution of cellular diversity
Trends Plant Sci.
(2005)
Cistrome and epicistrome features shape the regulatory DNA landscape
Cell
(2016)
Modularity in promoters and enhancers
Cell
(1989)
Functional analysis of transcription factors in Arabidopsis
Plant Cell Physiol.
(2009)
AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors
BMC Bioinf.
(2003)
PlnTFDB: an integrative plant transcription factor database
BMC Bioinf.
(2007)
The maize TFome — development of a transcription factor open reading frame collection for functional genomics
Plant J.
(2014)
The B73 maize genome: complexity, diversity, and dynamics
Science
(2009)
GRASSIUS: a platform for comparative regulatory genomics across the grasses
Plant Physiol.
(2009)
Disentangling the many layers of eukaryotic transcriptional regulation
Annu. Rev. Genet.
(2012)
Molecular dissection of the AGAMOUS control region shows that cis elements for spatial regulation are located intragenically
Plant Cell
(1997)
The homeotic protein AGAMOUS controls microsporogenesis by regulation of SPOROCYTELESS
Nature
(2004)
The hardwiring of development: organization and function of genomic regulatory systems
Development
(1997)
Genomic Regulatory Systems
(2001)
The cauliflower mosaic virus 35S promoter: combinatorial regulation of transcription in plants
Science
(1990)
Combinatorial and synergistic properties of CaMV 35S enhancer subdomains
EMBO J.
(1990)
Tissue-specific expression from CaMV 35S enhancer subdomains in early stages of plant development
EMBO J.
(1990)
Core promoter plasticity between maize tissues and genotypes contrasts with predominance of sharp transcription initiation sites
Plant Cell
(2015)
Phylogenetic shadowing of primate sequences to find functional regions of the human genome
Science
(2003)
Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis
Genome Res.
(2001)
Redundant ERF-VII transcription factors bind an evolutionarily-conserved cis-motif to regulate hypoxia-responsive gene expression in Arabidopsis
Plant Cell
(2015)
Evening expression of Arabidopsis GIGANTEA is controlled by combinatorial interactions among evolutionarily conserved regulatory motifs
Plant Cell
(2014)
The evolution of transcriptional regulation in eukaryotes
Mol. Biol. Evol.
(2003)
Non-targeted transcription factors motifs are a systemic component of ChIP-seq datasets
Genome Biol.
(2014)
A functional and evolutionary perspective on transcription factor binding in Arabidopsis thaliana
Plant Cell
(2014)
Transcriptional regulation in plants: the importance of combinatorial control
Plant Physiol.
(1998)
Functional analysis of the transcriptional activator encoded by the maize B gene: evidence for a direct functional interaction between two classes of regulatory proteins
Genes Dev.
(1992)
Evolutionary and comparative analysis of MYB and bHLH plant transcription factors
Plant J.
(2011)
Identification of the residues in the Myb domain of maize C1 that specify the interaction with the bHLH cofactor R
Proc. Natl. Acad. Sci. U. S. A.
(2000)
Comprehensive identification of Arabidopsis thaliana MYB transcription factors interacting with R/B-like BHLH proteins
Plant J.
(2004)
The basic helix-loop-helix transcription factor family in plants: a genome-wide study of protein structure and functional diversity
Mol. Biol. Evol.
(2003)
The an11 locus controlling flower pigmentation in petunia encodes a novel WD-repeat protein conserved in yeast, plants, and animals
Genes Dev.
(1997)
The TRANSPARENT TESTA GLABRA1 locus, which regulates trichome differentiation and anthocyanin biosynthesis in Arabidopsis, encodes a WD40 repeat protein
Plant Cell
(1999)
Mutations in the pale aleurone color1 regulatory gene of the Zea mays anthocyanin pathway have distinct phenotypes relative to the functionally similar TRANSPARENT TESTA GLABRA1 gene in Arabidopsis thaliana
Plant Cell
(2004)
Arabidopsis and Nicotiana anthocyanin production activated by maize regulators R and C1
Science
(1992)
© 2016 Elsevier B.V. All rights reserved.