Although the cells within an organism have nearly identical DNA sequences, differences in genome architecture help govern cell differentiation by enabling each cell type to express a distinct set of genes. Genome architecture also supports other functions such as cell cycle progression, DNA replication, nuclear structure, and nuclear transport. DNA packaging within the nucleus produces distinct configurations and regions that promote specific interactions within and between chromatin fibers, with proteins, and with larger nuclear structures. From DNA looping to the formation of higher-order chromatin structures and chromosome territories, nuclear genome organization is essential for proper cellular function.
The organization of chromosomes into distinct regions within the nucleus was first proposed in 1885 by Carl Rabl. Later, in 1909, using the microscopy technology of the time, Theodor Boveri coined the term chromosome territories after observing that chromosomes occupy individually distinct nuclear regions [1]. Since then, mapping genome architecture has become a major topic of interest. Only within roughly the last ten years, however, have researchers been able to characterize particular aspects of three-dimensional nuclear organization in detail, largely because of rapid advances in a variety of methods. DNA imaging with fluorescent tags and specialized microscopes [2], together with high-throughput mapping tools coupled with massively parallel sequencing [3], are common ways to resolve higher-order genome organization. Advances in genome editing have made it relatively simple to link architectural factors, such as specific DNA regions and proteins, to particular organizational functions [4]. Specific methods include DNA fluorescence in situ hybridization (FISH), chromosome conformation capture (3C), and genome editing with CRISPR-Cas9, zinc-finger nucleases (ZFNs), or transcription activator-like effector nucleases (TALENs). The review by Fraser et al., “An Overview of Genome Organization and How We Got There: from FISH to Hi-C” [5], gives an in-depth description of each method.
Architectural proteins are the primary mediators of genome organization within the nucleus. Although the specific proteins vary, a few conserved proteins, or close orthologs, are found in a majority of eukaryotic species. The following proteins have been extensively associated with mammalian genome organization and are among the most important.
The first level of genome organization is the linear arrangement of DNA and the three-dimensional formation of chromosomes. DNA is composed of two antiparallel strands of nucleotides, and two paired nucleotides on opposing strands form a base pair. To fit inside the small cell nucleus, DNA wraps around histone proteins to form nucleosomes, the basic repeating units of chromatin, the DNA-protein complex that makes up chromosomes. Depending on the eukaryote, each nucleus contains multiple independent chromosomes of varying sizes; for example, human cells have 46 while giraffe cells have 30 [13].
Along each chromosome, the sequence of base pairs encodes specific elements involved in gene expression and DNA replication. Some of the more common elements include protein-coding genes (containing exons and introns), noncoding DNA, enhancers, promoters, operators, origins of replication, telomeres, and centromeres. So far, there is little evidence that the specific order of these elements along or between individual chromosomes is important. For example, the distance between an enhancer and a promoter, interacting elements that form a basis of gene expression, can range from a few hundred base pairs to hundreds of kilobases (kb) [14]. In addition, a single enhancer can interact with several different promoters, and a single promoter can interact with multiple enhancers.
On a larger scale, however, chromosomes are heterogeneous in their euchromatin and heterochromatin composition. There is also evidence of gene-rich and gene-poor regions and of various domains associated with cell differentiation, active or repressed gene expression, DNA replication, and DNA recombination and repair [15]. All of these help determine chromosome territories (CTs).
An intrinsic characteristic of chromatin fibers, DNA looping is the first organizational level of chromosomal folding and helps regulate gene expression during interphase. In this process the chromatin fiber physically loops back on itself to bring distant DNA regions into contact. Looping is facilitated by a number of factors, including architectural proteins (primarily CTCF and cohesin), transcription factors, co-activators, and non-coding RNAs (ncRNAs). Depending on the elements involved, looping can either repress or activate gene expression. Roughly 50% of human genes are estimated to be involved in long-range chromatin interactions through DNA looping [16].
Looping was first observed by Walther Flemming in 1878 while he was studying amphibian oocytes. It was not until the late 20th century that DNA looping was linked to gene expression [17]. For example, in 1990 Mandal et al. linked DNA looping to the repression of the galactose and lactose operons in the absence of galactose or lactose, respectively. The repressor proteins form protein-protein and protein-DNA interactions that loop the DNA. The loop connects the gene promoters with upstream and downstream operators, repressing gene expression by preventing RNA polymerase from initiating transcription at the promoter [18].
In gene activation, DNA looping typically brings gene promoters together with distal enhancers. The enhancer can recruit large protein complexes involved in initiating transcription, such as the Mediator complex, the pre-initiation complex (PIC), and cell-type-specific transcription factors [19].
The next level of organization, and a basis for chromatin folding, is the self-interacting domain. These domains are found in organisms ranging from bacteria, where they are called chromosomal interaction domains (CIDs), to mammals, where they are called topologically associating domains (TADs). Self-interacting domains range from the 1-2 Mb scale in larger organisms [20] to tens of kb in single-celled organisms [21]. Self-interacting domains share a set of common features. First, they have a higher ratio of chromosomal contacts within the domain than with regions outside it; this characteristic was discovered using Hi-C techniques [22]. The domains are formed with the help of architectural proteins and contain many chromatin loops. Second, self-interacting domains correlate with the regulation of gene expression: some domains are associated with active transcription and others with repressed transcription. Which state a domain adopts depends on whether its genes need to be active or inactive during a particular growth phase or cell cycle stage, or in a specific cell type. Because cellular differentiation depends on particular sets of genes being switched on or off, it corresponds with the makeup of an individual cell’s self-interacting domains [23]. Lastly, domain boundaries are enriched in architectural protein binding sites, regions and epigenetic marks associated with active transcription, housekeeping genes, and short interspersed nuclear elements (SINEs) [24].
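The first feature, a higher ratio of contacts inside a domain than outside it, can be made concrete with a small calculation. The following is a minimal sketch under stated assumptions, not a published domain caller: the input is a small, hypothetical, symmetric Hi-C-style matrix of contact counts between genomic bins, and the function name `domain_contact_ratio` is illustrative. Real Hi-C analyses also correct for experimental biases and for the decay of contact frequency with genomic distance, which this sketch ignores.

```python
from typing import List

Matrix = List[List[float]]


def domain_contact_ratio(contacts: Matrix, start: int, end: int) -> float:
    """Mean contact count between distinct bins inside [start, end), divided
    by the mean count between those bins and every bin outside the interval.

    Assumes `contacts` is a square, symmetric matrix of raw contact counts
    between genomic bins, and that the interval holds at least two bins but
    not every bin. No bias or distance-decay normalization is applied.
    """
    n = len(contacts)
    inside = range(start, end)
    outside = [j for j in range(n) if j < start or j >= end]

    # Contacts between two different bins that both lie in the candidate domain.
    intra = [contacts[i][j] for i in inside for j in inside if i != j]
    # Contacts between a bin in the domain and a bin outside it.
    inter = [contacts[i][j] for i in inside for j in outside]

    mean_intra = sum(intra) / len(intra)
    mean_inter = sum(inter) / len(inter)
    return mean_intra / mean_inter


# Hypothetical 6-bin toy matrix with two planted domains: bins 0-2 and 3-5.
toy = [
    [50, 10, 10, 2, 2, 2],
    [10, 50, 10, 2, 2, 2],
    [10, 10, 50, 2, 2, 2],
    [2, 2, 2, 50, 10, 10],
    [2, 2, 2, 10, 50, 10],
    [2, 2, 2, 10, 10, 50],
]

print(domain_contact_ratio(toy, 0, 3))  # 5.0: many more contacts inside than outside
print(domain_contact_ratio(toy, 1, 4) < 1)  # True: the interval straddles a boundary
```

A true domain in the toy matrix scores well above 1, while an interval spanning parts of both planted domains scores below 1, which is the contrast the domain definition relies on.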
One notable subset of self-interacting domains is the active chromatin hub (ACH). ACHs were discovered in studies of the activated alpha- and beta-globin loci [25]. They form through extensive DNA looping, which gathers regulatory elements into a “hub” that coordinates the expression of a subset of genes [26].
Similar to self-interacting domains, lamina-associated domains (LADs) and nucleolus-associated domains (NADs) are regions of the chromosome that interact with the nuclear lamina and the nucleolus, respectively.
Making up approximately 40% of the genome, LADs consist mostly of gene-poor regions and range from 40 kb to 30 Mb in size [27]. There are two known types of LADs: constitutive LADs (cLADs) and facultative LADs (fLADs). cLADs are AT-rich heterochromatin regions that remain at the lamina and are seen across many cell types and species. There is evidence that these regions are important for the structural organization of interphase chromosomes. fLADs, by contrast, interact with the lamina variably and contain genes that are activated or repressed depending on the individual cell, indicating cell-type specificity [28]. The boundaries of LADs, like those of self-interacting domains, are enriched in transcriptional elements and architectural protein binding sites [29].
NADs, which constitute about 4% of the genome, share nearly all of the physical characteristics of LADs. In fact, DNA analysis of these two types of domains has shown that many sequences overlap, indicating that certain regions may switch between lamina binding and nucleolus binding [30]. NADs are also associated with nucleolar function. The nucleolus is the largest substructure within the nucleus and the principal site of rRNA transcription. It also functions in signal recognition particle biosynthesis, protein sequestration, and viral replication [31]. The nucleolus forms around rDNA genes from different chromosomes. However, only a subset of rDNA genes is transcribed at any one time, and these genes loop into the interior of the nucleolus. The remaining genes lie at the nucleolar periphery in a silenced, heterochromatic state [32].
The last level of organization before full chromosome territories is the formation of A/B compartments. A/B compartments are on the multi-Mb scale and correspond to either open, transcriptionally active chromosomal regions (“A” compartments) or closed, transcriptionally inactive regions (“B” compartments) [33]. A compartments tend to be gene-rich, have high GC content, carry histone marks of active transcription, and usually occupy the interior of the nucleus. They are also typically made up of self-interacting domains and contain early replication origins. B compartments, on the other hand, tend to be gene-poor and compact, carry histone marks of gene silencing, and lie at the nuclear periphery. They consist mostly of LADs and contain late replication origins [34].
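A commonly used way to assign A/B compartments from Hi-C data is an eigenvector approach: the rows of a distance-normalized (observed/expected) contact matrix are correlated with one another, and the sign of the leading eigenvector of that correlation matrix splits the bins into two groups. Because an eigenvector's sign is arbitrary, it is oriented with a feature such as GC content, which is higher in A compartments. The following is a minimal pure-Python sketch under stated assumptions: the input matrix is assumed to be already normalized for distance decay, the toy data and all function names are hypothetical, and power iteration is used to find the leading eigenvector in place of a full principal component analysis from a numerical library.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leading_eigenvector(m: Matrix, iterations: int = 100) -> List[float]:
    """Dominant eigenvector of a symmetric positive semidefinite matrix
    (such as a correlation matrix), found by power iteration."""
    n = len(m)
    # Non-uniform start vector, so it is unlikely to be orthogonal to the answer.
    v = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def call_compartments(oe: Matrix, gc: Sequence[float]) -> List[str]:
    """Label each bin "A" or "B" from a distance-normalized contact matrix."""
    n = len(oe)
    # Correlate every pair of rows: bins in the same compartment share a contact profile.
    corr = [[pearson(oe[i], oe[j]) for j in range(n)] for i in range(n)]
    ev = leading_eigenvector(corr)
    # An eigenvector's sign is arbitrary; orient it so positive values track GC content.
    if pearson(ev, gc) < 0:
        ev = [-x for x in ev]
    return ["A" if x > 0 else "B" for x in ev]


# Hypothetical toy input: bins with the same planted label contact each other
# three times as often as bins with different labels.
planted = "AABBAA"
oe = [[3.0 if planted[i] == planted[j] else 1.0 for j in range(6)] for i in range(6)]
gc = [0.48, 0.50, 0.38, 0.36, 0.47, 0.49]  # hypothetical GC fractions per bin

print(call_compartments(oe, gc))  # ['A', 'A', 'B', 'B', 'A', 'A']
```

On this toy matrix the sketch recovers the planted pattern. Swapping which bins are GC-rich flips every label, which is why the orientation step matters: the eigenvector alone only separates two groups, and an external feature decides which group is called A.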
Throughout the nucleus, A and B compartments within a chromosome territory tend to associate with compartments of the same type on other chromosomes: A with A, and B with B. This is consistent with the idea that the nucleus localizes proteins and other factors, such as long non-coding RNAs (lncRNAs), in regions suited to their individual roles. One example is the presence of multiple transcription factories throughout the nuclear interior [35]. These factories are associated with elevated levels of transcription because they concentrate the transcription machinery, active genes and their regulatory elements, and nascent RNA. It has been reported that roughly 95% of active genes are transcribed within transcription factories. Multiple genes, whether or not their products have related functions and whether they lie on the same or different chromosomes, can be transcribed at the same time within one factory. Finally, which genes co-localize within a transcription factory depends on the cell type [36].
Just as self-interacting domains vary during cell differentiation, A/B compartments vary between cell types. This further supports the hypothesis that genome architecture, specific gene expression, and cell differentiation are interconnected.
The final level of organization is the distinct positioning of individual chromosomes within the nucleus, known as chromosome territories (CTs). CTs share a few properties among eukaryotes. First, although chromosomal locations differ between cells within a population, individual chromosomes show some preference for particular regions. For example, large, gene-poor chromosomes are commonly located at the periphery near the nuclear lamina, while smaller, gene-rich chromosomes group closer to the center of the nucleus [37]. Second, these preferences vary among cell types. For example, in a study of the spatial organization of chromosomes across multiple tissues, Parada et al. found that the X chromosome localized to the nuclear periphery more often in liver cells than in kidney cells (Parada et al., 2004). Third, homologous chromosomes tend to be far apart from one another during interphase. Finally, the position of each chromosome remains relatively stable during the cell cycle until the start of mitosis [38]. The mechanisms behind these properties are still unknown, and further experimentation is needed.