Journal of Virology, June 2005, p. 6610-6619, Vol. 79, No. 11
0022-538X/05/$08.00+0 doi:10.1128/JVI.79.11.6610-6619.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Infectious Disease Laboratory, The Salk Institute, 10010 North Torrey Pines Rd., La Jolla, California 92037 (1); University of Pennsylvania School of Medicine, Department of Microbiology, 3610 Hamilton Walk, Philadelphia, Pennsylvania 19104-6076 (4); Gladstone Institute of Virology and Immunology, University of California, San Francisco, 365 Vermont Street, San Francisco, California 94103 (2); Genomic Analysis Laboratory, The Salk Institute, 10010 North Torrey Pines Rd., La Jolla, California 92037 (3); Department of Family/Preventive Medicine, University of California, San Diego School of Medicine, San Diego, California 92093 (6); University of Pennsylvania School of Medicine, 1409 Blockley Hall, 423 Guardian Drive, Philadelphia, Pennsylvania 19104-6076 (5)
Received 19 November 2004/ Accepted 20 January 2005
Retroviral model systems provide a tractable means of studying the influence of chromosomal context on transcription. Each integrated provirus is joined to flanking cellular DNA at exactly the same points at the ends of the viral DNA, but integration takes place at many different sites in the host cell chromosomes. Thus, the viral genome provides a homogeneous transcription template that can be analyzed at different chromosomal locations, allowing the influence of flanking chromosomal features to be assessed.
Early during HIV gene expression, transcription is initiated by polymerase II from the viral long terminal repeat (LTR) under the control of cellular factors, including NF-κB, SP1, NFAT, and others (12, 15). Most of the resulting transcripts terminate within 100 nucleotides of the transcription initiation site (30). A low level of full-length transcripts is nevertheless synthesized, and a portion of these are spliced to yield the mRNA encoding Tat. In the late phase of viral transcription, Tat accumulates in the host cell and binds to the TAR site on the viral RNA, recruiting the cyclin T-CDK9 complex and facilitating transcriptional elongation (18, 47).
HIV transcription is known to be sensitive to the chromosomal environment at the site of integration (27, 28). In one example of such regulation, Jordan et al. found that proviruses integrated into centromeric heterochromatin had undetectable levels of basal transcription. However, activation of transcription by treatment with tumor necrosis factor alpha (TNF-α) or 12-O-tetradecanoylphorbol 13-acetate (TPA), both of which induce the NF-κB pathway, allowed activation of such proviruses (27, 28). Additional factors proposed to affect HIV transcription are reviewed in references 15 and 18.
Chromosomal features repressing HIV gene expression are of particular interest due to their possible influence on clinical latency in HIV infection. For many HIV-infected patients, treatment with highly active antiretroviral therapy can reduce viral loads to undetectable levels but, unfortunately, cells persist long term that harbor integrated proviruses capable of reseeding virus production after cessation of therapy. One well-characterized reservoir is in resting CD4-positive T cells (9, 14, 49). A low percentage of these cells harbor transcriptionally inactive HIV proviruses which may be induced to produce HIV upon T-cell activation. The finding that centromeric heterochromatin represses HIV gene expression, along with other known mechanisms for down-modulating HIV gene expression (1, 15, 18, 42, 45), provides candidate explanations connecting transcriptional repression to clinical latency.
To study how expression from the HIV type 1 (HIV-1) promoter is affected by the integration site of the provirus, we isolated cells containing stably expressed and inducible proviruses, determined integration sites by sequencing 971 host-virus DNA junctions, and then asked what identifiable features were enriched in each population. Several notable biases were found, suggesting potential mechanisms by which the chromosomal environment may modulate HIV transcription.
Jurkat cells were cultured at a density of 3 × 10^5 to 1 × 10^6 cells/ml in RPMI 1640 medium with 10% fetal bovine serum, 100 U/ml penicillin, 100 µg/ml streptomycin, and 2 mM L-glutamine at 37°C. Cells were infected at a multiplicity of infection of 0.1 with 4 µg/ml Polybrene for cloning integration sites and at 1.0 for analysis by transcriptional profiling. To date, comparisons between integration site data sets made with HIV-based vectors (40, 50) have not shown any differences with integration sites made with authentic HIV (5, 50).
Acquisition of stably bright and inducible cell populations.
Jurkat cells were sorted by fluorescence-activated cell sorting (FACS) into GFP-positive and GFP-negative populations 2 to 4 days postinfection as described elsewhere (27, 28). At this stage, about 7% of cells were GFP positive. The GFP-positive cells were sorted for GFP expression a second time 2 weeks postinfection, and DNA was extracted (QIAgen DNeasy tissue kit), yielding stably expressed proviruses. At this stage, about 90% of cells were GFP positive (geometric mean of GFP fluorescence measured in FL1 from a representative experiment was 215). GFP-negative Jurkat cells were sorted twice more for lack of GFP expression and then cultured with TNF-α for 17 h prior to sorting. After induction, approximately 0.25% of cells became GFP positive (geometric mean, 63.3, when analyzed 4 days after sorting). Note that the absolute level of the fluorescent signal measured in FL1 varied depending on the instrument used and the gate drawn compared to the uninfected control. The cells that were inducibly GFP positive were collected and DNA was extracted, yielding the inducible sample. The inducible cells became dark upon withdrawal of TNF-α (over 90% became dim 2 weeks after removal of TNF), indicating dependence of expression on the inducing agent. The fraction of inducible cells seen in this study was similar to that reported in reference 27.
Integration site cloning and mapping to the genome. DNA from stably expressed and inducible populations was digested with three restriction endonucleases with six-base recognition sites (NheI, SpeI, and XbaI, essentially as described in reference 40) or with MseI (which has a four-base recognition site, as described in reference 50). Digested DNA was then ligated to the appropriate adapter and amplified by nested PCR as described previously (40). Oligonucleotides used are listed in Table S1 of the supplemental material. Integration site sequences were determined to be authentic if they began at the junction with the HIV LTR, had a sequence identity of >98%, and yielded a unique best hit when mapped to the human genome using BLAT (University of California, Santa Cruz).
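The three authenticity criteria above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `BlatHit` record, its field names, and the `authentic_sites` function are hypothetical and do not correspond to BLAT's native output format or to the authors' actual scripts.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    """One candidate genome alignment for a junction read (hypothetical record)."""
    read_id: str
    chrom: str
    position: int
    query_start: int   # 0 means the alignment begins right at the LTR junction
    identity: float    # percent identity, 0 to 100
    score: int         # alignment score used to pick the best hit


def authentic_sites(hits, min_identity=98.0):
    """Keep one hit per read that meets the three criteria described in the text."""
    by_read = {}
    for hit in hits:
        by_read.setdefault(hit.read_id, []).append(hit)

    accepted = []
    for read_hits in by_read.values():
        best_score = max(h.score for h in read_hits)
        best = [h for h in read_hits if h.score == best_score]
        if len(best) != 1:
            continue  # no unique best hit in the genome
        hit = best[0]
        if hit.query_start != 0:
            continue  # alignment does not begin at the junction with the LTR
        if hit.identity <= min_identity:
            continue  # sequence identity must exceed 98%
        accepted.append(hit)
    return accepted
```

In this sketch a read with two equally good placements is discarded outright, which is one plausible reading of "a unique best hit".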
A small data set (20 sites) was also generated using TPA as an inducing agent and analyzed. This set was biased in favor of integration in genes, and 2/20 were in alphoid repeats, paralleling sites analyzed after induction with TNF-α (data not shown).
Expression analysis.
A total of 3 × 10^6 Jurkat cells (in triplicate per treatment group) were plated and either left untreated in culture, infected with the vesicular stomatitis virus G protein-pseudotyped LTR-Tat-IRES-GFP HIV-based vector (with 4 µg/ml Polybrene) at a multiplicity of infection of 1 for 24 h, or treated with 10 ng/ml TNF-α for 17 h. Cells were harvested, and total RNA was extracted using the QIAgen RNeasy kit. Labeling and hybridization of RNA to Affymetrix HG-U133A arrays was performed using the Affymetrix protocol. Analysis used Affymetrix Microarray Analysis suite 5.1 software. Changes in transcriptional activity were quantified using EASE and significance analysis of microarrays (SAM) to determine the false discovery rate. For the comparison of untreated Jurkat cells to HIV-infected cells, 575 genes were found to change at least twofold in activity (accepting a 1% false discovery rate). For the comparison of untreated cells to TNF-α-treated cells, 10 genes were found to be upregulated and 32 were downregulated under the same criteria.
Statistical analysis. A detailed statistical analysis is presented in the supplemental material. An analysis of the randomly selected genes yielded a surprising result which suggested that the bias for favored integration in active genes (see Fig. 4, below) is stronger than the figure may suggest. Randomly selected sites that were mapped to genes were distributed into classes by expression level as in Fig. 4, below, and analyzed. The random sites did not yield a uniform distribution in each expression class, but instead revealed a bias in favor of the least-well-expressed genes (values were as follows: class 1, 15.1 to 16.1%; class 2, 14.6 to 15.7%; class 3, 15.1 to 15.3%; class 4, 12.8 to 13.4%; class 5, 11.4 to 11.6%; class 6, 11.7 to 12.1%; class 7, 10.8 to 11.2%; class 8, 6.2 to 6.7%; P < 0.0001 by chi-square; the range is for all three data sets in Fig. 4A to C, below). This is probably explained by the finding that highly expressed genes tend to have shorter introns (7) and so are smaller targets for integration. This emphasizes that the tendency to integrate in active genes is likely stronger than previously appreciated, because active genes are typically smaller than poorly expressed genes.
FIG. 4. Inducible proviruses are found more commonly in very highly active genes. Expression levels were assayed in Jurkat cells (three independent Affymetrix HU133A microarrays for each condition) and scored using the Affymetrix Microarray suite 5.1 software package. To classify the expression levels of genes hosting integration events, class boundaries were first generated by dividing all the genes on the array into eight classes according to their relative level of expression. Genes that hosted integration events were then distributed into the classes defined by these boundaries, summed, and expressed as a percentage of the total number of integration sites in genes on the array. The leftmost class in each panel contains the 1/8 most weakly expressed genes, and the rightmost class contains the 1/8 most highly expressed. The highest signal value represented in each expression bin (for untreated Jurkat cells) was as follows: bin 1, 9.2; bin 2, 20.6; bin 3, 38.6; bin 4, 66; bin 5, 117; bin 6, 227; bin 7, 488; bin 8, 12050. Integration sites were analyzed using data from untreated Jurkat cells (A), TNF-treated Jurkat cells (B), or HIV-Tat-GFP-infected Jurkat cells (C) (P < 0.003; chi-square test). Inducible proviruses in the eighth class (most highly expressed) accounted for about 17% of the total.
Nucleotide sequence accession numbers. The sequences for the integration sites newly determined in this study have been deposited at NCBI and assigned accession numbers CZ442176 to CZ443146. Microarray data have been deposited at the NCBI GEO repository under accession number GSE2504.
, an agent that is known to activate LTR transcription (39) and thereby to activate transcription from silent proviruses. Cells were then sorted to obtain the induced GFP-positive population. Previous studies using this model have shown that most of these inducible proviruses are silent due to integration in chromosomal sites unfavorable for gene expression (27, 28). In addition, focusing on the inducible fraction minimizes possible complications resulting from the inactivation of viral genomes by mutation. Integrated proviruses that were not expressed and were uninducible were not studied.
FIG. 1. Acquisition of cells containing stably expressed and inducible proviruses. (A) Tat-transducing HIV-based vector used in this study. Tat, HIV-encoded transcriptional activator; IRES, internal ribosome entry site. Transcription initiates within the left LTR. (B) Acquisition of cells containing stably expressed and inducible proviruses by FACS. Cells were infected at a multiplicity of about 0.1 and sorted for GFP-positive and -negative cells (left side). GFP-positive cells were collected and then sorted a second time to isolate a stably bright fraction. The GFP-negative (dark) population was sorted twice, and the dark cells were collected each time. The stably dark cells were then treated with TNF-α, and the resulting bright cells were collected (right side).
TABLE 1. Integration site data sets used in this study
TABLE 2. Integration in transcription units
FIG. 2. Primary sequences surrounding the stably expressed and inducible proviruses. The weak consensus sequence seen at the stably expressed (top) and inducible (bottom) proviruses was rendered so that the degree of conservation is proportional to the height of each letter, using LOGO (http://weblogo.Berkeley.edu/logo.cgi). The y axis reflects the information content at each base, so that perfect conservation would have a score of 2 bits. The points of joining between the HIV and human DNA lie between 1 and 0 (for the sequenced HIV DNA end) and between 4 and 5 on the other strand for the other end of the HIV DNA. Thus, the points of joining, and the integration consensus sequence, are symmetric around position 2 (arrow).
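The 2-bit ceiling mentioned in the legend follows from the Shannon information content of a four-letter alphabet. The sketch below is a minimal illustration, assuming equal-length aligned sequences and ignoring any small-sample correction that logo software may apply. The function names `information_content` and `logo_heights` are hypothetical.

```python
import math
from collections import Counter


def information_content(column):
    """Information in bits at one alignment column of DNA bases (maximum 2.0)."""
    counts = Counter(column.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("column contains no A, C, G, or T")
    entropy = 0.0
    for base in "ACGT":
        if counts[base]:
            p = counts[base] / total
            entropy -= p * math.log2(p)
    # Maximum entropy for four bases is log2(4) = 2 bits.
    return 2.0 - entropy


def logo_heights(sequences):
    """Per-position information content for equal-length aligned sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    return [
        information_content("".join(seq[i] for seq in sequences))
        for i in range(length)
    ]
```

A perfectly conserved position scores 2 bits and a position with all four bases equally frequent scores 0 bits, which is why a weak consensus appears as short letters in the logo.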
TABLE 3. Integration in repeated sequences
A small number of integration sites (20 total) were isolated from cells after induction with TPA instead of TNF-α. Of these, two were in alphoid repeats, paralleling results with TNF-α induction (data not shown).
All HIV integration site data sets showed that human endogenous retroviruses (HERVs) are significantly disfavored targets (P < 0.013), as reported previously for the SupT1 data set (40). HERVs are enriched outside transcription units, while HIV integration is favored within transcription units, accounting for the observed bias.
Inducible proviruses are more frequently found in gene deserts. A second difference was found in an analysis of the positions of stably expressed and inducible proviruses in intergenic regions. The stably expressed proviruses were more frequently found in short intergenic regions, indicative of favored integration in gene-rich chromosomal domains, as seen previously (34, 40). In contrast, the inducible proviruses were much more frequently found in long intergenic regions or "gene deserts" (Fig. 3) (P < 0.0007, regardless of gene call used for the analysis) (see p. 67-79 of the statistical information provided in the supplemental material).
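Measuring the intergenic region that hosts an integration site can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes gene annotations for a single chromosome given as half-open `(start, end)` coordinate pairs, and the helper names `merge_intervals` and `intergenic_length` are hypothetical.

```python
import bisect


def merge_intervals(genes):
    """Merge overlapping or touching (start, end) gene intervals."""
    merged = []
    for start, end in sorted(genes):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intergenic_length(genes, site):
    """Length in bp of the intergenic gap containing `site`.

    Returns None when the site lies inside a gene or is not flanked
    by annotated genes on both sides.
    """
    merged = merge_intervals(genes)
    starts = [start for start, _ in merged]
    i = bisect.bisect_right(starts, site)  # genes that start at or before the site
    if i == 0:
        return None  # no gene to the left
    left_end = merged[i - 1][1]
    if site < left_end:
        return None  # site falls within a gene
    if i == len(merged):
        return None  # no gene to the right
    return merged[i][0] - left_end
```

The resulting lengths could then be grouped into length classes such as those listed in the legend to Fig. 3 before comparing the stably expressed and inducible sets.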
FIG. 3. Frequency of stably expressed or inducible proviruses in intergenic regions of different lengths. Shorter intergenic regions are shown to the left, and longer ones are to the right. Genscan genes were used for this analysis, though the conclusions were similar for other gene sets as well (see p. 67-79 of the statistical information provided in the supplemental material). The P value is obtained from the logistic regression of event type (stable or inducible) on a cubic B-spline basis (i.e., a third-order polynomial) for intergenic distance. The units on the x axis indicate lengths of intergenic regions, in base pairs. Lengths of intergenic regions for each category were defined by the following boundaries (from left to right, in bp): 1,627, 6,135, 10,506, 14,900, 21,907, 28,989, 36,333, 43,531, 62,837, 104,802, and 3,182,720. The inducible proviruses in the rightmost five bins accounted for 14% of all inducible proviruses.
Inducible proviruses are more frequently found in very highly expressed cellular genes. A third chromosomal feature correlating with inducible HIV gene expression was identified by transcriptional profiling analysis of the Jurkat target cells. The expression signals of cellular genes hosting integration events were tabulated for the stably expressed and inducible proviruses. The median for both groups of genes was found to be higher than the median of all the probe sets on the HU133A microarrays used (stably expressed = 152, inducible = 177, all genes on the array = 66; units are "signal," as defined by Affymetrix MAS 5.1). Genes in both the stably expressed and inducible populations were also more active than genes from the random control population in Table 1 (random = 57; P < 0.0001 for comparison to either the stably expressed or inducible populations; Mann-Whitney test). This broadly parallels previous studies of HIV, which revealed that active genes were favored as integration targets (34, 40, 50).
It was unexpected, however, that the stably expressed and inducible data sets differed from each other. The median expression value for genes hosting inducible proviruses was found to be significantly higher than the median of genes hosting stably expressed proviruses (P = 0.0004; Mann-Whitney test).
To analyze this issue in more detail, expression signals of genes hosting integration events were divided into classes by their signal values and the distribution was examined (Fig. 4A). As in previous studies, genes hosting integration events were found more commonly in the more highly expressed genes. The inducible proviruses were more frequently found in the highest expression class: 24% of inducible integration sites (in genes represented on the array) compared to 14% for the stably expressed set (P = 0.003; chi-square test). In previous studies, genes in the highest expression class (eighth bin) were consistently found to be less favorable for integration (34, 40); here, this is seen as well for the stably bright population but not the inducible population. Thus, we infer that integration in the very highly expressed genes was associated with the inducible phenotype and, specifically, that the transcription level in bin 8 is unfavorable for HIV transcription. Inducible proviruses in highly expressed genes were found in both orientations relative to the direction of host gene transcription (data not shown). An analysis of the placement of integration sites within genes showed no obvious bias; for example, the inducible sites in the most highly transcribed genes (eighth bin) were not clustered near the start site of transcription (data not shown).
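The class assignment described above can be sketched with the bin limits reported in the legend to Fig. 4 for untreated Jurkat cells. The function names are hypothetical, and clamping signals above the top limit into class 8 is an assumption made only so the sketch handles every input.

```python
import bisect
from collections import Counter

# Highest signal value in each expression bin, untreated Jurkat cells (Fig. 4 legend).
BIN_UPPER_BOUNDS = [9.2, 20.6, 38.6, 66, 117, 227, 488, 12050]


def expression_class(signal):
    """Return the expression class (1 to 8) for one MAS 5.1 signal value."""
    index = bisect.bisect_left(BIN_UPPER_BOUNDS, signal)
    # Assumption: anything above the top limit is still counted in class 8.
    return min(index, len(BIN_UPPER_BOUNDS) - 1) + 1


def class_percentages(signals):
    """Percentage of integration-hosting genes that fall in each class."""
    if not signals:
        raise ValueError("no signal values supplied")
    counts = Counter(expression_class(s) for s in signals)
    total = len(signals)
    return {c: 100.0 * counts.get(c, 0) / total for c in range(1, 9)}
```

With these limits, the median signals reported above for genes hosting stably expressed (152) and inducible (177) proviruses both fall in class 6, whereas the array-wide median of 66 sits at the top of class 4.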
The relationship between integration targeting and host cell transcription was probed further by repeating the transcriptional profiling measurements under two additional conditions. Jurkat cells were infected with the HIV-Tat-GFP vector prior to RNA isolation, or cells were treated with 10 ng/ml TNF and RNA was isolated subsequently. These manipulations caused clearly detectable changes in transcription. Notably, infection with the Tat-transducing vector caused down-modulation of a large family of genes involved in signal transduction and immune responses (Fig. 5), potentially a biologically significant activity of Tat involved in evasion of the host immune response (11, 24, 29). Treatment with TNF resulted in induction of a number of previously characterized TNF-inducible genes. Though these changes were readily detectable, overall transcription in the cell types studied was still quite similar (correlation coefficients for pair-wise comparisons of any two microarrays showed R > 0.98). Analysis of genes hosting integration events using these transcriptional profiling data sets also indicated that very highly transcribed cellular genes were more common targets in the inducible data set (Fig. 4B and C).
FIG. 5. Tat down-modulates host cell genes important in signal transduction and immune responses. Signal intensities from Affymetrix HU133A microarrays were analyzed by SAM (http://www-stat.Stanford.EDU/~tibs/SAM/) to identify significantly affected genes and then clustered according to gene ontology using EASE (http://david.niaid.nih.gov/david/ease.htm). The three left columns show results from uninfected cells, and the three right columns show results from cells infected with the Tat-transducing HIV-based vector. A large set of Tat-repressed genes (115 probe sets corresponding to 108 different genes) was identified as overrepresented compared to all genes queried by the microarray in the "signal transducer activity" category (P = 1.16 × 10^-5; Fisher exact test with Bonferroni correction for multiple comparisons). Expression values were normalized by dividing by the mean. In cases where multiple probe sets queried the activities of a single gene, the values were found to be closely similar and a single representative probe set was used for the figure. Gray tiles indicate negative values. All genes called by EASE in the "signal transducer activity" category are shown, except for six olfactory receptors and one taste receptor.
FIG. 6. Clustering of transcriptional profiles from Jurkat cells with human leukocytes. Data for human tissues are from reference 44. All analyses used Affymetrix HU133A microarrays. Transcription signal values were averaged between replicates and ranked prior to clustering. Squared Euclidean distance and unweighted pair-group average linkage (also known as UPGMA) cluster analysis of the transcriptional profiles was carried out using Statistica 7.0.
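The ranking and distance steps named in the legend can be sketched as below. This is an illustration only: giving tied values their average rank is an assumption, since the legend does not say how ties were handled, the function names are hypothetical, and the UPGMA linkage step itself is not shown.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Ranking before computing distances makes the comparison depend on the ordering of genes within each profile rather than on absolute signal values, which can differ between data sets.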
Three chromosomal features correlated with inducible expression: centromeric heterochromatin, gene deserts, and highly active host transcription units. Each of these is discussed below. However, only about 40% of the inducible proviruses were associated with one of these three features, and so further chromosomal environments unfavorable for expression may yet be found. In addition, studies from others using this model suggest that low-level GFP expression may also result from stochastic fluctuations in Tat levels. For cells expressing low levels of Tat protein, fluctuations in Tat concentration may extinguish LTR-driven transcription, and this may become "locked in" because Tat protein is required to activate its own expression (D. Schaffer and coworkers, personal communication).
Silencing HIV proviruses by transcriptional interference. A significantly greater proportion of the inducible proviruses were found in the most highly expressed fraction of host genes (Fig. 4), suggesting that very-high-level host gene transcription interferes with transcription of an integrated provirus. Many studies have established that transcriptional interference can repress gene expression (4, 10, 19, 20, 22, 33), and a model HIV promoter has previously been shown to be sensitive to transcriptional interference in HeLa cells (20). For a provirus in the same orientation as the host cell gene, read-through transcription may repress by blocking access of factors to the downstream promoter or by actively dislodging bound proteins (4, 19, 20, 22, 33). In the HeLa cell model, read-through transcription was found to repress HIV transcription by dislodging bound Sp1 (20). A provirus in an orientation opposite that of the host gene may be silenced by the above mechanisms, or by transcriptional "trainwrecking" whereby two RNA polymerase complexes collide during convergent elongation. Convergent transcription could also result in transcription of both DNA strands and formation of double-stranded RNA, which might silence proviral transcription via RNA interference (reviewed in references 23 and 37), RNA-directed DNA methylation (35), induction of the interferon response (13), or generation of antisense RNA (38).
Inducible proviruses are integrated more commonly in gene deserts. A strong trend was seen involving integration sites outside genes, in which long intergenic regions or gene deserts more frequently hosted inducible proviruses. Short intergenic regions more commonly hosted stably expressed proviruses. A similar trend was also seen comparing the frequency of integration in CpG islands, which are known to be associated with genes. A variety of mechanisms could account for this bias, none mutually exclusive. Gene deserts may be heterochromatic, and so packaged in proteins unfavorable for efficient transcription (25, 26, 46). Gene deserts may be enriched in binding sites for transcriptional silencer proteins, though no candidate binding sites emerged from our analysis of primary sequences at integration sites. Intranuclear positioning of gene deserts could also be a factor (3, 6, 8). A recent study suggested that activation of genes in yeast can be accompanied by translocation of the genes to a nuclear pore complex (6). Thus, proviruses integrated into gene-sparse regions may be localized within nuclear domains that are unfavorable for transcription.
Integration in centromeric heterochromatin disfavors HIV gene expression. Repression of HIV expression after integration in alphoid repeats was previously observed by Eric Verdin and colleagues using the Jurkat model (27, 28). Heterochromatin adopts a condensed structure that blocks access of the transcriptional machinery (41, 46). Thus, a simple model to explain our results is that wrapping of the proviral DNA in heterochromatin blocks access of the transcriptional machinery and thereby represses transcription.
Models for the mechanism of transcriptional latency in patients.
HIV-infected patients on successful long-term antiretroviral therapy nevertheless harbor cells containing latent proviruses, and after cessation of treatment HIV from these cells can reinitiate active replication (9, 14, 21, 49). Our findings reveal mechanisms by which the surrounding chromosomal environment may silence some integrated proviruses while leaving them inducible by TNF-α treatment. The data presented here suggest that proviruses integrated in centromeric heterochromatin, gene deserts, and highly transcribed genes may contribute to the latent population.
Direct studies of integration sites from latently infected cells in patients have been challenging. One report investigated the distribution of HIV integration sites in resting CD4+ lymphocytes of patients on effective highly active antiretroviral therapy (21). However, this work was complicated by the fact that defective proviruses greatly outnumber latent proviruses in patient cells (9, 14, 49). Han et al. cloned 74 integration sites and found that 93% of the proviruses were integrated within active transcription units (21). If these sites are representative of latent integration sites in patients, then the transcriptional interference model may be the most attractive based on our data.
This work was supported by NIH grants AI52845 and AI34786, the James B. Pendleton Charitable Trust, Robin and Frederic Withington (F.D.B.), and the Fritz B. Burns Foundation (to J.R.E.).
Supplemental material for this article may be found at http://jvi.asm.org/.