1 The aim of this use case

Here we want to check and visualise the common target genes of two transcription factors. We describe how to find transcription factors which share common target genes. We create a transcription factor - target gene interaction graph of the STAT5A and STAT5B transcription factors using the igraph R package. We also show how to create the same transcription factor - target gene interaction graph by using the Cytoscape software.

2 Input data

Let’s download the human (Homo sapiens) interaction table including all (small- and large-scale) interactions from the Download part of the TFLink website, and read it in with the fread command of the data.table package that reads huge tables effectively. We also will use the tidyverse package.

library(data.table)
library(tidyverse)
Interaction_tab <- fread("TFLink_Homo_sapiens_interactions_All_simpleFormat_v1.0.tsv")

Let’s see the first few rows of the interaction table.

Interaction_tab %>% 
  slice(1:5)
Head of interaction table (continued below)
UniprotID.TF UniprotID.Target NCBI.GeneID.TF NCBI.GeneID.Target Name.TF Name.Target Detection.method
Q9H9S0 O94907 79923 22943 NANOG DKK1 chromatin immunoprecipitation assay;inferred by curator
P37231 P10826 5468 5915 PPARG RARB chromatin immunoprecipitation assay;inferred by curator
P10242 P08047 4602 6667 MYB SP1 chromatin immunoprecipitation assay;inferred by curator
P31269 Q02363 3205 3398 HOXA9 ID2 inferred by curator
P03372 P17275 2099 3726 ESR1 JUNB chromatin immunoprecipitation assay;inferred by curator
PubmedID Organism Source.database Small-scale.evidence
19148141;29087512;29126285;27924024 Homo sapiens GTRD;ReMap;TRRUST Yes
17202159;12839938;29087512;27924024 Homo sapiens GTRD;TRED;TRRUST Yes
29126285;27924024;17202159 Homo sapiens GTRD;ReMap;TRED Yes
29087512;20565746 Homo sapiens TRRUST Yes
29126285;18971253;27924024;11477071;17202159;29087512 Homo sapiens GTRD;PAZAR;ReMap;TRED;TRRUST Yes

3 Checking the number of target genes each transcription factor has

Let’s see the distribution of number of target genes among transcription factors.

Interaction_tab %>%
 group_by(UniprotID.Target, `Small-scale.evidence`) %>%
 summarise(`Nr. of TFs regulates the target gene` = n()) %>%
 arrange(desc(`Nr. of TFs regulates the target gene`)) %>% 
  ggplot(aes(x = `Nr. of TFs regulates the target gene`, 
             fill = `Small-scale.evidence`)) +
  geom_histogram(binwidth = 50) +
  theme_minimal()

4 Transcription factors having up to 50 target genes

To be able to plot all target genes of two transcription factors with readable names, we will narrow down our choices to interactions supported by small-scale evidences, and we will select from transcription factor having up-to 50 target genes.

Therefore, we create a new interaction table containing interaction data supported by small-scale evidences. (Alternatively, this table can also be downloaded directly from the TFLink website.)

Interaction_tab_SS <- Interaction_tab %>%
  filter(`Small-scale.evidence` == "Yes")

Let’s save the names of such transcription factors having up to 50 target genes to a vector variable.

TF_names <- Interaction_tab_SS %>%
 group_by(Name.TF) %>%
 summarise(`Nr. of target genes` = n()) %>%
 filter(`Nr. of target genes` < 50) %>%
 select(Name.TF) %>%
 pull()

# Checking the content
head(TF_names)
## [1] "AATF"   "ABL1"   "AHR"    "AIP"    "AIRE"   "ANKRD1"

Then, let’s create a list variable containing all target genes of each transcription factors, to be able to find the ones regulating common target genes.

# Creating the list
TGs_per_TFs_list <- Interaction_tab_SS %>%
  filter(Name.TF %in% TF_names) %>%
  select(Name.TF, Name.Target) %>%
  group_by(Name.TF) %>%
  group_split() %>%
  lapply(., pull, var = Name.Target)

# Naming the elements
names(TGs_per_TFs_list) <- TF_names

# Checking the content
head(TGs_per_TFs_list, n = 3)
## $AATF
## [1] "MYC"    "TP53"   "BAX"    "CDKN1A" "KLKB1"  "KLK3"  
## 
## $ABL1
##  [1] "JUN"    "TP53"   "BAX"    "CSF1"   "CDKN1A" "BCL2"   "BCL6"   "PIM1"  
##  [9] "FOXO3"  "CCND2" 
## 
## $AHR
##  [1] "AHRR"   "MYC"    "ABCG2"  "MFSD2A" "MT2A"   "IL1B"   "PCNA"   "FOS"   
##  [9] "RFC3"   "CYP1A2" "CA9"    "UGT1A6" "UGT1A1" "CCNG2"  "IL13"   "CYP2B6"
## [17] "CYP1B1" "CYP1A1" "IL6"    "ARNT"   "CCND1"  "BRCA1"

Let’s check which transcription factor pairs have the most common target genes.

# TF pairs in every combinations
TF_combination_list <- combn(names(TGs_per_TFs_list), 2, simplify = FALSE)

# Nr. of common genes of all TF pairs
Intersect_lengths <- list()
for(i in 1:length(TF_combination_list)){
   Intersect_lengths[i] <-
     intersect(TGs_per_TFs_list[[TF_combination_list[[i]][1]]],
               TGs_per_TFs_list[[TF_combination_list[[i]][2]]]) %>%
     length()
}
rm(i)

# Converting list to vector
Intersect_lengths <- Intersect_lengths %>% 
  unlist()

# Adding names of TF pairs (separated by "_")
names(Intersect_lengths) <- sapply(TF_combination_list, paste, collapse = "_")

# Checking the maximum number of common genes
Intersect_lengths %>% 
  sort() %>% 
  tail()
##   NR1H2_NR1H3    FOSL2_JUND  DNMT1_DNMT3B SREBF1_SREBF2 TWIST1_TWIST2 
##            16            17            19            19            25 
## STAT5A_STAT5B 
##            30

Let’s create a vector containing the names of the transcription factor pair having the most common target genes.

TFs <- which.max(Intersect_lengths) %>% 
  names() %>% 
  str_split(pattern = "_") %>% 
  unlist()

# Let's see
TFs
## [1] "STAT5A" "STAT5B"

STAT5A and STAT5B (signal transducer and activator of transcription 5 A and B) are two highly similar transcription factors. The STAT5 proteins are involved in cytosolic signalling and they mediate the expression of specific genes (Nosaka et al. 1999).

5 Network visualization of the two transcription factors having the most common target genes

6 Using the igraph R package

Now we need more steps to create an adjacency matrix input for the igraph network plot.

Adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.

# Creating data.frame
# while excluding target genes with missing names ("-") 
Wide_selected_Interaction_tab_SS <- Interaction_tab_SS %>% 
  filter(Name.TF %in% TFs & Name.Target != "-") %>% 
  select(Name.TF, Name.Target) %>%
  mutate(Value = 1) %>% 
  pivot_wider(names_from = Name.Target, values_from = Value) %>% 
  replace(is.na(.), 0) %>% 
  data.frame()

# Adding row names
rownames(Wide_selected_Interaction_tab_SS) <- Wide_selected_Interaction_tab_SS$Name.TF
Wide_selected_Interaction_tab_SS$Name.TF <- NULL

# Converting to matrix
Wide_selected_Interaction_tab_SS <- Wide_selected_Interaction_tab_SS %>% 
  as.matrix()

# We need a function for creating the (symmetrical) adjacency matrix
expand.matrix <- function(A){
  m <- nrow(A)
  n <- ncol(A)
  B <- matrix(0,nrow = m, ncol = m)
  C <- matrix(0,nrow = n, ncol = n)
  cbind(rbind(B,t(A)),rbind(A,C))
}

# Applying the expand.matrix function
Wide_selected_Interaction_tab_SS <- Wide_selected_Interaction_tab_SS %>% 
  expand.matrix()

# Adding missing row names
rownames(Wide_selected_Interaction_tab_SS)[1:2] <- colnames(
  Wide_selected_Interaction_tab_SS)[1:2]

# Converting to symmetrical adjacency matrix to asymmetrical to be able to 
# create a directed graph (by deleting values below diagonal)
Wide_selected_Interaction_tab_SS[lower.tri(Wide_selected_Interaction_tab_SS)] <- 0L

Let’s see how does the resulted adjacency matrix looks like by checking its upper left part.

Wide_selected_Interaction_tab_SS[1:10, 1:10]
##         STAT5A STAT5B PIM1 PTTG1IP MET BCL2L1 IFNG FCGR1A CCND2 CEL
## STAT5A       0      0    1       1   1      1    1      1     1   1
## STAT5B       0      0    1       1   1      1    1      1     0   1
## PIM1         0      0    0       0   0      0    0      0     0   0
## PTTG1IP      0      0    0       0   0      0    0      0     0   0
## MET          0      0    0       0   0      0    0      0     0   0
## BCL2L1       0      0    0       0   0      0    0      0     0   0
## IFNG         0      0    0       0   0      0    0      0     0   0
## FCGR1A       0      0    0       0   0      0    0      0     0   0
## CCND2        0      0    0       0   0      0    0      0     0   0
## CEL          0      0    0       0   0      0    0      0     0   0

Let’s create the igraph plot.

library(igraph)

# Creating the igraph variable
Network <- graph_from_adjacency_matrix(Wide_selected_Interaction_tab_SS, 
                                       mode = "directed", diag = FALSE)
# Creating a vector containing three colours for vertices
Vcol <- case_when(colSums(Wide_selected_Interaction_tab_SS[1:2,]) == 0 ~ "#488990",
          colSums(Wide_selected_Interaction_tab_SS[1:2,]) == 1 ~ "#ffbe6f",
          colSums(Wide_selected_Interaction_tab_SS[1:2,]) == 2 ~ "#f66151")

# Reducing the margins of the plot
par(mar = rep(0.1, 4))

# Setting a random seed to reproduce the very same graph over and over again
set.seed(1)

# Creating the plot itself
plot.igraph(Network, layout = layout_with_dh,
            vertex.color = Vcol, vertex.label.color = "black", 
            vertex.label.cex = 0.4, edge.arrow.size = 0.4)

The two transcription factors are shown with green, their common target genes with red and their distinct target genes with yellow vertices.

To avoid such overlapping among vertices we can adjust the network by hand using the tkplot function of the igraph package.

set.seed(1)
tkplot(Network, layout = layout_with_dh,
       vertex.color = Vcol, vertex.label.color = "black", 
       vertex.label.cex = 0.4)

After adjusting the common vertices by hand to avoid overlapping, we can produce a similar image.

7 Using the Cytoscape software

Let’s visualise the common and distinct target genes of the two transcription factors with the Cytoscape network visualization software by following these steps:

  1. After downloading the target gene interaction tables of the STAT5A and STAT5B transcription factors from their TFLink entry pages, we open the Cytoscape software.
  2. Then we import the tsv files as networks into the Cytoscape:
    • FileImportNetwork from file... and select the downloaded file(s) (one file at once).
    • In the pop-up window click on the column names and select the following explanations for them:
      • UniprotID.TFSource Node Attribute
      • Name.TFSource
      • UniprotID.TargetTarget Node Attribute
      • NCBI.GeneID.TargetTarget Node Attribute
      • Name.TargetTarget
      • then, all of the other columns will turn to Edge Attribute.
    • Click on the OK button.
  3. Repeat the steps of point 2 with the other interaction table as well.
  4. Select only the small-scale interactions from the imported networks:
    • On the left side in the Network panel click on the network what is needed to be filtered.
    • Go to the Filter panel, click on the + sign, select the Edge: Small-scale evidence option, and write “Yes” into the box.
    • Then, create a net network: Go to FileNet NetworkFrom Selected Nodes, Selected Edges.
  5. Repeat the steps of point 3 with the network of the other transcription factor.
  6. Merge the filtered networks with ToolsMergeNetwork..., and in the pop-up window put the two filtered networks (targets_of_P42229.tsv(1) and targets_of_P51692.tsv(1)) into the right side of the panel. Then click on the Merge button.
  7. Go to the Style panel on the left side and set different colours, shapes, size parameters for the transcription factors and the target genes.
  8. To create a meaningful layout, select the nodes of the common target genes in the filtered network at the Network panel and go to the Layout menu → Grid layoutSelected Nodes only. Then repeat this with the nodes of the distinct target genes. You can drag the nodes of the two transcription factors manually.

After all these steps we will end up a transcription factor - target gene interaction network like this:

8 Environment

In this use case the following software and package versions were applied:

  • R version 4.1.3
  • tidyverse version 1.3.1
  • data.table version 1.14.2
  • igraph version 1.2.11
  • Cytoscape version 3.8.2

References

Nosaka, T, T Kawashima, K Misawa, K Ikuta, A L Mui, and T Kitamura. 1999. Stat5 as a Molecular Regulator of Proliferation, Differentiation and Apoptosis in Hematopoietic Cells.” The EMBO Journal 18 (17): 4754–65. https://doi.org/10.1093/emboj/18.17.4754.