Here we want to check and visualise the common target genes of two transcription factors. We describe how to find transcription factors which share common target genes. We create a transcription factor - target gene interaction graph of the STAT5A and STAT5B transcription factors using the igraph R package. We also show how to create the same transcription factor - target gene interaction graph by using the Cytoscape software.
Let’s download the human (Homo sapiens) interaction table including all (small- and large-scale) interactions from the Download part of the TFLink website, and read it in with the fread command of the data.table package that reads huge tables effectively. We also will use the tidyverse package.
library(data.table)
library(tidyverse)
Interaction_tab <- fread("TFLink_Homo_sapiens_interactions_All_simpleFormat_v1.0.tsv")Let’s see the first few rows of the interaction table.
Interaction_tab %>%
slice(1:5)| UniprotID.TF | UniprotID.Target | NCBI.GeneID.TF | NCBI.GeneID.Target | Name.TF | Name.Target | Detection.method |
|---|---|---|---|---|---|---|
| Q9H9S0 | O94907 | 79923 | 22943 | NANOG | DKK1 | chromatin immunoprecipitation assay;inferred by curator |
| P37231 | P10826 | 5468 | 5915 | PPARG | RARB | chromatin immunoprecipitation assay;inferred by curator |
| P10242 | P08047 | 4602 | 6667 | MYB | SP1 | chromatin immunoprecipitation assay;inferred by curator |
| P31269 | Q02363 | 3205 | 3398 | HOXA9 | ID2 | inferred by curator |
| P03372 | P17275 | 2099 | 3726 | ESR1 | JUNB | chromatin immunoprecipitation assay;inferred by curator |
| PubmedID | Organism | Source.database | Small-scale.evidence |
|---|---|---|---|
| 19148141;29087512;29126285;27924024 | Homo sapiens | GTRD;ReMap;TRRUST | Yes |
| 17202159;12839938;29087512;27924024 | Homo sapiens | GTRD;TRED;TRRUST | Yes |
| 29126285;27924024;17202159 | Homo sapiens | GTRD;ReMap;TRED | Yes |
| 29087512;20565746 | Homo sapiens | TRRUST | Yes |
| 29126285;18971253;27924024;11477071;17202159;29087512 | Homo sapiens | GTRD;PAZAR;ReMap;TRED;TRRUST | Yes |
Let’s see the distribution of number of target genes among transcription factors.
Interaction_tab %>%
group_by(UniprotID.Target, `Small-scale.evidence`) %>%
summarise(`Nr. of TFs regulates the target gene` = n()) %>%
arrange(desc(`Nr. of TFs regulates the target gene`)) %>%
ggplot(aes(x = `Nr. of TFs regulates the target gene`,
fill = `Small-scale.evidence`)) +
geom_histogram(binwidth = 50) +
theme_minimal()To be able to plot all target genes of two transcription factors with readable names, we will narrow down our choices to interactions supported by small-scale evidences, and we will select from transcription factor having up-to 50 target genes.
Therefore, we create a new interaction table containing interaction data supported by small-scale evidences. (Alternatively, this table can also be downloaded directly from the TFLink website.)
Interaction_tab_SS <- Interaction_tab %>%
filter(`Small-scale.evidence` == "Yes")Let’s save the names of such transcription factors having up to 50 target genes to a vector variable.
TF_names <- Interaction_tab_SS %>%
group_by(Name.TF) %>%
summarise(`Nr. of target genes` = n()) %>%
filter(`Nr. of target genes` < 50) %>%
select(Name.TF) %>%
pull()
# Checking the content
head(TF_names)## [1] "AATF" "ABL1" "AHR" "AIP" "AIRE" "ANKRD1"
Then, let’s create a list variable containing all target genes of each transcription factors, to be able to find the ones regulating common target genes.
# Creating the list
TGs_per_TFs_list <- Interaction_tab_SS %>%
filter(Name.TF %in% TF_names) %>%
select(Name.TF, Name.Target) %>%
group_by(Name.TF) %>%
group_split() %>%
lapply(., pull, var = Name.Target)
# Naming the elements
names(TGs_per_TFs_list) <- TF_names
# Checking the content
head(TGs_per_TFs_list, n = 3)## $AATF
## [1] "MYC" "TP53" "BAX" "CDKN1A" "KLKB1" "KLK3"
##
## $ABL1
## [1] "JUN" "TP53" "BAX" "CSF1" "CDKN1A" "BCL2" "BCL6" "PIM1"
## [9] "FOXO3" "CCND2"
##
## $AHR
## [1] "AHRR" "MYC" "ABCG2" "MFSD2A" "MT2A" "IL1B" "PCNA" "FOS"
## [9] "RFC3" "CYP1A2" "CA9" "UGT1A6" "UGT1A1" "CCNG2" "IL13" "CYP2B6"
## [17] "CYP1B1" "CYP1A1" "IL6" "ARNT" "CCND1" "BRCA1"
Let’s check which transcription factor pairs have the most common target genes.
# TF pairs in every combinations
TF_combination_list <- combn(names(TGs_per_TFs_list), 2, simplify = FALSE)
# Nr. of common genes of all TF pairs
Intersect_lengths <- list()
for(i in 1:length(TF_combination_list)){
Intersect_lengths[i] <-
intersect(TGs_per_TFs_list[[TF_combination_list[[i]][1]]],
TGs_per_TFs_list[[TF_combination_list[[i]][2]]]) %>%
length()
}
rm(i)
# Converting list to vector
Intersect_lengths <- Intersect_lengths %>%
unlist()
# Adding names of TF pairs (separated by "_")
names(Intersect_lengths) <- sapply(TF_combination_list, paste, collapse = "_")
# Checking the maximum number of common genes
Intersect_lengths %>%
sort() %>%
tail()## NR1H2_NR1H3 FOSL2_JUND DNMT1_DNMT3B SREBF1_SREBF2 TWIST1_TWIST2
## 16 17 19 19 25
## STAT5A_STAT5B
## 30
Let’s create a vector containing the names of the transcription factor pair having the most common target genes.
TFs <- which.max(Intersect_lengths) %>%
names() %>%
str_split(pattern = "_") %>%
unlist()
# Let's see
TFs## [1] "STAT5A" "STAT5B"
STAT5A and STAT5B (signal transducer and activator of transcription 5 A and B) are two highly similar transcription factors. The STAT5 proteins are involved in cytosolic signalling and they mediate the expression of specific genes (Nosaka et al. 1999).
igraph R packageNow we need more steps to create an adjacency matrix input for the igraph network plot.
Adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
# Creating data.frame
# while excluding target genes with missing names ("-")
Wide_selected_Interaction_tab_SS <- Interaction_tab_SS %>%
filter(Name.TF %in% TFs & Name.Target != "-") %>%
select(Name.TF, Name.Target) %>%
mutate(Value = 1) %>%
pivot_wider(names_from = Name.Target, values_from = Value) %>%
replace(is.na(.), 0) %>%
data.frame()
# Adding row names
rownames(Wide_selected_Interaction_tab_SS) <- Wide_selected_Interaction_tab_SS$Name.TF
Wide_selected_Interaction_tab_SS$Name.TF <- NULL
# Converting to matrix
Wide_selected_Interaction_tab_SS <- Wide_selected_Interaction_tab_SS %>%
as.matrix()
# We need a function for creating the (symmetrical) adjacency matrix
expand.matrix <- function(A){
m <- nrow(A)
n <- ncol(A)
B <- matrix(0,nrow = m, ncol = m)
C <- matrix(0,nrow = n, ncol = n)
cbind(rbind(B,t(A)),rbind(A,C))
}
# Applying the expand.matrix function
Wide_selected_Interaction_tab_SS <- Wide_selected_Interaction_tab_SS %>%
expand.matrix()
# Adding missing row names
rownames(Wide_selected_Interaction_tab_SS)[1:2] <- colnames(
Wide_selected_Interaction_tab_SS)[1:2]
# Converting to symmetrical adjacency matrix to asymmetrical to be able to
# create a directed graph (by deleting values below diagonal)
Wide_selected_Interaction_tab_SS[lower.tri(Wide_selected_Interaction_tab_SS)] <- 0LLet’s see how does the resulted adjacency matrix looks like by checking its upper left part.
Wide_selected_Interaction_tab_SS[1:10, 1:10]## STAT5A STAT5B PIM1 PTTG1IP MET BCL2L1 IFNG FCGR1A CCND2 CEL
## STAT5A 0 0 1 1 1 1 1 1 1 1
## STAT5B 0 0 1 1 1 1 1 1 0 1
## PIM1 0 0 0 0 0 0 0 0 0 0
## PTTG1IP 0 0 0 0 0 0 0 0 0 0
## MET 0 0 0 0 0 0 0 0 0 0
## BCL2L1 0 0 0 0 0 0 0 0 0 0
## IFNG 0 0 0 0 0 0 0 0 0 0
## FCGR1A 0 0 0 0 0 0 0 0 0 0
## CCND2 0 0 0 0 0 0 0 0 0 0
## CEL 0 0 0 0 0 0 0 0 0 0
Let’s create the igraph plot.
library(igraph)
# Creating the igraph variable
Network <- graph_from_adjacency_matrix(Wide_selected_Interaction_tab_SS,
mode = "directed", diag = FALSE)
# Creating a vector containing three colours for vertices
Vcol <- case_when(colSums(Wide_selected_Interaction_tab_SS[1:2,]) == 0 ~ "#488990",
colSums(Wide_selected_Interaction_tab_SS[1:2,]) == 1 ~ "#ffbe6f",
colSums(Wide_selected_Interaction_tab_SS[1:2,]) == 2 ~ "#f66151")
# Reducing the margins of the plot
par(mar = rep(0.1, 4))
# Setting a random seed to reproduce the very same graph over and over again
set.seed(1)
# Creating the plot itself
plot.igraph(Network, layout = layout_with_dh,
vertex.color = Vcol, vertex.label.color = "black",
vertex.label.cex = 0.4, edge.arrow.size = 0.4)The two transcription factors are shown with green, their common target genes with red and their distinct target genes with yellow vertices.
To avoid such overlapping among vertices we can adjust the network by hand using the tkplot function of the igraph package.
set.seed(1)
tkplot(Network, layout = layout_with_dh,
vertex.color = Vcol, vertex.label.color = "black",
vertex.label.cex = 0.4)After adjusting the common vertices by hand to avoid overlapping, we can produce a similar image.
Let’s visualise the common and distinct target genes of the two transcription factors with the Cytoscape network visualization software by following these steps:
File → Import → Network from file... and select the downloaded file(s) (one file at once).Source Node AttributeSourceTarget Node AttributeTarget Node AttributeTargetEdge Attribute.OK button.Network panel click on the network what is needed to be filtered.Filter panel, click on the + sign, select the Edge: Small-scale evidence option, and write “Yes” into the box.File → Net Network → From Selected Nodes, Selected Edges.Tools → Merge → Network..., and in the pop-up window put the two filtered networks (targets_of_P42229.tsv(1) and targets_of_P51692.tsv(1)) into the right side of the panel. Then click on the Merge button.Style panel on the left side and set different colours, shapes, size parameters for the transcription factors and the target genes.Network panel and go to the Layout menu → Grid layout → Selected Nodes only. Then repeat this with the nodes of the distinct target genes. You can drag the nodes of the two transcription factors manually.After all these steps we will end up a transcription factor - target gene interaction network like this:
In this use case the following software and package versions were applied:
R version 4.1.3tidyverse version 1.3.1data.table version 1.14.2igraph version 1.2.11Cytoscape version 3.8.2