Here we want to check and visualise the common target genes of two transcription factors. We describe how to find transcription factors which share common target genes. We create a transcription factor - target gene interaction graph of the STAT5A and STAT5B transcription factors using the igraph
R
package. We also show how to create the same transcription factor - target gene interaction graph by using the Cytoscape software.
Let’s download the human (Homo sapiens) interaction table including all (small- and large-scale) interactions from the Download part of the TFLink website, and read it in with the fread
command of the data.table
package that reads huge tables effectively. We also will use the tidyverse
package.
library(data.table)
library(tidyverse)
<- fread("TFLink_Homo_sapiens_interactions_All_simpleFormat_v1.0.tsv") Interaction_tab
Let’s see the first few rows of the interaction table.
%>%
Interaction_tab slice(1:5)
UniprotID.TF | UniprotID.Target | NCBI.GeneID.TF | NCBI.GeneID.Target | Name.TF | Name.Target | Detection.method |
---|---|---|---|---|---|---|
Q9H9S0 | O94907 | 79923 | 22943 | NANOG | DKK1 | chromatin immunoprecipitation assay;inferred by curator |
P37231 | P10826 | 5468 | 5915 | PPARG | RARB | chromatin immunoprecipitation assay;inferred by curator |
P10242 | P08047 | 4602 | 6667 | MYB | SP1 | chromatin immunoprecipitation assay;inferred by curator |
P31269 | Q02363 | 3205 | 3398 | HOXA9 | ID2 | inferred by curator |
P03372 | P17275 | 2099 | 3726 | ESR1 | JUNB | chromatin immunoprecipitation assay;inferred by curator |
PubmedID | Organism | Source.database | Small-scale.evidence |
---|---|---|---|
19148141;29087512;29126285;27924024 | Homo sapiens | GTRD;ReMap;TRRUST | Yes |
17202159;12839938;29087512;27924024 | Homo sapiens | GTRD;TRED;TRRUST | Yes |
29126285;27924024;17202159 | Homo sapiens | GTRD;ReMap;TRED | Yes |
29087512;20565746 | Homo sapiens | TRRUST | Yes |
29126285;18971253;27924024;11477071;17202159;29087512 | Homo sapiens | GTRD;PAZAR;ReMap;TRED;TRRUST | Yes |
Let’s see the distribution of number of target genes among transcription factors.
%>%
Interaction_tab group_by(UniprotID.Target, `Small-scale.evidence`) %>%
summarise(`Nr. of TFs regulates the target gene` = n()) %>%
arrange(desc(`Nr. of TFs regulates the target gene`)) %>%
ggplot(aes(x = `Nr. of TFs regulates the target gene`,
fill = `Small-scale.evidence`)) +
geom_histogram(binwidth = 50) +
theme_minimal()
To be able to plot all target genes of two transcription factors with readable names, we will narrow down our choices to interactions supported by small-scale evidences, and we will select from transcription factor having up-to 50 target genes.
Therefore, we create a new interaction table containing interaction data supported by small-scale evidences. (Alternatively, this table can also be downloaded directly from the TFLink website.)
<- Interaction_tab %>%
Interaction_tab_SS filter(`Small-scale.evidence` == "Yes")
Let’s save the names of such transcription factors having up to 50 target genes to a vector
variable.
<- Interaction_tab_SS %>%
TF_names group_by(Name.TF) %>%
summarise(`Nr. of target genes` = n()) %>%
filter(`Nr. of target genes` < 50) %>%
select(Name.TF) %>%
pull()
# Checking the content
head(TF_names)
## [1] "AATF" "ABL1" "AHR" "AIP" "AIRE" "ANKRD1"
Then, let’s create a list
variable containing all target genes of each transcription factors, to be able to find the ones regulating common target genes.
# Creating the list
<- Interaction_tab_SS %>%
TGs_per_TFs_list filter(Name.TF %in% TF_names) %>%
select(Name.TF, Name.Target) %>%
group_by(Name.TF) %>%
group_split() %>%
lapply(., pull, var = Name.Target)
# Naming the elements
names(TGs_per_TFs_list) <- TF_names
# Checking the content
head(TGs_per_TFs_list, n = 3)
## $AATF
## [1] "MYC" "TP53" "BAX" "CDKN1A" "KLKB1" "KLK3"
##
## $ABL1
## [1] "JUN" "TP53" "BAX" "CSF1" "CDKN1A" "BCL2" "BCL6" "PIM1"
## [9] "FOXO3" "CCND2"
##
## $AHR
## [1] "AHRR" "MYC" "ABCG2" "MFSD2A" "MT2A" "IL1B" "PCNA" "FOS"
## [9] "RFC3" "CYP1A2" "CA9" "UGT1A6" "UGT1A1" "CCNG2" "IL13" "CYP2B6"
## [17] "CYP1B1" "CYP1A1" "IL6" "ARNT" "CCND1" "BRCA1"
Let’s check which transcription factor pairs have the most common target genes.
# TF pairs in every combinations
<- combn(names(TGs_per_TFs_list), 2, simplify = FALSE)
TF_combination_list
# Nr. of common genes of all TF pairs
<- list()
Intersect_lengths for(i in 1:length(TF_combination_list)){
<-
Intersect_lengths[i] intersect(TGs_per_TFs_list[[TF_combination_list[[i]][1]]],
2]]]) %>%
TGs_per_TFs_list[[TF_combination_list[[i]][length()
}rm(i)
# Converting list to vector
<- Intersect_lengths %>%
Intersect_lengths unlist()
# Adding names of TF pairs (separated by "_")
names(Intersect_lengths) <- sapply(TF_combination_list, paste, collapse = "_")
# Checking the maximum number of common genes
%>%
Intersect_lengths sort() %>%
tail()
## NR1H2_NR1H3 FOSL2_JUND DNMT1_DNMT3B SREBF1_SREBF2 TWIST1_TWIST2
## 16 17 19 19 25
## STAT5A_STAT5B
## 30
Let’s create a vector
containing the names of the transcription factor pair having the most common target genes.
<- which.max(Intersect_lengths) %>%
TFs names() %>%
str_split(pattern = "_") %>%
unlist()
# Let's see
TFs
## [1] "STAT5A" "STAT5B"
STAT5A and STAT5B (signal transducer and activator of transcription 5 A and B) are two highly similar transcription factors. The STAT5 proteins are involved in cytosolic signalling and they mediate the expression of specific genes (Nosaka et al. 1999).
igraph
R
packageNow we need more steps to create an adjacency matrix input for the igraph
network plot.
Adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
# Creating data.frame
# while excluding target genes with missing names ("-")
<- Interaction_tab_SS %>%
Wide_selected_Interaction_tab_SS filter(Name.TF %in% TFs & Name.Target != "-") %>%
select(Name.TF, Name.Target) %>%
mutate(Value = 1) %>%
pivot_wider(names_from = Name.Target, values_from = Value) %>%
replace(is.na(.), 0) %>%
data.frame()
# Adding row names
rownames(Wide_selected_Interaction_tab_SS) <- Wide_selected_Interaction_tab_SS$Name.TF
$Name.TF <- NULL
Wide_selected_Interaction_tab_SS
# Converting to matrix
<- Wide_selected_Interaction_tab_SS %>%
Wide_selected_Interaction_tab_SS as.matrix()
# We need a function for creating the (symmetrical) adjacency matrix
<- function(A){
expand.matrix <- nrow(A)
m <- ncol(A)
n <- matrix(0,nrow = m, ncol = m)
B <- matrix(0,nrow = n, ncol = n)
C cbind(rbind(B,t(A)),rbind(A,C))
}
# Applying the expand.matrix function
<- Wide_selected_Interaction_tab_SS %>%
Wide_selected_Interaction_tab_SS expand.matrix()
# Adding missing row names
rownames(Wide_selected_Interaction_tab_SS)[1:2] <- colnames(
1:2]
Wide_selected_Interaction_tab_SS)[
# Converting to symmetrical adjacency matrix to asymmetrical to be able to
# create a directed graph (by deleting values below diagonal)
lower.tri(Wide_selected_Interaction_tab_SS)] <- 0L Wide_selected_Interaction_tab_SS[
Let’s see how does the resulted adjacency matrix looks like by checking its upper left part.
1:10, 1:10] Wide_selected_Interaction_tab_SS[
## STAT5A STAT5B PIM1 PTTG1IP MET BCL2L1 IFNG FCGR1A CCND2 CEL
## STAT5A 0 0 1 1 1 1 1 1 1 1
## STAT5B 0 0 1 1 1 1 1 1 0 1
## PIM1 0 0 0 0 0 0 0 0 0 0
## PTTG1IP 0 0 0 0 0 0 0 0 0 0
## MET 0 0 0 0 0 0 0 0 0 0
## BCL2L1 0 0 0 0 0 0 0 0 0 0
## IFNG 0 0 0 0 0 0 0 0 0 0
## FCGR1A 0 0 0 0 0 0 0 0 0 0
## CCND2 0 0 0 0 0 0 0 0 0 0
## CEL 0 0 0 0 0 0 0 0 0 0
Let’s create the igraph
plot.
library(igraph)
# Creating the igraph variable
<- graph_from_adjacency_matrix(Wide_selected_Interaction_tab_SS,
Network mode = "directed", diag = FALSE)
# Creating a vector containing three colours for vertices
<- case_when(colSums(Wide_selected_Interaction_tab_SS[1:2,]) == 0 ~ "#488990",
Vcol colSums(Wide_selected_Interaction_tab_SS[1:2,]) == 1 ~ "#ffbe6f",
colSums(Wide_selected_Interaction_tab_SS[1:2,]) == 2 ~ "#f66151")
# Reducing the margins of the plot
par(mar = rep(0.1, 4))
# Setting a random seed to reproduce the very same graph over and over again
set.seed(1)
# Creating the plot itself
plot.igraph(Network, layout = layout_with_dh,
vertex.color = Vcol, vertex.label.color = "black",
vertex.label.cex = 0.4, edge.arrow.size = 0.4)
The two transcription factors are shown with green, their common target genes with red and their distinct target genes with yellow vertices.
To avoid such overlapping among vertices we can adjust the network by hand using the tkplot
function of the igraph
package.
set.seed(1)
tkplot(Network, layout = layout_with_dh,
vertex.color = Vcol, vertex.label.color = "black",
vertex.label.cex = 0.4)
After adjusting the common vertices by hand to avoid overlapping, we can produce a similar image.
Let’s visualise the common and distinct target genes of the two transcription factors with the Cytoscape network visualization software by following these steps:
File
→ Import
→ Network from file...
and select the downloaded file(s) (one file at once).Source Node Attribute
Source
Target Node Attribute
Target Node Attribute
Target
Edge Attribute
.OK
button.Network
panel click on the network what is needed to be filtered.Filter
panel, click on the +
sign, select the Edge: Small-scale evidence
option, and write “Yes” into the box.File
→ Net Network
→ From Selected Nodes, Selected Edges
.Tools
→ Merge
→ Network...
, and in the pop-up window put the two filtered networks (targets_of_P42229.tsv(1) and targets_of_P51692.tsv(1)) into the right side of the panel. Then click on the Merge
button.Style
panel on the left side and set different colours, shapes, size parameters for the transcription factors and the target genes.Network
panel and go to the Layout
menu → Grid layout
→ Selected Nodes only
. Then repeat this with the nodes of the distinct target genes. You can drag the nodes of the two transcription factors manually.After all these steps we will end up a transcription factor - target gene interaction network like this:
In this use case the following software and package versions were applied:
R
version 4.1.3tidyverse
version 1.3.1data.table
version 1.14.2igraph
version 1.2.11Cytoscape
version 3.8.2