Compute fuzzy clusters of gene sets, to identify grouped categories that better represent the distinct biological themes in the enrichment results.
gs_fuzzyclustering(
  res_enrich,
  gtl = NULL,
  n_gs = nrow(res_enrich),
  gs_ids = NULL,
  similarity_matrix = NULL,
  similarity_threshold = 0.35,
  fuzzy_seeding_initial_neighbors = 3,
  fuzzy_multilinkage_rule = 0.5
)
res_enrich: A data.frame object storing the result of the functional enrichment analysis. See the main function, GeneTonic(), for the formatting requirements (a minimal set of columns must be present).
gtl: A GeneTonic-list object, containing in its slots the elements dds, res_de, res_enrich, and annotation_obj. The list elements must be named according to the content they hold.
n_gs: Integer value, the maximal number of gene sets to be displayed.
gs_ids: Character vector, containing a subset of gs_id as they are available in res_enrich. Lists the gene sets to be displayed.
similarity_matrix: A similarity matrix between gene sets. It can be computed with e.g. create_kappa_matrix(), create_jaccard_matrix(), or a similar function returning a symmetric matrix with numeric values (max = 1). If not provided, it is computed on the fly with create_kappa_matrix().
similarity_threshold: Numeric value, the cutoff applied to the similarity matrix to determine the initial seeds, as in the implementation of DAVID. Higher values leave more gene sets initially unclustered, yielding a functional classification result with fewer groups and fewer gene set members. Defaults to 0.35; going below 0.3 is not recommended (see the DAVID help pages).
fuzzy_seeding_initial_neighbors: Integer value, the minimum number of gene sets in a seeding group. Lower values include more gene sets in the functional groups and may generate many small groups. Defaults to 3.
fuzzy_multilinkage_rule: Numeric value between 0 and 1. Determines how the seeding groups merge with each other, by specifying the fraction of shared gene sets required to merge two groups into one. Higher values give a sharper separation between the groups of gene sets. Defaults to 0.5 (50%).
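The interplay of the three tuning parameters can be illustrated with a minimal base R sketch. This is not the code used inside gs_fuzzyclustering(); it only mimics the seed-then-merge idea under stated assumptions: the similarity matrix is a made-up toy, a gene set counts as a member of its own seeding group, and "shared gene sets" is measured as intersection over union of two groups. The helper merge_seeds() is hypothetical.

```r
# Toy symmetric similarity matrix for eight hypothetical gene sets (A..H)
ids <- LETTERS[1:8]
sim <- diag(1, length(ids))
dimnames(sim) <- list(ids, ids)
similar_pairs <- rbind(
  c("A", "B"), c("A", "C"), c("B", "C"), c("A", "D"),
  c("D", "E"), c("E", "F"), c("E", "G"), c("F", "G")
)
for (k in seq_len(nrow(similar_pairs))) {
  p <- similar_pairs[k, ]
  sim[p[1], p[2]] <- 0.6
  sim[p[2], p[1]] <- 0.6
}

similarity_threshold <- 0.35
fuzzy_seeding_initial_neighbors <- 3
fuzzy_multilinkage_rule <- 0.5

# Seeding: for each gene set, collect all gene sets at or above the threshold
# (assumption: the gene set itself is counted), then keep groups that are
# large enough and drop exact duplicates
seeds <- lapply(ids, function(gs) {
  sort(names(which(sim[gs, ] >= similarity_threshold)))
})
seeds <- Filter(function(s) length(s) >= fuzzy_seeding_initial_neighbors, seeds)
seeds <- unique(seeds)

# Merging (hypothetical helper): repeatedly fuse the first pair of groups whose
# shared fraction (intersection over union, an assumption) reaches the rule
merge_seeds <- function(groups, rule) {
  repeat {
    merged <- FALSE
    n <- length(groups)
    if (n < 2) break
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        shared <- length(intersect(groups[[i]], groups[[j]]))
        total <- length(union(groups[[i]], groups[[j]]))
        if (shared / total >= rule) {
          groups[[i]] <- sort(union(groups[[i]], groups[[j]]))
          groups[[j]] <- NULL
          merged <- TRUE
          break
        }
      }
      if (merged) break
    }
    if (!merged) break
  }
  groups
}

clusters <- merge_seeds(seeds, fuzzy_multilinkage_rule)

# Long-format membership table: one row per (cluster, gene set) pair
membership <- data.frame(
  gs_id = unlist(clusters),
  gs_fuzzycluster = rep(seq_along(clusters), lengths(clusters))
)
print(membership)
```

In this toy case, gene set D ends up in several clusters and H stays unclustered, which is the kind of overlapping assignment that distinguishes a fuzzy clustering from a hard partition.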
A data frame, shaped like the originally provided res_enrich object, with two extra columns: gs_fuzzycluster, the identifier of the fuzzy cluster of gene sets, and gs_cluster_status, which specifies whether the gene set is the "Representative" of that cluster or a simple "Member".
Notably, the returned object can have more rows than the original res_enrich, since a gene set can be assigned to more than one fuzzy cluster.
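To see why the row count can grow, it may help to check how often each gene set identifier occurs in the result. The sketch below uses a hand-made miniature data frame (fuzzy_toy, with hypothetical identifiers) that only imitates the gs_id, gs_fuzzycluster, and gs_cluster_status columns described above; it is not real output of the function.

```r
# Hypothetical miniature of a fuzzy clustering result
fuzzy_toy <- data.frame(
  gs_id = c("GO:A", "GO:B", "GO:C", "GO:B", "GO:D"),
  gs_fuzzycluster = c(1, 1, 1, 2, 2),
  gs_cluster_status = c("Representative", "Member", "Member",
                        "Representative", "Member"),
  stringsAsFactors = FALSE
)

# Number of clusters each gene set belongs to
memberships <- table(fuzzy_toy$gs_id)

# Gene sets that appear in more than one cluster
multi_cluster_ids <- names(memberships[memberships > 1])

# Rows in the result versus distinct gene sets
n_rows <- nrow(fuzzy_toy)
n_unique <- length(unique(fuzzy_toy$gs_id))
```

The same three expressions can be applied to an actual result object to find out which gene sets are shared between clusters.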
For details on the original implementation in DAVID, see https://david.ncifcrf.gov/helps/functional_classification.html#clustering
library("GeneTonic")
# loads the topGO results object topgoDE_macrophage_IFNg_vs_naive
data(res_enrich_macrophage, package = "GeneTonic")
res_enrich <- shake_topGOtableResult(topgoDE_macrophage_IFNg_vs_naive)
#> Found 500 gene sets in `topGOtableResult` object.
#> Converting for usage in GeneTonic...
# taking a smaller subset
res_enrich_subset <- res_enrich[1:100, ]
fuzzy_subset <- gs_fuzzyclustering(
  res_enrich = res_enrich_subset,
  n_gs = nrow(res_enrich_subset),
  gs_ids = NULL,
  similarity_matrix = NULL,
  similarity_threshold = 0.35,
  fuzzy_seeding_initial_neighbors = 3,
  fuzzy_multilinkage_rule = 0.5
)
# show all gene sets that are members of the first cluster
fuzzy_subset[fuzzy_subset$gs_fuzzycluster == "1", ]
#> gs_id gs_description gs_pvalue
#> GO:0060333 GO:0060333 interferon-gamma-mediated signaling pathway 1.2e-20
#> GO:0060337 GO:0060337 type I interferon signaling pathway 4.8e-13
#> GO:0034341 GO:0034341 response to interferon-gamma 2.2e-08
#> GO:0045087 GO:0045087 innate immune response 1.1e-07
#> GO:0071346 GO:0071346 cellular response to interferon-gamma 8.5e-06
#> GO:0019221 GO:0019221 cytokine-mediated signaling pathway 7.4e-04
#> gs_genes
#> GO:0060333 B2M,CAMK2D,CIITA,GBP1,GBP2,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IRF1,IRF7,JAK2,MT2A,NLRC5,NMI,OAS2,PARP14,PML,SOCS1,STAT1,TRIM22,TRIM31,VCAM1
#> GO:0060337 GBP2,HLA-A,HLA-B,HLA-C,HLA-E,HLA-F,HLA-G,IFI27,IFI35,IFIT2,IFIT3,IFITM1,IFITM2,IFITM3,IRF1,IRF7,ISG20,NLRC5,OAS2,PSMB8,RSAD2,STAT1,STAT2,XAF1,ZBP1
#> GO:0034341 ACOD1,B2M,CALCOCO2,CAMK2D,CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CIITA,EDN1,GBP1,GBP2,GBP3,GBP4,GBP5,GBP6,GBP7,GCH1,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IFITM1,IFITM2,IFITM3,IL12RB1,IRF1,IRF7,JAK2,MEFV,MT2A,NLRC5,NMI,NUB1,OAS2,PARP14,PML,RAB20,SLC11A1,SOCS1,STAT1,TRIM22,TRIM31,UBD,VCAM1
#> GO:0045087 ACOD1,ADAM8,AIM2,APOBEC3A,APOBEC3D,APOBEC3G,APOL1,APPL2,B2M,C1QB,C1R,C1RL,C1S,C2,C3,C4A,C4B,CALCOCO2,CAMK2D,CASP4,CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CD300LF,CEACAM1,CFB,CFH,CIITA,CLEC10A,CLEC6A,COLEC12,CTSS,CX3CR1,CYLD,DDX60,DTX3L,EDN1,FCN1,GBP1,GBP2,GBP3,GBP4,GBP5,GBP6,GBP7,GCH1,GRAMD4,GSDMD,H2BC21,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,ICAM2,IFI27,IFI35,IFIH1,IFIT2,IFIT3,IFIT5,IFITM1,IFITM2,IFITM3,IL12A,IL12RB1,IL27,IRF1,IRF7,ISG20,JAK2,JAK3,KLRK1,LAG3,LILRA2,LYN,MCOLN2,MEFV,MSRB1,MT2A,MUC1,NCF1,NLRC5,NMI,NOD2,NUB1,OAS2,OPTN,PARP14,PML,PSMB10,PSMB8,PSMB9,PSME1,PSME2,PYHIN1,RAB20,RELB,RIPK2,RSAD2,SERINC5,SERPING1,SLAMF1,SLAMF6,SLAMF7,SLAMF8,SLC11A1,SOCS1,STAT1,STAT2,TICAM2,TIFA,TLR10,TLR5,TLR7,TLR8,TRAFD1,TRIM22,TRIM31,UBD,VCAM1,XAF1,ZBP1
#> GO:0071346 ACOD1,B2M,CAMK2D,CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CIITA,EDN1,GBP1,GBP2,GBP3,GBP4,GBP5,GBP6,GBP7,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IL12RB1,IRF1,IRF7,JAK2,MT2A,NLRC5,NMI,OAS2,PARP14,PML,RAB20,SOCS1,STAT1,TRIM22,TRIM31,VCAM1
#> GO:0019221 ACSL1,AIM2,APPL2,B2M,BCL6,BIRC5,CAMK2D,CARD16,CASP4,CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CCRL2,CD300LF,CD40,CD74,CD80,CEACAM1,CIITA,CISH,CSF2RB,CX3CR1,CXCL10,CXCL11,CXCL5,CXCL9,CXCR3,CYLD,EDA,EDN1,FLT3LG,GBP1,GBP2,HGF,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IFI27,IFI35,IFIT2,IFIT3,IFITM1,IFITM2,IFITM3,IL12A,IL12RB1,IL12RB2,IL15,IL15RA,IL18BP,IL1RN,IL22RA2,IL27,IL31RA,IL32,IL3RA,IRF1,IRF7,ISG20,JAK2,JAK3,KARS1,MMP9,MT2A,MUC1,NLRC5,NMI,NOD2,OAS2,OSM,PADI2,PARP14,PIM1,PML,PSMB10,PSMB8,PSMB9,PSME1,PSME2,RIPK2,RSAD2,S1PR1,SOCS1,STAP1,STAT1,STAT2,TICAM2,TNFRSF11A,TNFSF13B,TNFSF18,TNFSF8,TRIM22,TRIM31,VCAM1,XAF1,ZBP1
#> gs_de_count gs_bg_count Expected gs_fuzzycluster gs_cluster_status
#> GO:0060333 34 82 4.55 1 Representative
#> GO:0060337 25 80 4.44 1 Member
#> GO:0034341 58 174 9.66 1 Member
#> GO:0045087 136 710 39.42 1 Member
#> GO:0071346 49 158 8.77 1 Member
#> GO:0019221 111 656 36.42 1 Member
# list only the representative gene set of each cluster (first 10 shown)
head(fuzzy_subset[fuzzy_subset$gs_cluster_status == "Representative", ], 10)
#> gs_id
#> GO:0060333 GO:0060333
#> GO:0019885 GO:0019885
#> GO:0006958 GO:0006958
#> GO:0006270 GO:0006270
#> GO:0070098 GO:0070098
#> GO:0006954 GO:0006954
#> GO:0035455 GO:0035455
#> GO:0032727 GO:0032727
#> GO:0045124 GO:0045124
#> GO:0002250 GO:0002250
#> gs_description
#> GO:0060333 interferon-gamma-mediated signaling pathway
#> GO:0019885 antigen processing and presentation of endogenous peptide antigen via MHC class I
#> GO:0006958 complement activation, classical pathway
#> GO:0006270 DNA replication initiation
#> GO:0070098 chemokine-mediated signaling pathway
#> GO:0006954 inflammatory response
#> GO:0035455 response to interferon-alpha
#> GO:0032727 positive regulation of interferon-alpha production
#> GO:0045124 regulation of bone resorption
#> GO:0002250 adaptive immune response
#> gs_pvalue
#> GO:0060333 1.20e-20
#> GO:0019885 4.20e-12
#> GO:0006958 2.50e-07
#> GO:0006270 1.90e-08
#> GO:0070098 1.20e-06
#> GO:0006954 2.90e-08
#> GO:0035455 6.20e-05
#> GO:0032727 5.50e-04
#> GO:0045124 1.52e-03
#> GO:0002250 9.20e-23
#> gs_genes
#> GO:0060333 B2M,CAMK2D,CIITA,GBP1,GBP2,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IRF1,IRF7,JAK2,MT2A,NLRC5,NMI,OAS2,PARP14,PML,SOCS1,STAT1,TRIM22,TRIM31,VCAM1
#> GO:0019885 B2M,ERAP2,HLA-A,HLA-B,HLA-C,HLA-E,HLA-F,HLA-G,TAP1,TAP2,TAPBP
#> GO:0006958 C1QB,C1R,C1RL,C1S,C2,C3,C4A,C4B,SERPING1
#> GO:0006270 CCNE2,CDC45,CDC6,CDT1,MCM10,MCM2,MCM4,MCM5,MCM6,MCM7,ORC1,POLE2,TICRR
#> GO:0070098 CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CCRL2,CX3CR1,CXCL10,CXCL11,CXCL5,CXCL9,CXCR3,EDN1,PADI2
#> GO:0006954 ACOD1,ADAM8,ADGRE5,ADORA3,AGT,AIM2,AOC3,APOL2,APOL3,APPL2,BCL6,C1QB,C1R,C1S,C2,C3,C4A,C4B,CASP4,CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CCRL2,CD28,CD40,CDH5,CFB,CFH,CIITA,CR1L,CRHBP,CXCL10,CXCL11,CXCL5,CXCL9,CXCR3,CYLD,CYSLTR1,GBP5,GGT5,GHRL,GSDMD,HGF,HLA-E,ICAM1,IDO1,IL15,IL1RN,IL22RA2,IL27,JAK2,KARS1,KLF4,LRRK2,LY75,LYN,MEFV,MMP25,MMP9,NLRP12,NMI,NOD2,OSM,P2RX7,POLB,PTGER4,RELB,RIPK2,SBNO2,SERPING1,SLAMF8,SLC11A1,STAP1,SUCNR1,TGM2,TICAM2,TLR10,TLR5,TLR7,TLR8,TNFAIP6,TNFRSF11A,VCAM1,ZC3H12A
#> GO:0035455 IFIT2,IFIT3,IFITM1,IFITM2,IFITM3,LAMP3,PYHIN1
#> GO:0032727 IFIH1,IRF7,RIPK2,STAT1,TLR7,TLR8
#> GO:0045124 ADAM8,CD38,P2RX7,PDK4,S1PR1,TMEM119,TNFRSF11A
#> GO:0002250 B2M,BCL6,BTN3A1,BTN3A2,BTN3A3,C1QB,C1R,C1RL,C1S,C2,C3,C4A,C4B,CD1A,CD1C,CD274,CD28,CD40,CD7,CD74,CD80,CEACAM1,CLEC10A,CLEC6A,CTLA4,CTSS,ERAP2,EXO1,FGL1,FGL2,GCNT3,GPR183,HLA-A,HLA-B,HLA-C,HLA-DMA,HLA-DMB,HLA-DOA,HLA-DOB,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IL12A,IL12RB1,IL18BP,IL27,IRF1,IRF7,ITK,JAK2,JAK3,KLRK1,LAG3,LAMP3,LILRA1,LILRB3,LYN,MCOLN2,P2RX7,PDCD1,PDCD1LG2,RELB,RIPK2,RNF19B,RSAD2,SERPING1,SIT1,SLAMF1,SLAMF6,SLAMF7,SLC11A1,TAP1,TAP2,TBX21,TLR8,TNFRSF11A,TNFRSF21,TNFSF13B,TNFSF18,ZC3H12A
#> gs_de_count gs_bg_count Expected gs_fuzzycluster gs_cluster_status
#> GO:0060333 34 82 4.55 1 Representative
#> GO:0019885 11 14 0.78 2 Representative
#> GO:0006958 9 19 1.05 3 Representative
#> GO:0006270 13 35 1.94 4 Representative
#> GO:0070098 15 63 3.50 5 Representative
#> GO:0006954 87 589 32.70 6 Representative
#> GO:0035455 7 20 1.11 7 Representative
#> GO:0032727 6 20 1.11 8 Representative
#> GO:0045124 7 32 1.78 9 Representative
#> GO:0002250 89 323 17.93 10 Representative
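For a compact overview, the representative of each cluster can be paired with the size of its cluster. The following base R sketch does this on a hand-made miniature data frame (fuzzy_toy, hypothetical identifiers) that imitates the relevant result columns; the column n_genesets is an illustrative name, not something the function returns.

```r
# Hypothetical miniature of a fuzzy clustering result
fuzzy_toy <- data.frame(
  gs_id = c("GO:A", "GO:B", "GO:C", "GO:B", "GO:D"),
  gs_fuzzycluster = c(1, 1, 1, 2, 2),
  gs_cluster_status = c("Representative", "Member", "Member",
                        "Representative", "Member"),
  stringsAsFactors = FALSE
)

# Number of gene sets (representative included) in each cluster
cluster_sizes <- table(fuzzy_toy$gs_fuzzycluster)

# One row per cluster: its identifier and its representative gene set
reps <- fuzzy_toy[fuzzy_toy$gs_cluster_status == "Representative",
                  c("gs_fuzzycluster", "gs_id")]

# Attach the cluster size, looked up by cluster identifier
reps$n_genesets <- as.integer(cluster_sizes[as.character(reps$gs_fuzzycluster)])
print(reps)
```

Replacing fuzzy_toy with an actual result such as fuzzy_subset should give one summary row per fuzzy cluster.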