Compute fuzzy clusters of gene sets, aiming to identify grouped categories that better represent the distinct biological themes in the enrichment results.

gs_fuzzyclustering(
  res_enrich,
  gtl = NULL,
  n_gs = nrow(res_enrich),
  gs_ids = NULL,
  similarity_matrix = NULL,
  similarity_threshold = 0.35,
  fuzzy_seeding_initial_neighbors = 3,
  fuzzy_multilinkage_rule = 0.5
)

Arguments

res_enrich

A data.frame object, storing the result of the functional enrichment analysis. See more in the main function, GeneTonic(), to check the formatting requirements (a minimal set of columns should be present).

gtl

A GeneTonic-list object, containing in its slots the main input objects: dds, res_de, res_enrich, and annotation_obj. The names of the list elements must match the content they are expected to hold.

n_gs

Integer value, corresponding to the maximal number of gene sets to be included. Defaults to nrow(res_enrich), i.e. all gene sets in res_enrich.

gs_ids

Character vector, containing a subset of the gs_id values available in res_enrich. Lists the gene sets to be included.

similarity_matrix

A similarity matrix between gene sets. It can be computed, for example, with create_kappa_matrix(), create_jaccard_matrix(), or a similar function that returns a symmetric matrix with numeric values (max = 1). If not provided, it is computed on the fly with create_kappa_matrix().
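To make the expected shape of such a matrix concrete, here is a minimal base R sketch that builds a symmetric Jaccard similarity matrix from comma-separated gene strings. The toy table and the helper `jaccard_index()` are hypothetical and only illustrate the idea; this is not the implementation used by create_jaccard_matrix() or create_kappa_matrix().

```r
# Toy enrichment table; gs_id and gs_genes mimic the res_enrich columns,
# the gene sets themselves are made up for illustration.
toy_enrich <- data.frame(
  gs_id = c("GS_A", "GS_B", "GS_C"),
  gs_genes = c("STAT1,IRF1,GBP1,GBP2", "STAT1,IRF1,GBP1,JAK2", "MCM2,MCM4,CDC6"),
  stringsAsFactors = FALSE
)

# Split the comma-separated gene strings into one character vector per gene set
gene_lists <- strsplit(toy_enrich$gs_genes, ",")
names(gene_lists) <- toy_enrich$gs_id

# Hypothetical helper: Jaccard index = |intersection| / |union|
jaccard_index <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Fill a symmetric matrix with all pairwise similarities (diagonal = 1)
toy_similarity <- outer(
  seq_along(gene_lists), seq_along(gene_lists),
  Vectorize(function(i, j) jaccard_index(gene_lists[[i]], gene_lists[[j]]))
)
dimnames(toy_similarity) <- list(names(gene_lists), names(gene_lists))
toy_similarity
```

A matrix of this form, with matching row and column names, is the kind of object the similarity_matrix argument describes; whether the names must equal the gs_id values is an assumption here.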

similarity_threshold

A numeric value for the similarity matrix, used to determine the initial seeds as in the implementation of DAVID. Higher values leave more genesets initially unclustered, giving a functional classification result with fewer groups and fewer geneset members. Defaults to 0.35; going below 0.3 is not recommended (see the DAVID help pages).

fuzzy_seeding_initial_neighbors

Integer value, corresponding to the minimum number of genesets in a seeding group. Lower values lead to the inclusion of more genesets in the functional groups, and may generate many small groups. Defaults to 3.
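The interplay between similarity_threshold and fuzzy_seeding_initial_neighbors can be sketched in base R as follows. The toy matrix is made up, and counting a geneset as part of its own neighborhood is an assumption; the actual function may apply further checks before accepting a seed.

```r
# Toy symmetric similarity matrix for five hypothetical gene sets
sim <- matrix(0.1, nrow = 5, ncol = 5,
              dimnames = list(paste0("GS_", 1:5), paste0("GS_", 1:5)))
sim[1:3, 1:3] <- 0.6            # GS_1..GS_3 are tightly related
sim[4, 5] <- sim[5, 4] <- 0.4   # GS_4 and GS_5 are related only to each other
diag(sim) <- 1

similarity_threshold <- 0.35
fuzzy_seeding_initial_neighbors <- 3

# For each gene set, collect every set at or above the threshold
# (assumption: the set itself is counted in its own neighborhood)
neighborhoods <- lapply(rownames(sim), function(gs) {
  names(which(sim[gs, ] >= similarity_threshold))
})
names(neighborhoods) <- rownames(sim)

# Keep only neighborhoods large enough to act as a seeding group
seeds <- Filter(
  function(nb) length(nb) >= fuzzy_seeding_initial_neighbors,
  neighborhoods
)
names(seeds)
```

In this toy case, lowering fuzzy_seeding_initial_neighbors to 2 would also admit the GS_4/GS_5 pair as a seeding group, which illustrates why lower values tend to produce more, smaller groups.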

fuzzy_multilinkage_rule

Numeric value between 0 and 1. This parameter determines how the seeding groups merge with each other, by specifying the fraction of shared genesets required to merge two groups into one. Higher values give sharper separation between the groups of genesets. Defaults to 0.5 (50%).
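As a rough illustration of the merging rule, the sketch below checks whether two seeding groups share enough genesets to be merged. The helper `should_merge()` is hypothetical, and measuring the shared fraction against the union of both groups is an assumption; the function itself may use a different denominator.

```r
# Hypothetical helper: decide whether two seeding groups should be merged
should_merge <- function(group_a, group_b, fuzzy_multilinkage_rule = 0.5) {
  shared <- length(intersect(group_a, group_b))
  # Assumption: the shared fraction is measured against the union of both groups
  shared / length(union(group_a, group_b)) >= fuzzy_multilinkage_rule
}

seed_1 <- c("GS_1", "GS_2", "GS_3", "GS_4")
seed_2 <- c("GS_2", "GS_3", "GS_4", "GS_5")
seed_3 <- c("GS_4", "GS_6", "GS_7")

# 3 shared genesets out of 5 distinct ones (0.6)
merge_12 <- should_merge(seed_1, seed_2)
# 1 shared geneset out of 6 distinct ones (about 0.17)
merge_13 <- should_merge(seed_1, seed_3)
# Same pair as merge_12, but with a stricter rule
merge_12_strict <- should_merge(seed_1, seed_2, fuzzy_multilinkage_rule = 0.8)
```

Under these assumptions, the first pair merges at the default of 0.5 but stays separate at 0.8, matching the description that higher values keep groups more sharply separated.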

Value

A data frame, shaped like the originally provided res_enrich object, with two extra columns: gs_fuzzycluster, the identifier of the fuzzy cluster of genesets, and gs_cluster_status, which specifies whether the geneset is the "Representative" for that cluster or a simple "Member". Notably, the returned object can have more rows than the original res_enrich.
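The extra rows presumably arise because fuzzy clustering lets a geneset belong to more than one cluster, so it is listed once per cluster. A minimal sketch on a made-up table (only the column names are taken from the description above) shows how such multi-cluster genesets could be spotted:

```r
# Toy result with the two extra columns described above; all values are made up
toy_result <- data.frame(
  gs_id = c("GS_1", "GS_2", "GS_3", "GS_2", "GS_4"),
  gs_fuzzycluster = c(1, 1, 1, 2, 2),
  gs_cluster_status = c("Representative", "Member", "Member",
                        "Member", "Representative"),
  stringsAsFactors = FALSE
)

# Number of clusters each gene set is listed under
memberships <- table(toy_result$gs_id)

# Gene sets listed under more than one cluster
multi_cluster_ids <- names(memberships[memberships > 1])

# Rows in the clustered table versus distinct gene sets
n_rows <- nrow(toy_result)
n_unique <- length(unique(toy_result$gs_id))
```

The same comparison of nrow() against the number of unique gs_id values can be applied to a real result to see how many genesets were assigned to several clusters.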

References

See https://david.ncifcrf.gov/helps/functional_classification.html#clustering for details on the original implementation

Examples

data(res_enrich_macrophage, package = "GeneTonic")
res_enrich <- shake_topGOtableResult(topgoDE_macrophage_IFNg_vs_naive)
#> Found 500 gene sets in `topGOtableResult` object.
#> Converting for usage in GeneTonic...
# taking a smaller subset
res_enrich_subset <- res_enrich[1:100, ]

fuzzy_subset <- gs_fuzzyclustering(
  res_enrich = res_enrich_subset,
  n_gs = nrow(res_enrich_subset),
  gs_ids = NULL,
  similarity_matrix = NULL,
  similarity_threshold = 0.35,
  fuzzy_seeding_initial_neighbors = 3,
  fuzzy_multilinkage_rule = 0.5
)

# show all geneset members of the first cluster
fuzzy_subset[fuzzy_subset$gs_fuzzycluster == "1", ]
#>                 gs_id                              gs_description gs_pvalue
#> GO:0060333 GO:0060333 interferon-gamma-mediated signaling pathway   1.2e-20
#> GO:0060337 GO:0060337         type I interferon signaling pathway   4.8e-13
#> GO:0034341 GO:0034341                response to interferon-gamma   2.2e-08
#> GO:0045087 GO:0045087                      innate immune response   1.1e-07
#> GO:0071346 GO:0071346       cellular response to interferon-gamma   8.5e-06
#> GO:0019221 GO:0019221         cytokine-mediated signaling pathway   7.4e-04
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         gs_genes
#> GO:0060333                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             B2M,CAMK2D,CIITA,GBP1,GBP2,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IRF1,IRF7,JAK2,MT2A,NLRC5,NMI,OAS2,PARP14,PML,SOCS1,STAT1,TRIM22,TRIM31,VCAM1
#> GO:0060337                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    GBP2,HLA-A,HLA-B,HLA-C,HLA-E,HLA-F,HLA-G,IFI27,IFI35,IFIT2,IFIT3,IFITM1,IFITM2,IFITM3,IRF1,IRF7,ISG20,NLRC5,OAS2,PSMB8,RSAD2,STAT1,STAT2,XAF1,ZBP1
#> GO:0034341                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 ACOD1,B2M,CALCOCO2,CAMK2D,CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CIITA,EDN1,GBP1,GBP2,GBP3,GBP4,GBP5,GBP6,GBP7,GCH1,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IFITM1,IFITM2,IFITM3,IL12RB1,IRF1,IRF7,JAK2,MEFV,MT2A,NLRC5,NMI,NUB1,OAS2,PARP14,PML,RAB20,SLC11A1,SOCS1,STAT1,TRIM22,TRIM31,UBD,VCAM1
#> GO:0045087 ACOD1,ADAM8,AIM2,APOBEC3A,APOBEC3D,APOBEC3G,APOL1,APPL2,B2M,C1QB,C1R,C1RL,C1S,C2,C3,C4A,C4B,CALCOCO2,CAMK2D,CASP4,CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CD300LF,CEACAM1,CFB,CFH,CIITA,CLEC10A,CLEC6A,COLEC12,CTSS,CX3CR1,CYLD,DDX60,DTX3L,EDN1,FCN1,GBP1,GBP2,GBP3,GBP4,GBP5,GBP6,GBP7,GCH1,GRAMD4,GSDMD,H2BC21,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,ICAM2,IFI27,IFI35,IFIH1,IFIT2,IFIT3,IFIT5,IFITM1,IFITM2,IFITM3,IL12A,IL12RB1,IL27,IRF1,IRF7,ISG20,JAK2,JAK3,KLRK1,LAG3,LILRA2,LYN,MCOLN2,MEFV,MSRB1,MT2A,MUC1,NCF1,NLRC5,NMI,NOD2,NUB1,OAS2,OPTN,PARP14,PML,PSMB10,PSMB8,PSMB9,PSME1,PSME2,PYHIN1,RAB20,RELB,RIPK2,RSAD2,SERINC5,SERPING1,SLAMF1,SLAMF6,SLAMF7,SLAMF8,SLC11A1,SOCS1,STAT1,STAT2,TICAM2,TIFA,TLR10,TLR5,TLR7,TLR8,TRAFD1,TRIM22,TRIM31,UBD,VCAM1,XAF1,ZBP1
#> GO:0071346                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          ACOD1,B2M,CAMK2D,CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CIITA,EDN1,GBP1,GBP2,GBP3,GBP4,GBP5,GBP6,GBP7,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IL12RB1,IRF1,IRF7,JAK2,MT2A,NLRC5,NMI,OAS2,PARP14,PML,RAB20,SOCS1,STAT1,TRIM22,TRIM31,VCAM1
#> GO:0019221                                                                                                                                        ACSL1,AIM2,APPL2,B2M,BCL6,BIRC5,CAMK2D,CARD16,CASP4,CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CCRL2,CD300LF,CD40,CD74,CD80,CEACAM1,CIITA,CISH,CSF2RB,CX3CR1,CXCL10,CXCL11,CXCL5,CXCL9,CXCR3,CYLD,EDA,EDN1,FLT3LG,GBP1,GBP2,HGF,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IFI27,IFI35,IFIT2,IFIT3,IFITM1,IFITM2,IFITM3,IL12A,IL12RB1,IL12RB2,IL15,IL15RA,IL18BP,IL1RN,IL22RA2,IL27,IL31RA,IL32,IL3RA,IRF1,IRF7,ISG20,JAK2,JAK3,KARS1,MMP9,MT2A,MUC1,NLRC5,NMI,NOD2,OAS2,OSM,PADI2,PARP14,PIM1,PML,PSMB10,PSMB8,PSMB9,PSME1,PSME2,RIPK2,RSAD2,S1PR1,SOCS1,STAP1,STAT1,STAT2,TICAM2,TNFRSF11A,TNFSF13B,TNFSF18,TNFSF8,TRIM22,TRIM31,VCAM1,XAF1,ZBP1
#>            gs_de_count gs_bg_count Expected gs_fuzzycluster gs_cluster_status
#> GO:0060333          34          82     4.55               1    Representative
#> GO:0060337          25          80     4.44               1            Member
#> GO:0034341          58         174     9.66               1            Member
#> GO:0045087         136         710    39.42               1            Member
#> GO:0071346          49         158     8.77               1            Member
#> GO:0019221         111         656    36.42               1            Member

# list only the representative geneset of each cluster (first 10 shown)
head(fuzzy_subset[fuzzy_subset$gs_cluster_status == "Representative", ], 10)
#>                 gs_id
#> GO:0060333 GO:0060333
#> GO:0019885 GO:0019885
#> GO:0006958 GO:0006958
#> GO:0006270 GO:0006270
#> GO:0070098 GO:0070098
#> GO:0006954 GO:0006954
#> GO:0035455 GO:0035455
#> GO:0032727 GO:0032727
#> GO:0045124 GO:0045124
#> GO:0002250 GO:0002250
#>                                                                               gs_description
#> GO:0060333                                       interferon-gamma-mediated signaling pathway
#> GO:0019885 antigen processing and presentation of endogenous peptide antigen via MHC class I
#> GO:0006958                                          complement activation, classical pathway
#> GO:0006270                                                        DNA replication initiation
#> GO:0070098                                              chemokine-mediated signaling pathway
#> GO:0006954                                                             inflammatory response
#> GO:0035455                                                      response to interferon-alpha
#> GO:0032727                                positive regulation of interferon-alpha production
#> GO:0045124                                                     regulation of bone resorption
#> GO:0002250                                                          adaptive immune response
#>            gs_pvalue
#> GO:0060333  1.20e-20
#> GO:0019885  4.20e-12
#> GO:0006958  2.50e-07
#> GO:0006270  1.90e-08
#> GO:0070098  1.20e-06
#> GO:0006954  2.90e-08
#> GO:0035455  6.20e-05
#> GO:0032727  5.50e-04
#> GO:0045124  1.52e-03
#> GO:0002250  9.20e-23
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   gs_genes
#> GO:0060333                                                                                                                                                                                                                                                                                                                                                       B2M,CAMK2D,CIITA,GBP1,GBP2,HLA-A,HLA-B,HLA-C,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IRF1,IRF7,JAK2,MT2A,NLRC5,NMI,OAS2,PARP14,PML,SOCS1,STAT1,TRIM22,TRIM31,VCAM1
#> GO:0019885                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   B2M,ERAP2,HLA-A,HLA-B,HLA-C,HLA-E,HLA-F,HLA-G,TAP1,TAP2,TAPBP
#> GO:0006958                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        C1QB,C1R,C1RL,C1S,C2,C3,C4A,C4B,SERPING1
#> GO:0006270                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           CCNE2,CDC45,CDC6,CDT1,MCM10,MCM2,MCM4,MCM5,MCM6,MCM7,ORC1,POLE2,TICRR
#> GO:0070098                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CCRL2,CX3CR1,CXCL10,CXCL11,CXCL5,CXCL9,CXCR3,EDN1,PADI2
#> GO:0006954                                                                       ACOD1,ADAM8,ADGRE5,ADORA3,AGT,AIM2,AOC3,APOL2,APOL3,APPL2,BCL6,C1QB,C1R,C1S,C2,C3,C4A,C4B,CASP4,CCL13,CCL15,CCL18,CCL5,CCL7,CCL8,CCRL2,CD28,CD40,CDH5,CFB,CFH,CIITA,CR1L,CRHBP,CXCL10,CXCL11,CXCL5,CXCL9,CXCR3,CYLD,CYSLTR1,GBP5,GGT5,GHRL,GSDMD,HGF,HLA-E,ICAM1,IDO1,IL15,IL1RN,IL22RA2,IL27,JAK2,KARS1,KLF4,LRRK2,LY75,LYN,MEFV,MMP25,MMP9,NLRP12,NMI,NOD2,OSM,P2RX7,POLB,PTGER4,RELB,RIPK2,SBNO2,SERPING1,SLAMF8,SLC11A1,STAP1,SUCNR1,TGM2,TICAM2,TLR10,TLR5,TLR7,TLR8,TNFAIP6,TNFRSF11A,VCAM1,ZC3H12A
#> GO:0035455                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   IFIT2,IFIT3,IFITM1,IFITM2,IFITM3,LAMP3,PYHIN1
#> GO:0032727                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                IFIH1,IRF7,RIPK2,STAT1,TLR7,TLR8
#> GO:0045124                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   ADAM8,CD38,P2RX7,PDK4,S1PR1,TMEM119,TNFRSF11A
#> GO:0002250 B2M,BCL6,BTN3A1,BTN3A2,BTN3A3,C1QB,C1R,C1RL,C1S,C2,C3,C4A,C4B,CD1A,CD1C,CD274,CD28,CD40,CD7,CD74,CD80,CEACAM1,CLEC10A,CLEC6A,CTLA4,CTSS,ERAP2,EXO1,FGL1,FGL2,GCNT3,GPR183,HLA-A,HLA-B,HLA-C,HLA-DMA,HLA-DMB,HLA-DOA,HLA-DOB,HLA-DPA1,HLA-DPB1,HLA-DQA1,HLA-DQB1,HLA-DQB2,HLA-DRA,HLA-DRB1,HLA-DRB5,HLA-E,HLA-F,HLA-G,ICAM1,IL12A,IL12RB1,IL18BP,IL27,IRF1,IRF7,ITK,JAK2,JAK3,KLRK1,LAG3,LAMP3,LILRA1,LILRB3,LYN,MCOLN2,P2RX7,PDCD1,PDCD1LG2,RELB,RIPK2,RNF19B,RSAD2,SERPING1,SIT1,SLAMF1,SLAMF6,SLAMF7,SLC11A1,TAP1,TAP2,TBX21,TLR8,TNFRSF11A,TNFRSF21,TNFSF13B,TNFSF18,ZC3H12A
#>            gs_de_count gs_bg_count Expected gs_fuzzycluster gs_cluster_status
#> GO:0060333          34          82     4.55               1    Representative
#> GO:0019885          11          14     0.78               2    Representative
#> GO:0006958           9          19     1.05               3    Representative
#> GO:0006270          13          35     1.94               4    Representative
#> GO:0070098          15          63     3.50               5    Representative
#> GO:0006954          87         589    32.70               6    Representative
#> GO:0035455           7          20     1.11               7    Representative
#> GO:0032727           6          20     1.11               8    Representative
#> GO:0045124           7          32     1.78               9    Representative
#> GO:0002250          89         323    17.93              10    Representative