Read in gene-sets from a GMT file

readGmt(..., uniqGenes = TRUE, namespace = NULL)



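...: Named or unnamed character vector giving the file names of one or more GMT-format files. When the files are passed as named arguments, the names serve as namespaces, as the last example shows.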


uniqGenes: Logical; whether duplicated genes should be removed.


namespace: Character, the namespace (or source) of the gene-sets. If NULL, no namespace is used and all gene-sets are assumed to come from the same unspecified namespace. The option is helpful when gene-sets from multiple namespaces are used jointly; in that case, one namespace can be given per file.


A GmtList object, which is an S4-class wrapper of a list. Each element of the object is a list with the following items (at least the first three):

  • gene-set name (field name), character string, accessible with gsName

  • gene-set description (field desc), character string, accessible with gsDesc

  • genes (field genes), a vector of character strings, accessible with gsGenes

  • namespace (field namespace), accessible with gsNamespace
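
The structure of each element can be illustrated with a minimal base-R sketch. It assumes the usual GMT layout (one gene-set per line, tab-separated: name, description, then genes). The helper parse_gmt_lines and the toy data are hypothetical and only mimic the fields listed above; this is not how readGmt is implemented, and the result is a plain list rather than a GmtList.

```r
# Hypothetical helper (not part of BioQC): turn GMT-formatted lines into a
# plain list whose elements carry the fields name, desc, genes and namespace.
parse_gmt_lines <- function(lines, uniq_genes = TRUE, namespace = NULL) {
  lapply(strsplit(lines, "\t", fixed = TRUE), function(fields) {
    genes <- fields[-(1:2)]          # everything after name and description
    genes <- genes[nzchar(genes)]    # drop empty fields
    if (uniq_genes) genes <- unique(genes)  # mimics uniqGenes = TRUE
    list(name = fields[1], desc = fields[2],
         genes = genes, namespace = namespace)
  })
}

# Toy input: the first gene-set lists TP53 twice.
gmt_lines <- c("SetA\tfirst toy set\tTP53\tEGFR\tTP53",
               "SetB\tsecond toy set\tAKT1\tMTOR")

sets <- parse_gmt_lines(gmt_lines, namespace = "toy")
raw_sets <- parse_gmt_lines(gmt_lines, uniq_genes = FALSE)

sets[[1]]$name       # gene-set name
sets[[1]]$genes      # duplicated TP53 removed
raw_sets[[1]]$genes  # duplicates kept
```

With a real GmtList, the same fields would be read with the accessors gsName, gsDesc, gsGenes and gsNamespace instead of `$`.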


Currently, when namespace is set to NULL, no namespace is used. This may change in the future, since the file base name may be used as the default namespace.


gmt_file <- system.file("extdata/exp.tissuemark.affy.roche.symbols.gmt", package="BioQC")
gmt_list <- readGmt(gmt_file)
gmt_nonUniqGenes_list <- readGmt(gmt_file, uniqGenes=FALSE)
gmt_namespace_list <- readGmt(gmt_file, uniqGenes=FALSE, namespace="myNamespace")

## suppose we have two lists of gene-sets to read in
test_gmt_file <- system.file("extdata/test.gmt", package="BioQC")
gmt_twons_list <- readGmt(gmt_file, test_gmt_file, namespace=c("BioQC", "test"))
## alternatively
gmt_twons_list <- readGmt(BioQC=gmt_file, test=test_gmt_file)