Make gene-set names unique within each namespace, and member genes unique within each gene-set

uniqGenesetsByNamespace(gmtList)

Arguments

gmtList

A GmtList object, typically created by readGmt. The object must have namespaces defined by setNamespace.

The function makes sure that

  • names of gene-sets within each namespace are unique, by merging gene-sets with duplicated names

  • genes within each gene-set are unique, by removing duplicated genes

Gene-sets that share a name but have different desc values are merged. The desc values are made unique and, if more than one remains, concatenated with | as the collapse character.
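The merging rules described above can be sketched in base R. This is only an illustration, not the package's implementation: a plain list of lists stands in for a GmtList, the gene-set contents are made up, and ordering merged gene-sets by first occurrence is an assumption of the sketch.

```r
# Hypothetical stand-in for a GmtList with namespaces already set:
# a plain list whose elements carry name, desc, genes and namespace.
genesets <- list(
  list(name = "GeneSet1", desc = "first half",  genes = c("A", "B", "C"), namespace = "NS1"),
  list(name = "GeneSet1", desc = "second half", genes = c("C", "D"),      namespace = "NS1"),
  list(name = "GeneSet1", desc = "other",       genes = c("A", "A"),      namespace = "NS2")
)

# One key per (namespace, name) pair; gene-sets sharing a key are merged.
# Assumption: "::" does not occur in namespaces or names.
keys <- vapply(genesets,
               function(x) paste(x$namespace, x$name, sep = "::"),
               character(1))

merged <- lapply(unique(keys), function(k) {
  grp <- genesets[keys == k]
  descs <- unique(vapply(grp, function(x) x$desc, character(1)))
  list(
    name      = grp[[1]]$name,
    desc      = paste(descs, collapse = "|"),   # unique desc values joined by |
    genes     = unique(unlist(lapply(grp, function(x) x$genes))),  # drop duplicated genes
    namespace = grp[[1]]$namespace
  )
})

# merged has two elements:
#   GeneSet1 in NS1, desc "first half|second half", genes A, B, C, D
#   GeneSet1 in NS2, desc "other", genes A
```

Note that the two gene-sets named GeneSet1 in different namespaces stay separate; only gene-sets that share both the namespace and the name are merged.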

Value

A GmtList object in which gene-set names are unique within each namespace and genes are unique within each gene-set. If not already present, a new item namespace is appended to each list element in the GmtList object, recording the namespace used to make gene-sets unique. Gene-sets in the returned object are ordered by the unique gene-set names of the input object.

Examples

myGmtList <- GmtList(list(list(name="GeneSet1", desc="Namespace1", genes=LETTERS[1:3]),
  list(name="GeneSet2", desc="Namespace1", genes=rep(LETTERS[4:6],2)),
  list(name="GeneSet1", desc="Namespace1", genes=LETTERS[4:6]),
  list(name="GeneSet3", desc="Namespace2", genes=LETTERS[1:5])))
 
print(myGmtList)
#> A gene-set list in GMT format with 4 genesets
#> Gene-sets:
#>   GeneSet1 (Namespace1,n=3): A,B,C
#>   GeneSet2 (Namespace1,n=6): D,E,F,...
#>   GeneSet1 (Namespace1,n=3): D,E,F
#>   GeneSet3 (Namespace2,n=5): A,B,C,...
myGmtList <- setNamespace(myGmtList, namespace=function(x) x$desc)
myUniqGmtList <- uniqGenesetsByNamespace(myGmtList)
print(myUniqGmtList)
#> A gene-set list in GMT format with 3 genesets
#> Namespaces:
#>   [1] Namespace1 (n=2)
#>   [2] Namespace2 (n=1)
#> Gene-sets:
#>   GeneSet1 (Namespace1,n=6): A,B,C,...
#>   GeneSet2 (Namespace1,n=3): D,E,F
#>   GeneSet3 (Namespace2,n=5): A,B,C,...