| Grouping-class {IRanges} | R Documentation |
In this man page, we call "grouping" the action of dividing a collection of NO objects into NG groups (some of which may be empty). The Grouping class and subclasses are containers for representing groupings.
Let's give a formal description of the Grouping core API:
Groups G_i are indexed from 1 to NG (1 <= i <= NG).
Objects O_j are indexed from 1 to NO (1 <= j <= NO).
Every object must belong to one group and only one.
Given that empty groups are allowed, NG can be greater than NO.
Grouping an empty collection of objects (NO = 0) is supported. In that case, all the groups are empty. Only in that case can NG also be zero (meaning there are no groups).
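The rules above can be checked on a toy grouping. The following is only a sketch in plain base R: it stores a grouping as a list of integer vectors (the name groups is hypothetical) and does not use the Grouping class itself.

```r
# A toy grouping stored as a plain list: one integer vector of object
# indices per group. 'groups' is a hypothetical stand-in, not an IRanges
# Grouping object.
groups <- list(c(1L, 3L), integer(0), 2L, integer(0))

NG <- length(groups)           # number of groups, empty ones included
NO <- length(unlist(groups))   # number of objects

# Every object belongs to one group and only one: taken together, the
# group members are exactly 1..NO with no repeats.
stopifnot(identical(sort(unlist(groups)), seq_len(NO)))

# Empty groups are allowed, so NG can exceed NO (here 4 groups, 3 objects).
stopifnot(NG > NO)

# With no objects at all, any number of (empty) groups is valid,
# including zero groups.
no_objects_three_groups <- rep(list(integer(0)), 3)
no_objects_no_groups <- list()
```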
If x is a Grouping object:
length(x):
Returns the number of groups (NG).
names(x):
Returns the names of the groups.
nobj(x):
Returns the number of objects (NO). Equivalent to length(togroup(x)).
Going from groups to objects:
x[[i]]:
Returns the indices of the objects (the j's) that belong to G_i.
The j's are returned in ascending order.
This provides the mapping from groups to objects (one-to-many mapping).
grouplength(x, i=NULL):
Returns the number of objects in G_i.
Works in a vectorized fashion (unlike x[[i]]).
grouplength(x) is equivalent to grouplength(x, seq_len(length(x))).
If i is not NULL, grouplength(x, i) is equivalent to
sapply(i, function(ii) length(x[[ii]])).
members(x, i):
Equivalent to x[[i]] if i is a single integer.
Otherwise, if i is an integer vector of arbitrary length, it's
equivalent to sort(unlist(sapply(i, function(ii) x[[ii]]))).
vmembers(x, L):
A version of members that works in a vectorized fashion with
respect to the L argument (L must be a list of integer
vectors). Returns lapply(L, function(i) members(x, i)).
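To make the equivalences above concrete, here is a minimal base R sketch of the group-to-object accessors on a plain list. The helpers grouplength_sketch, members_sketch and vmembers_sketch are hypothetical names that only mimic the documented behavior; they are not the IRanges methods.

```r
# Toy grouping as a plain list (hypothetical stand-in for a Grouping object).
groups <- list(c(1L, 4L), integer(0), c(2L, 3L, 5L))

# Number of objects in each requested group; all groups when i is NULL.
grouplength_sketch <- function(g, i = NULL) {
  if (is.null(i)) i <- seq_along(g)
  unname(lengths(g)[i])
}

# Members of the requested groups, put together and sorted.
members_sketch <- function(g, i) {
  sort(as.integer(unlist(g[i])))
}

# Same as members_sketch, applied to each element of the list L.
vmembers_sketch <- function(g, L) {
  lapply(L, function(i) members_sketch(g, i))
}

grouplength_sketch(groups)               # sizes of all the groups
grouplength_sketch(groups, 3:2)          # sizes of groups 3 and 2, in that order
members_sketch(groups, c(3L, 1L))        # objects of groups 3 and 1, sorted
vmembers_sketch(groups, list(1L, 2:3))   # one result per element of L
```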
Going from objects to groups:
togroup(x, j=NULL):
Returns the index i of the group that O_j belongs to.
This provides the mapping from objects to groups (many-to-one mapping).
Works in a vectorized fashion. togroup(x) is equivalent to
togroup(x, seq_len(nobj(x))): both return the entire mapping in
an integer vector of length NO.
If j is not NULL, togroup(x, j) is equivalent to
y <- togroup(x); y[j].
togrouplength(x, j=NULL):
Returns the number of objects that belong to the same group as O_j
(including O_j itself).
Equivalent to grouplength(x, togroup(x, j)).
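The object-to-group direction can be sketched the same way. Again this is plain base R on a list, under the assumption that the list is a valid grouping; togroup_sketch and togrouplength_sketch are hypothetical helper names, not the IRanges methods.

```r
# Toy grouping as a plain list (hypothetical stand-in for a Grouping object).
groups <- list(c(1L, 4L), integer(0), c(2L, 3L, 5L))

# For each object j, the index of the group it belongs to.
togroup_sketch <- function(g, j = NULL) {
  members <- unlist(g)
  y <- integer(length(members))
  # Label each member with the index of the group it came from.
  y[members] <- rep.int(seq_along(g), lengths(g))
  if (is.null(j)) y else y[j]
}

# For each object j, the size of the group it belongs to.
togrouplength_sketch <- function(g, j = NULL) {
  lengths(g)[togroup_sketch(g, j)]
}

togroup_sketch(groups)         # the entire many-to-one mapping, length NO
togroup_sketch(groups, 5:2)    # same as togroup_sketch(groups)[5:2]
togrouplength_sketch(groups)   # group size seen from each object
```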
Given that length, names and [[ are defined
for Grouping objects, those objects can be considered Sequence
objects. In particular, as.list works out-of-the-box on them.
One important property of any Grouping object x is
that unlist(as.list(x)) is always a permutation of
seq_len(nobj(x)). This is a direct consequence of the fact
that every object in the grouping belongs to one group and only
one.
[DOCUMENT ME]
A Partitioning container represents a block-grouping, i.e. a grouping
where each group contains objects that are neighbors in the original
collection of objects. More formally, a grouping x is a
block-grouping iff togroup(x) is sorted in increasing order
(not necessarily strictly increasing).
A block-grouping object can also be seen (and manipulated) as a Ranges object where all the ranges are adjacent and start at 1 (i.e. together they cover the 1:NO interval with no overlap between the ranges).
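As an illustration of the block-grouping definition and of the Ranges view, the sketch below starts from a bare togroup-style integer vector in base R. The variable names are hypothetical and no Partitioning object is involved.

```r
# Object-to-group mapping for 10 objects in 5 groups (group 3 is empty).
togroup_vec <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 5L, 5L)
NG <- 5L

# Block-grouping test: the mapping must be sorted in increasing order
# (ties allowed).
is_block_grouping <- !is.unsorted(togroup_vec)

# Seen as ranges: one adjacent range per group, covering 1:NO.
width <- tabulate(togroup_vec, nbins = NG)
end <- cumsum(width)
start <- end - width + 1L

# Adjacency: each range starts right after the previous one ends.
stopifnot(all(start[-1L] == end[-NG] + 1L))

data.frame(start, end, width)
```

An empty group shows up as a zero-width range whose start is one more than its end.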
Note that a Partitioning object is both a particular type of Grouping
object and a particular type of Ranges object. Therefore all the
methods that are defined for Grouping and Ranges objects can also
be used on a Partitioning object. See ?Ranges for a description of
the Ranges API.
The Partitioning class is virtual with 2 concrete subclasses: PartitioningByEnd (only stores the end of the groups, allowing fast mapping from groups to objects), and PartitioningByWidth (only stores the width of the groups).
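The two storage schemes carry the same information. The base R sketch below shows how one can be derived from the other with cumsum and diff, and how the ends alone are enough to answer both mapping directions. The helpers group_members and object_group are hypothetical and are not claimed to be how the classes are implemented.

```r
width <- c(4L, 3L, 0L, 1L, 7L)

# By-end storage is the running total of the widths...
end <- cumsum(width)
# ...and the widths are recovered by differencing the ends.
width_from_end <- diff(c(0L, end))
stopifnot(identical(width_from_end, width))

# Group -> objects from 'end' alone: group i holds (end[i-1] + 1) to end[i].
group_members <- function(end, i) {
  first <- if (i == 1L) 1L else end[i - 1L] + 1L
  seq_len(end[i] - first + 1L) + (first - 1L)
}

# Object -> group from 'end' alone: count the groups that end before j.
object_group <- function(end, j) {
  findInterval(j - 1L, end) + 1L
}

group_members(end, 2L)                   # objects of the 2nd group
group_members(end, 3L)                   # the 3rd group is empty
object_group(end, c(4L, 5L, 8L, 15L))    # groups of objects 4, 5, 8 and 15
```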
A Binning container represents a grouping where each observation is assigned to a group or bin. It is similar in nature to taking the integer codes of a factor object and splitting them up by its levels (i.e. myFactor <- factor(...); split(as.integer(myFactor), myFactor)).
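The factor analogy can be tried directly in base R. The sketch below uses a small made-up factor; since x[[i]] in the core API returns object indices, splitting seq_along(myFactor) rather than the integer codes is arguably the closer counterpart, and both variants are shown.

```r
# A small made-up factor; level "d" is unused, so its bin is empty.
myFactor <- factor(c("b", "a", "b", "c", "a"), levels = c("a", "b", "c", "d"))

# The analogy as stated: integer codes split by level.
codes_by_level <- split(as.integer(myFactor), myFactor)

# Object indices per bin (what x[[i]] returns in the core API).
bins <- split(seq_along(myFactor), myFactor)

bins[["a"]]               # observations 2 and 5 fall in bin "a"
lengths(bins)             # bin sizes, with the empty bin "d" kept
as.integer(myFactor)      # the object-to-bin mapping
```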
H2LGrouping(high2low=integer()):
[DOCUMENT ME]
Dups(high2low=integer()):
[DOCUMENT ME]
PartitioningByEnd(end=integer(), names=NULL):
Return the PartitioningByEnd object made of the partitions ending
at the values specified by end. end must contain
sorted non-negative integer values. If the names argument
is non-NULL, it is used to name the partitions.
PartitioningByWidth(width=integer(), names=NULL):
Return the PartitioningByWidth object made of the partitions with
the widths specified by width. width must contain
non-negative integer values. If the names argument
is non-NULL, it is used to name the partitions.
Binning(group=integer(), names=NULL):
Return the Binning object made from the group argument, which
takes a factor or positive valued integer vector. If the names
argument is non-NULL, it is used to name the bins. When group
is a factor, the names are set to levels(group) unless
specified otherwise.
Note that these constructors do not recycle their names argument
(to remain consistent with what `names<-` does on standard
vectors).
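For reference, this is what `names<-` does on a standard vector in base R when it is given fewer names than there are elements: the missing names are filled with NA instead of being recycled.

```r
v <- c(10, 20, 30)
names(v) <- c("A", "B")   # only two names for three elements
names(v)                  # "A" "B" NA
```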
H. Pages and P. Aboyoun
Sequence-class, Ranges-class, IRanges-class, successiveIRanges, cumsum, diff
showClass("Grouping") # shows (some of) the known subclasses
## ---------------------------------------------------------------------
## A. H2LGrouping OBJECTS
## ---------------------------------------------------------------------
high2low <- c(NA, NA, 2, 2, NA, NA, NA, 6, NA, 1, 2, NA, 6, NA, NA, 2)
x <- H2LGrouping(high2low)
x
## The Grouping core API:
length(x)
nobj(x) # same as 'length(x)' for H2LGrouping objects
x[[1]]
x[[2]]
x[[3]]
x[[4]]
x[[5]]
grouplength(x) # same as 'unname(sapply(x, length))'
grouplength(x, 5:2)
members(x, 5:2) # all the members are put together and sorted
togroup(x)
togroup(x, 5:2)
togrouplength(x) # same as 'grouplength(x, togroup(x))'
togrouplength(x, 5:2)
## The Sequence API:
as.list(x)
sapply(x, length)
## ---------------------------------------------------------------------
## B. Dups OBJECTS
## ---------------------------------------------------------------------
x_dups <- as(x, "Dups")
x_dups
duplicated(x_dups) # same as 'duplicated(togroup(x_dups))'
### The purpose of a Dups object is to describe the groups of duplicated
### elements in a vector-like object:
x <- c(2, 77, 4, 4, 7, 2, 8, 8, 4, 99)
x_high2low <- high2low(x)
x_high2low # same length as 'x'
x_dups <- Dups(x_high2low)
x_dups
togroup(x_dups)
duplicated(x_dups)
togrouplength(x_dups) # frequency for each element
table(x)
## ---------------------------------------------------------------------
## C. Partitioning OBJECTS
## ---------------------------------------------------------------------
x <- PartitioningByEnd(end=c(4, 7, 7, 8, 15), names=LETTERS[1:5])
x # the 3rd partition is empty
## The Grouping core API:
length(x)
nobj(x)
x[[1]]
x[[2]]
x[[3]]
grouplength(x) # same as 'unname(sapply(x, length))' and 'width(x)'
togroup(x)
togrouplength(x) # same as 'grouplength(x, togroup(x))'
names(x)
## The Ranges core API:
start(x)
end(x)
width(x)
## The Sequence API:
as.list(x)
sapply(x, length)
## Replacing the names:
names(x)[3] <- "empty partition"
x
## Coercion to an IRanges object:
as(x, "IRanges")
## Other examples:
PartitioningByEnd(end=c(0, 0, 19), names=LETTERS[1:3])
PartitioningByEnd() # no partition
PartitioningByEnd(end=integer(9)) # all partitions are empty
## ---------------------------------------------------------------------
## D. RELATIONSHIP BETWEEN Partitioning OBJECTS AND successiveIRanges()
## ---------------------------------------------------------------------
mywidths <- c(4, 3, 0, 1, 7)
## The 3 following calls produce the same ranges:
x1 <- successiveIRanges(mywidths) # IRanges instance.
x2 <- PartitioningByEnd(end=cumsum(mywidths)) # PartitioningByEnd instance.
x3 <- PartitioningByWidth(width=mywidths) # PartitioningByWidth instance.
stopifnot(identical(as(x1, "PartitioningByEnd"), x2))
stopifnot(identical(as(x1, "PartitioningByWidth"), x3))
## ---------------------------------------------------------------------
## E. Binning OBJECTS
## ---------------------------------------------------------------------
set.seed(0)
x <- Binning(factor(sample(letters, 36, replace=TRUE), levels=letters))
x
grouplength(x)
togroup(x)
x[[2]]
x[["u"]]