summaryP {Hmisc}    R Documentation

Multi-way Summary of Proportions

Description

summaryP produces a tall and thin data frame containing numerators (freq) and denominators (denom) after stratifying the data by a series of variables. A special capability to group a series of related yes/no variables is included through the use of the ynbind function, for which the user specifies a final argument label used to label the panel created for that group of related variables.

The plot method for summaryP displays proportions as a multi-panel dot chart using the lattice package's dotplot function with a special panel function. Numerators and denominators of proportions are also included as text, in the same colors as used by an optional groups variable. The formula argument used in the dotplot call is constructed, but the user can easily reorder the variables by specifying formula, with elements named val (category levels), var (classification variable name), freq (calculated result) plus the overall cross-classification variables excluding groups.

The latex method produces one or more LaTeX tabulars containing a table representation of the result, with optional side-by-side display if groups is specified. Multiple tabulars result from the presence of non-group stratification factors.
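The numerator and denominator layout described above can be pictured with a small base R sketch. This is not summaryP's implementation; the toy data, the column order, and the choice of the non-missing count as the denominator are assumptions made for illustration only.

```r
# Hypothetical toy data; variable names are illustrative only
d <- data.frame(
  sex   = c("Female", "Male", "Female", "Male", "Female", "Male"),
  treat = c("A", "A", "A", "B", "B", "B")
)

# Numerator: count of each sex level within each treatment stratum
tab <- as.data.frame(table(val = d$sex, treat = d$treat),
                     responseName = "freq", stringsAsFactors = FALSE)

# Denominator (assumed here): non-missing observations in the stratum
den <- tapply(!is.na(d$sex), d$treat, sum)
tab$denom <- as.vector(den[tab$treat])
tab$var   <- "sex"

# One row per stratum-by-level combination: a tall and thin layout
tab[, c("treat", "var", "val", "freq", "denom")]
```

Each row carries its own freq and denom, so a proportion can be formed later as freq / denom without returning to the raw data.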

Usage

summaryP(formula, data = NULL, subset = NULL,
         na.action = na.retain, exclude1=TRUE, sort=TRUE,
         asna = c("unknown", "unspecified"), ...)
## S3 method for class 'summaryP'
plot(x, formula, groups=NULL, xlim = c(-.05, 1.05),
         text.at=NULL, cex.values = 0.5,
         key = list(columns = length(groupslevels), x = 0.75,
                    y = -0.04, cex = 0.9,
                    col = trellis.par.get('superpose.symbol')$col,
                    corner=c(0,1)),
         outerlabels=TRUE, autoarrange=TRUE, ...)
## S3 method for class 'summaryP'
latex(object, groups=NULL, file='', round=3,
                           size=NULL, append=TRUE, ...)

Arguments

formula

a formula with the variables for whose levels proportions are computed on the left hand side, and major classification variables on the right. The formula needs to include any variable later used as groups, as the data summarization does not distinguish between superpositioning and paneling. For the plot method, formula can provide an override to the default formula for dotplot().

data

an optional data frame

subset

an optional subsetting expression or vector

na.action

function specifying how to handle NAs. The default is to keep all NAs in the analysis frame.

exclude1

By default, summaryP removes redundant entries from tables for variables with only two levels. For example, if you print the proportion of females, you don't need to print the proportion of males. To override this, set exclude1=FALSE.

sort

set to FALSE to not sort category levels in descending order of global proportions

asna

character vector specifying level names to consider the same as NA. Set asna=NULL to not consider any.

x

an object produced by summaryP

groups

a character string containing the name of a superpositioning variable for obtaining further stratification within a horizontal line in the dot chart.

xlim

x-axis limits. Default is c(-.05, 1.05).

text.at

specify to leave unused space to the right of each panel to prevent numerators and denominators from touching data points. text.at is the upper limit for scaling panels' x-axes but tick marks are only labeled up to max(xlim).

cex.values

character size to use for plotting numerators and denominators

key

a list to pass to the auto.key argument of dotplot. To place a key above the entire chart use auto.key=list(columns=2) for example.

outerlabels

by default if there are two conditioning variables besides groups, the latticeExtra package's useOuterStrips function is used to put strip labels in the margins, usually resulting in a much prettier chart. Set to FALSE to prevent usage of useOuterStrips.

autoarrange

If TRUE, the formula is re-arranged so that if there are two conditioning (paneling) variables, the variable with the most levels is taken as the vertical condition.

...

ignored

object

an object produced by summaryP

file

file name, defaults to writing to console

round

number of digits to the right of the decimal place for proportions

size

optional font size such as "small"

append

set to FALSE to start output over
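As a rough illustration of the asna argument, the following base R sketch recodes the listed level names to NA before tabulating. It assumes exact, case-sensitive matching, which may differ from what summaryP does internally; the race vector is hypothetical.

```r
# Level names to treat as missing (the default value of asna)
asna <- c("unknown", "unspecified")

# Hypothetical character variable containing such levels
race <- c("Asian", "unknown", "White", "unspecified", "Black/AA")

# Recode matching values to NA (exact matching is an assumption)
race_clean <- race
race_clean[race_clean %in% asna] <- NA

# Tabulate, showing how many values became NA
table(race_clean, useNA = "ifany")
```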

Value

summaryP produces a data frame of class "summaryP". The plot method produces a lattice object of class "trellis". The latex method produces an object of class "latex" with an additional attribute ngrouplevels specifying the number of levels of any groups variable.
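Because the result is a data frame, proportions can be derived from it directly. The sketch below uses a hand-built stand-in with the column names var, val, freq and denom mentioned on this page; the values are hypothetical and are not output from summaryP.

```r
# Hypothetical stand-in for a summaryP result (values are made up)
s <- data.frame(
  var   = c("Race", "Race", "Race", "Sex"),
  val   = c("Asian", "Black/AA", "White", "Male"),
  freq  = c(30, 35, 35, 48),
  denom = c(100, 100, 100, 100)
)

# Proportion for each row, rounded to 3 digits as in the latex method default
s$prop <- round(s$freq / s$denom, 3)

# List rows from largest to smallest proportion
s[order(-s$prop), ]
```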

Author(s)

Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu

See Also

bpplotM, summaryM, ynbind, pBlock

Examples

n <- 100
f <- function(na=FALSE) {
  x <- sample(c('N', 'Y'), n, TRUE)
  if(na) x[runif(n) < .1] <- NA
  x
}
set.seed(1)
d <- data.frame(x1=f(), x2=f(), x3=f(), x4=f(), x5=f(), x6=f(), x7=f(TRUE),
                age=rnorm(n, 50, 10),
                race=sample(c('Asian', 'Black/AA', 'White'), n, TRUE),
                sex=sample(c('Female', 'Male'), n, TRUE),
                treat=sample(c('A', 'B'), n, TRUE),
                region=sample(c('North America','Europe'), n, TRUE))
d <- upData(d, labels=c(x1='MI', x2='Stroke', x3='AKI', x4='Migraines',
                 x5='Pregnant', x6='Other event', x7='MD withdrawal',
                 race='Race', sex='Sex'))
dasna <- subset(d, region=='North America')
with(dasna, table(race, treat))
s <- summaryP(race + sex + ynbind(x1, x2, x3, x4, x5, x6, x7, label='Exclusions') ~
              region + treat, data=d)
# add exclude1=FALSE to include female category
plot(s, groups='treat')

plot(s, val ~ freq | region * var, groups='treat', outerlabels=FALSE)
# Much better looking if outerlabels=FALSE is omitted; see output at
# http://biostat.mc.vanderbilt.edu/HmiscNew#summaryP
# See more examples under bpplotM

# Make a chart where there is a block of variables that
# are only analyzed for males.  Keep redundant sex in block for demo.
# Leave extra space for numerators, denominators
sb <- summaryP(race + sex +
               pBlock(race, sex, label='Race: Males', subset=sex=='Male') ~
               region, data=d)
plot(sb, text.at=1.3)
plot(sb, groups='region', layout=c(1,3), key=list(space='top'),
     text.at=1.15)
## Not run: 
plot(s, groups='treat')
# plot(s, groups='treat', outerlabels=FALSE) for standard lattice output
plot(s, groups='region', key=list(columns=2, space='bottom'))

plot(summaryP(race + sex ~ region, data=d, exclude1=FALSE), col='green')

# Make your own plot using data frame created by summaryP
useOuterStrips(dotplot(val ~ freq | region * var, groups=treat, data=s,
        xlim=c(0,1), scales=list(y='free', rot=0), xlab='Fraction',
        panel=function(x, y, subscripts, ...) {
          denom <- s$denom[subscripts]
          x <- x / denom
          panel.dotplot(x=x, y=y, subscripts=subscripts, ...) }))

# Show marginal summary for all regions combined
s <- summaryP(race + sex ~ region, data=addMarginal(d, region))
plot(s, groups='region', key=list(space='top'), layout=c(1,2))

# Show marginal summaries for both race and sex
s <- summaryP(ynbind(x1, x2, x3, x4, label='Exclusions', sort=FALSE) ~
              race + sex, data=addMarginal(d, race, sex))
plot(s, val ~ freq | sex*race)

## End(Not run)

[Package Hmisc version 3.14-4 Index]