Title: | Models for Data from Unmarked Animals |
---|---|
Description: | Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>. |
Authors: | Richard Chandler [aut], Ken Kellner [cre, aut], Ian Fiske [aut], David Miller [aut], Andy Royle [aut], Jeff Hostetler [aut], Rebecca Hutchinson [aut], Adam Smith [aut], Lea Pautrel [aut], Marc Kery [ctb], Mike Meredith [ctb], Auriel Fournier [ctb], Ariel Muldoon [ctb], Chris Baker [ctb] |
Maintainer: | Ken Kellner <[email protected]> |
License: | GPL (>=3) |
Version: | 1.4.3.9007 |
Built: | 2025-01-09 15:34:44 UTC |
Source: | https://github.com/biodiverse/unmarked |
Fits hierarchical models of animal occurrence and abundance to data collected on species that may be detected imperfectly. Models include single- and multi-season site occupancy models, binomial N-mixture models, and multinomial N-mixture models. The data can arise from survey methods such as occurrence sampling, temporally replicated counts, removal sampling, double observer sampling, and distance sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. General treatment of these models can be found in MacKenzie et al. (2006) and Royle and Dorazio (2008). The primary reference for the package is Fiske and Chandler (2011).
Overview of Model-fitting Functions:
occu
fits occurrence models with no linkage between
abundance and detection (MacKenzie et al. 2002).
occuRN
fits abundance models to presence/absence data by
exploiting the link between detection probability and abundance (Royle and
Nichols 2003).
occuFP
fits occupancy models to data characterized by
false negatives and false positive detections (e.g., Royle and Link
[2006] and Miller et al. [2011]).
occuMulti
fits multi-species occupancy model of Rota et
al. [2016].
colext
fits the mutli-season occupancy model of
MacKenzie et al. (2003).
pcount
fits N-mixture models (aka binomial mixture models) to
repeated count data (Royle 2004a, Kery et al 2005).
distsamp
fits the distance sampling model of
Royle et al. (2004) to distance data recorded in discrete intervals.
gdistsamp
fits the generalized distance sampling model
described by Chandler et al. (2011) to distance data recorded in
discrete intervals.
gpcount
fits the generalized N-mixture model
described by Chandler et al. (2011) to repeated count data collected
using the robust design.
multinomPois
fits the multinomial-Poisson model of Royle (2004b)
to data collected using methods such as removal sampling or double observer
sampling.
gmultmix
fits a generalized form of the multinomial-mixture model
of Royle (2004b) that allows for estimating availability and detection
probability.
pcountOpen
fits the open population model of Dail and
Madsen (2011) to repeated count data. This is a genearlized form of the
Royle (2004a) N-mixture model that includes parameters for recruitment
and apparent survival.
Data: All data are passed to unmarked's estimation functions as
a formal S4 class called an unmarkedFrame, which has child classes
for each model type. This allows metadata (eg as distance interval cut
points, measurement units, etc...) to be stored with the response and
covariate data. See unmarkedFrame
for a detailed
description of unmarkedFrames and how to create them.
Model Specification: unmarked's
model-fitting functions allow specification of covariates for both the
state process and the detection process. For two-level hierarchical
models, (eg occu
, occuRN
, pcount
,
multinomPois
, distsamp
) covariates for the
detection process (at the site or observation level) and the state
process (at the site level) are specified with a double right-hand sided
formula, in that order. Such a formula looks like
where through
are additive covariates of
the process of interest. Using two tildes in a single formula
differs from standard R convention, but it is informative about the model
being fit. The meaning of these covariates, or what they
model, is full described in the help files for the individual functions
and is not the same for all functions. For models with more than two
processes (eg
colext
, gmultmix
,
pcountOpen
), single right-hand sided formulas (only one
tilde) are used to model each parameter.
Utility Functions: unmarked contains several utility
functions for organizing data into the form required by its model-fitting
functions. csvToUMF
converts an appropriately
formated comma-separated values (.csv) file to a list containing the
components required by model-fitting functions.
Ian Fiske, Richard Chandler, Andy Royle, Marc Kery, David Miller, and Rebecca Hutchinson
Chandler, R. B., J. A. Royle, and D. I. King. 2011. Inference about density and temporary emigration in unmarked populations. Ecology 92:1429-1435.
Dail, D. and L. Madsen. 2011. Models for estimating abundance from repeated counts of an open metapopulation. Biometrics 67:577-587.
Fiske, I. and R. B. Chandler. 2011. unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance. Journal of Statistical Software 43:1–23.
Kery, M., Royle, J. A., and Schmid, H. 2005 Modeling avian abundance from replicated counts using binomial mixture models. Ecological Applications 15:1450–1461.
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. A. Royle, and C. A. Langtimm. 2002. Estimating site occupancy rates when detection probabilities are less than one. Ecology 83: 2248–2255.
MacKenzie, D. I., J. D. Nichols, J. E. Hines, M. G. Knutson, and A. B. Franklin. 2003. Estimating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology 84:2200–2207.
MacKenzie, D. I., J. D. Nichols, J. A. Royle, K. H. Pollock, L. L. Bailey, and J. E. Hines. 2006. Occupancy Estimation and Modeling. Amsterdam: Academic Press.
Miller, D.A., J.D. Nichols, B.T. McClintock, E.H.C. Grant, L.L. Bailey, and L.A. Weir. 2011. Improving occupancy estimation when two types of observational error occur: non-detection and species misidentification. Ecology 92:1422-1428.
Rota, C.T., et al. 2016. A multi-species occupancy model for two or more interacting species. Methods in Ecology and Evolution 7: 1164-1173.
Royle, J. A. 2004a. N-Mixture models for estimating population size from spatially replicated counts. Biometrics 60:108–105.
Royle, J. A. 2004b. Generalized estimators of avian abundance from count survey data. Animal Biodiversity and Conservation 27:375–386.
Royle, J. A., D. K. Dawson, and S. Bates. 2004. Modeling abundance effects in distance sampling. Ecology 85:1591–1597.
Royle, J. A., and R. M. Dorazio. 2006. Hierarchical models of animal abundance and occurrence. Journal Of Agricultural Biological And Environmental Statistics 11:249–263.
Royle, J.A., and W.A. Link. 2006. Generalized site occupancy models allowing for false positive and false negative errors. Ecology 87:835-841.
Royle, J. A. and R. M. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology. Academic Press.
Royle, J. A. and J. D. Nichols. 2003. Estimating Abundance from Repeated Presence-Absence Data or Point Counts. Ecology, 84:777–790.
Sillett, S. and Chandler, R.B. and Royle, J.A. and Kery, M. and Morrison, S.A. In Press. Hierarchical distance sampling models to estimate population size and habitat-specific abundance of an island endemic. Ecological Applications
## An example site-occupancy analysis # Simulate occupancy data set.seed(344) nSites <- 100 nReps <- 5 covariates <- data.frame(veght=rnorm(nSites), habitat=factor(c(rep('A', 50), rep('B', 50)))) psipars <- c(-1, 1, -1) ppars <- c(1, -1, 0) X <- model.matrix(~veght+habitat, covariates) # design matrix psi <- plogis(X %*% psipars) p <- plogis(X %*% ppars) y <- matrix(NA, nSites, nReps) z <- rbinom(nSites, 1, psi) # true occupancy state for(i in 1:nSites) { y[i,] <- rbinom(nReps, 1, z[i]*p[i]) } # Organize data and look at it umf <- unmarkedFrameOccu(y = y, siteCovs = covariates) head(umf) summary(umf) # Fit some models fm1 <- occu(~1 ~1, umf) fm2 <- occu(~veght+habitat ~veght+habitat, umf) fm3 <- occu(~veght ~veght+habitat, umf) # Model selection fms <- fitList(m1=fm1, m2=fm2, m3=fm3) modSel(fms) # Empirical Bayes estimates of the number of sites occupied sum(bup(ranef(fm3), stat="mode")) # Sum of posterior modes sum(z) # Actual # Model-averaged prediction and plots # psi in each habitat type newdata1 <- data.frame(habitat=c('A', 'B'), veght=0) Epsi1 <- predict(fms, type="state", newdata=newdata1) with(Epsi1, { plot(1:2, Predicted, xaxt="n", xlim=c(0.5, 2.5), ylim=c(0, 0.5), xlab="Habitat", ylab=expression(paste("Probability of occurrence (", psi, ")")), cex.lab=1.2, pch=16, cex=1.5) axis(1, 1:2, c('A', 'B')) arrows(1:2, Predicted-SE, 1:2, Predicted+SE, angle=90, code=3, length=0.05) }) # psi and p as functions of vegetation height newdata2 <- data.frame(habitat=factor('A', levels=c('A','B')), veght=seq(-2, 2, length=50)) Epsi2 <- predict(fms, type="state", newdata=newdata2, appendData=TRUE) Ep <- predict(fms, type="det", newdata=newdata2, appendData=TRUE) op <- par(mfrow=c(2, 1), mai=c(0.9, 0.8, 0.2, 0.2)) plot(Predicted~veght, Epsi2, type="l", lwd=2, ylim=c(0,1), xlab="Vegetation height (standardized)", ylab=expression(paste("Probability of occurrence (", psi, ")"))) lines(lower ~ veght, Epsi2, col=gray(0.7)) lines(upper ~ veght, Epsi2, col=gray(0.7)) plot(Predicted~veght, Ep, type="l", lwd=2, ylim=c(0,1), xlab="Vegetation height (standardized)", ylab=expression(paste("Detection probability (", italic(p), ")"))) lines(lower~veght, Ep, col=gray(0.7)) lines(upper~veght, Ep, col=gray(0.7)) par(op)
## An example site-occupancy analysis # Simulate occupancy data set.seed(344) nSites <- 100 nReps <- 5 covariates <- data.frame(veght=rnorm(nSites), habitat=factor(c(rep('A', 50), rep('B', 50)))) psipars <- c(-1, 1, -1) ppars <- c(1, -1, 0) X <- model.matrix(~veght+habitat, covariates) # design matrix psi <- plogis(X %*% psipars) p <- plogis(X %*% ppars) y <- matrix(NA, nSites, nReps) z <- rbinom(nSites, 1, psi) # true occupancy state for(i in 1:nSites) { y[i,] <- rbinom(nReps, 1, z[i]*p[i]) } # Organize data and look at it umf <- unmarkedFrameOccu(y = y, siteCovs = covariates) head(umf) summary(umf) # Fit some models fm1 <- occu(~1 ~1, umf) fm2 <- occu(~veght+habitat ~veght+habitat, umf) fm3 <- occu(~veght ~veght+habitat, umf) # Model selection fms <- fitList(m1=fm1, m2=fm2, m3=fm3) modSel(fms) # Empirical Bayes estimates of the number of sites occupied sum(bup(ranef(fm3), stat="mode")) # Sum of posterior modes sum(z) # Actual # Model-averaged prediction and plots # psi in each habitat type newdata1 <- data.frame(habitat=c('A', 'B'), veght=0) Epsi1 <- predict(fms, type="state", newdata=newdata1) with(Epsi1, { plot(1:2, Predicted, xaxt="n", xlim=c(0.5, 2.5), ylim=c(0, 0.5), xlab="Habitat", ylab=expression(paste("Probability of occurrence (", psi, ")")), cex.lab=1.2, pch=16, cex=1.5) axis(1, 1:2, c('A', 'B')) arrows(1:2, Predicted-SE, 1:2, Predicted+SE, angle=90, code=3, length=0.05) }) # psi and p as functions of vegetation height newdata2 <- data.frame(habitat=factor('A', levels=c('A','B')), veght=seq(-2, 2, length=50)) Epsi2 <- predict(fms, type="state", newdata=newdata2, appendData=TRUE) Ep <- predict(fms, type="det", newdata=newdata2, appendData=TRUE) op <- par(mfrow=c(2, 1), mai=c(0.9, 0.8, 0.2, 0.2)) plot(Predicted~veght, Epsi2, type="l", lwd=2, ylim=c(0,1), xlab="Vegetation height (standardized)", ylab=expression(paste("Probability of occurrence (", psi, ")"))) lines(lower ~ veght, Epsi2, col=gray(0.7)) lines(upper ~ veght, Epsi2, col=gray(0.7)) plot(Predicted~veght, Ep, type="l", lwd=2, ylim=c(0,1), xlab="Vegetation height (standardized)", ylab=expression(paste("Detection probability (", italic(p), ")"))) lines(lower~veght, Ep, col=gray(0.7)) lines(upper~veght, Ep, col=gray(0.7)) par(op)
Methods for bracket extraction [
in Package ‘unmarked’
## S4 method for signature 'unmarkedEstimateList,ANY,ANY,ANY' x[i, j, drop] ## S4 method for signature 'unmarkedFit,ANY,ANY,ANY' x[i, j, drop] ## S4 method for signature 'unmarkedFrame,numeric,numeric,missing' x[i, j] ## S4 method for signature 'unmarkedFrame,list,missing,missing' x[i, j] ## S4 method for signature 'unmarkedMultFrame,missing,numeric,missing' x[i, j] ## S4 method for signature 'unmarkedMultFrame,numeric,missing,missing' x[i, j] ## S4 method for signature 'unmarkedFrameGMM,numeric,missing,missing' x[i, j] ## S4 method for signature 'unmarkedFrameGDS,numeric,missing,missing' x[i, j] ## S4 method for signature 'unmarkedFramePCO,numeric,missing,missing' x[i, j]
## S4 method for signature 'unmarkedEstimateList,ANY,ANY,ANY' x[i, j, drop] ## S4 method for signature 'unmarkedFit,ANY,ANY,ANY' x[i, j, drop] ## S4 method for signature 'unmarkedFrame,numeric,numeric,missing' x[i, j] ## S4 method for signature 'unmarkedFrame,list,missing,missing' x[i, j] ## S4 method for signature 'unmarkedMultFrame,missing,numeric,missing' x[i, j] ## S4 method for signature 'unmarkedMultFrame,numeric,missing,missing' x[i, j] ## S4 method for signature 'unmarkedFrameGMM,numeric,missing,missing' x[i, j] ## S4 method for signature 'unmarkedFrameGDS,numeric,missing,missing' x[i, j] ## S4 method for signature 'unmarkedFramePCO,numeric,missing,missing' x[i, j]
x |
Object of appropriate S4 class |
i |
Row numbers |
j |
Observation numbers (eg occasions, distance classes, etc...) |
drop |
Not currently used |
Extract a unmarkedEstimate object from an unmarkedEstimateList by name (either 'det' or 'state')
Extract a unmarkedEstimate object from an unmarkedFit by name (either 'det' or 'state')
Extract observations from an unmarkedFrame.
Extract rows from an unmarkedFrame
Extract rows and observations from an unmarkedFrame
Extract primary sampling periods from an unmarkedMultFrame
List is the index of observations to subset for each site.
Extract rows (sites) from an unmarkedMultFrame
Extract rows (sites) from an unmarkedFrameGMM object
Extract rows (sites) from an unmarkedFrameGDS object
Extract rows (sites) from an unmarkedFramePCO object
data(mallard) mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site, obsCovs = mallard.obs) summary(mallardUMF) mallardUMF[1:5,] mallardUMF[,1:2] mallardUMF[1:5, 1:2]
data(mallard) mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site, obsCovs = mallard.obs) summary(mallardUMF) mallardUMF[1:5,] mallardUMF[,1:2] mallardUMF[1:5, 1:2]
Methods for function backTransform
in Package ‘unmarked’.
This converts from link-scale to original-scale
## S4 method for signature 'unmarkedFit' backTransform(obj, type) ## S4 method for signature 'unmarkedEstimate' backTransform(obj)
## S4 method for signature 'unmarkedFit' backTransform(obj, type) ## S4 method for signature 'unmarkedEstimate' backTransform(obj)
obj |
Object of appropriate S4 class |
type |
one of names(obj), eg 'state' or 'det' |
Typically done internally
Back-transform a parameter from a fitted model. Only possible if no covariates are present. Must specify argument type as one of the values returned by names(obj).
Back-transform a predicted value created by
linearComb
. This is done internally by predict
but
can be done explicitly by user.
## Not run: data(mallard) mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site, obsCovs = mallard.obs) (fm <- pcount(~ 1 ~ forest, mallardUMF)) # Fit a model backTransform(fm, type="det") # This works because there are no detection covariates #backTransform(fm, type="state") # This doesn't work because covariates are present lc <- linearComb(fm, c(1, 0), type="state") # Estimate abundance on the log scale when forest=0 backTransform(lc) # Abundance on the original scale ## End(Not run)
## Not run: data(mallard) mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site, obsCovs = mallard.obs) (fm <- pcount(~ 1 ~ forest, mallardUMF)) # Fit a model backTransform(fm, type="det") # This works because there are no detection covariates #backTransform(fm, type="state") # This doesn't work because covariates are present lc <- linearComb(fm, c(1, 0), type="state") # Estimate abundance on the log scale when forest=0 backTransform(lc) # Abundance on the original scale ## End(Not run)
Data frames for 2 species from the breeding bird survey (BBS). Each
data frame has a row for each site and columns for each sampling
event. There is a point count and occurrence–designated by
.bin
– version for each species.
data(birds)
data(birds)
catbird
A data frame of point count observations for the catbird.
catbird.bin
A data frame of occurrence observations for the catbird.
woodthrush
A data frame of point count observations for the wood thrush.
woodthrush.bin
A data frame of point count observations for the wood thrush.
Royle J. N-mixture models for estimating population size from spatially replicated counts. Biometrics. 2004. 60(1):108–115.
data(birds)
data(birds)
Extract coefficients
## S4 method for signature 'unmarkedFit' coef(object, type, altNames = TRUE, fixedOnly=TRUE) ## S4 method for signature 'unmarkedEstimate' coef(object, altNames = TRUE, fixedOnly=TRUE, ...) ## S4 method for signature 'linCombOrBackTrans' coef(object)
## S4 method for signature 'unmarkedFit' coef(object, type, altNames = TRUE, fixedOnly=TRUE) ## S4 method for signature 'unmarkedEstimate' coef(object, altNames = TRUE, fixedOnly=TRUE, ...) ## S4 method for signature 'linCombOrBackTrans' coef(object)
object |
Object of appropriate S4 class |
type |
Either 'state' or 'det' |
altNames |
Return specific names for parameter estimates? |
fixedOnly |
Return only fixed effect parameters? |
... |
Further arguments. Not currently used |
A named numeric vector of parameter estimates.
Object from linearComb
unmarkedEstimate object
Fitted model
Estimate parameters of the colonization-extinction model, including covariate-dependent rates and detection process.
colext(psiformula= ~1, gammaformula = ~ 1, epsilonformula = ~ 1, pformula = ~ 1, data, starts, method="BFGS", se=TRUE, ...)
colext(psiformula= ~1, gammaformula = ~ 1, epsilonformula = ~ 1, pformula = ~ 1, data, starts, method="BFGS", se=TRUE, ...)
psiformula |
Right-hand sided formula for the initial probability of occupancy at each site. |
gammaformula |
Right-hand sided formula for colonization probability. |
epsilonformula |
Right-hand sided formula for extinction probability. |
pformula |
Right-hand sided formula for detection probability. |
data |
unmarkedMultFrame object that supplies the data (see |
starts |
optionally, initial values for parameters in the optimization. |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors. |
... |
Additional arguments to optim, such as lower and upper bounds |
This function fits the colonization-extinction model of
MacKenzie et al (2003). The colonization and extinction rates can be
modeled with covariates that vary yearly at each site using a logit
link. These covariates are supplied by special unmarkedMultFrame
yearlySiteCovs
slot. These parameters are specified using the
gammaformula
and epsilonformula
arguments. The initial
probability of occupancy is modeled by covariates specified in the
psiformula
.
The conditional detection rate can also be modeled as a function of
covariates that vary at the secondary sampling period (ie., repeat
visits). These covariates are specified by the first part of the
formula
argument and the data is supplied via the usual
obsCovs
slot.
The projected and smoothed trajectories (Weir et al 2009) can be
obtained from the smoothed.mean
and projected.mean
slots
(see examples).
unmarkedFitColExt object describing model fit.
MacKenzie, D.I. et al. (2002) Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology, 83(8), 2248-2255.
MacKenzie, D. I., J. D. Nichols, J. E. Hines, M. G. Knutson, and A. B. Franklin. 2003. Estimating site occupancy, colonization, and local extinction when a species is detected imperfectly. Ecology 84:2200–2207.
MacKenzie, D. I. et al. (2006) Occupancy Estimation and Modeling.Amsterdam: Academic Press.
Weir L. A., Fiske I. J., Royle J. (2009) Trends in Anuran Occupancy from Northeastern States of the North American Amphibian Monitoring Program. Herpetological Conservation and Biology. 4(3):389-402.
nonparboot
, unmarkedMultFrame
, and formatMult
# Fake data R <- 4 # number of sites J <- 3 # number of secondary sampling occasions T <- 2 # number of primary periods y <- matrix(c( 1,1,0, 0,0,0, 0,0,0, 0,0,0, 1,1,1, 1,1,0, 1,0,1, 0,0,1), nrow=R, ncol=J*T, byrow=TRUE) y site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs yearly.site.covs <- list( year = matrix(c( 'year1', 'year2', 'year1', 'year2', 'year1', 'year2', 'year1', 'year2'), nrow=R, ncol=T, byrow=TRUE) ) yearly.site.covs obs.covs <- list( x4 = matrix(c( -1,0,1, -1,1,1, -2,0,0, 0,0,2, -3,1,0, 1,1,2, 0,0,0, 0,1,-1), nrow=R, ncol=J*T, byrow=TRUE), x5 = matrix(c( 'a','b','c', 'a','b','c', 'd','b','a', 'd','b','a', 'a','a','c', 'd','b','a', 'a','b','a', 'd','b','a'), nrow=R, ncol=J*T, byrow=TRUE)) obs.covs umf <- unmarkedMultFrame(y=y, siteCovs=site.covs, yearlySiteCovs=yearly.site.covs, obsCovs=obs.covs, numPrimary=2) # organize data umf # look at data summary(umf) # summarize fm <- colext(~1, ~1, ~1, ~1, umf) # fit a model fm ## Not run: # Real data data(frogs) umf <- formatMult(masspcru) obsCovs(umf) <- scale(obsCovs(umf)) ## Use 1/4 of data just for run speed in example umf <- umf[which((1:numSites(umf)) %% 4 == 0),] ## constant transition rates (fm <- colext(psiformula = ~ 1, gammaformula = ~ 1, epsilonformula = ~ 1, pformula = ~ JulianDate + I(JulianDate^2), umf, control = list(trace=1, maxit=1e4))) ## get the trajectory estimates smoothed(fm) projected(fm) # Empirical Bayes estimates of number of sites occupied in each year re <- ranef(fm) modes <- colSums(bup(re, stat="mode")) plot(1:7, modes, xlab="Year", ylab="Sites occupied", ylim=c(0, 70)) ## Find bootstrap standard errors for smoothed trajectory fm <- nonparboot(fm, B = 100) # This takes a while! [email protected] ## try yearly transition rates yearlySiteCovs(umf) <- data.frame(year = factor(rep(1:7, numSites(umf)))) (fm.yearly <- colext(psiformula = ~ 1, gammaformula = ~ year, epsilonformula = ~ year, pformula = ~ JulianDate + I(JulianDate^2), umf, control = list(trace=1, maxit=1e4))) ## End(Not run)
# Fake data R <- 4 # number of sites J <- 3 # number of secondary sampling occasions T <- 2 # number of primary periods y <- matrix(c( 1,1,0, 0,0,0, 0,0,0, 0,0,0, 1,1,1, 1,1,0, 1,0,1, 0,0,1), nrow=R, ncol=J*T, byrow=TRUE) y site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs yearly.site.covs <- list( year = matrix(c( 'year1', 'year2', 'year1', 'year2', 'year1', 'year2', 'year1', 'year2'), nrow=R, ncol=T, byrow=TRUE) ) yearly.site.covs obs.covs <- list( x4 = matrix(c( -1,0,1, -1,1,1, -2,0,0, 0,0,2, -3,1,0, 1,1,2, 0,0,0, 0,1,-1), nrow=R, ncol=J*T, byrow=TRUE), x5 = matrix(c( 'a','b','c', 'a','b','c', 'd','b','a', 'd','b','a', 'a','a','c', 'd','b','a', 'a','b','a', 'd','b','a'), nrow=R, ncol=J*T, byrow=TRUE)) obs.covs umf <- unmarkedMultFrame(y=y, siteCovs=site.covs, yearlySiteCovs=yearly.site.covs, obsCovs=obs.covs, numPrimary=2) # organize data umf # look at data summary(umf) # summarize fm <- colext(~1, ~1, ~1, ~1, umf) # fit a model fm ## Not run: # Real data data(frogs) umf <- formatMult(masspcru) obsCovs(umf) <- scale(obsCovs(umf)) ## Use 1/4 of data just for run speed in example umf <- umf[which((1:numSites(umf)) %% 4 == 0),] ## constant transition rates (fm <- colext(psiformula = ~ 1, gammaformula = ~ 1, epsilonformula = ~ 1, pformula = ~ JulianDate + I(JulianDate^2), umf, control = list(trace=1, maxit=1e4))) ## get the trajectory estimates smoothed(fm) projected(fm) # Empirical Bayes estimates of number of sites occupied in each year re <- ranef(fm) modes <- colSums(bup(re, stat="mode")) plot(1:7, modes, xlab="Year", ylab="Sites occupied", ylim=c(0, 70)) ## Find bootstrap standard errors for smoothed trajectory fm <- nonparboot(fm, B = 100) # This takes a while! fm@smoothed.mean.bsse ## try yearly transition rates yearlySiteCovs(umf) <- data.frame(year = factor(rep(1:7, numSites(umf)))) (fm.yearly <- colext(psiformula = ~ 1, gammaformula = ~ year, epsilonformula = ~ year, pformula = ~ JulianDate + I(JulianDate^2), umf, control = list(trace=1, maxit=1e4))) ## End(Not run)
This function computes the weight for the MPLE penalty of Moreno & Lele (2010).
computeMPLElambda(formula, data, knownOcc=numeric(0), starts, method="BFGS",engine=c("C", "R"))
computeMPLElambda(formula, data, knownOcc=numeric(0), starts, method="BFGS",engine=c("C", "R"))
formula |
Double right-hand side formula describing covariates of detection and occupancy in that order. |
data |
An |
knownOcc |
Vector of sites that are known to be occupied. These should be supplied as row numbers of the y matrix, eg, c(3,8) if sites 3 and 8 were known to be occupied a priori. |
starts |
Vector of parameter starting values. |
method |
Optimization method used by |
engine |
Either "C" or "R" to use fast C++ code or native R code during the optimization. |
See occuPEN
for details and examples.
The computed lambda.
Rebecca A. Hutchinson
Moreno, M. and S. R. Lele. 2010. Improved estimation of site occupancy using penalized likelihood. Ecology 91: 341-346.
unmarked
, unmarkedFrameOccu
,
occu
, occuPEN
, occuPEN_CV
,
nonparboot
Methods for function confint
in Package ‘unmarked’
## S4 method for signature 'unmarkedBackTrans' confint(object, parm, level) ## S4 method for signature 'unmarkedEstimate' confint(object, parm, level) ## S4 method for signature 'unmarkedLinComb' confint(object, parm, level) ## S4 method for signature 'unmarkedFit' confint(object, parm, level, type, method)
## S4 method for signature 'unmarkedBackTrans' confint(object, parm, level) ## S4 method for signature 'unmarkedEstimate' confint(object, parm, level) ## S4 method for signature 'unmarkedLinComb' confint(object, parm, level) ## S4 method for signature 'unmarkedFit' confint(object, parm, level, type, method)
object |
Object of appropriate S4 class |
parm |
Name of parameter(s) of interest |
level |
Level of confidence |
type |
Either "state" or "det" |
method |
Either "normal" or "profile" |
A vector of lower and upper confidence intervals. These are asymtotic unless method='profile' is used on unmarkedFit objects in which case they are profile likelihood intervals.
267 1-kmsq quadrats were surveyed 3 times per year during 1999-2007.
data(crossbill)
data(crossbill)
A data frame with 267 observations on the following 58 variables.
id
Plot ID
ele
Elevation
forest
Percent forest cover
surveys
a numeric vector
det991
Detection data for 1999, survey 1
det992
Detection data for 1999, survey 2
det993
Detection data for 1999, survey 3
det001
Detection data for 2000, survey 1
det002
a numeric vector
det003
a numeric vector
det011
a numeric vector
det012
a numeric vector
det013
a numeric vector
det021
a numeric vector
det022
a numeric vector
det023
a numeric vector
det031
a numeric vector
det032
a numeric vector
det033
a numeric vector
det041
a numeric vector
det042
a numeric vector
det043
a numeric vector
det051
a numeric vector
det052
a numeric vector
det053
a numeric vector
det061
a numeric vector
det062
a numeric vector
det063
Detection data for 2006, survey 3
det071
Detection data for 2007, survey 1
det072
Detection data for 2007, survey 2
det073
Detection data for 2007, survey 3
date991
Day of the season for 1999, survey 1
date992
Day of the season for 1999, survey 2
date993
Day of the season for 1999, survey 3
date001
Day of the season for 2000, survey 1
date002
a numeric vector
date003
a numeric vector
date011
a numeric vector
date012
a numeric vector
date013
a numeric vector
date021
a numeric vector
date022
a numeric vector
date023
a numeric vector
date031
a numeric vector
date032
a numeric vector
date033
a numeric vector
date041
a numeric vector
date042
a numeric vector
date043
a numeric vector
date051
a numeric vector
date052
a numeric vector
date053
a numeric vector
date061
a numeric vector
date062
a numeric vector
date063
a numeric vector
date071
a numeric vector
date072
a numeric vector
date073
Day of the season for 2007, survey 3
Schmid, H. N. Zbinden, and V. Keller. 2004. Uberwachung der Bestandsentwicklung haufiger Brutvogel in der Schweiz, Swiss Ornithological Institute Sempach Switzerland
Switzerland
for corresponding covariate data
defined for all 1-kmsq pixels in Switzerland. Useful for making
species distribution maps.
data(crossbill) str(crossbill)
data(crossbill) str(crossbill)
Test predictive accuracy of fitted models using several cross-validation approaches. The dataset is divided by site only into folds or testing and training datasets (i.e., encounter histories within sites are never split up).
## S4 method for signature 'unmarkedFit' crossVal( object, method=c("Kfold","holdout","leaveOneOut"), folds=10, holdoutPct=0.25, statistic=RMSE_MAE, parallel=FALSE, ncores, ...) ## S4 method for signature 'unmarkedFitList' crossVal( object, method=c("Kfold","holdout","leaveOneOut"), folds=10, holdoutPct=0.25, statistic=RMSE_MAE, parallel=FALSE, ncores, sort = c("none", "increasing", "decreasing"), ...)
## S4 method for signature 'unmarkedFit' crossVal( object, method=c("Kfold","holdout","leaveOneOut"), folds=10, holdoutPct=0.25, statistic=RMSE_MAE, parallel=FALSE, ncores, ...) ## S4 method for signature 'unmarkedFitList' crossVal( object, method=c("Kfold","holdout","leaveOneOut"), folds=10, holdoutPct=0.25, statistic=RMSE_MAE, parallel=FALSE, ncores, sort = c("none", "increasing", "decreasing"), ...)
object |
A fitted model inheriting class |
method |
Cross validation method to use as string. Valid options are
|
folds |
Number of folds to use for k-fold cross validation |
holdoutPct |
Proportion of dataset (value between 0-1) to use as the "holdout" or "test" set, for the holdout method |
statistic |
Function that calculates statistics for each fold. The
function must take an |
parallel |
If |
ncores |
Number of parallel cores to use. |
sort |
If doing cross-validation on a |
... |
Other arguments passed to the statistic function. |
unmarkedCrossVal
or unmarkedCrossValList
object containing
calculated statistic values for each fold.
Ken Kellner [email protected]
## Not run: #Get data data(frogs) pferUMF <- unmarkedFrameOccu(pfer.bin) siteCovs(pferUMF) <- data.frame(sitevar1 = rnorm(numSites(pferUMF))) obsCovs(pferUMF) <- data.frame(obsvar1 = rnorm(numSites(pferUMF) * obsNum(pferUMF))) #Fit occupancy model fm <- occu(~ obsvar1 ~ 1, pferUMF) #k-fold cross validation with 10 folds (kfold = crossVal(fm, method="Kfold", folds=10)) #holdout method with 25 (holdout = crossVal(fm,method='holdout', holdoutPct=0.25)) #Leave-one-out method (leave = crossVal(fm, method='leaveOneOut')) #Fit a second model and combine into a fitList fm2 <- occu(~1 ~1, pferUMF) fl <- fitList(fm2,fm) #Cross-validation for all fits in fitList using holdout method (cvlist <- crossVal(fl, method='holdout')) ## End(Not run)
## Not run: #Get data data(frogs) pferUMF <- unmarkedFrameOccu(pfer.bin) siteCovs(pferUMF) <- data.frame(sitevar1 = rnorm(numSites(pferUMF))) obsCovs(pferUMF) <- data.frame(obsvar1 = rnorm(numSites(pferUMF) * obsNum(pferUMF))) #Fit occupancy model fm <- occu(~ obsvar1 ~ 1, pferUMF) #k-fold cross validation with 10 folds (kfold = crossVal(fm, method="Kfold", folds=10)) #holdout method with 25 (holdout = crossVal(fm,method='holdout', holdoutPct=0.25)) #Leave-one-out method (leave = crossVal(fm, method='leaveOneOut')) #Fit a second model and combine into a fitList fm2 <- occu(~1 ~1, pferUMF) fl <- fitList(fm2,fm) #Cross-validation for all fits in fitList using holdout method (cvlist <- crossVal(fl, method='holdout')) ## End(Not run)
Spatially-referenced elevation, forest cover, and vegetation data for Santa Cruz Island.
data(cruz)
data(cruz)
A data frame with 2787 observations on the following 5 variables.
x
Easting (meters)
y
Northing (meters)
elevation
a numeric vector, FEET (multiply by 0.3048 to convert to meters)
forest
a numeric vector, proportion cover
chaparral
a numeric vector, proportion cover
The resolution is 300x300 meters.
The Coordinate system is EPSG number 26911
NAD_1983_UTM_Zone_11N Projection: Transverse_Mercator False_Easting: 500000.000000 False_Northing: 0.000000 Central_Meridian: -117.000000 Scale_Factor: 0.999600 Latitude_Of_Origin: 0.000000 Linear Unit: Meter GCS_North_American_1983 Datum: D_North_American_1983
Brian Cohen of the Nature Conservancy helped prepare the data
Sillett, S. and Chandler, R.B. and Royle, J.A. and Kery, M. and Morrison, S.A. In Press. Hierarchical distance sampling models to estimate population size and habitat-specific abundance of an island endemic. Ecological Applications
## Not run: library(lattice) data(cruz) str(cruz) levelplot(elevation ~ x + y, cruz, aspect="iso", col.regions=terrain.colors(100)) if(require(raster)) { elev <- rasterFromXYZ(cruz[,1:3], crs="+proj=utm +zone=11 +ellps=GRS80 +datum=NAD83 +units=m +no_defs") elev plot(elev) } ## End(Not run)
## Not run: library(lattice) data(cruz) str(cruz) levelplot(elevation ~ x + y, cruz, aspect="iso", col.regions=terrain.colors(100)) if(require(raster)) { elev <- rasterFromXYZ(cruz[,1:3], crs="+proj=utm +zone=11 +ellps=GRS80 +datum=NAD83 +units=m +no_defs") elev plot(elev) } ## End(Not run)
This function converts an appropriatedly formated comma-separated values file (.csv) to a format usable by unmarked's fitting functions (see Details).
csvToUMF(filename, long=FALSE, type, species, ...)
csvToUMF(filename, long=FALSE, type, species, ...)
filename |
string describing filename of file to read in |
long |
|
species |
if data is in long format with multiple species, then this can specify a particular species to extract if there is a column named "species". |
type |
specific type of unmarkedFrame. |
... |
further arguments to be passed to the unmarkedFrame constructor. |
This function provides a quick way to take a .csv file with headers
named as described below and provides the data required and returns of
data in the format required by the model-fitting functions in
unmarked
. The .csv file can be in one of 2 formats: long or
wide. See the first 2 lines of the examples for what these
formats look like.
The .csv file is formatted as follows:
col 1 is site labels.
if data is in long format, col 2 is date of observation.
next J columns are the observations (y) - counts or 0/1's.
next is a series of columns for the site variables (one column per variable). The column header is the variable name.
next is a series of columns for the observation-level variables. These are in sets of J columns for each variable, e.g., var1-1 var1-2 var1-3 var2-1 var2-2 var2-3, etc. The column header of the first variable in each group must indicate the variable name.
an unmarkedFrame object
Ian Fiske [email protected]
# examine a correctly formatted long .csv head(read.csv(system.file("csv","frog2001pcru.csv", package="unmarked"))) # examine a correctly formatted wide .csv head(read.csv(system.file("csv","widewt.csv", package="unmarked"))) # convert them! dat1 <- csvToUMF(system.file("csv","frog2001pcru.csv", package="unmarked"), long = TRUE, type = "unmarkedFrameOccu") dat2 <- csvToUMF(system.file("csv","frog2001pfer.csv", package="unmarked"), long = TRUE, type = "unmarkedFrameOccu") dat3 <- csvToUMF(system.file("csv","widewt.csv", package="unmarked"), long = FALSE, type = "unmarkedFrameOccu")
# examine a correctly formatted long .csv head(read.csv(system.file("csv","frog2001pcru.csv", package="unmarked"))) # examine a correctly formatted wide .csv head(read.csv(system.file("csv","widewt.csv", package="unmarked"))) # convert them! dat1 <- csvToUMF(system.file("csv","frog2001pcru.csv", package="unmarked"), long = TRUE, type = "unmarkedFrameOccu") dat2 <- csvToUMF(system.file("csv","frog2001pfer.csv", package="unmarked"), long = TRUE, type = "unmarkedFrameOccu") dat3 <- csvToUMF(system.file("csv","widewt.csv", package="unmarked"), long = FALSE, type = "unmarkedFrameOccu")
These functions represent the currently available detection
functions used for modeling line and point transect data with
distsamp. Detection functions begin with "g", and density
functions begin with a "d".
gxhn(x, sigma) gxexp(x, rate) gxhaz(x, shape, scale) dxhn(x, sigma) dxexp(x, rate) dxhaz(x, shape, scale) drhn(r, sigma) drexp(r, rate) drhaz(r, shape, scale)
gxhn(x, sigma) gxexp(x, rate) gxhaz(x, shape, scale) dxhn(x, sigma) dxexp(x, rate) dxhaz(x, shape, scale) drhn(r, sigma) drexp(r, rate) drhaz(r, shape, scale)
x |
Perpendicular distance |
r |
Radial distance |
sigma |
Shape parameter of half-normal detection function |
rate |
Shape parameter of negative-exponential detection function |
shape |
Shape parameter of hazard-rate detection function |
scale |
Scale parameter of hazard-rate detection function |
distsamp
for example of using these for plotting
detection function
# Detection probabilities at 25m for range of half-normal sigma values. round(gxhn(25, 10:15), 2) # Plot negative exponential distributions plot(function(x) gxexp(x, rate=10), 0, 50, xlab="distance", ylab="Detection probability") plot(function(x) gxexp(x, rate=20), 0, 50, add=TRUE, lty=2) plot(function(x) gxexp(x, rate=30), 0, 50, add=TRUE, lty=3) # Plot half-normal probability density functions for line- and point-transects par(mfrow=c(2, 1)) plot(function(x) dxhn(x, 20), 0, 50, xlab="distance", ylab="Probability density", main="Line-transect") plot(function(x) drhn(x, 20), 0, 50, xlab="distance", ylab="Probability density", main="Point-transect")
# Detection probabilities at 25m for range of half-normal sigma values. round(gxhn(25, 10:15), 2) # Plot negative exponential distributions plot(function(x) gxexp(x, rate=10), 0, 50, xlab="distance", ylab="Detection probability") plot(function(x) gxexp(x, rate=20), 0, 50, add=TRUE, lty=2) plot(function(x) gxexp(x, rate=30), 0, 50, add=TRUE, lty=3) # Plot half-normal probability density functions for line- and point-transects par(mfrow=c(2, 1)) plot(function(x) dxhn(x, 20), 0, 50, xlab="distance", ylab="Probability density", main="Line-transect") plot(function(x) drhn(x, 20), 0, 50, xlab="distance", ylab="Probability density", main="Point-transect")
Fit the hierarchical distance sampling model of Royle et al. (2004) to line or point transect data recorded in discrete distance intervals.
distsamp(formula, data, keyfun=c("halfnorm", "exp", "hazard", "uniform"), output=c("density", "abund"), unitsOut=c("ha", "kmsq"), starts, method="BFGS", se=TRUE, engine=c("C", "R", "TMB"), rel.tol=0.001, ...)
distsamp(formula, data, keyfun=c("halfnorm", "exp", "hazard", "uniform"), output=c("density", "abund"), unitsOut=c("ha", "kmsq"), starts, method="BFGS", se=TRUE, engine=c("C", "R", "TMB"), rel.tol=0.001, ...)
formula |
Double right-hand formula describing detection covariates followed by abundance covariates. ~1 ~1 would be a null model. |
data |
object of class |
keyfun |
One of the following detection functions: "halfnorm", "hazard", "exp", or "uniform." See details. |
output |
Model either "density" or "abund" |
unitsOut |
Units of density. Either "ha" or "kmsq" for hectares and square kilometers, respectively. |
starts |
Vector of starting values for parameters. |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors. |
engine |
Use code written in C++ or R |
rel.tol |
Requested relative accuracy of the integral, see
|
... |
Additional arguments to optim, such as lower and upper bounds |
Unlike conventional distance sampling, which uses the 'conditional on detection' likelihood formulation, this model is based upon the unconditional likelihood and allows for modeling both abundance and detection function parameters.
The latent transect-level abundance distribution
assumed to be
Poisson with mean
(but see
gdistsamp
for alternatives).
The detection process is modeled as multinomial:
,
where
is the multinomial cell probability for transect i in
distance class j. These are computed based upon a detection function
, such as the half-normal,
negative exponential, or hazard rate.
Parameters and
can be vectors
affected by transect-specific covariates using the log link.
unmarkedFitDS object (child class of unmarkedFit-class
)
describing the model fit.
You cannot use obsCovs.
Richard Chandler [email protected]
Royle, J. A., D. K. Dawson, and S. Bates (2004) Modeling abundance effects in distance sampling. Ecology 85, pp. 1591-1597.
Sillett, S. and Chandler, R.B. and Royle, J.A. and Kery, M. and Morrison, S.A. In Press. Hierarchical distance sampling models to estimate population size and habitat-specific abundance of an island endemic. Ecological Applications
unmarkedFrameDS
,
unmarkedFit-class
fitList
,
formatDistData
, parboot
,
sight2perpdist
, detFuns
,
gdistsamp
, ranef
.
Also look at vignette("distsamp").
## Line transect examples data(linetran) ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = c(0, 5, 10, 15, 20), tlength = linetran$Length * 1000, survey = "line", unitsIn = "m") }) ltUMF summary(ltUMF) hist(ltUMF) # Half-normal detection function. Density output (log scale). No covariates. (fm1 <- distsamp(~ 1 ~ 1, ltUMF)) # Some methods to use on fitted model summary(fm1) predict(fm1, type = "state")[1,] # animals / ha exp(coef(fm1, type="state", altNames=TRUE)) # same predict(fm1, type = "det")[1,] # half-normal SD hist(fm1, xlab="Distance (m)") # Only works when there are no det covars # Empirical Bayes estimates of posterior distribution for N_i plot(ranef(fm1, K=50)) # Effective strip half-width (eshw <- integrate(gxhn, 0, 20, sigma=10.9)$value) # Detection probability eshw / 20 # 20 is strip-width # Halfnormal. Covariates affecting both density and and detection. (fm2 <- distsamp(~area + habitat ~ habitat, ltUMF)) # Hazard-rate detection function. (fm3 <- distsamp(~ 1 ~ 1, ltUMF, keyfun="hazard")) # Plot detection function. fmhz.shape <- exp(coef(fm3, type="det")) fmhz.scale <- exp(coef(fm3, type="scale")) plot(function(x) gxhaz(x, shape=fmhz.shape, scale=fmhz.scale), 0, 25, xlab="Distance (m)", ylab="Detection probability") ## Point transect examples # Analysis of the Island Scrub-jay data. # See Sillett et al. (In press) data(issj) str(issj) jayumf <- unmarkedFrameDS(y=as.matrix(issj[,1:3]), siteCovs=data.frame(scale(issj[,c("elevation","forest","chaparral")])), dist.breaks=c(0,100,200,300), unitsIn="m", survey="point") (fm1jay <- distsamp(~chaparral ~chaparral, jayumf)) ## Not run: data(pointtran) ptUMF <- with(pointtran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4, dc5), siteCovs = data.frame(area, habitat), dist.breaks = seq(0, 25, by=5), survey = "point", unitsIn = "m") }) # Half-normal. (fmp1 <- distsamp(~ 1 ~ 1, ptUMF)) hist(fmp1, ylim=c(0, 0.07), xlab="Distance (m)") # effective radius sig <- exp(coef(fmp1, type="det")) ea <- 2*pi * integrate(grhn, 0, 25, sigma=sig)$value # effective area sqrt(ea / pi) # effective radius # detection probability ea / (pi*25^2) ## End(Not run)
## Line transect examples data(linetran) ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = c(0, 5, 10, 15, 20), tlength = linetran$Length * 1000, survey = "line", unitsIn = "m") }) ltUMF summary(ltUMF) hist(ltUMF) # Half-normal detection function. Density output (log scale). No covariates. (fm1 <- distsamp(~ 1 ~ 1, ltUMF)) # Some methods to use on fitted model summary(fm1) predict(fm1, type = "state")[1,] # animals / ha exp(coef(fm1, type="state", altNames=TRUE)) # same predict(fm1, type = "det")[1,] # half-normal SD hist(fm1, xlab="Distance (m)") # Only works when there are no det covars # Empirical Bayes estimates of posterior distribution for N_i plot(ranef(fm1, K=50)) # Effective strip half-width (eshw <- integrate(gxhn, 0, 20, sigma=10.9)$value) # Detection probability eshw / 20 # 20 is strip-width # Halfnormal. Covariates affecting both density and and detection. (fm2 <- distsamp(~area + habitat ~ habitat, ltUMF)) # Hazard-rate detection function. (fm3 <- distsamp(~ 1 ~ 1, ltUMF, keyfun="hazard")) # Plot detection function. fmhz.shape <- exp(coef(fm3, type="det")) fmhz.scale <- exp(coef(fm3, type="scale")) plot(function(x) gxhaz(x, shape=fmhz.shape, scale=fmhz.scale), 0, 25, xlab="Distance (m)", ylab="Detection probability") ## Point transect examples # Analysis of the Island Scrub-jay data. # See Sillett et al. (In press) data(issj) str(issj) jayumf <- unmarkedFrameDS(y=as.matrix(issj[,1:3]), siteCovs=data.frame(scale(issj[,c("elevation","forest","chaparral")])), dist.breaks=c(0,100,200,300), unitsIn="m", survey="point") (fm1jay <- distsamp(~chaparral ~chaparral, jayumf)) ## Not run: data(pointtran) ptUMF <- with(pointtran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4, dc5), siteCovs = data.frame(area, habitat), dist.breaks = seq(0, 25, by=5), survey = "point", unitsIn = "m") }) # Half-normal. (fmp1 <- distsamp(~ 1 ~ 1, ptUMF)) hist(fmp1, ylim=c(0, 0.07), xlab="Distance (m)") # effective radius sig <- exp(coef(fmp1, type="det")) ea <- 2*pi * integrate(grhn, 0, 25, sigma=sig)$value # effective area sqrt(ea / pi) # effective radius # detection probability ea / (pi*25^2) ## End(Not run)
Fit the model of Dail and Madsen (2011) and Hostetler and Chandler (2015) with a distance sampling observation model (Sollmann et al. 2015).
distsampOpen(lambdaformula, gammaformula, omegaformula, pformula, data, keyfun=c("halfnorm", "exp", "hazard", "uniform"), output=c("abund", "density"), unitsOut=c("ha", "kmsq"), mixture=c("P", "NB", "ZIP"), K, dynamics=c("constant", "autoreg", "notrend", "trend", "ricker", "gompertz"), fix=c("none", "gamma", "omega"), immigration=FALSE, iotaformula = ~1, starts, method="BFGS", se=TRUE, ...)
distsampOpen(lambdaformula, gammaformula, omegaformula, pformula, data, keyfun=c("halfnorm", "exp", "hazard", "uniform"), output=c("abund", "density"), unitsOut=c("ha", "kmsq"), mixture=c("P", "NB", "ZIP"), K, dynamics=c("constant", "autoreg", "notrend", "trend", "ricker", "gompertz"), fix=c("none", "gamma", "omega"), immigration=FALSE, iotaformula = ~1, starts, method="BFGS", se=TRUE, ...)
lambdaformula |
Right-hand sided formula for initial abundance |
gammaformula |
Right-hand sided formula for recruitment rate (when dynamics is "constant", "autoreg", or "notrend") or population growth rate (when dynamics is "trend", "ricker", or "gompertz") |
omegaformula |
Right-hand sided formula for apparent survival probability (when dynamics is "constant", "autoreg", or "notrend") or equilibrium abundance (when dynamics is "ricker" or "gompertz") |
pformula |
A right-hand side formula describing the detection function covariates |
data |
An object of class |
keyfun |
One of the following detection functions: "halfnorm", "hazard", "exp", or "uniform" |
output |
Model either "density" or "abund" |
unitsOut |
Units of density. Either "ha" or "kmsq" for hectares and square kilometers, respectively |
mixture |
String specifying mixture: "P", "NB", or "ZIP" for the Poisson, negative binomial, or zero-inflated Poisson distributions respectively |
K |
Integer defining upper bound of discrete integration. This should be higher than the maximum observed count and high enough that it does not affect the parameter estimates. However, the higher the value the slower the computation |
dynamics |
Character string describing the type of population dynamics. "constant" indicates that there is no relationship between omega and gamma. "autoreg" is an auto-regressive model in which recruitment is modeled as gamma*N[i,t-1]. "notrend" model gamma as lambda*(1-omega) such that there is no temporal trend. "trend" is a model for exponential growth, N[i,t] = N[i,t-1]*gamma, where gamma in this case is finite rate of increase (normally referred to as lambda). "ricker" and "gompertz" are models for density-dependent population growth. "ricker" is the Ricker-logistic model, N[i,t] = N[i,t-1]*exp(gamma*(1-N[i,t-1]/omega)), where gamma is the maximum instantaneous population growth rate (normally referred to as r) and omega is the equilibrium abundance (normally referred to as K). "gompertz" is a modified version of the Gompertz-logistic model, N[i,t] = N[i,t-1]*exp(gamma*(1-log(N[i,t-1]+1)/log(omega+1))), where the interpretations of gamma and omega are similar to in the Ricker model |
fix |
If "omega", omega is fixed at 1. If "gamma", gamma is fixed at 0 |
immigration |
Logical specifying whether or not to include an immigration term (iota) in population dynamics |
iotaformula |
Right-hand sided formula for average number of immigrants to a site per time step |
starts |
Vector of starting values |
method |
Optimization method used by |
se |
Logical specifying whether or not to compute standard errors |
... |
Additional arguments to optim, such as lower and upper bounds |
These models generalize distance sampling models (Buckland et al. 2001) by relaxing the closure assumption (Dail and Madsen 2011, Hostetler and Chandler 2015, Sollmann et al. 2015).
The models include two or three additional parameters: gamma, either the recruitment rate (births and immigrations), the finite rate of increase, or the maximum instantaneous rate of increase; omega, either the apparent survival rate (deaths and emigrations) or the equilibrium abundance (carrying capacity); and iota, the number of immigrants per site and year. Estimates of population size at each time period can be derived from these parameters, and thus so can trend estimates. Or, trend can be estimated directly using dynamics="trend".
When immigration is set to FALSE (the default), iota is not modeled. When immigration is set to TRUE and dynamics is set to "autoreg", the model will separately estimate birth rate (gamma) and number of immigrants (iota). When immigration is set to TRUE and dynamics is set to "trend", "ricker", or "gompertz", the model will separately estimate local contributions to population growth (gamma and omega) and number of immigrants (iota).
The latent abundance distribution, can be set as a Poisson, negative binomial, or zero-inflated
Poisson random
variable, depending on the setting of the
mixture
argument,
mixture = "P"
, mixture = "NB"
, mixture = "ZIP"
respectively. For the first two distributions, the mean of is
. If
, then an
additional parameter,
, describes dispersion (lower
implies higher variance). For the ZIP distribution,
the mean is
, where psi is the
zero-inflation parameter.
For "constant", "autoreg", or "notrend" dynamics, the latent abundance state
following the initial sampling period arises
from a
Markovian process in which survivors are modeled as , and recruits
follow
.
Alternative population dynamics can be specified
using the
dynamics
and immigration
arguments.
,
, and
are modeled
using the the log link.
is modeled using
the logit link.
is either modeled using the logit link (for
"constant", "autoreg", or "notrend" dynamics) or the log link (for "ricker"
or "gompertz" dynamics). For "trend" dynamics,
is not modeled.
For the distance sampling detection process, half-normal ("halfnorm"
),
exponential ("exp"
), hazard ("hazard"
), and uniform
("uniform"
) key functions are available.
An object of class unmarkedFitDSO
This function can be extremely slow, especially if there are covariates of gamma or omega. Consider testing the timing on a small subset of the data, perhaps with se=FALSE. Finding the lowest value of K that does not affect estimates will also help with speed.
When gamma or omega are modeled using year-specific covariates, the covariate data for the final year will be ignored; however, they must be supplied.
If the time gap between primary periods is not constant, an M by T
matrix of integers should be supplied to unmarkedFrameDSO
using the primaryPeriod
argument.
Secondary sampling periods are optional, but can greatly improve the precision of the estimates.
Optimization may fail if the initial value of the intercept for the detection parameter (sigma) is too small or large relative to transect width. By default, this parameter is initialized at log(average band width). You may have to adjust this starting value.
Richard Chandler, Jeff Hostetler, Andy Royle, Ken Kellner
Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L. and Thomas, L. (2001) Introduction to Distance Sampling: Estimating Abundance of Biological Populations. Oxford University Press, Oxford, UK.
Dail, D. and L. Madsen (2011) Models for Estimating Abundance from Repeated Counts of an Open Metapopulation. Biometrics. 67: 577-587.
Hostetler, J. A. and R. B. Chandler (2015) Improved State-space Models for Inference about Spatial and Temporal Variation in Abundance from Count Data. Ecology 96: 1713-1723.
Sollmann, R., Gardner, B., Chandler, R.B., Royle, J.A. and Sillett, T.S. (2015) An open-population hierarchical distance sampling model. Ecology 96: 325-331.
distsamp, gdistsamp, unmarkedFrameDSO
## Not run: #Generate some data set.seed(123) lambda=4; gamma=0.5; omega=0.8; sigma=25; M=100; T=10; J=4 y <- array(NA, c(M, J, T)) N <- matrix(NA, M, T) S <- G <- matrix(NA, M, T-1) db <- c(0, 25, 50, 75, 100) #Half-normal, line transect g <- function(x, sig) exp(-x^2/(2*sig^2)) cp <- u <- a <- numeric(J) L <- 1 a[1] <- L*db[2] cp[1] <- integrate(g, db[1], db[2], sig=sigma)$value for(j in 2:J) { a[j] <- db[j+1] - sum(a[1:j]) cp[j] <- integrate(g, db[j], db[j+1], sig=sigma)$value } u <- a / sum(a) cp <- cp / a * u cp[j+1] <- 1-sum(cp) for(i in 1:M) { N[i,1] <- rpois(1, lambda) y[i,1:J,1] <- rmultinom(1, N[i,1], cp)[1:J] for(t in 1:(T-1)) { S[i,t] <- rbinom(1, N[i,t], omega) G[i,t] <- rpois(1, gamma) N[i,t+1] <- S[i,t] + G[i,t] y[i,1:J,t+1] <- rmultinom(1, N[i,t+1], cp)[1:J] } } y <- matrix(y, M) #Make a covariate sc <- data.frame(x1 = rnorm(M)) umf <- unmarkedFrameDSO(y = y, siteCovs=sc, numPrimary=T, dist.breaks=db, survey="line", unitsIn="m", tlength=rep(1, M)) (fit <- distsampOpen(~x1, ~1, ~1, ~1, data = umf, K=50, keyfun="halfnorm")) #Compare to truth cf <- coef(fit) data.frame(model=c(exp(cf[1]), cf[2], exp(cf[3]), plogis(cf[4]), exp(cf[5])), truth=c(lambda, 0, gamma, omega, sigma)) #Predict head(predict(fit, type='lambda')) #Check fit with parametric bootstrap pb <- parboot(fit, nsims=15) plot(pb) # Empirical Bayes estimates of abundance for each site / year re <- ranef(fit) plot(re, layout=c(10,5), xlim=c(-1, 10)) ## End(Not run)
## Not run: #Generate some data set.seed(123) lambda=4; gamma=0.5; omega=0.8; sigma=25; M=100; T=10; J=4 y <- array(NA, c(M, J, T)) N <- matrix(NA, M, T) S <- G <- matrix(NA, M, T-1) db <- c(0, 25, 50, 75, 100) #Half-normal, line transect g <- function(x, sig) exp(-x^2/(2*sig^2)) cp <- u <- a <- numeric(J) L <- 1 a[1] <- L*db[2] cp[1] <- integrate(g, db[1], db[2], sig=sigma)$value for(j in 2:J) { a[j] <- db[j+1] - sum(a[1:j]) cp[j] <- integrate(g, db[j], db[j+1], sig=sigma)$value } u <- a / sum(a) cp <- cp / a * u cp[j+1] <- 1-sum(cp) for(i in 1:M) { N[i,1] <- rpois(1, lambda) y[i,1:J,1] <- rmultinom(1, N[i,1], cp)[1:J] for(t in 1:(T-1)) { S[i,t] <- rbinom(1, N[i,t], omega) G[i,t] <- rpois(1, gamma) N[i,t+1] <- S[i,t] + G[i,t] y[i,1:J,t+1] <- rmultinom(1, N[i,t+1], cp)[1:J] } } y <- matrix(y, M) #Make a covariate sc <- data.frame(x1 = rnorm(M)) umf <- unmarkedFrameDSO(y = y, siteCovs=sc, numPrimary=T, dist.breaks=db, survey="line", unitsIn="m", tlength=rep(1, M)) (fit <- distsampOpen(~x1, ~1, ~1, ~1, data = umf, K=50, keyfun="halfnorm")) #Compare to truth cf <- coef(fit) data.frame(model=c(exp(cf[1]), cf[2], exp(cf[3]), plogis(cf[4]), exp(cf[5])), truth=c(lambda, 0, gamma, omega, sigma)) #Predict head(predict(fit, type='lambda')) #Check fit with parametric bootstrap pb <- parboot(fit, nsims=15) plot(pb) # Empirical Bayes estimates of abundance for each site / year re <- ranef(fit) plot(re, layout=c(10,5), xlim=c(-1, 10)) ## End(Not run)
Organize models for model selection or model-averaged prediction.
fitList(..., fits, autoNames=c("object", "formula"))
fitList(..., fits, autoNames=c("object", "formula"))
... |
Fitted models. Preferrably named. |
fits |
An alternative way of providing the models. A (preferrably named) list of fitted models. |
autoNames |
Option to change the names |
Two requirements exist to conduct AIC-based model-selection and model-averaging in unmarked. First, the data objects (ie, unmarkedFrames) must be identical among fitted models. Second, the response matrix must be identical among fitted models after missing values have been removed. This means that if a response value was removed in one model due to missingness, it needs to be removed from all models.
Richard Chandler [email protected]
data(linetran) (dbreaksLine <- c(0, 5, 10, 15, 20)) lengths <- linetran$Length * 1000 ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = dbreaksLine, tlength = lengths, survey = "line", unitsIn = "m") }) fm1 <- distsamp(~ 1 ~1, ltUMF) fm2 <- distsamp(~ area ~1, ltUMF) fm3 <- distsamp( ~ 1 ~area, ltUMF) ## Two methods of creating an unmarkedFitList using fitList() # Method 1 fmList <- fitList(Null=fm1, .area=fm2, area.=fm3) # Method 2. Note that the arugment name "fits" must be included in call. models <- list(Null=fm1, .area=fm2, area.=fm3) fmList <- fitList(fits = models) # Extract coefficients and standard errors coef(fmList) SE(fmList) # Model-averaged prediction predict(fmList, type="state") # Model selection modSel(fmList, nullmod="Null")
data(linetran) (dbreaksLine <- c(0, 5, 10, 15, 20)) lengths <- linetran$Length * 1000 ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = dbreaksLine, tlength = lengths, survey = "line", unitsIn = "m") }) fm1 <- distsamp(~ 1 ~1, ltUMF) fm2 <- distsamp(~ area ~1, ltUMF) fm3 <- distsamp( ~ 1 ~area, ltUMF) ## Two methods of creating an unmarkedFitList using fitList() # Method 1 fmList <- fitList(Null=fm1, .area=fm2, area.=fm3) # Method 2. Note that the arugment name "fits" must be included in call. models <- list(Null=fm1, .area=fm2, area.=fm3) fmList <- fitList(fits = models) # Extract coefficients and standard errors coef(fmList) SE(fmList) # Model-averaged prediction predict(fmList, type="state") # Model selection modSel(fmList, nullmod="Null")
Calculates expected values from a model. The approach varies based on the model type but is usually the product of the state and detection parameters.
## S4 method for signature 'unmarkedFit' fitted(object, ...)
## S4 method for signature 'unmarkedFit' fitted(object, ...)
object |
A |
... |
Other arguments, currently ignored. |
In most cases, fitted
returns a matrix of expected values with the
same dimensions as the y
matrix (M x J or M x JT). For some models with
multiple observation processes or multiple species, such as gdistremoval
or occuMulti
, the output may be a named list of matrices.
Convert individual-level distance data to the transect-level
format required by distsamp
or gdistsamp
formatDistData(distData, distCol, transectNameCol, dist.breaks, occasionCol, effortMatrix)
formatDistData(distData, distCol, transectNameCol, dist.breaks, occasionCol, effortMatrix)
distData |
data.frame where each row is a detected individual. Must have at least 2 columns. One for distances and the other for transect names. |
distCol |
character, name of the column in distData that contains the distances. The distances should be numeric. |
transectNameCol |
character, column name containing transect names. The transect column should be a factor. |
dist.breaks |
numeric vector of distance interval cutpoints. Length must equal J+1. |
occasionCol |
optional character. If transects were visited more
than once, this can be used to format data for |
effortMatrix |
optional matrix of 1 and 0s that is M * T in size and will allow for the insertion of NAs where the matrix = 0, indicating that a survey was not completed. When not supplied a matrix of all 1s is created since it is assumed all surveys were completed. |
This function creates a site (M) by distance interval (J) response
matrix from a data.frame containing the detection distances for each
individual and the transect names. Alternatively, if each transect was
surveyed T times, the resulting matrix is M x JT, which is the format
required by gdistsamp
, seeunmarkedFrameGDS
.
An M x J or M x JT matrix containing the binned distance data. Transect names will become rownames and colnames will describe the distance intervals.
It is important that the factor containing transect names includes levels for all the transects surveyed, not just those with >=1 detection. Likewise, if transects were visited more than once, the factor containing the occasion numbers should include levels for all occasions. See the example for how to add levels to a factor.
# Create a data.frame containing distances of animals detected # along 4 transects. dat <- data.frame(transect=gl(4,5, labels=letters[1:4]), distance=rpois(20, 10)) dat # Look at your transect names. levels(dat$transect) # Suppose that you also surveyed a transect named "e" where no animals were # detected. You must add it to the levels of dat$transect levels(dat$transect) <- c(levels(dat$transect), "e") levels(dat$transect) # Distance cut points defining distance intervals cp <- c(0, 8, 10, 12, 14, 18) # Create formated response matrix yDat <- formatDistData(dat, "distance", "transect", cp) yDat # Now you could merge yDat with transect-level covariates and # then use unmarkedFrameDS to prepare data for distsamp ## Example for data from multiple occasions dat2 <- data.frame(distance=1:100, site=gl(5, 20), visit=factor(rep(1:4, each=5))) cutpt <- seq(0, 100, by=25) y2 <- formatDistData(dat2, "distance", "site", cutpt, "visit") umf <- unmarkedFrameGDS(y=y2, numPrimary=4, survey="point", dist.breaks=cutpt, unitsIn="m") ## Example for datda from multiple occasions with effortMatrix dat3 <- data.frame(distance=1:100, site=gl(5, 20), visit=factor(rep(1:4, each=5))) cutpt <- seq(0, 100, by=25) effortMatrix <- matrix(ncol=4, nrow=5, rbinom(20,1,0.8)) y3 <- formatDistData(dat2, "distance", "site", cutpt, "visit", effortMatrix)
# Create a data.frame containing distances of animals detected # along 4 transects. dat <- data.frame(transect=gl(4,5, labels=letters[1:4]), distance=rpois(20, 10)) dat # Look at your transect names. levels(dat$transect) # Suppose that you also surveyed a transect named "e" where no animals were # detected. You must add it to the levels of dat$transect levels(dat$transect) <- c(levels(dat$transect), "e") levels(dat$transect) # Distance cut points defining distance intervals cp <- c(0, 8, 10, 12, 14, 18) # Create formated response matrix yDat <- formatDistData(dat, "distance", "transect", cp) yDat # Now you could merge yDat with transect-level covariates and # then use unmarkedFrameDS to prepare data for distsamp ## Example for data from multiple occasions dat2 <- data.frame(distance=1:100, site=gl(5, 20), visit=factor(rep(1:4, each=5))) cutpt <- seq(0, 100, by=25) y2 <- formatDistData(dat2, "distance", "site", cutpt, "visit") umf <- unmarkedFrameGDS(y=y2, numPrimary=4, survey="point", dist.breaks=cutpt, unitsIn="m") ## Example for datda from multiple occasions with effortMatrix dat3 <- data.frame(distance=1:100, site=gl(5, 20), visit=factor(rep(1:4, each=5))) cutpt <- seq(0, 100, by=25) effortMatrix <- matrix(ncol=4, nrow=5, rbinom(20,1,0.8)) y3 <- formatDistData(dat2, "distance", "site", cutpt, "visit", effortMatrix)
This convenience function converts multi-year data in long format to unmarkedMultFrame Object. See Details for more information.
formatMult(df.in)
formatMult(df.in)
df.in |
a data.frame appropriately formatted (see Details). |
df.in
is a data frame with columns formatted as follows:
Column 1 = year number
Column 2 = site name or number
Column 3 = julian date or chronological sample number during year
Column 4 = observations (y)
Column 5 – Final Column = covariates
Note that if the data is already in wide format, it may be easier to create an unmarkedMultFrame object
directly with a call to unmarkedMultFrame
.
unmarkedMultFrame object
Convert a data.frame between wide and long formats.
formatWide(dfin, sep = ".", obsToY, type, ...) formatLong(dfin, species = NULL, type, ...)
formatWide(dfin, sep = ".", obsToY, type, ...) formatLong(dfin, species = NULL, type, ...)
dfin |
A data.frame to be reformatted. |
sep |
A seperator of column names in wide format. |
obsToY |
Optional matrix specifying relationship between covariate column structure and response matrix structure. |
type |
Type of unmarkedFrame to create? |
species |
Character name of species response column |
... |
Further arguments to the unmarkedFrame* constructor functions |
Note that not all possible unmarkedFrame
* classes have been tested with these
functions. Multinomial data sets (e.g., removal, double-observer, capture-recapture)
are almost certainly easier to enter directly to the constructor function and are not
supported by formatLong
or formatWide
.
In order for these functions to work, the columns of dfin
need to be in the
correct order. formatLong
requires that the columns are in the
following scheme:
site name or number.
date or observation number.
response variable (detections, counts, etc).
The remaining columns are observation-level covariates.
formatWide
requires particular names for the columns.
The column order for formatWide
is
(optional) site name, named “site”.
response, named “y.1”, “y.2”, ..., “y.J”.
columns of site-level covariates, each with a relevant name per column.
groups of columns of observation-level covariates, each group having the name form “someObsCov.1”, “someObsCov.2”, ..., “someObsCov.J”.
A data.frame
frogs contains NAAMP data for Pseudacris feriarum (pfer) and Pseudacris crucifer (pcru) in 2001.
data(frogs)
data(frogs)
matrix of observed calling indices for pcru
matrix of detections for pcru
array of covariates measured at the observation-level for pcru
matrix of observed calling indices for pfer
matrix of detections for pfer
array of covariates measured at the observation-level for pfer
The rows of pcru.y, pcru.bin, pfer.y, and pfer.bin correspond to sites
and columns correspond to visits to each site. The first 2 dimensions
of pfer.data and pcru.data are matrices of covariates that correspond to
the observation matrices (sites observation), with the 3rd
dimension corresponding to separate covariates.
https://www.pwrc.usgs.gov/naamp/
Mossman MJ, Weir LA. North American Amphibian Monitoring Program (NAAMP). Amphibian Declines: the conservation status of United States species. University of California Press, Berkeley, California, USA. 2005:307-313.
data(frogs) str(pcru.data)
data(frogs) str(pcru.data)
Fit the model of Amundson et al. (2014) to point count datasets containing both distance and time of observation data. The Amundson et al. (2014) model is extended to account for temporary emigration by estimating an additional availability probability if multiple counts at a site are available. Abundance can be modeled as a Poisson, negative binomial, or Zero-inflated Poisson. Multiple distance sampling key functions are also available.
gdistremoval(lambdaformula=~1, phiformula=~1, removalformula=~1, distanceformula=~1, data, keyfun=c("halfnorm", "exp", "hazard", "uniform"), output=c("abund", "density"), unitsOut=c("ha", "kmsq"), mixture=c('P', 'NB', 'ZIP'), K, starts, method = "BFGS", se = TRUE, engine=c("C","TMB"), threads=1, ...)
gdistremoval(lambdaformula=~1, phiformula=~1, removalformula=~1, distanceformula=~1, data, keyfun=c("halfnorm", "exp", "hazard", "uniform"), output=c("abund", "density"), unitsOut=c("ha", "kmsq"), mixture=c('P', 'NB', 'ZIP'), K, starts, method = "BFGS", se = TRUE, engine=c("C","TMB"), threads=1, ...)
lambdaformula |
A right-hand side formula describing the abundance covariates |
phiformula |
A right-hand side formula describing the availability covariates |
removalformula |
A right-hand side formula describing removal probability covariates |
distanceformula |
A right-hand side formula describing the detection function covariates |
data |
An object of class |
keyfun |
One of the following detection functions: "halfnorm", "hazard", "exp", or "uniform" |
output |
Model either "abund" or "density" |
unitsOut |
Units of density. Either "ha" or "kmsq" for hectares and square kilometers, respectively |
mixture |
Either "P", "NB", or "ZIP" for the Poisson, negative binomial, and Zero-inflated Poisson models of abundance |
K |
An integer value specifying the upper bound used in the integration |
starts |
A numeric vector of starting values for the model parameters |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors |
engine |
Either "C" to use C++ code or "TMB" to use TMB for optimization |
threads |
Set the number of threads to use for optimization in C++, if
OpenMP is available on your system. Increasing the number of threads
may speed up optimization in some cases by running the likelihood
calculation in parallel. If |
... |
Additional arguments to optim, such as lower and upper bounds |
An object of class unmarkedFitGDR
Ken Kellner [email protected]
Amundson, C.L., Royle, J.A. and Handel, C.M., 2014. A hierarchical model combining distance sampling and time removal to estimate detection probability during avian point counts. The Auk 131: 476-494.
unmarkedFrameGDR
, gdistsamp
, gmultmix
Extends the distance sampling model of Royle et al. (2004) to estimate the probability of being available for detection. Also allows abundance to be modeled using the negative binomial and zero-inflated Poisson distributions.
gdistsamp(lambdaformula, phiformula, pformula, data, keyfun = c("halfnorm", "exp", "hazard", "uniform"), output = c("abund", "density"), unitsOut = c("ha", "kmsq"), mixture = c("P", "NB", "ZIP"), K, starts, method = "BFGS", se = TRUE, engine=c("C","R"), rel.tol=1e-4, threads=1, ...)
gdistsamp(lambdaformula, phiformula, pformula, data, keyfun = c("halfnorm", "exp", "hazard", "uniform"), output = c("abund", "density"), unitsOut = c("ha", "kmsq"), mixture = c("P", "NB", "ZIP"), K, starts, method = "BFGS", se = TRUE, engine=c("C","R"), rel.tol=1e-4, threads=1, ...)
lambdaformula |
A right-hand side formula describing the abundance covariates. |
phiformula |
A right-hand side formula describing the availability covariates. |
pformula |
A right-hand side formula describing the detection function covariates. |
data |
An object of class |
keyfun |
One of the following detection functions: "halfnorm", "hazard", "exp", or "uniform." See details. |
output |
Model either "density" or "abund" |
unitsOut |
Units of density. Either "ha" or "kmsq" for hectares and square kilometers, respectively. |
mixture |
Either "P", "NB", or "ZIP" for the Poisson, negative binomial, or zero-inflated Poisson models of abundance. |
K |
An integer value specifying the upper bound used in the integration. |
starts |
A numeric vector of starting values for the model parameters. |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors. |
engine |
Either "C" to use fast C++ code or "R" to use native R code during the optimization. |
rel.tol |
relative accuracy for the integration of the detection function. See integrate. You might try adjusting this if you get an error message related to the integral. Alternatively, try providing different starting values. |
threads |
Set the number of threads to use for optimization in C++, if
OpenMP is available on your system. Increasing the number of threads
may speed up optimization in some cases by running the likelihood
calculation in parallel. If |
... |
Additional arguments to optim, such as lower and upper bounds |
Extends the model of Royle et al. (2004) by estimating the probability
of being available for detection . To estimate this
additional parameter, replicate distance sampling data must be
collected at each transect. Thus the data are collected at i = 1, 2,
..., R transects on t = 1, 2, ..., T occassions. As with the model of
Royle et al. (2004), the detections must be binned into distance
classes. These data must be formatted in a matrix with R rows, and JT
columns where J is the number of distance classses. See
unmarkedFrameGDS
for more information about data
formatting.
The definition of availability depends on the context. The model is
If there is no movement, then is local abundance, and
is the number of individuals that are available
to be detected. In this case,
. Animals might
be missed on the transect line because they are difficult to see or
detected. This relaxes the assumption of conventional distance
sampling that
.
However, when there is movement in the form of temporary emigration,
local abundance is ; it's the fraction of
that are on the plot at time t. In this case,
is the temporary emigration parameter, and we need to
assume that
in order to interpret
as local abundance. See Chandler et al. (2011)
for an analysis of the model under this form of temporary emigration.
If there is movement and then it
isn't possible to estimate local abundance at time t. In this case,
would be the total number of individuals that ever use
plot i (the super-population), and
would be the
number available to be detected at time t. Since a fraction of the
unavailable individuals could be off the plot, and another fraction
could be on the plot, it isn't possible to infer local abundance and
density during occasion t.
An object of class unmarkedFitGDS.
If you aren't interested in estimating , but you want
to use the negative binomial or ZIP distributions, set numPrimary=1
when formatting the data.
You cannot use obsCovs, but you can use yearlySiteCovs (a confusing name since this model isn't for multi-year data. It's just a hold-over from the colext methods of formatting data upon which it is based.)
Richard Chandler [email protected]
Royle, J. A., D. K. Dawson, and S. Bates. 2004. Modeling abundance effects in distance sampling. Ecology 85:1591-1597.
Chandler, R. B, J. A. Royle, and D. I. King. 2011. Inference about density and temporary emigration in unmarked populations. Ecology 92:1429–1435.
# Simulate some line-transect data set.seed(36837) R <- 50 # number of transects T <- 5 # number of replicates strip.width <- 50 transect.length <- 100 breaks <- seq(0, 50, by=10) lambda <- 5 # Abundance phi <- 0.6 # Availability sigma <- 30 # Half-normal shape parameter J <- length(breaks)-1 y <- array(0, c(R, J, T)) for(i in 1:R) { M <- rpois(1, lambda) # Individuals within the 1-ha strip for(t in 1:T) { # Distances from point d <- runif(M, 0, strip.width) # Detection process if(length(d)) { cp <- phi*exp(-d^2 / (2 * sigma^2)) # half-normal w/ g(0)<1 d <- d[rbinom(length(d), 1, cp) == 1] y[i,,t] <- table(cut(d, breaks, include.lowest=TRUE)) } } } y <- matrix(y, nrow=R) # convert array to matrix # Organize data umf <- unmarkedFrameGDS(y = y, survey="line", unitsIn="m", dist.breaks=breaks, tlength=rep(transect.length, R), numPrimary=T) summary(umf) # Fit the model m1 <- gdistsamp(~1, ~1, ~1, umf, output="density", K=50) summary(m1) predict(m1, type = 'lambda')[1,] predict(m1, type = 'phi')[1,] predict(m1, type = 'det')[1,] ## Not run: # Empirical Bayes estimates of abundance at each site re <- ranef(m1) plot(re, layout=c(10,5), xlim=c(-1, 20)) ## End(Not run)
# Simulate some line-transect data set.seed(36837) R <- 50 # number of transects T <- 5 # number of replicates strip.width <- 50 transect.length <- 100 breaks <- seq(0, 50, by=10) lambda <- 5 # Abundance phi <- 0.6 # Availability sigma <- 30 # Half-normal shape parameter J <- length(breaks)-1 y <- array(0, c(R, J, T)) for(i in 1:R) { M <- rpois(1, lambda) # Individuals within the 1-ha strip for(t in 1:T) { # Distances from point d <- runif(M, 0, strip.width) # Detection process if(length(d)) { cp <- phi*exp(-d^2 / (2 * sigma^2)) # half-normal w/ g(0)<1 d <- d[rbinom(length(d), 1, cp) == 1] y[i,,t] <- table(cut(d, breaks, include.lowest=TRUE)) } } } y <- matrix(y, nrow=R) # convert array to matrix # Organize data umf <- unmarkedFrameGDS(y = y, survey="line", unitsIn="m", dist.breaks=breaks, tlength=rep(transect.length, R), numPrimary=T) summary(umf) # Fit the model m1 <- gdistsamp(~1, ~1, ~1, umf, output="density", K=50) summary(m1) predict(m1, type = 'lambda')[1,] predict(m1, type = 'phi')[1,] predict(m1, type = 'det')[1,] ## Not run: # Empirical Bayes estimates of abundance at each site re <- ranef(m1) plot(re, layout=c(10,5), xlim=c(-1, 20)) ## End(Not run)
Methods for function getB
in Package ‘unmarked’. These methods
return a matrix of probabilities detections were certain for occupancy
models that account for false positives.
Methods for function getFP
in Package ‘unmarked’. These methods
return a matrix of false positive detection probabilities.
Calculates detection probability for each site and occasion and returns the values as a matrix with (generally) the same dimensions as the y-matrix.
## S4 method for signature 'unmarkedFit' getP(object, ...)
## S4 method for signature 'unmarkedFit' getP(object, ...)
object |
A |
... |
Other arguments, currently ignored. |
A matrix of estimated detection probabilities with the
same dimensions as the y
matrix (M x J or M x JT). For some models with
multiple observation processes or multiple species, such as gdistremoval
or occuMulti
, the output may be a named list of matrices.
Multinomial calling index data.
data(gf)
data(gf)
A list with 2 components
220 x 3 matrix of count indices
list of covariates
Royle, J. Andrew, and William A. Link. 2005. A General Class of Multinomial Mixture Models for Anuran Calling Survey Data. Ecology 86, no. 9: 2505–2512.
data(gf) str(gf.data) str(gf.obs)
data(gf) str(gf.data) str(gf.obs)
A three level hierarchical model for designs involving repeated counts that yield multinomial outcomes. Possible data collection methods include repeated removal sampling and double observer sampling. The three model parameters are abundance, availability, and detection probability.
gmultmix(lambdaformula, phiformula, pformula, data, mixture = c("P", "NB", "ZIP"), K, starts, method = "BFGS", se = TRUE, engine=c("C","R"), threads=1, ...)
gmultmix(lambdaformula, phiformula, pformula, data, mixture = c("P", "NB", "ZIP"), K, starts, method = "BFGS", se = TRUE, engine=c("C","R"), threads=1, ...)
lambdaformula |
Righthand side (RHS) formula describing abundance covariates |
phiformula |
RHS formula describing availability covariates |
pformula |
RHS formula describing detection covariates |
data |
An object of class unmarkedFrameGMM |
mixture |
Either "P", "NB", or "ZIP" for the Poisson, negative binomial, or zero-inflated Poisson models of abundance |
K |
The upper bound of integration |
starts |
Starting values |
method |
Optimization method used by |
se |
Logical. Should standard errors be calculated? |
engine |
Either "C" to use fast C++ code or "R" to use native R code during the optimization. |
threads |
Set the number of threads to use for optimization in C++, if
OpenMP is available on your system. Increasing the number of threads
may speed up optimization in some cases by running the likelihood
calculation in parallel. If |
... |
Additional arguments to optim, such as lower and upper bounds |
The latent transect-level super-population abundance distribution
can be set as a
Poisson, negative binomial, or zero-inflated Poisson random variable,
depending on the setting of the
mixture
argument.
mixture = "P"
, mixture = "NB"
, and mixture = "ZIP"
select the Poisson, negative binomial, and zero-inflated Poisson distributions
respectively. The mean of is
.
If
, then an additional parameter,
,
describes dispersion (lower
implies higher variance). If
, then an additional zero-inflation parameter
is estimated.
The number of individuals available for detection at time j
is a modeled as binomial:
.
The detection process is modeled as multinomial:
,
where
is the multinomial cell probability for
plot i at time t on occasion j.
Cell probabilities are computed via a user-defined function related to the
sampling design. Alternatively, the default functions
removalPiFun
or doublePiFun
can be used for equal-interval removal
sampling or
double observer sampling. Note that the function for computing cell
probabilites
is specified when setting up the data using unmarkedFrameGMM
.
Parameters ,
and
can be
modeled as linear functions of covariates using the log, logit and logit
links
respectively.
An object of class unmarkedFitGMM.
In the case where availability for detection is due to random temporary emigration, population density at time j, D(i,j), can be estimated by N(i,j)/plotArea.
This model is also applicable to sampling designs in which the local population size is closed during the J repeated counts, and availability is related to factors such as the probability of vocalizing. In this case, density can be estimated by M(i)/plotArea.
If availability is a function of both temporary emigration and other processess such as song rate, then density cannot be directly estimated, but inference about the super-population size, M(i), is possible.
Three types of covariates can be supplied, site-level,
site-by-year-level, and observation-level. These must be formatted
correctly when organizing the data with unmarkedFrameGPC
Richard Chandler [email protected] and Andy Royle
Royle, J. A. (2004) Generalized estimators of avian abundance from count survey data. Animal Biodiversity and Conservation 27, pp. 375–386.
Chandler, R. B., J. A. Royle, and D. I. King. 2011. Inference about density and temporary emigration in unmarked populations. Ecology 92:1429-1435.
unmarkedFrameGMM
for setting up the data and metadata.
multinomPois
for surveys where no secondary sampling periods were
used. Example functions to calculate multinomial cell probabilities are
described piFuns
# Simulate data using the multinomial-Poisson model with a # repeated constant-interval removal design. n <- 100 # number of sites T <- 4 # number of primary periods J <- 3 # number of secondary periods lam <- 3 phi <- 0.5 p <- 0.3 #set.seed(26) y <- array(NA, c(n, T, J)) M <- rpois(n, lam) # Local population size N <- matrix(NA, n, T) # Individuals available for detection for(i in 1:n) { N[i,] <- rbinom(T, M[i], phi) y[i,,1] <- rbinom(T, N[i,], p) # Observe some Nleft1 <- N[i,] - y[i,,1] # Remove them y[i,,2] <- rbinom(T, Nleft1, p) # ... Nleft2 <- Nleft1 - y[i,,2] y[i,,3] <- rbinom(T, Nleft2, p) } y.ijt <- cbind(y[,1,], y[,2,], y[,3,], y[,4,]) umf1 <- unmarkedFrameGMM(y=y.ijt, numPrimary=T, type="removal") (m1 <- gmultmix(~1, ~1, ~1, data=umf1, K=30)) predict(m1, type="lambda")[1,] # Individuals per plot predict(m1, type="phi")[1,] # Probability of being avilable (p <- predict(m1, type="det")[1,]) # Probability of detection # Multinomial cell probabilities under removal design p <- p$Predicted c(p, (1-p) * p, (1-p)^2 * p) # Or more generally: head(getP(m1)) # Empirical Bayes estimates of super-population size re <- ranef(m1) plot(re, layout=c(5,5), xlim=c(-1,20), subset=site%in%1:25)
# Simulate data using the multinomial-Poisson model with a # repeated constant-interval removal design. n <- 100 # number of sites T <- 4 # number of primary periods J <- 3 # number of secondary periods lam <- 3 phi <- 0.5 p <- 0.3 #set.seed(26) y <- array(NA, c(n, T, J)) M <- rpois(n, lam) # Local population size N <- matrix(NA, n, T) # Individuals available for detection for(i in 1:n) { N[i,] <- rbinom(T, M[i], phi) y[i,,1] <- rbinom(T, N[i,], p) # Observe some Nleft1 <- N[i,] - y[i,,1] # Remove them y[i,,2] <- rbinom(T, Nleft1, p) # ... Nleft2 <- Nleft1 - y[i,,2] y[i,,3] <- rbinom(T, Nleft2, p) } y.ijt <- cbind(y[,1,], y[,2,], y[,3,], y[,4,]) umf1 <- unmarkedFrameGMM(y=y.ijt, numPrimary=T, type="removal") (m1 <- gmultmix(~1, ~1, ~1, data=umf1, K=30)) predict(m1, type="lambda")[1,] # Individuals per plot predict(m1, type="phi")[1,] # Probability of being avilable (p <- predict(m1, type="det")[1,]) # Probability of detection # Multinomial cell probabilities under removal design p <- p$Predicted c(p, (1-p) * p, (1-p)^2 * p) # Or more generally: head(getP(m1)) # Empirical Bayes estimates of super-population size re <- ranef(m1) plot(re, layout=c(5,5), xlim=c(-1,20), subset=site%in%1:25)
Fit multi-scale occupancy models as described in Nichols et al. (2008) to repeated presence-absence data collected using the robust design. This model allows for inference about occupancy, availability, and detection probability.
goccu(psiformula, phiformula, pformula, data, linkPsi = c("logit", "cloglog"), starts, method = "BFGS", se = TRUE, ...)
goccu(psiformula, phiformula, pformula, data, linkPsi = c("logit", "cloglog"), starts, method = "BFGS", se = TRUE, ...)
psiformula |
Right-hand sided formula describing occupancy covariates |
phiformula |
Right-hand sided formula describing availability covariates |
pformula |
Right-hand sided formula for detection probability covariates |
data |
An object of class unmarkedFrameGOccu or unmarkedMultFrame |
linkPsi |
Link function for the occupancy model. Options are
|
starts |
Starting values |
method |
Optimization method used by |
se |
Logical. Should standard errors be calculated? |
... |
Additional arguments to |
Primary periods could represent spatial or temporal sampling replicates. For example, you could have several spatial sub-units within each site, where each sub-unit was then sampled repeatedly. This is a frequent design for eDNA studies. Or, you could have multiple primary periods of sampling at each site (conducted at different times within a season), each of which contains several secondary sampling periods. In both cases the robust design structure can be used to estimate an availability probability in addition to detection probability. See Kery and Royle (2015) 10.10 for more details.
An object of class unmarkedFitGOccu
Ken Kellner [email protected]
Kery, M., & Royle, J. A. (2015). Applied hierarchical modeling in ecology: Volume 1: Prelude and static models. Elsevier Science.
Nichols, J. D., Bailey, L. L., O'Connell Jr, A. F., Talancy, N. W., Campbell Grant, E. H., Gilbert, A. T., Annand E. M., Husband, T. P., & Hines, J. E. (2008). Multi-scale occupancy estimation and modelling using multiple detection methods. Journal of Applied Ecology, 45(5), 1321-1329.
occu
, colext
,
unmarkedMultFrame
, unmarkedFrameGOccu
set.seed(123) M <- 100 T <- 5 J <- 4 psi <- 0.5 phi <- 0.3 p <- 0.4 z <- rbinom(M, 1, psi) zmat <- matrix(z, nrow=M, ncol=T) zz <- rbinom(M*T, 1, zmat*phi) zz <- matrix(zz, nrow=M, ncol=T) zzmat <- zz[,rep(1:T, each=J)] y <- rbinom(M*T*J, 1, zzmat*p) y <- matrix(y, M, J*T) umf <- unmarkedMultFrame(y=y, numPrimary=T) ## Not run: mod <- goccu(psiformula = ~1, phiformula = ~1, pformula = ~1, umf) plogis(coef(mod)) ## End(Not run)
set.seed(123) M <- 100 T <- 5 J <- 4 psi <- 0.5 phi <- 0.3 p <- 0.4 z <- rbinom(M, 1, psi) zmat <- matrix(z, nrow=M, ncol=T) zz <- rbinom(M*T, 1, zmat*phi) zz <- matrix(zz, nrow=M, ncol=T) zzmat <- zz[,rep(1:T, each=J)] y <- rbinom(M*T*J, 1, zzmat*p) y <- matrix(y, M, J*T) umf <- unmarkedMultFrame(y=y, numPrimary=T) ## Not run: mod <- goccu(psiformula = ~1, phiformula = ~1, pformula = ~1, umf) plogis(coef(mod)) ## End(Not run)
Fit the model of Chandler et al. (2011) to repeated count data collected using the robust design. This model allows for inference about population size, availability, and detection probability.
gpcount(lambdaformula, phiformula, pformula, data, mixture = c("P", "NB", "ZIP"), K, starts, method = "BFGS", se = TRUE, engine = c("C", "R"), threads=1, ...)
gpcount(lambdaformula, phiformula, pformula, data, mixture = c("P", "NB", "ZIP"), K, starts, method = "BFGS", se = TRUE, engine = c("C", "R"), threads=1, ...)
lambdaformula |
Right-hand sided formula describing covariates of abundance. |
phiformula |
Right-hand sided formula describing availability covariates |
pformula |
Right-hand sided formula for detection probability covariates |
data |
An object of class unmarkedFrameGPC |
mixture |
Either "P", "NB", or "ZIP" for Poisson, negative binomial, or zero-inflated Poisson distributions |
K |
The maximum possible value of M, the super-population size. |
starts |
Starting values |
method |
Optimization method used by |
se |
Logical. Should standard errors be calculated? |
engine |
Either "C" or "R" for the C++ or R versions of the likelihood. The C++ code is faster, but harder to debug. |
threads |
Set the number of threads to use for optimization in C++, if
OpenMP is available on your system. Increasing the number of threads
may speed up optimization in some cases by running the likelihood
calculation in parallel. If |
... |
Additional arguments to |
The latent transect-level super-population abundance distribution
can be set as either a
Poisson, negative binomial, or zero-inflated Poisson random variable, depending on the
setting of the
mixture
argument. The expected value of
is
. If
,
then an additional parameter,
, describes
dispersion (lower
implies higher variance). If
, then an additional zero-inflation parameter
is estimated.
The number of individuals available for detection at time j
is a modeled as binomial:
.
The detection process is also modeled as binomial:
.
Parameters ,
and
can be
modeled as linear functions of covariates using the log, logit and logit
links respectively.
An object of class unmarkedFitGPC
In the case where availability for detection is due to random temporary emigration, population density at time j, D(i,j), can be estimated by N(i,j)/plotArea.
This model is also applicable to sampling designs in which the local population size is closed during the J repeated counts, and availability is related to factors such as the probability of vocalizing. In this case, density can be estimated by M(i)/plotArea.
If availability is a function of both temporary emigration and other processess such as song rate, then density cannot be directly estimated, but inference about the super-population size, M(i), is possible.
Three types of covariates can be supplied, site-level,
site-by-year-level, and observation-level. These must be formatted
correctly when organizing the data with unmarkedFrameGPC
Richard Chandler [email protected]
Royle, J. A. 2004. N-Mixture models for estimating population size from spatially replicated counts. Biometrics 60:108–105.
Chandler, R. B., J. A. Royle, and D. I. King. 2011. Inference about density and temporary emigration in unmarked populations. Ecology 92:1429-1435.
gmultmix
, gdistsamp
,
unmarkedFrameGPC
set.seed(54) nSites <- 20 nVisits <- 4 nReps <- 3 lambda <- 5 phi <- 0.7 p <- 0.5 M <- rpois(nSites, lambda) # super-population size N <- matrix(NA, nSites, nVisits) y <- array(NA, c(nSites, nReps, nVisits)) for(i in 1:nVisits) { N[,i] <- rbinom(nSites, M, phi) # population available during vist j } colMeans(N) for(i in 1:nSites) { for(j in 1:nVisits) { y[i,,j] <- rbinom(nReps, N[i,j], p) } } ym <- matrix(y, nSites) ym[1,] <- NA ym[2, 1:nReps] <- NA ym[3, (nReps+1):(nReps+nReps)] <- NA umf <- unmarkedFrameGPC(y=ym, numPrimary=nVisits) ## Not run: fmu <- gpcount(~1, ~1, ~1, umf, K=40, control=list(trace=TRUE, REPORT=1)) predict(fmu, type="lambda")[1,] predict(fmu, type="phi")[1,] predict(fmu, type="det")[1,] ## End(Not run)
set.seed(54) nSites <- 20 nVisits <- 4 nReps <- 3 lambda <- 5 phi <- 0.7 p <- 0.5 M <- rpois(nSites, lambda) # super-population size N <- matrix(NA, nSites, nVisits) y <- array(NA, c(nSites, nReps, nVisits)) for(i in 1:nVisits) { N[,i] <- rbinom(nSites, M, phi) # population available during vist j } colMeans(N) for(i in 1:nSites) { for(j in 1:nVisits) { y[i,,j] <- rbinom(nReps, N[i,j], p) } } ym <- matrix(y, nSites) ym[1,] <- NA ym[2, 1:nReps] <- NA ym[3, (nReps+1):(nReps+nReps)] <- NA umf <- unmarkedFrameGPC(y=ym, numPrimary=nVisits) ## Not run: fmu <- gpcount(~1, ~1, ~1, umf, K=40, control=list(trace=TRUE, REPORT=1)) predict(fmu, type="lambda")[1,] predict(fmu, type="phi")[1,] predict(fmu, type="det")[1,] ## End(Not run)
Model abundance using a combination of distance sampling data (DS) and other similar data types, including simple point counts (PC) and occupancy/detection-nondetection (OC/DND) data.
IDS(lambdaformula = ~1, detformulaDS = ~1, detformulaPC = NULL, detformulaOC = NULL, dataDS, dataPC = NULL, dataOC = NULL, availformula = NULL, durationDS = NULL, durationPC = NULL, durationOC = NULL, keyfun = "halfnorm", maxDistPC, maxDistOC, K = 100, unitsOut = "ha", starts = NULL, method = "BFGS", ...)
IDS(lambdaformula = ~1, detformulaDS = ~1, detformulaPC = NULL, detformulaOC = NULL, dataDS, dataPC = NULL, dataOC = NULL, availformula = NULL, durationDS = NULL, durationPC = NULL, durationOC = NULL, keyfun = "halfnorm", maxDistPC, maxDistOC, K = 100, unitsOut = "ha", starts = NULL, method = "BFGS", ...)
lambdaformula |
Formula for abundance |
detformulaDS |
Formula for distance-based (DS) detection probability |
detformulaPC |
Formula for point count detection probability. If
|
detformulaOC |
Formula for occupancy/detection-nondetection detection
probability. If |
dataDS |
An object of class |
dataPC |
An object of class |
dataOC |
An object of class |
availformula |
Optional. If specified, formula for availability. Only possible to use if you have variable detection survey lengths (see below) |
durationDS |
Optional. Vector of survey durations at each distance sampling site |
durationPC |
Optional. Vector of survey durations at each PC site |
durationOC |
Optional. Vector of survey durations at each OC/DND site |
keyfun |
Distance sampling key function; either "halfnorm" or "exp" |
maxDistPC |
Maximum observation distance for PC surveys; defaults to maximum of distance bins from the distance sampling data |
maxDistOC |
Maximum observation distance for OC/DND surveys; defaults to maximum of distance bins from the distance sampling data |
K |
Integer, upper bound for integrating out latent abundance. Only used if you have included OC/DND data |
unitsOut |
Units of density for output. Either "ha" or "kmsq" for hectares and square kilometers, respectively |
starts |
A numeric vector of starting values for the model parameters |
method |
Optimization method used by |
... |
Additional arguments to optim, such as lower and upper bounds |
This function facilitates a combined analysis of distance sampling data (DS) with other similar data types, including simple point counts (PC) and occupancy/detection-nondetection (OC/DND) data. The combined approach capitalizes on the strengths and minimizes the weaknesses of each type. The PC and OC/DND data are viewed as latent distance sampling surveys with an underlying abundance model shared by all data types. All analyses must include some distance sampling data, but can include either PC or DND data or both. If surveys are of variable duration, it is also possible to estimate availability.
Input data must be provided as a series of separate unmarkedFrame
s:
unmarkedFrameDS
for the distance sampling data, unmarkedFramePCount
for the point count data, and unmarkedFrameOccu
for OC/DND data.
See the help files for these objects for guidance on how to organize the data.
An object of class unmarkedFitIDS
Simulations indicated estimates of availability were very unreliable when including detection/non-detection data, so the function will not allow you to use DND data and estimate availability at the same time. In general estimation of availability can be difficult; use simulations to see how well it works for your specific situation.
Ken Kellner [email protected]
Kery M, Royle JA, Hallman T, Robinson WD, Strebel N, Kellner KF. 2024. Integrated distance sampling models for simple point counts. Ecology.
## Not run: # Simulate data based on a real dataset # Formulas for each model formulas <- list(lam=~elev, ds=~1, phi=~1) # Sample sizes design <- list(Mds=2912, J=6, Mpc=506) # Model parameters coefs <- list(lam = c(intercept=3, elev=-0.5), ds = c(intercept=-2.5), phi = c(intercept=-1.3)) # Survey durations durs <- list(ds = rep(5, design$Mds), pc=runif(design$Mpc, 3, 30)) set.seed(456) sim_umf <- simulate("IDS", # name of model we are simulating for nsim=1, # number of replicates formulas=formulas, coefs=coefs, design=design, # arguments used by unmarkedFrameDS dist.breaks = seq(0, 0.30, length.out=7), unitsIn="km", # arguments used by IDS # could also have e.g. keyfun here durationDS=durs$ds, durationPC=durs$pc, durationOC=durs$oc, maxDistPC=0.5, maxDistOC=0.5, unitsOut="kmsq") # Look at the results lapply(sim_umf, head) # Fit a model (mod_sim <- IDS(lambdaformula = ~elev, detformulaDS = ~1, dataDS=sim_umf$ds, dataPC=sim_umf$pc, availformula = ~1, durationDS=durs$ds, durationPC=durs$pc, maxDistPC=0.5, unitsOut="kmsq")) # Compare with known parameter values # Note: this is an unusually good estimate of availability # It is hard to estimate in most cases cbind(truth=unlist(coefs), est=coef(mod_sim)) # Predict density at each distance sampling site head(predict(mod_sim, 'lam')) ## End(Not run)
## Not run: # Simulate data based on a real dataset # Formulas for each model formulas <- list(lam=~elev, ds=~1, phi=~1) # Sample sizes design <- list(Mds=2912, J=6, Mpc=506) # Model parameters coefs <- list(lam = c(intercept=3, elev=-0.5), ds = c(intercept=-2.5), phi = c(intercept=-1.3)) # Survey durations durs <- list(ds = rep(5, design$Mds), pc=runif(design$Mpc, 3, 30)) set.seed(456) sim_umf <- simulate("IDS", # name of model we are simulating for nsim=1, # number of replicates formulas=formulas, coefs=coefs, design=design, # arguments used by unmarkedFrameDS dist.breaks = seq(0, 0.30, length.out=7), unitsIn="km", # arguments used by IDS # could also have e.g. keyfun here durationDS=durs$ds, durationPC=durs$pc, durationOC=durs$oc, maxDistPC=0.5, maxDistOC=0.5, unitsOut="kmsq") # Look at the results lapply(sim_umf, head) # Fit a model (mod_sim <- IDS(lambdaformula = ~elev, detformulaDS = ~1, dataDS=sim_umf$ds, dataPC=sim_umf$pc, availformula = ~1, durationDS=durs$ds, durationPC=durs$pc, maxDistPC=0.5, unitsOut="kmsq")) # Compare with known parameter values # Note: this is an unusually good estimate of availability # It is hard to estimate in most cases cbind(truth=unlist(coefs), est=coef(mod_sim)) # Predict density at each distance sampling site head(predict(mod_sim, 'lam')) ## End(Not run)
This function uses an ad-hoc averaging approach to impute missing entries in obsCovs. The missing entry is replaced by an average of the average for the site and the average for the visit number.
imputeMissing(umf, whichCovs = seq(length=ncol(obsCovs(umf))))
imputeMissing(umf, whichCovs = seq(length=ncol(obsCovs(umf))))
umf |
The data set who's obsCovs are being imputed. |
whichCovs |
An integer vector giving the indices of the covariates to be imputed.
This defaults to all covariates in |
A version of umf
that has the requested obsCovs imputed.
Ian Fiske
data(frogs) pcru.obscovs <- data.frame(MinAfterSunset=as.vector(t(pcru.data[,,1])), Wind=as.vector(t(pcru.data[,,2])), Sky=as.vector(t(pcru.data[,,3])), Temperature=as.vector(t(pcru.data[,,4]))) pcruUMF <- unmarkedFrameOccu(y = pcru.bin, obsCovs = pcru.obscovs) pcruUMF.i1 <- imputeMissing(pcruUMF) pcruUMF.i2 <- imputeMissing(pcruUMF, whichCovs = 2)
data(frogs) pcru.obscovs <- data.frame(MinAfterSunset=as.vector(t(pcru.data[,,1])), Wind=as.vector(t(pcru.data[,,2])), Sky=as.vector(t(pcru.data[,,3])), Temperature=as.vector(t(pcru.data[,,4]))) pcruUMF <- unmarkedFrameOccu(y = pcru.bin, obsCovs = pcru.obscovs) pcruUMF.i1 <- imputeMissing(pcruUMF) pcruUMF.i2 <- imputeMissing(pcruUMF, whichCovs = 2)
Data were collected at 307 survey locations ("point transects") on Santa Cruz Island, California during the Fall of 2008. The distance data are binned into 3 distance intervals [0-100], (100-200], and (200-300]. The coordinates of the survey locations as well as 3 habitat covariates are also included.
data(issj)
data(issj)
A data frame with 307 observations on the following 8 variables.
issj[0-100]
Number of individuals detected within 100m
issj(100-200]
Detections in the interval (100-200m]
issj(200-300]
Detections in the interval (200-300m]
x
Easting (meters)
y
Northing (meters)
elevation
Elevation in meters
forest
Forest cover
chaparral
Chaparral cover
Sillett, S. and Chandler, R.B. and Royle, J.A. and Kery, M. and Morrison, S.A. In Press. Hierarchical distance sampling models to estimate population size and habitat-specific abundance of an island endemic. Ecological Applications
Island-wide covariates are also available cruz
data(issj) str(issj) head(issj) umf <- unmarkedFrameDS(y=as.matrix(issj[,1:3]), siteCovs=issj[,6:8], dist.breaks=c(0,100,200,300), unitsIn="m", survey="point") summary(umf)
data(issj) str(issj) head(issj) umf <- unmarkedFrameDS(y=as.matrix(issj[,1:3]), siteCovs=issj[,6:8], dist.breaks=c(0,100,200,300), unitsIn="m", survey="point") summary(umf)
The Swiss breeding bird survey ("Monitoring Haufige Brutvogel" MHB) has monitored the populations of 150 common species since 1999. The MHB sample consists of 267 1-km squares that are laid out as a grid across Switzerland. Fieldwork is conducted by about 200 skilled birdwatchers, most of them volunteers. Avian populations are monitored using a simplified territory mapping protocol, where each square is surveyed up to three times during the breeding season (only twice above the tree line). Surveys are conducted along a transect that does not change over the years.
The list jay
has the data for European Jay territories for 238 sites surveyed in 2002.
data("jay")
data("jay")
jay
is a list with 3 elements:
a data frame with rows for 238 sites and columns for each of the observable detection histories. For the sites visited 3 times, these are "100", "010", "001", "110", "101", "011", "111"
. Sites visited twice have "10x", "01x", "11x"
.
Each row gives the number of territories with the corresponding detection history, with NA for the detection histories not applicable: sites visited 3 times have NAs in the last 3 columns while those visited twice have NAs in the first 7 columns.
a data frame with rows for 238 sites, and the following columns:
elev : the mean elevation of the quadrat, m.
length : the length of the route walked in the quadrat, km.
forest : percentage forest cover.
a data frame with rows for 238 sites, and the following columns:
x, y : the coordinates of the site.
date1, date2, date3 : the Julian date of the visit, with 1 April = 1. Sites visited twice have NA in the 3rd column.
dur1, dur2, dur3 : the duration of the survey, mins. For 10 visits the duration is not available, so there are additional NAs in these columns.
In previous versions, jay
had additional information not required for the analysis, and a data frame with essentially the same information as the Switzerland
data set.
Swiss Ornithological Institute
Royle, J.A., Kery, M., Gauthier, R., Schmid, H. (2007) Hierarchical spatial models of abundance and occurrence from imperfect survey data. Ecological Monographs, 77, 465-481.
Kery & Royle (2016) Applied Hierarachical Modeling in Ecology Section 7.9
data(jay) str(jay) # Carry out a simple analysis, without covariates: # Create a customised piFun (see ?piFun for details) crPiFun <- function(p) { p1 <- p[,1] # Extract the columns of the p matrix, one for p2 <- p[,2] # each of J = 3 sample occasions p3 <- p[,3] cbind( # define multinomial cell probabilities: "100" = p1 * (1-p2) * (1-p3), "010" = (1-p1) * p2 * (1-p3), "001" = (1-p1) * (1-p2) * p3, "110" = p1 * p2 * (1-p3), "101" = p1 * (1-p2) * p3, "011" = (1-p1) * p2 * p3, "111" = p1 * p2 * p3, "10x" = p1*(1-p2), "01x" = (1-p1)*p2, "11x" = p1*p2) } # Build the unmarkedFrame object mhb.umf <- unmarkedFrameMPois(y=as.matrix(jay$caphist), obsToY=matrix(1, 3, 10), piFun="crPiFun") # Fit a model ( fm1 <- multinomPois(~1 ~1, mhb.umf) )
data(jay) str(jay) # Carry out a simple analysis, without covariates: # Create a customised piFun (see ?piFun for details) crPiFun <- function(p) { p1 <- p[,1] # Extract the columns of the p matrix, one for p2 <- p[,2] # each of J = 3 sample occasions p3 <- p[,3] cbind( # define multinomial cell probabilities: "100" = p1 * (1-p2) * (1-p3), "010" = (1-p1) * p2 * (1-p3), "001" = (1-p1) * (1-p2) * p3, "110" = p1 * p2 * (1-p3), "101" = p1 * (1-p2) * p3, "011" = (1-p1) * p2 * p3, "111" = p1 * p2 * p3, "10x" = p1*(1-p2), "01x" = (1-p1)*p2, "11x" = p1*p2) } # Build the unmarkedFrame object mhb.umf <- unmarkedFrameMPois(y=as.matrix(jay$caphist), obsToY=matrix(1, 3, 10), piFun="crPiFun") # Fit a model ( fm1 <- multinomPois(~1 ~1, mhb.umf) )
Abundance and occurrence are fundamentally related.
lambda2psi(lambda)
lambda2psi(lambda)
lambda |
Numeric vector with values >= 0 |
A vector of psi values of the same length as lambda.
pcount
, multinomPois
, distsamp
lambda2psi(0:5)
lambda2psi(0:5)
Methods for function linearComb
in Package ‘unmarked’
Typically called internally
Returns
linear combinations of parameters from a fitted model. Coefficients
are supplied through coefficients. The required argument type
specifies which model estimate to use. You can use
names(fittedmodel)
to view possible values for the type
argument.
data(ovendata) ovenFrame <- unmarkedFrameMPois(ovendata.list$data, siteCovs=as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal") fm <- multinomPois(~ 1 ~ ufc + trba, ovenFrame) linearComb(fm, c(1, 0.5, 0.5), type = "state") linearComb(fm, matrix(c(1, 0.5, 0.5, 1, 0, 0, 1, 0, 0.5), 3, 3, byrow=TRUE), type="state")
data(ovendata) ovenFrame <- unmarkedFrameMPois(ovendata.list$data, siteCovs=as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal") fm <- multinomPois(~ 1 ~ ufc + trba, ovenFrame) linearComb(fm, c(1, 0.5, 0.5), type = "state") linearComb(fm, matrix(c(1, 0.5, 0.5, 1, 0, 0, 1, 0, 0.5), 3, 3, byrow=TRUE), type="state")
Response matrix of animals detected in four distance classes plus transect lengths and two covariates.
data(linetran)
data(linetran)
A data frame with 12 observations on the following 7 variables.
dc1
Counts in distance class 1 [0-5 m)
dc2
Counts in distance class 2 [5-10 m)
dc3
Counts in distance class 3 [10-15 m)
dc4
Counts in distance class 4 [15-20 m)
Length
Transect lengths in km
area
Numeric covariate
habitat
a factor with levels A
and B
data(linetran) linetran # Format for distsamp() ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = c(0, 5, 10, 15, 20), tlength = linetran$Length * 1000, survey = "line", unitsIn = "m") })
data(linetran) linetran # Format for distsamp() ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = c(0, 5, 10, 15, 20), tlength = linetran$Length * 1000, survey = "line", unitsIn = "m") })
These are factory functions that generate piFuns with the required defaults, which are enclosed within the environment of the piFun. See the main entry for piFuns.
makeRemPiFun(times) makeCrPiFun(nOcc) makeCrPiFunMb(nOcc) makeCrPiFunMh(nOcc)
makeRemPiFun(times) makeCrPiFun(nOcc) makeCrPiFunMb(nOcc) makeCrPiFunMh(nOcc)
times |
a vector of times for each interval, |
nOcc |
the number of survey occasions |
makeRemPiFun
produces a piFun for a removal model with the required number of occasions and potentially varying time intervals. The input to the piFun must be probabilities per unit time. This is a generalisation of the piFun in the Examples section of piFuns.
makeCrPiFun
produces a piFun for a standard capture-recapture model, M0, Mt or Mx. Probabilities of detection may vary across occasions. See Kery & Royle (2016) section 7.8.1.
makeCrPiFunMb
produces a piFun for a capture-recapture model with a behavioral response after the first capture, Mb. Probabilities of detection are constant across occasions. The first column is the probability of detection for animals not caught before, column #2 is for animals after the first capture. The remaining columns are ignored. See Kery & Royle (2016) section 7.8.2.
makeCrPiFunMh
produces a piFun for a capture-recapture model with individual heterogeneity in detection probability, Mh, using a logit-normal distribution. Probabilities of detection are constant across occasions. The first column is the mean of the logit-normal on the probability scale. Cell p[1, 2] is a value in [0, 1] which controls the spread of the distribution. The remaining cells are ignored. See Kery & Royle (2016) section 7.8.3.
A piFun with the appropriate defaults.
Kery, M., Royle, J. A. (2016) Applied Hierarchical Modeling in Ecology Vol 1.
# Generate piFuns and check their behaviour: # makeRemPiFun # ============ ( pRem <- matrix(0.4, nrow=5, ncol=3) ) myPi <- makeRemPiFun(times=c(2,3,5)) myPi(pRem) ls(environment(myPi)) # See what's in the environment environment(myPi)$times ( pRem <- matrix(runif(15), 5, 3) ) myPi(pRem) myPi <- makeRemPiFun(c(5,3,2)) environment(myPi)$times myPi(pRem) # More than 3 occasions myPi <- makeRemPiFun(c(1,2,3,5)) try(myPi(pRem)) # Error ( pRem <- matrix(runif(20), 5, 4) ) myPi(pRem) # Probability of escaping detection 1 - rowSums(myPi(pRem)) # makeCrPiFun # =========== p <- matrix(0.4, 2, 3) myPi <- makeCrPiFun(3) myPi(p) myPi # Look at the function ls(environment(myPi)) environment(myPi)$histories p <- matrix(runif(6, 0.1, 0.9), 2, 3) # different p's everywhere myPi(p) p <- matrix(runif(4*5, 0.1, 0.9), 4, 5) # > 3 occasions try(myPi(p)) # Error myPi <- makeCrPiFun(5) ( tmp <- myPi(p) ) 1 - rowSums(tmp) # Probability of non-capture # makeCrPiFunMb # ============== ( pMb <- cbind(rep(0.7, 5), 0.3, NA) ) myPi <- makeCrPiFunMb(3) myPi(pMb) ( pMb <- matrix(runif(15), 5, 3) ) # col #3 will be ignored myPi(pMb) # with > 3 occasions ( pMb <- matrix(runif(15), 3, 5) ) try(myPi(pMb)) myPi <- makeCrPiFunMb(5) myPi(pMb) # makeCrPiFunMh # ============= pMh <- cbind(rep(0.4, 5), NA, NA) pMh[1, 2] <- 0.3 pMh myPi <- makeCrPiFunMh(3) myPi(pMh) pMh <- cbind(runif(5), NA, NA) pMh[1, 2] <- 0.3 pMh myPi(pMh) # with > 3 occasions pMh <- cbind(runif(5), NA, NA, NA, NA) pMh[1, 2] <- 0.3 pMh try(myPi(pMh)) myPi <- makeCrPiFunMh(5) 1 - rowSums(myPi(pMh)) # Probability of non-detection
# Generate piFuns and check their behaviour: # makeRemPiFun # ============ ( pRem <- matrix(0.4, nrow=5, ncol=3) ) myPi <- makeRemPiFun(times=c(2,3,5)) myPi(pRem) ls(environment(myPi)) # See what's in the environment environment(myPi)$times ( pRem <- matrix(runif(15), 5, 3) ) myPi(pRem) myPi <- makeRemPiFun(c(5,3,2)) environment(myPi)$times myPi(pRem) # More than 3 occasions myPi <- makeRemPiFun(c(1,2,3,5)) try(myPi(pRem)) # Error ( pRem <- matrix(runif(20), 5, 4) ) myPi(pRem) # Probability of escaping detection 1 - rowSums(myPi(pRem)) # makeCrPiFun # =========== p <- matrix(0.4, 2, 3) myPi <- makeCrPiFun(3) myPi(p) myPi # Look at the function ls(environment(myPi)) environment(myPi)$histories p <- matrix(runif(6, 0.1, 0.9), 2, 3) # different p's everywhere myPi(p) p <- matrix(runif(4*5, 0.1, 0.9), 4, 5) # > 3 occasions try(myPi(p)) # Error myPi <- makeCrPiFun(5) ( tmp <- myPi(p) ) 1 - rowSums(tmp) # Probability of non-capture # makeCrPiFunMb # ============== ( pMb <- cbind(rep(0.7, 5), 0.3, NA) ) myPi <- makeCrPiFunMb(3) myPi(pMb) ( pMb <- matrix(runif(15), 5, 3) ) # col #3 will be ignored myPi(pMb) # with > 3 occasions ( pMb <- matrix(runif(15), 3, 5) ) try(myPi(pMb)) myPi <- makeCrPiFunMb(5) myPi(pMb) # makeCrPiFunMh # ============= pMh <- cbind(rep(0.4, 5), NA, NA) pMh[1, 2] <- 0.3 pMh myPi <- makeCrPiFunMh(3) myPi(pMh) pMh <- cbind(runif(5), NA, NA) pMh[1, 2] <- 0.3 pMh myPi(pMh) # with > 3 occasions pMh <- cbind(runif(5), NA, NA, NA, NA) pMh[1, 2] <- 0.3 pMh try(myPi(pMh)) myPi <- makeCrPiFunMh(5) 1 - rowSums(myPi(pMh)) # Probability of non-detection
Mallard repeated count data and covariates
data(mallard)
data(mallard)
A list with 3 components
response matrix
site-specific covariates
survey-specific covariates
Kery, M., Royle, J. A., and Schmid, H. (2005) Modeling Avaian Abundance from Replicated Counts Using Binomial Mixture Models. Ecological Applications 15(4), pp. 1450–1461.
data(mallard) str(mallard.y) str(mallard.site) str(mallard.obs)
data(mallard) str(mallard.y) str(mallard.site) str(mallard.obs)
masspcru
contains NAAMP data for Pseudacris
crucifer (pcru) in Massachusetts from 2001 to 2007 in the raw long format.
data(masspcru)
data(masspcru)
Data frame with
Year of data collection.
Stop number.
Day of year.
Observed calling index.
Minutes after sunset of the observation.
Temperature measured during observation.
These data come from the North American Amphibian Monitoring Program. Please see the reference below for more details.
https://www.pwrc.usgs.gov/naamp/
Mossman MJ, Weir LA. North American Amphibian Monitoring Program (NAAMP). Amphibian Declines: the conservation status of United States species. University of California Press, Berkeley, California, USA. 2005:307-313.
data(masspcru) str(masspcru)
data(masspcru) str(masspcru)
Occupancy data and site covariates for coyote, red fox, and bobcat from 1437 camera trap sites sampled 3 times. Each sampling period represents one week. This data is a simplified form of the dataset used by Rota et al. (2016).
data(MesoCarnivores)
data(MesoCarnivores)
A list with four elements:
bobcat
A 1437x3 occupancy matrix for bobcat
coyote
A 1437x3 occupancy matrix for coyote
redfox
A 1437x3 occupancy matrix for red fox
sitecovs
A data frame containing covariates for the 1437 sites, with the following columns:
Dist_5km
Proportion of disturbed land in 5 km radius
HDens_5km
Housing density in 5 km radius
Latitude
Latitude / 100
Longitude
Longitude / 100
People_site
Number of photos of people at site / 1000
Trail
1 if camera was on trail, 0 if not
Used with permission of Roland Kays and Arielle Parsons at North Carolina State University and the North Carolina Museum of Natural Sciences.
Rota, C.T., et al. 2016. A multi-species occupancy model for two or more interacting species. Methods in Ecology and Evolution 7: 1164-1173.
Model selection results from an unmarkedFitList
object |
an object of class "unmarkedFitList" created by the function
|
nullmod |
optional character naming which model in the
|
A S4 object with the following slots
Full |
data.frame with formula, estimates, standard errors and model selection information. Converge is optim convergence code. CondNum is model condition number. n is the number of sites. delta is delta AIC. cumltvWt is cumulative AIC weight. Rsq is Nagelkerke's (1991) R-squared index, which is only returned when the nullmod argument is specified. |
Names |
matrix referencing column names of estimates (row 1) and standard errors (row 2). |
Two requirements exist to conduct AIC-based model-selection and model-averaging in unmarked. First, the data objects (ie, unmarkedFrames) must be identical among fitted models. Second, the response matrix must be identical among fitted models after missing values have been removed. This means that if a response value was removed in one model due to missingness, it needs to be removed from all models.
Richard Chandler [email protected]
Nagelkerke, N.J.D. (2004) A Note on a General Definition of the Coefficient of Determination. Biometrika 78, pp. 691-692.
data(linetran) (dbreaksLine <- c(0, 5, 10, 15, 20)) lengths <- linetran$Length * 1000 ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = dbreaksLine, tlength = lengths, survey = "line", unitsIn = "m") }) fm1 <- distsamp(~ 1 ~1, ltUMF) fm2 <- distsamp(~ area ~1, ltUMF) fm3 <- distsamp( ~ 1 ~area, ltUMF) fl <- fitList(Null=fm1, A.=fm2, .A=fm3) fl ms <- modSel(fl, nullmod="Null") ms coef(ms) # Estimates only SE(ms) # Standard errors only (toExport <- as(ms, "data.frame")) # Everything
data(linetran) (dbreaksLine <- c(0, 5, 10, 15, 20)) lengths <- linetran$Length * 1000 ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = dbreaksLine, tlength = lengths, survey = "line", unitsIn = "m") }) fm1 <- distsamp(~ 1 ~1, ltUMF) fm2 <- distsamp(~ area ~1, ltUMF) fm3 <- distsamp( ~ 1 ~area, ltUMF) fl <- fitList(Null=fm1, A.=fm2, .A=fm3) fl ms <- modSel(fl, nullmod="Null") ms coef(ms) # Estimates only SE(ms) # Standard errors only (toExport <- as(ms, "data.frame")) # Everything
Fit the multinomial-Poisson mixture model to data collected using survey methods such as removal sampling or double observer sampling.
multinomPois(formula, data, starts, method = "BFGS", se = TRUE, engine=c("C","R","TMB"), ...)
multinomPois(formula, data, starts, method = "BFGS", se = TRUE, engine=c("C","R","TMB"), ...)
formula |
double right-hand side formula for detection and abundance covariates, in that order. |
data |
unmarkedFrame supplying data. |
starts |
vector of starting values. |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors. |
engine |
Either "C" to use fast C++ code or "R" to use native R code during the optimization. |
... |
Additional arguments to optim, such as lower and upper bounds |
This function takes advantage of the closed form of the integrated likelihood when a latent Poisson distribution is assumed for abundance at each site and a multinomial distribution is taken for the observation state. Many common sampling methods can be framed in this context. For example, double-observer point counts and removal sampling can be analyzed with this function by specifying the proper multinomial cell probablilities. This is done with by supplying the appropriate function (piFun) argument. removalPiFun and doublePiFun are supplied as example cell probability functions.
unmarkedFit object describing the model fit.
Ian Fiske
Royle, J. A. (2004). Generalized estimators of avian abundance from count survey data. Animal Biodiversity and Conservation, 27(1), 375-386.
Royle, J. A., & Dorazio, R. M. (2006). Hierarchical Models of Animal Abundance and Occurrence. Journal Of Agricultural Biological And Environmental Statistics, 11(3), 249.
# Simulate independent double observer data nSites <- 50 lambda <- 10 p1 <- 0.5 p2 <- 0.3 cp <- c(p1*(1-p2), p2*(1-p1), p1*p2) set.seed(9023) N <- rpois(nSites, lambda) y <- matrix(NA, nSites, 3) for(i in 1:nSites) { y[i,] <- rmultinom(1, N[i], c(cp, 1-sum(cp)))[1:3] } # Fit model observer <- matrix(c('A','B'), nSites, 2, byrow=TRUE) umf <- unmarkedFrameMPois(y=y, obsCovs=list(observer=observer), type="double") fm <- multinomPois(~observer-1 ~1, umf) # Estimates of fixed effects e <- coef(fm) exp(e[1]) plogis(e[2:3]) # Estimates of random effects re <- ranef(fm, K=20) #ltheme <- canonical.theme(color = FALSE) #lattice.options(default.theme = ltheme) plot(re, layout=c(10,5)) ## Real data data(ovendata) ovenFrame <- unmarkedFrameMPois(ovendata.list$data, siteCovs=as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal") (fm1 <- multinomPois(~ 1 ~ ufc + trba, ovenFrame)) # Detection probability for a single pass predict(fm1, type="det")[1,] # Detection probability after 4 removal passes rowSums(getP(fm1)) # Empirical Bayes estimates of abundance at first 25 sites # Very low uncertainty because p is very high plot(ranef(fm1, K=10), layout=c(10,7), xlim=c(-1, 10))
# Simulate independent double observer data nSites <- 50 lambda <- 10 p1 <- 0.5 p2 <- 0.3 cp <- c(p1*(1-p2), p2*(1-p1), p1*p2) set.seed(9023) N <- rpois(nSites, lambda) y <- matrix(NA, nSites, 3) for(i in 1:nSites) { y[i,] <- rmultinom(1, N[i], c(cp, 1-sum(cp)))[1:3] } # Fit model observer <- matrix(c('A','B'), nSites, 2, byrow=TRUE) umf <- unmarkedFrameMPois(y=y, obsCovs=list(observer=observer), type="double") fm <- multinomPois(~observer-1 ~1, umf) # Estimates of fixed effects e <- coef(fm) exp(e[1]) plogis(e[2:3]) # Estimates of random effects re <- ranef(fm, K=20) #ltheme <- canonical.theme(color = FALSE) #lattice.options(default.theme = ltheme) plot(re, layout=c(10,5)) ## Real data data(ovendata) ovenFrame <- unmarkedFrameMPois(ovendata.list$data, siteCovs=as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal") (fm1 <- multinomPois(~ 1 ~ ufc + trba, ovenFrame)) # Detection probability for a single pass predict(fm1, type="det")[1,] # Detection probability after 4 removal passes rowSums(getP(fm1)) # Empirical Bayes estimates of abundance at first 25 sites # Very low uncertainty because p is very high plot(ranef(fm1, K=10), layout=c(10,7), xlim=c(-1, 10))
Fit the model of Dail and Madsen (2011) and Hostetler and Chandler (2015) for designs involving repeated counts that yield multinomial outcomes. Possible data collection methods include repeated removal sampling and double observer sampling.
multmixOpen(lambdaformula, gammaformula, omegaformula, pformula, data, mixture=c("P", "NB", "ZIP"), K, dynamics=c("constant", "autoreg", "notrend", "trend", "ricker", "gompertz"), fix=c("none", "gamma", "omega"), immigration=FALSE, iotaformula = ~1, starts, method="BFGS", se=TRUE, ...)
multmixOpen(lambdaformula, gammaformula, omegaformula, pformula, data, mixture=c("P", "NB", "ZIP"), K, dynamics=c("constant", "autoreg", "notrend", "trend", "ricker", "gompertz"), fix=c("none", "gamma", "omega"), immigration=FALSE, iotaformula = ~1, starts, method="BFGS", se=TRUE, ...)
lambdaformula |
Right-hand sided formula for initial abundance |
gammaformula |
Right-hand sided formula for recruitment rate (when dynamics is "constant", "autoreg", or "notrend") or population growth rate (when dynamics is "trend", "ricker", or "gompertz") |
omegaformula |
Right-hand sided formula for apparent survival probability (when dynamics is "constant", "autoreg", or "notrend") or equilibrium abundance (when dynamics is "ricker" or "gompertz") |
pformula |
A right-hand side formula describing the detection function covariates |
data |
An object of class |
mixture |
String specifying mixture: "P", "NB", or "ZIP" for the Poisson, negative binomial, or zero-inflated Poisson distributions respectively |
K |
Integer defining upper bound of discrete integration. This should be higher than the maximum observed count and high enough that it does not affect the parameter estimates. However, the higher the value the slower the computation |
dynamics |
Character string describing the type of population dynamics. "constant" indicates that there is no relationship between omega and gamma. "autoreg" is an auto-regressive model in which recruitment is modeled as gamma*N[i,t-1]. "notrend" model gamma as lambda*(1-omega) such that there is no temporal trend. "trend" is a model for exponential growth, N[i,t] = N[i,t-1]*gamma, where gamma in this case is finite rate of increase (normally referred to as lambda). "ricker" and "gompertz" are models for density-dependent population growth. "ricker" is the Ricker-logistic model, N[i,t] = N[i,t-1]*exp(gamma*(1-N[i,t-1]/omega)), where gamma is the maximum instantaneous population growth rate (normally referred to as r) and omega is the equilibrium abundance (normally referred to as K). "gompertz" is a modified version of the Gompertz-logistic model, N[i,t] = N[i,t-1]*exp(gamma*(1-log(N[i,t-1]+1)/log(omega+1))), where the interpretations of gamma and omega are similar to in the Ricker model |
fix |
If "omega", omega is fixed at 1. If "gamma", gamma is fixed at 0 |
immigration |
Logical specifying whether or not to include an immigration term (iota) in population dynamics |
iotaformula |
Right-hand sided formula for average number of immigrants to a site per time step |
starts |
Vector of starting values |
method |
Optimization method used by |
se |
Logical specifying whether or not to compute standard errors |
... |
Additional arguments to optim, such as lower and upper bounds |
These models generalize multinomial N-mixture models (Royle et al. 2004) by relaxing the closure assumption (Dail and Madsen 2011, Hostetler and Chandler 2015, Sollmann et al. 2015).
The models include two or three additional parameters: gamma, either the recruitment rate (births and immigrations), the finite rate of increase, or the maximum instantaneous rate of increase; omega, either the apparent survival rate (deaths and emigrations) or the equilibrium abundance (carrying capacity); and iota, the number of immigrants per site and year. Estimates of population size at each time period can be derived from these parameters, and thus so can trend estimates. Or, trend can be estimated directly using dynamics="trend".
When immigration is set to FALSE (the default), iota is not modeled. When immigration is set to TRUE and dynamics is set to "autoreg", the model will separately estimate birth rate (gamma) and number of immigrants (iota). When immigration is set to TRUE and dynamics is set to "trend", "ricker", or "gompertz", the model will separately estimate local contributions to population growth (gamma and omega) and number of immigrants (iota).
The latent abundance distribution, can be set as a Poisson, negative binomial, or zero-inflated
Poisson random
variable, depending on the setting of the
mixture
argument,
mixture = "P"
, mixture = "NB"
, mixture = "ZIP"
respectively. For the first two distributions, the mean of is
. If
, then an
additional parameter,
, describes dispersion (lower
implies higher variance). For the ZIP distribution,
the mean is
, where psi is the
zero-inflation parameter.
For "constant", "autoreg", or "notrend" dynamics, the latent abundance state
following the initial sampling period arises
from a
Markovian process in which survivors are modeled as , and recruits
follow
.
Alternative population dynamics can be specified
using the
dynamics
and immigration
arguments.
,
, and
are modeled
using the the log link.
is modeled using
the logit link.
is either modeled using the logit link (for
"constant", "autoreg", or "notrend" dynamics) or the log link (for "ricker"
or "gompertz" dynamics). For "trend" dynamics,
is not modeled.
The detection process is modeled as multinomial:
,
where
is the multinomial cell probability for
plot i at time t on occasion j.
Options for the detection process include equal-interval removal sampling
("removal"
), double observer sampling ("double"
), or
dependent double-observer sampling ("depDouble"
). This option is
specified when setting up the data using unmarkedFrameMMO
. Note
that unlike the related functions multinomPois
and
gmultmix
, custom functions for the detection process (i.e.,
piFun
s) are not supported. To request additional options contact the author.
An object of class unmarkedFitMMO
This function can be extremely slow, especially if there are covariates of gamma or omega. Consider testing the timing on a small subset of the data, perhaps with se=FALSE. Finding the lowest value of K that does not affect estimates will also help with speed.
When gamma or omega are modeled using year-specific covariates, the covariate data for the final year will be ignored; however, they must be supplied.
If the time gap between primary periods is not constant, an M by T
matrix of integers should be supplied to unmarkedFrameMMO
using the primaryPeriod
argument.
Secondary sampling periods are optional, but can greatly improve the precision of the estimates.
Ken Kellner [email protected], Richard Chandler
Dail, D. and L. Madsen (2011) Models for Estimating Abundance from Repeated Counts of an Open Metapopulation. Biometrics. 67: 577-587.
Hostetler, J. A. and R. B. Chandler (2015) Improved State-space Models for Inference about Spatial and Temporal Variation in Abundance from Count Data. Ecology 96: 1713-1723.
Royle, J. A. (2004). Generalized estimators of avian abundance from count survey data. Animal Biodiversity and Conservation 27(1), 375-386.
multinomPois, gmultmix, unmarkedFrameMMO
#Generate some data set.seed(123) lambda=4; gamma=0.5; omega=0.8; p=0.5 M <- 100; T <- 5 y <- array(NA, c(M, 3, T)) N <- matrix(NA, M, T) S <- G <- matrix(NA, M, T-1) for(i in 1:M) { N[i,1] <- rpois(1, lambda) y[i,1,1] <- rbinom(1, N[i,1], p) # Observe some Nleft1 <- N[i,1] - y[i,1,1] # Remove them y[i,2,1] <- rbinom(1, Nleft1, p) # ... Nleft2 <- Nleft1 - y[i,2,1] y[i,3,1] <- rbinom(1, Nleft2, p) for(t in 1:(T-1)) { S[i,t] <- rbinom(1, N[i,t], omega) G[i,t] <- rpois(1, gamma) N[i,t+1] <- S[i,t] + G[i,t] y[i,1,t+1] <- rbinom(1, N[i,t+1], p) # Observe some Nleft1 <- N[i,t+1] - y[i,1,t+1] # Remove them y[i,2,t+1] <- rbinom(1, Nleft1, p) # ... Nleft2 <- Nleft1 - y[i,2,t+1] y[i,3,t+1] <- rbinom(1, Nleft2, p) } } y=matrix(y, M) #Create some random covariate data sc <- data.frame(x1=rnorm(100)) ## Not run: #Create unmarked frame umf <- unmarkedFrameMMO(y=y, numPrimary=5, siteCovs=sc, type="removal") #Fit model (fit <- multmixOpen(~x1, ~1, ~1, ~1, K=30, data=umf)) #Compare to truth cf <- coef(fit) data.frame(model=c(exp(cf[1]), cf[2], exp(cf[3]), plogis(cf[4]), plogis(cf[5])), truth=c(lambda, 0, gamma, omega, p)) #Predict head(predict(fit, type='lambda')) #Check fit with parametric bootstrap pb <- parboot(fit, nsims=15) plot(pb) # Empirical Bayes estimates of abundance for each site / year re <- ranef(fit) plot(re, layout=c(10,5), xlim=c(-1, 10)) ## End(Not run)
#Generate some data set.seed(123) lambda=4; gamma=0.5; omega=0.8; p=0.5 M <- 100; T <- 5 y <- array(NA, c(M, 3, T)) N <- matrix(NA, M, T) S <- G <- matrix(NA, M, T-1) for(i in 1:M) { N[i,1] <- rpois(1, lambda) y[i,1,1] <- rbinom(1, N[i,1], p) # Observe some Nleft1 <- N[i,1] - y[i,1,1] # Remove them y[i,2,1] <- rbinom(1, Nleft1, p) # ... Nleft2 <- Nleft1 - y[i,2,1] y[i,3,1] <- rbinom(1, Nleft2, p) for(t in 1:(T-1)) { S[i,t] <- rbinom(1, N[i,t], omega) G[i,t] <- rpois(1, gamma) N[i,t+1] <- S[i,t] + G[i,t] y[i,1,t+1] <- rbinom(1, N[i,t+1], p) # Observe some Nleft1 <- N[i,t+1] - y[i,1,t+1] # Remove them y[i,2,t+1] <- rbinom(1, Nleft1, p) # ... Nleft2 <- Nleft1 - y[i,2,t+1] y[i,3,t+1] <- rbinom(1, Nleft2, p) } } y=matrix(y, M) #Create some random covariate data sc <- data.frame(x1=rnorm(100)) ## Not run: #Create unmarked frame umf <- unmarkedFrameMMO(y=y, numPrimary=5, siteCovs=sc, type="removal") #Fit model (fit <- multmixOpen(~x1, ~1, ~1, ~1, K=30, data=umf)) #Compare to truth cf <- coef(fit) data.frame(model=c(exp(cf[1]), cf[2], exp(cf[3]), plogis(cf[4]), plogis(cf[5])), truth=c(lambda, 0, gamma, omega, p)) #Predict head(predict(fit, type='lambda')) #Check fit with parametric bootstrap pb <- parboot(fit, nsims=15) plot(pb) # Empirical Bayes estimates of abundance for each site / year re <- ranef(fit) plot(re, layout=c(10,5), xlim=c(-1, 10)) ## End(Not run)
Fit N-mixture models with time-to-detection data.
nmixTTD(stateformula= ~1, detformula = ~1, data, K=100, mixture = c("P","NB"), ttdDist = c("exp", "weibull"), starts, method="BFGS", se=TRUE, engine = c("C", "R"), threads = 1, ...)
nmixTTD(stateformula= ~1, detformula = ~1, data, K=100, mixture = c("P","NB"), ttdDist = c("exp", "weibull"), starts, method="BFGS", se=TRUE, engine = c("C", "R"), threads = 1, ...)
stateformula |
Right-hand sided formula for the abundance at each site. |
detformula |
Right-hand sided formula for mean time-to-detection. |
data |
|
K |
The upper summation index used to numerically integrate out the latent abundance. This should be set high enough so that it does not affect the parameter estimates. Computation time will increase with K. |
mixture |
String specifying mixture distribution: "P" for Poisson or "NB" for negative binomial. |
ttdDist |
Distribution to use for time-to-detection; either
|
starts |
optionally, initial values for parameters in the optimization. |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors. |
engine |
Either "C" or "R" to use fast C++ code or native R code during the optimization. |
threads |
Set the number of threads to use for optimization in C++, if
OpenMP is available on your system. Increasing the number of threads
may speed up optimization in some cases by running the likelihood
calculation in parallel. If |
... |
Additional arguments to optim, such as lower and upper bounds |
This model extends time-to-detection (TTD) occupancy models to estimate site
abundance using data from single or repeated visits. Latent abundance can be
modeled as Poisson (mixture="P"
) or negative binomial (mixture="NB"
).
Time-to-detection can be modeled as an exponential (ttdDist="exp"
) or
Weibull (ttdDist="weibull"
) random variable with rate parameter
and, for the Weibull, an additional shape parameter
. Note that
occuTTD
puts covariates on and not
, i.e.,
the expected time between events.
Assuming that there are independent individuals at a site, and all
individuals have the same individual detection rate, the expected
detection rate across all individuals
is equal to the the
individual-level detection rate
multipled by the number of individuals
present
.
In the case where there are no detections before the maximum sample time at
a site (surveyLength
) is reached, we are not sure if the site has
or if we just didn't wait long enough for a detection. We therefore
must censor (
the exponential or Weibull distribution at the maximum survey
length,
. Thus, assuming true abundance at site
is
, and an exponential distribution for the TTD
(parameterized
with the rate), then:
Note that when , the exponential rate
and the
scale is therefore
, and thus the value will be censored
at
.
Because in unmarked
values of NA
are typically used to indicate
missing values that were a result of the sampling structure (e.g., lost data),
we indicate a censored in
nmixTTD
instead by setting
in the
y
matrix provided to
unmarkedFrameOccuTTD
. You can provide either a single value of
to the
surveyLength
argument of unmarkedFrameOccuTTD
,
or provide a matrix, potentially with a unique value of for each
value of
y
. Note that in the latter case the value of y
that will
be interpreted by nmixTTD
as a censored observation (i.e., )
will differ between observations!
unmarkedFitNmixTTD object describing model fit.
Ken Kellner [email protected]
Strebel, N., Fiss, C., Kellner, K. F., Larkin, J. L., Kery, M., & Cohen, J (2021). Estimating abundance based on time-to-detection data. Methods in Ecology and Evolution 12: 909-920.
unmarked
, unmarkedFrameOccuTTD
## Not run: # Simulate data M = 1000 # Number of sites nrep <- 3 # Number of visits per site Tmax = 5 # Max duration of a visit alpha1 = -1 # Covariate on rate beta1 = 1 # Covariate on density mu.lambda = 1 # Rate at alpha1 = 0 mu.dens = 1 # Density at beta1 = 0 covDet <- matrix(rnorm(M*nrep),nrow = M,ncol = nrep) #Detection covariate covDens <- rnorm(M) #Abundance/density covariate dens <- exp(log(mu.dens) + beta1 * covDens) sum(N <- rpois(M, dens)) # Realized density per site lambda <- exp(log(mu.lambda) + alpha1 * covDet) # per-individual detection rate ttd <- NULL for(i in 1:nrep) { ttd <- cbind(ttd,rexp(M, N*lambda[,i])) # Simulate time to first detection per visit } ttd[N == 0,] <- 5 # Not observed where N = 0; ttd set to Tmax ttd[ttd >= Tmax] <- 5 # Crop at Tmax #Build unmarked frame umf <- unmarkedFrameOccuTTD(y = ttd, surveyLength=5, siteCovs = data.frame(covDens=covDens), obsCovs = data.frame(covDet=as.vector(t(covDet)))) #Fit model fit <- nmixTTD(~covDens, ~covDet, data=umf, K=max(N)+10) #Compare to truth cbind(coef(fit), c(log(mu.dens), beta1, log(mu.lambda), alpha1)) #Predict abundance/density values head(predict(fit, type='state')) ## End(Not run)
## Not run: # Simulate data M = 1000 # Number of sites nrep <- 3 # Number of visits per site Tmax = 5 # Max duration of a visit alpha1 = -1 # Covariate on rate beta1 = 1 # Covariate on density mu.lambda = 1 # Rate at alpha1 = 0 mu.dens = 1 # Density at beta1 = 0 covDet <- matrix(rnorm(M*nrep),nrow = M,ncol = nrep) #Detection covariate covDens <- rnorm(M) #Abundance/density covariate dens <- exp(log(mu.dens) + beta1 * covDens) sum(N <- rpois(M, dens)) # Realized density per site lambda <- exp(log(mu.lambda) + alpha1 * covDet) # per-individual detection rate ttd <- NULL for(i in 1:nrep) { ttd <- cbind(ttd,rexp(M, N*lambda[,i])) # Simulate time to first detection per visit } ttd[N == 0,] <- 5 # Not observed where N = 0; ttd set to Tmax ttd[ttd >= Tmax] <- 5 # Crop at Tmax #Build unmarked frame umf <- unmarkedFrameOccuTTD(y = ttd, surveyLength=5, siteCovs = data.frame(covDens=covDens), obsCovs = data.frame(covDet=as.vector(t(covDet)))) #Fit model fit <- nmixTTD(~covDens, ~covDet, data=umf, K=max(N)+10) #Compare to truth cbind(coef(fit), c(log(mu.dens), beta1, log(mu.lambda), alpha1)) #Predict abundance/density values head(predict(fit, type='state')) ## End(Not run)
unmarked
modelResamples sites (with replacement) from an unmarkedFit
object and
re-fits the model, to generate new non-parametric bootstrap samples of estimates.
These samples can then be used by vcov
in order to get bootstrap-based
estimates of standard errors.
## S4 method for signature 'unmarkedFit' nonparboot(object, B = 1, keepOldSamples = TRUE, ...)
## S4 method for signature 'unmarkedFit' nonparboot(object, B = 1, keepOldSamples = TRUE, ...)
object |
A |
B |
The number of bootstrap samples to get |
keepOldSamples |
Should previously generated bootstrap samples be retained? |
... |
Other arguments, currently ignored |
Calling nonparboot
on an unmarkedFit returns the original
unmarkedFit, with the bootstrap samples added on.
Then subsequent calls to vcov
with the argument
method="nonparboot"
will use these bootstrap samples.
Additionally, standard errors of derived estimates from either
linearComb
or backTransform
can be
instructed to use bootstrap samples by providing the argument
method = "nonparboot"
.
For all fitting functions, only sites are re-sampled. In previous versions of
unmarked
both sites and occasions
of occu
and occuRN
fits were re-sampled, but now
only sites are re-sampled for these fit types to match the other fit types.
Note: by default, if the model fit to a new resampled dataset results in an
error or a convergence failure, nonparboot
will continually
re-sample and re-fit until this is no longer the case. This could
result in the function hanging.
data(ovendata) ovenFrame <- unmarkedFrameMPois(ovendata.list$data, siteCovs=as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal") (fm <- multinomPois(~ 1 ~ ufc + trba, ovenFrame)) fm <- nonparboot(fm, B = 20) # should use larger B in real life. vcov(fm, method = "hessian") vcov(fm, method = "nonparboot") avg.abundance <- backTransform(linearComb(fm, type = "state", coefficients = c(1, 0, 0))) ## Bootstrap sample information propagates through to derived quantities. vcov(avg.abundance, method = "hessian") vcov(avg.abundance, method = "nonparboot") SE(avg.abundance, method = "nonparboot")
data(ovendata) ovenFrame <- unmarkedFrameMPois(ovendata.list$data, siteCovs=as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal") (fm <- multinomPois(~ 1 ~ ufc + trba, ovenFrame)) fm <- nonparboot(fm, B = 20) # should use larger B in real life. vcov(fm, method = "hessian") vcov(fm, method = "nonparboot") avg.abundance <- backTransform(linearComb(fm, type = "state", coefficients = c(1, 0, 0))) ## Bootstrap sample information propagates through to derived quantities. vcov(avg.abundance, method = "hessian") vcov(avg.abundance, method = "nonparboot") SE(avg.abundance, method = "nonparboot")
This function fits the single season occupancy model of MacKenzie et al (2002).
occu(formula, data, knownOcc=numeric(0), linkPsi=c("logit", "cloglog"), starts, method="BFGS", se=TRUE, engine=c("C", "R", "TMB"), threads = 1, ...)
occu(formula, data, knownOcc=numeric(0), linkPsi=c("logit", "cloglog"), starts, method="BFGS", se=TRUE, engine=c("C", "R", "TMB"), threads = 1, ...)
formula |
Double right-hand side formula describing covariates of detection and occupancy in that order. |
data |
An |
knownOcc |
Vector of sites that are known to be occupied. These should be supplied as row numbers of the y matrix, eg, c(3,8) if sites 3 and 8 were known to be occupied a priori. |
linkPsi |
Link function for the occupancy model. Options are
|
starts |
Vector of parameter starting values. |
method |
Optimization method used by |
se |
Logical specifying whether or not to compute standard errors. |
engine |
Code to use for optimization. Either "C" for fast C++ code, "R" for native R code, or "TMB" for Template Model Builder. "TMB" is used automatically if your formula contains random effects. |
threads |
Set the number of threads to use for optimization in C++, if
OpenMP is available on your system. Increasing the number of threads
may speed up optimization in some cases by running the likelihood
calculation in parallel. If |
... |
Additional arguments to optim, such as lower and upper bounds |
See unmarkedFrame
and unmarkedFrameOccu
for a
description of how to supply data to the data
argument.
occu
fits the standard occupancy model based on zero-inflated
binomial models (MacKenzie et al. 2006, Royle and Dorazio
2008). The occupancy state process () of site
is
modeled as
The observation process is modeled as
By default, covariates of and
are modeled
using the logit link according to the
formula
argument. The formula is a double right-hand sided formula
like ~ detform ~ occform
where detform
is a formula for the detection process and occform
is a
formula for the partially observed occupancy state. See formula for details on constructing model formulae
in R.
When linkPsi = "cloglog"
, the complimentary log-log link
function is used for instead of the logit link. The cloglog link
relates occupancy probability to the intensity parameter of an underlying
Poisson process (Kery and Royle 2016). Thus, if abundance at a site is
can be modeled as
, where
, then presence/absence data at the
site can be modeled as
where
.
unmarkedFitOccu object describing the model fit.
Ian Fiske
Kery, Marc, and J. Andrew Royle. 2016. Applied Hierarchical Modeling in Ecology, Volume 1. Academic Press.
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.
MacKenzie, D. I. et al. 2006. Occupancy Estimation and Modeling. Amsterdam: Academic Press.
Royle, J. A. and R. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology. Academic Press.
unmarked
, unmarkedFrameOccu
,
modSel
, parboot
data(frogs) pferUMF <- unmarkedFrameOccu(pfer.bin) plot(pferUMF, panels=4) # add some fake covariates for illustration siteCovs(pferUMF) <- data.frame(sitevar1 = rnorm(numSites(pferUMF))) # observation covariates are in site-major, observation-minor order obsCovs(pferUMF) <- data.frame(obsvar1 = rnorm(numSites(pferUMF) * obsNum(pferUMF))) (fm <- occu(~ obsvar1 ~ 1, pferUMF)) confint(fm, type='det', method = 'normal') confint(fm, type='det', method = 'profile') # estimate detection probability and 95% CI at obsvars=0.5 nd <- data.frame(obsvar1 = 0.5) predict(fm, type = "det", newdata = nd, appendData = TRUE) # Empirical Bayes estimates of proportion of sites occupied re <- ranef(fm) sum(bup(re, stat="mode"))
data(frogs) pferUMF <- unmarkedFrameOccu(pfer.bin) plot(pferUMF, panels=4) # add some fake covariates for illustration siteCovs(pferUMF) <- data.frame(sitevar1 = rnorm(numSites(pferUMF))) # observation covariates are in site-major, observation-minor order obsCovs(pferUMF) <- data.frame(obsvar1 = rnorm(numSites(pferUMF) * obsNum(pferUMF))) (fm <- occu(~ obsvar1 ~ 1, pferUMF)) confint(fm, type='det', method = 'normal') confint(fm, type='det', method = 'profile') # estimate detection probability and 95% CI at obsvars=0.5 nd <- data.frame(obsvar1 = 0.5) predict(fm, type = "det", newdata = nd, appendData = TRUE) # Empirical Bayes estimates of proportion of sites occupied re <- ranef(fm) sum(bup(re, stat="mode"))
This function fits a single season occupancy model using count data.
occuCOP(data, psiformula = ~1, lambdaformula = ~1, psistarts, lambdastarts, starts, method = "BFGS", se = TRUE, engine = c("C", "R"), na.rm = TRUE, return.negloglik = NULL, L1 = FALSE, ...)
occuCOP(data, psiformula = ~1, lambdaformula = ~1, psistarts, lambdastarts, starts, method = "BFGS", se = TRUE, engine = c("C", "R"), na.rm = TRUE, return.negloglik = NULL, L1 = FALSE, ...)
data |
An |
psiformula |
Formula describing the occupancy covariates. |
lambdaformula |
Formula describing the detection covariates. |
psistarts |
Vector of starting values for likelihood maximisation with |
lambdastarts |
Vector of starting values for likelihood maximisation with |
starts |
Vector of starting values for likelihood maximisation with |
method |
Optimisation method used by |
se |
Logical specifying whether to compute ( |
engine |
Code to use for optimisation. Either |
na.rm |
Logical specifying whether to fit the model ( |
return.negloglik |
A list of vectors of parameters ( |
L1 |
Logical specifying whether the length of observations ( |
... |
Additional arguments to pass to |
See unmarkedFrameOccuCOP
for a description of how to supply data to the data
argument. See unmarkedFrame
for a more general documentation of unmarkedFrame
objects for the different models implemented in unmarked.
occuCOP
fits a single season occupancy model using count data, as described in Pautrel et al. (2023).
The occupancy sub-model is:
With the occupany state of site
.
if site
is occupied by the species, i.e. if the species is present in site
.
if site
is not occupied.
With the occupancy probability of site
.
The observation sub-model is:
With the count of detection events in site
during observation
.
With the detection rate in site
during observation
(for example, 1 detection per day.).
With the length of observation
in site
(for example, 7 days.).
What we call "observation" () here can be a sampling occasion, a transect, a discretised session. Consequently, the unit of
and
can be either a time-unit (day, hour, ...) or a space-unit (kilometer, meter, ...).
and
In order to perform unconstrained optimisation, parameters are transformed.
The occupancy probability () is transformed with the logit function (
psi_transformed = qlogis(psi)
). It can be back-transformed with the "inverse logit" function (psi = plogis(psi_transformed)
).
The detection rate () is transformed with the log function (
lambda_transformed = log(lambda)
). It can be back-transformed with the exponential function (lambda = exp(lambda_transformed)
).
unmarkedFitOccuCOP
object describing the model fit. See the unmarkedFit
classes.
Léa Pautrel
Pautrel, L., Moulherat, S., Gimenez, O. & Etienne, M.-P. Submitted. Analysing biodiversity observation data collected in continuous time: Should we use discrete or continuous-time occupancy models? Preprint at doi:10.1101/2023.11.17.567350.
unmarked
,
unmarkedFrameOccuCOP
,
unmarkedFit-class
set.seed(123) options(max.print = 50) # We simulate data in 100 sites with 3 observations of 7 days per site. nSites <- 100 nObs <- 3 # For an occupancy covariate, we associate each site to a land-use category. landuse <- sample(factor(c("Forest", "Grassland", "City"), ordered = TRUE), size = nSites, replace = TRUE) simul_psi <- ifelse(landuse == "Forest", 0.8, ifelse(landuse == "Grassland", 0.4, 0.1)) z <- rbinom(n = nSites, size = 1, prob = simul_psi) # For a detection covariate, we create a fake wind variable. wind <- matrix(rexp(n = nSites * nObs), nrow = nSites, ncol = nObs) simul_lambda <- wind / 5 L = matrix(7, nrow = nSites, ncol = nObs) # We now simulate count detection data y <- matrix(rpois(n = nSites * nObs, lambda = simul_lambda * L), nrow = nSites, ncol = nObs) * z # We create our unmarkedFrameOccuCOP object umf <- unmarkedFrameOccuCOP( y = y, L = L, siteCovs = data.frame("landuse" = landuse), obsCovs = list("wind" = wind) ) print(umf) # We fit our model without covariates fitNull <- occuCOP(data = umf) print(fitNull) # We fit our model with covariates fitCov <- occuCOP(data = umf, psiformula = ~ landuse, lambdaformula = ~ wind) print(fitCov) # We back-transform the parameter's estimates ## Back-transformed occupancy probability with no covariates predict(fitNull, "psi")[1,] ## Back-transformed occupancy probability depending on habitat use predict(fitCov, "psi", newdata = data.frame("landuse" = c("Forest", "Grassland", "City")), appendData = TRUE) ## Back-transformed detection rate with no covariates predict(fitNull, "lambda")[1,] ## Back-transformed detection rate depending on wind predict(fitCov, "lambda", appendData = TRUE) ## This is not easily readable. We can show the results in a clearer way, by: ## - adding the site and observation ## - printing only the wind covariate used to get the predicted lambda cbind( data.frame( "site" = rep(1:nSites, each = nObs), "observation" = rep(1:nObs, times = nSites), "wind" = getData(fitCov)@obsCovs ), predict(fitCov, "lambda", appendData = FALSE) ) # We can choose the initial parameters when fitting our model. # For psi, intituively, the initial value can be the proportion of sites # in which we have observations. (psi_init <- mean(rowSums(y) > 0)) # For lambda, the initial value can be the mean count of detection events # in sites in which there was at least one observation. (lambda_init <- mean(y[rowSums(y) > 0, ])) # We have to transform them. occuCOP( data = umf, psiformula = ~ 1, lambdaformula = ~ 1, psistarts = qlogis(psi_init), lambdastarts = log(lambda_init) ) # If we have covariates, we need to have the right length for the start vectors. # psi ~ landuse --> 3 param to estimate: Intercept, landuseForest, landuseGrassland # lambda ~ wind --> 2 param to estimate: Intercept, wind occuCOP( data = umf, psiformula = ~ landuse, lambdaformula = ~ wind, psistarts = rep(qlogis(psi_init), 3), lambdastarts = rep(log(lambda_init), 2) ) # And with covariates, we could have chosen better initial values, such as the # proportion of sites in which we have observations per land-use category. (psi_init_covs <- c( "City" = mean(rowSums(y[landuse == "City", ]) > 0), "Forest" = mean(rowSums(y[landuse == "Forest", ]) > 0), "Grassland" = mean(rowSums(y[landuse == "Grassland", ]) > 0) )) occuCOP( data = umf, psiformula = ~ landuse, lambdaformula = ~ wind, psistarts = qlogis(psi_init_covs)) # We can fit our model with a different optimisation algorithm. occuCOP(data = umf, method = "Nelder-Mead") # We can run our model with a C++ or with a R likelihood function. ## They give the same result. occuCOP(data = umf, engine = "C", psistarts = 0, lambdastarts = 0) occuCOP(data = umf, engine = "R", psistarts = 0, lambdastarts = 0) ## The C++ (the default) is faster. system.time(occuCOP(data = umf, engine = "C", psistarts = 0, lambdastarts = 0)) system.time(occuCOP(data = umf, engine = "R", psistarts = 0, lambdastarts = 0)) ## However, if you want to understand how the likelihood is calculated, ## you can easily access the R likelihood function. print(occuCOP(data = umf, engine = "R", psistarts = 0, lambdastarts = 0)@nllFun) # Finally, if you do not want to fit your model but only get the likelihood, # you can get the negative log-likelihood for a given set of parameters. occuCOP(data = umf, return.negloglik = list( c("psi" = qlogis(0.25), "lambda" = log(2)), c("psi" = qlogis(0.5), "lambda" = log(1)), c("psi" = qlogis(0.75), "lambda" = log(0.5)) ))
set.seed(123) options(max.print = 50) # We simulate data in 100 sites with 3 observations of 7 days per site. nSites <- 100 nObs <- 3 # For an occupancy covariate, we associate each site to a land-use category. landuse <- sample(factor(c("Forest", "Grassland", "City"), ordered = TRUE), size = nSites, replace = TRUE) simul_psi <- ifelse(landuse == "Forest", 0.8, ifelse(landuse == "Grassland", 0.4, 0.1)) z <- rbinom(n = nSites, size = 1, prob = simul_psi) # For a detection covariate, we create a fake wind variable. wind <- matrix(rexp(n = nSites * nObs), nrow = nSites, ncol = nObs) simul_lambda <- wind / 5 L = matrix(7, nrow = nSites, ncol = nObs) # We now simulate count detection data y <- matrix(rpois(n = nSites * nObs, lambda = simul_lambda * L), nrow = nSites, ncol = nObs) * z # We create our unmarkedFrameOccuCOP object umf <- unmarkedFrameOccuCOP( y = y, L = L, siteCovs = data.frame("landuse" = landuse), obsCovs = list("wind" = wind) ) print(umf) # We fit our model without covariates fitNull <- occuCOP(data = umf) print(fitNull) # We fit our model with covariates fitCov <- occuCOP(data = umf, psiformula = ~ landuse, lambdaformula = ~ wind) print(fitCov) # We back-transform the parameter's estimates ## Back-transformed occupancy probability with no covariates predict(fitNull, "psi")[1,] ## Back-transformed occupancy probability depending on habitat use predict(fitCov, "psi", newdata = data.frame("landuse" = c("Forest", "Grassland", "City")), appendData = TRUE) ## Back-transformed detection rate with no covariates predict(fitNull, "lambda")[1,] ## Back-transformed detection rate depending on wind predict(fitCov, "lambda", appendData = TRUE) ## This is not easily readable. We can show the results in a clearer way, by: ## - adding the site and observation ## - printing only the wind covariate used to get the predicted lambda cbind( data.frame( "site" = rep(1:nSites, each = nObs), "observation" = rep(1:nObs, times = nSites), "wind" = getData(fitCov)@obsCovs ), predict(fitCov, "lambda", appendData = FALSE) ) # We can choose the initial parameters when fitting our model. # For psi, intituively, the initial value can be the proportion of sites # in which we have observations. (psi_init <- mean(rowSums(y) > 0)) # For lambda, the initial value can be the mean count of detection events # in sites in which there was at least one observation. (lambda_init <- mean(y[rowSums(y) > 0, ])) # We have to transform them. occuCOP( data = umf, psiformula = ~ 1, lambdaformula = ~ 1, psistarts = qlogis(psi_init), lambdastarts = log(lambda_init) ) # If we have covariates, we need to have the right length for the start vectors. # psi ~ landuse --> 3 param to estimate: Intercept, landuseForest, landuseGrassland # lambda ~ wind --> 2 param to estimate: Intercept, wind occuCOP( data = umf, psiformula = ~ landuse, lambdaformula = ~ wind, psistarts = rep(qlogis(psi_init), 3), lambdastarts = rep(log(lambda_init), 2) ) # And with covariates, we could have chosen better initial values, such as the # proportion of sites in which we have observations per land-use category. (psi_init_covs <- c( "City" = mean(rowSums(y[landuse == "City", ]) > 0), "Forest" = mean(rowSums(y[landuse == "Forest", ]) > 0), "Grassland" = mean(rowSums(y[landuse == "Grassland", ]) > 0) )) occuCOP( data = umf, psiformula = ~ landuse, lambdaformula = ~ wind, psistarts = qlogis(psi_init_covs)) # We can fit our model with a different optimisation algorithm. occuCOP(data = umf, method = "Nelder-Mead") # We can run our model with a C++ or with a R likelihood function. ## They give the same result. occuCOP(data = umf, engine = "C", psistarts = 0, lambdastarts = 0) occuCOP(data = umf, engine = "R", psistarts = 0, lambdastarts = 0) ## The C++ (the default) is faster. system.time(occuCOP(data = umf, engine = "C", psistarts = 0, lambdastarts = 0)) system.time(occuCOP(data = umf, engine = "R", psistarts = 0, lambdastarts = 0)) ## However, if you want to understand how the likelihood is calculated, ## you can easily access the R likelihood function. print(occuCOP(data = umf, engine = "R", psistarts = 0, lambdastarts = 0)@nllFun) # Finally, if you do not want to fit your model but only get the likelihood, # you can get the negative log-likelihood for a given set of parameters. occuCOP(data = umf, return.negloglik = list( c("psi" = qlogis(0.25), "lambda" = log(2)), c("psi" = qlogis(0.5), "lambda" = log(1)), c("psi" = qlogis(0.75), "lambda" = log(0.5)) ))
This function fits the single season occupancy model while allowing for false positive detections.
occuFP(detformula = ~ 1, FPformula = ~ 1, Bformula = ~ 1, stateformula = ~ 1, data, starts, method="BFGS", se = TRUE, engine = "R", ...)
occuFP(detformula = ~ 1, FPformula = ~ 1, Bformula = ~ 1, stateformula = ~ 1, data, starts, method="BFGS", se = TRUE, engine = "R", ...)
detformula |
formula describing covariates of detection. |
FPformula |
formula describing covariates of false positive detection probability. |
Bformula |
formula describing covariates of probability detections are certain. |
stateformula |
formula describing covariates of occupancy. |
data |
An |
starts |
Vector of parameter starting values. |
method |
Optimization method used by |
se |
Logical specifying whether or not to compute standard errors. |
engine |
Currently only choice is R. |
... |
Additional arguments to optim, such as lower and upper bounds |
See unmarkedFrame
and unmarkedFrameOccuFP
for a
description of how to supply data to the data
argument.
occuFP
fits an extension of the standard single-season occupancy model (MacKenzie et al. 2002), which allows
false positive detections. The occupancy status of a site is the same way as with the occu
function, where stateformula
is used to specify factors that lead to differences in occupancy probabilities among sites.
The observation process differs in that both false negative and false positive errors are modeled for observations. The
function allows data to be of 3 types. These types are specified using in unmarkedFrameOccuFP
as type. Occassions
are specified to belong to 1 of the 3 data types and all or a subset of the data types can be combined in the same model.
For type 1 data, the detection process is assumed to fit the assumptions of the standard MacKenzie model
where false negative probabilities are estimated but false positive detections are assumed not to occur. If all of your
data is of this type you should use occu
to analyze data. The detection parameter p, which is modeled using the
detformula is the only observation parameter for these data.
For type 2 data, both false negative and false positive detection probabilities are estimated. If all data is of this type the likelihood follows Royle and Link (2006). Both p (the true positive detection probability) and fp (the false positive detection probability described by fpformula) are estimated for occassions when this data type occurs
For type 3 data, observations are assumed to include both certain detections (false positives assumed not to occur) and uncertain detections that may include false positive detections. When only this data type occurs, the estimator is the same as the multiple detection state model described in Miller et al. (2011). Three observation parameters occur for this data type: p - true positive detection probability, fp - false positive detection probability, and b - the probability a true positive detection was designated as certain.
When both type 1 and type 2 data occur, the estimator is equivalent to the multiple detection method model described in Miller et al. (2011). The frog data example in the same paper uses an analysis where type 1 (dipnet surveys) and type 3 (call surveys) data were used.
Data in the y matrix of the unmarked frame should be all 0s and 1s for type 1 and type 2 data. For type 3 data, uncertain detections are given a value of 1 and certain detections a value of 2.
unmarkedFitOccuFP object describing the model fit.
David Miller
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.
Miller, D.A., J.D. Nichols, B.T. McClintock, E.H.C. Grant, L.L. Bailey, and L.A. Weir. 2011. Improving occupancy estimation when two types of observational error occur: non-detection and species misidentification. Ecology 92:1422-1428.
Royle, J.A., and W.A. Link. 2006. Generalized site occupancy models allowing for false positive and false negative errors. Ecology 87:835-841.
unmarked
, unmarkedFrameOccuFP
,
modSel
, parboot
n = 100 o = 10 o1 = 5 y = matrix(0,n,o) p = .7 r = .5 fp = 0.05 y[1:(n*.5),(o-o1+1):o] <- rbinom((n*o1*.5),1,p) y[1:(n*.5),1:(o-o1)] <- rbinom((o-o1)*n*.5,1,r) y[(n*.5+1):n,(o-o1+1):o] <- rbinom((n*o1*.5),1,fp) type <- c((o-o1),o1,0) ### vector with the number of each data type site <- c(rep(1,n*.5*.8),rep(0,n*.5*.2),rep(1,n*.5*.2),rep(0,n*.8*.5)) occ <- matrix(c(rep(0,n*(o-o1)),rep(1,n*o1)),n,o) site <- data.frame(habitat = site) occ <- list(METH = occ) umf1 <- unmarkedFrameOccuFP(y,site,occ, type = type) m1 <- occuFP(detformula = ~ METH, FPformula = ~1, stateformula = ~ habitat, data = umf1) predict(m1, type = 'fp') coef(m1) confint(m1, type = 'det')
n = 100 o = 10 o1 = 5 y = matrix(0,n,o) p = .7 r = .5 fp = 0.05 y[1:(n*.5),(o-o1+1):o] <- rbinom((n*o1*.5),1,p) y[1:(n*.5),1:(o-o1)] <- rbinom((o-o1)*n*.5,1,r) y[(n*.5+1):n,(o-o1+1):o] <- rbinom((n*o1*.5),1,fp) type <- c((o-o1),o1,0) ### vector with the number of each data type site <- c(rep(1,n*.5*.8),rep(0,n*.5*.2),rep(1,n*.5*.2),rep(0,n*.8*.5)) occ <- matrix(c(rep(0,n*(o-o1)),rep(1,n*o1)),n,o) site <- data.frame(habitat = site) occ <- list(METH = occ) umf1 <- unmarkedFrameOccuFP(y,site,occ, type = type) m1 <- occuFP(detformula = ~ METH, FPformula = ~1, stateformula = ~ habitat, data = umf1) predict(m1, type = 'fp') coef(m1) confint(m1, type = 'det')
This function fits single-season and dynamic multi-state occupancy models with both the multinomial and conditional binomial parameterizations.
occuMS(detformulas, psiformulas, phiformulas=NULL, data, parameterization=c("multinomial","condbinom"), starts, method="BFGS", se=TRUE, engine=c("C","R"), silent=FALSE, ...)
occuMS(detformulas, psiformulas, phiformulas=NULL, data, parameterization=c("multinomial","condbinom"), starts, method="BFGS", se=TRUE, engine=c("C","R"), silent=FALSE, ...)
detformulas |
Character vector of formulas for detection probabilities. See details for a description of how to order these formulas. |
psiformulas |
Character vector of formulas for occupancy probabilities. See details for a description of how to order these formulas. |
phiformulas |
Character vector of formulas for state transition probabilities. Only used if you are fitting a dynamic model. See details for a description of how to order these formulas. |
data |
An |
parameterization |
Either |
starts |
Vector of parameter starting values. |
method |
Optimization method used by |
se |
Logical specifying whether or not to compute standard errors. |
engine |
Either "C" to use fast C++ code or "R" to use native R code during the optimization. |
silent |
Boolean; if |
... |
Additional arguments to optim, such as lower and upper bounds |
Traditional occupancy models fit data with exactly two states: detection and
non-detection (MacKenzie et al. 2002).
The occuMS
function fits models to occupancy data for which there are
greater than 2 states (Nichols et al 2007, MacKenzie et al. 2009). For example,
detections may be further divided into multiple biologically relevant categories,
e.g. breeding vs. non-breeding, or some/many individuals present. As with
detection status, classification of these additional occupancy states is likely
to be imperfect.
Multiple parameterizations for multi-state occupancy models have been proposed.
The occuMS
function fits two at present: the "conditional binomial"
parameterization of Nichols et al. (2007), and the more general "multinomial"
parameterization of MacKenzie et al. (2009). Both single-season
and dynamic models are possible with occuMS
(MacKenzie et al. 2009).
The conditional binomial parameterization
(parameterization = 'condbinom'
) models occupancy and the presence or
absence of an additional biological state of interest given the species
is present (typically breeding status). Thus, there should be exactly 3 occupancy
states in the data: 0 (non-detection); 1 (detection, no evidence of breeding);
or 2 (detection, evidence of breeding).
Two state parameters are estimated:
, the probability of occupancy, and
, the probability of
successful reproduction given an occupied state (although this could be some
other binary biological condition). Covariates (in
siteCovs
) can be
supplied for either or both of these parameters with the stateformulas
argument, which takes a character vector of R-style formulas with length = 2,
with formulas in the order (,
). For example, to fit a model
where
varies with a landcover covariate and
is constant,
stateformulas = c('~landcover','~1')
.
There are three detection parameters associated with the
conditional binomial parameterization: , the probability of
detecting the species given true state 1;
, the probability of detecting
the species given true state 2; and
, the probability of detecting
state 2 (i.e., breeding), given that the species has been detected.
See MacKenzie et al. (2009), pages 825-826 for more details.
As with occupancy, covariates (in
obsCovs
) can be supplied for these
detection probabilities with the detformulas
argument, which takes a
character vector of formulas with length = 3 in the order
(,
,
). So, to fit a model where
varies
with temperature and the other two parameters are constant,
detformulas = c('~temp','~1','~1')
.
The multinomial parameterization (parameterization = "multinomial"
) is
more general, allowing an arbitrary number of occupancy states .
- 1 occupancy probabilities
are estimated. Thus, if there
are
= 4 occupancy states (0, 1, 2, 3),
occuMS
estimates ,
, and
(the probability of state 0 can be obtained by
subtracting the others from 1). Covariates can be supplied for each occupancy
probability with a character vector with length
, e.g.
stateformulas =
c('~landcover','~1','~1')
where varies with
landcover and
and
are constant.
The number of detection probabilities estimated quickly expands as
increases, equal to
. In the simplest case
(when
= 3), there are 3 detection probabilities:
,
the probability of detecting state 1 given true state 1;
,
the probability of detecting state 1 given true state 2; and
,
the probability of detecting state 2 given true state 2.
Covariates can be supplied for any or all of these detection probabilities with
the
detformulas
argument, which takes a character vector of formulas
with length = 3 in the order (,
,
). So,
to fit a model where
varies with temperature and the other two detection
probabilities are constant,
detformulas = c('~temp','~1','~1')
.
If there were = 4 occupancy states, there are 6 estimated detection
probabilities and the order is (
,
,
,
,
,
), and so on. See MacKenzie et al. (2009)
for a more detailed explanation.
Dynamic (multi-season) models can be fit as well for both parameterizations
(MacKenzie et al. 2009). In a standard dynamic occupancy model, additional
parameters for probabilities of colonization (i.e., state 0 -> 1) and
extinction (1 -> 0) are estimated. In a multi-state context, we must estimate a
transition probability matrix () between all possible states. You can
provide formulas for some of the probabilities in this matrix using the
phiformulas
argument. The approach differs depending on parameterization.
For the conditional binomial parameterization, phiformulas
is a
character vector of length 6. The first three elements are formulas for the
probability a site is occupied at time given that it was previously
in states 0, 1, or 2 at time
(
phi0, phi1, phi2
). Elements 4-6
are formulas for the probability of reproduction (or other biological state)
given state 0, 1, or 2 at time (
R0, R1, R2
). See
umf@phiOrder$cond_binom
for a reminder of the correct order, where
umf
is your unmarkedFrameOccuMS
.
For the multinomial parameterization, phiformulas
can be used to provide
formulas for some transitions between different occupancy states. You can't
give formulas for the probabilities of remaining in the same state between
seasons to keep the model identifiable. Thus, if there are 3 possible states
(0, 1, 2), phiformulas
should contain 6 formulas for the following
transitions: p(0->1), p(0->2), p(1->0), p(1->2), p(2->0), p(2->1)
,
in that order (and similar for more than 3 states). The remaining probabilities
of staying in the same state between seasons can be obtained via subtraction.
See umf@phiOrder$multinomial
for the correct order matching the number
of states in your dataset.
See unmarkedFrame
and unmarkedFrameOccuMS
for a
description of how to supply data to the data
argument.
unmarkedFitOccuMS object describing the model fit.
Ken Kellner [email protected]
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.
MacKenzie, D. I., Nichols, J. D., Seamans, M. E., and R. J. Gutierrez, 2009. Modeling species occurrence dynamics with multiple states and imperfect detection. Ecology 90: 823-835.
Nichols, J. D., Hines, J. E., Mackenzie, D. I., Seamans, M. E., and R. J. Gutierrez. 2007. Occupancy estimation and modeling with multiple states and state uncertainty. Ecology 88: 1395-1400.
## Not run: #Simulate data #Parameters N <- 500; J <- 5; S <- 3 site_covs <- matrix(rnorm(N*2),ncol=2) obs_covs <- matrix(rnorm(N*J*2),ncol=2) a1 <- -0.5; b1 <- 1; a2 <- -0.6; b2 <- -0.7 ################################## ## Multinomial parameterization ## ################################## p11 <- -0.4; p12 <- -1.09; p22 <- -0.84 truth <- c(a1,b1,a2,b2,p11,0,p12,p22) #State process lp <- matrix(NA,ncol=S,nrow=N) for (n in 1:N){ lp[n,2] <- exp(a1+b1*site_covs[n,1]) lp[n,3] <- exp(a2+b2*site_covs[n,2]) lp[n,1] <- 1 } psi_mat <- lp/rowSums(lp) z <- rep(NA,N) for (n in 1:N){ z[n] <- sample(0:2, 1, replace=T, prob=psi_mat[n,]) } probs_raw <- matrix(c(1,0,0,1,exp(p11),0,1,exp(p12),exp(p22)),nrow=3,byrow=T) probs_raw <- probs_raw/rowSums(probs_raw) y <- matrix(0,nrow=N,ncol=J) for (n in 1:N){ probs <- switch(z[n]+1, probs_raw[1,], probs_raw[2,], probs_raw[3,]) if(z[n]>0){ y[n,] <- sample(0:2, J, replace=T, probs) } } #Construct unmarkedFrame umf <- unmarkedFrameOccuMS(y=y,siteCovs=as.data.frame(site_covs), obsCovs=as.data.frame(obs_covs)) #Formulas #3 states, so detformulas is a character vector of formulas of #length 3 in following order: #1) p[11]: prob of detecting state 1 given true state 1 #2) p[12]: prob of detecting state 1 given true state 2 #3) p[22]: prob of detecting state 2 given true state 2 detformulas <- c('~V1','~1','~1') #If you had 4 states, it would be p[11],p[12],p[13],p[22],p[23],p[33] and so on #3 states, so stateformulas is a character vector of length 2 in following order: #1) psi[1]: probability of state 1 #2) psi[2]: probability of state 2 #You can get probability of state 0 (unoccupied) as 1 - psi[1] - psi[2] stateformulas <- c('~V1','~V2') #Fit model fit <- occuMS(detformulas, stateformulas, data=umf, parameterization="multinomial") #Look at results fit #Compare with truth cbind(truth=truth,estimate=coef(fit)) #Generate predicted values lapply(predict(fit,type='psi'),head) lapply(predict(fit,type='det'),head) #Fit a null model detformulas <- rep('~1',3) stateformulas <- rep('~1',2) fit_null <- occuMS(detformulas, stateformulas, data=umf, parameterization="multinomial") #Compare fits modSel(fitList(fit,fit_null)) ########################################### ## Conditional binomial parameterization ## ########################################### p11 <- 0.4; p12 <- 0.6; p22 <- 0.8 truth_cb <- c(a1,b1,a2,b2,qlogis(p11),0,qlogis(c(p12,p22))) #Simulate data #State process psi_mat <- matrix(NA,ncol=S,nrow=N) for (n in 1:N){ psi_mat[n,2] <- plogis(a1+b1*site_covs[n,1]) psi_mat[n,3] <- plogis(a2+b2*site_covs[n,2]) } psi_bin <- matrix(NA,nrow=nrow(psi_mat),ncol=ncol(psi_mat)) psi_bin[,1] <- 1-psi_mat[,2] psi_bin[,2] <- (1-psi_mat[,3])*psi_mat[,2] psi_bin[,3] <- psi_mat[,2]*psi_mat[,3] z <- rep(NA,N) for (n in 1:N){ z[n] <- sample(0:2, 1, replace=T, prob=psi_bin[n,]) } #Detection process y_cb <- matrix(0,nrow=N,ncol=J) for (n in 1:N){ #p11 = p1; p12 = p2; p22 = delta probs <- switch(z[n]+1, c(1,0,0), c(1-p11,p11,0), c(1-p12,p12*(1-p22),p12*p22)) if(z[n]>0){ y_cb[n,] <- sample(0:2, J, replace=T, probs) } } #Build unmarked frame umf2 <- unmarkedFrameOccuMS(y=y_cb,siteCovs=as.data.frame(site_covs), obsCovs=as.data.frame(obs_covs)) #Formulas #detformulas is a character vector of formulas of length 3 in following order: #1) p[1]: prob of detecting species given true state 1 #2) p[2]: prob of detecting species given true state 2 #3) delta: prob of detecting state 2 (eg breeding) given species was detected detformulas <- c('~V1','~1','~1') #stateformulas is a character vector of length 2 in following order: #1) psi: probability of occupancy #2) R: probability state 2 (eg breeding) given occupancyc stateformulas <- c('~V1','~V2') #Fit model fit_cb <- occuMS(detformulas, stateformulas, data=umf2, parameterization='condbinom') #Look at results fit_cb #Compare with truth cbind(truth=truth_cb,estimate=coef(fit_cb)) #Generate predicted values lapply(predict(fit_cb,type='psi'),head) lapply(predict(fit_cb,type='det'),head) ################################## ## Dynamic (multi-season) model ## ################################## #Simulate data----------------------------------------------- N <- 500 #Number of sites T <- 3 #Number of primary periods J <- 5 #Number of secondary periods S <- 3 #Number of occupancy states (0,1,2) #Generate covariates site_covs <- as.data.frame(matrix(rnorm(N*2),ncol=2)) yearly_site_covs <- as.data.frame(matrix(rnorm(N*T*2),ncol=2)) obs_covs <- as.data.frame(matrix(rnorm(N*J*T*2),ncol=2)) #True parameter values b <- c( #Occupancy parameters a1=-0.5, b1=1, a2=-0.6, b2=-0.7, #Transition prob (phi) parameters phi01=0.7, phi01_cov=-0.5, phi02=-0.5, phi10=1.2, phi12=0.3, phi12_cov=1.1, phi20=-0.3, phi21=1.4, phi21_cov=0, #Detection prob parameters p11=-0.4, p11_cov=0, p12=-1.09, p22=-0.84 ) #Generate occupancy probs (multinomial parameterization) lp <- matrix(1, ncol=S, nrow=N) lp[,2] <- exp(b[1]+b[2]*site_covs[,1]) lp[,3] <- exp(b[3]+b[4]*site_covs[,2]) psi <- lp/rowSums(lp) #True occupancy state matrix z <- matrix(NA, nrow=N, ncol=T) #Initial occupancy for (n in 1:N){ z[n,1] <- sample(0:(S-1), 1, prob=psi[n,]) } #Raw phi probs phi_raw <- matrix(NA, nrow=N*T, ncol=S^2-S) phi_raw[,1] <- exp(b[5]+b[6]*yearly_site_covs[,1]) #p[0->1] phi_raw[,2] <- exp(b[7]) #p[0->2] phi_raw[,3] <- exp(b[8]) #p[1->0] phi_raw[,4] <- exp(b[9]+b[10]*yearly_site_covs[,2]) #p[1->2] phi_raw[,5] <- exp(b[11]) #p[2->0] phi_raw[,6] <- exp(b[12]+b[13]*yearly_site_covs[,1]) #Generate states in times 2..T px <- 1 for (n in 1:N){ for (t in 2:T){ phi_mat <- matrix(c(1, phi_raw[px,1], phi_raw[px,2], # phi|z=0 phi_raw[px,3], 1, phi_raw[px,4], # phi|z=1 phi_raw[px,5], phi_raw[px,6], 1), # phi|z=2 nrow=S, byrow=T) phi_mat <- phi_mat/rowSums(phi_mat) z[n, t] <- sample(0:(S-1), 1, prob=phi_mat[z[n,(t-1)]+1,]) px <- px + 1 if(t==T) px <- px + 1 #skip last datapoint for each site } } #Raw p probs p_mat <- matrix(c(1, 0, 0, #p|z=0 1, exp(b[14]), 0, #p|z=1 1, exp(b[16]), exp(b[17])), #p|z=2 nrow=S, byrow=T) p_mat <- p_mat/rowSums(p_mat) #Simulate observation data y <- matrix(0, nrow=N, ncol=J*T) for (n in 1:N){ yx <- 1 for (t in 1:T){ if(z[n,t]==0){ yx <- yx + J next } for (j in 1:J){ y[n, yx] <- sample(0:(S-1), 1, prob=p_mat[z[n,t]+1,]) yx <- yx+1 } } } #----------------------------------------------------------------- #Model fitting #Build UMF umf <- unmarkedFrameOccuMS(y=y, siteCovs=site_covs, obsCovs=obs_covs, yearlySiteCovs=yearly_site_covs, numPrimary=3) summary(umf) #Formulas #Initial occupancy psiformulas <- c('~V1','~V2') #on psi[1] and psi[2] #Transition probs #Guide to order: umf@phiOrder$multinomial phiformulas <- c('~V1','~1','~1','~V2','~1','~V1') #Detection probability detformulas <- c('~V1','~1','~1') #on p[1|1], p[1|2], p[2|2] #Fit model (fit <- occuMS(detformulas=detformulas, psiformulas=psiformulas, phiformulas=phiformulas, data=umf)) #Compare with truth compare <- cbind(b,coef(fit), coef(fit)-1.96*SE(fit),coef(fit)+1.96*SE(fit)) colnames(compare) <- c('truth','estimate','lower','upper') round(compare,3) #Estimated phi matrix for site 1 phi_est <- predict(fit, 'phi', se.fit=F) phi_est <- sapply(phi_est, function(x) x$Predicted[1]) phi_est_mat <- matrix(NA, nrow=S, ncol=S) phi_est_mat[c(4,7,2,8,3,6)] <- phi_est diag(phi_est_mat) <- 1 - rowSums(phi_est_mat,na.rm=T) #Actual phi matrix for site 1 phi_act_mat <- diag(S) phi_act_mat[c(4,7,2,8,3,6)] <- phi_raw[1,] phi_act_mat <- phi_act_mat/rowSums(phi_act_mat) #Compare cat('Estimated phi\n') phi_est_mat cat('Actual phi\n') phi_act_mat #Rough check of model fit fit_sim <- simulate(fit, nsim=20) hist(sapply(fit_sim,mean),col='gray') abline(v=mean(umf@y),col='red',lwd=2) #line should fall near middle of histogram ## End(Not run)
## Not run: #Simulate data #Parameters N <- 500; J <- 5; S <- 3 site_covs <- matrix(rnorm(N*2),ncol=2) obs_covs <- matrix(rnorm(N*J*2),ncol=2) a1 <- -0.5; b1 <- 1; a2 <- -0.6; b2 <- -0.7 ################################## ## Multinomial parameterization ## ################################## p11 <- -0.4; p12 <- -1.09; p22 <- -0.84 truth <- c(a1,b1,a2,b2,p11,0,p12,p22) #State process lp <- matrix(NA,ncol=S,nrow=N) for (n in 1:N){ lp[n,2] <- exp(a1+b1*site_covs[n,1]) lp[n,3] <- exp(a2+b2*site_covs[n,2]) lp[n,1] <- 1 } psi_mat <- lp/rowSums(lp) z <- rep(NA,N) for (n in 1:N){ z[n] <- sample(0:2, 1, replace=T, prob=psi_mat[n,]) } probs_raw <- matrix(c(1,0,0,1,exp(p11),0,1,exp(p12),exp(p22)),nrow=3,byrow=T) probs_raw <- probs_raw/rowSums(probs_raw) y <- matrix(0,nrow=N,ncol=J) for (n in 1:N){ probs <- switch(z[n]+1, probs_raw[1,], probs_raw[2,], probs_raw[3,]) if(z[n]>0){ y[n,] <- sample(0:2, J, replace=T, probs) } } #Construct unmarkedFrame umf <- unmarkedFrameOccuMS(y=y,siteCovs=as.data.frame(site_covs), obsCovs=as.data.frame(obs_covs)) #Formulas #3 states, so detformulas is a character vector of formulas of #length 3 in following order: #1) p[11]: prob of detecting state 1 given true state 1 #2) p[12]: prob of detecting state 1 given true state 2 #3) p[22]: prob of detecting state 2 given true state 2 detformulas <- c('~V1','~1','~1') #If you had 4 states, it would be p[11],p[12],p[13],p[22],p[23],p[33] and so on #3 states, so stateformulas is a character vector of length 2 in following order: #1) psi[1]: probability of state 1 #2) psi[2]: probability of state 2 #You can get probability of state 0 (unoccupied) as 1 - psi[1] - psi[2] stateformulas <- c('~V1','~V2') #Fit model fit <- occuMS(detformulas, stateformulas, data=umf, parameterization="multinomial") #Look at results fit #Compare with truth cbind(truth=truth,estimate=coef(fit)) #Generate predicted values lapply(predict(fit,type='psi'),head) lapply(predict(fit,type='det'),head) #Fit a null model detformulas <- rep('~1',3) stateformulas <- rep('~1',2) fit_null <- occuMS(detformulas, stateformulas, data=umf, parameterization="multinomial") #Compare fits modSel(fitList(fit,fit_null)) ########################################### ## Conditional binomial parameterization ## ########################################### p11 <- 0.4; p12 <- 0.6; p22 <- 0.8 truth_cb <- c(a1,b1,a2,b2,qlogis(p11),0,qlogis(c(p12,p22))) #Simulate data #State process psi_mat <- matrix(NA,ncol=S,nrow=N) for (n in 1:N){ psi_mat[n,2] <- plogis(a1+b1*site_covs[n,1]) psi_mat[n,3] <- plogis(a2+b2*site_covs[n,2]) } psi_bin <- matrix(NA,nrow=nrow(psi_mat),ncol=ncol(psi_mat)) psi_bin[,1] <- 1-psi_mat[,2] psi_bin[,2] <- (1-psi_mat[,3])*psi_mat[,2] psi_bin[,3] <- psi_mat[,2]*psi_mat[,3] z <- rep(NA,N) for (n in 1:N){ z[n] <- sample(0:2, 1, replace=T, prob=psi_bin[n,]) } #Detection process y_cb <- matrix(0,nrow=N,ncol=J) for (n in 1:N){ #p11 = p1; p12 = p2; p22 = delta probs <- switch(z[n]+1, c(1,0,0), c(1-p11,p11,0), c(1-p12,p12*(1-p22),p12*p22)) if(z[n]>0){ y_cb[n,] <- sample(0:2, J, replace=T, probs) } } #Build unmarked frame umf2 <- unmarkedFrameOccuMS(y=y_cb,siteCovs=as.data.frame(site_covs), obsCovs=as.data.frame(obs_covs)) #Formulas #detformulas is a character vector of formulas of length 3 in following order: #1) p[1]: prob of detecting species given true state 1 #2) p[2]: prob of detecting species given true state 2 #3) delta: prob of detecting state 2 (eg breeding) given species was detected detformulas <- c('~V1','~1','~1') #stateformulas is a character vector of length 2 in following order: #1) psi: probability of occupancy #2) R: probability state 2 (eg breeding) given occupancyc stateformulas <- c('~V1','~V2') #Fit model fit_cb <- occuMS(detformulas, stateformulas, data=umf2, parameterization='condbinom') #Look at results fit_cb #Compare with truth cbind(truth=truth_cb,estimate=coef(fit_cb)) #Generate predicted values lapply(predict(fit_cb,type='psi'),head) lapply(predict(fit_cb,type='det'),head) ################################## ## Dynamic (multi-season) model ## ################################## #Simulate data----------------------------------------------- N <- 500 #Number of sites T <- 3 #Number of primary periods J <- 5 #Number of secondary periods S <- 3 #Number of occupancy states (0,1,2) #Generate covariates site_covs <- as.data.frame(matrix(rnorm(N*2),ncol=2)) yearly_site_covs <- as.data.frame(matrix(rnorm(N*T*2),ncol=2)) obs_covs <- as.data.frame(matrix(rnorm(N*J*T*2),ncol=2)) #True parameter values b <- c( #Occupancy parameters a1=-0.5, b1=1, a2=-0.6, b2=-0.7, #Transition prob (phi) parameters phi01=0.7, phi01_cov=-0.5, phi02=-0.5, phi10=1.2, phi12=0.3, phi12_cov=1.1, phi20=-0.3, phi21=1.4, phi21_cov=0, #Detection prob parameters p11=-0.4, p11_cov=0, p12=-1.09, p22=-0.84 ) #Generate occupancy probs (multinomial parameterization) lp <- matrix(1, ncol=S, nrow=N) lp[,2] <- exp(b[1]+b[2]*site_covs[,1]) lp[,3] <- exp(b[3]+b[4]*site_covs[,2]) psi <- lp/rowSums(lp) #True occupancy state matrix z <- matrix(NA, nrow=N, ncol=T) #Initial occupancy for (n in 1:N){ z[n,1] <- sample(0:(S-1), 1, prob=psi[n,]) } #Raw phi probs phi_raw <- matrix(NA, nrow=N*T, ncol=S^2-S) phi_raw[,1] <- exp(b[5]+b[6]*yearly_site_covs[,1]) #p[0->1] phi_raw[,2] <- exp(b[7]) #p[0->2] phi_raw[,3] <- exp(b[8]) #p[1->0] phi_raw[,4] <- exp(b[9]+b[10]*yearly_site_covs[,2]) #p[1->2] phi_raw[,5] <- exp(b[11]) #p[2->0] phi_raw[,6] <- exp(b[12]+b[13]*yearly_site_covs[,1]) #Generate states in times 2..T px <- 1 for (n in 1:N){ for (t in 2:T){ phi_mat <- matrix(c(1, phi_raw[px,1], phi_raw[px,2], # phi|z=0 phi_raw[px,3], 1, phi_raw[px,4], # phi|z=1 phi_raw[px,5], phi_raw[px,6], 1), # phi|z=2 nrow=S, byrow=T) phi_mat <- phi_mat/rowSums(phi_mat) z[n, t] <- sample(0:(S-1), 1, prob=phi_mat[z[n,(t-1)]+1,]) px <- px + 1 if(t==T) px <- px + 1 #skip last datapoint for each site } } #Raw p probs p_mat <- matrix(c(1, 0, 0, #p|z=0 1, exp(b[14]), 0, #p|z=1 1, exp(b[16]), exp(b[17])), #p|z=2 nrow=S, byrow=T) p_mat <- p_mat/rowSums(p_mat) #Simulate observation data y <- matrix(0, nrow=N, ncol=J*T) for (n in 1:N){ yx <- 1 for (t in 1:T){ if(z[n,t]==0){ yx <- yx + J next } for (j in 1:J){ y[n, yx] <- sample(0:(S-1), 1, prob=p_mat[z[n,t]+1,]) yx <- yx+1 } } } #----------------------------------------------------------------- #Model fitting #Build UMF umf <- unmarkedFrameOccuMS(y=y, siteCovs=site_covs, obsCovs=obs_covs, yearlySiteCovs=yearly_site_covs, numPrimary=3) summary(umf) #Formulas #Initial occupancy psiformulas <- c('~V1','~V2') #on psi[1] and psi[2] #Transition probs #Guide to order: umf@phiOrder$multinomial phiformulas <- c('~V1','~1','~1','~V2','~1','~V1') #Detection probability detformulas <- c('~V1','~1','~1') #on p[1|1], p[1|2], p[2|2] #Fit model (fit <- occuMS(detformulas=detformulas, psiformulas=psiformulas, phiformulas=phiformulas, data=umf)) #Compare with truth compare <- cbind(b,coef(fit), coef(fit)-1.96*SE(fit),coef(fit)+1.96*SE(fit)) colnames(compare) <- c('truth','estimate','lower','upper') round(compare,3) #Estimated phi matrix for site 1 phi_est <- predict(fit, 'phi', se.fit=F) phi_est <- sapply(phi_est, function(x) x$Predicted[1]) phi_est_mat <- matrix(NA, nrow=S, ncol=S) phi_est_mat[c(4,7,2,8,3,6)] <- phi_est diag(phi_est_mat) <- 1 - rowSums(phi_est_mat,na.rm=T) #Actual phi matrix for site 1 phi_act_mat <- diag(S) phi_act_mat[c(4,7,2,8,3,6)] <- phi_raw[1,] phi_act_mat <- phi_act_mat/rowSums(phi_act_mat) #Compare cat('Estimated phi\n') phi_est_mat cat('Actual phi\n') phi_act_mat #Rough check of model fit fit_sim <- simulate(fit, nsim=20) hist(sapply(fit_sim,mean),col='gray') abline(v=mean(umf@y),col='red',lwd=2) #line should fall near middle of histogram ## End(Not run)
This function fits the multispecies occupancy model of Rota et al (2016).
occuMulti(detformulas, stateformulas, data, maxOrder, penalty=0, boot=30, starts, method="BFGS", se=TRUE, engine=c("C","R"), silent=FALSE, ...)
occuMulti(detformulas, stateformulas, data, maxOrder, penalty=0, boot=30, starts, method="BFGS", se=TRUE, engine=c("C","R"), silent=FALSE, ...)
detformulas |
Character vector of formulas for the detection models, one per species. |
stateformulas |
Character vector of formulas for the natural parameters.
To fix a natural parameter at 0, specify the corresponding formula as |
data |
An |
maxOrder |
Optional; specify maximum interaction order. Defaults to number of species (all possible interactions). Reducing this value may speed up optimization if you aren't interested in higher-order interactions. |
penalty |
Penalty term for likelihood. The total penalty is calculated
as |
boot |
Number of bootstrap samples to use to generate the
variance-covariance matrix when |
starts |
Vector of parameter starting values. |
method |
Optimization method used by |
se |
Logical specifying whether or not to compute standard errors. |
engine |
Either "C" to use fast C++ code or "R" to use native R code during the optimization. |
silent |
Boolean; if |
... |
Additional arguments to optim, such as lower and upper bounds |
See unmarkedFrame
and unmarkedFrameOccuMulti
for a
description of how to supply data to the data
argument.
occuMulti
fits the multispecies occupancy model from Rota et al. (2016),
for two or more interacting species.
The model generalizes the standard single-species occupancy model from
MacKenzie et al. (2002).
The latent occupancy state at site for a set of
potentially
interacting species is a vector
of length
containing a sequence of the values 0 or 1. For example, when
,
the possible states are
,
,
, or
,
corresponding to both species present, only species 1 or species 2 present,
or both species absent, respectively. The latent state modeled as a
multivariate Bernoulli random variable:
where is a vector of length
containing the probability of each possible combination of 0s and 1s, such that
.
For , the corresponding natural parameters
are
The natural parameters can then be modeled as linear functions of covariates.
Covariates for each must be specified with the
stateformulas
argument,
which takes a character vector of individual formulas of length equal to the number of
natural parameters (which in turn depends on the number of species in the model).
The observation process is similar to the standard single-species occupancy
model, except that the observations at site
on occasion
are vectors of length
and there are independent
values of detection probability
for each species
:
Independent detection models (potentially containing different covariates)
must be provided for each species with the detformulas
argument,
which takes a character vector of individual formulas with length equal to
the number of species .
If you are having problems with separation or boundary estimates (indicated by
very large parameter estimates and SEs), use of penalized likelihood may
help: see Clipp et al. (2021). occuMulti
supports use of the
Bayes-inspired penalty of Hutchinson et al. (2015). You can set the penalty
value manually using the penalty
argument, or identify the optimal
penalty using K-fold cross validation with the optimizePenalty
function.
See example below.
unmarkedFitOccuMulti object describing the model fit.
Ken Kellner [email protected]
Clipp, H. L., Evans, A., Kessinger, B. E., Kellner, K. F., and C. T. Rota. 2021. A penalized likelihood for multi-species occupancy models improves predictions of species interactions. Ecology.
Hutchinson, R. A., J. V. Valente, S. C. Emerson, M. G. Betts, and T. G. Dietterich. 2015. Penalized Likelihood Methods Improve Parameter Estimates in Occupancy Models. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12368
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.
Rota, C.T., et al. 2016. A multi-species occupancy model for two or more interacting species. Methods in Ecology and Evolution 7: 1164-1173.
unmarked
, unmarkedFrameOccuMulti
## Not run: #Simulate 3 species data N <- 1000 nspecies <- 3 J <- 5 occ_covs <- as.data.frame(matrix(rnorm(N * 10),ncol=10)) names(occ_covs) <- paste('occ_cov',1:10,sep='') det_covs <- list() for (i in 1:nspecies){ det_covs[[i]] <- matrix(rnorm(N*J),nrow=N) } names(det_covs) <- paste('det_cov',1:nspecies,sep='') #True vals beta <- c(0.5,0.2,0.4,0.5,-0.1,-0.3,0.2,0.1,-1,0.1) f1 <- beta[1] + beta[2]*occ_covs$occ_cov1 f2 <- beta[3] + beta[4]*occ_covs$occ_cov2 f3 <- beta[5] + beta[6]*occ_covs$occ_cov3 f4 <- beta[7] f5 <- beta[8] f6 <- beta[9] f7 <- beta[10] f <- cbind(f1,f2,f3,f4,f5,f6,f7) z <- expand.grid(rep(list(1:0),nspecies))[,nspecies:1] colnames(z) <- paste('sp',1:nspecies,sep='') dm <- model.matrix(as.formula(paste0("~.^",nspecies,"-1")),z) psi <- exp(f %*% t(dm)) psi <- psi/rowSums(psi) #True state ztruth <- matrix(NA,nrow=N,ncol=nspecies) for (i in 1:N){ ztruth[i,] <- as.matrix(z[sample(8,1,prob=psi[i,]),]) } p_true <- c(0.6,0.7,0.5) # fake y data y <- list() for (i in 1:nspecies){ y[[i]] <- matrix(NA,N,J) for (j in 1:N){ for (k in 1:J){ y[[i]][j,k] <- rbinom(1,1,ztruth[j,i]*p_true[i]) } } } names(y) <- c('coyote','tiger','bear') #Create the unmarked data object data = unmarkedFrameOccuMulti(y=y,siteCovs=occ_covs,obsCovs=det_covs) #Summary of data object summary(data) plot(data) # Look at f parameter design matrix data@fDesign # Formulas for state and detection processes # Length should match number/order of columns in fDesign occFormulas <- c('~occ_cov1','~occ_cov2','~occ_cov3','~1','~1','~1','~1') #Length should match number/order of species in data@ylist detFormulas <- c('~1','~1','~1') fit <- occuMulti(detFormulas,occFormulas,data) #Look at output fit plot(fit) #Compare with known values cbind(c(beta,log(p_true/(1-p_true))),fit@opt$par) #predict method lapply(predict(fit,'state'),head) lapply(predict(fit,'det'),head) #marginal occupancy head(predict(fit,'state',species=2)) head(predict(fit,'state',species='bear')) head(predict(fit,'det',species='coyote')) #probability of co-occurrence of two or more species head(predict(fit, 'state', species=c('coyote','tiger'))) #conditional occupancy head(predict(fit,'state',species=2,cond=3)) #tiger | bear present head(predict(fit,'state',species='tiger',cond='bear')) #tiger | bear present head(predict(fit,'state',species='tiger',cond='-bear')) #bear absent head(predict(fit,'state',species='tiger',cond=c('coyote','-bear'))) #residuals (by species) lapply(residuals(fit),head) #ranef (by species) ranef(fit, species='coyote') #parametric bootstrap bt <- parboot(fit,nsim=30) #update model occFormulas <- c('~occ_cov1','~occ_cov2','~occ_cov2+occ_cov3','~1','~1','~1','~1') fit2 <- update(fit,stateformulas=occFormulas) #List of fitted models fl <- fitList(fit,fit2) coef(fl) #Model selection modSel(fl) #Fit model while forcing some natural parameters to be 0 #For example: fit model with no species interactions occFormulas <- c('~occ_cov1','~occ_cov2','~occ_cov2+occ_cov3','0','0','0','0') fit3 <- occuMulti(detFormulas,occFormulas,data) #Alternatively, you can force all interaction parameters above a certain #order to be zero with maxOrder. This will be faster. occFormulas <- c('~occ_cov1','~occ_cov2','~occ_cov2+occ_cov3') fit4 <- occuMulti(detFormulas,occFormulas,data,maxOrder=1) #Add Bayes penalty term to likelihood. This is useful if your parameter #estimates are very large, eg because of separation. fit5 <- occuMulti(detFormulas, occFormulas, data, penalty=1) #Find optimal penalty term value from a range of possible values using #K-fold cross validation, and re-fit the model fit_opt <- optimizePenalty(fit5, penalties=c(0,1,2)) ## End(Not run)
## Not run: #Simulate 3 species data N <- 1000 nspecies <- 3 J <- 5 occ_covs <- as.data.frame(matrix(rnorm(N * 10),ncol=10)) names(occ_covs) <- paste('occ_cov',1:10,sep='') det_covs <- list() for (i in 1:nspecies){ det_covs[[i]] <- matrix(rnorm(N*J),nrow=N) } names(det_covs) <- paste('det_cov',1:nspecies,sep='') #True vals beta <- c(0.5,0.2,0.4,0.5,-0.1,-0.3,0.2,0.1,-1,0.1) f1 <- beta[1] + beta[2]*occ_covs$occ_cov1 f2 <- beta[3] + beta[4]*occ_covs$occ_cov2 f3 <- beta[5] + beta[6]*occ_covs$occ_cov3 f4 <- beta[7] f5 <- beta[8] f6 <- beta[9] f7 <- beta[10] f <- cbind(f1,f2,f3,f4,f5,f6,f7) z <- expand.grid(rep(list(1:0),nspecies))[,nspecies:1] colnames(z) <- paste('sp',1:nspecies,sep='') dm <- model.matrix(as.formula(paste0("~.^",nspecies,"-1")),z) psi <- exp(f %*% t(dm)) psi <- psi/rowSums(psi) #True state ztruth <- matrix(NA,nrow=N,ncol=nspecies) for (i in 1:N){ ztruth[i,] <- as.matrix(z[sample(8,1,prob=psi[i,]),]) } p_true <- c(0.6,0.7,0.5) # fake y data y <- list() for (i in 1:nspecies){ y[[i]] <- matrix(NA,N,J) for (j in 1:N){ for (k in 1:J){ y[[i]][j,k] <- rbinom(1,1,ztruth[j,i]*p_true[i]) } } } names(y) <- c('coyote','tiger','bear') #Create the unmarked data object data = unmarkedFrameOccuMulti(y=y,siteCovs=occ_covs,obsCovs=det_covs) #Summary of data object summary(data) plot(data) # Look at f parameter design matrix data@fDesign # Formulas for state and detection processes # Length should match number/order of columns in fDesign occFormulas <- c('~occ_cov1','~occ_cov2','~occ_cov3','~1','~1','~1','~1') #Length should match number/order of species in data@ylist detFormulas <- c('~1','~1','~1') fit <- occuMulti(detFormulas,occFormulas,data) #Look at output fit plot(fit) #Compare with known values cbind(c(beta,log(p_true/(1-p_true))),fit@opt$par) #predict method lapply(predict(fit,'state'),head) lapply(predict(fit,'det'),head) #marginal occupancy head(predict(fit,'state',species=2)) head(predict(fit,'state',species='bear')) head(predict(fit,'det',species='coyote')) #probability of co-occurrence of two or more species head(predict(fit, 'state', species=c('coyote','tiger'))) #conditional occupancy head(predict(fit,'state',species=2,cond=3)) #tiger | bear present head(predict(fit,'state',species='tiger',cond='bear')) #tiger | bear present head(predict(fit,'state',species='tiger',cond='-bear')) #bear absent head(predict(fit,'state',species='tiger',cond=c('coyote','-bear'))) #residuals (by species) lapply(residuals(fit),head) #ranef (by species) ranef(fit, species='coyote') #parametric bootstrap bt <- parboot(fit,nsim=30) #update model occFormulas <- c('~occ_cov1','~occ_cov2','~occ_cov2+occ_cov3','~1','~1','~1','~1') fit2 <- update(fit,stateformulas=occFormulas) #List of fitted models fl <- fitList(fit,fit2) coef(fl) #Model selection modSel(fl) #Fit model while forcing some natural parameters to be 0 #For example: fit model with no species interactions occFormulas <- c('~occ_cov1','~occ_cov2','~occ_cov2+occ_cov3','0','0','0','0') fit3 <- occuMulti(detFormulas,occFormulas,data) #Alternatively, you can force all interaction parameters above a certain #order to be zero with maxOrder. This will be faster. occFormulas <- c('~occ_cov1','~occ_cov2','~occ_cov2+occ_cov3') fit4 <- occuMulti(detFormulas,occFormulas,data,maxOrder=1) #Add Bayes penalty term to likelihood. This is useful if your parameter #estimates are very large, eg because of separation. fit5 <- occuMulti(detFormulas, occFormulas, data, penalty=1) #Find optimal penalty term value from a range of possible values using #K-fold cross validation, and re-fit the model fit_opt <- optimizePenalty(fit5, penalties=c(0,1,2)) ## End(Not run)
This function fits the occupancy model of MacKenzie et al (2002) with the penalized methods of Hutchinson et al (2015).
occuPEN(formula, data, knownOcc=numeric(0), starts, method="BFGS", engine=c("C", "R"), lambda=0, pen.type = c("Bayes","Ridge","MPLE"), ...)
occuPEN(formula, data, knownOcc=numeric(0), starts, method="BFGS", engine=c("C", "R"), lambda=0, pen.type = c("Bayes","Ridge","MPLE"), ...)
formula |
Double right-hand side formula describing covariates of detection and occupancy in that order. |
data |
An |
knownOcc |
Vector of sites that are known to be occupied. These should be supplied as row numbers of the y matrix, eg, c(3,8) if sites 3 and 8 were known to be occupied a priori. |
starts |
Vector of parameter starting values. |
method |
Optimization method used by |
engine |
Either "C" or "R" to use fast C++ code or native R code during the optimization. |
lambda |
Penalty weight parameter. |
pen.type |
Which form of penalty to use. |
... |
Additional arguments to optim, such as lower and upper bounds |
See unmarkedFrame
and unmarkedFrameOccu
for a
description of how to supply data to the data
argument.
occuPEN
fits the standard occupancy model based on
zero-inflated binomial models (MacKenzie et al. 2006, Royle and
Dorazio 2008) using the penalized likelihood methods described in
Hutchinson et al. (2015). See occu
for model
details. occuPEN
returns parameter estimates that maximize a
penalized likelihood in which the penalty is specified by the
pen.type
argument. The penalty function is weighted by
lambda
.
The MPLE method includes an equation for computing lambda
(Moreno & Lele, 2010). If the value supplied does not equal match the
one computed with this equation, the supplied value is used anyway
(with a warning).
unmarkedFitOccuPEN object describing the model fit.
Rebecca A. Hutchinson
Hutchinson, R. A., J. V. Valente, S. C. Emerson, M. G. Betts, and T. G. Dietterich. 2015. Penalized Likelihood Methods Improve Parameter Estimates in Occupancy Models. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12368
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.
MacKenzie, D. I. et al. 2006. Occupancy Estimation and Modeling. Amsterdam: Academic Press.
Moreno, M. and S. R. Lele. 2010. Improved estimation of site occupancy using penalized likelihood. Ecology 91: 341-346.
Royle, J. A. and R. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology. Academic Press.
unmarked
, unmarkedFrameOccu
,
occu
, computeMPLElambda
,
occuPEN_CV
, nonparboot
# Simulate occupancy data set.seed(344) nSites <- 100 nReps <- 2 covariates <- data.frame(veght=rnorm(nSites), habitat=factor(c(rep('A', nSites/2), rep('B', nSites/2)))) psipars <- c(-1, 1, -1) ppars <- c(1, -1, 0) X <- model.matrix(~veght+habitat, covariates) # design matrix psi <- plogis(X %*% psipars) p <- plogis(X %*% ppars) y <- matrix(NA, nSites, nReps) z <- rbinom(nSites, 1, psi) # true occupancy state for(i in 1:nSites) { y[i,] <- rbinom(nReps, 1, z[i]*p[i]) } # Organize data and look at it umf <- unmarkedFrameOccu(y = y, siteCovs = covariates) obsCovs(umf) <- covariates head(umf) summary(umf) # Fit some models fmMLE <- occu(~veght+habitat ~veght+habitat, umf) fm1pen <- occuPEN(~veght+habitat ~veght+habitat, umf,lambda=0.33,pen.type="Ridge") fm2pen <- occuPEN(~veght+habitat ~veght+habitat, umf,lambda=1,pen.type="Bayes") # MPLE: fm3pen <- occuPEN(~veght+habitat ~veght+habitat, umf,lambda=0.5,pen.type="MPLE") MPLElambda = computeMPLElambda(~veght+habitat ~veght+habitat, umf) fm4pen <- occuPEN(~veght+habitat ~veght+habitat, umf,lambda=MPLElambda,pen.type="MPLE") # nonparametric bootstrap for uncertainty analysis: fm1pen <- nonparboot(fm1pen,B=20) # should use more samples vcov(fm1pen,method="nonparboot")
# Simulate occupancy data set.seed(344) nSites <- 100 nReps <- 2 covariates <- data.frame(veght=rnorm(nSites), habitat=factor(c(rep('A', nSites/2), rep('B', nSites/2)))) psipars <- c(-1, 1, -1) ppars <- c(1, -1, 0) X <- model.matrix(~veght+habitat, covariates) # design matrix psi <- plogis(X %*% psipars) p <- plogis(X %*% ppars) y <- matrix(NA, nSites, nReps) z <- rbinom(nSites, 1, psi) # true occupancy state for(i in 1:nSites) { y[i,] <- rbinom(nReps, 1, z[i]*p[i]) } # Organize data and look at it umf <- unmarkedFrameOccu(y = y, siteCovs = covariates) obsCovs(umf) <- covariates head(umf) summary(umf) # Fit some models fmMLE <- occu(~veght+habitat ~veght+habitat, umf) fm1pen <- occuPEN(~veght+habitat ~veght+habitat, umf,lambda=0.33,pen.type="Ridge") fm2pen <- occuPEN(~veght+habitat ~veght+habitat, umf,lambda=1,pen.type="Bayes") # MPLE: fm3pen <- occuPEN(~veght+habitat ~veght+habitat, umf,lambda=0.5,pen.type="MPLE") MPLElambda = computeMPLElambda(~veght+habitat ~veght+habitat, umf) fm4pen <- occuPEN(~veght+habitat ~veght+habitat, umf,lambda=MPLElambda,pen.type="MPLE") # nonparametric bootstrap for uncertainty analysis: fm1pen <- nonparboot(fm1pen,B=20) # should use more samples vcov(fm1pen,method="nonparboot")
This function fits the occupancy model of MacKenzie et al (2002) with the penalized methods of Hutchinson et al (2015) using k-fold cross-validation to choose the penalty weight.
occuPEN_CV(formula, data, knownOcc=numeric(0), starts, method="BFGS", engine=c("C", "R"), lambdaVec=c(0,2^seq(-4,4)), pen.type = c("Bayes","Ridge"), k = 5, foldAssignments = NA, ...)
occuPEN_CV(formula, data, knownOcc=numeric(0), starts, method="BFGS", engine=c("C", "R"), lambdaVec=c(0,2^seq(-4,4)), pen.type = c("Bayes","Ridge"), k = 5, foldAssignments = NA, ...)
formula |
Double right-hand side formula describing covariates of detection and occupancy in that order. |
data |
An |
knownOcc |
Vector of sites that are known to be occupied. These should be supplied as row numbers of the y matrix, eg, c(3,8) if sites 3 and 8 were known to be occupied a priori. |
starts |
Vector of parameter starting values. |
method |
Optimization method used by |
engine |
Either "C" or "R" to use fast C++ code or native R code during the optimization. |
lambdaVec |
Vector of values to try for lambda. |
pen.type |
Which form of penalty to use. |
k |
Number of folds for k-fold cross-validation. |
foldAssignments |
Vector containing the number of the fold that each site falls into. Length of the vector should be equal to the number of sites, and the vector should contain k unique values. E.g. for 9 sites and 3 folds, c(1,2,3,1,2,3,1,2,3) or c(1,1,1,2,2,2,3,3,3). |
... |
Additional arguments to optim, such as lower and upper bounds |
See unmarkedFrame
and unmarkedFrameOccu
for a
description of how to supply data to the data
argument.
This function wraps k-fold cross-validation around occuPEN_CV
for the "Bayes" and "Ridge" penalties of Hutchinson et al. (2015). The
user may specify the number of folds (k
), the values to try
(lambdaVec
), and the assignments of sites to folds
(foldAssignments
). If foldAssignments
is not provided,
the assignments are done pseudo-randomly, and the function attempts to
put some sites with and without positive detections in each fold. This
randomness introduces variability into the results of this function
across runs; to eliminate the randomness, supply foldAssignments.
unmarkedFitOccuPEN_CV object describing the model fit.
Rebecca A. Hutchinson
Hutchinson, R. A., J. V. Valente, S. C. Emerson, M. G. Betts, and T. G. Dietterich. 2015. Penalized Likelihood Methods Improve Parameter Estimates in Occupancy Models. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12368
MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.
unmarked
, unmarkedFrameOccu
,
occu
, occuPEN
, nonparboot
# Simulate occupancy data set.seed(646) nSites <- 60 nReps <- 2 covariates <- data.frame(veght=rnorm(nSites), habitat=factor(c(rep('A', 30), rep('B', 30)))) psipars <- c(-1, 1, -1) ppars <- c(1, -1, 0) X <- model.matrix(~veght+habitat, covariates) # design matrix psi <- plogis(X %*% psipars) p <- plogis(X %*% ppars) y <- matrix(NA, nSites, nReps) z <- rbinom(nSites, 1, psi) # true occupancy state for(i in 1:nSites) { y[i,] <- rbinom(nReps, 1, z[i]*p[i]) } # Organize data and look at it umf <- unmarkedFrameOccu(y = y, siteCovs = covariates) obsCovs(umf) <- covariates head(umf) summary(umf) ## Not run: # Fit some models fmMLE <- occu(~veght+habitat ~veght+habitat, umf) fmMLE@estimates fm1penCV <- occuPEN_CV(~veght+habitat ~veght+habitat, umf,pen.type="Ridge", foldAssignments=rep(1:5,ceiling(nSites/5))[1:nSites]) fm1penCV@lambdaVec fm1penCV@chosenLambda fm1penCV@estimates fm2penCV <- occuPEN_CV(~veght+habitat ~veght+habitat, umf,pen.type="Bayes",foldAssignments=rep(1:5,ceiling(nSites/5))[1:nSites]) fm2penCV@lambdaVec fm2penCV@chosenLambda fm2penCV@estimates # nonparametric bootstrap for uncertainty analysis: # bootstrap is wrapped around the cross-validation fm2penCV <- nonparboot(fm2penCV,B=10) # should use more samples vcov(fm2penCV,method="nonparboot") # Mean squared error of parameters: mean((c(psipars,ppars)-c(fmMLE[1]@estimates,fmMLE[2]@estimates))^2) mean((c(psipars,ppars)-c(fm1penCV[1]@estimates,fm1penCV[2]@estimates))^2) mean((c(psipars,ppars)-c(fm2penCV[1]@estimates,fm2penCV[2]@estimates))^2) ## End(Not run)
# Simulate occupancy data set.seed(646) nSites <- 60 nReps <- 2 covariates <- data.frame(veght=rnorm(nSites), habitat=factor(c(rep('A', 30), rep('B', 30)))) psipars <- c(-1, 1, -1) ppars <- c(1, -1, 0) X <- model.matrix(~veght+habitat, covariates) # design matrix psi <- plogis(X %*% psipars) p <- plogis(X %*% ppars) y <- matrix(NA, nSites, nReps) z <- rbinom(nSites, 1, psi) # true occupancy state for(i in 1:nSites) { y[i,] <- rbinom(nReps, 1, z[i]*p[i]) } # Organize data and look at it umf <- unmarkedFrameOccu(y = y, siteCovs = covariates) obsCovs(umf) <- covariates head(umf) summary(umf) ## Not run: # Fit some models fmMLE <- occu(~veght+habitat ~veght+habitat, umf) fmMLE@estimates fm1penCV <- occuPEN_CV(~veght+habitat ~veght+habitat, umf,pen.type="Ridge", foldAssignments=rep(1:5,ceiling(nSites/5))[1:nSites]) fm1penCV@lambdaVec fm1penCV@chosenLambda fm1penCV@estimates fm2penCV <- occuPEN_CV(~veght+habitat ~veght+habitat, umf,pen.type="Bayes",foldAssignments=rep(1:5,ceiling(nSites/5))[1:nSites]) fm2penCV@lambdaVec fm2penCV@chosenLambda fm2penCV@estimates # nonparametric bootstrap for uncertainty analysis: # bootstrap is wrapped around the cross-validation fm2penCV <- nonparboot(fm2penCV,B=10) # should use more samples vcov(fm2penCV,method="nonparboot") # Mean squared error of parameters: mean((c(psipars,ppars)-c(fmMLE[1]@estimates,fmMLE[2]@estimates))^2) mean((c(psipars,ppars)-c(fm1penCV[1]@estimates,fm1penCV[2]@estimates))^2) mean((c(psipars,ppars)-c(fm2penCV[1]@estimates,fm2penCV[2]@estimates))^2) ## End(Not run)
Fit the occupancy model of Royle and Nichols (2003), which relates probability of detection of the species to the number of individuals available for detection at each site. Probability of occupancy is a derived parameter: the probability that at least one individual is available for detection at the site.
occuRN(formula, data, K=25, starts, method="BFGS", se=TRUE, engine=c("C","R"), threads=1, ...)
occuRN(formula, data, K=25, starts, method="BFGS", se=TRUE, engine=c("C","R"), threads=1, ...)
formula |
double right-hand side formula describing covariates of detection and abundance, in that order. |
data |
Object of class |
K |
the upper summation index used to numerically integrate out the latent abundance. This should be set high enough so that it does not affect the parameter estimates. Computation time will increase with K. |
starts |
initial values for the optimization. |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors. |
engine |
Either "C" to use fast C++ code or "R" to use native R code during the optimization. |
threads |
Set the number of threads to use for optimization in C++, if
OpenMP is available on your system. Increasing the number of threads
may speed up optimization in some cases by running the likelihood
calculation in parallel. If |
... |
Additional arguments to optim, such as lower and upper bounds |
This function fits the latent abundance mixture model described in Royle and Nichols (2003).
The number of animals available for detection at site is modelled as Poisson:
We assume that all individuals at site during sample
have identical
detection probabilities,
, and that detections are independent. The species
will be recorded if at least one individual is detected. Thus, the detection
probability for the species is linked to the
detection probability for an individual by
Note that if , then
, and increasing values of
lead to higher values of
The equation for the detection history is then:
Covariates of are modelled with the log link
and covariates of
are modelled with the logit link.
unmarkedFit object describing the model fit.
Ian Fiske
Royle, J. A. and Nichols, J. D. (2003) Estimating Abundance from Repeated Presence-Absence Data or Point Counts. Ecology, 84(3) pp. 777–790.
## Not run: data(birds) woodthrushUMF <- unmarkedFrameOccu(woodthrush.bin) # survey occasion-specific detection probabilities (fm.wood.rn <- occuRN(~ obsNum ~ 1, woodthrushUMF)) # Empirical Bayes estimates of abundance at each site re <- ranef(fm.wood.rn) plot(re) ## End(Not run)
## Not run: data(birds) woodthrushUMF <- unmarkedFrameOccu(woodthrush.bin) # survey occasion-specific detection probabilities (fm.wood.rn <- occuRN(~ obsNum ~ 1, woodthrushUMF)) # Empirical Bayes estimates of abundance at each site re <- ranef(fm.wood.rn) plot(re) ## End(Not run)
Fit time-to-detection occupancy models of Garrard et al. (2008, 2013), either single-season or dynamic. Time-to-detection can be modeled with either an exponential or Weibull distribution.
occuTTD(psiformula= ~1, gammaformula = ~ 1, epsilonformula = ~ 1, detformula = ~ 1, data, ttdDist = c("exp", "weibull"), linkPsi = c("logit", "cloglog"), starts, method="BFGS", se=TRUE, engine = c("C", "R"), ...)
occuTTD(psiformula= ~1, gammaformula = ~ 1, epsilonformula = ~ 1, detformula = ~ 1, data, ttdDist = c("exp", "weibull"), linkPsi = c("logit", "cloglog"), starts, method="BFGS", se=TRUE, engine = c("C", "R"), ...)
psiformula |
Right-hand sided formula for the initial probability of occupancy at each site. |
gammaformula |
Right-hand sided formula for colonization probability. |
epsilonformula |
Right-hand sided formula for extinction probability. |
detformula |
Right-hand sided formula for mean time-to-detection. |
data |
|
ttdDist |
Distribution to use for time-to-detection; either
|
linkPsi |
Link function for the occupancy model. Options are
|
starts |
optionally, initial values for parameters in the optimization. |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors. |
engine |
Either "C" or "R" to use fast C++ code or native R code during the optimization. |
... |
Additional arguments to optim, such as lower and upper bounds |
Estimates site occupancy and detection probability from time-to-detection
(TTD) data, e.g. time to first detection of a particular bird species
during a point count or time-to-detection of a plant species while searching
a quadrat (Garrard et al. 2008). Time-to-detection can be modeled
as an exponential (ttdDist="exp"
) or Weibull (ttdDist="weibull"
)
random variable with rate parameter and, for the Weibull,
an additional shape parameter
. Note that
occuTTD
puts covariates
on and not
, i.e., the expected time between events.
In the case where there are no detections before the maximum sample time at
a site (surveyLength
) is reached, we are not sure if the site is
unoccupied or if we just didn't wait long enough for a detection. We therefore
must censor the exponential or Weibull distribution at the maximum survey
length, . Thus, assuming true site occupancy at site
is
, an exponential distribution for the TTD
, and that
indicates
is censored (Kery and Royle 2016):
and
Because in unmarked
values of NA
are typically used to indicate
missing values that were a result of the sampling structure (e.g., lost data),
we indicate a censored in
occuTTD
instead by setting
in the
y
matrix provided to
unmarkedFrameOccuTTD
. You can provide either a single value of
to the
surveyLength
argument of unmarkedFrameOccuTTD
,
or provide a matrix, potentially with a unique value of for each
value of
y
. Note that in the latter case the value of y
that will
be interpreted by occuTTD
as a censored observation (i.e., )
will differ between observations!
Occupancy and detection can be estimated with only a single survey per site,
unlike a traditional occupancy model that requires at least two replicated
surveys at at least some sites. However, occuTTD
also supports
multiple surveys per site using the model described in Garrard et al. (2013).
Furthermore, multi-season dynamic models are supported, using the same basic
structure as for standard occupancy models (see colext
).
When linkPsi = "cloglog"
, the complimentary log-log link
function is used for instead of the logit link. The cloglog link
relates occupancy probability to the intensity parameter of an underlying
Poisson process (Kery and Royle 2016). Thus, if abundance at a site is
can be modeled as
, where
, then presence/absence data at the
site can be modeled as
where
.
unmarkedFitOccuTTD object describing model fit.
Ken Kellner [email protected]
Garrard, G.E., Bekessy, S.A., McCarthy, M.A. and Wintle, B.A. 2008. When have we looked hard enough? A novel method for setting minimum survey effort protocols for flora surveys. Austral Ecology 33: 986-998.
Garrard, G.E., McCarthy, M.A., Williams, N.S., Bekessy, S.A. and Wintle, B.A. 2013. A general model of detectability using species traits. Methods in Ecology and Evolution 4: 45-52.
Kery, Marc, and J. Andrew Royle. 2016. Applied Hierarchical Modeling in Ecology, Volume 1. Academic Press.
unmarked
, unmarkedFrameOccuTTD
## Not run: ### Single season model N <- 500; J <- 1 #Simulate occupancy scovs <- data.frame(elev=c(scale(runif(N, 0,100))), forest=runif(N,0,1), wind=runif(N,0,1)) beta_psi <- c(-0.69, 0.71, -0.5) psi <- plogis(cbind(1, scovs$elev, scovs$forest) %*% beta_psi) z <- rbinom(N, 1, psi) #Simulate detection Tmax <- 10 #Same survey length for all observations beta_lam <- c(-2, -0.2, 0.7) rate <- exp(cbind(1, scovs$elev, scovs$wind) %*% beta_lam) ttd <- rexp(N, rate) ttd[z==0] <- Tmax #Censor at unoccupied sites ttd[ttd>Tmax] <- Tmax #Censor when ttd was greater than survey length #Build unmarkedFrame umf <- unmarkedFrameOccuTTD(y=ttd, surveyLength=Tmax, siteCovs=scovs) #Fit model fit <- occuTTD(psiformula=~elev+forest, detformula=~elev+wind, data=umf) #Predict psi values predict(fit, type='psi', newdata=data.frame(elev=0.5, forest=1)) #Predict lambda values predict(fit, type='det', newdata=data.frame(elev=0.5, wind=0)) #Calculate p, probability species is detected at a site given it is present #for a value of lambda. This is equivalent to eq 4 of Garrard et al. 2008 lam <- predict(fit, type='det', newdata=data.frame(elev=0.5, wind=0))$Predicted pexp(Tmax, lam) #Estimated p for all observations head(getP(fit)) ### Dynamic model N <- 1000; J <- 2; T <- 2 scovs <- data.frame(elev=c(scale(runif(N, 0,100))), forest=runif(N,0,1), wind=runif(N,0,1)) beta_psi <- c(-0.69, 0.71, -0.5) psi <- plogis(cbind(1, scovs$elev, scovs$forest) %*% beta_psi) z <- matrix(NA, N, T) z[,1] <- rbinom(N, 1, psi) #Col/ext process ysc <- data.frame(forest=rep(scovs$forest, each=T), elev=rep(scovs$elev, each=T)) c_b0 <- -0.4; c_b1 <- 0.3 gam <- plogis(c_b0 + c_b1 * scovs$forest) e_b0 <- -0.7; e_b1 <- 0.4 ext <- plogis(e_b0 + e_b1 * scovs$elev) for (i in 1:N){ for (t in 1:(T-1)){ if(z[i,t]==1){ #ext z[i,t+1] <- rbinom(1, 1, (1-ext[i])) } else { #col z[i,t+1] <- rbinom(1,1, gam[i]) } } } #Simulate detection ocovs <- data.frame(obs=rep(c('A','B'),N*T)) Tmax <- 10 beta_lam <- c(-2, -0.2, 0.7) rate <- exp(cbind(1, scovs$elev, scovs$wind) %*% beta_lam) #Add second observer at each site rateB <- exp(cbind(1, scovs$elev, scovs$wind) %*% beta_lam - 0.5) #Across seasons rate2 <- as.numeric(t(cbind(rate, rateB, rate, rateB))) ttd <- rexp(N*T*2, rate2) ttd <- matrix(ttd, nrow=N, byrow=T) ttd[ttd>Tmax] <- Tmax ttd[z[,1]==0,1:2] <- Tmax ttd[z[,2]==0,3:4] <- Tmax umf <- unmarkedFrameOccuTTD(y = ttd, surveyLength = Tmax, siteCovs = scovs, obsCovs=ocovs, yearlySiteCovs=ysc, numPrimary=2) dim(umf@y) #num sites, (num surveys x num primary periods) fit <- occuTTD(psiformula=~elev+forest,detformula=~elev+wind+obs, gammaformula=~forest, epsilonformula=~elev, data=umf,se=T,engine="C") truth <- c(beta_psi, c_b0, c_b1, e_b0, e_b1, beta_lam, -0.5) #Compare to truth cbind(coef(fit), truth) ## End(Not run)
## Not run: ### Single season model N <- 500; J <- 1 #Simulate occupancy scovs <- data.frame(elev=c(scale(runif(N, 0,100))), forest=runif(N,0,1), wind=runif(N,0,1)) beta_psi <- c(-0.69, 0.71, -0.5) psi <- plogis(cbind(1, scovs$elev, scovs$forest) %*% beta_psi) z <- rbinom(N, 1, psi) #Simulate detection Tmax <- 10 #Same survey length for all observations beta_lam <- c(-2, -0.2, 0.7) rate <- exp(cbind(1, scovs$elev, scovs$wind) %*% beta_lam) ttd <- rexp(N, rate) ttd[z==0] <- Tmax #Censor at unoccupied sites ttd[ttd>Tmax] <- Tmax #Censor when ttd was greater than survey length #Build unmarkedFrame umf <- unmarkedFrameOccuTTD(y=ttd, surveyLength=Tmax, siteCovs=scovs) #Fit model fit <- occuTTD(psiformula=~elev+forest, detformula=~elev+wind, data=umf) #Predict psi values predict(fit, type='psi', newdata=data.frame(elev=0.5, forest=1)) #Predict lambda values predict(fit, type='det', newdata=data.frame(elev=0.5, wind=0)) #Calculate p, probability species is detected at a site given it is present #for a value of lambda. This is equivalent to eq 4 of Garrard et al. 2008 lam <- predict(fit, type='det', newdata=data.frame(elev=0.5, wind=0))$Predicted pexp(Tmax, lam) #Estimated p for all observations head(getP(fit)) ### Dynamic model N <- 1000; J <- 2; T <- 2 scovs <- data.frame(elev=c(scale(runif(N, 0,100))), forest=runif(N,0,1), wind=runif(N,0,1)) beta_psi <- c(-0.69, 0.71, -0.5) psi <- plogis(cbind(1, scovs$elev, scovs$forest) %*% beta_psi) z <- matrix(NA, N, T) z[,1] <- rbinom(N, 1, psi) #Col/ext process ysc <- data.frame(forest=rep(scovs$forest, each=T), elev=rep(scovs$elev, each=T)) c_b0 <- -0.4; c_b1 <- 0.3 gam <- plogis(c_b0 + c_b1 * scovs$forest) e_b0 <- -0.7; e_b1 <- 0.4 ext <- plogis(e_b0 + e_b1 * scovs$elev) for (i in 1:N){ for (t in 1:(T-1)){ if(z[i,t]==1){ #ext z[i,t+1] <- rbinom(1, 1, (1-ext[i])) } else { #col z[i,t+1] <- rbinom(1,1, gam[i]) } } } #Simulate detection ocovs <- data.frame(obs=rep(c('A','B'),N*T)) Tmax <- 10 beta_lam <- c(-2, -0.2, 0.7) rate <- exp(cbind(1, scovs$elev, scovs$wind) %*% beta_lam) #Add second observer at each site rateB <- exp(cbind(1, scovs$elev, scovs$wind) %*% beta_lam - 0.5) #Across seasons rate2 <- as.numeric(t(cbind(rate, rateB, rate, rateB))) ttd <- rexp(N*T*2, rate2) ttd <- matrix(ttd, nrow=N, byrow=T) ttd[ttd>Tmax] <- Tmax ttd[z[,1]==0,1:2] <- Tmax ttd[z[,2]==0,3:4] <- Tmax umf <- unmarkedFrameOccuTTD(y = ttd, surveyLength = Tmax, siteCovs = scovs, obsCovs=ocovs, yearlySiteCovs=ysc, numPrimary=2) dim(umf@y) #num sites, (num surveys x num primary periods) fit <- occuTTD(psiformula=~elev+forest,detformula=~elev+wind+obs, gammaformula=~forest, epsilonformula=~elev, data=umf,se=T,engine="C") truth <- c(beta_psi, c_b0, c_b1, e_b0, e_b1, beta_lam, -0.5) #Compare to truth cbind(coef(fit), truth) ## End(Not run)
Identify the optimal value of the penalty term for unmarked
models that
support penalized likelihood. For each potential value of the penalty term,
K-fold cross validation is performed. Log-likelihoods for the test data in
each fold are calculated and summed. The penalty term that maximizes the sum
of the fold log-likelihoods is selected as the optimal value. Finally, the
model is re-fit with the full dataset using the selected penalty term.
Right now only Bayes-inspired penalty of Hutchinson et al. (2015) is supported.
Currently the only fitting function that supports optimizePenalty
is
occuMulti
for multispecies occupancy modeling; see Clipp et al. (2021).
## S4 method for signature 'unmarkedFitOccuMulti' optimizePenalty( object, penalties = c(0, 2^seq(-4, 4)), k = 5, boot = 30, ...)
## S4 method for signature 'unmarkedFitOccuMulti' optimizePenalty( object, penalties = c(0, 2^seq(-4, 4)), k = 5, boot = 30, ...)
object |
A fitted model inheriting class |
penalties |
Vector of possible penalty values, all of which must be >= 0 |
k |
Number of folds to use for k-fold cross validation |
boot |
Number of bootstrap samples to use to generate the variance-covariance matrix for the final model. |
... |
Other arguments, currently ignored |
unmarkedFit
object of same type as input, with the optimal
penalty value applied.
Ken Kellner [email protected]
Clipp, H. L., Evans, A., Kessinger, B. E., Kellner, K. F., and C. T. Rota. 2021. A penalized likelihood for multi-species occupancy models improves predictions of species interactions. Ecology.
Hutchinson, R. A., J. V. Valente, S. C. Emerson, M. G. Betts, and T. G. Dietterich. 2015. Penalized Likelihood Methods Improve Parameter Estimates in Occupancy Models. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12368
Removal sampling data collected for the Ovenbird (Seiurus aurocapillus).
data(ovendata)
data(ovendata)
The format is: chr "ovendata.list" which consists of
matrix of removal counts
data frame of site-level covariates
J.A. Royle (see reference below)
Royle, J. A. (2004). Generalized estimators of avian abundance from count survey data. Animal Biodiversity and Conservation, 27(1), 375-386.
data(ovendata) str(ovendata.list) ovenFrame <- unmarkedFrameMPois(ovendata.list$data, siteCovs=as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal")
data(ovendata) str(ovendata.list) ovenFrame <- unmarkedFrameMPois(ovendata.list$data, siteCovs=as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal")
unmarked
modelsSimulate datasets from a fitted model, refit the model, and generate a sampling distribution for a user-specified fit-statistic.
## S4 method for signature 'unmarkedFit' parboot(object, statistic = SSE, nsim = 10, report, parallel = FALSE, ncores, ...)
## S4 method for signature 'unmarkedFit' parboot(object, statistic = SSE, nsim = 10, report, parallel = FALSE, ncores, ...)
object |
a fitted model inheriting class "unmarkedFit" |
statistic |
a function returning a vector of fit-statistics. First argument must be the fitted model. Default is sum of squared residuals. |
nsim |
number of bootstrap replicates |
report |
Non-functional; will be removed soon. |
parallel |
logical (default = |
ncores |
integer (default = one less than number of available cores) number of cores to use when bootstrapping in parallel. |
... |
Additional arguments to be passed to |
This function simulates datasets based upon a fitted model, refits the model, and evaluates a user-specified fit-statistic for each simulation. Comparing this sampling distribution to the observed statistic provides a means of evaluating goodness-of-fit or assessing uncertainty in a quantity of interest.
An object of class parboot
with three slots:
call |
parboot call |
t0 |
Numeric vector of statistics for original fitted model. |
t.star |
nsim by length(t0) matrix of statistics for each simulation fit. |
Richard Chandler [email protected] and Adam Smith
data(linetran) (dbreaksLine <- c(0, 5, 10, 15, 20)) lengths <- linetran$Length ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = dbreaksLine, tlength = lengths*1000, survey = "line", unitsIn = "m") }) # Fit a model (fm <- distsamp(~area ~habitat, ltUMF)) # Function returning three fit-statistics. fitstats <- function(fm, na.rm=TRUE) { observed <- getY(fm@data) expected <- fitted(fm) resids <- residuals(fm) sse <- sum(resids^2, na.rm=na.rm) chisq <- sum((observed - expected)^2 / expected, na.rm=na.rm) freeTuke <- sum((sqrt(observed) - sqrt(expected))^2, na.rm=na.rm) out <- c(SSE=sse, Chisq=chisq, freemanTukey=freeTuke) return(out) } (pb <- parboot(fm, fitstats, nsim=25, report=1)) plot(pb, main="") # Finite-sample inference for a derived parameter. # Population size in sampled area Nhat <- function(fm) { sum(bup(ranef(fm, K=50))) } set.seed(345) (pb.N <- parboot(fm, Nhat, nsim=25, report=5)) # Compare to empirical Bayes confidence intervals colSums(confint(ranef(fm, K=50)))
data(linetran) (dbreaksLine <- c(0, 5, 10, 15, 20)) lengths <- linetran$Length ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = dbreaksLine, tlength = lengths*1000, survey = "line", unitsIn = "m") }) # Fit a model (fm <- distsamp(~area ~habitat, ltUMF)) # Function returning three fit-statistics. fitstats <- function(fm, na.rm=TRUE) { observed <- getY(fm@data) expected <- fitted(fm) resids <- residuals(fm) sse <- sum(resids^2, na.rm=na.rm) chisq <- sum((observed - expected)^2 / expected, na.rm=na.rm) freeTuke <- sum((sqrt(observed) - sqrt(expected))^2, na.rm=na.rm) out <- c(SSE=sse, Chisq=chisq, freemanTukey=freeTuke) return(out) } (pb <- parboot(fm, fitstats, nsim=25, report=1)) plot(pb, main="") # Finite-sample inference for a derived parameter. # Population size in sampled area Nhat <- function(fm) { sum(bup(ranef(fm, K=50))) } set.seed(345) (pb.N <- parboot(fm, Nhat, nsim=25, report=5)) # Compare to empirical Bayes confidence intervals colSums(confint(ranef(fm, K=50)))
Fit the N-mixture model of Royle (2004)
pcount(formula, data, K, mixture=c("P", "NB", "ZIP"), starts, method="BFGS", se=TRUE, engine=c("C", "R", "TMB"), threads=1, ...)
pcount(formula, data, K, mixture=c("P", "NB", "ZIP"), starts, method="BFGS", se=TRUE, engine=c("C", "R", "TMB"), threads=1, ...)
formula |
Double right-hand side formula describing covariates of detection and abundance, in that order |
data |
an unmarkedFramePCount object supplying data to the model. |
K |
Integer upper index of integration for N-mixture. This should be set high enough so that it does not affect the parameter estimates. Note that computation time will increase with K. |
mixture |
character specifying mixture: "P", "NB", or "ZIP". |
starts |
vector of starting values |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors. |
engine |
Either "C", "R", or "TMB" to use fast C++ code, native R code, or TMB (required for random effects) during the optimization. |
threads |
Set the number of threads to use for optimization in C++, if
OpenMP is available on your system. Increasing the number of threads
may speed up optimization in some cases by running the likelihood
calculation in parallel. If |
... |
Additional arguments to optim, such as lower and upper bounds |
This function fits N-mixture model of Royle (2004) to spatially replicated count data.
See unmarkedFramePCount
for a description of how to format data
for pcount
.
This function fits the latent N-mixture model for point count data (Royle 2004, Kery et al 2005).
The latent abundance distribution, can be set as a Poisson, negative binomial, or zero-inflated
Poisson random
variable, depending on the setting of the
mixture
argument,
mixture = "P"
, mixture = "NB"
, mixture = "ZIP"
respectively. For the first two distributions, the mean of is
. If
, then an
additional parameter,
, describes dispersion (lower
implies higher variance). For the ZIP distribution,
the mean is
, where psi is the
zero-inflation parameter.
The detection process is modeled as binomial: .
Covariates of use the log link and
covariates of
use the logit link.
unmarkedFit object describing the model fit.
Ian Fiske and Richard Chandler
Royle, J. A. (2004) N-Mixture Models for Estimating Population Size from Spatially Replicated Counts. Biometrics 60, pp. 108–105.
Kery, M., Royle, J. A., and Schmid, H. (2005) Modeling Avaian Abundance from Replicated Counts Using Binomial Mixture Models. Ecological Applications 15(4), pp. 1450–1461.
Johnson, N.L, A.W. Kemp, and S. Kotz. (2005) Univariate Discrete Distributions, 3rd ed. Wiley.
unmarkedFramePCount
, pcountOpen
,
ranef
, parboot
## Not run: # Simulate data set.seed(35) nSites <- 100 nVisits <- 3 x <- rnorm(nSites) # a covariate beta0 <- 0 beta1 <- 1 lambda <- exp(beta0 + beta1*x) # expected counts at each site N <- rpois(nSites, lambda) # latent abundance y <- matrix(NA, nSites, nVisits) p <- c(0.3, 0.6, 0.8) # detection prob for each visit for(j in 1:nVisits) { y[,j] <- rbinom(nSites, N, p[j]) } # Organize data visitMat <- matrix(as.character(1:nVisits), nSites, nVisits, byrow=TRUE) umf <- unmarkedFramePCount(y=y, siteCovs=data.frame(x=x), obsCovs=list(visit=visitMat)) summary(umf) # Fit a model fm1 <- pcount(~visit-1 ~ x, umf, K=50) fm1 plogis(coef(fm1, type="det")) # Should be close to p # Empirical Bayes estimation of random effects (fm1re <- ranef(fm1)) plot(fm1re, subset=site %in% 1:25, xlim=c(-1,40)) sum(bup(fm1re)) # Estimated population size sum(N) # Actual population size # Real data data(mallard) mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site, obsCovs = mallard.obs) (fm.mallard <- pcount(~ ivel+ date + I(date^2) ~ length + elev + forest, mallardUMF, K=30)) (fm.mallard.nb <- pcount(~ date + I(date^2) ~ length + elev, mixture = "NB", mallardUMF, K=30)) ## End(Not run)
## Not run: # Simulate data set.seed(35) nSites <- 100 nVisits <- 3 x <- rnorm(nSites) # a covariate beta0 <- 0 beta1 <- 1 lambda <- exp(beta0 + beta1*x) # expected counts at each site N <- rpois(nSites, lambda) # latent abundance y <- matrix(NA, nSites, nVisits) p <- c(0.3, 0.6, 0.8) # detection prob for each visit for(j in 1:nVisits) { y[,j] <- rbinom(nSites, N, p[j]) } # Organize data visitMat <- matrix(as.character(1:nVisits), nSites, nVisits, byrow=TRUE) umf <- unmarkedFramePCount(y=y, siteCovs=data.frame(x=x), obsCovs=list(visit=visitMat)) summary(umf) # Fit a model fm1 <- pcount(~visit-1 ~ x, umf, K=50) fm1 plogis(coef(fm1, type="det")) # Should be close to p # Empirical Bayes estimation of random effects (fm1re <- ranef(fm1)) plot(fm1re, subset=site %in% 1:25, xlim=c(-1,40)) sum(bup(fm1re)) # Estimated population size sum(N) # Actual population size # Real data data(mallard) mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site, obsCovs = mallard.obs) (fm.mallard <- pcount(~ ivel+ date + I(date^2) ~ length + elev + forest, mallardUMF, K=30)) (fm.mallard.nb <- pcount(~ date + I(date^2) ~ length + elev, mixture = "NB", mallardUMF, K=30)) ## End(Not run)
Function fits an N-mixture model for a discrete state space with raster covariates, and a detection function which decreases with distance from the observer, assumed to be at the centre. See Kery & Royle (2016) Section 9.8.4 for details.
pcount.spHDS(formula, data, K, mixture = c("P", "NB", "ZIP"), starts, method = "BFGS", se = TRUE, ...)
pcount.spHDS(formula, data, K, mixture = c("P", "NB", "ZIP"), starts, method = "BFGS", se = TRUE, ...)
formula |
Double right-hand side formula describing covariates of detection and abundance, in that order. Detection model should be specified without an intercept, for example:
|
data |
an |
K |
Integer upper index of integration for N-mixture. This should be set high enough so that it does not affect the parameter estimates. Note that computation time will increase with K. |
mixture |
character specifying mixture: Poisson (P), Negative-Binomial (NB), or Zero Inflated Poisson (ZIP). |
starts |
vector of starting values |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors. |
... |
Additional arguments to optim, such as lower and upper bounds |
unmarkedFit object describing the model fit.
Kery & Royle
Kery & Royle (2016) Applied Hierarachical Modeling in Ecology Section 9.8.4
## Simulate some data to analyse # This is based on Kery and Royle (2016) section 9.8.3 # See AHMbook::sim.spatialDS for more simulation options. # We will simulate distance data for a logit detection function with sigma = 1, # for a 6x6 square, divided into a 30 x 30 grid of pixels (900 in all), with the # observer in the centre. set.seed(2017) ## 1. Create coordinates for 30 x 30 grid grx <- seq(0.1, 5.9, 0.2) # mid-point coordinates gr <- expand.grid(grx, grx) # data frame with coordinates of pixel centres ## 2a. Simulate spatially correlated Habitat covariate # Get the pair-wise distances between pixel centres tmp <- as.matrix(dist(gr)) # a 900 x 900 matrix # Correlation is a negative exponential function of distance, with scale parameter = 1 V <- exp(-tmp/1) Habitat <- crossprod(t(chol(V)), rnorm(900)) ## 2b. Do a detection covariate: the distance of each pixel centre from the observer dist <- sqrt((gr[,1]-3)^2 + (gr[,2]-3)^2) ## 3. Simulate the true population # Probability that an animal is in a pixel depends on the Habitat covariate, with # coefficient beta: beta <- 1 probs <- exp(beta*Habitat) / sum(exp(beta*Habitat)) # Allocate 600 animals to the 900 pixels, get the pixel ID for each animal pixel.id <- sample(1:900, 600, replace=TRUE, prob=probs) ## 4. Simulate the detection process # Get the distance of each animal from the observer # (As an approximation, we'll treat animals as if they are at the pixel centre.) d <- dist[pixel.id] # Calculate probability of detection with logit detection function with sigma <- 1 p <- 2*plogis(-d^2/(2*sigma^2)) # Simulate the 1/0 detection/nondetection vector y <- rbinom(600, 1, p) # Check the number of animals detected sum(y) # Select the pixel IDs for the animals detected and count the number in each pixel detected.pixel.id <- pixel.id[y == 1] pixel.count <- tabulate(detected.pixel.id, nbins=900) ## 5. Prepare the data for unmarked # Centre the Habitat covariate Habitat <- Habitat - mean(Habitat) # Construct the unmarkedFramePCount object umf <- unmarkedFramePCount(y=cbind(pixel.count), # y needs to be a 1-column matrix siteCovs=data.frame(dist=dist, Habitat=Habitat)) summary(umf) ## 6. Fit some models (fm0 <- pcount.spHDS(~ -1 + I(dist^2) ~ 1, umf, K = 20)) (fm1 <- pcount.spHDS(~ -1 + I(dist^2) ~ Habitat, umf, K = 20)) # The true Habitat coefficient (beta above) = 1 # fm1 has much lower AIC; look at the population estimate sum(predict(fm1, type="state")[, 1])
## Simulate some data to analyse # This is based on Kery and Royle (2016) section 9.8.3 # See AHMbook::sim.spatialDS for more simulation options. # We will simulate distance data for a logit detection function with sigma = 1, # for a 6x6 square, divided into a 30 x 30 grid of pixels (900 in all), with the # observer in the centre. set.seed(2017) ## 1. Create coordinates for 30 x 30 grid grx <- seq(0.1, 5.9, 0.2) # mid-point coordinates gr <- expand.grid(grx, grx) # data frame with coordinates of pixel centres ## 2a. Simulate spatially correlated Habitat covariate # Get the pair-wise distances between pixel centres tmp <- as.matrix(dist(gr)) # a 900 x 900 matrix # Correlation is a negative exponential function of distance, with scale parameter = 1 V <- exp(-tmp/1) Habitat <- crossprod(t(chol(V)), rnorm(900)) ## 2b. Do a detection covariate: the distance of each pixel centre from the observer dist <- sqrt((gr[,1]-3)^2 + (gr[,2]-3)^2) ## 3. Simulate the true population # Probability that an animal is in a pixel depends on the Habitat covariate, with # coefficient beta: beta <- 1 probs <- exp(beta*Habitat) / sum(exp(beta*Habitat)) # Allocate 600 animals to the 900 pixels, get the pixel ID for each animal pixel.id <- sample(1:900, 600, replace=TRUE, prob=probs) ## 4. Simulate the detection process # Get the distance of each animal from the observer # (As an approximation, we'll treat animals as if they are at the pixel centre.) d <- dist[pixel.id] # Calculate probability of detection with logit detection function with sigma <- 1 p <- 2*plogis(-d^2/(2*sigma^2)) # Simulate the 1/0 detection/nondetection vector y <- rbinom(600, 1, p) # Check the number of animals detected sum(y) # Select the pixel IDs for the animals detected and count the number in each pixel detected.pixel.id <- pixel.id[y == 1] pixel.count <- tabulate(detected.pixel.id, nbins=900) ## 5. Prepare the data for unmarked # Centre the Habitat covariate Habitat <- Habitat - mean(Habitat) # Construct the unmarkedFramePCount object umf <- unmarkedFramePCount(y=cbind(pixel.count), # y needs to be a 1-column matrix siteCovs=data.frame(dist=dist, Habitat=Habitat)) summary(umf) ## 6. Fit some models (fm0 <- pcount.spHDS(~ -1 + I(dist^2) ~ 1, umf, K = 20)) (fm1 <- pcount.spHDS(~ -1 + I(dist^2) ~ Habitat, umf, K = 20)) # The true Habitat coefficient (beta above) = 1 # fm1 has much lower AIC; look at the population estimate sum(predict(fm1, type="state")[, 1])
Fit the models of Dail and Madsen (2011) and Hostetler and Chandler (in press), which are generalized forms of the Royle (2004) N-mixture model for open populations.
pcountOpen(lambdaformula, gammaformula, omegaformula, pformula, data, mixture = c("P", "NB", "ZIP"), K, dynamics=c("constant", "autoreg", "notrend", "trend", "ricker", "gompertz"), fix=c("none", "gamma", "omega"), starts, method = "BFGS", se = TRUE, immigration = FALSE, iotaformula = ~1, ...)
pcountOpen(lambdaformula, gammaformula, omegaformula, pformula, data, mixture = c("P", "NB", "ZIP"), K, dynamics=c("constant", "autoreg", "notrend", "trend", "ricker", "gompertz"), fix=c("none", "gamma", "omega"), starts, method = "BFGS", se = TRUE, immigration = FALSE, iotaformula = ~1, ...)
lambdaformula |
Right-hand sided formula for initial abundance |
gammaformula |
Right-hand sided formula for recruitment rate (when dynamics is "constant", "autoreg", or "notrend") or population growth rate (when dynamics is "trend", "ricker", or "gompertz") |
omegaformula |
Right-hand sided formula for apparent survival probability (when dynamics is "constant", "autoreg", or "notrend") or equilibrium abundance (when dynamics is "ricker" or "gompertz") |
pformula |
Right-hand sided formula for detection probability |
data |
An object of class |
mixture |
character specifying mixture: "P", "NB", or "ZIP" for the Poisson, negative binomial, and zero-inflated Poisson distributions. |
K |
Integer defining upper bound of discrete integration. This should be higher than the maximum observed count and high enough that it does not affect the parameter estimates. However, the higher the value the slower the compuatation. |
dynamics |
Character string describing the type of population dynamics. "constant" indicates that there is no relationship between omega and gamma. "autoreg" is an auto-regressive model in which recruitment is modeled as gamma*N[i,t-1]. "notrend" model gamma as lambda*(1-omega) such that there is no temporal trend. "trend" is a model for exponential growth, N[i,t] = N[i,t-1]*gamma, where gamma in this case is finite rate of increase (normally referred to as lambda). "ricker" and "gompertz" are models for density-dependent population growth. "ricker" is the Ricker-logistic model, N[i,t] = N[i,t-1]*exp(gamma*(1-N[i,t-1]/omega)), where gamma is the maximum instantaneous population growth rate (normally referred to as r) and omega is the equilibrium abundance (normally referred to as K). "gompertz" is a modified version of the Gompertz-logistic model, N[i,t] = N[i,t-1]*exp(gamma*(1-log(N[i,t-1]+1)/log(omega+1))), where the interpretations of gamma and omega are similar to in the Ricker model. |
fix |
If "omega", omega is fixed at 1. If "gamma", gamma is fixed at 0. |
starts |
vector of starting values |
method |
Optimization method used by |
se |
logical specifying whether or not to compute standard errors. |
immigration |
logical specifying whether or not to include an immigration term (iota) in population dynamics. |
iotaformula |
Right-hand sided formula for average number of immigrants to a site per time step |
... |
additional arguments to be passed to |
These models generalize the Royle (2004) N-mixture model by relaxing the closure assumption. The models include two or three additional parameters: gamma, either the recruitment rate (births and immigrations), the finite rate of increase, or the maximum instantaneous rate of increase; omega, either the apparent survival rate (deaths and emigrations) or the equilibrium abundance (carrying capacity); and iota, the number of immigrants per site and year. Estimates of population size at each time period can be derived from these parameters, and thus so can trend estimates. Or, trend can be estimated directly using dynamics="trend".
When immigration is set to FALSE (the default), iota is not modeled. When immigration is set to TRUE and dynamics is set to "autoreg", the model will separately estimate birth rate (gamma) and number of immigrants (iota). When immigration is set to TRUE and dynamics is set to "trend", "ricker", or "gompertz", the model will separately estimate local contributions to population growth (gamma and omega) and number of immigrants (iota).
The latent abundance distribution, can be set as a Poisson, negative binomial, or zero-inflated
Poisson random
variable, depending on the setting of the
mixture
argument,
mixture = "P"
, mixture = "NB"
, mixture = "ZIP"
respectively. For the first two distributions, the mean of is
. If
, then an
additional parameter,
, describes dispersion (lower
implies higher variance). For the ZIP distribution,
the mean is
, where psi is the
zero-inflation parameter.
For "constant", "autoreg", or "notrend" dynamics, the latent abundance state
following the initial sampling period arises
from a
Markovian process in which survivors are modeled as , and recruits
follow
.
Alternative population dynamics can be specified
using the
dynamics
and immigration
arguments.
The detection process is modeled as binomial: .
,
, and
are modeled
using the the log link.
is modeled using
the logit link.
is either modeled using the logit link (for
"constant", "autoreg", or "notrend" dynamics) or the log link (for "ricker"
or "gompertz" dynamics). For "trend" dynamics,
is not modeled.
An object of class unmarkedFitPCO.
This function can be extremely slow, especially if there are covariates of gamma or omega. Consider testing the timing on a small subset of the data, perhaps with se=FALSE. Finding the lowest value of K that does not affect estimates will also help with speed.
When gamma or omega are modeled using year-specific covariates, the covariate data for the final year will be ignored; however, they must be supplied.
If the time gap between primary periods is not constant, an M by T
matrix of integers should be supplied to unmarkedFramePCO
using the primaryPeriod
argument.
Secondary sampling periods are optional, but can greatly improve the precision of the estimates.
Richard Chandler [email protected] and Jeff Hostetler
Royle, J. A. (2004) N-Mixture Models for Estimating Population Size from Spatially Replicated Counts. Biometrics 60, pp. 108–105.
Dail, D. and L. Madsen (2011) Models for Estimating Abundance from Repeated Counts of an Open Metapopulation. Biometrics. 67, pp 577-587.
Hostetler, J. A. and R. B. Chandler (2015) Improved State-space Models for Inference about Spatial and Temporal Variation in Abundance from Count Data. Ecology 96:1713-1723.
## Simulation ## No covariates, constant time intervals between primary periods, and ## no secondary sampling periods set.seed(3) M <- 50 T <- 5 lambda <- 4 gamma <- 1.5 omega <- 0.8 p <- 0.7 y <- N <- matrix(NA, M, T) S <- G <- matrix(NA, M, T-1) N[,1] <- rpois(M, lambda) for(t in 1:(T-1)) { S[,t] <- rbinom(M, N[,t], omega) G[,t] <- rpois(M, gamma) N[,t+1] <- S[,t] + G[,t] } y[] <- rbinom(M*T, N, p) # Prepare data umf <- unmarkedFramePCO(y = y, numPrimary=T) summary(umf) # Fit model and backtransform (m1 <- pcountOpen(~1, ~1, ~1, ~1, umf, K=20)) # Typically, K should be higher (lam <- predict(m1, "lambda")$Predicted[1]) # or lam <- exp(coef(m1, type="lambda")) gam <- exp(coef(m1, type="gamma")) om <- plogis(coef(m1, type="omega")) p <- plogis(coef(m1, type="det")) ## Not run: # Finite sample inference. Abundance at site i, year t re <- ranef(m1) devAskNewPage(TRUE) plot(re, layout=c(5,5), subset = site %in% 1:25 & year %in% 1:2, xlim=c(-1,15)) devAskNewPage(FALSE) (N.hat1 <- colSums(bup(re))) # Expected values of N[i,t] N.hat2 <- matrix(NA, M, T) N.hat2[,1] <- lam for(t in 2:T) { N.hat2[,t] <- om*N.hat2[,t-1] + gam } rbind(N=colSums(N), N.hat1=N.hat1, N.hat2=colSums(N.hat2)) ## End(Not run)
## Simulation ## No covariates, constant time intervals between primary periods, and ## no secondary sampling periods set.seed(3) M <- 50 T <- 5 lambda <- 4 gamma <- 1.5 omega <- 0.8 p <- 0.7 y <- N <- matrix(NA, M, T) S <- G <- matrix(NA, M, T-1) N[,1] <- rpois(M, lambda) for(t in 1:(T-1)) { S[,t] <- rbinom(M, N[,t], omega) G[,t] <- rpois(M, gamma) N[,t+1] <- S[,t] + G[,t] } y[] <- rbinom(M*T, N, p) # Prepare data umf <- unmarkedFramePCO(y = y, numPrimary=T) summary(umf) # Fit model and backtransform (m1 <- pcountOpen(~1, ~1, ~1, ~1, umf, K=20)) # Typically, K should be higher (lam <- predict(m1, "lambda")$Predicted[1]) # or lam <- exp(coef(m1, type="lambda")) gam <- exp(coef(m1, type="gamma")) om <- plogis(coef(m1, type="omega")) p <- plogis(coef(m1, type="det")) ## Not run: # Finite sample inference. Abundance at site i, year t re <- ranef(m1) devAskNewPage(TRUE) plot(re, layout=c(5,5), subset = site %in% 1:25 & year %in% 1:2, xlim=c(-1,15)) devAskNewPage(FALSE) (N.hat1 <- colSums(bup(re))) # Expected values of N[i,t] N.hat2 <- matrix(NA, M, T) N.hat2[,1] <- lam for(t in 2:T) { N.hat2[,t] <- om*N.hat2[,t-1] + gam } rbind(N=colSums(N), N.hat1=N.hat1, N.hat2=colSums(N.hat2)) ## End(Not run)
Compute the cell probabilities used in the multinomial-Poisson models multinomPois and gmultmix. These functions use piFuns internally to calculate multinomial likelihoods from the occasion-wise detection probabilities. The only reason to call them directly is to check their behaviour.
removalPiFun(p) doublePiFun(p)
removalPiFun(p) doublePiFun(p)
p |
matrix of detection probabilities at each site for each observation |
These two functions are provided as examples of possible functions to calculate multinomial cell probabilities. Users may write their own functions for specific sampling designs (see the example).
For removalPiFun, a matrix of cell probabilities for each site and sampling period.
For doublePiFun, a matrix of cell probabilities for each site and observer combination. Column one is probability observer 1 but not observer 2 detects the object, column two is probability that observer 2 but not observer 1 detects the object, and column 3 is probability of both detecting.
makePiFuns for factory functions to create customised piFuns.
(pRem <- matrix(0.5, nrow=3, ncol=3)) # Capture probabilities removalPiFun(pRem) # Cell probs (pDouble <- matrix(0.5, 3, 2)) # Observer detection probs doublePiFun(pDouble) # Cell probs # A user-defined piFun calculating removal probs when time intervals differ. # Here 10-minute counts were divided into 2, 3, and 5 minute intervals. # This function could be supplied to unmarkedFrameMPois along with the obsToY # argument shown below. instRemPiFun <- function(p) { M <- nrow(p) J <- ncol(p) pi <- matrix(NA, M, J) p[,1] <- pi[,1] <- 1 - (1 - p[,1])^2 p[,2] <- 1 - (1 - p[,2])^3 p[,3] <- 1 - (1 - p[,3])^5 for(i in 2:J) { pi[,i] <- pi[, i - 1]/p[, i - 1] * (1 - p[, i - 1]) * p[, i] } return(pi) } instRemPiFun(pRem) # Associated obsToY matrix required by unmarkedFrameMPois o2y <- diag(3) # if y has 3 columns o2y[upper.tri(o2y)] <- 1 o2y
(pRem <- matrix(0.5, nrow=3, ncol=3)) # Capture probabilities removalPiFun(pRem) # Cell probs (pDouble <- matrix(0.5, 3, 2)) # Observer detection probs doublePiFun(pDouble) # Cell probs # A user-defined piFun calculating removal probs when time intervals differ. # Here 10-minute counts were divided into 2, 3, and 5 minute intervals. # This function could be supplied to unmarkedFrameMPois along with the obsToY # argument shown below. instRemPiFun <- function(p) { M <- nrow(p) J <- ncol(p) pi <- matrix(NA, M, J) p[,1] <- pi[,1] <- 1 - (1 - p[,1])^2 p[,2] <- 1 - (1 - p[,2])^3 p[,3] <- 1 - (1 - p[,3])^5 for(i in 2:J) { pi[,i] <- pi[, i - 1]/p[, i - 1] * (1 - p[, i - 1]) * p[, i] } return(pi) } instRemPiFun(pRem) # Associated obsToY matrix required by unmarkedFrameMPois o2y <- diag(3) # if y has 3 columns o2y[upper.tri(o2y)] <- 1 o2y
This function generates a plot visualizing the effects of a single covariate on a parameter (e.g. occupancy, abundance) in an unmarked model. If the covariate is numeric, the result is a line plot with an error ribbon where the x-axis is the range of the covariate and the y-axis is the predicted parameter value. If the covariate is an R factor (i.e., categorical), the x-axis instead contains each unique value of the covariate.
All covariates in the model besides the one being plotted are held either at their median value (if they are numeric) or at their reference level (if they are factors).
Some types of unmarked models may require additional arguments, which are passed
to the matching predict
method. For example, unmarkedFitOccuMulti
models require the species
argument to be included in the function
call in order to work properly.
If you want to customize a plot, the easiest approach is to get data
formatted for plotting using plotEffectsData
, and use that. If
you want to see and/or modify the code used by plotEffects
to generate
the default plots, run getMethod("plotEffects", "unmarkedFit")
in
the R console.
## S4 method for signature 'unmarkedFit' plotEffects(object, type, covariate, level=0.95, ...) ## S4 method for signature 'unmarkedFit' plotEffectsData(object, type, covariate, level=0.95, ...)
## S4 method for signature 'unmarkedFit' plotEffects(object, type, covariate, level=0.95, ...) ## S4 method for signature 'unmarkedFit' plotEffectsData(object, type, covariate, level=0.95, ...)
object |
A fitted model inheriting class |
type |
Submodel in which the covariate of interest can be found, for
example |
covariate |
The name of the covariate to be plotted, as a character string |
level |
Confidence level for the error ribbons or bars |
... |
Other arguments passed to the |
A plot (plotEffects
or a data frame (plotEffectsData
)
containing values to be used in a plot.
Ken Kellner [email protected]
## Not run: # Simulate data and build an unmarked frame set.seed(123) dat_occ <- data.frame(x1=rnorm(500)) dat_p <- data.frame(x2=rnorm(500*5)) y <- matrix(NA, 500, 5) z <- rep(NA, 500) b <- c(0.4, -0.5, 0.3, 0.5) re_fac <- factor(sample(letters[1:5], 500, replace=T)) dat_occ$group <- re_fac re <- rnorm(5, 0, 1.2) re_idx <- as.numeric(re_fac) idx <- 1 for (i in 1:500){ z[i] <- rbinom(1,1, plogis(b[1] + b[2]*dat_occ$x1[i] + re[re_idx[i]])) for (j in 1:5){ y[i,j] <- z[i]*rbinom(1,1, plogis(b[3] + b[4]*dat_p$x2[idx])) idx <- idx + 1 } } umf <- unmarkedFrameOccu(y=y, siteCovs=dat_occ, obsCovs=dat_p) # Fit model (fm <- occu(~x2 ~x1 + group, umf)) # Plot marginal effects of various covariates plotEffects(fm, "state", "x1") plotEffects(fm, "state", "group") plotEffects(fm, "det", "x2") # Get raw data used for a plot plotEffectsData(fm, "state", "group") # See code used by plotEffects so you can edit it yourself and customize the plot methods::getMethod("plotEffects", "unmarkedFit") ## End(Not run)
## Not run: # Simulate data and build an unmarked frame set.seed(123) dat_occ <- data.frame(x1=rnorm(500)) dat_p <- data.frame(x2=rnorm(500*5)) y <- matrix(NA, 500, 5) z <- rep(NA, 500) b <- c(0.4, -0.5, 0.3, 0.5) re_fac <- factor(sample(letters[1:5], 500, replace=T)) dat_occ$group <- re_fac re <- rnorm(5, 0, 1.2) re_idx <- as.numeric(re_fac) idx <- 1 for (i in 1:500){ z[i] <- rbinom(1,1, plogis(b[1] + b[2]*dat_occ$x1[i] + re[re_idx[i]])) for (j in 1:5){ y[i,j] <- z[i]*rbinom(1,1, plogis(b[3] + b[4]*dat_p$x2[idx])) idx <- idx + 1 } } umf <- unmarkedFrameOccu(y=y, siteCovs=dat_occ, obsCovs=dat_p) # Fit model (fm <- occu(~x2 ~x1 + group, umf)) # Plot marginal effects of various covariates plotEffects(fm, "state", "x1") plotEffects(fm, "state", "group") plotEffects(fm, "det", "x2") # Get raw data used for a plot plotEffectsData(fm, "state", "group") # See code used by plotEffects so you can edit it yourself and customize the plot methods::getMethod("plotEffects", "unmarkedFit") ## End(Not run)
Response matrix of animals detected in five distance classes plus two covariates.
data(pointtran)
data(pointtran)
A data frame with 30 observations on the following 7 variables.
dc1
Counts in distance class 1 [0-5 m)
dc2
Counts in distance class 2 [5-10 m)
dc3
Counts in distance class 3 [10-15 m)
dc4
Counts in distance class 4 [15-20 m)
dc5
Counts in distance class 5 [20-25 m)
area
a numeric vector
habitat
a factor with levels A
B
C
data(pointtran) pointtran # Format for distsamp() ptUMF <- with(pointtran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4, dc5), siteCovs = data.frame(area, habitat), dist.breaks = seq(0, 25, by=5), survey = "point", unitsIn = "m") })
data(pointtran) pointtran # Format for distsamp() ptUMF <- with(pointtran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4, dc5), siteCovs = data.frame(area, habitat), dist.breaks = seq(0, 25, by=5), survey = "point", unitsIn = "m") })
Draw samples from the empirical Bayes posterior predictive distribution derived from unmarked models or ranef objects
## S4 method for signature 'unmarkedRanef' posteriorSamples(object, nsims=100, ...) ## S4 method for signature 'unmarkedFit' posteriorSamples(object, nsims=100, ...)
## S4 method for signature 'unmarkedRanef' posteriorSamples(object, nsims=100, ...) ## S4 method for signature 'unmarkedFit' posteriorSamples(object, nsims=100, ...)
object |
An object inheriting class |
nsims |
Number of draws to make from the posterior predictive distribution |
... |
Other arguments |
unmarkedPostSamples
object containing the draws from the
posterior predictive distribution. The draws are in the @samples
slot.
Ken Kellner [email protected]
# Simulate data under N-mixture model set.seed(4564) R <- 20 J <- 5 N <- rpois(R, 10) y <- matrix(NA, R, J) y[] <- rbinom(R*J, N, 0.5) # Fit model umf <- unmarkedFramePCount(y=y) fm <- pcount(~1 ~1, umf, K=50) # Estimates of conditional abundance distribution at each site (re <- ranef(fm)) #Draw from the posterior predictive distribution (ppd <- posteriorSamples(re, nsims=100))
# Simulate data under N-mixture model set.seed(4564) R <- 20 J <- 5 N <- rpois(R, 10) y <- matrix(NA, R, J) y[] <- rbinom(R*J, N, 0.5) # Fit model umf <- unmarkedFramePCount(y=y) fm <- pcount(~1 ~1, umf, K=50) # Estimates of conditional abundance distribution at each site (re <- ranef(fm)) #Draw from the posterior predictive distribution (ppd <- posteriorSamples(re, nsims=100))
This function uses a simulation-based approach to estimate power for parameters
in unmarked models. At a minimum, users must provide an unmarkedFrame
object describing the experimental design and a list of effect sizes for each
parameter in the model. See the unmarkedPower
vignette for more details and
examples.
## S4 method for signature 'unmarkedFrame' powerAnalysis(object, model = NULL, effects = NULL, alpha=0.05, nsim = 100, parallel = FALSE, nulls=NULL, ...) ## S4 method for signature 'list' powerAnalysis(object, model = NULL, effects = NULL, alpha=0.05, nsim = length(object), parallel = FALSE, nulls=NULL, ...)
## S4 method for signature 'unmarkedFrame' powerAnalysis(object, model = NULL, effects = NULL, alpha=0.05, nsim = 100, parallel = FALSE, nulls=NULL, ...) ## S4 method for signature 'list' powerAnalysis(object, model = NULL, effects = NULL, alpha=0.05, nsim = length(object), parallel = FALSE, nulls=NULL, ...)
object |
An |
model |
The model to use when the |
effects |
A list containing the desired effect sizes/parameter values
for which you want to estimate power. This list must follow a specific format.
There is one named entry in the list per submodel (e.g., occupancy, detection).
Each list element should be a numeric vector with length equal to the number
of parameters in that submodel. Parameter values are on the inverse link
scale. You can leave |
alpha |
Desired Type I error rate. |
nsim |
Number of simulations to conduct. |
parallel |
Logical; run simulations in parallel? |
nulls |
If provided, a list matching the structure of |
... |
Arguments to send to the fitting function for the model.
Most importantly this will include formula argument(s), but could also include
distributions, key functions, etc. For example, for simulating occupancy data,
you must also supply the argument |
unmarkedPower
object containing the results of the power analysis.
For information on interpretation of the output, see the power analysis
vignette.
Ken Kellner [email protected]
## Not run: # Create experimental design M <- 50 J <- 3 y <- matrix(NA, M, J) sc <- data.frame(x=rnorm(M)) umf <- unmarkedFrameOccu(y, siteCovs=sc) # Power analysis p <- powerAnalysis(umf, model=occu, formula=~1~x, effects = list(state = c(-0.2, 0.3), det = 0)) p summary(p, alpha=0.3) plot(p, ylim=c(-3, 3)) plot(p, ylim=c(-3, 3)) # Simulate your own datasets first and pass to power analysis cf <- list(state=c(0,1), det=0) s <- simulate(umf, model = occu, formula=~1~x, coefs=cf, nsim = 100) p2 <- powerAnalysis(s, model=occu, formula=~1~x, effects=cf) p2 ## End(Not run)
## Not run: # Create experimental design M <- 50 J <- 3 y <- matrix(NA, M, J) sc <- data.frame(x=rnorm(M)) umf <- unmarkedFrameOccu(y, siteCovs=sc) # Power analysis p <- powerAnalysis(umf, model=occu, formula=~1~x, effects = list(state = c(-0.2, 0.3), det = 0)) p summary(p, alpha=0.3) plot(p, ylim=c(-3, 3)) plot(p, ylim=c(-3, 3)) # Simulate your own datasets first and pass to power analysis cf <- list(state=c(0,1), det=0) s <- simulate(umf, model = occu, formula=~1~x, coefs=cf, nsim = 100) p2 <- powerAnalysis(s, model=occu, formula=~1~x, effects=cf) p2 ## End(Not run)
These methods return predicted values from unmarkedFit
objects, fitList
s,
or ranef
output. Most object types use the same method, but some have
unique options; see usage below.
## S4 method for signature 'unmarkedFit' predict(object, type, newdata, backTransform = TRUE, na.rm = TRUE, appendData = FALSE, level = 0.95, re.form = NULL, ...) ## S4 method for signature 'unmarkedFitOccuMulti' predict(object, type, newdata, level = 0.95, species = NULL, cond = NULL, nsims = 100, ...) ## S4 method for signature 'unmarkedFitOccuMS' predict(object, type, newdata, level = 0.95, nsims = 100, ...) ## S4 method for signature 'unmarkedFitList' predict(object, type, newdata = NULL, backTransform = TRUE, appendData = FALSE, level = 0.95) ## S4 method for signature 'unmarkedRanef' predict(object, func, nsims = 100, ...)
## S4 method for signature 'unmarkedFit' predict(object, type, newdata, backTransform = TRUE, na.rm = TRUE, appendData = FALSE, level = 0.95, re.form = NULL, ...) ## S4 method for signature 'unmarkedFitOccuMulti' predict(object, type, newdata, level = 0.95, species = NULL, cond = NULL, nsims = 100, ...) ## S4 method for signature 'unmarkedFitOccuMS' predict(object, type, newdata, level = 0.95, nsims = 100, ...) ## S4 method for signature 'unmarkedFitList' predict(object, type, newdata = NULL, backTransform = TRUE, appendData = FALSE, level = 0.95) ## S4 method for signature 'unmarkedRanef' predict(object, func, nsims = 100, ...)
object |
A |
type |
The submodel to predict values for, such as |
newdata |
Optional; provide a |
backTransform |
Logical. If |
na.rm |
Logical. Should missing values be removed when predicting from original data? |
appendData |
Logical. Should covariate data used for prediction be appended to the output data frame? |
level |
The confidence interval to calculate. For example |
re.form |
For |
species |
Which species (integer or species name as a string) should
predictions be calculated for? For |
cond |
Which species (integer or species name as a string) should
predictions be calculated conditional on? If conditional on species presence,
supply just the species name; if species absence, put a minus sign in front
of the species name, e.g. "-coyote". See |
nsims |
Number of bootstrap simulations to use. Relevant only for some predict types. |
func |
A function to apply to bootstrapped |
... |
Other arguments, currently ignored |
The predict
method for unmarkedRanef
objects generates
the empirical Bayes posterior predictive distribution for functions of the
random variables (latent abundance or occurrence).
In addition to the output object from ranef
, you must also supply a
custom function to argument func
. The function must take as input a matrix
with dimensions M x T, where M is the number of sites and T is the number of
primary periods (T=1 for single-season models). The output of this function should
be a vector or matrix containing the derived parameters of interest.
You may also manually set the number of draws from the posterior predictive
distribution with argument nsims
; the default is 100.
The output of predict
will be a vector or array with one more dimension
than the output of the function supplied func
, corresponding to the number
of draws requested nsims
. For example, if func
outputs a scalar, the output of predict
will be a vector with length
equal to nsims
. If func
outputs a 3x2 matrix, the output of
predict
will be an array with dimensions 3x2xnsims
.
See ranef
for an example.
Alternatively, you can use the posteriorSamples
function on the
ranef
output object to obtain the full posterior predictive distribution.
This is useful if you are having trouble designing your custom function or if
you want to obtain multiple different derived parameters from the same posterior
predictive distribution.
For most methods, a data.frame
with four columns: the
predicted values, standard errors, and lower and upper bounds. If
appendData = TRUE
covariate data are also in the output. For
unmarkedRanef
, an array is returned. See details.
Extract estimates and summary statistics of random effect terms from an
unmarkedFit
model or an unmarkedEstimate
.
## S4 method for signature 'unmarkedEstimate' randomTerms(object, level=0.95, ...) ## S4 method for signature 'unmarkedFit' randomTerms(object, type, level=0.95, ...)
## S4 method for signature 'unmarkedEstimate' randomTerms(object, level=0.95, ...) ## S4 method for signature 'unmarkedFit' randomTerms(object, type, level=0.95, ...)
object |
An object inheriting class |
level |
Significance level to use for confidence interval |
type |
If provided, return only random effect terms from the chosen submodel type (as a character string) |
... |
Other arguments |
data.frame
containing estimates, SEs, and confidence intervals
for random effect terms in the model.
Ken Kellner [email protected]
Estimate posterior distributions of the random variables (latent occupancy or abundance) using empirical Bayes methods. These methods return an object storing the posterior distributions of the latent variables at each site, and for each year (primary period) in the case of open population models. See unmarkedRanef-class and posteriorSamples for methods used to manipulate the returned object.
## S4 method for signature 'unmarkedFit' ranef(object, ...)
## S4 method for signature 'unmarkedFit' ranef(object, ...)
object |
A |
... |
Other arguments. For some abundance models you can specify |
Empirical Bayes methods can underestimate the variance of the posterior distribution because they do not account for uncertainty in the hyperparameters (lambda or psi). Eventually, we hope to add methods to account for the uncertainty of the hyperparameters.
Note also that the posterior mode appears to exhibit some bias as an estimator or abundance. Consider using the posterior mean instead, even though it will not be an integer in general. More simulation studies are needed to evaluate the performance of empirical Bayes methods for these models.
From Carlin and Louis (1996): “... the Bayesian approach to inference depends on a prior distribution for the model parameters. This prior can depend on unknown parameters which in turn may follow some second-stage prior. This sequence of parameters and priors consitutes a hierarchical model. The hierarchy must stop at some point, with all remaining prior parameters assumed known. Rather than make this assumption, the basic empirical Bayes approach uses the observed data to estimate these final stage parameters (or to estimate the Bayes rule), and proceeds as in a standard Bayesian analysis.”
Richard Chandler [email protected]
Laird, N.M. and T.A. Louis. 1987. Empirical Bayes confidence intervals based on bootstrap samples. Journal of the American Statistical Association 82:739–750.
Carlin, B.P and T.A Louis. 1996. Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall/CRC.
Royle, J.A and R.M. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology. Academic Press.
# Simulate data under N-mixture model set.seed(4564) R <- 20 J <- 5 N <- rpois(R, 10) y <- matrix(NA, R, J) y[] <- rbinom(R*J, N, 0.5) # Fit model umf <- unmarkedFramePCount(y=y) fm <- pcount(~1 ~1, umf, K=50) # Estimates of conditional abundance distribution at each site (re <- ranef(fm)) # Best Unbiased Predictors bup(re, stat="mean") # Posterior mean bup(re, stat="mode") # Posterior mode confint(re, level=0.9) # 90% CI # Plots plot(re, subset=site %in% c(1:10), layout=c(5, 2), xlim=c(-1,20)) # Compare estimates to truth sum(N) sum(bup(re)) # Extract all values in convenient formats post.df <- as(re, "data.frame") head(post.df) post.arr <- as(re, "array") #Generate posterior predictive distribution for a function #of random variables using predict() #First, create a function that operates on a vector of #length M (if you fit a single-season model) or a matrix of #dimensions MxT (if a dynamic model), where #M = nsites and T = n primary periods #Our function will generate mean abundance for sites 1-10 and sites 11-20 myfunc <- function(x){ #x will be length 20 since M=20 #Mean of first 10 sites group1 <- mean(x[1:10]) #Mean of sites 11-20 group2 <- mean(x[11:20]) #Naming elements of the output is optional but helpful return(c(group1=group1, group2=group2)) } #Get 100 samples of the values calculated in your function (pr <- predict(re, func=myfunc, nsims=100)) #Summarize posterior data.frame(mean=rowMeans(pr), se=apply(pr, 1, stats::sd), lower=apply(pr, 1, stats::quantile, 0.025), upper=apply(pr, 1, stats::quantile, 0.975)) #Alternatively, you can return the posterior predictive distribution #and run operations on it separately (ppd <- posteriorSamples(re, nsims=100))
# Simulate data under N-mixture model set.seed(4564) R <- 20 J <- 5 N <- rpois(R, 10) y <- matrix(NA, R, J) y[] <- rbinom(R*J, N, 0.5) # Fit model umf <- unmarkedFramePCount(y=y) fm <- pcount(~1 ~1, umf, K=50) # Estimates of conditional abundance distribution at each site (re <- ranef(fm)) # Best Unbiased Predictors bup(re, stat="mean") # Posterior mean bup(re, stat="mode") # Posterior mode confint(re, level=0.9) # 90% CI # Plots plot(re, subset=site %in% c(1:10), layout=c(5, 2), xlim=c(-1,20)) # Compare estimates to truth sum(N) sum(bup(re)) # Extract all values in convenient formats post.df <- as(re, "data.frame") head(post.df) post.arr <- as(re, "array") #Generate posterior predictive distribution for a function #of random variables using predict() #First, create a function that operates on a vector of #length M (if you fit a single-season model) or a matrix of #dimensions MxT (if a dynamic model), where #M = nsites and T = n primary periods #Our function will generate mean abundance for sites 1-10 and sites 11-20 myfunc <- function(x){ #x will be length 20 since M=20 #Mean of first 10 sites group1 <- mean(x[1:10]) #Mean of sites 11-20 group2 <- mean(x[11:20]) #Naming elements of the output is optional but helpful return(c(group1=group1, group2=group2)) } #Get 100 samples of the values calculated in your function (pr <- predict(re, func=myfunc, nsims=100)) #Summarize posterior data.frame(mean=rowMeans(pr), se=apply(pr, 1, stats::sd), lower=apply(pr, 1, stats::quantile, 0.025), upper=apply(pr, 1, stats::quantile, 0.975)) #Alternatively, you can return the posterior predictive distribution #and run operations on it separately (ppd <- posteriorSamples(re, nsims=100))
Calculates residuals from a model. The approach varies based
on the model type, but is usually the observed data (y
) minus the
fitted values. See fitted
.
## S4 method for signature 'unmarkedFit' residuals(object, ...)
## S4 method for signature 'unmarkedFit' residuals(object, ...)
object |
A |
... |
Other arguments, currently unused. |
In most cases, residuals
returns a matrix of residual values with the
same dimensions as the y
matrix (M x J or M x JT). For some models with
multiple observation processes or multiple species, such as gdistremoval
or occuMulti
, the output may be a named list of matrices.
Extract standard errors of parameter estimates from a fitted model.
A model prediction
A fitted model
Launch a Shiny app to test power under various scenarios. Requires the Shiny package to be installed.
shinyPower(object, ...)
shinyPower(object, ...)
object |
A template |
... |
Currently ignored |
No return value, called for its side effects.
When distance data are collected on line transects using sight distances and sight angles, they need to be converted to perpendicular distances before analysis.
sight2perpdist(sightdist, sightangle)
sight2perpdist(sightdist, sightangle)
sightdist |
Distance from observer |
sightangle |
Angle from center line. In degrees between 0 and 180. |
Perpendicular distance
round(sight2perpdist(10, c(0, 45, 90, 135, 180)))
round(sight2perpdist(10, c(0, 45, 90, 135, 180)))
Extract estimates and summary statistics of random effect standard
deviations from an unmarkedFit
model or an unmarkedEstimate
.
## S4 method for signature 'unmarkedEstimate' sigma(object, level=0.95, ...) ## S4 method for signature 'unmarkedFit' sigma(object, type, level=0.95, ...)
## S4 method for signature 'unmarkedEstimate' sigma(object, level=0.95, ...) ## S4 method for signature 'unmarkedFit' sigma(object, type, level=0.95, ...)
object |
An object inheriting class |
level |
Significance level to use for confidence interval |
type |
If provided, return only random effect SDs from the chosen submodel type (as a character string) |
... |
Other arguments |
data.frame
containing estimates, SEs, and confidence intervals
for random effect standard deviations in the model.
Ken Kellner [email protected]
Simulate data from a fitted model, or simulate an entirely new dataset
from a study design defined by an unmarkedFrame
.
## S4 method for signature 'unmarkedFit' simulate(object, nsim=1, seed=NULL, ...) ## S4 method for signature 'unmarkedFrame' simulate(object, nsim=1, seed=NULL, model = NULL, coefs = NULL, quiet = FALSE, ...)
## S4 method for signature 'unmarkedFit' simulate(object, nsim=1, seed=NULL, ...) ## S4 method for signature 'unmarkedFrame' simulate(object, nsim=1, seed=NULL, model = NULL, coefs = NULL, quiet = FALSE, ...)
object |
Fitted model or |
nsim |
Number of simulations |
seed |
Ignored |
model |
The model to use when |
coefs |
List with one element per submodel. Each list element should be
named with the corresponding submodel, and should be a numeric vector of
parameter values to use for that submodel when simulating the dataset.
Note that parameter values should be on the inverse link scale.
The number of parameter values in the vector depends on the model specified,
covariates, etc. If you are not sure how to specify this list, set
|
quiet |
If TRUE, don't print informational messages. |
... |
Used only for the |
## Not run: # Simulation of an occupancy dataset from scratch # First create an unmarkedFrame with the correct design M <- 300 # number of sites J <- 5 # number of occasions # The values in the y-matrix don't matter as they will be simulated # We can supply them as all NAs y <- matrix(NA, M, J) # Site covariate x <- rnorm(M) # Create unmarkedFrame umf <- unmarkedFrameOccu(y = y, siteCovs = data.frame(x = x)) # Must specify model = occu since unmarkedFrameOccu is also used for occuRN # the formula species the specific model structure we want to simulate # If we don't specify coefs, unmarked will generate a template you can copy and use simulate(umf, model = occu, formula = ~1~x) # Now set coefs # Here we imply a mean occupancy and mean detection of 0.5 # (corresponding to values of 0 on the inverse link scale) and a positive effect of x s <- simulate(umf, model = occu, formula = ~1~x, coefs = list(state = c(0,0.3), det = 0)) head(s[[1]]) occu(~1~x, s[[1]]) # For some models we can also include a random effect # add a factor covariate umf@siteCovs$x2 <- factor(sample(letters[1:10], M, replace=TRUE)) # The final value in coefs now represents the random effect SD for x2 s <- simulate(umf, model = occu, formula = ~1~x+(1|x2), coefs = list(state = c(0,0.3, 1), det = 0)) head(s[[1]]) occu(~1~x+(1|x2), s[[1]]) # Here's a more complicated example simulating a gdistsamp dataset # using a negative binomial distribution M <- 100 J <- 3 T <- 2 y <- matrix(NA, M, J*T) umf2 <- unmarkedFrameGDS(y=y, siteCovs=data.frame(x=rnorm(M)), dist.breaks = c(0, 10, 20, 30), unitsIn='m', numPrimary = T, survey="point") cf <- list(lambda=c(1, 0.3), phi=0, det=c(log(20), 0), alpha=log(1)) # Note we now also supply another argument mixture="NB" to ... s2 <- simulate(umf2, coefs=cf, lambdaformula=~x, phiformula=~1, pformula=~x, mixture="NB") head(s2[[1]]) gdistsamp(~x, ~1, ~x, s2[[1]], mixture="NB") ## End(Not run)
## Not run: # Simulation of an occupancy dataset from scratch # First create an unmarkedFrame with the correct design M <- 300 # number of sites J <- 5 # number of occasions # The values in the y-matrix don't matter as they will be simulated # We can supply them as all NAs y <- matrix(NA, M, J) # Site covariate x <- rnorm(M) # Create unmarkedFrame umf <- unmarkedFrameOccu(y = y, siteCovs = data.frame(x = x)) # Must specify model = occu since unmarkedFrameOccu is also used for occuRN # the formula species the specific model structure we want to simulate # If we don't specify coefs, unmarked will generate a template you can copy and use simulate(umf, model = occu, formula = ~1~x) # Now set coefs # Here we imply a mean occupancy and mean detection of 0.5 # (corresponding to values of 0 on the inverse link scale) and a positive effect of x s <- simulate(umf, model = occu, formula = ~1~x, coefs = list(state = c(0,0.3), det = 0)) head(s[[1]]) occu(~1~x, s[[1]]) # For some models we can also include a random effect # add a factor covariate umf@siteCovs$x2 <- factor(sample(letters[1:10], M, replace=TRUE)) # The final value in coefs now represents the random effect SD for x2 s <- simulate(umf, model = occu, formula = ~1~x+(1|x2), coefs = list(state = c(0,0.3, 1), det = 0)) head(s[[1]]) occu(~1~x+(1|x2), s[[1]]) # Here's a more complicated example simulating a gdistsamp dataset # using a negative binomial distribution M <- 100 J <- 3 T <- 2 y <- matrix(NA, M, J*T) umf2 <- unmarkedFrameGDS(y=y, siteCovs=data.frame(x=rnorm(M)), dist.breaks = c(0, 10, 20, 30), unitsIn='m', numPrimary = T, survey="point") cf <- list(lambda=c(1, 0.3), phi=0, det=c(log(20), 0), alpha=log(1)) # Note we now also supply another argument mixture="NB" to ... s2 <- simulate(umf2, coefs=cf, lambdaformula=~x, phiformula=~1, pformula=~x, mixture="NB") head(s2[[1]]) gdistsamp(~x, ~1, ~x, s2[[1]], mixture="NB") ## End(Not run)
Compute the sum of squared residuals for an unmarked fit object. This
is useful for a parboot
.
SSE(fit, ...)
SSE(fit, ...)
fit |
An unmarked fit object. |
... |
Additional arguments to be passed to statistic |
A numeric value for the models SSE.
Spatially-referenced data on elevation, forest cover, and water at a 1km-sq resolution.
data(Switzerland)
data(Switzerland)
A data frame with 42275 observations on the following 5 variables.
x
Easting (m)
y
Northing (m)
elevation
a numeric vector (m)
forest
a numeric vector (percent cover)
water
a numeric vector (percent cover)
Forest and water coverage (in percent area) was computed using the 1992-97 landcover dataset of the Swiss Federal Statistical Office (http://www.bfs.admin.ch). Median elevation (in metres) was computed using a median aggregation of the digital elevation model of the Swiss Federal Statistical Office.
x and y are the coordinates of the center of each 1km2 pixel.
The coordinate reference system intentionally not specified.
These data can only be used for non-profit projects. Otherwise, written permission must be obtained from the Swiss Federal Statistical Office
Swiss Federal Statistical Office (http://www.bfs.admin.ch)
library(lattice) data(Switzerland) str(Switzerland) levelplot(elevation ~ x + y, Switzerland, aspect="iso", col.regions=terrain.colors(100)) ## Not run: library(raster) el.r <- rasterFromXYZ(Switzerland[,c("x","y","elevation")], crs = "+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +x_0=600000 +y_0=200000 +ellps=bessel +towgs84=674.374,15.056,405.346,0,0,0,0 +units=m +no_defs") plot(el.r) spplot(el.r) ## End(Not run)
library(lattice) data(Switzerland) str(Switzerland) levelplot(elevation ~ x + y, Switzerland, aspect="iso", col.regions=terrain.colors(100)) ## Not run: library(raster) el.r <- rasterFromXYZ(Switzerland[,c("x","y","elevation")], crs = "+proj=somerc +lat_0=46.95240555555556 +lon_0=7.439583333333333 +k_0=1 +x_0=600000 +y_0=200000 +ellps=bessel +towgs84=674.374,15.056,405.346,0,0,0,0 +units=m +no_defs") plot(el.r) spplot(el.r) ## End(Not run)
Contains parameter estimates, covariance matrix, and metadata
Creating these objects is done internally not by users.
name
:Object of class "character"
storing parameter
names
short.name
:Object of class "character"
storing
abbreviated parameter names
estimates
:Object of class "numeric"
covMat
:Object of class "matrix"
covMatBS
:Object of class "matrix"
fixed
:Object of class "numeric"
invlink
:Object of class "character"
invlinkGrad
:Object of class "character"
randomVarInfo
:Object of class "list"
signature(obj = "unmarkedEstimate")
signature(object = "unmarkedEstimate")
signature(object = "unmarkedEstimate")
signature(obj = "unmarkedEstimate",
coefficients = "matrixOrVector")
signature(obj = "unmarkedEstimate")
signature(object = "unmarkedEstimate")
signature(object = "unmarkedEstimate")
These methods are typically called within a call to a method for
unmarkedFit-class
showClass("unmarkedEstimate")
showClass("unmarkedEstimate")
Class to hold multiple unmarkedEstimates in an
unmarkedFit
estimates
:A "list"
of models.
Contains fitted model information which can be manipulated or extracted using the methods described below.
signature(x = "unmarkedFit", i = "ANY", j = "ANY",
drop = "ANY")
: extract one of names(obj), eg 'state' or 'det'
signature(obj = "unmarkedFit")
: back-transform
parameters to original scale when no covariate effects are modeled
signature(object = "unmarkedFit")
: returns parameter
estimates. type can be one of names(obj), eg 'state' or 'det'.
If altNames=TRUE estimate names are more specific.
signature(object = "unmarkedFit")
: Returns confidence
intervals. Must specify type and method (either "normal" or "profile")
signature(object = "unmarkedFit")
: extracts data
signature(object = "unmarkedFit")
: calculates and extracts
expected false positive detection probabilities
signature(object = "unmarkedFit")
: calculates and extracts
expected probabilities a true positive detection was classified as certain
signature(obj = "unmarkedFit",
coefficients = "matrixOrVector")
: Returns estimate and SE on original
scale when covariates are present
signature(x = "unmarkedFit")
: Names of parameter levels
signature(object = "unmarkedFit")
: Parametric
bootstrapping method to assess goodness-of-fit
signature(x = "unmarkedFit", y = "missing")
: Plots
expected vs. observed values
signature(fitted = "unmarkedFit")
: used by confint
method='profile'
signature(object = "unmarkedFit")
: returns number
of sites in sample
signature(obj = "unmarkedFit")
: returns standard errors
signature(object = "unmarkedFit")
: results with more
details
signature(object = "unmarkedFit")
: refit model with
changes to one or more arguments
signature(object = "unmarkedFit")
: returns
variance-covariance matrix
signature(object="unmarkedFitColExt")
:
Returns the smoothed trajectory from a colonization-extinction
model fit. Takes additional logical argument mean which specifies
whether or not to return the average over sites.
signature(object="unmarkedFitColExt")
:
Returns the projected trajectory from a colonization-extinction
model fit. Takes additional logical argument mean which specifies
whether or not to return the average over sites.
signature(object="unmarkedFit")
:
Returns the log-likelihood.
signature(m1="unmarkedFit", m2="unmarkedFit")
:
Returns the chi-squared statistic, degrees-of-freedom, and p-value from
a Likelihood Ratio Test.
This is a superclass with child classes for each fit type
showClass("unmarkedFit") # Format removal data for multinomPois data(ovendata) ovenFrame <- unmarkedFrameMPois(y = ovendata.list$data, siteCovs = as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal") # Fit a couple of models (fm1 <- multinomPois(~ 1 ~ ufc + trba, ovenFrame)) summary(fm1) # Apply a bunch of methods to the fitted model # Look at the different parameter types names(fm1) fm1['state'] fm1['det'] # Coefficients from abundance part of the model coef(fm1, type='state') # Variance-covariance matrix vcov(fm1, type='state') # Confidence intervals using profiled likelihood confint(fm1, type='state', method='profile') # Expected values fitted(fm1) # Original data getData(fm1) # Detection probabilities getP(fm1) # log-likelihood logLik(fm1) # Back-transform detection probability to original scale # backTransform only works on models with no covariates or # in conjunction with linearComb (next example) backTransform(fm1, type ='det') # Predicted abundance at specified covariate values (lc <- linearComb(fm1, c(Int = 1, ufc = 0, trba = 0), type='state')) backTransform(lc) # Assess goodness-of-fit parboot(fm1) plot(fm1) # Predict abundance at specified covariate values. newdat <- data.frame(ufc = 0, trba = seq(-1, 1, length=10)) predict(fm1, type='state', newdata=newdat) # Fit a new model without covariates (fmNull <- update(fm1, formula = ~1 ~1))
showClass("unmarkedFit") # Format removal data for multinomPois data(ovendata) ovenFrame <- unmarkedFrameMPois(y = ovendata.list$data, siteCovs = as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal") # Fit a couple of models (fm1 <- multinomPois(~ 1 ~ ufc + trba, ovenFrame)) summary(fm1) # Apply a bunch of methods to the fitted model # Look at the different parameter types names(fm1) fm1['state'] fm1['det'] # Coefficients from abundance part of the model coef(fm1, type='state') # Variance-covariance matrix vcov(fm1, type='state') # Confidence intervals using profiled likelihood confint(fm1, type='state', method='profile') # Expected values fitted(fm1) # Original data getData(fm1) # Detection probabilities getP(fm1) # log-likelihood logLik(fm1) # Back-transform detection probability to original scale # backTransform only works on models with no covariates or # in conjunction with linearComb (next example) backTransform(fm1, type ='det') # Predicted abundance at specified covariate values (lc <- linearComb(fm1, c(Int = 1, ufc = 0, trba = 0), type='state')) backTransform(lc) # Assess goodness-of-fit parboot(fm1) plot(fm1) # Predict abundance at specified covariate values. newdat <- data.frame(ufc = 0, trba = seq(-1, 1, length=10)) predict(fm1, type='state', newdata=newdat) # Fit a new model without covariates (fmNull <- update(fm1, formula = ~1 ~1))
Class to hold multiple fitted models from one of
unmarked
's fitting functions
Objects can be created by using the fitList
function.
fits
:A "list"
of models.
signature(object = "unmarkedFitList")
:
Extract coefficients
signature(object = "unmarkedFitList")
:
Extract standard errors
signature(object = "unmarkedFitList")
:
Model selection
signature(object = "unmarkedFitList")
:
Model-averaged prediction
Model-averaging regression coefficients is intentionally not implemented.
showClass("unmarkedFitList") data(linetran) (dbreaksLine <- c(0, 5, 10, 15, 20)) lengths <- linetran$Length * 1000 ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = dbreaksLine, tlength = lengths, survey = "line", unitsIn = "m") }) fm1 <- distsamp(~ 1 ~1, ltUMF) fm2 <- distsamp(~ area ~1, ltUMF) fm3 <- distsamp( ~ 1 ~area, ltUMF) fl <- fitList(Null=fm1, A.=fm2, .A=fm3) fl coef(fl) SE(fl) ms <- modSel(fl, nullmod="Null") ms
showClass("unmarkedFitList") data(linetran) (dbreaksLine <- c(0, 5, 10, 15, 20)) lengths <- linetran$Length * 1000 ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = dbreaksLine, tlength = lengths, survey = "line", unitsIn = "m") }) fm1 <- distsamp(~ 1 ~1, ltUMF) fm2 <- distsamp(~ area ~1, ltUMF) fm3 <- distsamp( ~ 1 ~area, ltUMF) fl <- fitList(Null=fm1, A.=fm2, .A=fm3) fl coef(fl) SE(fl) ms <- modSel(fl, nullmod="Null") ms
Constructor for unmarkedFrames.
unmarkedFrame(y, siteCovs=NULL, obsCovs=NULL, obsToY)
unmarkedFrame(y, siteCovs=NULL, obsCovs=NULL, obsToY)
y |
An MxJ matrix of the observed measured data, where M is the number of sites and J is the maximum number of observations per site. |
siteCovs |
A |
obsCovs |
Either a named list of |
obsToY |
optional matrix specifying relationship between observation-level covariates and response matrix |
unmarkedFrame is the S4 class that holds data structures to be passed to the model-fitting functions in unmarked.
An unmarkedFrame contains the observations (y
), covariates
measured at the observation level (obsCovs
), and covariates
measured at the site level (siteCovs
).
For a data set with M sites and J observations at each site, y is an
M x J matrix. obsCovs
and siteCovs
are both data frames
(see data.frame). siteCovs
has M rows so that each row
contains the covariates for the corresponding sites.
obsCovs
has M*obsNum rows so that each covariates is ordered by
site first, then observation number. Missing values are coded with
NA
in any of y, siteCovs, or obsCovs.
Additionally, unmarkedFrames contain metadata: obsToY.
obsToY is a matrix describing relationship between response matrix and
observation-level covariates. Generally this does not need to be
supplied by the user; however, it may be needed when using
multinomPois
. For example, double observer sampling, y
has 3 columns corresponding the observer 1, observer 2, and both, but
there were only two independent observations.
In this situation, y has 3 columns, but obsToY must be specified.
Several child classes of unmarkedFrame
require addional
metadata. For example, unmarkedFrameDS
is used to organize
distsance sampling data for the distsamp
function, and
it has arguments dist.breaks, tlength, survey, and unitsIn, which
specify the distance interval cut points, transect lengths, "line" or
"point" transect, and units of measure, respectively.
All site-level covariates are automatically copied to obsCovs so that site level covariates are available at the observation level.
an unmarkedFrame object
unmarkedFrame-class
,
unmarkedFrameOccu
, unmarkedFramePCount
,
unmarkedFrameDS
# Set up data for pcount() data(mallard) mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site, obsCovs = mallard.obs) summary(mallardUMF) # Set up data for occu() data(frogs) pferUMF <- unmarkedFrameOccu(pfer.bin) # Set up data for distsamp() data(linetran) ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = c(0, 5, 10, 15, 20), tlength = linetran$Length * 1000, survey = "line", unitsIn = "m") }) summary(ltUMF) # Set up data for multinomPois() data(ovendata) ovenFrame <- unmarkedFrameMPois(ovendata.list$data, siteCovs=as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal") summary(ovenFrame) ## Not run: # Set up data for colext() frogUMF <- formatMult(masspcru) summary(frogUMF) ## End(Not run)
# Set up data for pcount() data(mallard) mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site, obsCovs = mallard.obs) summary(mallardUMF) # Set up data for occu() data(frogs) pferUMF <- unmarkedFrameOccu(pfer.bin) # Set up data for distsamp() data(linetran) ltUMF <- with(linetran, { unmarkedFrameDS(y = cbind(dc1, dc2, dc3, dc4), siteCovs = data.frame(Length, area, habitat), dist.breaks = c(0, 5, 10, 15, 20), tlength = linetran$Length * 1000, survey = "line", unitsIn = "m") }) summary(ltUMF) # Set up data for multinomPois() data(ovendata) ovenFrame <- unmarkedFrameMPois(ovendata.list$data, siteCovs=as.data.frame(scale(ovendata.list$covariates[,-1])), type = "removal") summary(ovenFrame) ## Not run: # Set up data for colext() frogUMF <- formatMult(masspcru) summary(frogUMF) ## End(Not run)
Methods for manipulating, summarizing and viewing unmarkedFrames
Objects can be created by calls to the constructor function
unmarkedFrame
. These objects are passed to the data
argument of the fitting functions.
y
:Object of class "matrix"
obsCovs
:Object of class "optionalDataFrame"
siteCovs
:Object of class "optionalDataFrame"
obsToY
:Object of class "optionalMatrix"
signature(x = "unmarkedFrame", i = "numeric", j =
"missing", drop = "missing")
: ...
signature(x = "unmarkedFrame", i = "numeric", j =
"numeric", drop = "missing")
: ...
signature(x = "unmarkedFrame", i = "missing", j =
"numeric", drop = "missing")
: ...
signature(object = "unmarkedFrame")
: extract y
matrix
signature(object = "unmarkedFrame")
: extract M
signature(object = "unmarkedFrame")
: extract
ncol(y)
signature(object = "unmarkedFrame")
: extract
observation-level covariates
signature(object = "unmarkedFrame")
: add or
modify observation-level covariates
signature(object = "unmarkedFrame")
: extract
number of observations
signature(object = "unmarkedFrame")
:
signature(object = "unmarkedFrame")
: ...
signature(x = "unmarkedFrame", y = "missing")
:
visualize response variable.
Takes additional argument panels
which specifies how many
panels data should be split over.
signature(object = "unmarkedFrame")
: view data as
data.frame
signature(object = "unmarkedFrame")
: extract
site-level covariates
signature(object = "unmarkedFrame")
: add or
modify site-level covariates
signature(object = "unmarkedFrame")
: summarize
data
signature(object = "unmarkedFrameOccuCOP")
: extract L
This is a superclass with child classes for each fitting function.
unmarkedFrame
, unmarkedFit
,
unmarked-package
# List all the child classes of unmarkedFrame showClass("unmarkedFrame") # Organize data for pcount() data(mallard) mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site, obsCovs = mallard.obs) # Vizualize it plot(mallardUMF) mallardUMF # Summarize it summary(mallardUMF) str(mallardUMF) numSites(mallardUMF) numY(mallardUMF) obsNum(mallardUMF) # Extract components of data getY(mallardUMF) obsCovs(mallardUMF) obsCovs(mallardUMF, matrices = TRUE) siteCovs(mallardUMF) mallardUMF[1:5,] # First 5 rows in wide format mallardUMF[,1:2] # First 2 observations
# List all the child classes of unmarkedFrame showClass("unmarkedFrame") # Organize data for pcount() data(mallard) mallardUMF <- unmarkedFramePCount(mallard.y, siteCovs = mallard.site, obsCovs = mallard.obs) # Vizualize it plot(mallardUMF) mallardUMF # Summarize it summary(mallardUMF) str(mallardUMF) numSites(mallardUMF) numY(mallardUMF) obsNum(mallardUMF) # Extract components of data getY(mallardUMF) obsCovs(mallardUMF) obsCovs(mallardUMF, matrices = TRUE) siteCovs(mallardUMF) mallardUMF[1:5,] # First 5 rows in wide format mallardUMF[,1:2] # First 2 observations
Organizes count data along with the covariates and metadata.
This S4 class is required by the data argument of distsamp
unmarkedFrameDS(y, siteCovs=NULL, dist.breaks, tlength, survey, unitsIn)
unmarkedFrameDS(y, siteCovs=NULL, dist.breaks, tlength, survey, unitsIn)
y |
An RxJ matrix of count data, where R is the number of sites (transects) and J is the number of distance classes. |
siteCovs |
A |
dist.breaks |
vector of distance cut-points delimiting the distance classes. It must be of length J+1. |
tlength |
A vector of length R containing the trasect lengths. This is ignored when survey="point". |
survey |
Either "point" or "line" for point- and line-transects. |
unitsIn |
Either "m" or "km" defining the measurement units for
both |
.
unmarkedFrameDS is the S4 class that holds data to be passed
to the distsamp
model-fitting function.
an object of class unmarkedFrameDS
If you have continuous distance data, they must be "binned" into discrete distance classes, which are delimited by dist.breaks.
Royle, J. A., D. K. Dawson, and S. Bates (2004) Modeling abundance effects in distance sampling. Ecology 85, pp. 1591-1597.
unmarkedFrame-class
, unmarkedFrame
,
distsamp
# Fake data R <- 4 # number of sites J <- 3 # number of distance classes db <- c(0, 10, 20, 30) # distance break points y <- matrix(c( 5,4,3, # 5 detections in 0-10 distance class at this transect 0,0,0, 2,1,1, 1,1,0), nrow=R, ncol=J, byrow=TRUE) y site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs umf <- unmarkedFrameDS(y=y, siteCovs=site.covs, dist.breaks=db, survey="point", unitsIn="m") # organize data umf # look at data summary(umf) # summarize fm <- distsamp(~1 ~1, umf) # fit a model
# Fake data R <- 4 # number of sites J <- 3 # number of distance classes db <- c(0, 10, 20, 30) # distance break points y <- matrix(c( 5,4,3, # 5 detections in 0-10 distance class at this transect 0,0,0, 2,1,1, 1,1,0), nrow=R, ncol=J, byrow=TRUE) y site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs umf <- unmarkedFrameDS(y=y, siteCovs=site.covs, dist.breaks=db, survey="point", unitsIn="m") # organize data umf # look at data summary(umf) # summarize fm <- distsamp(~1 ~1, umf) # fit a model
Organizes distance sampling data and experimental design information
from multiple primary periods along with associated covariates. This S4 class
is required by the data argument of distsampOpen
unmarkedFrameDSO(y, siteCovs=NULL, yearlySiteCovs=NULL, numPrimary, primaryPeriod, dist.breaks, tlength, survey, unitsIn)
unmarkedFrameDSO(y, siteCovs=NULL, yearlySiteCovs=NULL, numPrimary, primaryPeriod, dist.breaks, tlength, survey, unitsIn)
y |
An MxJT matrix of the repeated count data, where M is the number of sites (i.e., points or transects), J is the number of distance classes and T is the maximum number of primary sampling periods per site |
siteCovs |
A |
yearlySiteCovs |
Either a named list of MxT |
numPrimary |
Maximum number of observed primary periods for each site |
primaryPeriod |
An MxJT matrix of integers indicating the primary period of each observation |
dist.breaks |
vector of distance cut-points delimiting the distance classes. It must be of length J+1 |
tlength |
A vector of length R containing the transect lengths. This is ignored when survey="point" |
survey |
Either "point" or "line" for point- and line-transects |
unitsIn |
Either "m" or "km" defining the measurement units for
both |
unmarkedFrameDSO
is the S4 class that holds data to be passed
to the distsampOpen
model-fitting function. Unlike
most unmarked functions, obsCovs
cannot be supplied.
If you have continuous distance data, they must be "binned" into discrete distance classes, which are delimited by dist.breaks.
When gamma or omega are modeled using year-specific covariates, the covariate data for the final year will be ignored; however, they must be supplied.
If the time gap between primary periods is not constant, an M by T
matrix of integers should be supplied using the primaryPeriod
argument.
an object of class unmarkedFrameDSO
unmarkedFrame-class
, unmarkedFrame
,
distsampOpen
# Fake data M <- 4 # number of sites J <- 3 # number of distance classes T <- 2 # number of primary periods db <- c(0, 10, 20, 30) # distance break points y <- matrix(c( 5,4,3, 6,2,1, # In bin 1: 5 detections in primary period 1, 6 in period 2 0,0,0, 0,1,0, 2,1,1, 0,0,0, 1,1,0, 1,1,1), nrow=M, ncol=J*T, byrow=TRUE) y # Primary periods of observations # In this case there are no gaps primPer <- matrix(as.integer(c( 1,2, 1,2, 1,2, 1,2)), nrow=M, ncol=T, byrow=TRUE) #Site covs: M rows and 1 column per covariate site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs #Yearly site covs on gamma/omega ysc <- list( x3 = matrix(c( 1,2, 1,2, 1,2, 1,2), nrow=M, ncol=T, byrow=TRUE)) umf <- unmarkedFrameDSO(y=y, siteCovs=site.covs, yearlySiteCovs=ysc, numPrimary=T, primaryPeriod=primPer, dist.breaks=db, survey="point", unitsIn="m") umf # look at data summary(umf) # summarize
# Fake data M <- 4 # number of sites J <- 3 # number of distance classes T <- 2 # number of primary periods db <- c(0, 10, 20, 30) # distance break points y <- matrix(c( 5,4,3, 6,2,1, # In bin 1: 5 detections in primary period 1, 6 in period 2 0,0,0, 0,1,0, 2,1,1, 0,0,0, 1,1,0, 1,1,1), nrow=M, ncol=J*T, byrow=TRUE) y # Primary periods of observations # In this case there are no gaps primPer <- matrix(as.integer(c( 1,2, 1,2, 1,2, 1,2)), nrow=M, ncol=T, byrow=TRUE) #Site covs: M rows and 1 column per covariate site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs #Yearly site covs on gamma/omega ysc <- list( x3 = matrix(c( 1,2, 1,2, 1,2, 1,2), nrow=M, ncol=T, byrow=TRUE)) umf <- unmarkedFrameDSO(y=y, siteCovs=site.covs, yearlySiteCovs=ysc, numPrimary=T, primaryPeriod=primPer, dist.breaks=db, survey="point", unitsIn="m") umf # look at data summary(umf) # summarize
gdistremoval
Organize data for the combined distance and removal point-count model of
Amundson et al. (2014) fit by gdistremoval
unmarkedFrameGDR(yDistance, yRemoval, numPrimary=1, siteCovs=NULL, obsCovs=NULL, yearlySiteCovs=NULL, dist.breaks, unitsIn, period.lengths=NULL)
unmarkedFrameGDR(yDistance, yRemoval, numPrimary=1, siteCovs=NULL, obsCovs=NULL, yearlySiteCovs=NULL, dist.breaks, unitsIn, period.lengths=NULL)
yDistance |
An MxTJ matrix of count data, where M is the number of sites (points), T is the number of primary periods (can be 1) and J is the number of distance classes |
yRemoval |
An MxTJ matrix of count data, where M is the number of sites (points), T is the number of primary periods (can be 1) and J is the number of time removal periods |
numPrimary |
Number of primary periods in the dataset |
siteCovs |
A |
obsCovs |
A |
yearlySiteCovs |
A |
dist.breaks |
vector of distance cut-points delimiting the distance classes. It must be of length J+1 |
unitsIn |
Either "m" or "km" defining the measurement units for
|
period.lengths |
Optional vector of time lengths of each removal period. Each value in the vector must be a positive integer, and the total length of the vector must be equal to the number of removal periods J. If this is not provided (the default), then all periods are assumed to have an equal length of 1 time unit |
unmarkedFrameGDR is the S4 class that holds data to be passed
to the gdistremoval
model-fitting function.
an object of class unmarkedFrameGDR
If you have continuous distance data, they must be "binned" into discrete distance classes, which are delimited by dist.breaks.
Ken Kellner [email protected]
Amundson, C.L., Royle, J.A. and Handel, C.M., 2014. A hierarchical model combining distance sampling and time removal to estimate detection probability during avian point counts. The Auk 131: 476-494.
unmarkedFrame-class
, unmarkedFrame
,
gdistremoval
Organizes count data and experimental design information
from multiple primary periods along with associated covariates. This S4 class
is required by the data argument of multmixOpen
unmarkedFrameMMO(y, siteCovs=NULL, obsCovs=NULL, yearlySiteCovs=NULL, numPrimary, type, primaryPeriod)
unmarkedFrameMMO(y, siteCovs=NULL, obsCovs=NULL, yearlySiteCovs=NULL, numPrimary, type, primaryPeriod)
y |
An MxJT matrix of the repeated count data, where M is the number of sites (i.e., points or transects), J is the number of distance classes and T is the maximum number of primary sampling periods per site |
siteCovs |
A |
obsCovs |
Either a named list of |
yearlySiteCovs |
Either a named list of MxT |
numPrimary |
Maximum number of observed primary periods for each site |
type |
Either "removal" for removal sampling, "double" for standard double observer sampling, or "depDouble" for dependent double observer sampling |
primaryPeriod |
An MxJT matrix of integers indicating the primary period of each observation |
unmarkedFrameMMO
is the S4 class that holds data to be passed
to the multmixOpen
model-fitting function.
Options for the detection process (type
) include equal-interval removal
sampling ("removal"
), double observer sampling ("double"
), or
dependent double-observer sampling ("depDouble"
). Note
that unlike the related functions multinomPois
and
gmultmix
, custom functions for the detection process (i.e.,
piFun
s) are not supported. To request additional options contact the author.
When gamma or omega are modeled using year-specific covariates, the covariate data for the final year will be ignored; however, they must be supplied.
If the time gap between primary periods is not constant, an M by T
matrix of integers should be supplied using the primaryPeriod
argument.
an object of class unmarkedFrameMMO
unmarkedFrame-class
, unmarkedFrame
,
multmixOpen
#Generate some data set.seed(123) lambda=4; gamma=0.5; omega=0.8; p=0.5 M <- 100; T <- 5 y <- array(NA, c(M, 3, T)) N <- matrix(NA, M, T) S <- G <- matrix(NA, M, T-1) for(i in 1:M) { N[i,1] <- rpois(1, lambda) y[i,1,1] <- rbinom(1, N[i,1], p) # Observe some Nleft1 <- N[i,1] - y[i,1,1] # Remove them y[i,2,1] <- rbinom(1, Nleft1, p) # ... Nleft2 <- Nleft1 - y[i,2,1] y[i,3,1] <- rbinom(1, Nleft2, p) for(t in 1:(T-1)) { S[i,t] <- rbinom(1, N[i,t], omega) G[i,t] <- rpois(1, gamma) N[i,t+1] <- S[i,t] + G[i,t] y[i,1,t+1] <- rbinom(1, N[i,t+1], p) # Observe some Nleft1 <- N[i,t+1] - y[i,1,t+1] # Remove them y[i,2,t+1] <- rbinom(1, Nleft1, p) # ... Nleft2 <- Nleft1 - y[i,2,t+1] y[i,3,t+1] <- rbinom(1, Nleft2, p) } } y=matrix(y, M) #Create some random covariate data sc <- data.frame(x1=rnorm(100)) #Create unmarked frame umf <- unmarkedFrameMMO(y=y, numPrimary=5, siteCovs=sc, type="removal") summary(umf)
#Generate some data set.seed(123) lambda=4; gamma=0.5; omega=0.8; p=0.5 M <- 100; T <- 5 y <- array(NA, c(M, 3, T)) N <- matrix(NA, M, T) S <- G <- matrix(NA, M, T-1) for(i in 1:M) { N[i,1] <- rpois(1, lambda) y[i,1,1] <- rbinom(1, N[i,1], p) # Observe some Nleft1 <- N[i,1] - y[i,1,1] # Remove them y[i,2,1] <- rbinom(1, Nleft1, p) # ... Nleft2 <- Nleft1 - y[i,2,1] y[i,3,1] <- rbinom(1, Nleft2, p) for(t in 1:(T-1)) { S[i,t] <- rbinom(1, N[i,t], omega) G[i,t] <- rpois(1, gamma) N[i,t+1] <- S[i,t] + G[i,t] y[i,1,t+1] <- rbinom(1, N[i,t+1], p) # Observe some Nleft1 <- N[i,t+1] - y[i,1,t+1] # Remove them y[i,2,t+1] <- rbinom(1, Nleft1, p) # ... Nleft2 <- Nleft1 - y[i,2,t+1] y[i,3,t+1] <- rbinom(1, Nleft2, p) } } y=matrix(y, M) #Create some random covariate data sc <- data.frame(x1=rnorm(100)) #Create unmarked frame umf <- unmarkedFrameMMO(y=y, numPrimary=5, siteCovs=sc, type="removal") summary(umf)
Organizes count data along with the covariates.
This S4 class is required by the data argument of
multinomPois
unmarkedFrameMPois(y, siteCovs=NULL, obsCovs=NULL, type, obsToY, piFun)
unmarkedFrameMPois(y, siteCovs=NULL, obsCovs=NULL, type, obsToY, piFun)
y |
An RxJ matrix of count data, where R is the number of sites (transects) and J is the maximum number of observations per site. |
siteCovs |
A |
obsCovs |
Either a named list of RxJ |
type |
Either "removal" for removal sampling, "double" for standard
double observer sampling, or "depDouble" for dependent double observer
sampling. If this argument not specified, the user must
provide an |
obsToY |
A matrix describing the relationship between |
piFun |
Function used to compute the multinomial cell probabilities
from a matrix of detection probabilities. This is created automatically
if |
unmarkedFrameMPois is the S4 class that holds data to be passed
to the multinomPois
model-fitting function.
an object of class unmarkedFrameMPois
Royle, J. A. (2004). Generalized estimators of avian abundance from count survey data. Animal Biodiversity and Conservation, 27(1), 375-386.
unmarkedFrame-class
, unmarkedFrame
,
multinomPois
, piFuns
# Fake doulbe observer data R <- 4 # number of sites J <- 2 # number of observers y <- matrix(c( 1,0,3, 0,0,0, 2,0,1, 0,0,2), nrow=R, ncol=J+1, byrow=TRUE) y site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs obs.covs <- list( x3 = matrix(c( -1,0, -2,0, -3,1, 0,0), nrow=R, ncol=J, byrow=TRUE), x4 = matrix(c( 'a','b', 'a','b', 'a','b', 'a','b'), nrow=R, ncol=J, byrow=TRUE)) obs.covs # Create unmarkedFrame umf <- unmarkedFrameMPois(y=y, siteCovs=site.covs, obsCovs=obs.covs, type="double") # The above is the same as: o2y <- matrix(1, 2, 3) pifun <- function(p) { M <- nrow(p) pi <- matrix(NA, M, 3) pi[, 1] <- p[, 1] * (1 - p[, 2]) pi[, 2] <- p[, 2] * (1 - p[, 1]) pi[, 3] <- p[, 1] * p[, 2] return(pi) } umf <- unmarkedFrameMPois(y=y, siteCovs=site.covs, obsCovs=obs.covs, obsToY=o2y, piFun="pifun") # Fit a model fm <- multinomPois(~1 ~1, umf)
# Fake doulbe observer data R <- 4 # number of sites J <- 2 # number of observers y <- matrix(c( 1,0,3, 0,0,0, 2,0,1, 0,0,2), nrow=R, ncol=J+1, byrow=TRUE) y site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs obs.covs <- list( x3 = matrix(c( -1,0, -2,0, -3,1, 0,0), nrow=R, ncol=J, byrow=TRUE), x4 = matrix(c( 'a','b', 'a','b', 'a','b', 'a','b'), nrow=R, ncol=J, byrow=TRUE)) obs.covs # Create unmarkedFrame umf <- unmarkedFrameMPois(y=y, siteCovs=site.covs, obsCovs=obs.covs, type="double") # The above is the same as: o2y <- matrix(1, 2, 3) pifun <- function(p) { M <- nrow(p) pi <- matrix(NA, M, 3) pi[, 1] <- p[, 1] * (1 - p[, 2]) pi[, 2] <- p[, 2] * (1 - p[, 1]) pi[, 3] <- p[, 1] * p[, 2] return(pi) } umf <- unmarkedFrameMPois(y=y, siteCovs=site.covs, obsCovs=obs.covs, obsToY=o2y, piFun="pifun") # Fit a model fm <- multinomPois(~1 ~1, umf)
Organizes detection, non-detection data along with the covariates.
This S4 class is required by the data argument of occu
and
occuRN
unmarkedFrameOccu(y, siteCovs=NULL, obsCovs=NULL)
unmarkedFrameOccu(y, siteCovs=NULL, obsCovs=NULL)
y |
An RxJ matrix of the detection, non-detection data, where R is the number of sites, J is the maximum number of sampling periods per site. |
siteCovs |
A |
obsCovs |
Either a named list of |
unmarkedFrameOccu is the S4 class that holds data to be passed
to the occu
and occuRN
model-fitting function.
an object of class unmarkedFrameOccu
unmarkedFrame-class
, unmarkedFrame
,
occu
, occuRN
# Fake data R <- 4 # number of sites J <- 3 # number of visits y <- matrix(c( 1,1,0, 0,0,0, 1,1,1, 1,0,1), nrow=R, ncol=J, byrow=TRUE) y site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs obs.covs <- list( x3 = matrix(c( -1,0,1, -2,0,0, -3,1,0, 0,0,0), nrow=R, ncol=J, byrow=TRUE), x4 = matrix(c( 'a','b','c', 'd','b','a', 'a','a','c', 'a','b','a'), nrow=R, ncol=J, byrow=TRUE)) obs.covs umf <- unmarkedFrameOccu(y=y, siteCovs=site.covs, obsCovs=obs.covs) # organize data umf # look at data summary(umf) # summarize fm <- occu(~1 ~1, umf) # fit a model
# Fake data R <- 4 # number of sites J <- 3 # number of visits y <- matrix(c( 1,1,0, 0,0,0, 1,1,1, 1,0,1), nrow=R, ncol=J, byrow=TRUE) y site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs obs.covs <- list( x3 = matrix(c( -1,0,1, -2,0,0, -3,1,0, 0,0,0), nrow=R, ncol=J, byrow=TRUE), x4 = matrix(c( 'a','b','c', 'd','b','a', 'a','a','c', 'a','b','a'), nrow=R, ncol=J, byrow=TRUE)) obs.covs umf <- unmarkedFrameOccu(y=y, siteCovs=site.covs, obsCovs=obs.covs) # organize data umf # look at data summary(umf) # summarize fm <- occu(~1 ~1, umf) # fit a model
occuCOP
Organizes count data along with the covariates. The unmarkedFrame S4 class required by the data
argument of occuCOP
.
unmarkedFrameOccuCOP(y, L, siteCovs = NULL, obsCovs = NULL)
unmarkedFrameOccuCOP(y, L, siteCovs = NULL, obsCovs = NULL)
y |
An MxJ matrix of the count data, where M is the number of sites, J is the maximum number of observation periods (sampling occasions, transects, discretised sessions...) per site. |
L |
An MxJ matrix of the length of the observation periods. For example, duration of the sampling occasion in hours, duration of the discretised session in days, or length of the transect in meters. |
siteCovs |
A |
obsCovs |
A named list of dataframes of dimension MxJ, with one dataframe per covariate that varies between sites and observation periods |
unmarkedFrameOccuCOP is the unmarkedFrame S4 class that holds data to be passed to the occuCOP
model-fitting function.
an object of class unmarkedFrameOccuCOP
unmarkedFrame-class
,
unmarkedFrame
,
occuCOP
# Fake data M <- 4 # Number of sites J <- 3 # Number of observation periods # Count data (y <- matrix( c(1, 3, 0, 0, 0, 0, 2, 0, 5, 1, NA, 0), nrow = M, ncol = J, byrow = TRUE )) # Length of observation periods (L <- matrix( c(1, 3, NA, 2, 2, 2, 1, 2, 1, 7, 1, 3), nrow = M, ncol = J, byrow = TRUE )) # Site covariates (site.covs <- data.frame( "elev" = rexp(4), "habitat" = factor(c("forest", "forest", "grassland", "grassland")) )) # Observation covariates (as a list) (obs.covs.list <- list( "rain" = matrix(rexp(M * J), nrow = M, ncol = J), "wind" = matrix( sample(letters[1:3], replace = TRUE, size = M * J), nrow = M, ncol = J) )) # Organise data in a unmarkedFrameOccuCOP object umf <- unmarkedFrameOccuCOP( y = y, L = L, siteCovs = site.covs, obsCovs = obs.covs.list ) # Extract L getL(umf) # Look at data print(umf) # Print the whole data set print(umf[1, 2]) # Print the data of the 1st site, 2nd observation summary(umf) # Summarise the data set plot(umf) # Plot the count of detection events # L is optional, if absent, it will be replaced by a MxJ matrix of 1 unmarkedFrameOccuCOP( y = y, siteCovs = site.covs, obsCovs = obs.covs.list ) # Covariates are optional unmarkedFrameOccuCOP(y = y)
# Fake data M <- 4 # Number of sites J <- 3 # Number of observation periods # Count data (y <- matrix( c(1, 3, 0, 0, 0, 0, 2, 0, 5, 1, NA, 0), nrow = M, ncol = J, byrow = TRUE )) # Length of observation periods (L <- matrix( c(1, 3, NA, 2, 2, 2, 1, 2, 1, 7, 1, 3), nrow = M, ncol = J, byrow = TRUE )) # Site covariates (site.covs <- data.frame( "elev" = rexp(4), "habitat" = factor(c("forest", "forest", "grassland", "grassland")) )) # Observation covariates (as a list) (obs.covs.list <- list( "rain" = matrix(rexp(M * J), nrow = M, ncol = J), "wind" = matrix( sample(letters[1:3], replace = TRUE, size = M * J), nrow = M, ncol = J) )) # Organise data in a unmarkedFrameOccuCOP object umf <- unmarkedFrameOccuCOP( y = y, L = L, siteCovs = site.covs, obsCovs = obs.covs.list ) # Extract L getL(umf) # Look at data print(umf) # Print the whole data set print(umf[1, 2]) # Print the data of the 1st site, 2nd observation summary(umf) # Summarise the data set plot(umf) # Plot the count of detection events # L is optional, if absent, it will be replaced by a MxJ matrix of 1 unmarkedFrameOccuCOP( y = y, siteCovs = site.covs, obsCovs = obs.covs.list ) # Covariates are optional unmarkedFrameOccuCOP(y = y)
Organizes detection, non-detection data along with the covariates.
This S4 class is required by the data argument of occu
and
occuRN
unmarkedFrameOccuFP(y, siteCovs=NULL, obsCovs=NULL, type)
unmarkedFrameOccuFP(y, siteCovs=NULL, obsCovs=NULL, type)
y |
An RxJ matrix of the detection, non-detection data, where R is the number of sites, J is the maximum number of sampling periods per site. |
siteCovs |
A |
obsCovs |
Either a named list of |
type |
A vector with 3 values designating the number of occassions
where data is of type 1, type 2, and type 3 - see |
unmarkedFrameOccuFP is the S4 class that holds data to be passed
to the occu
and occuRN
model-fitting function.
an object of class unmarkedFrameOccuFP
unmarkedFrame-class
, unmarkedFrame
,
occuFP
n = 100 o = 10 o1 = 5 y = matrix(0,n,o) p = .7 r = .5 fp = 0.05 y[1:(n*.5),(o-o1+1):o] <- rbinom((n*o1*.5),1,p) y[1:(n*.5),1:(o-o1)] <- rbinom((o-o1)*n*.5,1,r) y[(n*.5+1):n,(o-o1+1):o] <- rbinom((n*o1*.5),1,fp) type <- c((o-o1),o1,0) ### vector with the number of each data type site <- c(rep(1,n*.5*.8),rep(0,n*.5*.2),rep(1,n*.5*.2),rep(0,n*.8*.5)) occ <- matrix(c(rep(0,n*(o-o1)),rep(1,n*o1)),n,o) site <- data.frame(habitat = site) occ <- list(METH = occ) umf1 <- unmarkedFrameOccuFP(y,site,occ, type = type) m1 <- occuFP(detformula = ~ METH, FPformula = ~1, stateformula = ~ habitat, data = umf1)
n = 100 o = 10 o1 = 5 y = matrix(0,n,o) p = .7 r = .5 fp = 0.05 y[1:(n*.5),(o-o1+1):o] <- rbinom((n*o1*.5),1,p) y[1:(n*.5),1:(o-o1)] <- rbinom((o-o1)*n*.5,1,r) y[(n*.5+1):n,(o-o1+1):o] <- rbinom((n*o1*.5),1,fp) type <- c((o-o1),o1,0) ### vector with the number of each data type site <- c(rep(1,n*.5*.8),rep(0,n*.5*.2),rep(1,n*.5*.2),rep(0,n*.8*.5)) occ <- matrix(c(rep(0,n*(o-o1)),rep(1,n*o1)),n,o) site <- data.frame(habitat = site) occ <- list(METH = occ) umf1 <- unmarkedFrameOccuFP(y,site,occ, type = type) m1 <- occuFP(detformula = ~ METH, FPformula = ~1, stateformula = ~ habitat, data = umf1)
Organizes multi-state occupancy data (currently single-season only)
along with covariates. This S4 class is required by the data argument
of occuMS
unmarkedFrameOccuMS(y, siteCovs=NULL, obsCovs=NULL, numPrimary=1, yearlySiteCovs=NULL)
unmarkedFrameOccuMS(y, siteCovs=NULL, obsCovs=NULL, numPrimary=1, yearlySiteCovs=NULL)
y |
An MxR matrix of multi-state occupancy data for a species,
where M is the number of sites and R is the maximum number of
observations per site (across all primary and secondary periods, if
you have multi-season data). Values in |
siteCovs |
A |
obsCovs |
Either a named list of |
numPrimary |
Number of primary time periods (e.g. seasons) for the dynamic or multi-season version of the model. There should be an equal number of secondary periods in each primary period. |
yearlySiteCovs |
A data frame with one column per covariate that varies among sites and primary periods (e.g. years). It should have MxT rows where M is the number of sites and T the number of primary periods, ordered by site-primary period. These covariates only used for dynamic (multi-season) models. |
unmarkedFrameOccuMS is the S4 class that holds data to be passed
to the occuMS
model-fitting function.
an object of class unmarkedFrameOccuMS
Ken Kellner [email protected]
unmarkedFrame-class
, unmarkedFrame
,
occuMS
# Fake data #Parameters N <- 100; J <- 3; S <- 3 psi <- c(0.5,0.3,0.2) p11 <- 0.4; p12 <- 0.25; p22 <- 0.3 #Simulate state z <- sample(0:2, N, replace=TRUE, prob=psi) #Simulate detection y <- matrix(0,nrow=N,ncol=J) for (n in 1:N){ probs <- switch(z[n]+1, c(0,0,0), c(1-p11,p11,0), c(1-p12-p22,p12,p22)) if(z[n]>0){ y[n,] <- sample(0:2, J, replace=TRUE, probs) } } #Covariates site_covs <- as.data.frame(matrix(rnorm(N*2),ncol=2)) # nrow = # of sites obs_covs <- as.data.frame(matrix(rnorm(N*J*2),ncol=2)) # nrow = N*J #Build unmarked frame umf <- unmarkedFrameOccuMS(y=y,siteCovs=site_covs,obsCovs=obs_covs) umf # look at data summary(umf) # summarize plot(umf) # visualize umf@numStates # check number of occupancy states detected
# Fake data #Parameters N <- 100; J <- 3; S <- 3 psi <- c(0.5,0.3,0.2) p11 <- 0.4; p12 <- 0.25; p22 <- 0.3 #Simulate state z <- sample(0:2, N, replace=TRUE, prob=psi) #Simulate detection y <- matrix(0,nrow=N,ncol=J) for (n in 1:N){ probs <- switch(z[n]+1, c(0,0,0), c(1-p11,p11,0), c(1-p12-p22,p12,p22)) if(z[n]>0){ y[n,] <- sample(0:2, J, replace=TRUE, probs) } } #Covariates site_covs <- as.data.frame(matrix(rnorm(N*2),ncol=2)) # nrow = # of sites obs_covs <- as.data.frame(matrix(rnorm(N*J*2),ncol=2)) # nrow = N*J #Build unmarked frame umf <- unmarkedFrameOccuMS(y=y,siteCovs=site_covs,obsCovs=obs_covs) umf # look at data summary(umf) # summarize plot(umf) # visualize umf@numStates # check number of occupancy states detected
Organizes detection, non-detection data for multiple species along
with the covariates. This S4 class is required by the data argument
of occuMulti
unmarkedFrameOccuMulti(y, siteCovs=NULL, obsCovs=NULL, maxOrder)
unmarkedFrameOccuMulti(y, siteCovs=NULL, obsCovs=NULL, maxOrder)
y |
A list (optionally a named list) of length S where each element is an MxJ matrix of the detection, non-detection data for one species, where M is the number of sites, J is the maximum number of sampling periods per site, and S is the number of species in the analysis. |
siteCovs |
A |
obsCovs |
Either a named list of |
maxOrder |
Optional; specify maximum interaction order. Defaults to number of species (all possible interactions). Reducing this value may speed up creation of unmarked frame if you aren't interested in higher-order interactions. |
unmarkedFrameOccuMulti is the S4 class that holds data to be passed
to the occuMulti
model-fitting function.
an object of class unmarkedFrameOccuMulti
Ken Kellner [email protected]
unmarkedFrame-class
, unmarkedFrame
,
occuMulti
# Fake data S <- 3 # number of species M <- 4 # number of sites J <- 3 # number of visits y <- list(matrix(rbinom(M*J,1,0.5),M,J), # species 1 matrix(rbinom(M*J,1,0.5),M,J), # species 2 matrix(rbinom(M*J,1,0.2),M,J)) # species 3 site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs umf <- unmarkedFrameOccuMulti(y=y, siteCovs=site.covs, obsCovs=NULL) # organize data umf # look at data summary(umf) # summarize plot(umf) # visualize #fm <- occu(~1 ~1, umf) # fit a model
# Fake data S <- 3 # number of species M <- 4 # number of sites J <- 3 # number of visits y <- list(matrix(rbinom(M*J,1,0.5),M,J), # species 1 matrix(rbinom(M*J,1,0.5),M,J), # species 2 matrix(rbinom(M*J,1,0.2),M,J)) # species 3 site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs umf <- unmarkedFrameOccuMulti(y=y, siteCovs=site.covs, obsCovs=NULL) # organize data umf # look at data summary(umf) # summarize plot(umf) # visualize #fm <- occu(~1 ~1, umf) # fit a model
Organizes time-to-detection occupancy data along with covariates.
This S4 class is required by the data argument of occuTTD
unmarkedFrameOccuTTD(y, surveyLength, siteCovs=NULL, obsCovs=NULL, numPrimary=1, yearlySiteCovs=NULL)
unmarkedFrameOccuTTD(y, surveyLength, siteCovs=NULL, obsCovs=NULL, numPrimary=1, yearlySiteCovs=NULL)
y |
An MxR matrix of time-to-detection data for a species,
where M is the number of sites and R is the maximum number of
observations per site (across all primary periods and observations, if
you have multi-season data). Values in |
surveyLength |
The maximum length of a survey, in the same units as
|
siteCovs |
A |
obsCovs |
Either a named list of |
numPrimary |
Number of primary time periods (e.g. seasons) for the dynamic or multi-season version of the model. There should be an equal number of secondary periods in each primary period. |
yearlySiteCovs |
A data frame with one column per covariate that varies among sites and primary periods (e.g. years). It should have MxT rows where M is the number of sites and T the number of primary periods, ordered by site-primary period. These covariates only used for dynamic (multi-season) models. |
unmarkedFrameOccuTTD is the S4 class that holds data to be passed
to the occuTTD
model-fitting function.
an object of class unmarkedFrameOccuTTD
If the time-to-detection values in y
are very large (e.g., because
they are expressed as numbers of seconds) you may have issues fitting models.
An easy solution is to convert your units (e.g., from seconds to decimal
minutes) to keep the values as close to 0 as possible.
Ken Kellner [email protected]
# For a single-season model N <- 100 #Number of sites psi <- 0.4 #Occupancy probability lam <- 7 #Parameter for exponential distribution of time to detection Tmax <- 10 #Maximum survey length z <- rbinom(N, 1, psi) #Simulate occupancy y <- rexp(N, 1/lam) #Simulate time to detection y[z==0] <- Tmax y[y>Tmax] <- Tmax sc <- as.data.frame(matrix(rnorm(N*2),ncol=2)) #Site covs oc <- as.data.frame(matrix(rnorm(N*2),ncol=2)) #obs covs umf <- unmarkedFrameOccuTTD(y=y, surveyLength=Tmax, siteCovs=sc, obsCovs=oc)
# For a single-season model N <- 100 #Number of sites psi <- 0.4 #Occupancy probability lam <- 7 #Parameter for exponential distribution of time to detection Tmax <- 10 #Maximum survey length z <- rbinom(N, 1, psi) #Simulate occupancy y <- rexp(N, 1/lam) #Simulate time to detection y[z==0] <- Tmax y[y>Tmax] <- Tmax sc <- as.data.frame(matrix(rnorm(N*2),ncol=2)) #Site covs oc <- as.data.frame(matrix(rnorm(N*2),ncol=2)) #obs covs umf <- unmarkedFrameOccuTTD(y=y, surveyLength=Tmax, siteCovs=sc, obsCovs=oc)
Organizes repeated count data along with the covariates and possibly the dates on which each survey was conducted. This S4 class is required by the data argument of pcountOpen
unmarkedFramePCO(y, siteCovs=NULL, obsCovs=NULL, yearlySiteCovs, numPrimary, primaryPeriod)
unmarkedFramePCO(y, siteCovs=NULL, obsCovs=NULL, yearlySiteCovs, numPrimary, primaryPeriod)
y |
An MxJT matrix of the repeated count data, where M is the number of sites, J is the maximum number of secondary sampling periods per site and T is the maximum number of primary sampling periods per site. |
siteCovs |
A |
obsCovs |
Either a named list of |
yearlySiteCovs |
Either a named list of MxT |
numPrimary |
Maximum number of observed primary periods for each site |
primaryPeriod |
matrix of integers indicating the primary period of each survey. |
unmarkedFramePCO is the S4 class that holds data to be passed
to the pcountOpen
model-fitting function.
The unmarkedFramePCO class is similar to the unmarkedFramePCount class except that it contains the dates for each survey, which needs to be supplied .
an object of class unmarkedFramePCO
unmarkedFrame-class
, unmarkedFrame
,
pcountOpen
# Repeated count data with 5 primary periods and # no secondary sampling periods (ie J==1) y1 <- matrix(c( 0, 2, 3, 2, 0, 2, 2, 3, 1, 1, 1, 1, 0, 0, 3, 0, 0, 0, 0, 0), nrow=4, ncol=5, byrow=TRUE) # Site-specific covariates sc1 <- data.frame(x1 = 1:4, x2 = c('A','A','B','B')) # Observation-specific covariates oc1 <- list( x3 = matrix(1:5, nrow=4, ncol=5, byrow=TRUE), x4 = matrix(letters[1:5], nrow=4, ncol=5, byrow=TRUE)) # Primary periods of surveys primaryPeriod1 <- matrix(as.integer(c( 1, 2, 5, 7, 8, 1, 2, 3, 4, 5, 1, 2, 4, 5, 6, 1, 3, 5, 6, 7)), nrow=4, ncol=5, byrow=TRUE) # Create the unmarkedFrame umf1 <- unmarkedFramePCO(y=y1, siteCovs=sc1, obsCovs=oc1, numPrimary=5, primaryPeriod=primaryPeriod1) # Take a look umf1 summary(umf1) # Repeated count data with 4 primary periods and # no 2 secondary sampling periods (ie J=2) y2 <- matrix(c( 0,0, 2,2, 3,2, 2,2, 2,2, 2,1, 3,2, 1,1, 1,0, 1,1, 0,0, 0,0, 0,0, 0,0, 0,0, 0,0), nrow=4, ncol=8, byrow=TRUE) # Site-specific covariates sc2 <- data.frame(x1 = 1:4, x2 = c('A','A','B','B')) # Observation-specific covariates oc2 <- list( x3 = matrix(1:8, nrow=4, ncol=8, byrow=TRUE), x4 = matrix(letters[1:8], nrow=4, ncol=8, byrow=TRUE)) # Yearly-site covariates ysc <- list( x5 = matrix(c( 1,2,3,4, 1,2,3,4, 1,2,3,4, 1,2,3,4), nrow=4, ncol=4, byrow=TRUE)) # Primary periods of surveys primaryPeriod2 <- matrix(as.integer(c( 1,2,5,7, 1,2,3,4, 1,2,4,5, 1,3,5,6)), nrow=4, ncol=4, byrow=TRUE) # Create the unmarkedFrame umf2 <- unmarkedFramePCO(y=y2, siteCovs=sc2, obsCovs=oc2, yearlySiteCovs=ysc, numPrimary=4, primaryPeriod=primaryPeriod2) # Take a look umf2 summary(umf2)
# Repeated count data with 5 primary periods and # no secondary sampling periods (ie J==1) y1 <- matrix(c( 0, 2, 3, 2, 0, 2, 2, 3, 1, 1, 1, 1, 0, 0, 3, 0, 0, 0, 0, 0), nrow=4, ncol=5, byrow=TRUE) # Site-specific covariates sc1 <- data.frame(x1 = 1:4, x2 = c('A','A','B','B')) # Observation-specific covariates oc1 <- list( x3 = matrix(1:5, nrow=4, ncol=5, byrow=TRUE), x4 = matrix(letters[1:5], nrow=4, ncol=5, byrow=TRUE)) # Primary periods of surveys primaryPeriod1 <- matrix(as.integer(c( 1, 2, 5, 7, 8, 1, 2, 3, 4, 5, 1, 2, 4, 5, 6, 1, 3, 5, 6, 7)), nrow=4, ncol=5, byrow=TRUE) # Create the unmarkedFrame umf1 <- unmarkedFramePCO(y=y1, siteCovs=sc1, obsCovs=oc1, numPrimary=5, primaryPeriod=primaryPeriod1) # Take a look umf1 summary(umf1) # Repeated count data with 4 primary periods and # no 2 secondary sampling periods (ie J=2) y2 <- matrix(c( 0,0, 2,2, 3,2, 2,2, 2,2, 2,1, 3,2, 1,1, 1,0, 1,1, 0,0, 0,0, 0,0, 0,0, 0,0, 0,0), nrow=4, ncol=8, byrow=TRUE) # Site-specific covariates sc2 <- data.frame(x1 = 1:4, x2 = c('A','A','B','B')) # Observation-specific covariates oc2 <- list( x3 = matrix(1:8, nrow=4, ncol=8, byrow=TRUE), x4 = matrix(letters[1:8], nrow=4, ncol=8, byrow=TRUE)) # Yearly-site covariates ysc <- list( x5 = matrix(c( 1,2,3,4, 1,2,3,4, 1,2,3,4, 1,2,3,4), nrow=4, ncol=4, byrow=TRUE)) # Primary periods of surveys primaryPeriod2 <- matrix(as.integer(c( 1,2,5,7, 1,2,3,4, 1,2,4,5, 1,3,5,6)), nrow=4, ncol=4, byrow=TRUE) # Create the unmarkedFrame umf2 <- unmarkedFramePCO(y=y2, siteCovs=sc2, obsCovs=oc2, yearlySiteCovs=ysc, numPrimary=4, primaryPeriod=primaryPeriod2) # Take a look umf2 summary(umf2)
Organizes repeated count data along with the covariates.
This S4 class is required by the data argument of pcount
unmarkedFramePCount(y, siteCovs=NULL, obsCovs=NULL)
unmarkedFramePCount(y, siteCovs=NULL, obsCovs=NULL)
y |
An RxJ matrix of the repeated count data, where R is the number of sites, J is the maximum number of sampling periods per site. |
siteCovs |
A |
obsCovs |
Either a named list of |
unmarkedFramePCount is the S4 class that holds data to be passed
to the pcount
model-fitting function.
an object of class unmarkedFramePCount
unmarkedFrame-class
, unmarkedFrame
,
pcount
# Fake data R <- 4 # number of sites J <- 3 # number of visits y <- matrix(c( 1,2,0, 0,0,0, 1,1,1, 2,2,1), nrow=R, ncol=J, byrow=TRUE) y site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs obs.covs <- list( x3 = matrix(c( -1,0,1, -2,0,0, -3,1,0, 0,0,0), nrow=R, ncol=J, byrow=TRUE), x4 = matrix(c( 'a','b','c', 'd','b','a', 'a','a','c', 'a','b','a'), nrow=R, ncol=J, byrow=TRUE)) obs.covs umf <- unmarkedFramePCount(y=y, siteCovs=site.covs, obsCovs=obs.covs) # organize data umf # take a l summary(umf) # summarize data fm <- pcount(~1 ~1, umf, K=10) # fit a model
# Fake data R <- 4 # number of sites J <- 3 # number of visits y <- matrix(c( 1,2,0, 0,0,0, 1,1,1, 2,2,1), nrow=R, ncol=J, byrow=TRUE) y site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B'))) site.covs obs.covs <- list( x3 = matrix(c( -1,0,1, -2,0,0, -3,1,0, 0,0,0), nrow=R, ncol=J, byrow=TRUE), x4 = matrix(c( 'a','b','c', 'd','b','a', 'a','a','c', 'a','b','a'), nrow=R, ncol=J, byrow=TRUE)) obs.covs umf <- unmarkedFramePCount(y=y, siteCovs=site.covs, obsCovs=obs.covs) # organize data umf # take a l summary(umf) # summarize data fm <- pcount(~1 ~1, umf, K=10) # fit a model
These functions construct unmarkedFrames for data collected during primary and secondary sampling periods.
unmarkedMultFrame(y, siteCovs, obsCovs, numPrimary, yearlySiteCovs) unmarkedFrameGMM(y, siteCovs, obsCovs, numPrimary, yearlySiteCovs, type, obsToY, piFun) unmarkedFrameGDS(y, siteCovs, numPrimary, yearlySiteCovs, dist.breaks, survey, unitsIn, tlength) unmarkedFrameGPC(y, siteCovs, obsCovs, numPrimary, yearlySiteCovs)
unmarkedMultFrame(y, siteCovs, obsCovs, numPrimary, yearlySiteCovs) unmarkedFrameGMM(y, siteCovs, obsCovs, numPrimary, yearlySiteCovs, type, obsToY, piFun) unmarkedFrameGDS(y, siteCovs, numPrimary, yearlySiteCovs, dist.breaks, survey, unitsIn, tlength) unmarkedFrameGPC(y, siteCovs, obsCovs, numPrimary, yearlySiteCovs)
y |
A matrix of the observed data. |
siteCovs |
Data frame of covariates that vary at the site level. |
obsCovs |
Data frame of covariates that vary within site-year-observation level. |
numPrimary |
Number of primary time periods (seasons in the multiseason model). |
yearlySiteCovs |
Data frame containing covariates at the site-year level. |
type |
Set to "removal" for constant-interval removal sampling, "double" for standard double observer sampling, or "depDouble" for dependent double observer sampling. This should be not be specified for other types of survey designs. |
obsToY |
A matrix specifying relationship between observation-level covariates and response matrix |
piFun |
A function converting an MxJ matrix of detection probabilities into an MxJ matrix of multinomial cell probabilities. |
dist.breaks |
see |
survey |
see |
unitsIn |
see |
tlength |
see |
unmarkedMultFrame objects are used by colext
.
unmarkedFrameGMM objects are used by gmultmix
.
unmarkedFrameGDS objects are used by gdistsamp
.
unmarkedFrameGPC objects are used by gpcount
.
For a study with M sites, T years, and a maximum of
J observations per site-year, the data can be supplied in a
variety of ways but are stored as follows.
y
is an matrix, with each row
corresponding to a site.
siteCovs
is a data frame with
rows.
yearlySiteCovs
is a data frame with rows which
are in site-major, year-minor order.
obsCovs
is a data frame
with rows, which are ordered by site-year-observation, so that
a column of
obsCovs
corresponds to as.vector(t(y))
,
element-by-element. The number of years must be specified in
numPrimary
.
If the data are in long format, the convenience function
formatMult
is useful for creating the unmarkedMultFrame.
unmarkedFrameGMM and unmarkedFrameGDS are superclasses of
unmarkedMultFrame containing information on
the survey design used that resulted in multinomial outcomes. For
unmarkedFrameGMM and constant-interval removal sampling, you can set
type="removal" and ignore
the arguments obsToY and piFun. Similarly, for double-observer sampling,
setting type="double" or type="depDouble" will automatically create an appropiate
obsToY matrix and piFuns
. For all other situations, the type
argument of unmarkedFrameGMM should be
ignored and the obsToY and piFun arguments must be specified. piFun must be a
function that converts an MxJ matrix of detection probabilities into an MxJ
matrix of multinomial cell probabilities. obsToY is a matrix describing how
the obsCovs relate to the observed counts y. For further discussion and examples
see the help page for multinomPois
and piFuns
.
unmarkedFrameGMM and unmarkedFrameGDS objects can be created from an unmarkedMultFrame using the "as" conversion method. See examples.
an unmarkedMultFrame or unmarkedFrameGMM object
Data used with colext, gmultmix, and gdistsamp may be collected during a single year, so yearlySiteCovs may be a misnomer is some cases.
formatMult
, colext
,
gmultmix
, gpcount
n <- 50 # number of sites T <- 4 # number of primary periods J <- 3 # number of secondary periods site <- 1:50 years <- data.frame(matrix(rep(2010:2013, each=n), n, T)) years <- data.frame(lapply(years, as.factor)) occasions <- data.frame(matrix(rep(1:(J*T), each=n), n, J*T)) y <- matrix(0:1, n, J*T) umf <- unmarkedMultFrame(y=y, siteCovs = data.frame(site=site), obsCovs=list(occasion=occasions), yearlySiteCovs=list(year=years), numPrimary=T) umfGMM1 <- unmarkedFrameGMM(y=y, siteCovs = data.frame(site=site), obsCovs=list(occasion=occasions), yearlySiteCovs=data.frame(year=c(t(years))), # or: yearlySiteCovs=list(year=years), numPrimary=T, type="removal") # A user-defined piFun calculating removal probs when time intervals differ. instRemPiFun <- function(p) { M <- nrow(p) J <- ncol(p) pi <- matrix(NA, M, J) p[,1] <- pi[,1] <- 1 - (1 - p[,1])^2 p[,2] <- 1 - (1 - p[,2])^3 p[,3] <- 1 - (1 - p[,3])^5 for(i in 2:J) { pi[,i] <- pi[, i - 1]/p[, i - 1] * (1 - p[, i - 1]) * p[, i] } return(pi) } # Associated obsToY matrix required by unmarkedFrameMPois o2y <- diag(ncol(y)) o2y[upper.tri(o2y)] <- 1 o2y umfGMM2 <- unmarkedFrameGMM(y=y, siteCovs = data.frame(site=site), obsCovs=list(occasion=occasions), yearlySiteCovs=data.frame(year=c(t(years))), numPrimary=T, obsToY=o2y, piFun="instRemPiFun") str(umfGMM2)
n <- 50 # number of sites T <- 4 # number of primary periods J <- 3 # number of secondary periods site <- 1:50 years <- data.frame(matrix(rep(2010:2013, each=n), n, T)) years <- data.frame(lapply(years, as.factor)) occasions <- data.frame(matrix(rep(1:(J*T), each=n), n, J*T)) y <- matrix(0:1, n, J*T) umf <- unmarkedMultFrame(y=y, siteCovs = data.frame(site=site), obsCovs=list(occasion=occasions), yearlySiteCovs=list(year=years), numPrimary=T) umfGMM1 <- unmarkedFrameGMM(y=y, siteCovs = data.frame(site=site), obsCovs=list(occasion=occasions), yearlySiteCovs=data.frame(year=c(t(years))), # or: yearlySiteCovs=list(year=years), numPrimary=T, type="removal") # A user-defined piFun calculating removal probs when time intervals differ. instRemPiFun <- function(p) { M <- nrow(p) J <- ncol(p) pi <- matrix(NA, M, J) p[,1] <- pi[,1] <- 1 - (1 - p[,1])^2 p[,2] <- 1 - (1 - p[,2])^3 p[,3] <- 1 - (1 - p[,3])^5 for(i in 2:J) { pi[,i] <- pi[, i - 1]/p[, i - 1] * (1 - p[, i - 1]) * p[, i] } return(pi) } # Associated obsToY matrix required by unmarkedFrameMPois o2y <- diag(ncol(y)) o2y[upper.tri(o2y)] <- 1 o2y umfGMM2 <- unmarkedFrameGMM(y=y, siteCovs = data.frame(site=site), obsCovs=list(occasion=occasions), yearlySiteCovs=data.frame(year=c(t(years))), numPrimary=T, obsToY=o2y, piFun="instRemPiFun") str(umfGMM2)
Various functions to summarize unmarkedPower objects
## S4 method for signature 'unmarkedPower' show(object) ## S4 method for signature 'unmarkedPower' summary(object, alpha, showIntercepts = FALSE, ...) ## S4 method for signature 'unmarkedPower,missing' plot(x, y, alpha, showIntercepts = FALSE, ...)
## S4 method for signature 'unmarkedPower' show(object) ## S4 method for signature 'unmarkedPower' summary(object, alpha, showIntercepts = FALSE, ...) ## S4 method for signature 'unmarkedPower,missing' plot(x, y, alpha, showIntercepts = FALSE, ...)
object , x
|
An object of class |
alpha |
Desired Type I error rate. If not provided, defaults to the value
specified when calling |
showIntercepts |
Show intercepts output? This is rarely useful. |
y |
Not currently used. |
... |
Not currently used. |
For show
and summary
, summary output is printed to the console.
For plot
, a visualization of the summary output is created.
Ken Kellner [email protected]
A list of power analyses created with powerAnalysis
can be combined
using unmarkedPowerList
, allowing comparison e.g. between different
study designs/sample sizes. A series of methods for unmarkedPowerList
objects are available including a plot
method.
## S4 method for signature 'unmarkedPower' unmarkedPowerList(object, ...) ## S4 method for signature 'unmarkedPowerList' show(object) ## S4 method for signature 'unmarkedPowerList' summary(object, showIntercepts = FALSE, ...) ## S4 method for signature 'unmarkedPowerList,ANY' plot(x, power=NULL, param=NULL, ...)
## S4 method for signature 'unmarkedPower' unmarkedPowerList(object, ...) ## S4 method for signature 'unmarkedPowerList' show(object) ## S4 method for signature 'unmarkedPowerList' summary(object, showIntercepts = FALSE, ...) ## S4 method for signature 'unmarkedPowerList,ANY' plot(x, power=NULL, param=NULL, ...)
object , x
|
For |
showIntercepts |
Show intercepts output? This is rarely useful. |
power |
If specified, adds a dotted line to the plot at this target power value. |
param |
When plotting, the model parameter to plot power vs. sample size for. By default this is the first parameter. |
... |
For |
A unmarkedPowerList
object, a summary of the object in the console,
or a summary plot, depending on the method
Ken Kellner [email protected]
## Not run: # Build unmarkedFrame umf <- unmarkedFrameOccu(y = matrix(NA, 300, 8), siteCovs = data.frame(elev=rnorm(300))) # Run power analyses cf <- list(state = c(0, -0.4), det = 0) pa1 <- powerAnalysis(umf, model=occu, formula=~1~elev, effects=cf) pa2 <- powerAnalysis(umf[1:100,], model=occu, formula=~1~elev, effects=cf) # Combine them into a list (pl <- unmarkedPowerList(pa1, pa2)) # Look at summary plot for elev effect plot(pl, power=0.8, param='elev') ## End(Not run)
## Not run: # Build unmarkedFrame umf <- unmarkedFrameOccu(y = matrix(NA, 300, 8), siteCovs = data.frame(elev=rnorm(300))) # Run power analyses cf <- list(state = c(0, -0.4), det = 0) pa1 <- powerAnalysis(umf, model=occu, formula=~1~elev, effects=cf) pa2 <- powerAnalysis(umf[1:100,], model=occu, formula=~1~elev, effects=cf) # Combine them into a list (pl <- unmarkedPowerList(pa1, pa2)) # Look at summary plot for elev effect plot(pl, power=0.8, param='elev') ## End(Not run)
"unmarkedRanef"
Stores the estimated posterior distributions of the latent abundance or occurrence variables.
Objects can be created by calls of the form ranef
.
post
:An array
with nSites rows and Nmax
(K+1) columns and nPrimaryPeriod slices
signature(object = "unmarkedRanef")
: Extract the
Best Unbiased Predictors (BUPs) of the latent variables (abundance
or occurrence state). Either the posterior mean or median can be
requested using the stat
argument.
signature(object = "unmarkedRanef")
: Compute
confidence intervals.
signature(x = "unmarkedRanef", y = "missing")
:
Plot the posteriors using xyplot
signature(object = "unmarkedRanef")
: Display the
modes and confidence intervals
Empirical Bayes methods can underestimate the variance of the posterior distribution because they do not account for uncertainty in the hyperparameters (lambda or psi). Simulation studies indicate that the posterior mode can exhibit (3-5 percent) negatively bias as a point estimator of site-specific abundance. It appears to be safer to use the posterior mean even though this will not be an integer in general.
Laird, N.M. and T.A. Louis. 1987. Empirical Bayes confidence intervals based on bootstrap samples. Journal of the American Statistical Association 82:739–750.
Carlin, B.P and T.A Louis. 1996. Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall/CRC.
Royle, J.A and R.M. Dorazio. 2008. Hierarchical Modeling and Inference in Ecology. Academic Press.
showClass("unmarkedRanef")
showClass("unmarkedRanef")
Extract variance-covariance matrix from a fitted model.
A fitted model
Compute the variance inflation factors (VIFs) for covariates in one level of the
model (i.e., occupancy or detection). Calculation of VIFs follows the approach
of function vif
in package car
, using the correlation matrix of
fitted model parameters.
vif(mod, type)
vif(mod, type)
mod |
An unmarked fit object. |
type |
Level of the model for which to calculate VIFs (for example,
|
A named vector of variance inflation factor values for each covariate.