Monday 16 April 2018

Illustrating inter-annual trends of SABAP2 reporting rate in R

Illustrating inter-annual trends of SABAP2 reporting rate in R

In my last post we learnt how to animate annual trends in monthly reporting rate. But was there any change in overall reporting rate across years?

http://bluehillescape.blogspot.co.za/2018/04/animated-sabap2-reporting-rate-timelines.html

So how is reporting rate changing between years for Barn Swallow Hirundo rustica. We can't really tell from the species summary page:

http://sabap2.adu.org.za/species_info.php?spp=493#menu_left

For this exercise you will need the data at the link marked “Pentad level summary (monthly)” under the Data downloads options.

For the animated chart you will need to download and install the free image processing software ImageMagick: https://www.imagemagick.org/script/download.php

# These are the packages required to run these scripts
library(ggplot2); library(dplyr); library(gganimate); library(animation) 

# Tell R where ImageMagick lives:
magickPath <- ani.options="" code="" convert="magickPath)" files="" magemagick-7.0.7-q16="" magick.exe="" rogram="" shortpathname="">

Here we prepare the data (for explanation of these steps see previous post). I've included this so the code can run as a stand-along script.

# Download your data here by replacing the 493 here with the species code of your choice (or pasting the link)
swallow <- africa="" best="" coverage="" data="" filter="" group_by="" http:="" inc="" lat="" occurrence="" of="" pentad="" range="" read.csv="" region="" restrict="" sabap2.adu.org.za="" southern="" species="" species_data_download.php="" spp="493&section=6" stringsasfactors="F)" swallow="" the="" to="" with="">%
  summarise(ok=sum(cards_with_spp))

occurrence <- add="" amp="" cards="" cards_with_spp="" code="" filter="" occurrence="" ok="" rate="" reporting="" reporting_rate="" swallow="">

The new code starts here. First, it is a good idea to remove data from 2007 (and maybe even 2008). 2007 is the year SABAP2 got going: submission rates and spatial coverage were low. Also remove the current year.

# exclude unwanted years:
swallow <- 2018="" a="" and="" by="" c="" error="" filter="" group_by="" here="" in="" include="" measure="" now:="" of="" rate="" reporting="" standard="" summarise="" summary="" swallow="" year="" yrr="">%
summarise(mean=mean(reporting_rate), sd=sd(reporting_rate), n=sum(cards_with_spp), se=sd/sqrt(n))

chart_title = c("SABAP2 reporting rate trend for \n Barn Swallow")

p <- 2018="" aes="" base_size="14)+" chart="" code="" coord_cartesian="" cumulative="T))+geom_bar(stat=" ear="" eporting="" frame="yrr," geom_errorbar="" ggplot="" identity="" is="" labs="" mean="" p="" proportion="" rate="" static="" summary="" the="" theme_bw="" this="" title="chart_title)+" xlab="" xlim="c(2007," ylab="" ymax="mean+se))" ymin="mean-se," yrr="">



# Here I add a regression line across the interannual mean reporting rate values
q = p+
geom_smooth(data=summary, aes(yrr, mean),   method="lm", color="red")

q



# The animated version  
gganimate(q, interval = .5)

# Save the above as gif: suppressMessages(gganimate(q, interval=6, "SWALLOW_barchart_time_series.gif"))


# A simple linear regression suggests no significant change. 
# Obviously real life may be much more complicated than this suggests.   

summary(lm(mean~yrr, data=summary))
## 
## Call:
## lm(formula = mean ~ yrr, data = summary)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.042814 -0.015509  0.005625  0.024023  0.025913 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  8.854237   6.314515   1.402    0.198
## yrr         -0.004226   0.003138  -1.347    0.215
## 
## Residual standard error: 0.0285 on 8 degrees of freedom
## Multiple R-squared:  0.1848, Adjusted R-squared:  0.08293 
## F-statistic: 1.814 on 1 and 8 DF,  p-value: 0.215

##  Try a more statistically robust approach to interannual change in reporting rate
## using a clt / bootstrapping approach with random samples

#initialise the target dataframe:

df = data.frame(year=2007, rr=NA)

# Run a loop that selects each year, removes sites with low sampling effort, 
# and then subsamples a random set of the available data 1000 times

for(i in 2008:2017){
  temp = filter(swallow, yrr==i&cards>2)%>%select(reporting_rate)
  holder = rep(NA, 1000)

    for(b in 1:1000){
    temp2 = sample(temp$reporting_rate, 100, replace = F)
    holder[b] = mean(temp2)
  }

  holder2 = data.frame(year=i, rr=holder)
  df = rbind(df, holder2)
}

df = filter(df, year!=2007)

# plot it:

ggplot(df, aes(year, rr))+
  geom_smooth()+
  geom_smooth(method="lm", colour="red")+
  coord_cartesian(ylim=c(0,0.50))


No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...