R - Aggregate a data.frame by time series and with different functions
I have lots of measurement values, recorded each minute. Some of the values have mean, min, and max values for the given minute. I'd like to summarize/aggregate the whole data.frame so that it has one entry every 30 minutes.
    str(wgdata)
    'data.frame': 115200 obs. of 7 variables:
     $ timestamp          : POSIXct, format: "2012-11-24 00:00:00" "2012-11-24 00:01:00" "2012-11-24 00:02:00" "2012-11-24 00:03:00" ...
     $ record             : int 11683 11684 11685 11686 11687 11688 11689 11690 11691 11692 ...
     $ tpanel             : num -0.075 -0.075 -0.075 -0.095 -0.095 -0.095 -0.095 -0.118 -0.118 -0.118 ...
     $ vbattery           : num 13.8 13.8 13.8 13.8 13.8 ...
     $ vbatteryheating_avg: num 12.2 12.2 12.2 12.2 12.2 ...
     $ vbatteryheating_min: num 12.2 12.2 12.2 12.2 12.2 ...
     $ vbatteryheating_max: num 12.2 12.2 12.2 12.2 12.2 ...
So I'd like to calculate, for every 30 minutes: the timestamp, the mean of tpanel (temperature of the panel), the mean of vbattery, the mean of vbatteryheating_avg, the min of vbatteryheating_min, and the max of vbatteryheating_max.
I had some success doing

    wgdata30min <- aggregate(list(tp = wgdata$tpanel,
                                  vb = wgdata$vbatteryheating_avg,
                                  vb_min = wgdata$vbatteryheating_min,
                                  vb_max = wgdata$vbatteryheating_min),
                             list(timestamp = cut(wgdata$timestamp, "30 min")),
                             mean)
    head(wgdata30min)
                timestamp         tp       vb   vb_min   vb_max
    1 2012-11-24 00:00:00 -0.1621667 12.15467 12.15333 12.15333
    2 2012-11-24 00:30:00 -0.4751667 12.13333 12.13133 12.13133
    3 2012-11-24 01:00:00 -0.5647333 12.11167 12.11067 12.11067
    4 2012-11-24 01:30:00 -0.4573667 12.09133 12.08967 12.08967
    5 2012-11-24 02:00:00 -0.4923667 12.07100 12.07000 12.07000
    6 2012-11-24 02:30:00 -0.6469000 12.04933 12.04733 12.04733

... but this applies mean to every column, and I did not manage to pass an array of functions to apply to the individual columns. Any help is appreciated.
I believe your data looks something like this:

    # 167 timestamps, 10 minutes (600 seconds) apart
    seconds <- seq(0, 100000, by = 600)
    dates <- as.POSIXlt(seconds, origin = "2012-11-24", tz = "UTC")
    tpanel <- rnorm(167)
    vbatteryheating_avg <- rcauchy(167)
    vbatteryheating_min <- runif(167)
    vbatteryheating_max <- rexp(167)
    wgdata <- data.frame(timestamp = dates,
                         tpanel = tpanel,
                         vbatteryheating_avg = vbatteryheating_avg,
                         vbatteryheating_min = vbatteryheating_min,
                         vbatteryheating_max = vbatteryheating_max)
    head(wgdata)
    ##             timestamp     tpanel vbatteryheating_avg vbatteryheating_min vbatteryheating_max
    ## 1 2012-11-24 00:00:00  0.4770116          10.2937806          0.80151633           0.8722767
    ## 2 2012-11-24 00:10:00  0.0304906         -20.7057773          0.32311092           0.7172383
    ## 3 2012-11-24 00:20:00  1.4875903           0.5749393          0.74020471           0.5857239
    ## 4 2012-11-24 00:30:00  0.4933884           6.6567398          0.73824231           0.3691020
    ## 5 2012-11-24 00:40:00 -0.0369843           3.4332840          0.06552402           0.2455765
    ## 6 2012-11-24 00:50:00  0.7339858          -3.3787044          0.06451802           0.5952835
Probably the best solution is to use plyr. First, use cut as before to make an indicator for the 30-minute chunks. Then use ddply, splitting the data frame by that variable.

    # Indicator for the 30-minute chunks
    wgdata$timestamp30min <- cut(wgdata$timestamp, "30 min")
    library(plyr)
    # One summary row per chunk, with a different function per column
    out <- ddply(wgdata, .(timestamp30min), summarize,
                 tp = mean(tpanel),
                 vb = mean(vbatteryheating_avg),
                 vb_min = min(vbatteryheating_min),
                 vb_max = max(vbatteryheating_max))
    head(out)
    ##        timestamp30min         tp          vb     vb_min    vb_max
    ## 1 2012-11-24 00:00:00  0.6650308 -3.27901911 0.32311092 0.8722767
    ## 2 2012-11-24 00:30:00  0.3967966  2.23710649 0.06451802 0.5952835
    ## 3 2012-11-24 01:00:00 -0.1326459 -1.20082543 0.50358789 1.0569388
    ## 4 2012-11-24 01:30:00  0.7845420 -0.07520645 0.14500901 0.9656004
    ## 5 2012-11-24 02:00:00 -0.4523882  0.40472169 0.24997021 1.4056166
    ## 6 2012-11-24 02:30:00 -0.2317818  0.61860868 0.64909054 0.2338781
Alternatively, you can use aggregate once for each function (mean, min, and max), and then use merge on the results, two data frames at a time.
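The aggregate-then-merge alternative might look roughly like the sketch below. It rebuilds the simulated data so that it runs on its own; the helper names grp, means, mins, and maxs are made up for illustration, and set.seed is added only to make the random data reproducible.

```r
# Simulated stand-in for wgdata: 167 rows, 10 minutes apart (an assumption,
# mirroring the example data in this answer rather than the real measurements).
set.seed(1)
seconds <- seq(0, 100000, by = 600)
wgdata <- data.frame(
  timestamp = as.POSIXct(seconds, origin = "2012-11-24", tz = "UTC"),
  tpanel = rnorm(167),
  vbatteryheating_avg = rcauchy(167),
  vbatteryheating_min = runif(167),
  vbatteryheating_max = rexp(167)
)

# Grouping variable: one factor level per 30-minute chunk.
grp <- list(timestamp30min = cut(wgdata$timestamp, "30 min"))

# One aggregate() call per summary function.
means <- aggregate(list(tp = wgdata$tpanel, vb = wgdata$vbatteryheating_avg),
                   grp, mean)
mins <- aggregate(list(vb_min = wgdata$vbatteryheating_min), grp, min)
maxs <- aggregate(list(vb_max = wgdata$vbatteryheating_max), grp, max)

# merge() joins two data frames at a time, so the calls are nested.
out <- merge(merge(means, mins, by = "timestamp30min"),
             maxs, by = "timestamp30min")
head(out)
```

With more than three partial results, Reduce(function(a, b) merge(a, b, by = "timestamp30min"), list(means, mins, maxs)) would avoid the nesting, at the cost of being a little less obvious to read.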