R - Replace values in a data.frame column based on values in a different column
I have this data.frame:

    df <- data.frame(id = rep(c("one", "two", "three"), each = 10),
                     week.born = NA)
    df$week.born[c(5, 15, 28)] <- c(23, 19, 24)
    df

          id week.born
    1    one        NA
    2    one        NA
    3    one        NA
    4    one        NA
    5    one        23
    6    one        NA
    7    one        NA
    8    one        NA
    9    one        NA
    10   one        NA
    11   two        NA
    12   two        NA
    13   two        NA
    14   two        NA
    15   two        19
    16   two        NA
    17   two        NA
    18   two        NA
    19   two        NA
    20   two        NA
    21 three        NA
    22 three        NA
    23 three        NA
    24 three        NA
    25 three        NA
    26 three        NA
    27 three        NA
    28 three        24
    29 three        NA
    30 three        NA
For one, all week.born values should be 23. For two, all week.born values should be 19. For three, all week.born values should be 24.
What's the best way to do this?
I would create another data.frame containing the mapping and then do a simple join:
    library(dplyr)

    # Lookup table: one row per id with the value to fill in
    map <- data.frame(id = c("one", "two", "three"),
                      new.week.born = c(23, 19, 24))

    left_join(df, map, by = "id")
    #       id week.born new.week.born
    # 1    one        NA            23
    # 2    one        NA            23
    # ...
    # 16   two        NA            19
    # 17   two        NA            19
    # 18   two        NA            19
    # 19   two        NA            19
    # 20   two        NA            19
    # 21 three        NA            24
    # 22 three        NA            24
    # 23 three        NA            24
    # ...
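The join adds the value as a separate new.week.born column. If the goal is to overwrite week.born itself, here is a minimal base R sketch using match(). It is not part of the answer above, and it assumes every id has exactly one non-NA week.born, so the mapping can be derived from the data instead of being typed by hand:

```r
# Same example data as in the question
df <- data.frame(id = rep(c("one", "two", "three"), each = 10),
                 week.born = NA)
df$week.born[c(5, 15, 28)] <- c(23, 19, 24)

# Derive the id -> week mapping from the rows that already hold a value
# (assumption: exactly one non-NA week.born per id)
map <- df[!is.na(df$week.born), ]

# Look up each row's id in the mapping and overwrite the column in place
df$week.born <- map$week.born[match(df$id, map$id)]

# One row per distinct id/value pair, as a quick check
unique(df)
```

match() returns, for each row of df, the position of its id in map, so indexing map$week.born with it gives one value per row in the original row order.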
For a speed comparison of the join approaches, see the benchmark below.
    library(microbenchmark)
    library(dplyr)       # v 0.4.1
    library(data.table)  # v 1.9.5

    df <- data.frame(id = rep(c("one", "two", "three"), each = 1e6))
    df2 <- copy(df)  # separate copy, since setDT() converts its argument by reference
    map <- data.frame(id = c("one", "two", "three"),
                      new.week.born = c(23, 19, 24))

    dplyr_join <- function() {
      left_join(df, map, by = "id")
    }
    r_merge <- function() {
      merge(df, map, by = "id")
    }
    data.table_join <- function() {
      setkey(setDT(df2))[map]
    }

    # Timing call reconstructed from the output below (neval = 10)
    microbenchmark(dplyr_join(), r_merge(), data.table_join(), times = 10)

    Unit: milliseconds
                  expr         min         lq       mean     median         uq       max neval
          dplyr_join()   409.10635   476.6690   910.6446   489.4573   705.4021  2866.151    10
             r_merge() 41589.32357 47376.0741 55719.1752 50133.0918 54636.3356 83562.931    10
     data.table_join()    94.14621   132.3788   483.4220   225.3309  1051.7916  1416.946    10
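Timings like these only mean something if the approaches return the same values, so a quick check on a small copy of the data can help. A rough base R sketch, where n is a made-up small size and the match() lookup stands in for the package-based joins: merge() sorts its result by the key column by default, so the values are compared after sorting rather than row by row.

```r
n <- 5  # hypothetical small size, just for the check
df <- data.frame(id = rep(c("one", "two", "three"), each = n))
map <- data.frame(id = c("one", "two", "three"),
                  new.week.born = c(23, 19, 24))

# merge() returns rows sorted by id (sort = TRUE is the default)
merged <- merge(df, map, by = "id")

# match()-based lookup keeps the original row order of df
looked_up <- map$new.week.born[match(df$id, map$id)]

# Same values overall, even though the row order differs
same_values <- identical(sort(merged$new.week.born), sort(looked_up))
same_values
```

If row order matters downstream, passing sort = FALSE to merge() or using the match() lookup avoids the reordering by key.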