R - Removing duplicate columns in a table when there is more than one set of duplicate columns
I see how to handle the case of removing duplicate columns when there are 2 blocks of duplicates, but in my real data I have 3 or more. I've tried to come up with some toy example data sets where there are duplicate sets of column names to collapse. Is there a straightforward way of untangling these messes with dplyr or tidyr?
Easier case:

    structure(list(x = c("a", "a", NA, "a", "a", NA, "a"),
                   y = c(1, 5, NA, 15, 19, NA, 27),
                   z = c(2, 6, NA, 16, 20, NA, 28),
                   x.1 = c("b", "b", "b", "b", "b", "b", "b"),
                   y.1 = c(3, 7, 11, 17, 21, 23, 29),
                   z.1 = c(4, 8, 12, 18, 22, 24, 30),
                   x.2 = c(NA, NA, "a", NA, NA, "a", NA),
                   y.2 = c(NA, NA, 13, NA, NA, 25, NA),
                   z.2 = c(NA, NA, 14, NA, NA, 26, NA)),
              .Names = c("x", "y", "z", "x.1", "y.1", "z.1", "x.2", "y.2", "z.2"),
              row.names = c(NA, -7L), class = "data.frame")
This is how it looks in R:

         x  y  z x.1 y.1 z.1  x.2 y.2 z.2
    1    a  1  2   b   3   4 <NA>  NA  NA
    2    a  5  6   b   7   8 <NA>  NA  NA
    3 <NA> NA NA   b  11  12    a  13  14
    4    a 15 16   b  17  18 <NA>  NA  NA
    5    a 19 20   b  21  22 <NA>  NA  NA
    6 <NA> NA NA   b  23  24    a  25  26
    7    a 27 28   b  29  30 <NA>  NA  NA
How it should look after dplyr:

      x  y  z x.1 y.1 z.1
    1 a  1  2   b   3   4
    2 a  5  6   b   7   8
    3 a 13 14   b  11  12
    4 a 15 16   b  17  18
    5 a 19 20   b  21  22
    6 a 25 26   b  23  24
    7 a 27 28   b  29  30
Harder case:

    structure(list(x = c("a", "b", NA, "a", "a", NA, "a"),
                   y = c(1, 7, 9, 15, 19, NA, 27),
                   z = c(2, 8, 10, 16, 20, NA, 28),
                   x.1 = c("b", NA, "b", "b", "b", "b", "b"),
                   y.1 = c(3, NA, 11, 17, 21, 23, 29),
                   z.1 = c(4, NA, 12, 18, 22, 24, 30),
                   x.2 = c(NA, "a", "a", NA, NA, "a", NA),
                   y.2 = c(NA, 5, 13, NA, NA, 25, NA),
                   z.2 = c(NA, 6, 14, NA, NA, 26, NA)),
              .Names = c("x", "y", "z", "x.1", "y.1", "z.1", "x.2", "y.2", "z.2"),
              row.names = c(NA, -7L), class = "data.frame")
This is how it looks in R:

         x  y  z  x.1 y.1 z.1  x.2 y.2 z.2
    1    a  1  2    b   3   4 <NA>  NA  NA
    2    b  7  8 <NA>  NA  NA    a   5   6
    3 <NA>  9 10    b  11  12    a  13  14
    4    a 15 16    b  17  18 <NA>  NA  NA
    5    a 19 20    b  21  22 <NA>  NA  NA
    6 <NA> NA NA    b  23  24    a  25  26
    7    a 27 28    b  29  30 <NA>  NA  NA
What it should look like after dplyr:

      x  y  z x.1 y.1 z.1
    1 a  1  2   b   3   4
    2 a  5  6   b   7   8
    3 a 13 14   b  11  12
    4 a 15 16   b  17  18
    5 a 19 20   b  21  22
    6 a 25 26   b  23  24
    7 a 27 28   b  29  30
In both cases the output data frame should have 2 sets of columns: "a" first, "b" second.

Thanks for any help!
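Both cases are simple indexing problems (below, df holds the easier data set and df2 the harder one).

First case (the easy one)

    # rows whose first block is empty
    indx <- is.na(df$x)
    # fill those rows from the third block (columns 7:9)
    df[indx, 1:3] <- df[indx, 7:9]
    df[1:6]
    #   x  y  z x.1 y.1 z.1
    # 1 a  1  2   b   3   4
    # 2 a  5  6   b   7   8
    # 3 a 13 14   b  11  12
    # 4 a 15 16   b  17  18
    # 5 a 19 20   b  21  22
    # 6 a 25 26   b  23  24
    # 7 a 27 28   b  29  30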
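Second case (the harder one)

    indx <- 1:3
    # rows with any NA in the second block (columns 4:6)
    indx2 <- as.logical(rowSums(is.na(df2[indx + 3])))
    # rows with any NA in the first block (columns 1:3)
    indx3 <- as.logical(rowSums(is.na(df2[indx])))
    # copy the first block into the second where the second is empty
    df2[indx2, indx + 3] <- df2[indx2, indx]
    # fill the first block from the third where the first has NAs
    df2[indx3, indx] <- df2[indx3, indx + 6]
    df2[1:6]
    #   x  y  z x.1 y.1 z.1
    # 1 a  1  2   b   3   4
    # 2 b  7  8   b   7   8
    # 3 a 13 14   b  11  12
    # 4 a 15 16   b  17  18
    # 5 a 19 20   b  21  22
    # 6 a 25 26   b  23  24
    # 7 a 27 28   b  29  30
    # Row 2 keeps "b" in the first block: its "a" values in columns 7:9 are not copied over.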
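For three or more blocks, the same idea can be written without hard-coded column positions. The following is a minimal base R sketch, not part of the original answer: `collapse_blocks` and its `block_size` argument are hypothetical names, and it assumes that every block has the same number of columns, that the first column of each block is the key, and that a given key appears at most once per row. It stacks the blocks, drops blocks whose key is NA, and rebuilds one block per key in sorted key order.

```r
# Hypothetical helper (an assumption for illustration, not from the answer):
# collapse repeated column blocks so each distinct key gets its own block.
collapse_blocks <- function(data, block_size = 3) {
  n_blocks <- ncol(data) / block_size
  stopifnot(n_blocks == round(n_blocks))
  base_names <- names(data)[seq_len(block_size)]
  key <- base_names[1]

  # Stack all blocks on top of each other, remembering the source row.
  long <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
    block <- data[(b - 1) * block_size + seq_len(block_size)]
    names(block) <- base_names
    cbind(row_id = seq_len(nrow(data)), block)
  }))

  # Drop blocks without a key (this also drops stray values next to an NA key).
  long <- long[!is.na(long[[key]]), ]

  # Build one output block per distinct key, aligned on the source row.
  keys <- sort(unique(long[[key]]))
  pieces <- lapply(keys, function(k) {
    part <- long[long[[key]] == k, ]
    part <- part[match(seq_len(nrow(data)), part$row_id), base_names]
    rownames(part) <- NULL
    part
  })

  out <- do.call(cbind, pieces)
  names(out) <- make.unique(rep(base_names, length(keys)))
  out
}

# The harder toy data set from the question.
df2 <- data.frame(
  x   = c("a", "b", NA, "a", "a", NA, "a"),
  y   = c(1, 7, 9, 15, 19, NA, 27),
  z   = c(2, 8, 10, 16, 20, NA, 28),
  x.1 = c("b", NA, "b", "b", "b", "b", "b"),
  y.1 = c(3, NA, 11, 17, 21, 23, 29),
  z.1 = c(4, NA, 12, 18, 22, 24, 30),
  x.2 = c(NA, "a", "a", NA, NA, "a", NA),
  y.2 = c(NA, 5, 13, NA, NA, 25, NA),
  z.2 = c(NA, 6, 14, NA, NA, 26, NA),
  stringsAsFactors = FALSE
)

result <- collapse_blocks(df2)
print(result)
```

Under those assumptions the result has one block per key ("a" in x, y, z and "b" in x.1, y.1, z.1), and a row that lacks a key gets NA in that key's block. If a key could occur twice in one row, `match()` would silently keep only the first occurrence, so that case would need a separate check.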