r - Removing duplicate columns in a table when there is more than one duplicate set of columns

March 15, 2012

I see how to handle the case of removing duplicate columns when there are 2 blocks of duplicates, but in my real data I have 3 or more. I've tried to come up with toy example data sets where there is a duplicate set of column names to collapse. Is there a straightforward way of untangling these messes with dplyr and tidyr?

Easier case:

    structure(list(x = c("a", "a", NA, "a", "a", NA, "a"), y = c(1, 5, NA, 15, 19, NA, 27), z = c(2, 6, NA, 16, 20, NA, 28), x.1 = c("b", "b", "b", "b", "b", "b", "b"), y.1 = c(3, 7, 11, 17, 21, 23, 29), z.1 = c(4, 8, 12, 18, 22, 24, 30), x.2 = c(NA, NA, "a", NA, NA, "a", NA), y.2 = c(NA, NA, 13, NA, NA, 25, NA), z.2 = c(NA, NA, 14, NA, NA, 26, NA)), .Names = c("x", "y", "z", "x.1", "y.1", "z.1", "x.2", "y.2", "z.2"), row.names = c(NA, -7L), class = "data.frame")

This is how it looks in R:

         x  y  z x.1 y.1 z.1  x.2 y.2 z.2
    1    a  1  2   b   3   4 <NA>  NA  NA
    2    a  5  6   b   7   8 <NA>  NA  NA
    3 <NA> NA NA   b  11  12    a  13  14
    4    a 15 16   b  17  18 <NA>  NA  NA
    5    a 19 20   b  21  22 <NA>  NA  NA
    6 <NA> NA NA   b  23  24    a  25  26
    7    a 27 28   b  29  30 <NA>  NA  NA

This is how it should look after dplyr:

      x  y  z x.1 y.1 z.1
    1 a  1  2   b   3   4
    2 a  5  6   b   7   8
    3 a 13 14   b  11  12
    4 a 15 16   b  17  18
    5 a 19 20   b  21  22
    6 a 25 26   b  23  24
    7 a 27 28   b  29  30

Harder case:

    structure(list(x = c("a", "b", NA, "a", "a", NA, "a"), y = c(1, 7, 9, 15, 19, NA, 27), z = c(2, 8, 10, 16, 20, NA, 28), x.1 = c("b", NA, "b", "b", "b", "b", "b"), y.1 = c(3, NA, 11, 17, 21, 23, 29), z.1 = c(4, NA, 12, 18, 22, 24, 30), x.2 = c(NA, "a", "a", NA, NA, "a", NA), y.2 = c(NA, 5, 13, NA, NA, 25, NA), z.2 = c(NA, 6, 14, NA, NA, 26, NA)), .Names = c("x", "y", "z", "x.1", "y.1", "z.1", "x.2", "y.2", "z.2"), row.names = c(NA, -7L), class = "data.frame")

This is how it looks in R:

         x  y  z  x.1 y.1 z.1  x.2 y.2 z.2
    1    a  1  2    b   3   4 <NA>  NA  NA
    2    b  7  8 <NA>  NA  NA    a   5   6
    3 <NA>  9 10    b  11  12    a  13  14
    4    a 15 16    b  17  18 <NA>  NA  NA
    5    a 19 20    b  21  22 <NA>  NA  NA
    6 <NA> NA NA    b  23  24    a  25  26
    7    a 27 28    b  29  30 <NA>  NA  NA

This is what it should look like after dplyr:

      x  y  z x.1 y.1 z.1
    1 a  1  2   b   3   4
    2 a  5  6   b   7   8
    3 a 13 14   b  11  12
    4 a 15 16   b  17  18
    5 a 19 20   b  21  22
    6 a 25 26   b  23  24
    7 a 27 28   b  29  30

In both cases the output data frame should have 2 sets of columns: the a set first and the b set second.

Thanks for the help!

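Both cases are simple indexing problems: find the rows where a block is missing, then copy the values over from the block that holds them.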

First case (the easy one), with the data stored in df:

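    indx <- is.na(df$x)             # rows whose first block (cols 1:3) is empty
    df[indx, 1:3] <- df[indx, 7:9]  # fill them from the spill-over block (cols 7:9)
    df[1:6]
    #   x  y  z x.1 y.1 z.1
    # 1 a  1  2   b   3   4
    # 2 a  5  6   b   7   8
    # 3 a 13 14   b  11  12
    # 4 a 15 16   b  17  18
    # 5 a 19 20   b  21  22
    # 6 a 25 26   b  23  24
    # 7 a 27 28   b  29  30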
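The fix above assumes exactly one spill-over block in columns 7:9, while the question mentions 3 or more duplicate blocks in the real data. Below is a minimal base R sketch (not dplyr/tidyr) that repeats the same indexing step for every later block. It assumes each block is three columns wide with its label first, that block 1 is the a slot and block 2 the b slot, and that blocks 3, 4, ... only ever hold spill-over a values. `fill_from_spill()` and its `width` argument are hypothetical names, not part of the answer or of any package.

```r
# Hypothetical generalization of the first case (fill_from_spill is not from the answer).
# Assumes: blocks are `width` columns wide with the label first, block 1 is the
# "a" slot, block 2 the "b" slot, and every later block only holds spill-over "a" rows.
fill_from_spill <- function(df, width = 3) {
  first <- seq_len(width)
  n_blocks <- ncol(df) %/% width
  for (b in seq_len(n_blocks)[-(1:2)]) {          # blocks 3, 4, ... (none if fewer than 3)
    src <- (b - 1) * width + first                # columns of block b
    # rows whose first block is still empty and that block b can fill
    indx <- is.na(df[[1]]) & !is.na(df[[src[1]]])
    if (any(indx)) df[indx, first] <- df[indx, src]
  }
  df[seq_len(2 * width)]                          # keep only the a and b slots
}

# The easier example from the question.
df <- data.frame(
  x   = c("a", "a", NA, "a", "a", NA, "a"),
  y   = c(1, 5, NA, 15, 19, NA, 27),
  z   = c(2, 6, NA, 16, 20, NA, 28),
  x.1 = rep("b", 7),
  y.1 = c(3, 7, 11, 17, 21, 23, 29),
  z.1 = c(4, 8, 12, 18, 22, 24, 30),
  x.2 = c(NA, NA, "a", NA, NA, "a", NA),
  y.2 = c(NA, NA, 13, NA, NA, 25, NA),
  z.2 = c(NA, NA, 14, NA, NA, 26, NA),
  stringsAsFactors = FALSE
)

fill_from_spill(df)
```

A row is filled from the first later block that has a non-missing label. Once it is filled, later blocks leave it alone because `is.na(df[[1]])` is then FALSE for that row.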

Second case (the harder one), with the data stored in df2:

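    indx  <- 1:3
    indx2 <- as.logical(rowSums(is.na(df2[indx + 3])))  # rows with a gap in the b block (cols 4:6)
    indx3 <- as.logical(rowSums(is.na(df2[indx])))      # rows with a gap in the first block (cols 1:3)
    df2[indx2, indx + 3] <- df2[indx2, indx]            # move the misplaced b values into cols 4:6
    indx4 <- indx2 | indx3                              # rows whose a values sit in cols 7:9
    df2[indx4, indx] <- df2[indx4, indx + 6]            # copy those a values into cols 1:3
    df2[1:6]
    #   x  y  z x.1 y.1 z.1
    # 1 a  1  2   b   3   4
    # 2 a  5  6   b   7   8
    # 3 a 13 14   b  11  12
    # 4 a 15 16   b  17  18
    # 5 a 19 20   b  21  22
    # 6 a 25 26   b  23  24
    # 7 a 27 28   b  29  30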
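When a label can land in any block (as in row 2 of the harder case), another option is to let each block's label column decide where its values go, instead of hunting for NA gaps. This is a swapped-in technique, not the answer's approach. It is a minimal base R sketch (not dplyr/tidyr) assuming each block is `width` columns wide with its label in the first column, and that each label occurs at most once per row. `collapse_by_label()` and its `width` and `labels` arguments are hypothetical. A block whose label is NA is ignored, so the stray 9 and 10 in row 3 are dropped, as the desired output requires.

```r
# Hypothetical helper (not from the answer above): route each block by its label.
# Assumes every block is `width` columns wide, has its label in its first column,
# and that each label occurs at most once per row (otherwise the later block wins).
collapse_by_label <- function(df, width = 3, labels = c("a", "b")) {
  n_blocks <- ncol(df) %/% width
  # Output keeps the names and column types of the first length(labels) blocks ...
  out <- df[seq_len(width * length(labels))]
  # ... but starts out all NA, so only labelled blocks end up in it.
  out[] <- lapply(out, function(col) {
    col[] <- NA
    col
  })
  for (b in seq_len(n_blocks)) {
    src <- (b - 1) * width + seq_len(width)       # columns of block b
    for (p in seq_along(labels)) {
      rows <- which(df[[src[1]]] == labels[p])    # which() drops NA labels
      if (length(rows) == 0) next
      dst <- (p - 1) * width + seq_len(width)     # output slot for label p
      out[rows, dst] <- df[rows, src]
    }
  }
  out
}

# The harder example from the question.
df2 <- data.frame(
  x   = c("a", "b", NA, "a", "a", NA, "a"),
  y   = c(1, 7, 9, 15, 19, NA, 27),
  z   = c(2, 8, 10, 16, 20, NA, 28),
  x.1 = c("b", NA, "b", "b", "b", "b", "b"),
  y.1 = c(3, NA, 11, 17, 21, 23, 29),
  z.1 = c(4, NA, 12, 18, 22, 24, 30),
  x.2 = c(NA, "a", "a", NA, NA, "a", NA),
  y.2 = c(NA, 5, 13, NA, NA, 25, NA),
  z.2 = c(NA, 6, 14, NA, NA, 26, NA),
  stringsAsFactors = FALSE
)

collapse_by_label(df2)
```

Because the loop runs over `ncol(df) %/% width` blocks, extra duplicate blocks (x.3, y.3, z.3, ...) need no code changes, and the same call also handles the easier case.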
