0

I have the same problem as here but can't get my head around that!

My dataframe:

> str(scotland_weather)
'data.frame':   106 obs. of  24 variables:
 $ Jan    : num  294 292 276 252 246 ...
 $ Year.1 : int  1993 1928 2008 2015 1974 1975 2005 2007 1990 1983 ...
 $ Feb    : num  278 259 245 228 225 ...
 $ Year.2 : int  1990 1997 2002 1989 2014 1995 1998 2000 1920 1918 ...
 $ Mar    : num  238 233 201 200 180 ...
 $ Year.3 : int  1994 1990 1992 1967 1979 1989 1921 1913 2015 1978 ...
 $ Apr    : num  191 149 147 142 134 ...
 $ Year.4 : int  1947 1910 1934 1949 1950 1932 1927 1925 1977 1994 ...
 $ May    : num  191 169 156 150 137 ...
 $ Year.5 : int  2011 1986 1925 2015 2003 2007 1920 1916 1945 1954 ...
 $ Jun    : num  155 138 138 138 135 ...
 $ Year.6 : int  1938 2002 1948 1931 1966 2004 1980 1928 2012 1912 ...
 $ Jul    : num  186 181 170 166 163 ...
 $ Year.7 : int  1940 1988 1939 2010 1956 1985 1953 2009 2015 1957 ...
 $ Aug    : num  216 212 202 191 190 ...
 $ Year.8 : int  1985 1992 2009 1962 2014 2004 1948 1930 1910 1923 ...
 $ Sep    : num  268 221 194 190 190 ...
 $ Year.9 : int  1950 1981 1982 2011 1927 1985 1935 1980 1995 1978 ...
 $ Oct    : num  258 254 249 248 242 ...
 $ Year.10: int  1935 1954 2014 1938 1983 2001 2008 1917 1967 1916 ...
 $ Nov    : num  262 245 245 242 231 ...
 $ Year.11: int  2009 2015 1938 2006 1917 1981 1951 1986 1984 1982 ...
 $ Dec    : num  301 268 267 265 264 ...
 $ Year.12: int  2013 1986 1929 2011 2006 1912 1949 1999 1974 1966 ...

> head(scotland_weather)
    Jan Year.1   Feb Year.2   Mar Year.3   Apr Year.4   May Year.5   Jun Year.6   Jul Year.7   Aug Year.8   Sep Year.9   Oct Year.10   Nov Year.11   Dec Year.12
1 293.8   1993 278.1   1990 238.5   1994 191.1   1947 191.4   2011 155.0   1938 185.6   1940 216.5   1985 267.6   1950 258.1    1935 262.0    2009 300.7    2013
2 292.2   1928 258.8   1997 233.4   1990 149.0   1910 168.7   1986 137.9   2002 181.4   1988 211.9   1992 221.2   1981 254.0    1954 245.3    2015 268.5    1986
3 275.6   2008 244.7   2002 201.3   1992 146.8   1934 155.9   1925 137.8   1948 170.1   1939 202.3   2009 193.9   1982 248.8    2014 244.8    1938 267.2    1929
4 252.3   2015 227.9   1989 200.2   1967 142.1   1949 149.5   2015 137.7   1931 165.8   2010 191.4   1962 189.7   2011 247.7    1938 242.2    2006 265.4    2011
5 246.2   1974 224.9   2014 180.2   1979 133.5   1950 137.4   2003 135.0   1966 162.9   1956 190.3   2014 189.7   1927 242.3    1983 231.3    1917 264.0    2006
6 245.0   1975 195.6   1995 180.0   1989 132.9   1932 129.7   2007 131.7   2004 159.9   1985 189.1   2004 189.6   1985 240.9    2001 229.9    1981 261.0    1912
> tail(scotland_weather)
     Jan Year.1  Feb Year.2  Mar Year.3  Apr Year.4  May Year.5  Jun Year.6  Jul Year.7  Aug Year.8  Sep Year.9  Oct Year.10  Nov Year.11  Dec Year.12
101 71.2   1987 34.4   1947 50.9   1918 44.6   1982 34.1   1978 38.8   1940 49.2   2005 46.2   2003 50.7   2015 76.5    1973 57.1    1942 62.7    2010
102 57.9   1997 33.7   1917 44.4   1953 38.5   1918 32.1   1919 36.9   1932 47.8   1989 46.1   1983 49.6   1959 74.6    1922 54.9    1958 59.9    1963
103 57.9   1941 31.8   1963 39.7   1924 31.7   1981 28.8   1994 33.2   1921 45.8   1983 37.6   1955 48.5   1910 69.9    1972 53.9    1925 55.0    1995
104 57.6   1940 24.2   1930 38.8   1969 29.0   1938 26.1   2008 32.8   1925 39.7   1919 33.0   1995 40.0   1933 62.9    1914 53.6    1983 43.4    1927
105 51.9   1929 20.0   1986 37.4   1931 19.8   1980 24.0   1980 30.9   1941 33.7   1955 21.9   1976 39.2   2014 60.7    1951 42.3    1937 40.2    1933
106 38.6   1963 10.3   1932 28.7   1929 14.0   1974 22.5   1984 30.1   1988 32.7   1913  5.1   1947 31.7   1972 19.4    1946 28.8    1945   NA      NA

I want to bring all columns into rows...the result should be like:

month    year     rainfall_mm
Jan      1993      292.8
Feb      1990      278.1
Mar      1994      238.5 
....

I tried this:

scotland_weather%>%
                        gather(key, value)

This successfully brings all columns into rows. but they are not in the format as described above... everything is in two columns i.e.

key     value
Jan     293.8
Jan     292.2
Jan     275.6
...
Year.1  1993
Year.1  1928

UPDATE

> dput(head(scotland_weather,6))
structure(list(Jan = c(293.8, 292.2, 275.6, 252.3, 246.2, 245
), Year.1 = c(1993L, 1928L, 2008L, 2015L, 1974L, 1975L), Feb = c(278.1, 
258.8, 244.7, 227.9, 224.9, 195.6), Year.2 = c(1990L, 1997L, 
2002L, 1989L, 2014L, 1995L), Mar = c(238.5, 233.4, 201.3, 200.2, 
180.2, 180), Year.3 = c(1994L, 1990L, 1992L, 1967L, 1979L, 1989L
), Apr = c(191.1, 149, 146.8, 142.1, 133.5, 132.9), Year.4 = c(1947L, 
1910L, 1934L, 1949L, 1950L, 1932L), May = c(191.4, 168.7, 155.9, 
149.5, 137.4, 129.7), Year.5 = c(2011L, 1986L, 1925L, 2015L, 
2003L, 2007L), Jun = c(155, 137.9, 137.8, 137.7, 135, 131.7), 
    Year.6 = c(1938L, 2002L, 1948L, 1931L, 1966L, 2004L), Jul = c(185.6, 
    181.4, 170.1, 165.8, 162.9, 159.9), Year.7 = c(1940L, 1988L, 
    1939L, 2010L, 1956L, 1985L), Aug = c(216.5, 211.9, 202.3, 
    191.4, 190.3, 189.1), Year.8 = c(1985L, 1992L, 2009L, 1962L, 
    2014L, 2004L), Sep = c(267.6, 221.2, 193.9, 189.7, 189.7, 
    189.6), Year.9 = c(1950L, 1981L, 1982L, 2011L, 1927L, 1985L
    ), Oct = c(258.1, 254, 248.8, 247.7, 242.3, 240.9), Year.10 = c(1935L, 
    1954L, 2014L, 1938L, 1983L, 2001L), Nov = c(262, 245.3, 244.8, 
    242.2, 231.3, 229.9), Year.11 = c(2009L, 2015L, 1938L, 2006L, 
    1917L, 1981L), Dec = c(300.7, 268.5, 267.2, 265.4, 264, 261
    ), Year.12 = c(2013L, 1986L, 1929L, 2011L, 2006L, 1912L), 
    X1.12 = c(743.6, 649.5, 645.4, 638.3, 608.9, 592.8), Year.13 = c(2014L, 
    1995L, 2000L, 2007L, 1990L, 2015L), X1.13 = c(409.5, 401.3, 
    393.7, 393.2, 391.7, 389.1), Year.14 = c(1986L, 2015L, 1994L, 
    1967L, 1992L, 1913L), X1.14 = c(455.6, 435.6, 427.8, 422.6, 
    397, 390.1), Year.15 = c(1985L, 1948L, 2009L, 1956L, 2004L, 
    1938L), X1.15 = c(661.2, 633.8, 615.8, 594.5, 590.6, 589.2
    ), Year.16 = c(1981L, 1954L, 1938L, 1935L, 1982L, 2006L), 
    X1.16 = structure(c(105L, 104L, 103L, 102L, 101L, 100L), .Label = c("  1091.2", 
    "  1138.2", "  1158.2", "  1166.0", "  1168.8", "  1174.1", 
    "  1189.4", "  1214.2", "  1219.3", "  1220.0", "  1222.0", 
    "  1231.5", "  1239.5", "  1250.0", "  1255.4", "  1266.1", 
    "  1269.7", "  1274.2", "  1276.0", "  1281.1", "  1283.5", 
    "  1301.7", "  1305.4", "  1306.4", "  1311.0", "  1311.1", 
    "  1314.3", "  1315.8", "  1324.6", "  1325.3", "  1337.6", 
    "  1348.6", "  1351.5", "  1355.6", "  1356.1", "  1356.7", 
    "  1357.8", "  1366.9", "  1374.7", "  1376.5", "  1377.9", 
    "  1378.5", "  1390.2", "  1397.6", "  1406.7", "  1406.9", 
    "  1407.5", "  1407.9", "  1414.0", "  1425.3", "  1426.5", 
    "  1429.6", "  1430.8", "  1431.6", "  1436.4", "  1438.0", 
    "  1438.8", "  1445.9", "  1446.6", "  1448.6", "  1455.0", 
    "  1458.6", "  1459.0", "  1460.9", "  1461.3", "  1464.4", 
    "  1465.7", "  1466.4", "  1467.3", "  1473.9", "  1478.4", 
    "  1478.6", "  1491.3", "  1493.2", "  1503.9", "  1520.3", 
    "  1530.4", "  1532.5", "  1536.3", "  1558.0", "  1561.4", 
    "  1566.8", "  1579.2", "  1582.3", "  1585.0", "  1585.5", 
    "  1592.6", "  1607.8", "  1623.8", "  1627.8", "  1631.0", 
    "  1657.1", "  1670.7", "  1672.8", "  1683.6", "  1686.1", 
    "  1690.4", "  1692.9", "  1696.7", "  1716.5", "  1720.0", 
    "  1735.8", "  1756.8", "  1828.1", "  1886.4", "NA"), class = "factor"), 
    Year.17 = structure(c(102L, 81L, 105L, 29L, 99L, 45L), .Label = c("  1910", 
    "  1911", "  1912", "  1913", "  1914", "  1915", "  1916", 
    "  1917", "  1918", "  1919", "  1920", "  1921", "  1922", 
    "  1923", "  1924", "  1925", "  1926", "  1927", "  1928", 
    "  1929", "  1930", "  1931", "  1932", "  1933", "  1934", 
    "  1935", "  1936", "  1937", "  1938", "  1939", "  1940", 
    "  1941", "  1942", "  1943", "  1944", "  1945", "  1946", 
    "  1947", "  1948", "  1949", "  1950", "  1951", "  1952", 
    "  1953", "  1954", "  1955", "  1956", "  1957", "  1958", 
    "  1959", "  1960", "  1961", "  1962", "  1963", "  1964", 
    "  1965", "  1966", "  1967", "  1968", "  1969", "  1970", 
    "  1971", "  1972", "  1973", "  1974", "  1975", "  1976", 
    "  1977", "  1978", "  1979", "  1980", "  1981", "  1982", 
    "  1983", "  1984", "  1985", "  1986", "  1987", "  1988", 
    "  1989", "  1990", "  1991", "  1992", "  1993", "  1994", 
    "  1995", "  1996", "  1997", "  1998", "  1999", "  2000", 
    "  2001", "  2002", "  2003", "  2004", "  2005", "  2006", 
    "  2007", "  2008", "  2009", "  2010", "  2011", "  2012", 
    "  2013", "  2014", "NA"), class = "factor")), .Names = c("Jan", 
"Year.1", "Feb", "Year.2", "Mar", "Year.3", "Apr", "Year.4", 
"May", "Year.5", "Jun", "Year.6", "Jul", "Year.7", "Aug", "Year.8", 
"Sep", "Year.9", "Oct", "Year.10", "Nov", "Year.11", "Dec", "Year.12", 
"X1.12", "Year.13", "X1.13", "Year.14", "X1.14", "Year.15", "X1.15", 
"Year.16", "X1.16", "Year.17"), row.names = c(NA, 6L), class = "data.frame")

The problem is not only that I need to transform my data, I do also have the problem that years for first column is in the second, years in the second column may appear in other year type columns and so on...

Any ideas how to tame this beast?

Regards,

Community
  • 1
  • 1
Shery
  • 1,459
  • 2
  • 18
  • 40

1 Answers1

2

This is a strange data frame to start with. If I understand your data frame correctly, here is an method.

library(tidyr)

# minimun example data frame
scotland_weather <- data.frame(
    Jan = c(294, 292, 276),
    Year.1 = c(1993, 1928, 2008),
    Feb = c(278, 259, 245),
    Year.2 = c(1990, 1997, 2002)
)

# split it into two data frames
df1 <- scotland_weather[seq(1, 4, 2)]
month <- gather(df1, month, rainfall)

df2 <- scotland_weather[seq(2, 4, 2)]
year <- gather(df2, yearId, year)

# rbind to get the data frame
final <- cbind(year, month)
final$yearId <- NULL
GL_Li
  • 1,658
  • 1
  • 9
  • 24
  • It could be quick and dirty option ... do you think we can do it in a bit more efficient way? – Shery Dec 16 '15 at 21:30
  • I'd say that's pretty efficient, actually. The problem you have is you have a very unusual dataset, and this answer deals with it quite well. – Phil Dec 17 '15 at 10:58