\documentclass[a4paper,12pt]{article}
\usepackage{setspace}
\usepackage{float}
\onehalfspacing

\begin{document}

<<>>=
eom <- function(x, date.var = "date", actual = FALSE){

  ## Takes an argument x that can be either a character vector of
  ## dates or a data frame with date.var in it. If the former, just
  ## return those dates as Dates. If the latter, figure out all of
  ## the last days of the month corresponding to each date. This
  ## allows us to figure out all the dates that we need from an input
  ## data frame instead of calculating them ourselves.

  stopifnot(is.character(x) || is.data.frame(x))

  ## Need to treat the two cases differently and use different error
  ## checking. character is easy. In both cases, the final vector
  ## will be dates.

  if(is.character(x)){
    dates <- as.Date(x)
  } else{

    ## All the work in the old eom goes here.

    stopifnot(date.var %in% names(x))
    dates <- x[[date.var]]
    stopifnot(inherits(dates, "Date"))

    if(actual){

      ## Case where I just want the actual last days in the file.

      dates <- sort(dates)
      months <- format(dates, "%m")
      months.lag <- c(months[-1], 13)
      dates <- dates[months != months.lag]
    } else{

      ## Trick is to take the date, change it to the first of that
      ## month, and then add 40 days. The resulting date is
      ## guaranteed to be in the next month. Then create the first
      ## day of that next month. Then subtract 1 day. That day is
      ## guaranteed to be the last day of the month you started
      ## in. Hacky but effective.

      next.month <- as.Date(paste(format(dates, "%Y-%m"), "01", sep = "-")) + 40
      dates <- as.Date(paste(format(next.month, "%Y-%m"), "01", sep = "-")) - 1
    }
  }

  ## Whatever the source of dates, we now just want to return the
  ## unique values.

  invisible(unique(dates))
}

## Function for getting data for multiple ids and multiple years.

grab.data <- function(symbols, years){

  ## Function for getting data for multiple symbols and multiple years.
  ## @param symbols - character vector
  ## @param years - numeric vector

  ## All calculations are done using data from the ws.data package.
  ## Load the package if it is not loaded already.

  require(ws.data)

  ## Two approaches are possible. Combine all the big datasets into
  ## one before taking a subset (tough to do without lots of memory)
  ## or take the subset from each one separately and then
  ## combine. I'll do the second.

  ## Note that the tricky part is dealing with the first data
  ## frame. Once that is done, appending is easy.

  x <- NULL
  for(i in years){

    ## Iterate through each year in the years vector,
    ## getting the subset of each year that contains symbols.

    file.name <- paste("daily", i, sep = ".")
    data(list = file.name)

    ## Subset and append result. At some point, may just want to get
    ## the key variables that I care about while dropping things like
    ## volume.

    x <- rbind(x, subset(get(file.name), symbol %in% symbols))
  }

  ## Change date name, delete old variable and then return the final data frame.

  x$date <- x$v.date
  x$v.date <- NULL
  invisible(x)
}

calc.returns <- function(x, date = NULL, d.before = 30, d.after = 0, actual = FALSE){

  ## Function for calculating returns over a given window for all
  ## stocks in data frame x. date can be a character vector or it
  ## will default to all the end-of-month days for all dates in
  ## x. d.before and d.after are calendar days.

  ## Need to ensure that all these variables are in the data frame x.

  stopifnot(all(c("symbol", "date", "tret") %in% names(x)))

  ## Key step is that, instead of requiring the user to pass in a
  ## long list of dates, we are going to look in x, grab all the
  ## dates that are there, figure out the month end for all of them,
  ## pick out the unique month ends, and then loop through
  ## those. Users can still pass in a date themselves. We want to
  ## retain that for testing.
  ## Honor a user-supplied date; otherwise default to the month ends in x.

  anchors <- if(is.null(date)) eom(x, actual = actual) else as.Date(date)

  ## Now that we have our list of dates, we initialize the data
  ## frame that is going to store our results and then loop through
  ## all the dates in anchors, even if there is just one. We also
  ## need to name the output variable on the basis of the
  ## arguments. Again, we only allow for arguments in terms of
  ## calendar days, so the d suffix makes sense.

  out.name <- paste("ret", d.before, d.after, "d", sep = ".")
  res <- NULL

  for(i in seq_along(anchors)){
    anchor <- anchors[i]

    ## Once we have a single date from anchors, we figure out what
    ## range we want around it. Later, we may think of ways to
    ## make this exact rather than approximate.

    start.date <- anchor - d.before
    end.date <- anchor + d.after

    ## Key trick is to throw away all the data that we don't
    ## need. aggregate works well as long as you are going to
    ## calculate all the data that is passed in, so we ensure that
    ## only such data is included.

    data.sub <- x[start.date < x$date & x$date <= end.date,]

    ## One of the biggest bothers will be dealing with
    ## endpoints. What happens in the first day when we do not
    ## have 30 days of data? We will worry about such details (and
    ## they do matter) later. But, for now, we should just skip
    ## any date which produces a subset with zero rows.

    if(nrow(data.sub) == 0){next}

    ## Need to be careful about how you pass things around for two
    ## reasons. First, aggregate gets the names it uses as outputs
    ## from the inputs, so I set the names here. Second, as the
    ## data frames get bigger, you want to think about how much
    ## data you are taking from step to step.

    sub.tret <- data.sub["tret"]
    names(sub.tret) <- out.name
    this.res <- aggregate(sub.tret, by = data.sub["symbol"],
                          function(x){ prod(1 + x) - 1 })

    ## We need to add in a column for the specific date in order
    ## to organize the output data frame. How does the result look
    ## without this line?
    this.res$date <- anchor
    res <- rbind(this.res, res)
  }

  ## Having gone through all the dates and rbind'd things together,
  ## we just return.

  invisible(res)
}

## The top three functions look good. But we should test them!

basic.test <- function(){

  ## How can we be vaguely sure that these functions are working?
  ## Use a quick test case.

  require(ws.data)
  x <- grab.data(symbols = c("IBM", "MSFT"), years = 2006:2007)
  y <- calc.returns(x)
  subset(y, date == as.Date("2006-08-31"))

  ## Function should print out:

  ##    symbol ret.30.0.d       date
  ## 47    IBM 0.06767088 2006-08-31
  ## 48   MSFT 0.07519583 2006-08-31

  ## If you get anything else, something is wrong.
}

calculate1.returns <- function(x, end.date, date.range, date.var = "v.date"){

  ## Terminate with an error if the names attribute of x does not contain
  ## columns for "symbol", date.var, and "tret".
  stopifnot(all(c("symbol", date.var, "tret") %in% names(x)))

  ## Convert from characters to a class Date object.
  end.date <- as.Date(end.date)

  ## This way our measurements span date.range dates: (start.date, end.date]
  start.date <- end.date - date.range

  ## Extract only the data that falls within our range.
  x <- x[start.date < x[[date.var]] & x[[date.var]] <= end.date,]
  res <- aggregate(x["tret"], by = x["symbol"], function(x){ prod(1 + x) - 1 })
  invisible(res)
}

calculate.returns2 <- function(x, symbols, dates, date.range){

  ## Terminate with an error if the names attribute of x does not contain
  ## columns for "symbol", "v.date", and "tret".
  stopifnot(all(c("symbol", "v.date", "tret") %in% names(x)))

  ## Data frame to append to.
  return.frame <- NULL

  ## Index into dates rather than iterating over it directly, so that
  ## Date objects keep their class.
  for(i in seq_along(dates)){
    return.frame <- rbind(return.frame, calculate1.returns(x, dates[i], date.range))
  }
  invisible(return.frame)
}

calc.industry.returns <- function(x, ret.var){
  stopifnot(all(c("symbol", "date") %in% names(x)))
  require(ws.data)
  data(secref)

  ## Add industry data to x.
  merged <- merge(x, secref[c("m.ind", "symbol")])

  ## Calculate the average returns by industry.
  a <- aggregate(x = merged[ret.var], by =
                 list(Date = merged$date, Industry = merged$m.ind), FUN = "mean")

  ##remove excess columns
  ##a$date <- NULL
  ##a$m.ind <- NULL
  ##a$symbol <- NULL

  invisible(a)
}
@

<<>>=
## CREATE Z

## Compile the necessary data into z
require(ws.data)
data(secref)
data(yearly)
data(daily.1998)
data(daily.2007)

x <- grab.data(symbols = secref$symbol, years = 1998:2007)
x <- subset(x, tret < 2)
y.1 <- calc.returns(x, d.before = 30, d.after = 0, actual = TRUE)
y.2 <- calc.returns(x, d.before = 0, d.after = 30, actual = TRUE)
z <- merge(y.1, y.2)

## Calculate 6 month previous and forward returns. Bring data into z
y.3 <- calc.returns(x, d.before = 182, d.after = 0, actual = TRUE)
y.4 <- calc.returns(x, d.before = 0, d.after = 182, actual = TRUE)
z$ret.182.0.d <- y.3$ret.182.0.d[match(paste(z$symbol, z$date), paste(y.3$symbol, y.3$date))]
z$ret.0.182.d <- y.4$ret.0.182.d[match(paste(z$symbol, z$date), paste(y.4$symbol, y.4$date))]

## Remove the first 6-months and last 6-months of data to account for calc.returns function problem
z <- z[z$date >= "1998-06-30" & z$date <= "2007-06-30",]

## Bring industry information into z
z$industry <- secref$m.ind[match(z$symbol, secref$symbol)]

## Bring price information into z
z$price <- x$price.unadj[match(paste(z$symbol, z$date), paste(x$symbol, x$date))]

## Bring in year
z$year <- format(z$date, "%Y")

## Merge with yearly to get market cap and top 1500 data
z <- merge(z, yearly)

## Remove rows that we do not need
z <- subset(z, top.1500 & price > 5)

## bring average industry returns (30 days) into z
ind.ret.30.d <- aggregate(z$ret.30.0.d, z[c("date", "industry")], mean, na.rm = TRUE)
names(ind.ret.30.d) <- c("date", "industry", "ind.ret.30.d")
z$ind.ret.30.d <- ind.ret.30.d$ind.ret.30.d[match(paste(z$date, z$industry), paste(ind.ret.30.d$date, ind.ret.30.d$industry))]

## bring average industry returns (182 days) into z
ind.ret.182.d <- aggregate(z$ret.182.0.d, z[c("date", "industry")], mean, na.rm = TRUE)
names(ind.ret.182.d) <- c("date", "industry",
"ind.ret.182.d") z$ind.ret.182.d <- ind.ret.182.d$ind.ret.182.d[match(paste(z$date, z$industry), paste(ind.ret.182.d$date, ind.ret.182.d$industry))] ## bring industry portfolio membership data into z (30 days) ind.ret.30.d <- ind.ret.30.d[!is.na(ind.ret.30.d[["ind.ret.30.d"]]),] k <- ind.ret.30.d k <- k[order(k$date),] k$ind.rank.30 <- unlist(tapply(k$ind.ret.30.d, k$date, function(x){ breaks <- quantile(x, c(0, 0.3, 0.7, 1)) res <- cut(x, breaks = breaks, include.lowest = TRUE, labels = c("losers", "neutral", "winners")) res})) z$ind.rank.30 <- k$ind.rank.30[match(paste(z$date, z$industry), paste(k$date, k$industry))] ## bring industry portfolio membership data into z (182 days) ind.ret.182.d <- ind.ret.182.d[!is.na(ind.ret.182.d[["ind.ret.182.d"]]),] k2 <- ind.ret.182.d k2 <- k2[order(k2$date),] k2$ind.rank.182 <- unlist(tapply(k2$ind.ret.182.d, k2$date, function(x){ breaks <- quantile(x, c(0, 0.3, 0.7, 1)) res <- cut(x, breaks = breaks, include.lowest = TRUE, labels = c("losers", "neutral", "winners")) res})) z$ind.rank.182 <- k2$ind.rank.182[match(paste(z$date, z$industry), paste(k2$date, k2$industry))] ## bring industry size portfolio membership data into z avg.ind <- aggregate(z$symbol, z[c("date", "industry")], length) avg.ind$total <- avg.ind$"if (stringsAsFactors) factor(x) else x" avg.ind$"if (stringsAsFactors) factor(x) else x" <- NULL k3 <- avg.ind k3 <- k3[order(k3$date),] k3$ind.size.rank <- unlist(tapply(k3$total, k3$date, function(x){ breaks <- quantile(x, c(0, 0.3, 0.7, 1)) res <- cut(x, breaks = breaks, include.lowest = TRUE, labels = c("small", "medium", "large")) res})) z$ind.size.rank <- k3$ind.size.rank[match(paste(z$date, z$industry), paste(k3$date, k3$industry))] ## Convert rankings to characters strings for use in backtest z$ind.rank.30 <- as.character(z$ind.rank.30) z$ind.rank.182 <- as.character(z$ind.rank.182) z$ind.size.rank <- as.character(z$ind.size.rank) ## number of month end dates num.me.dates <- length(unique(z$date)) @ <>= 
data(daily.1998)
data(daily.2007)
min.date <- min(daily.1998$v.date)
max.date <- max(daily.2007$v.date)
num.stocks <- length(unique(z$symbol))
begin.date <- min(z$date)
end.date <- max(z$date)
num.ind <- length(unique(z$industry))
@

<<>>=
## TABLE 1 CALCULATIONS

## Calculate average number of stocks by industry for table 1
final <- tapply(avg.ind$total, avg.ind$industry, mean)

## Calculate average percent market cap for table 1
temp <- tapply(z$cap.usd, z$date, sum, na.rm = TRUE)
sumcap <- aggregate(z$cap.usd, by = z[c("date", "industry")], sum, na.rm = TRUE)
names(sumcap) <- c("date", "industry", "ind.cap")
sumcap$tot.cap <- temp[match(as.character(sumcap$date), names(temp))]
sumcap$perc.of.totcap <- sumcap$ind.cap/sumcap$tot.cap
sumcapfinal <- tapply(sumcap$perc.of.totcap, sumcap$industry, mean)

## merge all information to form table 1
final.tbl <- as.data.frame(as.table(final))
sumcap.tbl <- as.data.frame(as.table(sumcapfinal))
names(final.tbl) <- c("industry", "avg.no.stocks")
names(sumcap.tbl) <- c("industry", "avg.pct.mkt.cap")
table1 <- merge(final.tbl, sumcap.tbl)

## calculate average industry returns and add to table 1
avg.ind.ret <- tapply(z$ret.30.0.d, z$industry, mean, na.rm = TRUE)
avg.ind.ret.df <- as.data.frame(as.table(avg.ind.ret))
names(avg.ind.ret.df) <- c("industry", "avg.ind.ret")
table1$avg.ind.ret <- avg.ind.ret.df$avg.ind.ret[match(table1$industry, avg.ind.ret.df$industry)]

## sorting table1 by average number of stocks and renaming rows
table1.sorted <- table1[order(table1$avg.no.stocks, decreasing = TRUE),]
row.names(table1.sorted) <- 1:nrow(table1.sorted)
names(table1.sorted) <- c("Industry", "Avg. no. Stocks", "Avg. % Mkt. Cap.", "Avg. Ind. 
Ret")

## summary statistics
meanstock <- round(mean(table1$avg.no.stocks), digits = 3)
meancap <- round(mean(table1$avg.pct.mkt.cap), digits = 3)
meanret <- round(mean(table1$avg.ind.ret), digits = 3)
@

<<>>=
## BACKTESTS

## Our next step is to create subsets of our industry tables in order to run a pairwise backtest
ind.winners.30 <- z[z$ind.rank.30 == "winners",]
ind.losers.30 <- z[z$ind.rank.30 == "losers",]
ind.neutral.30 <- z[z$ind.rank.30 == "neutral",]
ind.winners.182 <- z[z$ind.rank.182 == "winners",]
ind.losers.182 <- z[z$ind.rank.182 == "losers",]
ind.neutral.182 <- z[z$ind.rank.182 == "neutral",]

## Our next step is to create subsets of our industry size tables in order to run a pairwise backtest.
## z carries a single size ranking, ind.size.rank, so the 30-day and 182-day subsets use the same column.
ind.large.30 <- z[z$ind.size.rank == "large",]
ind.small.30 <- z[z$ind.size.rank == "small",]
ind.medium.30 <- z[z$ind.size.rank == "medium",]
ind.large.182 <- z[z$ind.size.rank == "large",]
ind.small.182 <- z[z$ind.size.rank == "small",]
ind.medium.182 <- z[z$ind.size.rank == "medium",]

## Here we run backtests on z for individual stock returns (30 days).
library(backtest)
bt.ret.30 <- backtest(z, id.var = "symbol", date.var = "date",
    in.var = "ret.30.0.d", ret.var = "ret.0.30.d",
    natural = TRUE, by.period = TRUE, buckets = 3)

## Here we run backtests on z for industry momentum (30 days)
bt.ind.30 <- backtest(z, id.var = "symbol", date.var = "date",
    in.var = "ind.ret.30.d", ret.var = "ret.0.30.d",
    natural = TRUE, by.period = TRUE, buckets = 3)

## Here we run backtests on z for individual stock returns (182 days).
bt.ret.182 <- backtest(z, id.var = "symbol", date.var = "date",
    in.var = "ret.182.0.d", ret.var = "ret.0.182.d",
    natural = TRUE, by.period = TRUE, buckets = 3)

## Here we run backtests on z for industry momentum (182 days)
bt.ind.182 <- backtest(z, id.var = "symbol", date.var = "date",
    in.var = "ind.ret.182.d", ret.var = "ret.0.182.d",
    natural = TRUE, by.period = TRUE, buckets = 3)

## We run pairwise backtests on industry winner/loser/neutral portfolios
pw.ind.ret.winners.30 <- backtest(ind.winners.30, id.var = "symbol", date.var = "date",
    in.var = "ret.30.0.d", ret.var = "ret.0.30.d",
    natural = TRUE, by.period = TRUE, buckets = 3)
pw.ind.ret.losers.30 <- backtest(ind.losers.30, id.var = "symbol", date.var = "date",
    in.var = "ret.30.0.d", ret.var = "ret.0.30.d",
    natural = TRUE, by.period = TRUE, buckets = 3)
pw.ind.ret.neutral.30 <- backtest(ind.neutral.30, id.var = "symbol", date.var = "date",
    in.var = "ret.30.0.d", ret.var = "ret.0.30.d",
    natural = TRUE, by.period = TRUE, buckets = 3)
pw.ind.ret.winners.182 <- backtest(ind.winners.182, id.var = "symbol", date.var = "date",
    in.var = "ret.182.0.d", ret.var = "ret.0.182.d",
    natural = TRUE, by.period = TRUE, buckets = 3)
pw.ind.ret.losers.182 <- backtest(ind.losers.182, id.var = "symbol", date.var = "date",
    in.var = "ret.182.0.d", ret.var = "ret.0.182.d",
    natural = TRUE, by.period = TRUE, buckets = 3)
pw.ind.ret.neutral.182 <- backtest(ind.neutral.182, id.var = "symbol", date.var = "date",
    in.var = "ret.182.0.d", ret.var = "ret.0.182.d",
    natural = TRUE, by.period = TRUE, buckets = 3)

## We run pairwise backtests on industry size portfolios
pw.large.30 <- backtest(ind.large.30, id.var = "symbol", date.var = "date",
    in.var = "ind.ret.30.d", ret.var = "ret.0.30.d",
    natural = TRUE, by.period = TRUE, buckets = 3)
pw.small.30 <- backtest(ind.small.30, id.var = "symbol", date.var = "date",
    in.var = "ind.ret.30.d", ret.var = "ret.0.30.d",
    natural = TRUE, by.period = TRUE, buckets = 3)
pw.medium.30 <- backtest(ind.medium.30,
    id.var = "symbol", date.var = "date",
    in.var = "ind.ret.30.d", ret.var = "ret.0.30.d",
    natural = TRUE, by.period = TRUE, buckets = 3)
pw.large.182 <- backtest(ind.large.182, id.var = "symbol", date.var = "date",
    in.var = "ind.ret.182.d", ret.var = "ret.0.182.d",
    natural = TRUE, by.period = TRUE, buckets = 3)
pw.small.182 <- backtest(ind.small.182, id.var = "symbol", date.var = "date",
    in.var = "ind.ret.182.d", ret.var = "ret.0.182.d",
    natural = TRUE, by.period = TRUE, buckets = 3)
pw.medium.182 <- backtest(ind.medium.182, id.var = "symbol", date.var = "date",
    in.var = "ind.ret.182.d", ret.var = "ret.0.182.d",
    natural = TRUE, by.period = TRUE, buckets = 3)

## Backtest to compare industry vs. individual stock momentum strategy (182 days)
bt.ind.vs.ret.182 <- backtest(z, id.var = "symbol", date.var = "date",
    in.var = c("ret.182.0.d", "ind.ret.182.d"), ret.var = "ret.0.182.d",
    natural = TRUE, by.period = TRUE, buckets = 3)

## Here we create objects for the summary stats of different backtests (stock vs. ind momentum, 30 vs. 182 days)
stats.bt.ret.30 <- summaryStats(bt.ret.30)
stats.bt.ind.30 <- summaryStats(bt.ind.30)
stats.bt.ret.182 <- summaryStats(bt.ret.182)
stats.bt.ind.182 <- summaryStats(bt.ind.182)

## Here we create objects for the summary stats of the pairwise industry winner/loser/neutral backtests (30 vs.
## 182 days)
stats.w.30 <- summaryStats(pw.ind.ret.winners.30)
stats.l.30 <- summaryStats(pw.ind.ret.losers.30)
stats.n.30 <- summaryStats(pw.ind.ret.neutral.30)
stats.w.182 <- summaryStats(pw.ind.ret.winners.182)
stats.l.182 <- summaryStats(pw.ind.ret.losers.182)
stats.n.182 <- summaryStats(pw.ind.ret.neutral.182)

## Here we create objects for the summary stats of industry size backtests
stats.large.30 <- summaryStats(pw.large.30)
stats.small.30 <- summaryStats(pw.small.30)
stats.medium.30 <- summaryStats(pw.medium.30)
stats.large.182 <- summaryStats(pw.large.182)
stats.small.182 <- summaryStats(pw.small.182)
stats.medium.182 <- summaryStats(pw.medium.182)

## We form objects that reference the mean spread between the winner and loser portfolios,
## average return of our loser portfolios, and average return of our winner portfolios.
bt.ret.30.meanspread <- stats.bt.ret.30[nrow(stats.bt.ret.30), names(stats.bt.ret.30) == "spread"]
bt.ret.30.avgloser <- stats.bt.ret.30[nrow(stats.bt.ret.30), names(stats.bt.ret.30) == "low"]
bt.ret.30.avgwinner <- stats.bt.ret.30[nrow(stats.bt.ret.30), names(stats.bt.ret.30) == "high"]
bt.ind.30.meanspread <- stats.bt.ind.30[nrow(stats.bt.ind.30), names(stats.bt.ind.30) == "spread"]
bt.ind.30.avgloser <- stats.bt.ind.30[nrow(stats.bt.ind.30), names(stats.bt.ind.30) == "low"]
bt.ind.30.avgwinner <- stats.bt.ind.30[nrow(stats.bt.ind.30), names(stats.bt.ind.30) == "high"]
bt.ret.182.meanspread <- stats.bt.ret.182[nrow(stats.bt.ret.182), names(stats.bt.ret.182) == "spread"]
bt.ret.182.avgloser <- stats.bt.ret.182[nrow(stats.bt.ret.182), names(stats.bt.ret.182) == "low"]
bt.ret.182.avgwinner <- stats.bt.ret.182[nrow(stats.bt.ret.182), names(stats.bt.ret.182) == "high"]
bt.ind.182.meanspread <- stats.bt.ind.182[nrow(stats.bt.ind.182), names(stats.bt.ind.182) == "spread"]
bt.ind.182.avgloser <- stats.bt.ind.182[nrow(stats.bt.ind.182), names(stats.bt.ind.182) == "low"]
bt.ind.182.avgwinner <- stats.bt.ind.182[nrow(stats.bt.ind.182),
    names(stats.bt.ind.182) == "high"]

## We form objects that reference the mean spread between the winner and loser portfolios,
## average return of our loser portfolios, and average return of our winner portfolios
## within our various industry winner/loser/neutral portfolios.
pw.w.ret.30.meanspread <- stats.w.30[nrow(stats.w.30), names(stats.w.30) == "spread"]
pw.w.ret.30.avgloser <- stats.w.30[nrow(stats.w.30), names(stats.w.30) == "low"]
pw.w.ret.30.avgwinner <- stats.w.30[nrow(stats.w.30), names(stats.w.30) == "high"]
pw.l.ret.30.meanspread <- stats.l.30[nrow(stats.l.30), names(stats.l.30) == "spread"]
pw.l.ret.30.avgloser <- stats.l.30[nrow(stats.l.30), names(stats.l.30) == "low"]
pw.l.ret.30.avgwinner <- stats.l.30[nrow(stats.l.30), names(stats.l.30) == "high"]
pw.n.ret.30.meanspread <- stats.n.30[nrow(stats.n.30), names(stats.n.30) == "spread"]
pw.n.ret.30.avgloser <- stats.n.30[nrow(stats.n.30), names(stats.n.30) == "low"]
pw.n.ret.30.avgwinner <- stats.n.30[nrow(stats.n.30), names(stats.n.30) == "high"]
pw.w.ret.182.meanspread <- stats.w.182[nrow(stats.w.182), names(stats.w.182) == "spread"]
pw.w.ret.182.avgloser <- stats.w.182[nrow(stats.w.182), names(stats.w.182) == "low"]
pw.w.ret.182.avgwinner <- stats.w.182[nrow(stats.w.182), names(stats.w.182) == "high"]
pw.l.ret.182.meanspread <- stats.l.182[nrow(stats.l.182), names(stats.l.182) == "spread"]
pw.l.ret.182.avgloser <- stats.l.182[nrow(stats.l.182), names(stats.l.182) == "low"]
pw.l.ret.182.avgwinner <- stats.l.182[nrow(stats.l.182), names(stats.l.182) == "high"]
pw.n.ret.182.meanspread <- stats.n.182[nrow(stats.n.182), names(stats.n.182) == "spread"]
pw.n.ret.182.avgloser <- stats.n.182[nrow(stats.n.182), names(stats.n.182) == "low"]
pw.n.ret.182.avgwinner <- stats.n.182[nrow(stats.n.182), names(stats.n.182) == "high"]

## We form objects that reference the mean spread between the winner and loser portfolios of our
## industry size portfolios and average returns of our portfolios within our various industry size
## winner/loser/neutral portfolios.
pw.large.30.meanspread <- stats.large.30[nrow(stats.large.30), names(stats.large.30) == "spread"]
pw.large.30.avgloser <- stats.large.30[nrow(stats.large.30), names(stats.large.30) == "low"]
pw.large.30.avgwinner <- stats.large.30[nrow(stats.large.30), names(stats.large.30) == "high"]
pw.small.30.meanspread <- stats.small.30[nrow(stats.small.30), names(stats.small.30) == "spread"]
pw.small.30.avgloser <- stats.small.30[nrow(stats.small.30), names(stats.small.30) == "low"]
pw.small.30.avgwinner <- stats.small.30[nrow(stats.small.30), names(stats.small.30) == "high"]
pw.medium.30.meanspread <- stats.medium.30[nrow(stats.medium.30), names(stats.medium.30) == "spread"]
pw.medium.30.avgloser <- stats.medium.30[nrow(stats.medium.30), names(stats.medium.30) == "low"]
pw.medium.30.avgwinner <- stats.medium.30[nrow(stats.medium.30), names(stats.medium.30) == "high"]
pw.large.182.meanspread <- stats.large.182[nrow(stats.large.182), names(stats.large.182) == "spread"]
pw.large.182.avgloser <- stats.large.182[nrow(stats.large.182), names(stats.large.182) == "low"]
pw.large.182.avgwinner <- stats.large.182[nrow(stats.large.182), names(stats.large.182) == "high"]
pw.small.182.meanspread <- stats.small.182[nrow(stats.small.182), names(stats.small.182) == "spread"]
pw.small.182.avgloser <- stats.small.182[nrow(stats.small.182), names(stats.small.182) == "low"]
pw.small.182.avgwinner <- stats.small.182[nrow(stats.small.182), names(stats.small.182) == "high"]
pw.medium.182.meanspread <- stats.medium.182[nrow(stats.medium.182), names(stats.medium.182) == "spread"]
pw.medium.182.avgloser <- stats.medium.182[nrow(stats.medium.182), names(stats.medium.182) == "low"]
pw.medium.182.avgwinner <- stats.medium.182[nrow(stats.medium.182), names(stats.medium.182) == "high"]
@

\title{Do Industries Explain Momentum?
- A Replication of Moskowitz and Grinblatt 1999 \footnotemark[1]\footnotetext[1]{Paul Fraulo (10plf@williams.edu) and Jimmy Nguyen (jpn1@williams.edu) are students at Williams College. We thank Professor David Kane and teaching fellow David E. Phillips. We also thank Bill Jannen and Vincent Pham for providing valuable feedback and assistance. The code used to replicate the results in the paper was written in R \cite{R} and is available from the authors. Contact David Kane at dave@kanecap.com for access to the data.}}
\author{Paul Fraulo and Jimmy Nguyen}
\maketitle
\newpage

\begin{abstract}
We replicate Moskowitz and Grinblatt's (1999) findings on industry momentum using data for US large cap stocks between 1998 and 2007. Moskowitz and Grinblatt find that industry momentum strategies appear to be highly profitable. They also demonstrate that individual stock momentum strategies, which buy past winning stocks and sell past losing stocks, are less profitable after controlling for industry momentum \cite{IndustryMomentum}. In our replication, we find that industry momentum strategies provide greater returns than individual stock momentum strategies. We also find that the optimal time horizon of an industry momentum strategy is uncorrelated with the size of the industries to which the strategy is applied.
\end{abstract}

\section{Introduction}

Jegadeesh and Titman (1993) demonstrate that stocks which have performed well (poorly) over the last few months continue to do well (poorly) over the following months, and label this occurrence the ``momentum effect''.
This momentum effect \cite{jt} has spurred numerous attempts to understand what creates this apparent arbitrage opportunity in the market. George and Hwang (2004) \cite{52weekhigh} relate momentum to the 52-week high, and Grinblatt and Han (2002) \cite{disposition} relate it to the disposition effect. Moskowitz and Grinblatt (1999) (hereafter MG) \cite{IndustryMomentum} show that the momentum effect demonstrated by Jegadeesh and Titman (hereafter JT) appears to be stronger when viewed by industry rather than by individual stock. Although Grundy and Martin (2001) \cite{portfoliochoice} claim that industry effects are not the primary cause of the momentum phenomenon, we reproduce MG's finding and show that industry momentum accounts for much of the observed individual stock momentum. We also find that an industry momentum strategy is more effective than an individual stock momentum strategy, producing an average return of $19\%$ annually (ignoring transaction costs).

Our data cover the period from 1998 through 2007. For each year, they contain the top 1500 securities by market capitalization, for a total of \Sexpr{num.stocks} securities from \Sexpr{num.ind} industries (we use S\&P industry classifications). In this respect our study differs from MG, who included less liquid stocks and divided their securities into only 20 industries. We sort individual stocks by return into three equal-sized groups: winners, losers, and a neutral group. We then go long the winner portfolio and sell short the loser portfolio. Each month, we form our portfolios by looking at the past returns for 6 months (1 month) and hold those positions for the following 6 months (1 month).
We term these strategies (6, 6) and (1, 1) respectively.

Using our data set, we find average spreads for a (1, 1) and a (6, 6) individual stock momentum strategy, industry momentum strategy, and individual stock momentum strategy controlling for industry momentum. Our results indicate that an industry momentum strategy was more effective than an individual stock momentum strategy over both a 6-month and a 1-month horizon, with the highest average spread of \Sexpr{round(bt.ind.182.meanspread, digits = 3)} coming from a (6, 6) industry momentum strategy, compared to the \Sexpr{round(bt.ret.182.meanspread, digits = 3)} spread of our (6, 6) individual stock momentum strategy. As an extension to MG's original industry momentum paper, we examine the effect of industry size on the ideal time horizon of the momentum strategy and find that the optimal time horizon of our industry momentum strategy seems to have no correlation with the size of the industries (measured by the number of stocks they contain). It is interesting to note, however, that the momentum effect persists even though our data set has no time overlap with the data used in MG's original paper, which cover 1963 to 1995.

\section{Data and Methods}

<<>>=
## We use xtable to generate the results for table 1. However, xtable does not allow enough
## customization, so we manually create the table, using results generated from xtable.
## library(xtable)
## xtable(table1.sorted[1:20,], caption = "Summary Statistics by Industry", label = NULL, digits = c(0, 0, 2, 4, 4), table.placement = "h", caption.placement = "t")
@

Our data include \Sexpr{num.stocks} securities over the period from \Sexpr{begin.date} to \Sexpr{end.date}. The data include \Sexpr{num.ind} industries classified by the Global Industry Classification Standard (GICS).
In order to be included in a portfolio for a given date, a stock must have been in the top 1500 stocks by market capitalization as of December 31 of the previous year. For example, if a stock is classified as top 1500 at the end of a particular year, that status is recognized in the following year. Stocks must also trade above \$5 on the last trade date of the month. Stocks that trade below \$5 on the last trade date of a particular month are either data errors or stocks that we would not want to include in our portfolios.

We calculate previous and forward returns (both 1-month and 6-month) of each security for \Sexpr{num.me.dates} month-end dates between June 1998 and June 2007. The 1-month and 6-month industry returns are equal-weighted averages of all stocks in a particular industry for each month. We calculate returns using all available data. For example, a company with only 2 months of history on a particular date will have complete 1-month previous returns, but its 6-month previous returns will be calculated using 2 months of data. Since our daily returns data begin on \Sexpr{min.date}, we have 6-month previous returns data beginning \Sexpr{begin.date}. Likewise, since our daily returns data end on \Sexpr{max.date}, we have 6-month forward returns until \Sexpr{end.date}. Thus, in order to improve the accuracy of our calculations, we only use 1-month and 6-month returns data for June 1998 through June 2007.

Table 1 displays the average number of stocks, average percentage of total market capitalization, and average returns for the top 20 industry portfolios by average number of stocks. The average number of stocks for all industries is \Sexpr{meanstock}. The average percentage of total market capitalization for all industries is \Sexpr{meancap}. Industry portfolios are formed monthly, from \Sexpr{begin.date} to \Sexpr{end.date}.

Our (1, 1) (and (6, 6)) momentum strategy looks at the previous 1-month (6-month) returns and forms winner, loser, and neutral portfolios each month.
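The past returns used for this sorting are compounded daily returns. As a sketch of the calculation, consistent with the windowing in our \texttt{calc.returns} code (the symbols $R_i$, $r_{i,s}$, $b$, $a$, $\bar{R}_I$, and $N_{I,t}$ are notation introduced here only for illustration), the return of stock $i$ around month-end date $t$, looking back $b$ calendar days and forward $a$ calendar days, is
\[
R_i(t; b, a) = \prod_{t - b < s \le t + a} \left(1 + r_{i,s}\right) - 1,
\]
where $r_{i,s}$ is the daily total return of stock $i$ on trading day $s$. Setting $(b, a) = (30, 0)$ or $(182, 0)$ gives the 1-month and 6-month previous returns, while $(0, 30)$ or $(0, 182)$ gives the corresponding forward returns. The equal-weighted previous return of industry $I$ on date $t$, averaged over its $N_{I,t}$ member stocks with available returns, is then
\[
\bar{R}_I(t) = \frac{1}{N_{I,t}} \sum_{i \in I} R_i(t; b, 0).
\]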
We take long positions on the securities in the winner portfolio and short positions on the securities in the loser portfolio, holding those positions for 1 month (6 months). Winner (loser) industry portfolios include all stocks in the top (bottom) third performing industries. Winner (loser) individual stock momentum portfolios include the top (bottom) third performing stocks. Since both strategies are self-financing and involve equal-weighted long-short portfolios, the return of a given strategy for a particular month is measured by the spread between the winner and loser portfolios of that month. The average spread of a strategy, calculated over all the months in the data, gives the best indication of the effectiveness of that strategy. \begin{table}[t!] \small \begin{center} \caption{{\small{\bf{Summary Statistics for the top 20 Industries by Average Number of Stocks}} \newline This is a partial replication of MG's Table I on page 1254 \cite{IndustryMomentum}. Below, we report the average number of stocks, average percentage of total market capitalization, and average returns for the top 20 industry portfolios by average number of stocks. There are a total of 69 industry portfolios that are formed monthly, from 1998-06-30 to 2007-06-29, using US large cap stocks that are in the top 1500 stocks by market capitalization. Market capitalization is measured as of December 31 of the previous year. Average industry returns take the average 1-month return of all stocks in a particular industry over all dates.}} \begin{tabular}{rlrrr} \hline & Industry & Avg. no. Stocks & Avg. \% Mkt. Cap. & Avg. Ind. 
Ret \\ \hline
1 & BANKS & 76.91 & 0.0447 & 0.0101 \\
2 & REITS & 69.96 & 0.0147 & 0.0154 \\
3 & MEDIA & 63.09 & 0.0505 & 0.0157 \\
4 & INSUR & 60.11 & 0.0485 & 0.0119 \\
5 & OILGS & 56.57 & 0.0563 & 0.0249 \\
6 & SOFTW & 52.20 & 0.0468 & 0.0423 \\
7 & SPRET & 45.91 & 0.0227 & 0.0231 \\
8 & SEMIS & 44.84 & 0.0326 & 0.0335 \\
9 & HEPSV & 43.78 & 0.0199 & 0.0234 \\
10 & HOTEL & 37.13 & 0.0153 & 0.0193 \\
11 & CHEMS & 35.74 & 0.0155 & 0.0155 \\
12 & COMSS & 34.33 & 0.0083 & 0.0178 \\
13 & ITCON & 33.68 & 0.0142 & 0.0218 \\
14 & MACHN & 32.96 & 0.0121 & 0.0186 \\
15 & COMEQ & 31.80 & 0.0337 & 0.0435 \\
16 & CPMKT & 31.75 & 0.0277 & 0.0207 \\
17 & HEQSP & 31.55 & 0.0166 & 0.0239 \\
18 & ENEQS & 30.59 & 0.0128 & 0.0229 \\
19 & BIOTC & 29.21 & 0.0166 & 0.0422 \\
20 & ELUTL & 28.90 & 0.0155 & 0.0114 \\
\hline
\end{tabular}
\end{center}
\end{table}

To compare the effectiveness of an industry momentum strategy with that of an individual stock momentum strategy, we simply change the criterion by which stocks are ranked. For the individual stock momentum strategy, stocks are ranked based on their own individual returns, as in JT's paper \cite{jt}. For the industry momentum strategy, however, stocks are ranked based on the average returns of the industry to which that stock belongs.

\section{Results}

We find that our (6, 6) individual stock momentum strategy and our (6, 6) industry momentum strategy yield significant average monthly returns of \Sexpr{round((bt.ret.182.meanspread/6), digits = 3)} and \Sexpr{round((bt.ind.182.meanspread/6), digits = 3)}, respectively.\footnotemark[1]\footnotetext[1]{We use the backtest package \cite{backtest} to determine the average returns for each strategy.} Our (1, 1) industry momentum strategy yields an average monthly return of \Sexpr{round(bt.ind.30.meanspread, digits = 3)}, while the (1, 1) individual stock strategy yields an average of \Sexpr{round(bt.ret.30.meanspread, digits = 3)} per month.
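As a sketch of how these figures are formed (the symbols below are our own labels, not MG's notation), let $\bar{R}^{W}$ and $\bar{R}^{L}$ denote the average returns of the winner and loser portfolios over all months. The average spread is
\[
\bar{s} = \bar{R}^{W} - \bar{R}^{L},
\]
and we express the (6, 6) figures on a monthly basis by dividing the 6-month average by six,
\[
\bar{s}^{\,\mathrm{monthly}}_{(6,6)} = \frac{\bar{s}_{(6,6)}}{6}.
\]
This is an arithmetic scaling rather than a compounded one, so it should be read as an approximation. For example, the (1, 1) industry momentum entries in Table 2 give $0.023 - 0.013 = 0.010$.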
The industry momentum strategy is more effective than the individual stock momentum strategy at both horizons. Furthermore, a 6-month horizon captures a greater amount of the momentum effect than the 1-month horizon. These results are summarized in Table 2. In Table 3 we demonstrate using pairwise backtests that industry returns explain (are correlated with) much of the individual stock momentum. Although the (6, 6) spread within our industry winner portfolio is similar to that of our original individual stock momentum test, the remaining spreads reported in Table 3 are substantially smaller.

We report the results of our extension in Table 4. We find that industry size has negligible correlation with the ideal investment horizon. The industry momentum strategy appears to be approximately equally effective in all three of our industry size subsets. However, it is difficult to measure the effect using only two horizons (1 month and 6 months). For instance, it is possible that the ideal horizon is 7 months for small industries, 6 months for medium industries, and 5 months for large industries. In this particular case, the 6-month strategy would be approximately equally effective for both the large and small subsets. This explanation is consistent with the fact that the medium-size industry portfolio exhibited the highest average returns. It is also possible that our strategy of forming subsets is sub-optimal. For future research, it would be interesting to look for a size (by market cap) effect of individual stocks. If the size effect presents itself in this fashion, industry size should not show any correlation with optimal time horizons, since large, medium, and small industries as we have defined them should contain a roughly random mix of large, medium, and small individual securities.

\begin{table}[h!]
\small
\begin{center}
\caption{{\small {\bf Average Spreads for Individual Stock Momentum and Industry Momentum Strategies} \newline This is a partial replication of MG's Table II (page 1261) \cite{IndustryMomentum}. Individual stock winner (loser) portfolios include the top (bottom) third performing stocks by average monthly returns. Industry winner (loser) portfolios include all stocks in the top (bottom) third performing industries by monthly average returns. Panel A and Panel B report the average monthly returns from 1998-06-30 through 2007-06-29 for (1, 1) and (6, 6) individual stock momentum strategies and industry momentum strategies. We go long the winner portfolios and sell short the loser portfolios. We take the difference between average winner and loser portfolio returns and report them here as ``spreads''. All portfolios are equal-weighted. The returns for our (6, 6) strategy are \emph{monthly} returns.}}
\centering
\begin{tabular}{l l l}
\hline
& Panel A - Individual Stock Momentum &\\
\hline
& (1, 1) & (6, 6)\\
Winner & 0.017 & 0.024\\
Loser & 0.021 & 0.015\\
Spread & -0.0036 & 0.0093\\
\hline
& Panel B - Industry Momentum &\\
\hline
& (1, 1) & (6, 6)\\
Winner & 0.023 & 0.027\\
Loser & 0.013 & 0.011\\
Spread & 0.010 & 0.016\\
\hline
\end{tabular}
\label{table2}
\end{center}
\end{table}

\begin{table}[p]
\small
\begin{center}
\caption{{\small {\bf Pairwise Comparisons of Individual Stock Momentum and Industry Momentum Strategies} \newline This is a partial replication of MG's Table III (page 1270) \cite{IndustryMomentum}. Individual stock winner (loser) portfolios include the top (bottom) third performing stocks by average monthly returns. Industry winner (loser) portfolios include all stocks in the top (bottom) third performing industries by average monthly return. Panels A, B, and C report excess individual stock momentum \emph{within} industry winner, loser, and neutral portfolios.
We go long the winner portfolios and sell short the loser portfolios. We take the difference between average winner and loser portfolio returns and report them here as ``spreads''. All portfolios are equal-weighted. The returns for our (6, 6) strategy are \emph{monthly} returns.}}
\centering
\begin{tabular}{l l l l}
\hline
& Panel A - Excess Momentum for Industry Winner Portfolio &\\
\hline
& (1, 1) & (6, 6)\\
Winner & 0.022 & 0.032\\
Loser & 0.026 & 0.023\\
Spread & -0.0038 & 0.0093\\
\hline
& Panel B - Excess Momentum for Industry Loser Portfolio \\
\hline
& (1, 1) & (6, 6)&\\
Winner & 0.011 & 0.011\\
Loser & 0.018 & 0.011\\
Spread & -0.0077 & 0.00032\\
\hline
& Panel C - Excess Momentum for Industry Neutral Portfolio \\
\hline
& (1, 1) & (6, 6)&\\
Winner & 0.013 & 0.018\\
Loser & 0.02 & 0.016\\
Spread & -0.0065 & 0.002\\
\hline
\end{tabular}
\label{table3}
\end{center}
\end{table}

\begin{figure}[p]
\centering
\caption{ {\small {\bf Comparison of (6, 6) Industry vs. Individual Stock Momentum Strategies} \newline The figure below displays the cumulative spread return and cumulative quantile return for a (6, 6) industry momentum strategy and a (6, 6) individual stock momentum strategy over the period from June 1998 to June 2007. Spread return is the difference between average high and low quantile returns. The three quantiles are calculated each month based on 6-month industry and individual stock returns. The quantiles also represent equal-weighted portfolios, with long positions in the stocks in the high quantile and short positions in the stocks in the low quantile. The portfolios are rebalanced each month.}}
\vspace{.1in}
\includegraphics{draft4test-007}
\label{fig:spread}
\end{figure}

\newpage

\section{Extension}

George and Hwang (2004) \cite{52weekhigh} show momentum effects with a 52-week high strategy implemented in a similar fashion to the industry momentum strategy. Their paper argues that the observed momentum effects arise from cognitive bias.
The idea is that investors act irrationally, expressing a reluctance to purchase stocks near their 52-week high even if the fundamentals of the security indicate the stock is undervalued by the market. The effect is that, on average, stocks near their 52-week high trade at a price below their efficient market price. Eventually the market overcomes this inefficiency, and the fundamentals, or the ``news'', are reflected in the stock's price. It is this period of inefficiency that creates the arbitrage opportunity for momentum investors to buy the stock in the interim (between the time that the fundamentals move and the time that the market price of the stock moves), in order to capitalize on the systematic irrational behavior of the other investors in the market.

We hypothesized that larger industries (measured by market capitalization) have a relatively shorter interim period between when the news enters the market and when the market price reflects the news. Specifically, the higher liquidity and ``attention'' given to these larger industries allow them to overcome market inefficiency faster. Since more capital in the industry means that there are probably more investors in that industry, we would expect a higher frequency of fundamental reevaluations of that industry and therefore a faster increase in the demand for the stocks in that industry relative to a smaller cap industry.

We tested for this effect by grouping our industries into 3 categories by market capitalization: large, medium, and small. We then ran our (6, 6) and (1, 1) industry momentum strategies on each group; however, we found no correlation between industry size and the optimal time horizon of our industry momentum strategy.

\begin{table}[p]
\begin{center}
\caption{{\small {\bf Pairwise Comparisons of Industry Momentum Strategies Within Industry Size Portfolios} \newline This is a partial replication of MG's Table III (page 1270) \cite{IndustryMomentum}.
Industry winner (loser) portfolios include all stocks in the top (bottom) third performing industries by average monthly return. Industry size portfolios are created based on the average number of stocks in a particular industry. Buckets are formed to create balanced portfolios of stocks. Our (1, 1) strategy calculates the average number of stocks on a monthly basis. Our (6, 6) strategy uses the average number of stocks over all dates. Large industries are the top $11.5\%$ of industries by average number of stocks. Small industries are the bottom $70\%$ of industries by average number of stocks. Medium industries are the remaining industries. Panels A, B, and C report industry momentum \emph{within} large, small, and medium industry size portfolios. We go long the winner portfolios and sell short the loser portfolios. We take the difference between average winner and loser portfolio returns and report them here as ``spreads''. All portfolios are equal-weighted. The returns for our (6, 6) strategy are \emph{monthly} returns.}}
\centering
\begin{tabular}[t]{l l l l}
\hline
& Panel A - Industry Momentum for Large Industries &\\
\hline
& (1, 1) & (6, 6)\\
Winner & \Sexpr{format(pw.large.30.avgwinner, digits = 2)} & \Sexpr{format((pw.large.182.avgwinner/6), digits = 2)}\\
Loser & \Sexpr{format(pw.large.30.avgloser, digits = 2)} & \Sexpr{format((pw.large.182.avgloser/6), digits = 2)}\\
Spread & \Sexpr{format(pw.large.30.meanspread, digits = 2)} & \Sexpr{format((pw.large.182.meanspread/6), digits = 2)}\\
\hline
& Panel B - Industry Momentum for Small Industries \\
\hline
& (1, 1) & (6, 6)&\\
Winner & \Sexpr{format(pw.small.30.avgwinner, digits = 2)} & \Sexpr{format((pw.small.182.avgwinner/6), digits = 2)}\\
Loser & \Sexpr{format(pw.small.30.avgloser, digits = 2)} & \Sexpr{format((pw.small.182.avgloser/6), digits = 2)}\\
Spread & \Sexpr{format(pw.small.30.meanspread, digits = 2)} & \Sexpr{format((pw.small.182.meanspread/6), digits = 2)}\\
\hline
& Panel C - Industry
Momentum for Medium Industries \\
\hline
& (1, 1) & (6, 6)&\\
Winner & \Sexpr{format(pw.medium.30.avgwinner, digits = 2)} & \Sexpr{format((pw.medium.182.avgwinner/6), digits = 2)}\\
Loser & \Sexpr{format(pw.medium.30.avgloser, digits = 2)} & \Sexpr{format((pw.medium.182.avgloser/6), digits = 2)}\\
Spread & \Sexpr{format(pw.medium.30.meanspread, digits = 2)} & \Sexpr{format((pw.medium.182.meanspread/6), digits = 2)}\\
\hline
\end{tabular}
\label{table4}
\end{center}
\end{table}

\section{Conclusion}

Jegadeesh and Titman (1993) \cite{jt} find that a momentum strategy, which buys past winning stocks and sells past losing stocks, can yield significant returns. Moskowitz and Grinblatt (1999) \cite{IndustryMomentum} find that industry momentum accounts for much of the profitability of individual stock momentum strategies. Rather than looking at past winning and losing stocks, MG look at all stocks in past winning and losing industries. MG demonstrate that individual stock momentum strategies are less profitable after controlling for industry momentum. Using US large cap stocks between \Sexpr{begin.date} and \Sexpr{end.date}, we replicate MG's results and find that industry momentum strategies yield significant returns and account for most profits from individual stock momentum strategies. We also find that industry momentum strategies are approximately equally effective for large, medium, and small industries over a 6-month horizon.

\begin{thebibliography}{99}

\bibitem{backtest} Kyle Campbell, Jeff Enos, Daniel Gerlanc, and David Kane. Backtest. Computer software. Vers. 3.0. Backtest: Exploring portfolio-based conjectures about financial instruments. 31 Oct. 2008. 27 Jan. 2009.

\bibitem{52weekhigh} Thomas J. George and Chuan-Yang Hwang. The 52-week High and Momentum Investing. \emph{Journal of Finance}, 59(5):2145-2176, October 2004.

\bibitem{IndustryMomentum} Tobias J. Moskowitz and Mark Grinblatt. Do Industries Explain Momentum? \emph{Journal of Finance}, 54(4):1249-1290, August 1999.

\bibitem{disposition} Mark Grinblatt and Bing Han.
The Disposition Effect and Momentum. Working Paper. The National Bureau of Economic Research. \bibitem{portfoliochoice} Bruce D. Grundy and J. Spencer Martin. Understanding the Nature of the Risks and the Source of the Rewards to Momentum Investing. \emph{Review of Financial Studies}, 14(1):29-78, March 2001. \bibitem{R} Ross Ihaka and Robert Gentleman. R: A Language for Data Analysis and Graphics. \emph{Journal of Computational and Graphical Statistics}, 5(3):299-314, September 1996. \bibitem{jt} Narasimhan Jegadeesh and Sheridan Titman. Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency. \emph{Journal of Finance}, 48(1):65-91, March 1993. \end{thebibliography} \end{document}