This post begins to describe the blog data collected (separately) by Anton Strezhnev and me. One of the first things I did was to convert the date variable to R's Date class so that I could do some exploration.
library(foreign)
# setwd('get your own')
monkey1 <- read.dta('finalMonkeyCageData.dta')
monkey1$newdate <- as.Date(monkey1$date, "%m/%d/%Y")
monkey1$weekdaynum <- format(monkey1$newdate, "%w")
day_abbr_list <- c("Sun","Mon","Tue","Wed","Thu","Fri","Sat")
par(mfrow=c(3,1))
boxplot(monkey1$tweets ~ monkey1$weekdaynum, xaxt='n', xlab='', ylab="Tweets", col='blue')
axis(1, labels=day_abbr_list, at=c(1,2,3,4,5,6,7))
boxplot(monkey1$likes ~ monkey1$weekdaynum, xaxt='n', xlab='', ylab="Likes", col='red')
axis(1, labels=day_abbr_list, at=c(1,2,3,4,5,6,7))
boxplot(monkey1$comments ~ monkey1$weekdaynum, xlab='', xaxt='n', ylab="Comments", col='green')
axis(1, labels=day_abbr_list, at=c(1,2,3,4,5,6,7))
The result was this plot:
For tweets and likes it looks like earlier in the week (Sunday, Monday) is better, while comments get an additional bump on Saturday and Wednesday. In the next couple of posts we'll look at how these three activities are correlated with page views, and how comments are distributed on the other blogs I scraped.
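One way to put numbers on the day-of-week impression is to compute the median of each activity per weekday. The sketch below is only an illustration: the `toy` data frame and its values are invented stand-ins for the real file, and `median_tweets` is a name introduced here. It also turns the `%w` codes into a labeled factor, which is one possible alternative to relabeling the axis by hand.

```r
# Toy data standing in for the real file; all values here are invented.
toy <- data.frame(
  date   = c("01/01/2012", "01/02/2012", "01/08/2012", "01/09/2012", "01/04/2012"),
  tweets = c(10, 8, 14, 6, 3)
)

# Same conversion as in the post: parse month/day/year strings into Date objects.
toy$newdate <- as.Date(toy$date, "%m/%d/%Y")

# "%w" returns the weekday as "0" (Sunday) through "6" (Saturday).
# Mapping those codes to a labeled factor keeps the days in calendar order.
day_abbr_list <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
toy$weekday <- factor(format(toy$newdate, "%w"),
                      levels = as.character(0:6),
                      labels = day_abbr_list)

# Median tweets per weekday; days with no posts come back as NA.
median_tweets <- tapply(toy$tweets, toy$weekday, median)
print(median_tweets)
```

With the real data, the same `tapply` call could be repeated for `likes` and `comments`, assuming those columns are numeric as the boxplots suggest.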