About a month ago, Joshua Tucker posted some hypotheses about the number of tweets and likes that posts get on The Monkey Cage. Anton Strezhnev took up his question, building a screen scraper in Python and making all of his data public. His tentative conclusion was that posts containing graphics are more likely to be "Liked" than tweeted.
Coincidentally, Josh Cutler is teaching a course on Python for the Duke Political Science Department this semester, and one of our assignments was to build a blog scraper.* I took Anton's scraper as a starting point and built three more, to get data from Andrew Gelman's blog, Freakonomics, and Modeled Behavior. The idea behind these choices was to make comparisons between economics and political science blogs, and to have gradations of "wonkiness," another of the proposed hypotheses. Although it's pretty hard to escape wonkiness entirely in the academic blogosphere, here's how I see the categorization:
Here's what you can expect from this series (not necessarily in this order):
- How do comments/tweets/likes correlate with page views?
- How do comments predict (correlate with) tweets and likes?
- What other factors predict tweets and likes? (post length, images, time since previous post)
- What predicts comments? (same potential explanations)
- Are there author- or category-specific factors on the blogs?
*Note: I'm not sure how the term "scraper" emerged, but it refers to a script that collects information from websites without doing any permanent damage to the website. Unless, that is, you forget to put in a time delay and crash the blog, but I'm not naming any names.
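For readers who haven't seen one, here is a minimal sketch of what a scraper with a time delay might look like in Python. It is not Anton's scraper or any of mine: the names `TitleParser`, `fetch`, and `scrape_titles` are made up for illustration, the two-second default delay is an arbitrary assumption, and it only pulls each page's title rather than tweets or likes.

```python
import time
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text found inside <title> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text while we are between <title> and </title>.
        if self._in_title:
            self.title += data


def fetch(url):
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_titles(urls, delay_seconds=2.0, fetch_page=fetch, sleep=time.sleep):
    """Return {url: page title}, pausing between consecutive requests.

    delay_seconds is an assumed value; the pause is what keeps the
    scraper from hammering the blog's server.
    """
    titles = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # the time delay you should not forget
        parser = TitleParser()
        parser.feed(fetch_page(url))
        titles[url] = parser.title.strip()
    return titles
```

Calling `scrape_titles` with a list of post URLs would fetch them one at a time. The `fetch_page` and `sleep` arguments are there only so the loop can be tried out on canned HTML without touching a real blog.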