Automatically Set Up New R and LaTeX Projects

You have a finite number of keystrokes in your life. Automating repetitive steps is one of the great benefits of knowing a bit about coding, even if the code is just a simple shell script. That’s why I set up an example project on GitHub using a Makefile and sample article (inspired by Rob Hyndman). This post explains the structure of that project and how to modify it for your own purposes.

Running the Makefile with the example article in an otherwise-empty project directory will create:

  • a setup.R file for clearing the workspace, setting paths, and loading libraries
  • a data directory for storing files (csv, rda, etc.)
  • a drafts directory for LaTeX, including a generic starter article
  • a graphics directory for storing plots and figures to include in your article
  • an rcode directory for your R scripts

It also supplies some starter code for the setup.R file in the main directory and a start.R file in the rcode directory. The starter code detects the current user and stores the path to each project directory in a simple R variable. For example, after running setup.R in your analysis file you can switch to the data directory with setwd(pathData), then create a plot and save it after running setwd(pathGraphics). Because setup.R builds the paths from the current user, several people can work on the same project without changing any of the other scripts.

If you want to change this structure, there are two main ways to do it. You can add more (or fewer) directories by modifying the mkdir lines in the Makefile. You can add (or remove) text in the R files by changing the echo lines.
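For readers who would rather script the layout than edit a Makefile, here is a minimal Ruby sketch of the same idea. It is not the Makefile from the example project: the directory and file names come from the list above, while the starter text, the constant names, and the scaffold_project helper are assumptions made for illustration.

```ruby
require 'fileutils'

# Directory names follow the layout described in this post.
PROJECT_DIRS = %w[data drafts graphics rcode].freeze

# Placeholder starter text (an assumption, not the original setup.R).
SETUP_R = <<~R
  # setup.R: clear the workspace, set paths, load libraries
  rm(list = ls())
R

# Placeholder starter text (an assumption, not the original start.R).
START_R = <<~R
  # start.R: first analysis script
R

# Create the project skeleton under root without overwriting existing files.
def scaffold_project(root)
  PROJECT_DIRS.each { |dir| FileUtils.mkdir_p(File.join(root, dir)) }
  starters = {
    File.join(root, 'setup.R') => SETUP_R,
    File.join(root, 'rcode', 'start.R') => START_R
  }
  starters.each do |path, text|
    File.write(path, text) unless File.exist?(path)
  end
end
```

Calling scaffold_project(Dir.pwd) from an empty project directory would create the four directories and the two starter files. Adding a directory means adding a name to PROJECT_DIRS, which mirrors adding a mkdir line to the Makefile.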

If you make major customizations to the Makefile, or already have your own structure for projects like this, leave a link in the comments for other readers.

H/T to Josh Cutler for encouraging me to write a post about my project structure and automation efforts. As a final note, this is the second New Year’s Eve in a row that I have posted a tech tip. Maybe it will become a YSPR tradition!

Update: If you’re interested in your own automated project setup, check out ProjectTemplate by John Myles White. Thanks to Trey Causey on Twitter and Zachary Jones in the comments for sharing this.

Eliminate File Redundancy with Ruby

Say you have a file with many repeated, unnecessary lines that you want to remove. For safety’s sake, you would rather make an abbreviated copy of the file than replace it. Ruby makes this a cinch. You just iterate over the file, putting all lines the computer has already “seen” into a hash (Ruby’s dictionary). If a line is not in the hash, it must be new, so write it to the output file. Here’s the code, designed with .tex files in mind but easily adaptable:

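# Ask for the base filename (without the .tex extension)
puts 'Filename?'
filename = gets.chomp
input = File.open(filename + '.tex')
output = File.open(filename + '2.tex', 'w')
seen = {}  # lines already written to the output file
input.each do |line|
  if (seen[line])
    # duplicate line: skip it
  else
    output.write(line)
    seen[line] = true
  end
end
input.close()
output.close()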

Where would this come in handy? Well, the .tex extension probably already gave you a clue that I am reducing redundancy in a LaTeX file. In particular, I have an R plot generated as a tikz graphic. The R plot includes a rug at the bottom (tick marks indicating data observations), but the data set includes over 9,000 observations, so many of the lines are drawn right on top of each other. The LaTeX compiler got peeved at having to draw so many lines, so Ruby helped it out by eliminating the redundancy. One special tweak for using the script above to modify tikz graphics files is to change the line

if (seen[line])

to

if (seen[line]) && !(line.include? 'node') && !(line.include? 'scope') && !(line.include? 'path') && !(line.include? 'define')

if your plot has multiple panes (e.g. par(mfrow=c(1,2)) in R) so that Ruby won’t ignore seemingly redundant lines that are actually specifying new panes. The modified line is a little long and messy, but it works, and that was the main goal here. The resulting LaTeX file compiles easily and more quickly than it did with all those redundant lines, thanks to Ruby.
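If the long condition bothers you, the same rule can be written as a small function. This is only a sketch: the four keywords are the ones from the modified line, while the names dedupe_lines, always_keep, and STRUCTURAL_KEYWORDS are made up for the example.

```ruby
require 'set'

# Substrings that mark structural tikz lines, taken from the modified
# condition. Lines containing any of them are kept even when repeated.
STRUCTURAL_KEYWORDS = %w[node scope path define].freeze

# Return lines with repeats dropped, except that lines containing one of
# always_keep pass through every time they occur.
def dedupe_lines(lines, always_keep: STRUCTURAL_KEYWORDS)
  seen = Set.new
  lines.select do |line|
    structural = always_keep.any? { |word| line.include?(word) }
    # Set#add? returns nil when the line was already in the set.
    first_time = !seen.add?(line).nil?
    structural || first_time
  end
end
```

With hypothetical file names, File.write('plot2.tex', dedupe_lines(File.readlines('plot.tex')).join) would produce the abbreviated copy. Passing always_keep: [] gives the plain behavior of the first script.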

Wednesday Nerd Fun: Create Your Own Crossword Puzzles

Whether you want to make your own crossword puzzles, or just wonder how they are created, this post is for you. A user over at StackExchange asked how to create a puzzle in LaTeX. Another user named Thorsten gave a very comprehensive answer, which forms the basis for this post.

The LaTeX package to use is cwpuzzle. It isn’t quite as easy as I had envisioned, but is still relatively simple. The key part of Thorsten’s code looks like this:

\begin{Puzzle}{16}{12} 
|*    |[1]O |[2]P |E  |R     |A  |T     |I  |O    |N  |*     |*    |[3]B |*  |*  |*  |. 
|*    |*    |L    |*  |*     |*  |*     |*  |*    |*  |*     |[4]R |A    |N  |G  |E  |. 
|[5]E |*    |A    |*  |[6]M  |*  |*     |*  |*    |*  |*     |*    |R    |*  |*  |*  |. 
|S    |*    |[7]C |O  |O     |R  |D     |I  |N    |A  |T     |E    |G    |R  |I  |D  |. 
|T    |*    |E    |*  |D     |*  |*     |*  |*    |*  |*     |*    |R    |*  |*  |*  |. 
|I    |*    |V    |*  |E     |*  |*     |*  |[8]V |A  |R     |I    |A    |B  |L  |E  |. 
|[9]M |E    |A    |N  |*     |*  |*     |*  |*    |*  |*     |*    |P    |*  |*  |*  |. 
|A    |*    |L    |*  |[10]L |I  |N     |E  |G    |R  |[11]A |P    |H    |*  |*  |*  |. 
|T    |*    |U    |*  |*     |*  |*     |*  |*    |*  |X     |*    |*    |*  |*  |*  |. 
|I    |*    |E    |*  |*     |*  |[12]S |C  |A    |L  |E     |M    |O    |D  |E  |L  |. 
|O    |*    |*    |*  |*     |*  |*     |*  |*    |*  |S     |*    |*    |*  |*  |*  |. 
|N    |*    |*    |*  |*     |*  |*     |*  |*    |*  |*     |*    |*    |*  |*  |*  |. 
\end{Puzzle}

And here is the result:

[Figure: The Puzzle]

[Figure: The Answers]

A pretty neat tool, overall.
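One thing that is easy to get wrong when typing a grid like the one above by hand is the number of cells per row, which has to match the first argument of the Puzzle environment (16 here). Below is a rough Ruby sketch of a checker. It assumes every row starts with "|" and ends with "|.", as the rows above do; cell_count and bad_rows are hypothetical helper names, not part of cwpuzzle.

```ruby
# Count the cells in one row of a Puzzle environment.
# Returns nil if the row does not start with "|" and end with "|.".
def cell_count(row)
  body = row.strip
  return nil unless body.start_with?('|') && body.end_with?('|.')
  # Drop the leading "|" and the closing "|.", then split on the bars.
  body[1...-2].split('|', -1).length
end

# Return the 1-based numbers of the rows whose cell count differs from width.
def bad_rows(rows, width)
  rows.each_with_index
      .reject { |row, _| cell_count(row) == width }
      .map { |_, index| index + 1 }
end
```

Feeding the twelve rows of the grid to bad_rows(rows, 16) should return an empty array if every row has sixteen cells.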

Crossword fans might also like this video, with remarks from a classic crossword puzzle “grid man:”

PyCon 2012 Video Round-Up

The videos from PyCon 2012 are posted. Here are the ones I plan to watch, along with their summaries:

Checking Mathematical Proofs Written in TeX

ProofCheck is a set of Python scripts which parse and check mathematics written using TeX. Its homepage is http://www.proofcheck.org. Unlike computer proof assistants which require immersion in the equivalent of a programming language, ProofCheck attempts to handle mathematical language formalized according to the author’s preferences as much as possible.

Sketching a Better Product

If writing is a means for organizing your thoughts, then sketching is a means for organizing your thoughts visually. Just as good writing requires drafts, good design requires sketches: low-investment, low-resolution braindumps. Learn how to use ugly sketching to iterate your way to a better product.

Bayesian Statistics Made (as) Simple (as Possible)

This tutorial is an introduction to Bayesian statistics using Python. My goal is to help participants understand the concepts and solve real problems. We will use material from my (nb: Allen Downey’s) book, Think Stats: Probability and Statistics for Programmers (O’Reilly Media).

SQL for Python Developers

Relational databases are often the bread-and-butter of large-scale data storage, yet they are often poorly understood by Python programmers. Organizations even split programmers into SQL and front-end teams, each of which jealously guards its turf. These tutorials will take what you already know about Python programming, and advance into a new realm: SQL programming and database design.

Web scraping: Reliably and efficiently pull data from pages that don’t expect it

Exciting information is trapped in web pages and behind HTML forms. In this tutorial, you’ll learn how to parse those pages and when to apply advanced techniques that make scraping faster and more stable. We’ll cover parallel downloading with Twisted, gevent, and others; analyzing sites behind SSL; driving JavaScript-y sites with Selenium; and evading common anti-scraping techniques.

Some of it may be above my head at this stage, but I think it’s great that the Python community makes all of these resources available.

How to Count Words in LaTeX Documents

One thing that can be hard for new LaTeX users to adjust to is that getting a word count is not as easy as it is in other programs. Now there is a solution, thanks to Matthias Orlowski and Alex Iliopoulos. I share Matthias’s instructions here, with a couple of notes of my own at the end.

1. Check whether Perl is installed by typing “which perl” in Terminal. That should be the case since Mac OS X ships with an installation of Perl.

2. Check whether texcount is installed by typing “which texcount” in Terminal. That should also be the case if you installed MacTeX recently. If not, install it via TeX Live.

3. Copy the code below to your Preferences.el file, which should be located at ~/Library/Preferences/Aquamacs Emacs

; count words in latex docs
(defun latex-word-count ()
  (interactive)
  (shell-command (concat "PATH "
    "-inc "; texcount option (set to count documents included via \input)
    (buffer-file-name))))

; bind F6 as the hotkey
(global-set-key (quote [f6]) 'latex-word-count)

Replace “PATH ” with the path returned by ‘which texcount’ in (2); the space at the end is important.

4. Restart Aquamacs and open a TeX file. Hit F6 and see the magic happen!

Now, obviously these instructions are only for OS X, but I would be happy to share Windows instructions if someone wants to adapt them. The first thing to look out for is that on recent MacBooks the default for F6 is to brighten the keyboard backlight. You can change this by going into the Keyboard section of System Preferences and selecting “Use all F1, F2, etc. keys as standard function keys.” The second potential problem is if the path of the TeX file you are using includes spaces; this can possibly be addressed by putting a backslash (\) before each space, but I cannot verify that from experience.
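If you run texcount outside of Emacs, the spaces problem can be sidestepped by letting a library do the escaping. Here is a small Ruby sketch using the standard shellwords library; texcount_command is a made-up helper name, and the paths in the usage note are placeholders rather than real locations.

```ruby
require 'shellwords'

# Build a texcount command line for tex_file.
# texcount_path stands in for whatever `which texcount` prints.
# -inc is the same texcount option used in the Emacs snippet above.
def texcount_command(texcount_path, tex_file)
  # Array#shelljoin escapes each argument, so spaces in a path are
  # backslash-escaped automatically.
  [texcount_path, '-inc', tex_file].shelljoin
end
```

For example, texcount_command('/path/to/texcount', '/Users/me/My Papers/draft.tex') returns a string in which the space in “My Papers” is preceded by a backslash. Calling system('/path/to/texcount', '-inc', tex_file) with separate arguments avoids the shell, and therefore the escaping question, altogether.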

If you run into other issues with this script, or have new adaptations of it, feel free to leave them in the comments.