Eliminate File Redundancy with Ruby

Say you have a file with many repeated, unnecessary lines that you want to remove. For safety’s sake, you would rather write an abbreviated copy of the file than overwrite the original. Ruby makes this a cinch. You just iterate over the file, recording every line the computer has already “seen” in a hash (Ruby’s dictionary). If a line is not in the hash yet, it must be new, so write it to the output file. Here’s the code, designed with .tex files in mind but easily adaptable:

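# Ask for the base name of the file, without the .tex extension
puts 'Filename?'
filename = gets.chomp
input = File.open(filename + '.tex')
output = File.open(filename + '2.tex', 'w')
seen = {}  # every line written to the output so far
input.each do |line|
  if (seen[line])
    # duplicate line: skip it
  else
    output.write(line)
    seen[line] = true
  end
end
input.close
output.close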
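
For reuse, the same idea can be wrapped in a method. What follows is only a minimal sketch under a few assumptions: the method name dedupe_lines and its two path arguments are made up for illustration, the hash is swapped for a Set from Ruby's standard library, and the block form of File.open is used so the output file gets closed even if something fails partway through.

```ruby
require 'set'

# Hypothetical helper (not part of the script above): copies in_path to
# out_path, keeping only the first occurrence of each line.
# Returns the number of lines written.
def dedupe_lines(in_path, out_path)
  seen = Set.new
  written = 0
  File.open(out_path, 'w') do |output|
    File.foreach(in_path) do |line|
      # Set#add? returns nil when the line is already in the set
      next unless seen.add?(line)
      output.write(line)
      written += 1
    end
  end
  written
end
```

Calling dedupe_lines('plot.tex', 'plot2.tex') (both filenames are placeholders) would leave the original untouched, just like the script above.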

Where would this come in handy? Well, the .tex extension probably already gave you a clue that I am reducing redundancy in a LaTeX file. In particular, I have an R plot generated as a tikz graphic. The R plot includes a rug at the bottom (tick marks indicating data observations), but the data set includes over 9,000 observations, so many of the lines are drawn right on top of each other. The LaTeX compiler got peeved at having to draw so many lines, so Ruby helped it out by eliminating the redundancy. One special tweak for using the script above to modify tikz graphics files is to change the line

if (seen[line])

to

if (seen[line]) && !(line.include? 'node') && !(line.include? 'scope') && !(line.include? 'path') && !(line.include? 'define')

if your plot has multiple panes (e.g. par(mfrow=c(1,2)) in R), so that Ruby won’t drop seemingly redundant lines that actually specify new panes. The modified line is a little long and messy, but it works, and that was the main goal here. The resulting LaTeX file compiles easily, and more quickly than it did with all those redundant lines, thanks to Ruby.
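
If the long condition bothers you, one way to tidy it is to keep the protected substrings in a list. This is a sketch rather than the code I ran: the method name dedupe_except and its arguments are hypothetical, and it works on an array of lines instead of open file handles. The four keywords are the ones from the condition above.

```ruby
require 'set'

# Hypothetical helper: returns the lines with repeats dropped, except that
# any line containing one of the keywords is kept every time it appears.
def dedupe_except(lines, keywords = %w[node scope path define])
  seen = Set.new
  lines.select do |line|
    structural = keywords.any? { |kw| line.include?(kw) }
    # Set#add? returns nil if the line was already recorded
    first_time = !seen.add?(line).nil?
    structural || first_time
  end
end

# Possible usage (placeholder filenames):
# File.write('plot2.tex', dedupe_except(File.readlines('plot.tex')).join)
```

Adding another protected keyword then means extending the list rather than appending one more && clause.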

About You Study Politics, Right?

Graduate student in political science at Duke University.

2 Responses to Eliminate File Redundancy with Ruby

  1. josh_cutler says:

    Nice to see you working in Ruby! One way to make that code even Rubier (?) would be to replace:

    if (seen[line])
    else

    with:

    unless seen[line]

  2. That’s a nice change. I forgot about the unless keyword.
