Linux: unwrap a text file

I spent forever trying to figure out how to unwrap an Apache access_log in which individual requests had been split across two lines. I finally found the answer!

Create a Perl script called unwrap.pl with the contents shown below, then run it like this:

# perl unwrap.pl access_log > accesslog2

unwrap.pl:

#!/usr/bin/env perl

use strict;
use warnings;

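my $sentinel = 0;     # becomes true once the first line has been read
my $previous_line;    # line read on the previous pass through the loop
my $result = "";      # unwrapped text, built up line by line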

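# Returns true when $line appears to have ended early: even with the first
# word of $next_line appended it would still be shorter than $next_line,
# which suggests the break was deliberate rather than a wrap.
sub early_indent { # line next_line
    my ($line, $next_line) = @_;

    my @words = split(' ', $next_line);
    return (length ($line . " " . $words[0]) < length($next_line));
}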

while (<>)
{
    my $this_line = $_;
    chomp $this_line;
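    if ($sentinel)
    {
        # Keep the line break after a blank line, before a blank line,
        # before a line that starts with a non-alphanumeric character,
        # or when the previous line looks deliberately short.
        if (($this_line eq "")
            || ($previous_line eq "")
            || ($this_line =~ /^[^A-Za-z0-9]/)
            || early_indent($previous_line, $this_line))
        { $result .= $previous_line . "\n"; }
        # Otherwise treat the break as a wrap and join with a space.
        else { $result .= ($previous_line . " "); }
    }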
    $previous_line = $this_line;
    $sentinel = 1;
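}

# The loop only ever appends the previous line, so add the last one here,
# then print the unwrapped text (redirected to accesslog2 in the command above).
$result .= $previous_line . "\n" if $sentinel;
print $result;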
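
If you want to see what the joining rules do before running them on a real log, here is a minimal sketch that applies the same four tests to a short in-memory sample. The function name unwrap_lines and the sample log entries are made up for illustration and are not part of unwrap.pl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of unwrap.pl): applies the same join rules
# to a list of already-chomped lines and returns the unwrapped text.
sub unwrap_lines {
    my @lines = @_;
    return "" unless @lines;

    my $result   = "";
    my $previous = shift @lines;
    for my $current (@lines) {
        my ($first_word) = split ' ', $current;
        # Same four tests as unwrap.pl; the length check is only reached
        # when $current starts with a letter or digit.
        my $keep_break =
               $current eq ""
            || $previous eq ""
            || $current =~ /^[^A-Za-z0-9]/
            || length("$previous $first_word") < length($current);
        $result .= $previous . ($keep_break ? "\n" : " ");
        $previous = $current;
    }
    # The last line is never followed by another, so append it here.
    return $result . $previous . "\n";
}

# Made-up sample: the first request was split across two lines.
my @sample = (
    '10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /index.html',
    'HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 404 0',
);
print unwrap_lines(@sample);
```

On this sample the first two lines are joined into one request and the third is left on its own line. The last test only compares line lengths, so two complete entries of similar length with nothing between them would also be joined; it is worth spot-checking the output against a few entries you know.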
