=head1 Perl Slurp Ease

=head2 Introduction

One of the common Perl idioms is processing text files line by line:

    while( <FH> ) {
        do something with $_
    }

This idiom has several variants, but the key point is that it reads in
only one line from the file in each loop iteration. This has several
advantages, including limiting memory use to one line, the ability to
handle any size file (including data piped in via STDIN), and it is
easily taught to and understood by Perl newbies. In fact newbies are
the ones who do silly things like this:

    while( <FH> ) {
        push @lines, $_ ;
    }

    foreach ( @lines ) {
        do something with $_
    }

Line by line processing is fine, but it isn't the only way to deal with
reading files. The other common style is reading the entire file into a
scalar or array, and that is commonly known as slurping. Now, slurping
has somewhat of a poor reputation, and this article is an attempt at
rehabilitating it. Slurping files has advantages and limitations, and
is not something you should just do when line by line processing is
fine. It is best when you need the entire file in memory for processing
all at once. Slurping with in-memory processing can be faster and lead
to simpler code than line by line if done properly.
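To make the idiom concrete before going further, here is a minimal
sketch of the classic do-it-yourself slurp, reading a whole file into
one scalar by undefining the input record separator (the file name
C<data.txt> is only a placeholder):

    # slurp an entire file into a scalar in one read
    my $text = do {
        local $/ ;              # undef the input record separator
        open my $fh, '<', 'data.txt'
            or die "can't open data.txt: $!" ;
        <$fh> ;                 # with $/ undef, one read returns the whole file
    } ;

The C<read_file> call used in the examples below is assumed to be the
slurp function provided by the C<File::Slurp> module (or any equivalent
function of your own), which wraps this idea up in a single call.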
The biggest issue to watch for with slurping is file size. Slurping
very large files or unknown amounts of data from STDIN can be
disastrous to your memory usage and cause swap disk thrashing. You can
slurp STDIN if you know that you can handle the maximum size input
without detrimentally affecting your memory usage. So I advocate
slurping only disk files, and only when you know their size is
reasonable and you have a real reason to process the file as a whole.
Note that a reasonable size these days is larger than in the bad old
days of limited RAM. Slurping in a megabyte is not an issue on most
systems. But most of the files I tend to slurp in are much smaller than
that. Typical files that work well with slurping are configuration
files, (mini-)language scripts, some data (especially binary) files,
and other files of known sizes which need fast processing.

Another major win for slurping over line by line is speed. Perl's IO
system (like many others) is slow. Calling C<< <> >> for each line
requires a check for the end of line, checks for EOF, copying a line,
munging the internal handle structure, etc. That is plenty of work for
each line read in, whereas slurping, if done correctly, will usually
involve only one I/O call and no extra data copying. The same is true
for writing files to disk, and we will cover that as well (even though
slurping is traditionally a read operation, I use the term ``slurp''
for the concept of doing I/O on an entire file in one operation).

Finally, when you have slurped the entire file into memory, you can do
operations on the data that are not possible or easily done with line
by line processing. These include global search/replace (without regard
for newlines), grabbing all matches with one call of C<m//g>, complex
parsing (which in many cases must ignore newlines), processing *ML
(where line endings are just white space), and performing complex
transformations such as template expansion.

=head2 Global Operations

Here are some simple global operations that can be done quickly and
easily on an entire file that has been slurped in. They could also be
done with line by line processing, but that would be slower and require
more code.

A common problem is reading in a file with key/value pairs. There are
modules which do this, but who needs them for simple formats? Just
slurp in the file and do a single parse to grab all the key/value
pairs:

    my $text = read_file( $file ) ;
    my %config = $text =~ /^(\w+)=(.+)$/mg ;

That matches a key which starts a line (anywhere inside the string
because of the C</m> modifier), the '=' char and the text to the end of
the line (again, C</m> makes that work). In fact the ending C<$> is not
even needed since C<.> will not normally match a newline. Since the key
and value are grabbed and the C<m//> is in list context with the C</g>
modifier, it will grab all key/value pairs and return them. The
C<%config> hash will be assigned this list and now you have the file
fully parsed into a hash.

Various projects I have worked on needed some simple templating and I
wasn't in the mood to use a full module (please, no flames about your
favorite template module :-). So I rolled my own by slurping in the
template file, setting up a template hash and doing this one line:

    $text =~ s/<%(.+?)%>/$template{$1}/g ;

That only works if the entire file was slurped in. With a little extra
work it can handle chunks of text to be expanded:

    $text =~ s/<%(\w+)_START%>(.+?)<%\1_END%>/ template( $1, $2 )/sge ;

Just supply a C<template> sub that does the expansion of each chunk.
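That C<template> sub is left to the reader. Purely as an illustration,
here is a minimal sketch of what one might look like, assuming it
expands the simple C<< <%key%> >> markers inside each chunk against the
same C<%template> hash used above (that behavior is my assumption, not
code from the article):

    # hypothetical helper for the chunked example above; assumes the
    # same %template hash used in the one-line expansion
    our %template ;

    sub template {
        my( $name, $body ) = @_ ;

        # expand each simple <%key%> marker inside this chunk;
        # $name is available to select per-chunk data in a
        # fancier version
        $body =~ s/<%(\w+)%>/$template{$1}/g ;

        return $body ;
    }

Since the outer substitution uses the C</e> modifier, whatever this sub
returns replaces the entire chunk, C<START> and C<END> markers
included.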