How can I pre-allocate a string in Perl?

Question

I have a Perl script that crunches a lot of data. There are a bunch of string variables that start small but grow really long due to the repeated use of the dot (concatenation) operator. Will growing the string in this manner result in repeated reallocations? If yes, is there a way to pre-allocate a string?

score 15 · Answer 1 · answered May 02 '09 at 04:29

Yes, growing a string in Perl will result in repeated reallocations. Perl allocates a little bit of extra space to strings, but only a few bytes. You can see this using Devel::Peek. This reallocation is very fast and often does not actually copy the memory. Trust your memory manager; that's why you're programming in Perl and not C. Benchmark it first!
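
Devel::Peek's Dump shows the relevant fields as CUR (bytes in use) and LEN (bytes allocated). As a scriptable alternative, here is a minimal sketch that reads the same two fields through the core B module (B::svref_2object plus the CUR and LEN methods of B::PV) and counts how often LEN changes while appending. The helper name cur_and_len is made up for this example, and the exact numbers depend on your perl version and malloc, so no output is shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B ();

# Return (CUR, LEN) for a string scalar: CUR is the number of bytes in
# use, LEN is the number of bytes perl has allocated for the buffer.
# Hypothetical helper; it uses the core B module instead of Devel::Peek.
sub cur_and_len {
    my $obj = B::svref_2object( \$_[0] );
    return ( $obj->CUR, $obj->LEN );
}

my $s       = '';
my $growths = 0;
my ( undef, $prev_len ) = cur_and_len($s);

for ( 1 .. 1000 ) {
    $s .= 'hello';
    my ( undef, $len ) = cur_and_len($s);
    # A change in LEN means perl resized the string's buffer.
    $growths++ if $len != $prev_len;
    $prev_len = $len;
}

my ( $cur, $len ) = cur_and_len($s);
print "CUR=$cur LEN=$len, buffer resized $growths times in 1000 appends\n";
```

Comparing the resize count with the 1000 appends gives a rough idea of how much spare room perl leaves each time it grows the buffer.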

You can preallocate arrays with $#array = $num_entries and a hash with keys %hash = $num_keys, but length $string = $strlen doesn't work. Here's a clever trick I dug up on Perlmonks.

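my $str = "";
vec($str, $length, 8) = 0;   # write a byte at offset $length, growing the string to $length + 1 bytes
$str = "";                   # empty the string again; the enlarged buffer is kept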

Or, if you want to get into XS, you can call SvGROW().
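
A minimal sketch that wraps the Perlmonks trick in a helper, next to the array and hash forms mentioned above. prealloc_string is a hypothetical name, not a core or CPAN function. Whether the enlarged buffer survives the final reassignment is a perl implementation detail rather than a documented guarantee, so the script prints LEN (via the core B module) for you to check on your own perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B ();

# Hypothetical helper (not a core or CPAN function): reserve room for
# about $bytes bytes in the scalar that $ref points to, then empty it.
sub prealloc_string {
    my ( $ref, $bytes ) = @_;
    $$ref = '';
    return if $bytes < 1;
    # Writing the byte at offset $bytes - 1 extends the string, and its
    # buffer, to $bytes bytes; the new bytes are "\0".
    vec( $$ref, $bytes - 1, 8 ) = 0;
    # Per the trick above, emptying the string keeps the enlarged buffer.
    # That is perl's internal behaviour, not a documented guarantee.
    $$ref = '';
    return;
}

# Arrays and hashes can be pre-sized directly:
my @array;
$#array = 999;         # 1000 elements (indices 0..999), all undef
my %hash;
keys(%hash) = 1000;    # pre-extend the hash to at least 1000 buckets

my $str;
prealloc_string( \$str, 1_000_000 );
# LEN is the allocated buffer size; expect at least 1_000_000 if the
# trick works on your perl.
print "length=", length($str), " LEN=", B::svref_2object( \$str )->LEN, "\n";

$str .= 'hello' for 1 .. 200_000;    # 1_000_000 bytes in total
print "length=", length($str), "\n";
```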

chaos' suggestion to use an array and then join it all together will use more than double the memory: memory for the array, memory for the scalar allocated for each element in the array, memory for the string held in each scalar element, and memory for the copy made when joining. If it results in simpler code, do it, but don't think you're saving any memory.

score 7 · Accepted Answer · edited May 30 '15 at 11:30

Alternate suggestion that will be much easier to cope with: push the strings onto an array and join it when you're done.
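
A minimal sketch of the push-then-join approach; the loop body is a hypothetical stand-in for the real data crunching. Keep brian d foy's comment on this answer in mind: every array element is a separate scalar with its own overhead, so this trades memory for convenience.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the pieces in an array instead of concatenating as you go.
my @pieces;
for my $i ( 1 .. 5 ) {
    # Hypothetical stand-in for the real data crunching.
    push @pieces, "record $i\n";
}

# One join at the end builds the final string.
my $str = join '', @pieces;
print $str;
```

Until the join, each piece is still available separately in @pieces, which can help with debugging.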

edited May 30 '15 at 11:30

Sinan Ünür

answered May 02 '09 at 00:00

chaos

Although every element in the array creates an SV with all of its overhead. You'll use up a lot more memory this way. – brian d foy Sep 19 '09 at 23:24

score 7 · Answer 3 · edited Jan 26 '10 at 20:11

Perl's strings are mutable, so appending to a string does NOT incur a string duplication penalty.

You can try all you want to find a "faster" way, but this smells really bad of premature optimization.

As an example, I whipped up a class that abstracts away the hard work. It works, but for all its goofy tricks, it's really slow.

Here's the result:

         Rate  magic normal
magic  1.72/s     --   -93%
normal 23.9/s  1289%     --

Yes, that's right: plain Perl concatenation is 1289% faster (about 14 times as fast) than what I thought was a respectable implementation.

Profile your code and find out what the real problems are; don't try optimising stuff that isn't even a known problem.

#!/usr/bin/perl

use strict;
use warnings;

{

    package MagicString;
    use Moose;

    has _buffer => (
        isa => 'Str',
        is  => 'rw',
    );
    has _buffer_size => (
        isa     => 'Int',
        is      => 'rw',
        default => 0,
    );
    has step_size => (
        isa     => 'Int',
        is      => 'rw',
        default => 32768,
    );
    has _tail_pos => (
        isa     => 'Int',
        is      => 'rw',
        default => 0,
    );

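    sub BUILD {
        my $self = shift;
        # Start with one step's worth of NUL padding and record its size.
        $self->_buffer( chr(0) x $self->step_size );
        $self->_buffer_size( $self->step_size );
    }

    sub value {
        my $self = shift;
        # Only the first _tail_pos bytes of the buffer hold real data.
        return substr( $self->{_buffer}, 0, $self->{_tail_pos} );
    }

    sub append {
        my $self  = shift;
        my $value = shift;
        my $L     = length($value);
        # Grow the padded buffer one step at a time until the new value fits.
        while ( ( $self->{_tail_pos} + $L ) > $self->{_buffer_size} ) {
            $self->{_buffer} .= ( chr(0) x $self->{step_size} );
            $self->{_buffer_size} += $self->{step_size};
        }
        # Overwrite the padding in place with 4-argument substr.
        substr( $self->{_buffer}, $self->{_tail_pos}, $L, $value );
        $self->{_tail_pos} += $L;
    }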
    __PACKAGE__->meta->make_immutable;
}


use Benchmark qw( :all :hireswallclock );

cmpthese( -10 , {
        magic => sub{
            my $x = MagicString->new();
            for ( 1 .. 200001 ){
                $x->append( "hello");
            }
            my $y = $x->value();
        },
        normal =>sub{
            my $x = '';
            for ( 1 .. 200001 ){
                $x .= 'hello';
            }
            my $y = $x;
        }
    });
#use Data::Dumper;
#print Dumper( length( $x->value() ));
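
The benchmark above pits a Moose object against plain concatenation. A more direct test of the original question is to time plain concatenation against the same loop after pre-allocating with the vec() trick from the first answer. This is a sketch, not a measured result: the sizes mirror the benchmark above, the sub names are made up, and cmpthese(-1, ...) runs each sub for at least one CPU second. Run it on your own perl before drawing any conclusions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $pieces = 200_000;
my $piece  = 'hello';
my $total  = $pieces * length($piece);    # final string length in bytes

# Plain repeated concatenation; perl grows the buffer as needed.
sub build_plain {
    my $s = '';
    $s .= $piece for 1 .. $pieces;
    return $s;
}

# Same loop, but first reserve the final size with the vec() trick.
sub build_prealloc {
    my $s = '';
    vec( $s, $total - 1, 8 ) = 0;    # extend the buffer to $total bytes
    $s = '';                         # empty it, keeping the buffer
    $s .= $piece for 1 .. $pieces;
    return $s;
}

# Sanity check: both strategies must produce identical strings.
die "mismatch\n" unless build_plain() eq build_prealloc();

# A negative count means "run each sub for at least this many CPU seconds".
cmpthese( -1, {
    plain    => \&build_plain,
    prealloc => \&build_prealloc,
});
```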

Saying Perl doesn't duplicate the string is only half the truth. Perl allocates only a few extra bytes to a string, so Perl will most likely have to grow the memory containing the string when appending. This may cause the memory to be copied, but that happens in your system's memory manager, which is very fast. Remember, O(log n) will beat O(n) in math class, but in the real world the constant factors of the algorithm matter. C is fast. – Schwern, May 02 '09 at 04:34
Indeed, O(1) is not very good if that one step takes several days, while O(n^2) may take only seconds :) Though it may be an advantage if your data set is so large that the O(n^2) approach takes several weeks, and data sets that size are common. – Kent Fredric, Jan 09 '14 at 10:05

score 3 · Answer 4 · edited May 23 '17 at 12:30

I don't know specifically how Perl strings are implemented, but a pretty good guess is that appending takes amortized constant time. This means that even if you do find a way to pre-allocate your string, chances are that the combined time it saves for all the script's users will be less than the time you spent asking this question on Stack Overflow.

edited May 23 '17 at 12:30

Community

answered May 05 '09 at 18:07

Motti

score 0 · Answer 5 · edited Jan 26 '10 at 20:09

I would go the array/join way:

push(@array, $crunched_bit);

Then build the string with $str = join('', @array). If nothing else, this gives you access to all the individual elements for debugging later.

edited Jan 26 '10 at 20:09

Peter Mortensen

answered May 02 '09 at 06:24

This will use up quite a bit of extra memory since every array element needs a new SV. – brian d foy Sep 19 '09 at 23:26

score -2 · Answer 6 · answered May 02 '09 at 00:04

Yes, pre-extending strings that you know will grow is a good idea.

You can use the 'x' operator to do this. For example, to preallocate 1000 spaces:

$s = " " x 1000;

answered May 02 '09 at 00:04

Kevin Beck

And then use substr on the lhs of assignments. Uuuuugly. – chaos May 02 '09 at 00:06
While this will create a string containing 1000 spaces, when I then say "$s = 'foo'", will I get a 1000-character string with only the first three used or will it give me a new 3-character string and throw yours away? (I suspect the latter, but don't actually know how perl will handle it.) – Dave Sherohman May 02 '09 at 00:08
If you reassign it, it will throw away the old result (assuming there are no other references to it). You would need to do string replacement, like Dave said, to modify only parts of it. ++array-then-join – Anonymous May 02 '09 at 00:10
Perl will NOT throw away memory already allocated to a scalar. Thus `$s = " " x 1000; $s = "";` will preallocate 1000+ bytes to $s and leave you with an empty string. However, Perl has to calculate that `" " x 1000` which is a bit of a waste. The vec() solution is better. Or just don't worry about it. – Schwern May 02 '09 at 04:36
Right, but the context is gone -- so filling it with zeros or blanks is a waste of time. – Anonymous May 02 '09 at 20:36
