
I want to write the contents of lists to a text file. The way I currently do it is the following:

for k in li:
    out.write(str(k))
out.write('\n')

However, this is taking a long time to run (I have tens of millions of lines).

Is there any faster way?

Sample Lines

The lists come from a sparse matrix, so there are plenty of zeros. The non-null elements are only ones.

Example of such a list: `[0 0 0 0 0 0 0 1 0 0 1 1 1 0 0]`

The numbers are separated by tabs

bigTree
  • Does it have to be human readable? i.e. What uses the file afterwards? – doctorlove Feb 20 '14 at 09:34
  • @doctorlove yes in a text file read – bigTree Feb 20 '14 at 09:38
  • Please post a couple of sample lines in your lists. We might be able to exploit some properties of the data – inspectorG4dget Feb 20 '14 at 09:38
  • What does "in a text file read " mean? If a machine reads it, binary (eg pickle) might be quicker. And then you don't need to make it a string first. – doctorlove Feb 20 '14 at 09:40
  • @doctorlove no I write it in a text file and this text file has to be read by a human. Also, I need strings because the person who's gonna use it needs it in this format – bigTree Feb 20 '14 at 09:40
  • 1
    @bigTree Then maybe this entry could be interesting to you: [Speed up writing to files](http://stackoverflow.com/questions/4961589/speed-up-writing-to-files). Writing to the disk is slow. There is not a golden rule you can use to speed it up. But maybe you find some optimizations there... – Mathias G. Feb 20 '14 at 09:46
  • Actually the link @MathiasGhys gave answers my question. I'll close this post, thanks – bigTree Feb 20 '14 at 09:46
  • I wonder if it would be faster for you to save the coordinates of the `1`s, rather than the entire matrix. You could easily recreate it when you read it back – inspectorG4dget Feb 20 '14 at 10:02
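
A minimal sketch of the binary route raised in the comments, only relevant if the human-readable requirement were ever dropped (the function and path names are made up, not from the thread):

import pickle

def pickle_rows(rows, path):
    # Binary and not human-readable: only useful if a program, not a person, consumes the file
    with open(path, 'wb') as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)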

2 Answers


You should be able to get a pretty decent speed up by building each line as one string and writing it in a single call, instead of one `write()` per element:

out.write('\t'.join(map(str, li)) + '\n')
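
Applied to the whole data set described in the question, a minimal sketch might look like this (assuming `rows` is the list of lists of ints and `outpath` is the destination path; both names are hypothetical, not from the answer):

def write_rows(rows, outpath):
    with open(outpath, 'w') as out:
        # one string join and one write per row, instead of one write() per element
        out.writelines('\t'.join(map(str, row)) + '\n' for row in rows)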

There are probably further optimizations, but really you should look at a binary file format for an array this big.

Specifically, `marshal` seems like a great fit, but there are faster things out there.
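
A minimal sketch of the `marshal` route, assuming the file only ever needs to be read back by Python (the function and path names are hypothetical):

import marshal

def marshal_rows(rows, path):
    # marshal handles lists of ints natively; the file must be opened in binary mode
    with open(path, 'wb') as f:
        marshal.dump(rows, f)

def unmarshal_rows(path):
    with open(path, 'rb') as f:
        return marshal.load(f)

One caveat: marshal's on-disk format is Python-version specific, so it only suits files written and read by the same interpreter.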

Slater Victoroff

I wonder if storing the coordinates of the `1`s would yield the desired speedup, since you mention that this is a sparse matrix. Consider the following:

Write to file:

def writeMatrix(matrix):
    # store only the column indices of the 1s in each row
    ones = [[i for i, num in enumerate(row) if num == 1] for row in matrix]
    with open('path/to/file', 'w') as outfile:
        for row in ones:
            outfile.write(' '.join(str(i) for i in row))
            outfile.write('\n')

Read from file:

def readMatrix(infilepath, width):
    answer = []
    with open(infilepath) as infile:
        for line in infile:
            # rebuild a full row of zeros, then set the listed positions to 1
            row = [0]*width
            for i in set(int(i) for i in line.split()):
                row[i] = 1
            answer.append(row)
    return answer
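
A small usage sketch of the round trip (the example matrix is made up; 'path/to/file' is the placeholder path used above):

matrix = [[0, 0, 1, 0],
          [1, 0, 0, 1]]
writeMatrix(matrix)                        # writes "2" and "0 3" to 'path/to/file'
restored = readMatrix('path/to/file', 4)
assert restored == matrix
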
inspectorG4dget