
I'm trying to compress some CSV files using gzip and then upload them to S3. I need to use streams to compress and upload because the files could be very large, and I don't want to write the compressed file back to disk before loading it to S3. I'm new to using streams in PowerShell and I'm struggling to figure out the issue.

This is what I have so far, but I can't get it to work. It uploads a very small gzip file that shows my original file inside, but I can't extract it; I get an "Unexpected end of data" error. I believe the gzip stream isn't being finalized, or something like that. If I remove the gzip pieces and just write the inputFileStream out to S3, the uncompressed file uploads fine, so I know the S3 upload from a stream works.
Also, I'm using CopyTo, which I believe pulls the whole file into memory, which I also want to avoid (let me know if that thinking is wrong; see the sketch after the code below).

$sourcePath =  "c:\temp\myfile.csv"
$bucketName = "mybucket"
$s3Key = "staging/compress_test/"

$fileInfo = Get-Item -Path $sourcePath
$destPath = "$s3Key$($fileInfo.Name).gz"

$outputMemoryStream = New-Object System.IO.MemoryStream 
$gzipStream = New-Object System.IO.Compression.GZipStream $outputMemoryStream, ([IO.Compression.CompressionMode]::Compress)

$inputFileStream = New-Object System.IO.FileStream $sourcePath, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)
$inputFileStream.CopyTo($gzipStream)

Write-S3Object -BucketName $bucketName -Key $destPath -Stream $outputMemoryStream -ProfileName Dev -Region us-east-1

$inputFileStream.Close()
$outputMemoryStream.Close()
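
(On the CopyTo question: from what I can tell from the docs, Stream.CopyTo reads the source in fixed-size chunks - the default buffer is around 80 KB - so I suspect the real memory cost is the MemoryStream it writes into rather than CopyTo itself. The loop below, reusing the same streams as in my code above, is roughly the manual equivalent. Correct me if I'm wrong.)

# Roughly what CopyTo does internally: read a chunk, write a chunk.
# The source file is never fully in memory; the compressed output is,
# because the destination stream here is a MemoryStream.
$chunk = New-Object byte[] (81920)
while (($read = $inputFileStream.Read($chunk, 0, $chunk.Length)) -gt 0) {
    $gzipStream.Write($chunk, 0, $read)
}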

UPDATE: Thanks @FoxDeploy. I got it at least loading the file now. I needed to close the gzip stream before writing to S3, which finalizes the gzip data. But as I suspected, the whole file gets compressed into the MemoryStream before anything uploads to S3. I would like it to stream to S3 as it compresses, to reduce the memory load, if that's possible.
Here's the current working code (and below it, a rough sketch of the streaming approach I'm considering):

$sourcePath =  "c:\temp\myfile.csv"
$bucketName = "mybucket"
$s3Key = "staging/compress_test/"

$fileInfo = Get-Item -Path $sourcePath
$destPath = "$s3Key$($fileInfo.Name).gz"

$outputMemoryStream = New-Object System.IO.MemoryStream 
$gzipStream = New-Object System.IO.Compression.GZipStream $outputMemoryStream, ([IO.Compression.CompressionMode]::Compress), $true  # leaveOpen = $true so the MemoryStream stays usable after the gzip stream is closed

$inputFileStream = New-Object System.IO.FileStream $sourcePath, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)
$inputFileStream.CopyTo($gzipStream)

$gzipStream.Close()  # closing the gzip stream flushes the final block and writes the gzip footer

Write-S3Object -BucketName $bucketName -Key $destPath -Stream $outputMemoryStream -ProfileName Dev -Region us-east-1

$inputFileStream.Close()
$outputMemoryStream.Close()
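
In case it helps, here's the direction I'm thinking of for true streaming: drive S3's multipart upload API directly from the AWS SDK for .NET (the same assemblies the AWS Tools for PowerShell module loads), compressing into a small reusable buffer and uploading each ~5 MB compressed chunk as a part. This is a rough, untested sketch, not working code. I'm assuming the synchronous SDK methods (InitiateMultipartUpload, UploadPart, CompleteMultipartUpload, AbortMultipartUpload) are available in my environment (they are on Windows PowerShell / .NET Framework; the modular AWS.Tools builds may only expose async variants), and that Get-AWSCredential (Get-AWSCredentials in older module versions) resolves my "Dev" profile.

$sourcePath  = "c:\temp\myfile.csv"
$bucketName  = "mybucket"
$destPath    = "staging/compress_test/myfile.csv.gz"
$minPartSize = 5MB    # S3's minimum part size (only the last part may be smaller)

$creds  = Get-AWSCredential -ProfileName Dev
$client = New-Object Amazon.S3.AmazonS3Client $creds, ([Amazon.RegionEndpoint]::USEast1)

# Uploads whatever compressed bytes have accumulated in $stream as one part, then empties it
function Send-Part($stream, $number) {
    $stream.Position = 0
    $req = New-Object Amazon.S3.Model.UploadPartRequest
    $req.BucketName  = $bucketName
    $req.Key         = $destPath
    $req.UploadId    = $uploadId
    $req.PartNumber  = $number
    $req.PartSize    = $stream.Length
    $req.InputStream = $stream
    $resp = $client.UploadPart($req)
    $stream.SetLength(0)    # truncate so the buffer can be reused for the next part
    return $resp
}

# Start the multipart upload
$initReq = New-Object Amazon.S3.Model.InitiateMultipartUploadRequest
$initReq.BucketName = $bucketName
$initReq.Key        = $destPath
$uploadId = $client.InitiateMultipartUpload($initReq).UploadId

$partResponses = New-Object 'System.Collections.Generic.List[Amazon.S3.Model.UploadPartResponse]'
$partNumber = 1

# Compress into a small reusable buffer instead of one big MemoryStream
$buffer     = New-Object System.IO.MemoryStream
$gzipStream = New-Object System.IO.Compression.GZipStream $buffer, ([IO.Compression.CompressionMode]::Compress), $true
$inputFileStream = [System.IO.File]::OpenRead($sourcePath)

try {
    $chunk = New-Object byte[] (1MB)
    while (($read = $inputFileStream.Read($chunk, 0, $chunk.Length)) -gt 0) {
        $gzipStream.Write($chunk, 0, $read)
        if ($buffer.Length -ge $minPartSize) {
            $partResponses.Add((Send-Part $buffer $partNumber))
            $partNumber++
        }
    }
    $gzipStream.Close()    # finalizes the gzip footer into $buffer

    # Upload the remaining compressed bytes as the final (possibly short) part
    if ($buffer.Length -gt 0) {
        $partResponses.Add((Send-Part $buffer $partNumber))
    }

    # Tell S3 the object is complete
    $doneReq = New-Object Amazon.S3.Model.CompleteMultipartUploadRequest
    $doneReq.BucketName = $bucketName
    $doneReq.Key        = $destPath
    $doneReq.UploadId   = $uploadId
    $doneReq.AddPartETags($partResponses)
    $null = $client.CompleteMultipartUpload($doneReq)
}
catch {
    # Abort so S3 doesn't keep the orphaned parts around
    $abortReq = New-Object Amazon.S3.Model.AbortMultipartUploadRequest
    $abortReq.BucketName = $bucketName
    $abortReq.Key        = $destPath
    $abortReq.UploadId   = $uploadId
    $null = $client.AbortMultipartUpload($abortReq)
    throw
}
finally {
    $inputFileStream.Close()
    $buffer.Close()
}

The point of the buffer is that it only ever holds roughly one part's worth of compressed data; after each UploadPart it gets truncated and reused, so memory stays around the part size no matter how big the input file is.
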
JeffR
  • You likely need to finalize this file before you try to upload it. Move the `Write-S3Object` command to the end of your function. – FoxDeploy Apr 29 '20 at 17:05
  • Thanks @FoxDeploy. I tried that, but then I get a "Cannot access a closed stream" error. I also tried moving it below the inputFileStream.Close, but that gave me the same result: a file gets loaded but it isn't properly closed. – JeffR Apr 29 '20 at 18:12
  • Could you closely double-check the `$inputFileStream`, `$outputMemoryStream` and `$gzipStream` variables? Something about the three triggers my 'possibly an error here' sense in my nose. – FoxDeploy Apr 29 '20 at 18:21

0 Answers