I want to read 100K+ files from S3 and zip them into a single large file. The individual files range from a few KB to 1 MB, and the final zip can easily exceed 3 GB. Given that AWS Lambda has a memory limit of 3 GB and /tmp storage of 512 MB, how would you do this with AWS Lambda? I am using .NET Core 3.
The code below fails when the zip size goes beyond 3 GB:
var zipStream = new MemoryStream();

using (var zip = new ZipArchive(zipStream, ZipArchiveMode.Create, leaveOpen: true))
{
    for (int i = 0; i < sourceFiles.Count; i++)
    {
        var zipItem = zip.CreateEntry("file" + i + ".pdf");
        using (var entryStream = zipItem.Open())
        {
            var source = GetFileFromS3(sourceFiles[i]);
            await source.CopyToAsync(entryStream);
        }
    }
}

// Upload the zip to S3. For brevity the upload code is not included.
_s3Client.Upload(zipStream);
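For completeness, the elided upload step looks roughly like this sketch (the bucket and key names are placeholders, and I'm assuming the AWS SDK's TransferUtility here). Note it doesn't help with the limit, since the whole archive is still buffered in the MemoryStream:

// Hypothetical sketch of the omitted upload step (AWS SDK for .NET).
// "my-output-bucket" and "archive.zip" are placeholders. The entire
// zip is still held in zipStream, so the 3 GB memory ceiling remains.
zipStream.Position = 0; // rewind the stream before uploading

var transferUtility = new Amazon.S3.Transfer.TransferUtility(_s3Client);
await transferUtility.UploadAsync(zipStream, "my-output-bucket", "archive.zip");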
Most of the examples I have seen for large file processing use Node.js, and they also don't go beyond 3 GB. I am looking for a C# .NET Core example. I am also trying to avoid splitting the output into multiple zip files that are each under 3 GB.
1> How would you do this using AWS Lambda without splitting the zip file? 2> Is there an S3 stream available that would read/write directly from/to S3?