
I have a top directory ds237 which has multiple sub-directories under it as below:

ds237/
├── dataset_description.json
├── derivatives
├── sub-01
├── sub-02
├── sub-03
├── sub-04
├── sub-05
├── sub-06
├── sub-07
├── sub-08
├── sub-09
├── sub-10
├── sub-11
├── sub-12
├── sub-13
├── sub-21
├── sub-22
├── sub-23
├── sub-24
├── sub-25
├── sub-26
├── sub-27
├── sub-28
├── sub-29

I am trying to create multiple zip files (with proper zip names) from ds237, split according to the size of the zip files. For example, sub01-07.zip contains sub-01 to sub-07, and sub08-13.zip contains sub-08 to sub-13.

I have written logic that creates a list of sub-directories, e.g. [sub-01, sub-02, sub-03, sub-04, sub-05]. I build the list so that the total size of all the sub-directories in it does not exceed 5 GB.
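The grouping logic itself is not shown in the question. As a rough sketch only, assuming a greedy pass over the sub-directories in order, it could look something like the following. The names `dir_size` and `chunk_by_size` and the `max_bytes` parameter are hypothetical, not taken from the original code.

```python
import os


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if os.path.isfile(full_path):
                total += os.path.getsize(full_path)
    return total


def chunk_by_size(dirs, max_bytes, size_of=dir_size):
    """Greedily group dirs so each group's total size stays within max_bytes.

    A single directory larger than max_bytes still gets a group of its own.
    size_of is injectable (hypothetical convenience) so sizes can be faked.
    """
    chunks, current, current_size = [], [], 0
    for d in dirs:
        size = size_of(d)
        # Start a new group when adding this directory would exceed the limit
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(d)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each inner list returned by `chunk_by_size` would then be one candidate argument for the zipping function asked about below.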

My question is: how can I write a function to zip these sub-directories (which are in a list) into a destination zip file with a proper name? Basically, I want to write a function as follows:

def zipit(subdirs, zip_path):  # called as zipit([list of subdirs], 'path/to/zipfile/sub*-*.zip')
    ...

In Linux, I generally achieve this with:

'zip -r compress/sub01-08.zip ds237/sub-0[1-8]'
JeJo

3 Answers


Looking at https://stackoverflow.com/a/1855118/375530, you can re-use that answer's function to add a directory to a ZipFile.

import os
import zipfile


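def zipdir(path, ziph):
    # ziph is an open zipfile.ZipFile handle
    # Store each file relative to the parent of `path`, so the archive
    # keeps the directory's own name as the top-level entry
    parent = os.path.join(path, '..')
    for root, dirs, files in os.walk(path):
        for file in files:
            full_path = os.path.join(root, file)
            ziph.write(full_path, os.path.relpath(full_path, parent))


def zipit(dir_list, zip_name):
    # Add every directory in dir_list to one compressed archive;
    # the with-block closes the archive even if an error occurs
    with zipfile.ZipFile(zip_name, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for directory in dir_list:
            zipdir(directory, zipf)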

The zipit function should be called with your pre-chunked list and a given name. You can use string formatting if you want to use a programmatic name (e.g. "path/to/zipfile/sub{}-{}.zip".format(start, end)).
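As one possible illustration of the programmatic name, the sketch below derives `start` and `end` from the first and last entries of a chunk. It assumes the directories are named `sub-NN` as in the question; the helper name `zip_name_for` and its `out_dir` parameter are made up for this example.

```python
import os
import re


def zip_name_for(dir_list, out_dir="."):
    """Build a name like 'sub01-07.zip' from ['.../sub-01', ..., '.../sub-07']."""

    def number(path):
        # normpath drops any trailing slash before taking the last component
        base = os.path.basename(os.path.normpath(path))
        match = re.fullmatch(r"sub-(\d+)", base)
        if match is None:
            raise ValueError("unexpected directory name: {!r}".format(base))
        return match.group(1)

    start, end = number(dir_list[0]), number(dir_list[-1])
    return os.path.join(out_dir, "sub{}-{}.zip".format(start, end))
```

The result could then be passed as the second argument, e.g. `zipit(chunk, zip_name_for(chunk, "compress"))`.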

Jerr
  • The above script creates a zip file that excludes the directory's path. Say I zip `/Users/aba/ds100/sub-0[1-6]` into `sub01-06.zip`; when I uncompress the zip, it should generate the path `ds100/sub-01`, and likewise for the other directories. – learnningprogramming Sep 19 '17 at 23:38
  • You can also change the `relpath` to go two directories up from `path`. So change `os.path.join(path, '..')` to `os.path.join(path, '..', '..')` and it should work. – Jerr Sep 20 '17 at 14:54
  • It does the job partially, but when I uncompress `sub01-06.zip` and `sub07-09.zip`, ideally they should uncompress into `ds100/sub-01 ds100/sub-02 ds100/sub-03 ds100/sub-04 ds100/sub-05 ds100/sub-06 ds100/sub-07 ds100/sub-08 ds100/sub-09`. However, the above script with the changes you suggested creates two different `ds100` directories. – learnningprogramming Sep 20 '17 at 21:11
  • Not sure what you're seeing; I ran a similar test and was able to extract both zips to fill in the `ds100` directory. There may be some configuration in your unzip tool. You can also use `unzip zip_file.zip -d output_directory` to unzip the file `zip_file.zip` to `output_directory`. This would also be an alternative to changing the code to put `ds100` in there: you would just specify the output directory as `ds100`. – Jerr Sep 22 '17 at 06:38
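For anyone who wants to check the extraction behaviour from Python instead of an unzip tool, here is a small sketch. It builds two throwaway archives whose entries both start with `ds100/` and extracts them into the same output directory with `ZipFile.extractall`. The helper names `extract_all` and `demo` and the placeholder file contents are invented for the illustration.

```python
import os
import tempfile
import zipfile


def extract_all(zip_paths, output_dir):
    """Extract several archives into the same output directory."""
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(output_dir)


def demo():
    """Create two tiny archives sharing a top-level folder, then merge them."""
    with tempfile.TemporaryDirectory() as tmp:
        zips = []
        for sub in ("sub-01", "sub-07"):
            zip_path = os.path.join(tmp, sub + ".zip")
            with zipfile.ZipFile(zip_path, "w") as zf:
                # Entry names carry the shared ds100/ prefix
                zf.writestr("ds100/{}/data.txt".format(sub), "placeholder")
            zips.append(zip_path)
        out = os.path.join(tmp, "out")
        extract_all(zips, out)
        # List what ended up under the single ds100 directory
        return sorted(os.listdir(os.path.join(out, "ds100")))
```

With archives laid out this way, both sub-directories should end up under one `ds100` folder in the output directory.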

You can use the subprocess module to call 'zip', passing the paths as arguments.
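A minimal sketch of that idea, assuming the `zip` command-line tool is installed and on the PATH, might look like this. The function names `build_zip_command` and `zipit_cli` and the `cwd` parameter are hypothetical choices for the example.

```python
import subprocess


def build_zip_command(dir_list, zip_name):
    """Return the argument list equivalent to: zip -r zip_name dir1 dir2 ..."""
    return ["zip", "-r", zip_name] + list(dir_list)


def zipit_cli(dir_list, zip_name, cwd=None):
    """Run the external zip tool; raises CalledProcessError if it fails.

    Paths are passed as separate arguments and no shell is involved, so a
    shell pattern such as sub-0[1-8] is not expanded by a shell here;
    pass the explicit directory paths instead.
    """
    subprocess.run(build_zip_command(dir_list, zip_name), check=True, cwd=cwd)
```

Running it with `cwd` set to the parent of the dataset directory and relative paths such as `ds237/sub-01` should mirror the command from the question.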

doze

The following will give you a zip file whose top-level folder is ds100:

import os
import zipfile

def zipit(folders, zip_filename):
    # Paths are stored relative to two levels above the first folder,
    # so every entry in the archive starts with the dataset directory (ds100)
    base = os.path.join(folders[0], '../..')

    with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zip_file:
        for folder in folders:
            for dirpath, dirnames, filenames in os.walk(folder):
                for filename in filenames:
                    full_path = os.path.join(dirpath, filename)
                    zip_file.write(full_path, os.path.relpath(full_path, base))


folders = [
    "/Users/aba/ds100/sub-01",
    "/Users/aba/ds100/sub-02",
    "/Users/aba/ds100/sub-03",
    "/Users/aba/ds100/sub-04",
    "/Users/aba/ds100/sub-05"]

zipit(folders, "/Users/aba/ds100/sub01-05.zip")

For example, sub01-05.zip would have a structure similar to:

ds100
├── sub-01
|   ├── 1
|       ├── 2
|   ├── 1
|   ├── 2
├── sub-02
    ├── 1
        ├── 2
    ├── 1
    ├── 2
Martin Evans