16

I have a big compressed csv file (25gb) and I want to import it into PostgreSQL 9.5 version. Is there any fast way to import zip or qzip file into postgres without extracting the file?

wildplasser
  • 38,231
  • 6
  • 56
  • 94
Arezoo
  • 312
  • 3
  • 14
  • 3
    There is nothing built-in, but `copy` can get the input from a program so you could run unzip and pipe the output into the `copy` (or `\copy`) command. Some SQL clients also offer that feature – a_horse_with_no_name Jan 19 '17 at 10:07
  • I am using pgAdmin III client which does not have option for zip file. using this `copy` command that you mentioned would be faster or extracting the file first then import into postgresql? – Arezoo Jan 19 '17 at 10:14
  • COPY is the command used to IMPORT and EXPORT Data from/to CSV and which doesn't have capability to extract a zip file.If you want to import using pgAdmin what you've to do is extract the zip file then use COPY to import extracted CSV file – Vivek S. Jan 19 '17 at 10:18

3 Answers3

24

There is an old trick to use a named pipe (works on Unix, don't know about Windows)

  • create a named pipe: mkfifo /tmp/omyfifo
  • write the file contents to it: zcat mycsv.csv.z > /tmp/omyfifo &
  • [from psql] copy mytable(col1,...) from '/tmp/omyfifo'
  • [when finished] : rm /tmp/omyfifo

The zcat in the backgound will block until a reader (here: the COPY command) will start reading, and it will finish at EOF. (or if the reader closes the pipe)

You could even start multiple pipes+zcat pairs, which will be picked up by multiple COPY statements in your sql script.


This will work from pgadmin, but the fifo (+zcat process) should be present on the machine where the DBMS server runs.


BTW: a similar trick using netcat can be used to read a file from a remote machine (which of course should write the file to the network socket)

joop
  • 3,864
  • 1
  • 12
  • 22
18

example how to do it with zcat and pipe:

-bash-4.2$ psql -p 5555 t -c "copy tp to '/tmp/tp.csv';"
COPY 1
-bash-4.2$ gzip /tmp/tp.csv
-bash-4.2$ zcat /tmp/tp.csv.gz | psql -p 5555 t -c "copy tp from stdin;"
COPY 1
-bash-4.2$ psql -p 5555 t -c "select count(*) from tp"
 count
-------
     2
(1 row)

also from 9.3 release you can:

psql -p 5555 t -c "copy tp from program 'zcat /tmp/tp.csv.gz';"

without pipe at all

Vao Tsun
  • 37,644
  • 8
  • 70
  • 98
  • 2
    With the `program` attribute, you can move the `zcat` inside the copy command itself. `psql -p 5555 t -c "copy tp from program 'zcat /tmp/tp.csv.gz';"` The ability to run programs that generate the output is independently useful. (https://www.postgresql.org/docs/9.5/static/sql-copy.html) – Canonical Chris Feb 06 '18 at 18:31
  • 1
    true - `copy from stdin` requires SU anyway - so could be `copy form program` – Vao Tsun Feb 06 '18 at 18:52
1

If you have a ZIP (.zip) instead of a GZIP (.gz) archive, you can use unzip -p to pipe the zipped file.

psql -p 5555 t -c "copy tp from program 'unzip -p /tmp/tp.csv.zip';"
andschar
  • 1,703
  • 1
  • 14
  • 26