651

How can I write a stored procedure that imports data from a CSV file and populates the table?

Erwin Brandstetter
vardhan

20 Answers

815

Take a look at this short article.

Solution paraphrased here:

Create your table:

CREATE TABLE zip_codes 
(ZIP char(5), LATITUDE double precision, LONGITUDE double precision, 
CITY varchar, STATE char(2), COUNTY varchar, ZIP_CLASS varchar);

Copy data from your CSV file to the table:

COPY zip_codes FROM '/path/to/csv/ZIP_CODES.txt' WITH (FORMAT csv);
Bozhidar Batsov
  • 50
    Actually, using \copy would do the same trick if you do not have superuser access; it complains on my Fedora 16 when using COPY with a non-root account. – asksw0rder Oct 15 '12 at 17:07
  • 82
    TIP: you can indicate what columns you have in the CSV using zip_codes(col1, col2, col3). The columns must be listed in the same order that they appear in the file. – David Pelaez Jan 02 '13 at 05:16
  • 6
    @asksw0rder does \copy have the same syntax? bcoz I'm getting a syntax error with \copy – JhovaniC May 29 '13 at 19:59
  • 6
    Should I include the header row? – bernie2436 Oct 27 '13 at 23:09
  • 131
    You can easily include the header row -- just add HEADER in the options: `COPY zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV HEADER;` http://www.postgresql.org/docs/9.1/static/sql-copy.html – Barrett Clark Nov 08 '13 at 15:17
  • This is REALLY COOL... I typed the copy statement in a query window using PGAdmin and it works beautifully. – itsols Nov 19 '13 at 07:29
  • I have multiple delimiters (like , and "). How can I copy? – user88 Feb 02 '15 at 10:41
  • 1
    How to use (at the client `psql`) FROM `./relativePath/file`? It does not work for me – Peter Krauss Mar 31 '15 at 15:01
  • 2
    Will this overwrite data in an existing table, or append it? – Andy Ray Nov 27 '15 at 22:29
  • @AndyRay " while COPY FROM copies data from a file to a table (appending the data to whatever is in the table already)" from the manual on the link above. http://www.postgresql.org/docs/current/static/sql-copy.html. – Robban1980 Mar 04 '16 at 04:57
  • If you have NULL values in your CSV, define them using the `NULL AS` flag: `COPY zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV NULL AS '';`. Replace with whatever you have in your CSV, often "NULL" or "". – Robin Métral Feb 17 '20 at 08:07
254

If you don't have permission to use COPY (which works on the db server), you can use \copy instead (which works in the db client). Using the same example as Bozhidar Batsov:

Create your table:

CREATE TABLE zip_codes 
(ZIP char(5), LATITUDE double precision, LONGITUDE double precision, 
CITY varchar, STATE char(2), COUNTY varchar, ZIP_CLASS varchar);

Copy data from your CSV file to the table:

\copy zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV

You can also specify the columns to read:

\copy zip_codes(ZIP,CITY,STATE) FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV

See the documentation for COPY:

Do not confuse COPY with the psql instruction \copy. \copy invokes COPY FROM STDIN or COPY TO STDOUT, and then fetches/stores the data in a file accessible to the psql client. Thus, file accessibility and access rights depend on the client rather than the server when \copy is used.

and note:

For identity columns, the COPY FROM command will always write the column values provided in the input data, like the INSERT option OVERRIDING SYSTEM VALUE.
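
Because \copy reads the file on the client side, you can also run it non-interactively from the shell. A minimal sketch, assuming a database named mydb and the same zip_codes table as above:

    psql -d mydb -c "\copy zip_codes FROM '/path/to/csv/ZIP_CODES.txt' DELIMITER ',' CSV"

This is equivalent to streaming the file into COPY FROM STDIN yourself (e.g. cat ZIP_CODES.txt | psql -d mydb -c "COPY zip_codes FROM STDIN WITH (FORMAT csv)"), so only client-side file permissions matter.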

Brian Burns
bjelli
  • \copy voters(ZIP,CITY) FROM '/Users/files/Downloads/WOOD.TXT' DELIMITER ',' CSV HEADER; ERROR: extra data after last expected column CONTEXT: COPY voters, line 2: "OH0012781511,87,26953,HOUSEHOLDER,SHERRY,LEIGH,,11/26/1965,08/19/1988,,211 N GARFIELD ST , ,BLOOMD..." – JZ. Sep 06 '15 at 17:29
  • @JZ. I had a similar error. It was because I had extra blank columns. Check your csv and if you have blank columns, that could be the reason. – alex bennett Jul 08 '16 at 04:32
  • 5
    This is somewhat misleading: the difference between `COPY` and `\copy` is much more than just permissions, and you can't simply add a `\` to make it magically work. See the description (in the context of export) here: http://stackoverflow.com/a/1517692/157957 – IMSoP Jan 26 '17 at 16:29
  • @IMSoP: you're right, I added a mention of server and client to clarify – bjelli Jan 27 '17 at 09:03
  • @bjelli is \copy slower than copy? I have a 1.5MB file and a db.m4.large instance on RDS and it's been hours that this copy command has been running (at least 3). – Sebastian May 28 '18 at 22:38
  • @Sebastian: the important difference is that \copy works from the client, so you still have to transmit all the data to the server. With COPY (no slash) you first upload all the data to the server by other means (sftp, scp) and then do the import on the server. But transmitting 1.5 MB does not sound like it should take 3 hours - no matter which way you do it. – bjelli May 29 '18 at 20:49
  • This worked for me, and I use Windows OS - just change the (absolute) path formatting style. It's good to know that this method is easy to learn and implement as I have been trying to do the same procedure with SQL and it does not work as easy as this method. – Nathan Benton Apr 19 '19 at 22:20
  • I am very confused about the distinction of \copy on client and copy on server, since mysql, mariadb do not have such concepts to trouble users. – AbstProcDo Dec 07 '19 at 02:50
  • https://www.postgresqltutorial.com/import-csv-file-into-posgresql-table/ – ktaria Mar 27 '20 at 10:01
  • I tried to import 16Gb but I got error: ERROR: out of memory DETAIL: Cannot enlarge string buffer containing 1073725476 bytes by 65536 more bytes. – Igor Apr 29 '20 at 13:46
  • @Igor this is a separate question, answered here: https://stackoverflow.com/questions/56714274/postgrest-postgresql-cannot-enlarge-string-buffer-message – bjelli Apr 30 '20 at 14:45
86

One quick way of doing this is with the Python pandas library (version 0.15 or above works best). This will handle creating the columns for you - although obviously the choices it makes for data types might not be what you want. If it doesn't quite do what you want you can always use the 'create table' code generated as a template.

Here's a simple example:

import pandas as pd
df = pd.read_csv('mypath.csv')
df.columns = [c.lower() for c in df.columns] #postgres doesn't like capitals or spaces

from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/dbname')

df.to_sql("my_table_name", engine)

And here's some code that shows you how to set various options:

# Set it so the raw sql output is logged
import logging
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

import sqlalchemy  # needed for the dtype mapping below

df.to_sql("my_table_name2",
          engine,
          if_exists="append",  # options are 'fail', 'replace', 'append'; default 'fail'
          index=False,         # do not output the index of the dataframe
          dtype={'col1': sqlalchemy.types.NUMERIC,
                 'col2': sqlalchemy.types.String})  # datatypes should be sqlalchemy types
Brian Burns
RobinL
  • 6
    In addition, the `if_exists` parameter can be set to replace or append to an existing table, e.g. `df.to_sql("fhrs", engine, if_exists='replace')` – joelostblom Apr 30 '15 at 00:47
  • 1
    username and password: you need to create a Login and assign the DB to the user. If using pgAdmin, create a "Login/Group role" using the GUI – Somnath Kadam Mar 24 '17 at 12:52
  • 12
    Pandas is a super slow way of loading to sql (vs csv files). Can be orders of magnitude slower. – user48956 May 04 '17 at 18:46
  • 1
    This could be a way to write data but it is super slow even with batch and good computing power. Using CSVs is a good way to accomplish this. – Ankit Singh Jul 13 '18 at 14:09
  • 1
    `df.to_sql()` is really slow, you can use `d6tstack.utils.pd_to_psql()` from [d6tstack](https://github.com/d6t/d6tstack) see [performance comparison](https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb) – citynorman Dec 04 '18 at 04:07
36

Most other solutions here require that you create the table in advance/manually. This may not be practical in some cases (e.g., if you have a lot of columns in the destination table). So, the approach below may come in handy.

Providing the path and column count of your csv file, you can use the following function to load your data into a temp table that is then renamed to target_table:

The top row is assumed to have the column names.

create or replace function data.load_csv_file
(
    target_table text,
    csv_path text,
    col_count integer
)

returns void as $$

declare

iter integer; -- dummy integer to iterate columns with
col text; -- variable to keep the column name at each iteration
col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet

begin
    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);

    iter := 1;
    col_first := (select col_1 from temp_table limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row
    execute format('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;

end;

$$ language plpgsql;
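
A hypothetical call (the schema, table name, path and column count are just examples; the function as written assumes it lives in the data schema):

    select data.load_csv_file('zip_codes', '/path/to/csv/ZIP_CODES.txt', 7);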
mehmet
  • 1
    Hi Mehmet, thanks for the answer you posted but when I run your code I get the following error message : ERROR: schema "data" does not exist – user2867432 Nov 08 '16 at 05:34
  • user2867432 you need to change schema name that you use accordingly (e.g., `public`) – mehmet Nov 08 '16 at 13:05
  • Hi Mehmet, thanks for the solution, it's perfect, but this works only if the postgres DB user is a superuser. Is there any way to make it work without superuser? – Geeme Jun 23 '17 at 09:05
  • Geeme: read "security definer" [here](https://www.postgresql.org/docs/current/static/sql-createfunction.html), but I have not used it myself. – mehmet Jun 23 '17 at 16:55
  • Beautiful answer! I am not going to be too generic in my code, though, for readability for others. – Manohar Reddy Poreddy Jan 15 '20 at 09:33
32

You could also use pgAdmin, which offers a GUI to do the import. That's shown in this SO thread. The advantage of using pgAdmin is that it also works for remote databases.

Much like the previous solutions, though, you would need to have your table in the database already. Everyone has their own approach, but what I usually do is open the CSV in Excel, copy the headers, paste special with transposition on a different worksheet, place the corresponding data type in the next column, and then copy and paste that into a text editor together with the appropriate SQL table creation query, like so:

CREATE TABLE my_table (
    /*paste data from Excel here for example ... */
    col_1 bigint,
    col_2 bigint,
    /* ... */
    col_n bigint 
)
Paul
19

As Paul mentioned, import works in pgAdmin:

right click on table -> import

select local file, format and coding

Here is a German pgAdmin GUI screenshot:

pgAdmin import GUI

You can do a similar thing with DbVisualizer (I have a license; not sure about the free version):

right click on a table -> Import Table Data...

DbVisualizer import GUI

Andreas L.
  • 2
    DBVisualizer took 50 seconds to import 1400 rows with three fields -- and I had to cast everything back from a String to whatever it was supposed to be. – Noumenon Sep 29 '16 at 10:46
19
COPY table_name FROM 'path/to/data.csv' DELIMITER ',' CSV HEADER;
Tunaki
timxor
10
  1. Create a table first

  2. Then use the copy command to load the csv data into it:

copy table_name (C1,C2,C3....)
from 'path to your csv file' delimiter ',' csv header;

Thanks

eyllanesc
user9130085
  • 3
    How is this not the accepted answer? Why would I write a python script when the database already has a command to do this? – Wes May 17 '19 at 04:02
9

Personal experience with PostgreSQL, still waiting for a faster way.

1. Create table skeleton first if the file is stored locally:

    drop table if exists ur_table;
    CREATE TABLE ur_table
    (
        id serial NOT NULL,
        log_id numeric, 
        proc_code numeric,
        date timestamp,
        qty int,
        name varchar,
        price money
    );
    COPY 
        ur_table(id, log_id, proc_code, date, qty, name, price)
    FROM '\path\xxx.csv' DELIMITER ',' CSV HEADER;

2. If \path\xxx.csv is not accessible to the PostgreSQL server process (for example, the file is on your client machine, or the server lacks permission to read it), you will have to import the .csv file through pgAdmin's built-in functionality.

Right click the table name and choose Import.

(screenshot of the pgAdmin import dialog)

If you still have problems, please refer to this tutorial: http://www.postgresqltutorial.com/import-csv-file-into-posgresql-table/

flowera
8

Use this SQL code:

    copy table_name(attribute1,attribute2,attribute3...)
    from 'E:\test.csv' delimiter ',' csv header

The header keyword lets the DBMS know that the csv file has a header row with the attribute names.

For more, visit http://www.postgresqltutorial.com/import-csv-file-into-posgresql-table/

alfonx
djdere
7

How to import CSV file data into a PostgreSQL table?

Steps:

  1. Connect to the PostgreSQL database in a terminal

    psql -U postgres -h localhost

  2. Create a database

    create database mydb;

  3. Create a user

    create user siva with password 'mypass';

  4. Connect to the database

    \c mydb;

  5. Create a schema

    create schema trip;

  6. Create a table

    create table trip.test(VendorID int,passenger_count int,trip_distance decimal,RatecodeID int,store_and_fwd_flag varchar,PULocationID int,DOLocationID int,payment_type decimal,fare_amount decimal,extra decimal,mta_tax decimal,tip_amount decimal,tolls_amount int,improvement_surcharge decimal,total_amount decimal
    );

  7. Import the csv file data into PostgreSQL (the COPY column list takes column names only, no types)

    COPY trip.test(VendorID,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount) FROM '/home/Documents/trip.csv' DELIMITER ',' CSV HEADER;

  8. Query the imported data

    select * from trip.test;
    
Martijn Pieters
sivamani
5

IMHO, the most convenient way is to follow "Import CSV data into postgresql, the comfortable way ;-)", using csvsql from csvkit, which is a python package installable via pip.
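
A rough sketch of how that looks (the connection string, table name and file name are assumptions, not from the linked article):

    pip install csvkit
    csvsql --db postgresql://username:password@localhost:5432/dbname --tables zip_codes --insert ZIP_CODES.csv

csvsql infers a CREATE TABLE statement from the CSV header and data, and --insert also loads the rows.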

ruvim
sal
  • 3
    Link rot is voracious! The article you linked to no longer works, which makes me uncomfortable :( – chbrown Jul 27 '16 at 20:18
  • You might want to mention that this is Python. – mountainclimber11 Aug 09 '16 at 14:46
  • 1
    For me I get a MemoryError if trying to import a large CSV so it looks like it doesn't stream. – DavidC Oct 20 '16 at 12:32
  • @DavidC Interesting. How big is your file? How much memory do you have? If it doesn't stream as it appears, I suggest chunking the data before insertion – sal Oct 31 '16 at 12:13
  • @DavidC Or you use the csvsql command without the --insert option and insert the data later via COPY, or you go by RobinL's answer http://stackoverflow.com/a/29722393/2772305 – sal Oct 31 '16 at 12:20
  • 1
    The file was 5GBs in size and I have 2GB memory. I gave up on it and use a script to generate CREATE TABLE and COPY commands in the end. – DavidC Nov 01 '16 at 09:51
  • As an alternative, [d6tstack](https://github.com/d6t/d6tstack) streams and also deals with schema changes, [see examples](https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb) – citynorman Dec 04 '18 at 04:14
3

In Python, you can use this code for automatic PostgreSQL table creation with column names:

import pandas, csv

from io import StringIO
from sqlalchemy import create_engine

def psql_insert_copy(table, conn, keys, data_iter):
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)
        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name
        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

engine = create_engine('postgresql://user:password@localhost:5432/my_db')

df = pandas.read_csv("my.csv")
df.to_sql('my_table', engine, schema='my_schema', method=psql_insert_copy)

It's also relatively fast, I can import more than 3.3 million rows in about 4 minutes.

Marc
3

You can also use pgfutter, or, even better, pgcsv.

These tools create the table columns for you, based on the CSV header.

pgfutter is quite buggy; I'd recommend pgcsv.

Here's how to do it with pgcsv:

sudo pip install pgcsv
pgcsv --db 'postgresql://localhost/postgres?user=postgres&password=...' my_table my_file.csv
Vlad Dinulescu
1

If you need a simple mechanism to import from text / parse a multiline CSV, you could use:

CREATE TABLE t   -- OR INSERT INTO tab(col_names)
AS
SELECT
   t.f[1] AS col1
  ,t.f[2]::int AS col2
  ,t.f[3]::date AS col3
  ,t.f[4] AS col4
FROM (
  SELECT regexp_split_to_array(l, ',') AS f
  FROM regexp_split_to_table(
$$a,1,2016-01-01,bbb
c,2,2018-01-01,ddd
e,3,2019-01-01,eee$$, '\n') AS l) t;

DBFiddle Demo

Lukasz Szozda
1

I created a small tool that imports a csv file into PostgreSQL super easily: just one command and it will create and populate the tables. Unfortunately, at the moment, all automatically created fields use the type TEXT:

csv2pg users.csv -d ";" -H 192.168.99.100 -U postgres -B mydatabase

The tool can be found on https://github.com/eduardonunesp/csv2pg

Eduardo Pereira
  • You made a separate tool for the equivalent of `psql -h 192.168.99.100 -U postgres mydatabase -c "COPY users FROM 'users.csv' DELIMITER ';' CSV"`? I guess the part where it creates the table is nice, but since every field is text it's not super useful – GammaGames Oct 17 '19 at 14:52
  • 1
    Oops, thanks for the heads up. Yes, I did it; well, it took just a few hours and I learned cool stuff in Go and pq and the database API in Go. – Eduardo Pereira Oct 18 '19 at 15:17
1

DBeaver Community Edition (dbeaver.io) makes it trivial to connect to a database, then import a CSV file for upload to a PostgreSQL database. It also makes it easy to issue queries, retrieve data, and download result sets to CSV, JSON, SQL, or other common data formats.

It is a FOSS multi-platform database tool for SQL programmers, DBAs and analysts that supports all popular databases: MySQL, PostgreSQL, SQLite, Oracle, DB2, SQL Server, Sybase, MS Access, Teradata, Firebird, Hive, Presto, etc. It's a viable FOSS competitor to TOAD for Postgres, TOAD for SQL Server, or Toad for Oracle.

I have no affiliation with DBeaver. I love the price (FREE!) and full functionality, but I wish they would open up this DBeaver/Eclipse application more and make it easy to add analytics widgets to DBeaver / Eclipse, rather than requiring users to pay for the $199 annual subscription just to create graphs and charts directly within the application. My Java coding skills are rusty and I don't feel like taking weeks to relearn how to build Eclipse widgets, (only to find that DBeaver has probably disabled the ability to add third-party widgets to the DBeaver Community Edition.)

Can DBeaver power users who are Java developers provide some insight about the steps to create analytics widgets to add into the Community Edition of DBeaver?

Rich Lysakowski PhD
  • It would have been nice to understand how to actually use DBeaver to import a CSV file. Anyway, this might help: https://dbeaver.com/docs/wiki/Data-transfer/ – umbe1987 Apr 17 '20 at 10:01
1

You can create a bash file as import.sh (assuming your CSV file is tab-delimited):

#!/usr/bin/env bash

USER="test"
DB="postgres"
TBALE_NAME="user"
CSV_DIR="$(pwd)/csv"
FILE_NAME="user.txt"

echo $(psql -d $DB -U $USER  -c "\copy $TBALE_NAME from '$CSV_DIR/$FILE_NAME' DELIMITER E'\t' csv" 2>&1 |tee /dev/tty)

And then run this script.
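
For example (assuming the script is saved as import.sh next to a csv/ directory containing user.txt):

    chmod +x import.sh
    ./import.sh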

Mohammad Fallah
1

You can use the pandas library if the file is not very large. Be careful when iterating over pandas DataFrames; I am doing it here only to demonstrate the possibility. One could also consider the pd.DataFrame.to_sql() function when copying from a DataFrame to an SQL table.

Assuming you have created the table you want you could:

import psycopg2
import pandas as pd
data=pd.read_csv(r'path\to\file.csv', delimiter=' ')

#prepare your data and keep only relevant columns

data.drop(['col2', 'col4','col5'], axis=1, inplace=True)
data.dropna(inplace=True)
print(data.iloc[:3])


conn=psycopg2.connect("dbname=db user=postgres password=password")
cur=conn.cursor()

for index, row in data.iterrows():
    # 'my_table' is a placeholder; replace it with your table name
    cur.execute('''insert into my_table (col1,col3,col6)
                   VALUES (%s,%s,%s)''',
                (row['col1'], row['col3'], row['col6']))

cur.close()
conn.commit()

conn.close()
print('\n db connection closed.')
Buffcoder
0

Create the table first, with the columns that match the columns in your csv file.

  1. Open postgres, right click on the target table you want to load, select Import, and update the following steps in the File Options section

  2. Browse for your file in Filename

  3. Select csv as the Format

  4. Set Encoding to ISO_8859_5

Now go to Misc. Options, check Header, and click Import.

Pang
suriruler