Code Monkey home page Code Monkey logo

pgfutter's People

Contributors

aguynamedryan avatar ghidinelli avatar lukasmartinelli avatar mattmacleod avatar mrsaints avatar nk412 avatar rrbarbosa avatar toneill-newinnov avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pgfutter's Issues

pgfutter issue when use fields "x,y,z" and delimiter ";"

/opt/jboss/pgfutter --db database --table tableName --user userName--pass password --host local --schema public csv --fields "columna0,columna1,columna2,columna3,columna4,columna5,columna6,columna7,columna8,columna9,columna10,columna11,columna12,columna13,columna14,columna15,columna16,columna17,columna18,columna19,columna20,columna21,columna22,columna23,columna24,columna25,columna26,columna27,columna28,columna29,columna30,columna31,columna32" --delimiter ";" filename.csv

pgfutter reports "wrong number of fields in line" when execute command on windows it works, but when i run the same file with the same command i got this error

line 1, column 0: wrong number of fields in line: NO. OBLIGACION;CEDULA;NOMBRE DEUDOR;DIRECCION DEUDOR;PROYECTO1;DIRECCION CASA ADQUIRIDA;TELEFONO DOMICILIO;CIUDAD DOMICILIO;TELEFONO CONTACTO 1;TELEFONO CONTACTO 2;TELEFONO CONTACTO 3;NIT EMPRESA;NOMBRE EMPRESA;DIRECCION EMPRESA;TELEFONO EMPRESA;CIUDAD EMPRESA;VALOR VIVIENDA;MONTO INICIAL;PLAZO;SALDO EN PESOS;VALOR CUOTA;TASA DE INTERES;DIAS EN MORA;VALOR VENCIDO;NO. CUOTAS PENDIENTES;FECHA ULTIMO PAGO;CICLO FACT.;SALDO A IMPACTAR;FECHA DESEMBOLSO;FECHA PROX. PAGO;OBSERVACIONES;EXTRA1;EXTRA2;

-d <delimiter> does not work with tab

using a custom delimiter I am getting this:

pgfutter --dbname tester --host mydb --port 32063 --username tester csv -d "\t" ~/sc/data/jobs.csv
Please specify the correct delimiter with -d.
Header column contains a delimiter character: job_db_inx account cpus_req exit_code job_name id_job id_user nodelist priority state partition timelimit time_submit time_eligible time_start time_end time_suspended

Pgfutter exiting with no errors

I'm trying to import a ~1000 row CSV with 40 headers and PgFutter is able to find the headers and shows them but then it just exits, no table created, no error, nothing.
./pgfutter_linux_386 --username "TEST" --pw "TEST" --db "api" --schema "public" --table "ola" csv hola.csv
delimiter in header column detected: LAT01;LON01;LAT02;LON02;LAT03;LON03;LAT04;LON04;LAT05;LON05;LAT06;LON06;LAT07;LON07;LAT08;LON08;LAT09;LON09;LAT10;LON10;LAT11;LON11;LAT12;LON12;LAT13;LON13;LAT14;LON14;LAT15;LON15;LAT16;LON16;LAT17;LON17;LAT18;LON18;LAT19;LON19;LAT20;LON20;LAT21;LON21;LAT22;LON22;LAT23;LON23;LAT24;LON24;LAT25;LON25;LAT26;LON26;LAT27;LON27;LAT28;LON28;LAT29;LON29;LAT30;LON30;LAT31;LON31;LAT32;LON32;LAT33;LON33;LAT34;LON34;LAT35;LON35;

Specifying data format

This is not an issue report, but just a request.
I think that data format is obtained from the extension suffix of an input file. So, the following usage would be simple:
$ pgfutter traffic_violations.csv

Specifying data format would be also welcome as follows:
$ pgfutter --json unknown_format_data_file

CR+LF EOL not handled correctly in Windows

Using pgfutter_windows_amd64.exe binary on Windows 10, with CSV files edited in Notepad++ in UTF-8 encoding:
If the EOL delimiter is set to CR+LF (ie Windows standard), a CR (ie ASCII 13) is appended to each entry in the last column if it's a text field, and if not an additional row of blank entries is added to the table.
If the EOL is set to LF only (ie Linux standard), then pgfutter fails with this exception:

2 columns
[serial_id some_text]
 127 B / 127 B [==============================================================================================] 100.00%panic: runtime error: index out of range

goroutine 1 [running]:
main.copyCSVRows(0xc04214e190, 0xc042146140, 0x0, 0x740f3a, 0x1, 0xc04214a500, 0x2, 0x2, 0xc04214e190, 0x0, ...)
        /usr/src/pgfutter/csv.go:99 +0x74e
main.importCSV(0xc04200e180, 0x39, 0xc0420140e0, 0x64, 0x743323, 0x6, 0xc04200a980, 0x11, 0x0, 0x0, ...)
        /usr/src/pgfutter/csv.go:172 +0x3fd
main.main.func2(0xc04209c420, 0xc042041200, 0xc04209c420)
        /usr/src/pgfutter/pgfutter.go:164 +0x396
github.com/codegangsta/cli.HandleAction(0x6e4f40, 0x75a8e0, 0xc04209c420, 0x0, 0xc0420347e0)
        /gopath/src/github.com/codegangsta/cli/app.go:501 +0xd9
github.com/codegangsta/cli.Command.Run(0x7412f4, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x74d1f6, 0x18, 0x0, ...)
        /gopath/src/github.com/codegangsta/cli/command.go:165 +0x4c2
github.com/codegangsta/cli.(*App).Run(0xc0420f2000, 0xc042064000, 0xb, 0x10, 0x0, 0x0)
        /gopath/src/github.com/codegangsta/cli/app.go:259 +0x747
main.main()
        /usr/src/pgfutter/pgfutter.go:170 +0xcd5

I used the attached 2 files to test this:
test_data_with_CRLF.csv.txt
test_data_with_LF.csv.txt

field names don't seem to be quoted propertly.

I have a csv file where one of the fields is called 'references". When I attempt to import I get this error.

pq: syntax error at or near "references"

I rename the field to be _references and it gives the same error but for the field "order". I fix that and it imports.

I presume this is because the name is not being quoted?

Thanks.

& in Column name produces error

I tried importing a CSV file which contains ampersands in the header line. Example File:

"Login Name","Starting Date & Time"
"test","2014-07-01T07:41"
"nobdy","2014-07-01T08:00"

On Windows 7 this file results in the error message

pq: syntax error at or near "&"

Removing the ampersand resolves the error.

Windows/DOS end of line causes pgfutter to treat all data as one row

Great software and very helpful! Also withstands the csv-spectrum tests. Thumbs up!

I just identified an issue with the line endings. If the CSV file has Windows EOLs (and, for that matter, I would think legacy Mac ones too), then pgfutter treats it as containing one row only - basically the header. Thus it throws the weirdest of errors with some otherwise valid (in the Apple/Microsoft worlds) files.

To test that for yourself, you could use one of these files: file1, file2.

For those users having the same issue, a temporary fix would be to [programmatically] pass the file through a Bash script which replaces the EOLs. Good cross platform instructions here.

Does not accept wildcards in file names

When I try to run pgfutter on all the csv files in my current directory, using *.csv as the file name specification, it only runs on the first file it encounters, and stops. I had to use gnu parallels to run it on all files (which it did beautifully, so not a big issue, but still counter-intuitive).

pgfutter dies with a large stacktrace randomly while importing a 150,000 line file

I have about 300,000 lines of data to load. pgfutter always dies unless I split the file into 2 and then is dies sometimes. Usually 3 tries and it takes.

Here's the stack trace

failed MSpanList_Insert 0x5e4f70 0x6c6f6e0f0130 0x0
fatal error: MSpanList_Insert

runtime stack:
runtime.MSpanList_Insert(0x578850, 0x5e4f70)
    /goroot/src/runtime/mheap.c:692 +0x8f
runtime.MHeap_Free(0x5787e0, 0x5e4f70, 0x0)
    /goroot/src/runtime/mheap.c:500 +0x5b
runtime.MCentral_FreeSpan(0x57fb38, 0x5e4f70, 0x40, 0xc20809c000, 0xc20809df80, 0x0, 0x64)
    /goroot/src/runtime/mcentral.c:181 +0x1bb
runtime.MSpan_Sweep(0x5e4f70, 0xff00000000, 0x1)
    /goroot/src/runtime/mgc0.c:1099 +0x477
runtime.MHeap_Alloc(0x5787e0, 0x1, 0x1000000000d, 0x15159)
    /goroot/src/runtime/mheap.c:240 +0x66
runtime.MCentral_CacheSpan(0x57ffb8, 0xc2080f8708)
    /goroot/src/runtime/mcentral.c:85 +0x167
runtime.MCache_Refill(0x5e0000, 0xd, 0x10005e4820)
    /goroot/src/runtime/mcache.c:90 +0xa0

goroutine 1 [running]:
runtime.switchtoM()
    /goroot/src/runtime/asm_amd64.s:198 fp=0xc2080f87c8 sp=0xc2080f87c0
runtime.mallocgc(0xc0, 0x0, 0x3, 0x0)
    /goroot/src/runtime/malloc.go:178 +0x849 fp=0xc2080f8878 sp=0xc2080f87c8
runtime.rawstring(0xbe, 0x0, 0x0, 0x0, 0x0, 0x0)
    /goroot/src/runtime/string.go:195 +0x93 fp=0xc2080f88a8 sp=0xc2080f8878
runtime.slicebytetostring(0xc208082480, 0xbe, 0x207, 0x0, 0x0)
    /goroot/src/runtime/string.go:69 +0x4b fp=0xc2080f8938 sp=0xc2080f88a8
encoding/csv.(*Reader).parseRecord(0xc20805c320, 0xc208141580, 0x7, 0x8, 0x0, 0x0)
    /goroot/src/encoding/csv/reader.go:235 +0x449 fp=0xc2080f89f8 sp=0xc2080f8938
encoding/csv.(*Reader).Read(0xc20805c320, 0x0, 0x0, 0x0, 0x0, 0x0)
    /goroot/src/encoding/csv/reader.go:136 +0x5e fp=0xc2080f8a60 sp=0xc2080f89f8
main.copyCSVRows(0xc208033aa0, 0xc20805c320, 0x0, 0xc2080337f5, 0x1, 0xc2080c8100, 0x8, 0x8, 0x0, 0x0, ...)
    /usr/src/pgfutter/csv.go:57 +0xa9 fp=0xc2080f8c08 sp=0xc2080f8a60
main.importCSV(0x7fff5fbffb01, 0x13, 0xc2080f22a0, 0x6b, 0xc208033606, 0x6, 0xc208033630, 0xc, 0x0, 0xc2080f2230, ...)
    /usr/src/pgfutter/csv.go:144 +0x530 fp=0xc2080f8e40 sp=0xc2080f8c08
main.func·003(0xc208080a00)
    /usr/src/pgfutter/pgfutter.go:172 +0x332 fp=0xc2080f8f60 sp=0xc2080f8e40
runtime.call16(0x440268, 0xc2080580b0, 0x800000008)
    /goroot/src/runtime/asm_amd64.s:401 +0x45 fp=0xc2080f8f78 sp=0xc2080f8f60
reflect.Value.call(0x29e880, 0x440268, 0x13, 0x382790, 0x4, 0xc2080f93d0, 0x1, 0x1, 0x0, 0x0, ...)
    /goroot/src/reflect/value.go:419 +0x10e5 fp=0xc2080f9290 sp=0xc2080f8f78
reflect.Value.Call(0x29e880, 0x440268, 0x13, 0xc2080f93d0, 0x1, 0x1, 0x0, 0x0, 0x0)
    /goroot/src/reflect/value.go:296 +0xbc fp=0xc2080f92f0 sp=0xc2080f9290
github.com/codegangsta/cli.HandleAction(0x29e880, 0x440268, 0xc208080a00, 0x0, 0x0)
    /gopath/src/github.com/codegangsta/cli/app.go:487 +0x2b1 fp=0xc2080f9420 sp=0xc2080f92f0
github.com/codegangsta/cli.Command.Run(0x38eb70, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3dfcf0, 0x18, 0x0, ...)
    /gopath/src/github.com/codegangsta/cli/command.go:191 +0x10a5 fp=0xc2080f9998 sp=0xc2080f9420
github.com/codegangsta/cli.(*App).Run(0xc2080f4000, 0xc20800a000, 0xf, 0xf, 0x0, 0x0)
    /gopath/src/github.com/codegangsta/cli/app.go:240 +0xa76 fp=0xc2080f9c10 sp=0xc2080f9998
main.main()
    /usr/src/pgfutter/pgfutter.go:178 +0xcd3 fp=0xc2080f9f98 sp=0xc2080f9c10
runtime.main()
    /goroot/src/runtime/proc.go:63 +0xf3 fp=0xc2080f9fe0 sp=0xc2080f9f98
runtime.goexit()
    /goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc2080f9fe8 sp=0xc2080f9fe0

goroutine 5 [syscall]:
os/signal.loop()
    /goroot/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
    /goroot/src/os/signal/signal_unix.go:27 +0x35

goroutine 6 [chan receive]:
database/sql.(*DB).connectionOpener(0xc20805d720)
    /goroot/src/database/sql/sql.go:589 +0x4c
created by database/sql.Open
    /goroot/src/database/sql/sql.go:452 +0x31c

goroutine 9 [IO wait]:
net.(*pollDesc).Wait(0xc2080100d0, 0x72, 0x0, 0x0)
    /goroot/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2080100d0, 0x0, 0x0)
    /goroot/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc208010070, 0xc2080cc000, 0x1000, 0x1000, 0x0, 0x5f1da0, 0xc208157408)
    /goroot/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc208058030, 0xc2080cc000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /goroot/src/net/net.go:121 +0xdc
bufio.(*Reader).fill(0xc2080961e0)
    /goroot/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).Read(0xc2080961e0, 0xc208034020, 0x5, 0x200, 0x0, 0x0, 0x0)
    /goroot/src/bufio/bufio.go:174 +0x26c
io.ReadAtLeast(0x5f3398, 0xc2080961e0, 0xc208034020, 0x5, 0x200, 0x5, 0x0, 0x0, 0x0)
    /goroot/src/io/io.go:298 +0xf1
io.ReadFull(0x5f3398, 0xc2080961e0, 0xc208034020, 0x5, 0x200, 0x0, 0x0, 0x0)
    /goroot/src/io/io.go:316 +0x6d
github.com/lib/pq.(*conn).recvMessage(0xc208034000, 0xc208153b00, 0x0, 0x0, 0x0)
    /gopath/src/github.com/lib/pq/conn.go:892 +0x17a
github.com/lib/pq.(*copyin).resploop(0xc20805a3c0)
    /gopath/src/github.com/lib/pq/copy.go:143 +0x55
created by github.com/lib/pq.(*conn).prepareCopyIn
    /gopath/src/github.com/lib/pq/copy.go:90 +0xae5

goroutine 10 [select]:
github.com/cheggaaa/pb.(*ProgressBar).refresher(0xc208080000)
    /gopath/src/github.com/cheggaaa/pb/pb.go:423 +0xff
created by github.com/cheggaaa/pb.(*ProgressBar).Start
    /gopath/src/github.com/cheggaaa/pb/pb.go:123 +0xcd

SSL Mode

Hi,

I'm trying to use PGFutter to import Json into a Heroku Postgres instance.

I've got everything setup but I'm getting this error: pq: no pg_hba.conf entry for host "198.11.218.38", user "xxx", database "xxx", SSL off.

I can't find a flag to enable sslmode=require. Is that possible? Or is there another issue I'm facing?

Thanks!

Invalid memory address error

Hey there,

Is there a column limitation to what pgfutter can import? Here's my code and resulting error message. These files have around 500 columns with a header row, but with every iteration of the file it throws an error, including after I reopen and save the .csv as UTF-8 encoding. Other, smaller files work just fine.

 ./pgfutter --user "pgfutter" \
     --pass "pgfutter" \
     --schema "model" \
     csv '/Users/user/Desktop/scwUTF.csv'

Here's the error that results. Other, much smaller .csv files import without any issue.

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x7c641]

goroutine 1 [running]:
database/sql.(*Stmt).Exec(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /goroot/src/database/sql/sql.go:1302 +0x3e1
main.NewCSVImport(0xc208035400, 0xc2080332a0, 0x8, 0xc208033420, 0x6, 0xc208108000, 0x1f1, 0x200, 0x0, 0x0, ...)
    /usr/src/pgfutter/import.go:21 +0xfd
main.importCSV(0x7fff5fbffbf9, 0x2b, 0xc208011f80, 0x6c, 0xc2080332a0, 0x8, 0xc208033420, 0x6, 0x0, 0x5511f0, ...)
    /usr/src/pgfutter/csv.go:127 +0x444
main.func·002(0xc2080ac0f0)
    /usr/src/pgfutter/pgfutter.go:141 +0x369
github.com/codegangsta/cli.Command.Run(0x3784b0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3c7d10, 0x18, 0x0, ...)
    /gopath/src/github.com/codegangsta/cli/command.go:127 +0xffe
github.com/codegangsta/cli.(*App).Run(0xc2080a8200, 0xc20800a000, 0x9, 0x9, 0x0, 0x0)
    /gopath/src/github.com/codegangsta/cli/app.go:159 +0xd7b
main.main()
    /usr/src/pgfutter/pgfutter.go:147 +0xaca

Thanks!

Drop table if exists option

Hello,

Would it be possible to add a parameter to drop (or truncate) the table if it already exists ?

Thanks

Homebrew formula

This is a really useful tool for getting unwieldy CSVs into Postgres for quick analysis.

Have you considered creating a Homebrew formula for it, for easy installation on OSX?

pgfutter --help throws exception

Downloaded and installed pgfutter binary on Mac (OSX 10.13.5) per readme. When I run pgfutter --help I get the exception listed below. I upgraded go from 1.10.2 -> 1.10.3 but it doesn't seem to help (I'm not sure if this is relevant)

failed MSpanList_Insert 0x5e4000 0x9a2e78b9883 0x0
fatal error: MSpanList_Insert

runtime stack:
runtime.throw(0x552fab)
	/goroot/src/runtime/panic.go:491 +0xad fp=0x7ffeefbff550 sp=0x7ffeefbff520
runtime.MSpanList_Insert(0x57bbe8, 0x5e4000)
	/goroot/src/runtime/mheap.c:692 +0x8f fp=0x7ffeefbff578 sp=0x7ffeefbff550
MHeap_FreeSpanLocked(0x5787e0, 0x5e4000, 0x100)
	/goroot/src/runtime/mheap.c:583 +0x163 fp=0x7ffeefbff5b8 sp=0x7ffeefbff578
MHeap_Grow(0x5787e0, 0x8, 0x0)
	/goroot/src/runtime/mheap.c:420 +0x1a8 fp=0x7ffeefbff5f8 sp=0x7ffeefbff5b8
MHeap_AllocSpanLocked(0x5787e0, 0x1, 0x0)
	/goroot/src/runtime/mheap.c:298 +0x365 fp=0x7ffeefbff638 sp=0x7ffeefbff5f8
mheap_alloc(0x5787e0, 0x1, 0x12, 0x0)
	/goroot/src/runtime/mheap.c:190 +0x121 fp=0x7ffeefbff660 sp=0x7ffeefbff638
runtime.MHeap_Alloc(0x5787e0, 0x1, 0x10000000012, 0x15159)
	/goroot/src/runtime/mheap.c:240 +0x66 fp=0x7ffeefbff698 sp=0x7ffeefbff660
MCentral_Grow(0x580558, 0x0)
	/goroot/src/runtime/mcentral.c:197 +0x8b fp=0x7ffeefbff700 sp=0x7ffeefbff698
runtime.MCentral_CacheSpan(0x580558, 0x0)
	/goroot/src/runtime/mcentral.c:85 +0x167 fp=0x7ffeefbff738 sp=0x7ffeefbff700
runtime.MCache_Refill(0x5e0000, 0x12, 0x0)
	/goroot/src/runtime/mcache.c:90 +0xa0 fp=0x7ffeefbff760 sp=0x7ffeefbff738
runtime.mcacheRefill_m()
	/goroot/src/runtime/malloc.c:368 +0x57 fp=0x7ffeefbff780 sp=0x7ffeefbff760
runtime.onM(0x440590)
	/goroot/src/runtime/asm_amd64.s:273 +0x9a fp=0x7ffeefbff788 sp=0x7ffeefbff780
runtime.mallocgc(0x120, 0x36db00, 0x0, 0x0)
	/goroot/src/runtime/malloc.go:178 +0x849 fp=0x7ffeefbff838 sp=0x7ffeefbff788
runtime.newobject(0x36db00, 0x5e0000)
	/goroot/src/runtime/malloc.go:353 +0x49 fp=0x7ffeefbff860 sp=0x7ffeefbff838
runtime.newG(0x2e27a)
	/goroot/src/runtime/proc.go:233 +0x2a fp=0x7ffeefbff878 sp=0x7ffeefbff860
allocg(0x5690a0)
	/goroot/src/runtime/proc.c:925 +0x1f fp=0x7ffeefbff888 sp=0x7ffeefbff878
runtime.malg(0x8000, 0x569320)
	/goroot/src/runtime/proc.c:2106 +0x1f fp=0x7ffeefbff8b8 sp=0x7ffeefbff888
runtime.mpreinit(0x569900)
	/goroot/src/runtime/os_darwin.c:137 +0x27 fp=0x7ffeefbff8d0 sp=0x7ffeefbff8b8
mcommoninit(0x569900)
	/goroot/src/runtime/proc.c:201 +0xc9 fp=0x7ffeefbff8f8 sp=0x7ffeefbff8d0
runtime.schedinit()
	/goroot/src/runtime/proc.c:138 +0x55 fp=0x7ffeefbff920 sp=0x7ffeefbff8f8
runtime.rt0_go(0x7ffeefbff950, 0x2, 0x7ffeefbff950, 0x0, 0x2, 0x7ffeefbffad0, 0x7ffeefbffad9, 0x0, 0x7ffeefbffae0, 0x7ffeefbffb1c, ...)
	/goroot/src/runtime/asm_amd64.s:95 +0x116 fp=0x7ffeefbff928 sp=0x7ffeefbff920

Does Not accept cyrillic letters?

Hello, when i create table from csv with rows who have cyrillic letters its not recognized by postgreSQL
pgfutter --db "Test" --port "5432" --user "postgres" --pw "1111" csv numbertest.csv

Test=# select * from import.numbertest;
ERROR: character with byte sequence 0xef 0xbf 0xbd in encoding "UTF8" has no equivalent in encoding "WIN1251"

Any suggestions?

null values in csv are imported as empty strings

I have a csv file which contains a number of null values in string columns. For example:

col_a, col_b,col_c
"val a",,"val c"
"val b",,"val c"

Once imported, the empty values are cast into empty strings. Is there a way to get it to import them as null?

Import text values as null instead of blank

When importing data to a Postgres database use the copy command default behavior which is null values instead of blank ones.

For an experienced DBA, it is expected that the default value of a blank text field to be null because if you insert data on a table and do not specify any value as a default for that field the database is going to insert it as a null value.

quoting Postgres doc:
"If no default value is declared explicitly, the default value is the null value. This usually makes sense because a null value can be considered to represent unknown data." - https://www.postgresql.org/docs/current/ddl-default.html

pgfutter gets killed when uploading large csv file

Hi,

When trying to upload a 1.87GB csv file with a few million rows through pgfutter, around 40% in of the upload it fails without much information:

 756.80 MB / 1.87 GB [==============================================================================================================================================================================>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------]  39.62% 45s    74 Killed                  pgfutter --pw $dbpw --schema public --table my_upload_table csv upload.csv

Is there something like a verbose option to better troubleshoot and see why it is failing?

I think it could be a wrong formatted row in my csv file and/or OOM. Help appreciated.

Best,
FB

Not compatible with go1.8

Got these error for go1.8

 31.77 MB / 395.06 MB [=============>----------------------------------------------------------------------------------------------------------------------------------------------------------------]   8.04% 18sfailed MSpanList_Insert 0x5e4e38 0x134391768dc45 0x0
fatal error: MSpanList_Insert

runtime stack:
runtime.MSpanList_Insert(0x578850, 0x5e4e38)
	/goroot/src/runtime/mheap.c:692 +0x8f
runtime.MHeap_Free(0x5787e0, 0x5e4e38, 0x0)
	/goroot/src/runtime/mheap.c:500 +0x5b
runtime.MCentral_FreeSpan(0x57ffb8, 0x5e4e38, 0x3, 0xc20817c000, 0xc20817c180, 0x0, 0x64)
	/goroot/src/runtime/mcentral.c:181 +0x1bb
runtime.MSpan_Sweep(0x5e4e38, 0x4bb00000400, 0x1)
	/goroot/src/runtime/mgc0.c:1099 +0x477
runtime.MHeap_Alloc(0x5787e0, 0x1, 0x10000000017, 0x15159)
	/goroot/src/runtime/mheap.c:240 +0x66
runtime.MCentral_CacheSpan(0x580af8, 0xc2080f4980)
	/goroot/src/runtime/mcentral.c:85 +0x167
runtime.MCache_Refill(0x5e0000, 0xc200000017, 0x5e5998)
	/goroot/src/runtime/mcache.c:90 +0xa0

goroutine 1 [running]:
runtime.switchtoM()
	/goroot/src/runtime/asm_amd64.s:198 fp=0xc2080f4970 sp=0xc2080f4968
runtime.mallocgc(0x1c0, 0x0, 0xc200000003, 0x0)
	/goroot/src/runtime/malloc.go:178 +0x849 fp=0xc2080f4a20 sp=0xc2080f4970
runtime.rawstring(0x1a2, 0x0, 0x0, 0x0, 0x0, 0x0)
	/goroot/src/runtime/string.go:195 +0x93 fp=0xc2080f4a50 sp=0xc2080f4a20
runtime.slicebytetostring(0xc208133dc0, 0x1a2, 0x1a2, 0x0, 0x0)
	/goroot/src/runtime/string.go:69 +0x4b fp=0xc2080f4ae0 sp=0xc2080f4a50
main.copyJSONRows(0xc208032b40, 0xc2080923c0, 0xc208092300, 0x0, 0x0, 0x38efd0, 0x4)
	/usr/src/pgfutter/json.go:52 +0x68b fp=0xc2080f4c40 sp=0xc2080f4ae0
main.importJSON(0x7fff5fbff9a8, 0x14, 0xc2080ee230, 0x64, 0xc208033396, 0x6, 0xc20801ebc0, 0x14, 0x100, 0x393cf0, ...)
	/usr/src/pgfutter/json.go:133 +0xd25 fp=0xc2080f4e60 sp=0xc2080f4c40
main.func·001(0xc208048a00)
	/usr/src/pgfutter/pgfutter.go:119 +0x282 fp=0xc2080f4f60 sp=0xc2080f4e60
runtime.call16(0x440258, 0xc2080360b0, 0x800000008)
	/goroot/src/runtime/asm_amd64.s:401 +0x45 fp=0xc2080f4f78 sp=0xc2080f4f60
reflect.Value.call(0x29e880, 0x440258, 0x13, 0x382790, 0x4, 0xc2080f53d0, 0x1, 0x1, 0x0, 0x0, ...)
	/goroot/src/reflect/value.go:419 +0x10e5 fp=0xc2080f5290 sp=0xc2080f4f78
reflect.Value.Call(0x29e880, 0x440258, 0x13, 0xc2080f53d0, 0x1, 0x1, 0x0, 0x0, 0x0)
	/goroot/src/reflect/value.go:296 +0xbc fp=0xc2080f52f0 sp=0xc2080f5290
github.com/codegangsta/cli.HandleAction(0x29e880, 0x440258, 0xc208048a00, 0x0, 0x0)
	/gopath/src/github.com/codegangsta/cli/app.go:487 +0x2b1 fp=0xc2080f5420 sp=0xc2080f52f0
github.com/codegangsta/cli.Command.Run(0x393cf0, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x40ed30, 0x33, 0x0, ...)
	/gopath/src/github.com/codegangsta/cli/command.go:191 +0x10a5 fp=0xc2080f5998 sp=0xc2080f5420
github.com/codegangsta/cli.(*App).Run(0xc2080f0000, 0xc20800a000, 0x3, 0x3, 0x0, 0x0)
	/gopath/src/github.com/codegangsta/cli/app.go:240 +0xa76 fp=0xc2080f5c10 sp=0xc2080f5998
main.main()
	/usr/src/pgfutter/pgfutter.go:178 +0xcd3 fp=0xc2080f5f98 sp=0xc2080f5c10
runtime.main()
	/goroot/src/runtime/proc.go:63 +0xf3 fp=0xc2080f5fe0 sp=0xc2080f5f98
runtime.goexit()
	/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc2080f5fe8 sp=0xc2080f5fe0

goroutine 5 [syscall]:
os/signal.loop()
	/goroot/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
	/goroot/src/os/signal/signal_unix.go:27 +0x35

goroutine 6 [chan receive]:
database/sql.(*DB).connectionOpener(0xc20803b720)
	/goroot/src/database/sql/sql.go:589 +0x4c
created by database/sql.Open
	/goroot/src/database/sql/sql.go:452 +0x31c

goroutine 9 [IO wait]:
net.(*pollDesc).Wait(0xc2080ee060, 0x72, 0x0, 0x0)
	/goroot/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2080ee060, 0x0, 0x0)
	/goroot/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2080ee000, 0xc2080c8000, 0x1000, 0x1000, 0x0, 0x5f1cc0, 0xc208032cb8)
	/goroot/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc208036018, 0xc2080c8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/goroot/src/net/net.go:121 +0xdc
bufio.(*Reader).fill(0xc208092060)
	/goroot/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).Read(0xc208092060, 0xc208058f20, 0x5, 0x200, 0x0, 0x0, 0x0)
	/goroot/src/bufio/bufio.go:174 +0x26c
io.ReadAtLeast(0x5f3348, 0xc208092060, 0xc208058f20, 0x5, 0x200, 0x5, 0x0, 0x0, 0x0)
	/goroot/src/io/io.go:298 +0xf1
io.ReadFull(0x5f3348, 0xc208092060, 0xc208058f20, 0x5, 0x200, 0x0, 0x0, 0x0)
	/goroot/src/io/io.go:316 +0x6d
github.com/lib/pq.(*conn).recvMessage(0xc208058f00, 0xc20801ef00, 0x0, 0x0, 0x0)
	/gopath/src/github.com/lib/pq/conn.go:892 +0x17a
github.com/lib/pq.(*copyin).resploop(0xc2080381e0)
	/gopath/src/github.com/lib/pq/copy.go:143 +0x55
created by github.com/lib/pq.(*conn).prepareCopyIn
	/gopath/src/github.com/lib/pq/copy.go:90 +0xae5

goroutine 10 [select]:
github.com/cheggaaa/pb.(*ProgressBar).refresher(0xc208048000)
	/gopath/src/github.com/cheggaaa/pb/pb.go:423 +0xff
created by github.com/cheggaaa/pb.(*ProgressBar).Start
	/gopath/src/github.com/cheggaaa/pb/pb.go:123 +0xcd

If column name is an SQL keyword, pgfutter fails with no error message

Using pgfutter_windows_amd64.exe binary on Windows 10, with CSV files edited in Notepad++ in UTF-8 encoding:
I attempted to import a table which included a column named "Union" (for trade union), and pgfutter exited after printing out the column name list only, with no further message. By trial and error I figured out where the problem was, and also tested it by renaming the column to "test" (which worked) and then "select" (which didn't). If it was possible to catch invalid column names (and maybe prefix them with a '_') that would be great, but at the least it would be helpful if there was an error message, preferably identifying the rogue column name.

support JSONL

I'm starting to see some people use JSONL and I hope it gets more traction as it solves some classic csv issues.

You support JSON import already to a "data" column in postgres, but could you also support looking at the "fields" in the first json document and then using those as if it was a csv? Wish I had time to take a stab at. :(

Great project! Thanks so much it.

--table doesn't exist as an option

pgfutter csv --table accounts Account.csv 
Incorrect Usage: flag provided but not defined: -table

NAME:
   pgfutter csv - Import CSV into database

USAGE:
   pgfutter csv [command options] [arguments...]

OPTIONS:
   --excel                      support problematic Excel 2008 and Excel 2011 csv line endings
   --skip-header                skip header row
   --fields value               comma separated field names if no header row
   --delimiter value, -d value  field delimiter (default: ",")
   --skip-parse-delimiter       skip parsing escape sequences in the given delimiter
   

Documentation says there's a table name

Use passwords in .pgpass

Would be great if, rather than having to specify passwords on the command line or in an environment variable, pgfutter could read passwords from the .pgpass file located in the user's home directory (the same as psql or pgAdmin).

Trying to import a CSV, doesn't do anything (just outputs a single comma)

I think something is broken with pgfutter on OSX. I had to install from source before I could run it at all (issue #48). Now when I try to import a CSV, all I get is a single comma:

$ ~/go/bin/pgfutter csv ./test.csv
,
$ ~/go/bin/pgfutter csv ./foo.csv
,

test.csv is a real CSV file. foo.csv doesn't exist. Either way, it just prints a comma to the console and exits.

pgfutter is a really cool concept, and looks very helpful. Hope this is easy to fix.

No longer builds

➜  pgfutter git:(master) go install
go: creating new go.mod: module github.com/lukasmartinelli/pgfutter
go: finding github.com/JensRantil/go-csv latest
go: downloading github.com/JensRantil/go-csv v0.0.0-20190410123842-f8c512241072
go: extracting github.com/JensRantil/go-csv v0.0.0-20190410123842-f8c512241072
go: github.com/JensRantil/[email protected]: parsing go.mod: unexpected module path "github.com/tink-ab/go-csv"
go: error loading module requirements

Looks like tink-ab/go-csv is gone.

incorrect JSON format

From the README
"A lot of event logs contain JSON objects nowadays (e.g. GitHub Archive). pgfutter expects each line to have a valid JSON object. Importing JSON is only supported for Postgres 9.3 and Postgres 9.4 due to the JSON type.

Create friends.json.

{"name": "Jacob", "age": 26, "friends": ["Anthony"]}
{"name": "Anthony", "age": 25, "friends": []}
{"name": "Emma", "age": 28, "friends": ["Jacob", "Anthony"]}"

This is not valid JSON. Please see http://json.org/ to see what I mean.

Any plans to support properly formatted JSON?

Does not accepted escaped " in CSV

I don't know if this is part of the official CSV specification (if there is one), but it would be useful to handle escaped quotation marks. For example, pgfutter chokes on this line:

"cV7QpZd-EeSKzSIAC0cT7w@2","cV7QpZd-EeSKzSIAC0cT7w",2,5,"{\"typeName\":\"cml\",\"definition\":{\"dtdId\":\"assess/1\",\"value\":\"<co-content><text>For the gene At3g59490, retrieve the corresponding protein sequence from TAIR</text><text>(http://www.arabidopsis.org/tools/bulk/sequences/index.jsp). Remember to choose the correct dataset and output option.</text><text>Now, navigate to BLASTP at NCBI and paste your genes sequence into the “query sequence” box. Set the database to “non-redundant protein sequences (nr)”, keep all settings at default, and click BLAST.</text><text>Take note of the top match (ortholog) for each of the other species, for 20 different species excluding your query species. Which species’ gene is most closely related to your query gene?</text></co-content>\"}}",2015-02-17 22:14:27.187

however, when I remove all " with sed, it imports beautifully.

Where does the single JSON object get stored in memory?

Hi Lukas,
According to your docs...

Instead of using JSON lines you can also import a single JSON object into the database. This will load the JSON document into memory first.
I'm unsure of what this means, I've executed the command with no errors, but where in memory is my uploaded(?) JSON object?

Thanks in advance.

Remove unsupported unicode character \u0000 by default

Null byte and zero code point are not legal in a PostgreSQL string. Imported some data and when trying to query see the following error:

ERROR:  unsupported Unicode escape sequence
DETAIL:  \u0000 cannot be converted to text.

Would like if this was cleaned by default since it is an unsupported unicode escape sequence with the option to turn it off (just in case it was used as a delimiter?).

On OSX with GNU sed:
brew install gsed
gsed -i "s/\\u0000\|\x00//g" json.txt

https://unix.stackexchange.com/questions/36734/is-there-a-way-to-detect-null-bytes-nul-0-in-sed/36744#36744

Unexpected signal during execution

Tried to use pgfutter on a new CSV on OSX.

08:52 AM ~/Downloads  => pgfutter csv cell_towers.csv
 47.39 MB / 3.10 GB [=>--------------------------------------------------------------------------------------------------]   1.49% 4m8sfatal error: unexpected signal during runtime execution
[signal 0xb code=0x1 addr=0xb01dfacedebac1e pc=0x35e99]

runtime stack:
runtime.gothrow(0x40e3f0, 0x2a)
	/goroot/src/runtime/panic.go:503 +0x8e fp=0xc208107cd8 sp=0xc208107cc0
runtime.sigpanic()
	/goroot/src/runtime/sigpanic_unix.go:14 +0x5e fp=0xc208107d28 sp=0xc208107cd8
runtime.MSpanList_Remove(0x5e5998)
	/goroot/src/runtime/mheap.c:676 +0x29 fp=0xc208107d30 sp=0xc208107d28
runtime.MCentral_UncacheSpan(0x57f358, 0x5e5998)
	/goroot/src/runtime/mcentral.c:121 +0x7f fp=0xc208107d50 sp=0xc208107d30
runtime.MCache_ReleaseAll(0x5e0000)
	/goroot/src/runtime/mcache.c:111 +0x6a fp=0xc208107d78 sp=0xc208107d50
flushallmcaches()
	/goroot/src/runtime/mgc0.c:1224 +0x61 fp=0xc208107da0 sp=0xc208107d78
markroot(0xc208010000, 0xc200000004)
	/goroot/src/runtime/mgc0.c:536 +0x20a fp=0xc208107e00 sp=0xc208107da0
runtime.parfordo(0xc208010000)
	/goroot/src/runtime/parfor.c:76 +0xb2 fp=0xc208107e80 sp=0xc208107e00
gc(0xc208107fb8)
	/goroot/src/runtime/mgc0.c:1442 +0x25e fp=0xc208107f98 sp=0xc208107e80
runtime.gc_m()
	/goroot/src/runtime/mgc0.c:1371 +0xe0 fp=0xc208107fd0 sp=0xc208107f98
runtime.onM(0xc208012000)
	/goroot/src/runtime/asm_amd64.s:257 +0x68 fp=0xc208107fd8 sp=0xc208107fd0
runtime.mstart()
	/goroot/src/runtime/proc.c:818 fp=0xc208107fe0 sp=0xc208107fd8

goroutine 1 [garbage collection]:
runtime.switchtoM()
	/goroot/src/runtime/asm_amd64.s:198 fp=0xc2080ec790 sp=0xc2080ec788
runtime.gogc(0x0)
	/goroot/src/runtime/malloc.go:469 +0x1cf fp=0xc2080ec7c8 sp=0xc2080ec790
runtime.mallocgc(0x10, 0x0, 0x3, 0x0)
	/goroot/src/runtime/malloc.go:341 +0x391 fp=0xc2080ec878 sp=0xc2080ec7c8
runtime.rawstring(0xa, 0x0, 0x0, 0x0, 0x0, 0x0)
	/goroot/src/runtime/string.go:195 +0x93 fp=0xc2080ec8a8 sp=0xc2080ec878
runtime.slicebytetostring(0xc208074374, 0xa, 0x40, 0x0, 0x0)
	/goroot/src/runtime/string.go:69 +0x4b fp=0xc2080ec938 sp=0xc2080ec8a8
encoding/csv.(*Reader).parseRecord(0xc208074320, 0xc208141200, 0xc, 0x10, 0x0, 0x0)
	/goroot/src/encoding/csv/reader.go:235 +0x449 fp=0xc2080ec9f8 sp=0xc2080ec938
encoding/csv.(*Reader).Read(0xc208074320, 0x0, 0x0, 0x0, 0x0, 0x0)
	/goroot/src/encoding/csv/reader.go:136 +0x5e fp=0xc2080eca60 sp=0xc2080ec9f8
main.copyCSVRows(0xc208033d60, 0xc208074320, 0x0, 0xc208033746, 0x1, 0xc2080c6100, 0xe, 0x10, 0x0, 0x0, ...)
	/usr/src/pgfutter/csv.go:57 +0xa9 fp=0xc2080ecc08 sp=0xc2080eca60
main.importCSV(0x7fff5fbff2a5, 0xf, 0xc2080e6230, 0x64, 0xc208033500, 0x6, 0x7fff5fbff2a5, 0xb, 0x0, 0x571590, ...)
	/usr/src/pgfutter/csv.go:144 +0x530 fp=0xc2080ece40 sp=0xc2080ecc08
main.func·003(0xc208076a00)
	/usr/src/pgfutter/pgfutter.go:172 +0x332 fp=0xc2080ecf60 sp=0xc2080ece40
runtime.call16(0x440268, 0xc2080360b0, 0x800000008)
	/goroot/src/runtime/asm_amd64.s:401 +0x45 fp=0xc2080ecf78 sp=0xc2080ecf60
reflect.Value.call(0x29e880, 0x440268, 0x13, 0x382790, 0x4, 0xc2080ed3d0, 0x1, 0x1, 0x0, 0x0, ...)
	/goroot/src/reflect/value.go:419 +0x10e5 fp=0xc2080ed290 sp=0xc2080ecf78
reflect.Value.Call(0x29e880, 0x440268, 0x13, 0xc2080ed3d0, 0x1, 0x1, 0x0, 0x0, 0x0)
	/goroot/src/reflect/value.go:296 +0xbc fp=0xc2080ed2f0 sp=0xc2080ed290
github.com/codegangsta/cli.HandleAction(0x29e880, 0x440268, 0xc208076a00, 0x0, 0x0)
	/gopath/src/github.com/codegangsta/cli/app.go:487 +0x2b1 fp=0xc2080ed420 sp=0xc2080ed2f0
github.com/codegangsta/cli.Command.Run(0x38eb70, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3dfcf0, 0x18, 0x0, ...)
	/gopath/src/github.com/codegangsta/cli/command.go:191 +0x10a5 fp=0xc2080ed998 sp=0xc2080ed420
github.com/codegangsta/cli.(*App).Run(0xc2080e8000, 0xc20800a000, 0x3, 0x3, 0x0, 0x0)
	/gopath/src/github.com/codegangsta/cli/app.go:240 +0xa76 fp=0xc2080edc10 sp=0xc2080ed998
main.main()
	/usr/src/pgfutter/pgfutter.go:178 +0xcd3 fp=0xc2080edf98 sp=0xc2080edc10
runtime.main()
	/goroot/src/runtime/proc.go:63 +0xf3 fp=0xc2080edfe0 sp=0xc2080edf98
runtime.goexit()
	/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc2080edfe8 sp=0xc2080edfe0

goroutine 2 [force gc (idle)]:
runtime.gopark(0x3b610, 0x568bc0, 0x3b1530, 0xf)
	/goroot/src/runtime/proc.go:130 +0x105 fp=0xc20801a798 sp=0xc20801a768
runtime.goparkunlock(0x568bc0, 0x3b1530, 0xf)
	/goroot/src/runtime/proc.go:136 +0x48 fp=0xc20801a7c0 sp=0xc20801a798
runtime.forcegchelper()
	/goroot/src/runtime/proc.go:99 +0xce fp=0xc20801a7e0 sp=0xc20801a7c0
runtime.goexit()
	/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc20801a7e8 sp=0xc20801a7e0
created by runtime.init·4
	/goroot/src/runtime/proc.go:87 +0x25

goroutine 3 [runnable]:
runtime.Gosched()
	/goroot/src/runtime/proc.go:113 +0x14 fp=0xc20801dfc0 sp=0xc20801dfb0
runtime.bgsweep()
	/goroot/src/runtime/mgc0.go:87 +0x59 fp=0xc20801dfe0 sp=0xc20801dfc0
runtime.goexit()
	/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc20801dfe8 sp=0xc20801dfe0
created by gc
	/goroot/src/runtime/mgc0.c:1386

goroutine 4 [finalizer wait]:
runtime.gopark(0x3b610, 0x571518, 0x3b0e70, 0xe)
	/goroot/src/runtime/proc.go:130 +0x105 fp=0xc208018f30 sp=0xc208018f00
runtime.goparkunlock(0x571518, 0x3b0e70, 0xe)
	/goroot/src/runtime/proc.go:136 +0x48 fp=0xc208018f58 sp=0xc208018f30
runtime.runfinq()
	/goroot/src/runtime/malloc.go:727 +0xba fp=0xc208018fe0 sp=0xc208018f58
runtime.goexit()
	/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208018fe8 sp=0xc208018fe0
created by runtime.createfing
	/goroot/src/runtime/malloc.go:707 +0x5e

goroutine 5 [syscall]:
runtime.notetsleepg(0x571d20, 0xffffffffffffffff, 0x569901)
	/goroot/src/runtime/lock_sema.go:266 +0x8c fp=0xc20801d768 sp=0xc20801d728
runtime.signal_recv(0x0)
	/goroot/src/runtime/sigqueue.go:109 +0x135 fp=0xc20801d7a0 sp=0xc20801d768
os/signal.loop()
	/goroot/src/os/signal/signal_unix.go:21 +0x1f fp=0xc20801d7e0 sp=0xc20801d7a0
runtime.goexit()
	/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc20801d7e8 sp=0xc20801d7e0
created by os/signal.init·1
	/goroot/src/os/signal/signal_unix.go:27 +0x35

goroutine 6 [chan receive]:
runtime.gopark(0x3b610, 0xc2080926b8, 0x3aded0, 0xc)
	/goroot/src/runtime/proc.go:130 +0x105 fp=0xc2080186b0 sp=0xc208018680
runtime.goparkunlock(0xc2080926b8, 0x3aded0, 0xc)
	/goroot/src/runtime/proc.go:136 +0x48 fp=0xc2080186d8 sp=0xc2080186b0
runtime.chanrecv(0x28c540, 0xc208092660, 0xc2080187c8, 0x1, 0x0)
	/goroot/src/runtime/chan.go:467 +0x82c fp=0xc208018778 sp=0xc2080186d8
runtime.chanrecv2(0x28c540, 0xc208092660, 0xc2080187c8, 0x0)
	/goroot/src/runtime/chan.go:316 +0x2b fp=0xc2080187a8 sp=0xc208018778
database/sql.(*DB).connectionOpener(0xc2080755e0)
	/goroot/src/database/sql/sql.go:589 +0x4c fp=0xc2080187d8 sp=0xc2080187a8
runtime.goexit()
	/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc2080187e0 sp=0xc2080187d8
created by database/sql.Open
	/goroot/src/database/sql/sql.go:452 +0x31c

goroutine 7 [syscall]:
runtime.notetsleepg(0x568dd8, 0xbebb461, 0x16)
	/goroot/src/runtime/lock_sema.go:266 +0x8c fp=0xc208017f68 sp=0xc208017f28
runtime.timerproc()
	/goroot/src/runtime/time.go:207 +0xfa fp=0xc208017fe0 sp=0xc208017f68
runtime.goexit()
	/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208017fe8 sp=0xc208017fe0
created by runtime.addtimerLocked
	/goroot/src/runtime/time.go:113 +0x1ba

goroutine 9 [IO wait]:
runtime.gopark(0x1a080, 0x5f3100, 0x384e50, 0x7)
	/goroot/src/runtime/proc.go:130 +0x105 fp=0xc2080192e0 sp=0xc2080192b0
runtime.netpollblock(0x5f30d8, 0x72, 0xc208126100)
	/goroot/src/runtime/netpoll.go:347 +0x170 fp=0xc208019320 sp=0xc2080192e0
runtime.netpollWait(0x5f30d8, 0x72, 0x8)
	/goroot/src/runtime/netpoll.go:150 +0x68 fp=0xc208019340 sp=0xc208019320
net.(*pollDesc).Wait(0xc2080e6060, 0x72, 0x0, 0x0)
	/goroot/src/net/fd_poll_runtime.go:84 +0x47 fp=0xc208019360 sp=0xc208019340
net.(*pollDesc).WaitRead(0xc2080e6060, 0x0, 0x0)
	/goroot/src/net/fd_poll_runtime.go:89 +0x43 fp=0xc208019388 sp=0xc208019360
net.(*netFD).Read(0xc2080e6000, 0xc2080c8000, 0x1000, 0x1000, 0x0, 0x5f1cc0, 0xc2081260f8)
	/goroot/src/net/fd_unix.go:242 +0x40f fp=0xc208019428 sp=0xc208019388
net.(*conn).Read(0xc208036030, 0xc2080c8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/goroot/src/net/net.go:121 +0xdc fp=0xc208019470 sp=0xc208019428
bufio.(*Reader).fill(0xc208092180)
	/goroot/src/bufio/bufio.go:97 +0x1ce fp=0xc208019518 sp=0xc208019470
bufio.(*Reader).Read(0xc208092180, 0xc208054020, 0x5, 0x200, 0x0, 0x0, 0x0)
	/goroot/src/bufio/bufio.go:174 +0x26c fp=0xc2080195b0 sp=0xc208019518
io.ReadAtLeast(0x5f31c0, 0xc208092180, 0xc208054020, 0x5, 0x200, 0x5, 0x0, 0x0, 0x0)
	/goroot/src/io/io.go:298 +0xf1 fp=0xc208019608 sp=0xc2080195b0
io.ReadFull(0x5f31c0, 0xc208092180, 0xc208054020, 0x5, 0x200, 0x0, 0x0, 0x0)
	/goroot/src/io/io.go:316 +0x6d fp=0xc208019658 sp=0xc208019608
github.com/lib/pq.(*conn).recvMessage(0xc208054000, 0xc20811d0c0, 0x0, 0x0, 0x0)
	/gopath/src/github.com/lib/pq/conn.go:892 +0x17a fp=0xc208019728 sp=0xc208019658
github.com/lib/pq.(*copyin).resploop(0xc2080383c0)
	/gopath/src/github.com/lib/pq/copy.go:143 +0x55 fp=0xc2080197d8 sp=0xc208019728
runtime.goexit()
	/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc2080197e0 sp=0xc2080197d8
created by github.com/lib/pq.(*conn).prepareCopyIn
	/gopath/src/github.com/lib/pq/copy.go:90 +0xae5

goroutine 10 [select]:
runtime.gopark(0x1ebd0, 0xc208019f40, 0x39afb0, 0x6)
	/goroot/src/runtime/proc.go:130 +0x105 fp=0xc208019d58 sp=0xc208019d28
runtime.selectgoImpl(0xc208019f40, 0x0, 0x18)
	/goroot/src/runtime/select.go:366 +0xb2c fp=0xc208019ef0 sp=0xc208019d58
runtime.selectgo(0xc208019f40)
	/goroot/src/runtime/select.go:183 +0x12 fp=0xc208019f10 sp=0xc208019ef0
github.com/cheggaaa/pb.(*ProgressBar).refresher(0xc208076000)
	/gopath/src/github.com/cheggaaa/pb/pb.go:423 +0xff fp=0xc208019fd8 sp=0xc208019f10
runtime.goexit()
	/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208019fe0 sp=0xc208019fd8
created by github.com/cheggaaa/pb.(*ProgressBar).Start
	/gopath/src/github.com/cheggaaa/pb/pb.go:123 +0xcd

Improper handling of delimiters

Apparently the CSV parser is seriously weak, as it can't handle commas appearing inside double quoted strings, something allowed by csv.

Foo,Bar,"Something long, with a comma in it" 

this is perfectly valid csv

Does not handle object of objects

I have a large (~2GB) JSON file that is a backup of a Firebase database. It is arranged as an object of objects (e.g.: {"objectA": {}, "objectB": [{},{}], "objectC": {}} ). When I import with pgfutter, I get "0 rows imported", even though pgfutter does seem to go through the whole file fairly quickly (less than 2 minutes).

I have experimented with trying to split the file with jq and writing my own stream-parsing code, but none of it seems to give me anything I can actually import properly with pgfutter. Can you help please?

Import fails with column named "end"

Thanks for supporting this awesome tool. I really appreciate the ease of use.

Found a minor bug when attempting to import a table that used the word "end" as a column name. Appears to capture the column names correctly but then fails to import any rows. If this isn't easily fixed a warning message would be very useful. Reprex below:

$ pgfutter csv friends.csv 
3 columns
[name age friends]
 73 B / 73 B [================================================================================================] 100.00% 0s
2 rows imported into public.friends

$ pgfutter csv friends_with_end.csv 
3 columns
[name end friends]

$ cat friends.csv 
name,age,friends
Jacob,26,"Anthony"
Anthony,25,""
Emma,28,"Jacob,Anthony"

$ cat friends_with_end.csv 
name,end,friends
Jacob,26,"Anthony"
Anthony,25,""
Emma,28,"Jacob,Anthony"

pgfutter version 1.2
psql 10.6 (Ubuntu 10.6-1.pgdg18.04+1)
Ubuntu 18.04.1 LTS

Is there a chance to implement multithreading?

I use pgfutter to import regularly json-files of several GBs and it works great!

The only thing is, it seems to run only on one thread, which leaves a lot of resources unused during the import process. Would be cool, if pgfutter would automatically split up the import file into junks which would get imported on multiple threads in parallel!
Is there a chance to implement this?

Invalid memory panic when # is in header

pgfutter cannot handle a "#" in the header, I suspect because Postgres doesn't support that so pgfutter should sanitize the header names or report an error on attempt. It results in a memory panic (similar to #6):

./pgfutter --table numbers --schema "foo" --host 127.0.0.1 --dbname test --pass 123 csv numbers.csv
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x7d151]

goroutine 1 [running]:
database/sql.(*Stmt).Exec(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /goroot/src/database/sql/sql.go:1302 +0x3e1
main.NewCSVImport(0xc20805d5e0, 0xc208033426, 0x6, 0xc208033450, 0x7, 0xc2080a4900, 0x9, 0x10, 0x0, 0x0, ...)
    /usr/src/pgfutter/import.go:21 +0xfd
main.importCSV(0x7fff5fbffc69, 0xc, 0xc208104070, 0x6b, 0xc208033426, 0x6, 0xc208033450, 0x7, 0x0, 0x55c470, ...)
    /usr/src/pgfutter/csv.go:127 +0x444
main.func·003(0xc20805ea00)
    /usr/src/pgfutter/pgfutter.go:165 +0x369
github.com/codegangsta/cli.Command.Run(0x380590, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3d05f0, 0x18, 0x0, ...)
    /gopath/src/github.com/codegangsta/cli/command.go:174 +0x1403
github.com/codegangsta/cli.(*App).Run(0xc20805e780, 0xc20800a000, 0xd, 0xd, 0x0, 0x0)
    /gopath/src/github.com/codegangsta/cli/app.go:187 +0x1221
main.main()
    /usr/src/pgfutter/pgfutter.go:171 +0xba4

goroutine 5 [syscall]:
os/signal.loop()
    /goroot/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
    /goroot/src/os/signal/signal_unix.go:27 +0x35

This file makes it fail:

CarNumber,fname,lname,License #,IsLicenseValid,LicenseStatus,LicenseExpires,HomeRegion,email
0,Randy,Foo,968,True,Competition,"Aug 31, 2016",SouthAtlantic,[email protected]

While this works:

CarNumber,fname,lname,LicenseNo,IsLicenseValid,LicenseStatus,LicenseExpires,HomeRegion,email
0,Randy,Foo,968,True,Competition,"Aug 31, 2016",SouthAtlantic,[email protected]

JSONB instead of JSON

Hi,
Would it be possible/make sense to allow JSONB instead of JSON as the data column type ?
Currently I end up changing the data column to JSONB after its imported which takes time.

Thanks for the a great project btw.

Regards

fatal error: MSpanList_Insert

using
https://fallingfruit.org/locations.csv.bz2

./pgfutter csv locations.csv

64.14 MB / 293.78 MB [==========================>---------------------------------------------------------------------------------------------] 21.83% 12sfailed MSpanList_Insert 0x5e50a8 0x11e98c9a501b0 0x0
fatal error: MSpanList_Insert

runtime stack:
runtime.MSpanList_Insert(0x578850, 0x5e50a8)
/goroot/src/runtime/mheap.c:692 +0x8f
runtime.MHeap_Free(0x5787e0, 0x5e50a8, 0x0)
/goroot/src/runtime/mheap.c:500 +0x5b
runtime.MCentral_FreeSpan(0x580438, 0x5e50a8, 0x20, 0xc208090000, 0xc208091f00, 0x0, 0x64)
/goroot/src/runtime/mcentral.c:181 +0x1bb
runtime.MSpan_Sweep(0x5e50a8, 0xa2900000a00, 0x1)
/goroot/src/runtime/mgc0.c:1099 +0x477
runtime.MHeap_Alloc(0x5787e0, 0x1, 0x10000000008, 0x15159)
/goroot/src/runtime/mheap.c:240 +0x66
runtime.MCentral_CacheSpan(0x57fa18, 0xc2080ec708)
/goroot/src/runtime/mcentral.c:85 +0x167
runtime.MCache_Refill(0x5e0000, 0x8, 0x10005e6490)
/goroot/src/runtime/mcache.c:90 +0xa0

goroutine 1 [running]:
runtime.switchtoM()
/goroot/src/runtime/asm_amd64.s:198 fp=0xc2080ec7c8 sp=0xc2080ec7c0
runtime.mallocgc(0x70, 0x0, 0x3, 0x0)
/goroot/src/runtime/malloc.go:178 +0x849 fp=0xc2080ec878 sp=0xc2080ec7c8
runtime.rawstring(0x6c, 0x0, 0x0, 0x0, 0x0, 0x0)
/goroot/src/runtime/string.go:195 +0x93 fp=0xc2080ec8a8 sp=0xc2080ec878
runtime.slicebytetostring(0xc208011200, 0x6c, 0x81f, 0x0, 0x0)
/goroot/src/runtime/string.go:69 +0x4b fp=0xc2080ec938 sp=0xc2080ec8a8
encoding/csv.(*Reader).parseRecord(0xc2080383c0, 0xc208123400, 0x5, 0x8, 0x0, 0x0)
/goroot/src/encoding/csv/reader.go:235 +0x449 fp=0xc2080ec9f8 sp=0xc2080ec938
encoding/csv.(*Reader).Read(0xc2080383c0, 0x0, 0x0, 0x0, 0x0, 0x0)
/goroot/src/encoding/csv/reader.go:136 +0x5e fp=0xc2080eca60 sp=0xc2080ec9f8
main.copyCSVRows(0xc208031e10, 0xc2080383c0, 0x0, 0xc208031795, 0x1, 0xc2080c6000, 0x10, 0x10, 0x0, 0x0, ...)
/usr/src/pgfutter/csv.go:57 +0xa9 fp=0xc2080ecc08 sp=0xc2080eca60
main.importCSV(0x7fff5fbffb91, 0xd, 0xc2080e62a0, 0x68, 0xc2080315b6, 0x6, 0x7fff5fbffb91, 0x9, 0x0, 0x571590, ...)
/usr/src/pgfutter/csv.go:144 +0x530 fp=0xc2080ece40 sp=0xc2080ecc08
main.func·003(0xc208056a00)
/usr/src/pgfutter/pgfutter.go:172 +0x332 fp=0xc2080ecf60 sp=0xc2080ece40
runtime.call16(0x440268, 0xc2080340b0, 0x800000008)
/goroot/src/runtime/asm_amd64.s:401 +0x45 fp=0xc2080ecf78 sp=0xc2080ecf60
reflect.Value.call(0x29e880, 0x440268, 0x13, 0x382790, 0x4, 0xc2080ed3d0, 0x1, 0x1, 0x0, 0x0, ...)
/goroot/src/reflect/value.go:419 +0x10e5 fp=0xc2080ed290 sp=0xc2080ecf78
reflect.Value.Call(0x29e880, 0x440268, 0x13, 0xc2080ed3d0, 0x1, 0x1, 0x0, 0x0, 0x0)
/goroot/src/reflect/value.go:296 +0xbc fp=0xc2080ed2f0 sp=0xc2080ed290
github.com/codegangsta/cli.HandleAction(0x29e880, 0x440268, 0xc208056a00, 0x0, 0x0)
/gopath/src/github.com/codegangsta/cli/app.go:487 +0x2b1 fp=0xc2080ed420 sp=0xc2080ed2f0
github.com/codegangsta/cli.Command.Run(0x38eb70, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3dfcf0, 0x18, 0x0, ...)
/gopath/src/github.com/codegangsta/cli/command.go:191 +0x10a5 fp=0xc2080ed998 sp=0xc2080ed420
github.com/codegangsta/cli.(*App).Run(0xc2080e8000, 0xc20800a000, 0x7, 0x7, 0x0, 0x0)
/gopath/src/github.com/codegangsta/cli/app.go:240 +0xa76 fp=0xc2080edc10 sp=0xc2080ed998
main.main()
/usr/src/pgfutter/pgfutter.go:178 +0xcd3 fp=0xc2080edf98 sp=0xc2080edc10
runtime.main()
/goroot/src/runtime/proc.go:63 +0xf3 fp=0xc2080edfe0 sp=0xc2080edf98
runtime.goexit()
/goroot/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc2080edfe8 sp=0xc2080edfe0

goroutine 5 [syscall]:
os/signal.loop()
/goroot/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
/goroot/src/os/signal/signal_unix.go:27 +0x35

goroutine 6 [chan receive]:
database/sql.(*DB).connectionOpener(0xc208039860)
/goroot/src/database/sql/sql.go:589 +0x4c
created by database/sql.Open
/goroot/src/database/sql/sql.go:452 +0x31c

goroutine 9 [IO wait]:
net.(*pollDesc).Wait(0xc2080e6370, 0x72, 0x0, 0x0)
/goroot/src/net/fd_poll_runtime.go:84 +0x47
net.(*pollDesc).WaitRead(0xc2080e6370, 0x0, 0x0)
/goroot/src/net/fd_poll_runtime.go:89 +0x43
net.(*netFD).Read(0xc2080e6310, 0xc20810e000, 0x1000, 0x1000, 0x0, 0x5f1cc0, 0xc208144b50)
/goroot/src/net/fd_unix.go:242 +0x40f
net.(*conn).Read(0xc208034108, 0xc20810e000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/goroot/src/net/net.go:121 +0xdc
bufio.(*Reader).fill(0xc208092840)
/goroot/src/bufio/bufio.go:97 +0x1ce
bufio.(*Reader).Read(0xc208092840, 0xc208058f20, 0x5, 0x200, 0x0, 0x0, 0x0)
/goroot/src/bufio/bufio.go:174 +0x26c
io.ReadAtLeast(0x5f3220, 0xc208092840, 0xc208058f20, 0x5, 0x200, 0x5, 0x0, 0x0, 0x0)
/goroot/src/io/io.go:298 +0xf1
io.ReadFull(0x5f3220, 0xc208092840, 0xc208058f20, 0x5, 0x200, 0x0, 0x0, 0x0)
/goroot/src/io/io.go:316 +0x6d
github.com/lib/pq.(*conn).recvMessage(0xc208058f00, 0xc20814a6c0, 0x0, 0x0, 0x0)
/gopath/src/github.com/lib/pq/conn.go:892 +0x17a
github.com/lib/pq.(*copyin).resploop(0xc208036320)
/gopath/src/github.com/lib/pq/copy.go:143 +0x55
created by github.com/lib/pq.(*conn).prepareCopyIn
/gopath/src/github.com/lib/pq/copy.go:90 +0xae5

goroutine 10 [select]:
github.com/cheggaaa/pb.(*ProgressBar).refresher(0xc208056000)
/gopath/src/github.com/cheggaaa/pb/pb.go:423 +0xff
created by github.com/cheggaaa/pb.(*ProgressBar).Start
/gopath/src/github.com/cheggaaa/pb/pb.go:123 +0xcd

No import and almost no response ?

Hi
I have tried PGFutter on two windows servers today... I am unable to get the program to import... I does not give any response... Only if I really mess up with one of the the options I can get a "Incorrect Usage. flag provided but not defined: -usr"
Every other trial - for instance to provide a filename that does not exist - nothing

Very strange... Any ideas ?

Unable to import JSON

I'm on OSX 10.14, go version 1.11.5, installed pgfutter from source. I'm unable to import JSON into Postgres (tried with Postgres versions 9.4 and 10.6). When I try importing a test file with 4 objects in JSON Lines format, pgfutter says 1 row was imported successfully and 3 failed:

pgfutter --ignore-errors --jsonb json test
 0 B / 53 B [------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------]   0.00%
{"foo":"bar"}
{"banana":3}
{"apple":[4]}
 53 B / 53 B [====================================================================================================================================================================================================================] 100.00% 0s

Looking in the db table, no rows were created.

I've fallen back to importing via COPY reading from stdin:

# HACK: COPY, when running in text mode, unescapes backslashes, which renders JSON that contains that escape character invalid. So we need to prevent
# COPY from doing that. To do so, we use csv mode plus QUOTE and DELIMITER characters that are prohibited by JSON spec.
# Cf. https://stackoverflow.com/a/52355439, http://adpgtech.blogspot.com/2014/09/importing-json-data.html
< $file psql -h localhost -p 5432 mydb -c "COPY ingest (data) FROM STDIN csv quote e'\x01' delimiter e'\x02';"

Table never creates

Trying to import test data into PostgreSQL 9.1
pgfutter --user postgres --port 30000 --schema public csv friends.csv
74 B / 74 B [====================] 100.00% 0
3 rows imported into public.friends

No table is created though. Looking in pg_log I see this...
ERROR: schema "public" already exists
STATEMENT: CREATE SCHEMA public

I have tried creating a new schema as well but still no joy. This is on a CentOS 6.7 box. Here is the version info if it helps.
PostgreSQL 9.1.23 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit

Does not Strip UTF BOM from the Column Name

When the file contains the UTF BOM (0xEFBBBF), the BOM characters become the part of the first column name, which makes it impossible to select or easily rename later in psql or any other tool.

For example:


$ hexdump -C file-with-utf-bom.csv

00000000  ef bb bf 74 69 74 6c 65  2c 66 69 72 73 74 2c 6c  |...title,first,l|
00000010  61 73 74 2c 65 6d 61 69  6c 2c 70 6f 73 69 74 69  |ast,email,positi|
00000020  6f 6e 2c 6f 72 67 61 6e  69 7a 61 74 69 6f 6e     |on,organization |

Queries then fail in a weird way:

select title from import.file_with_utf_bom;

ERROR:  column "title" does not exist
LINE 1:  select title from import.file_with_utf_bom;
                ^
HINT:  Perhaps you meant to reference the column "file_with_utf_bom.title".

The first field name in the database now contains those characters:

select column_name,  encode(column_name::bytea, 'hex')  as encoded
  from information_schema.columns 
 where table_name = 'file_with_utf_bom' and table_schema = 'import';


column_name   | encoded
--------------+---------------------------
title	        efbbbf7469746c65 <-- **note efbbbf here**
first	        6669727374
last	        6c617374
email	        656d61696c
position	706f736974696f6e
organization	6f7267616e697a6174696f6e

String sanitizier should check for non-printable characters when processing column names.

Workaround

For people running into this problem, there are two workarounds:

a) Remove BOM from the file before import

awk '{if(NR==1)sub(/^\xef\xbb\xbf/,"");print}' file-with-utf-bom.csv > file-wo-bom.csv

b) Rename the column with dynamic sql:

do $$
begin
execute 'alter table import.file_with_utf_bom rename column ' || convert_from('\xefbbbf7469746c65', 'utf-8') || ' to title';
end $$ language plpgsql;

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.