How do I split a large csv.gz file in Google Cloud Storage?
I get the following error when trying to load a table in Google BigQuery:

Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 56659381010. Max allowed size is: 4294967296.
Is there a way to split the file using gsutil, or by some other means, without having to re-upload it?
The largest compressed CSV file you can load into BigQuery is 4 gigabytes. GCS unfortunately does not provide a way to decompress a compressed file, nor does it provide a way to split a compressed file. Gzip'd files can't be arbitrarily split up and reassembled the way a tar file can.
I imagine your best bet would be to spin up a GCE instance in the same region as your GCS bucket, download the object to that instance (which should be pretty fast, given that it's only a few dozen gigabytes), decompress it (which will be slower), break the CSV file up into a bunch of smaller files (the Linux split command is useful for this), and then upload those objects back to GCS.
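A rough sketch of those steps, assuming a hypothetical object at gs://your-bucket/big_file.csv.gz and GNU coreutils on the instance (substitute your own bucket, file names, and chunk size):

```bash
# Placeholder names: your-bucket and big_file.csv.gz stand in for your own object.
# Run on a GCE instance in the same region as the bucket.

# 1. Copy the compressed object down to the instance.
gsutil cp gs://your-bucket/big_file.csv.gz .

# 2. Decompress it (make sure the instance has enough disk for the uncompressed CSV).
gunzip big_file.csv.gz

# 3. Split into chunks. 10 million lines per chunk is an arbitrary starting point;
#    pick a value that keeps each recompressed chunk well under BigQuery's 4 GB
#    compressed-file limit. (-d and --additional-suffix are GNU split options.)
split -l 10000000 -d --additional-suffix=.csv big_file.csv chunk_

# 4. Recompress the chunks. This step is optional: uncompressed CSVs are splittable
#    and have a much higher per-file size limit, so you could upload the plain .csv files instead.
gzip chunk_*.csv

# 5. Upload the chunks back to GCS in parallel.
gsutil -m cp chunk_*.csv.gz gs://your-bucket/split/
```

If the original CSV has a header row, strip it before splitting (for example, tail -n +2 big_file.csv | split -l 10000000 - chunk_) so that every chunk is header-free and loads the same way. The resulting pieces can then be loaded in a single job with a wildcard URI, e.g. bq load --source_format=CSV mydataset.mytable "gs://your-bucket/split/chunk_*.csv.gz" ./schema.json, where the dataset, table, and schema file are placeholders for your own.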