Bulk data loader


For a step-by-step tutorial, with example code, on loading data through the bulk data loader, see the Load data to your table page.

If you're loading more than a few thousand rows, bundle them into a bulk load for the best performance. This is especially useful for an initial data load, where you may have large data files that you only need to load once.

To use the bulk loader, you'll need two files:

Data File

The data file to upload, in either JSON or CSV format (with more formats to come).

Control File

The second file is the control file, which gives the bulk loader its load parameters. The control file should include:

  • job_name - an arbitrary title to identify the load job
  • source_file - location of the data file
  • dest_table - name of the existing table to load into
  • chunk_records - number of records to add per load chunk (default: 1000)
    • optimal performance is currently around 1 MB per chunk
  • error_count - number of errors allowed before the load is stopped
    • a value of 10 allows 10 errors; the 11th error stops the load
    • a value of 0 stops the load on the first error
    • a value of -1 disables error tracking, allowing any number of errors
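This page doesn't specify how the control file is serialized, so the sketch below assumes JSON; the helper name `build_control` and all field values are hypothetical. It gathers the parameters listed above and rejects `error_count` values below -1, which have no documented meaning.

```python
import json


def build_control(job_name: str, source_file: str, dest_table: str,
                  error_count: int, chunk_records: int = 1000) -> dict:
    """Collect the documented load parameters into one dict (hypothetical helper)."""
    if chunk_records < 1:
        raise ValueError("chunk_records must be a positive integer")
    if error_count < -1:
        # -1 disables error tracking; smaller negative values are not documented
        raise ValueError("error_count must be -1, 0, or a positive integer")
    return {
        "job_name": job_name,
        "source_file": source_file,
        "dest_table": dest_table,
        "chunk_records": chunk_records,
        "error_count": error_count,
    }


if __name__ == "__main__":
    # Illustrative values only; the JSON layout is an assumption, not a documented format
    control = build_control(
        job_name="initial_orders_load",
        source_file="orders.csv",
        dest_table="MYSCHEMA.ORDERS",
        error_count=10,
    )
    print(json.dumps(control, indent=2))
```

Confirm the actual file format and field spelling with your Space and Time contact before relying on this layout.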

Loads are completed in chunks of records, which lets the engine handle files of any size, from megabytes to petabytes. Every load generates a timestamped log file that records all activity, including any records that contained errors.
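To make the chunking and `error_count` rules concrete, here is a local simulation. It is not the engine's implementation: `insert` is a hypothetical per-record callback, and the sketch assumes errors are counted per record and that the load stops as soon as the limit is exceeded.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def chunks(records: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` records."""
    batch: List = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def should_stop(errors_seen: int, error_count: int) -> bool:
    """-1 disables tracking; otherwise stop once errors exceed the limit
    (limit 10: the 11th error stops; limit 0: the first error stops)."""
    return error_count != -1 and errors_seen > error_count


def simulate_load(records: Iterable, insert: Callable, error_count: int,
                  chunk_records: int = 1000):
    """Return (rows_loaded, error_log, stopped_early)."""
    errors: List[Tuple[int, object, str]] = []  # (record index, record, message)
    loaded = 0
    index = 0
    for batch in chunks(records, chunk_records):
        for rec in batch:
            try:
                insert(rec)
                loaded += 1
            except Exception as exc:
                errors.append((index, rec, str(exc)))
                if should_stop(len(errors), error_count):
                    return loaded, errors, True
            index += 1
    return loaded, errors, False


if __name__ == "__main__":
    # int() stands in for a real insert; "x" and "y" fail to convert
    rows = ["1", "2", "x", "4", "y", "6"]
    loaded, errs, stopped = simulate_load(rows, int, error_count=1, chunk_records=2)
    print(loaded, len(errs), stopped)  # → 3 2 True
```

With `error_count=1`, the first bad record is tolerated and the second one stops the load, matching the "N errors allowed, error N+1 stops" rule.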

You might note the absence of a few items:

  • No authentication - during the Controlled Release, Space and Time will manually confirm (and, if needed, schedule) all bulk uploads with each participant. Future versions will require additional security steps to prevent unauthorized loads.
  • No column definitions - columns are expected to match the destination table in count, position, and data type; any deviation throws an error. Future versions will allow optional column definitions in the control file, enabling new bulk loader capabilities such as:
    • pre-validating files before they are sent to the database
    • loading columns by name rather than by position
    • optionally creating the table if it is missing
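Because the current loader matches columns by count, position, and data type, you may want to check a CSV file locally before handing it over. This is a minimal sketch: the `EXPECTED` parser list is a hypothetical schema that you would replace with one mirroring your destination table, and Python's `int`/`float` conversions are only a rough stand-in for the database's own type rules.

```python
import csv
import io
from typing import Callable, List, Sequence, Tuple

# Hypothetical destination schema, in column order: one parser per column
EXPECTED: Sequence[Callable[[str], object]] = (int, str, float)


def validate_csv(text: str, parsers: Sequence[Callable[[str], object]] = EXPECTED,
                 has_header: bool = True) -> List[Tuple[int, str]]:
    """Return (row_number, problem) pairs for rows that would not match positionally."""
    problems: List[Tuple[int, str]] = []
    for row_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if has_header and row_no == 1:
            continue  # skip the header row
        if len(row) != len(parsers):
            problems.append((row_no, f"expected {len(parsers)} columns, got {len(row)}"))
            continue
        for pos, (value, parse) in enumerate(zip(row, parsers), start=1):
            try:
                parse(value)
            except ValueError:
                problems.append((row_no, f"column {pos}: cannot parse {value!r}"))
    return problems


if __name__ == "__main__":
    sample = "id,name,price\n1,widget,9.99\n2,gadget\nthree,gizmo,1.50\n"
    for row_no, problem in validate_csv(sample):
        print(row_no, problem)
```

Running it on the sample flags row 3 (too few columns) and row 4 (a non-integer `id`), the kinds of deviations the loader would otherwise reject at load time.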

Files Built, Now What?

To ensure smooth operations during the Controlled Release, reach out to your Space and Time contact, who can provide upload locations and discuss scheduling.