DataBolt Python Libraries

For data scientists and data engineers, DataBolt is a collection of Python-based products that reduce the time it takes to get your data ready for analysis.

DataBolt Benefits

Designed to accelerate data science

The majority of time in data science is spent on tedious tasks unrelated to data analysis.

DataBolt simplifies those tasks so you can experience up to 10x productivity gains.

DataBolt Python

Modularized libraries to accelerate data workflows

Manage Workflows

Makes building complex data science workflows easy, fast and intuitive for 10x productivity gains

Manage Datasets

Turnkey solution to host data files, documentation and metadata so others can quickly use your data

Ingest Data

Quickly and reliably ingest raw CSV, TXT and Excel files into SQL, pandas, Parquet and more

Join Data

Easily join different datasets using fuzzy matches, without writing custom code

Standard features are free and open source. For pro features, request a demo.


Manage Workflows

Python library for building highly effective data science workflows

Standard

  • Build workflow with task dependencies and parameters
  • Check task dependencies and their execution status
  • Intelligently execute tasks including dependencies
  • Intelligently continue workflows after changed/failed tasks
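To make the ideas in the list above concrete, here is a minimal sketch of dependency-aware execution in plain Python. It is not the DataBolt API: the Workflow class and its add, status, run and invalidate methods are hypothetical names invented for this illustration. Each task runs only after its dependencies, completed tasks are skipped, and invalidating a task marks everything downstream as pending.

```python
from typing import Callable, Dict, List, Sequence, Set


class Workflow:
    """Toy workflow (hypothetical, not the DataBolt API)."""

    def __init__(self) -> None:
        self.funcs: Dict[str, Callable[[], None]] = {}
        self.deps: Dict[str, List[str]] = {}
        self.done: Set[str] = set()

    def add(self, name: str, func: Callable[[], None], deps: Sequence[str] = ()) -> None:
        """Register a task and the tasks it depends on."""
        self.funcs[name] = func
        self.deps[name] = list(deps)

    def status(self, name: str) -> Dict[str, bool]:
        """Completion status of a task and everything upstream of it."""
        out: Dict[str, bool] = {}
        for dep in self.deps[name]:
            out.update(self.status(dep))
        out[name] = name in self.done
        return out

    def run(self, name: str) -> List[str]:
        """Run pending dependencies first, then the task; return what was executed."""
        executed: List[str] = []
        for dep in self.deps[name]:
            executed += self.run(dep)
        if name not in self.done:
            self.funcs[name]()
            self.done.add(name)
            executed.append(name)
        return executed

    def invalidate(self, name: str) -> None:
        """Mark a task and all tasks downstream of it as pending."""
        self.done.discard(name)
        for other, deps in self.deps.items():
            if name in deps:
                self.invalidate(other)


log: List[str] = []
wf = Workflow()
wf.add("load", lambda: log.append("load"))
wf.add("clean", lambda: log.append("clean"), deps=["load"])
wf.add("train", lambda: log.append("train"), deps=["clean"])

first = wf.run("train")    # runs load, clean, train
wf.invalidate("clean")     # e.g. the cleaning logic changed
second = wf.run("train")   # reruns only clean and train
```

In this sketch the second run skips "load" because its output is still valid, which is the behavior the "continue workflows after changed/failed tasks" bullet describes.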

Pro

  • SQL storage integration
  • Dask and PySpark integration
  • Automatically detect data changes
  • Advanced machine learning features

Manage Datasets

Python library to push and pull data files like code

Standard

  • Quickly create public and private remote file storage
  • Push/pull data to/from remote file storage
  • Secure your data with best practice security
  • Centrally manage data files across multiple projects
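The push/pull idea above can be sketched with the standard library alone. This is not the DataBolt API: a local directory stands in for remote storage, and the digest and sync helpers are hypothetical names invented for this illustration. Only files that are new or whose content hash differs get copied, so repeating a push is cheap.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def digest(path: Path) -> str:
    """SHA-256 of the file contents, used to detect changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync(source: Path, target: Path) -> List[str]:
    """Copy new or changed files from source to target; return their names."""
    target.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for src in sorted(source.iterdir()):
        if not src.is_file():
            continue
        dst = target / src.name
        if not dst.exists() or digest(dst) != digest(src):
            shutil.copy2(src, dst)
            copied.append(src.name)
    return copied


# "Push" is sync(local, remote); "pull" is sync(remote, local).
with tempfile.TemporaryDirectory() as tmp:
    local = Path(tmp, "local")
    remote = Path(tmp, "remote")        # stand-in for remote storage
    colleague = Path(tmp, "colleague")  # another user's working copy
    local.mkdir()
    (local / "sales.csv").write_text("date,sales\n2024-01-01,100\n")
    (local / "README.txt").write_text("Monthly sales by date.\n")

    pushed = sync(local, remote)        # both files are new
    pushed_again = sync(local, remote)  # nothing changed, nothing copied
    pulled = sync(remote, colleague)    # colleague receives data and docs
```

Keeping the documentation file next to the data file means that anyone who pulls the dataset also gets its description.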

Pro

  • Self-hosted remote storage
  • On-premises deployment
  • Data encryption
  • Data versioning

Ingest Data

Quickly ingest messy CSV and XLS files and export them to clean pandas, SQL or Parquet

Standard

  • Check and fix data schema changes
  • Fast writing from pandas to PostgreSQL and MySQL
  • Ingest messy Excel files
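As an illustration of checking and fixing schema changes, the sketch below compares the header rows of several CSV files and realigns their rows to one shared column order. It uses only the standard library and is not the DataBolt API: read_header, union_columns, missing_columns and align are hypothetical helpers, and the sample data is made up.

```python
import csv
import io
from typing import Dict, List


def read_header(text: str) -> List[str]:
    """Column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(text)))


def union_columns(files: Dict[str, str]) -> List[str]:
    """All column names across files, in order of first appearance."""
    columns: List[str] = []
    for text in files.values():
        for col in read_header(text):
            if col not in columns:
                columns.append(col)
    return columns


def missing_columns(files: Dict[str, str]) -> Dict[str, List[str]]:
    """For each file with gaps, the columns it lacks versus the union."""
    columns = union_columns(files)
    report: Dict[str, List[str]] = {}
    for name, text in files.items():
        header = read_header(text)
        missing = [col for col in columns if col not in header]
        if missing:
            report[name] = missing
    return report


def align(text: str, columns: List[str], fill: str = "") -> List[List[str]]:
    """Read rows by column name so every file shares one column order."""
    rows = csv.DictReader(io.StringIO(text))
    return [[row.get(col, fill) for col in columns] for row in rows]


# Made-up sample files: February reorders columns and adds "region".
files = {
    "jan.csv": "date,sales,cost\n2024-01-01,100,60\n",
    "feb.csv": "date,cost,sales,region\n2024-02-01,70,120,EU\n",
}

columns = union_columns(files)
report = missing_columns(files)
jan_rows = align(files["jan.csv"], columns)
feb_rows = align(files["feb.csv"], columns)
```

Reading by column name rather than by position is what protects against the silent column swap in the February file.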

Pro

  • Out-of-core support
  • MS SQL integration
  • Advanced database features

Join Data

Python library for easily joining datasets without writing custom code

Standard

  • Easily find join columns across dataframes
  • Automatic content-based exact joins
  • Prejoin quality diagnostics
  • Descriptive stats for id/string joins
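One way to picture content-based join discovery and prejoin diagnostics is the sketch below. It is not the DataBolt API: tables are plain dictionaries of columns, the overlap, best_join_columns and prejoin_report helpers are hypothetical, and the Jaccard index is an assumed similarity measure chosen for simplicity.

```python
from typing import Dict, List, Tuple

Table = Dict[str, List[str]]  # column name -> column values


def overlap(left: List[str], right: List[str]) -> float:
    """Jaccard index: shared distinct values divided by all distinct values."""
    a, b = set(left), set(right)
    return len(a & b) / len(a | b) if a | b else 0.0


def best_join_columns(left: Table, right: Table) -> Tuple[str, str, float]:
    """Column pair whose contents overlap the most, with its score."""
    pairs = [
        (overlap(lvals, rvals), lcol, rcol)
        for lcol, lvals in left.items()
        for rcol, rvals in right.items()
    ]
    score, lcol, rcol = max(pairs)
    return lcol, rcol, score


def prejoin_report(left_keys: List[str], right_keys: List[str]) -> Dict[str, object]:
    """How many keys match and which keys would be lost in an exact join."""
    a, b = set(left_keys), set(right_keys)
    return {
        "matched": len(a & b),
        "left_only": sorted(a - b),
        "right_only": sorted(b - a),
    }


# Made-up sample tables whose key columns have different names.
prices: Table = {"ticker": ["AAPL", "MSFT", "GOOG"], "price": ["190", "410", "140"]}
names: Table = {"symbol": ["AAPL", "MSFT", "AMZN"], "company": ["Apple", "Microsoft", "Amazon"]}

left_col, right_col, score = best_join_columns(prices, names)
report = prejoin_report(prices[left_col], names[right_col])
```

Running the diagnostics before the join shows which records would be dropped, so unmatched keys can be investigated instead of disappearing silently.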

Pro

  • Join >2 dataframes
  • Automatic content-based similarity joins
  • Advanced join quality checks
  • Fast approximations for big data

Get started

You can get started for free with the standard open-source packages.
To get access to the pro versions, please request a demo.