Connectors

Introduction

DataYoga provides a wide variety of connectors for external sources, including stream providers, relational databases, non-relational databases, blob storage, and external APIs.

Connections are defined in the connections.yaml file. Each connection is declared under a logical name, together with its configuration properties and credentials.

Some connectors require installation of optional drivers.

Connections.yaml Example

dwh:
  type: postgresql
  username: pg
  password: ${oc.env:PG_PWD}
  host: localhost
  port: 5432
  database: rww

Supported Connectors

| Connector | Used by | PyPI Driver | Connector Properties | Connection Arguments (connect_args) | Query Arguments (query_args) |
| --- | --- | --- | --- | --- | --- |
| Amazon Athena | relational.read, relational.write | pip install "PyAthenaJDBC>1.0.9", pip install "PyAthena>1.2.0" | aws_access_key_id, aws_secret_access_key, region_name | | |
| Amazon Redshift | relational.read, relational.write | pip install sqlalchemy-redshift | username, password, aws_end_point, database | | |
| Apache Drill | relational.read, relational.write | pip install sqlalchemy-drill | | | |
| Apache Druid | relational.read, relational.write | pip install pydruid | username, password, host, port | | |
| Apache Hive | relational.read, relational.write | pip install pyhive | host, port, database | | |
| Apache Impala | relational.read, relational.write | pip install impyla | host, port, database | | |
| Apache Kylin | relational.read, relational.write | pip install kylinpy | host, port, database, password, project | | |
| Apache Pinot | relational.read, relational.write | pip install pinotdb | broker, server | | |
| Apache Solr | relational.read, relational.write | pip install sqlalchemy-solr | username, password, host, port, server_path, collection | | |
| Apache Spark SQL | relational.read, relational.write | pip install pyhive | host, port, database | | |
| Ascend.io | relational.read, relational.write | pip install impyla | host, port, database | | |
| Azure MS SQL | relational.read, relational.write | pip install pymssql | mssql+pymssql://UserName@presetSQL:TestPassword@presetSQL.database.windows.net:1433/TestSchema | | |
| Big Query | relational.read, relational.write | pip install pybigquery | bigquery://{project_id} | | |
| ClickHouse | relational.read, relational.write | pip install clickhouse-sqlalchemy | clickhouse+native://{username}:{password}@{hostname}:{port}/{database} | | |
| CockroachDB | relational.read, relational.write | pip install cockroachdb | cockroachdb://root@{hostname}:{port}/{database}?sslmode=disable | | |
| Dremio | relational.read, relational.write | pip install sqlalchemy_dremio | dremio://user:pwd@host:31010/ | | |
| Elasticsearch | relational.read, relational.write | pip install elasticsearch-dbapi | elasticsearch+http://{user}:{password}@{host}:9200/ | | |
| Exasol | relational.read, relational.write | pip install sqlalchemy-exasol | exa+pyodbc://{username}:{password}@{hostname}:{port}/my_schema?CONNECTIONLCALL=en_US.UTF-8&driver=EXAODBC | | |
| Google Sheets | relational.read, relational.write | pip install shillelagh[gsheetsapi] | gsheets:// | | |
| Firebolt | relational.read, relational.write | pip install firebolt-sqlalchemy | firebolt://{username}:{password}@{database} or firebolt://{username}:{password}@{database}/{engine_name} | | |
| Hologres | relational.read, relational.write | pip install psycopg2 | postgresql+psycopg2://<UserName>:<DBPassword>@<Database Host>/<Database Name> | | |
| IBM Db2 | relational.read, relational.write | pip install ibm_db_sa | db2+ibm_db:// | | |
| IBM Netezza Performance Server | relational.read, relational.write | pip install nzalchemy | netezza+nzpy://<UserName>:<DBPassword>@<Database Host>/<Database Name> | | |
| MySQL | relational.read, relational.write | pip install mysqlclient | mysql://<UserName>:<DBPassword>@<Database Host>/<Database Name> | ssl_ca, ssl_cert, ssl_key | |
| Oracle | relational.read, relational.write | pip install oracledb | oracle+oracledb:// | | |
| PostgreSQL | relational.read, relational.write | pip install psycopg2 | postgresql://<UserName>:<DBPassword>@<Database Host>/<Database Name> | | sslmode, sslrootcert, sslkey, sslcert |
| Trino | relational.read, relational.write | pip install sqlalchemy-trino | trino://{username}:{password}@{hostname}:{port}/{catalog} | | |
| Presto | relational.read, relational.write | pip install pyhive | presto:// | | |
| SAP Hana | relational.read, relational.write | pip install hdbcli sqlalchemy-hana or pip install apache-superset[hana] | hana://{username}:{password}@{host}:{port} | | |
| Snowflake | relational.read, relational.write | pip install snowflake-sqlalchemy | snowflake://{user}:{password}@{account}.{region}/{database}?role={role}&warehouse={warehouse} | | |
| SQLite | relational.read, relational.write | No additional library needed | sqlite:// | | |
| SQL Server | relational.read, relational.write | pip install pymssql | mssql:// | | |
| Teradata | relational.read, relational.write | pip install teradatasqlalchemy | teradata://{user}:{password}@{host} | | |
| TimescaleDB | relational.read, relational.write | pip install psycopg2 | username, password, host, port, database | | |
| Vertica | relational.read, relational.write | pip install sqlalchemy-vertica-python | vertica+vertica_python://<UserName>:<DBPassword>@<Database Host>/<Database Name> | | |
| YugabyteDB | relational.read, relational.write | pip install psycopg2 | postgresql://<UserName>:<DBPassword>@<Database Host>/<Database Name> | | |
| Amazon S3 | write_cloud_storage, extract_cloud_storage | pip install boto3 | | | |
| GCP GS | write_cloud_storage, extract_cloud_storage | pip install google-cloud-storage | | | |
| Azure | write_cloud_storage, extract_cloud_storage | pip install azure-storage-blob | | | |
| Redis | read_redis, redis.write | pip install redis | | | |
| MongoDB | read_mongodb, write_mongodb | pip install pymongo | | | |
| ElasticSearch | read_elasticsearch, write_elasticsearch | pip install elasticsearch | nodes, basic_auth, ca_certs, api_key, bearer_auth | | |
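
Many of the relational entries above are expressed as database URIs, and the PostgreSQL entry lists sslmode among its query arguments. The following Python sketch illustrates one way discrete connection properties and query arguments could be combined into such a URI. It is not DataYoga's implementation: build_uri is a hypothetical helper, and nesting query_args inside the connection entry is an assumption made for the example.

```python
from urllib.parse import quote_plus, urlencode


def build_uri(conn: dict) -> str:
    """Assemble a database URI from a connection entry (hypothetical helper)."""
    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    user = quote_plus(conn["username"])
    password = quote_plus(conn["password"])
    uri = (
        f"{conn['type']}://{user}:{password}"
        f"@{conn['host']}:{conn['port']}/{conn['database']}"
    )
    # Assumption: query arguments are appended as a URL query string.
    query_args = conn.get("query_args") or {}
    if query_args:
        uri += "?" + urlencode(query_args)
    return uri


dwh = {
    "type": "postgresql",
    "username": "pg",
    "password": "p@ss",
    "host": "localhost",
    "port": 5432,
    "database": "rww",
    "query_args": {"sslmode": "require"},
}
print(build_uri(dwh))
# postgresql://pg:p%40ss@localhost:5432/rww?sslmode=require
```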

Interpolation

DataYoga supports variable interpolation. An interpolated variable can be the path to another node in the configuration, in which case it resolves to the value of that node. The path may use dot notation (foo.1), brackets ([foo][1]), or a mix of both (foo[1], [foo].1).

pg1:
  type: postgresql
  username: pg
  password: ${}
  host: localhost
  port: 5432
  database: rww
pg2:
  type: ${pg1.type}
  username: pg
  password: ${}
  port: ${pg1.port}
  host: localhost
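
To make the lookup concrete, here is a minimal Python sketch of resolving an absolute, dot-notation reference such as ${pg1.type} against a nested dictionary. It only illustrates the idea and is not DataYoga's resolver: the lookup and resolve helpers are hypothetical, and bracket notation, list indices, and prefixed references are deliberately left out.

```python
import re

# Matches a whole-string absolute reference such as ${pg1.type}.
# Relative paths (leading dots) and prefixed references are not handled here.
REF = re.compile(r"\$\{([A-Za-z0-9_]+(?:\.[A-Za-z0-9_]+)*)\}")


def lookup(root: dict, path: str):
    """Walk a dot-separated path starting at the root of the configuration."""
    node = root
    for key in path.split("."):
        node = node[key]
    return node


def resolve(root: dict, value):
    """Return the referenced node's value, or the input unchanged if it is no reference."""
    if isinstance(value, str):
        match = REF.fullmatch(value)
        if match:
            return lookup(root, match.group(1))
    return value


config = {
    "pg1": {"type": "postgresql", "port": 5432},
    "pg2": {"type": "${pg1.type}", "port": "${pg1.port}"},
}
resolved_pg2 = {key: resolve(config, value) for key, value in config["pg2"].items()}
print(resolved_pg2)  # {'type': 'postgresql', 'port': 5432}
```

Note that the resolved port keeps the integer type of the node it points to, because the whole value is replaced rather than spliced into a string.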

Interpolations are absolute by default. Relative interpolations are prefixed by one or more dots: the first dot denotes the level of the node itself, and each additional dot goes one level up the parent hierarchy. For example, ${..foo} points to the foo sibling of the parent of the current node.
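
The dot-counting rule can be sketched in Python as a conversion from a relative reference to an absolute key path. This is an illustration of the rule as described above, not DataYoga's code: to_absolute is a hypothetical helper, and the example key names are made up.

```python
def to_absolute(reference: str, node_path: list) -> list:
    """Turn a reference such as '..foo' into an absolute list of keys.

    node_path is the absolute key path of the node holding the interpolation.
    """
    dots = len(reference) - len(reference.lstrip("."))
    if dots == 0:
        # No leading dot: the reference is already absolute.
        return reference.split(".")
    if dots > len(node_path):
        raise ValueError(f"{reference!r} climbs above the configuration root")
    # One dot drops the node's own key, which leaves its parent;
    # every further dot climbs one more level.
    base = node_path[: len(node_path) - dots]
    rest = reference[dots:]
    return base + (rest.split(".") if rest else [])


current = ["db", "pg2", "port"]  # hypothetical node: db.pg2.port
print(to_absolute(".host", current))       # ['db', 'pg2', 'host']
print(to_absolute("..pg1.port", current))  # ['db', 'pg1', 'port']
print(to_absolute("pg1.port", current))    # ['pg1', 'port']
```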

Environment Variables

Environment variables can be accessed using the env: prefix.

Example:

pg1:
    pwd: ${env:PG}

It is possible to provide a default value in case the variable is not set:

pg1:
    host: ${env:PG_HOST,localhost}
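
The fallback behaviour can be sketched in a few lines of Python. This is an illustration only, not DataYoga's resolver: resolve_env is a hypothetical helper, and raising an error when the variable is unset and no default is given is an assumption, since the text above does not specify that case.

```python
import os
import re

# Matches ${env:NAME} or ${env:NAME,default}.
ENV_REF = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)(?:,([^}]*))?\}")


def resolve_env(value: str) -> str:
    """Replace each env reference with the variable's value or its default."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in os.environ:
            return os.environ[name]
        if default is not None:
            return default
        # Assumption: a missing variable without a default is an error.
        raise KeyError(f"environment variable {name} is not set and has no default")

    return ENV_REF.sub(substitute, value)


os.environ.pop("PG_HOST", None)
print(resolve_env("${env:PG_HOST,localhost}"))  # localhost (variable unset)
os.environ["PG_HOST"] = "db.internal"
print(resolve_env("${env:PG_HOST,localhost}"))  # db.internal (variable set)
```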

Secrets

Secrets stored in tmpfs can be accessed using the file: prefix, followed by the path to the file and the key to read. The file should contain one KEY=VALUE pair per line.

Example:

pg1:
  pwd: ${file:/tmpfs/credentials:PWD}

It is possible to provide a default value in case the value is not set:

pg1:
  pwd: ${file:/tmpfs/credentials:PWD,12345}
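
As an illustration of the KEY=VALUE lookup, the Python sketch below reads a key from such a file and falls back to a default. It is not DataYoga's implementation: read_secret is a hypothetical helper, and treating a missing file the same as a missing key is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_secret(path: str, key: str, default: Optional[str] = None) -> str:
    """Look up key in a file of KEY=VALUE lines, falling back to default."""
    file = Path(path)
    if file.exists():
        for line in file.read_text().splitlines():
            # Split on the first '=' only, so values may contain '=' themselves.
            name, sep, value = line.partition("=")
            if sep and name.strip() == key:
                return value.strip()
    if default is not None:
        return default
    raise KeyError(f"{key} not found in {path}")


# Demonstration with a temporary file standing in for /tmpfs/credentials.
with tempfile.TemporaryDirectory() as tmp:
    credentials = Path(tmp) / "credentials"
    credentials.write_text("USER=pg\nPWD=s3cret\n")
    print(read_secret(str(credentials), "PWD"))             # s3cret
    print(read_secret(str(credentials), "TOKEN", "12345"))  # 12345
```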