Getting Started

Upgrade Pip

pip install --upgrade pip

Install DataYoga CLI

pip install datayoga

Verify that the installation completed successfully by running this command:

datayoga --version
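You can run the same check from inside Python, for example in a setup script. Below is a minimal sketch that uses the standard library's importlib.metadata. It assumes the installed distribution is named datayoga, which matches the pip command above. It only reads package metadata for the current interpreter and does not run the CLI. The helper name installed_version is illustrative and is not part of DataYoga.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "datayoga") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"datayoga {found}" if found else "datayoga is not installed in this environment")
```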

Create New DataYoga Project

To create a new DataYoga project, use the init command:

datayoga init hello_world
cd hello_world

Directory structure

Run Your First Job

Let’s run our first job. The init command has already created a sample job that loads a CSV file from the samples folder and transforms each row into JSON:

datayoga run sample.hello

If all goes well, you should see one JSON object for each line of the input CSV file:

{"id": "1", "fname": "john", "lname": "doe", "credit_card": "1234-1234-1234-1234", "country_code": "972", "country_name": "israel", "gender": "M", "full_name": "John Doe", "greeting": "Hello Mr. John Doe"}
{"id": "2", "fname": "jane", "lname": "doe", "credit_card": "1000-2000-3000-4000", "country_code": "972", "country_name": "israel", "gender": "F", "full_name": "Jane Doe", "greeting": "Hello Ms. Jane Doe"}
{"id": "3", "fname": "bill", "lname": "adams", "credit_card": "9999-8888-7777-666", "country_code": "1", "country_name": "usa", "gender": "M", "full_name": "Bill Adams", "greeting": "Hello Mr. Bill Adams"}

That’s it! You’ve run your first job: it loads data from a CSV file, runs it through a series of transformation steps, and writes the result to standard output. A good start!
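Here is a rough plain-Python sketch of what the sample job does. It reads the CSV rows, derives full_name and greeting for each record, and prints one JSON object per line. This illustrates the flow only and is not DataYoga's implementation. The function names are hypothetical. SAMPLE_CSV was reconstructed from the fields visible in the output above, so the real sample file may differ.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator

# Hypothetical input reconstructed from the fields shown in the job output;
# the real sample.csv created by `datayoga init` may differ.
SAMPLE_CSV = """id,fname,lname,credit_card,country_code,country_name,gender
1,john,doe,1234-1234-1234-1234,972,israel,M
2,jane,doe,1000-2000-3000-4000,972,israel,F
3,bill,adams,9999-8888-7777-666,1,usa,M
"""

# Title chosen from the gender code; anything else falls back to "N/A".
TITLES = {"F": "Ms.", "M": "Mr."}


def read_csv(text: str) -> Iterator[Dict[str, str]]:
    """Input step: yield one dict per CSV row (all values stay strings)."""
    yield from csv.DictReader(io.StringIO(text))


def add_fields(record: Dict[str, str]) -> Dict[str, str]:
    """Transformation step: derive full_name and greeting from existing fields."""
    out = dict(record)
    out["full_name"] = f"{record['fname'].capitalize()} {record['lname'].capitalize()}"
    title = TITLES.get(record["gender"], "N/A")
    out["greeting"] = f"Hello {title} {out['full_name']}"
    return out


def write_json_lines(records: Iterable[Dict[str, str]]) -> str:
    """Output step: serialize each record as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


if __name__ == "__main__":
    print(write_json_lines(add_fields(r) for r in read_csv(SAMPLE_CSV)))
```

Each step takes and returns plain dicts. This keeps the steps independent, so another transformation can be chained in without touching the input or output code.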

Enhance the Job Definition

Let’s customize the structure of the resulting JSON.

Jobs are located in the jobs folder and can be arranged into modules. The sample.hello job you just ran is located in jobs/sample/hello.yaml.
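The naming convention can be expressed as a small helper. This is a hypothetical sketch, not a DataYoga API. It assumes that each dot in a job name corresponds to one folder level under jobs. That rule is inferred from the single sample.hello example.

```python
from pathlib import PurePosixPath


def job_file(job_name: str, jobs_dir: str = "jobs") -> PurePosixPath:
    """Map a dotted job name to its YAML file (hypothetical helper).

    Assumption: every dot becomes a directory level, as in
    sample.hello -> jobs/sample/hello.yaml.
    """
    parts = job_name.split(".")
    if not all(parts):
        # Reject names such as "sample..hello" or ".hello" with empty segments.
        raise ValueError(f"invalid job name: {job_name!r}")
    return PurePosixPath(jobs_dir, *parts).with_suffix(".yaml")


if __name__ == "__main__":
    print(job_file("sample.hello"))
```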

Add a new step of type map to the chain of steps. To do this, open the job definition in a text editor and add the map block (marked by the comment) as shown below:

input:
  uses: files.read_csv
  with:
    file: sample.csv
steps:
  - uses: add_field
    with:
      fields:
        - field: full_name
          language: jmespath
          expression: concat([capitalize(fname), ' ' , capitalize(lname)])
        - field: greeting
          language: sql
          expression: "'Hello ' || CASE WHEN gender = 'F' THEN 'Ms.' WHEN gender = 'M' THEN 'Mr.' ELSE 'N/A' END || ' ' || full_name"
  #
  # map block
  #
  - uses: map
    with:
      expression:
        {
          greeting: greeting,
          details: { id: id, first_name: fname, last_name: lname }
        }
      language: jmespath
  - uses: std.write

Run the job again:

datayoga run sample.hello

You should now see the modified JSON as the output:

{"greeting": "Hello Mr. John Doe", "details": {"id": "1", "first_name": "john", "last_name": "doe"}}
{"greeting": "Hello Ms. Jane Doe", "details": {"id": "2", "first_name": "jane", "last_name": "doe"}}
{"greeting": "Hello Mr. Bill Adams", "details": {"id": "3", "first_name": "bill", "last_name": "adams"}}
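The map expression is a JMESPath multiselect hash. It builds a new object from the listed fields, renaming fname and lname and nesting them under details. Any field it does not mention, such as credit_card, is dropped. A plain-Python equivalent for a single record might look like the sketch below, where map_record is a hypothetical name. Using dict.get mirrors JMESPath's behavior of yielding null for a missing field.

```python
import json
from typing import Any, Dict

# A record as it leaves the add_field step (values from the first sample row).
RECORD: Dict[str, Any] = {
    "id": "1", "fname": "john", "lname": "doe",
    "credit_card": "1234-1234-1234-1234", "country_code": "972",
    "country_name": "israel", "gender": "M",
    "full_name": "John Doe", "greeting": "Hello Mr. John Doe",
}


def map_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the record from selected fields, renaming and nesting them.

    Fields not listed here are dropped; missing fields become None (JSON null).
    """
    return {
        "greeting": record.get("greeting"),
        "details": {
            "id": record.get("id"),
            "first_name": record.get("fname"),
            "last_name": record.get("lname"),
        },
    }


if __name__ == "__main__":
    print(json.dumps(map_record(RECORD)))
```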

Read on for a more detailed tutorial or check out the reference to see the different block types currently available.