Getting Started
Upgrade Pip
pip install --upgrade pip
Install DataYoga CLI
pip install datayoga
Verify that the installation completed successfully by running this command:
datayoga --version
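If you would rather check from inside Python (for example, in a script that sets up an environment), the standard library can report the installed version. This is a minimal sketch: the helper name installed_version is illustrative, and it assumes the distribution name is datayoga, as in the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "datayoga" is the distribution name used in the pip command above.
version = installed_version("datayoga")
print(version if version else "datayoga is not installed in this environment")
```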
Create New DataYoga Project
To create a new DataYoga project, use the init command:
datayoga init hello_world
cd hello_world
Run Your First Job
Let’s run our first job. It is pre-defined in the samples folder as part of the init command. Our job is going to load a CSV file from the samples folder and transform it into JSON:
datayoga run sample.hello
If all goes well, you should see a list of JSON values that correspond to each line in the input CSV file:
{"id": "1", "fname": "john", "lname": "doe", "credit_card": "1234-1234-1234-1234", "country_code": "972", "country_name": "israel", "gender": "M", "full_name": "John Doe", "greeting": "Hello Mr. John Doe"}
{"id": "2", "fname": "jane", "lname": "doe", "credit_card": "1000-2000-3000-4000", "country_code": "972", "country_name": "israel", "gender": "F", "full_name": "Jane Doe", "greeting": "Hello Ms. Jane Doe"}
{"id": "3", "fname": "bill", "lname": "adams", "credit_card": "9999-8888-7777-666", "country_code": "1", "country_name": "usa", "gender": "M", "full_name": "Bill Adams", "greeting": "Hello Mr. Bill Adams"}
That’s it! You’ve run your first job, which loads data from a CSV file, runs it through a series of transformation steps, and writes the result to standard output. A good start!
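For intuition, the behavior of the sample job can be approximated in plain Python. This is only a sketch of the logic, not how DataYoga implements it. The names transform and run and the inline CSV are illustrative, and the column names are taken from the output above.

```python
import csv
import io
import json

# Illustrative stand-in for the sample CSV; column names follow the output above.
SAMPLE_CSV = """id,fname,lname,gender
1,john,doe,M
2,jane,doe,F
"""

TITLES = {"F": "Ms.", "M": "Mr."}


def transform(record):
    """Add full_name and greeting fields to a copy of the record."""
    out = dict(record)
    out["full_name"] = f"{record['fname'].capitalize()} {record['lname'].capitalize()}"
    out["greeting"] = f"Hello {TITLES.get(record['gender'], 'N/A')} {out['full_name']}"
    return out


def run(csv_text):
    """Read CSV rows and return one JSON string per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(transform(row)) for row in reader]


for line in run(SAMPLE_CSV):
    print(line)
```

Each CSV row becomes a dictionary, gains two derived fields, and is serialized as one JSON object per line, which is the shape of the output shown above.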
Enhance the Job Definition
Let’s customize the structure of the resulting JSON.
Jobs are located under the jobs folder and can be arranged into modules. The sample.hello job you just ran is located in jobs/sample/hello.yaml.
Add a new step of type map to the chain of steps. To do this, open the job definition in a text editor and add the map section that follows the "map block" comment in the listing below:
input:
  uses: files.read_csv
  with:
    file: sample.csv
steps:
  - uses: add_field
    with:
      fields:
        - field: full_name
          language: jmespath
          expression: concat([capitalize(fname), ' ' , capitalize(lname)])
        - field: greeting
          language: sql
          expression: "'Hello ' || CASE WHEN gender = 'F' THEN 'Ms.' WHEN gender = 'M' THEN 'Mr.' ELSE 'N/A' END || ' ' || full_name"
  #
  # map block
  #
  - uses: map
    with:
      expression:
        {
          greeting: greeting,
          details: { id: id, first_name: fname, last_name: lname }
        }
      language: jmespath
  - uses: std.write
Run the job again:
datayoga run sample.hello
You should now see the modified JSON as the output:
{"greeting": "Hello Mr. John Doe", "details": {"id": "1", "first_name": "john", "last_name": "doe"}}
{"greeting": "Hello Ms. Jane Doe", "details": {"id": "2", "first_name": "jane", "last_name": "doe"}}
{"greeting": "Hello Mr. Bill Adams", "details": {"id": "3", "first_name": "bill", "last_name": "adams"}}
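The map step replaces each record with the object built by its expression, so fields that are not named in the expression are dropped. A rough Python equivalent of that reshaping is sketched here, assuming records are plain dictionaries. The function name reshape is illustrative and not part of DataYoga. It follows the expression as written in the job definition, so details includes id.

```python
import json


def reshape(record):
    """Build a new record from selected fields, as the map expression does."""
    return {
        "greeting": record["greeting"],
        "details": {
            "id": record["id"],
            "first_name": record["fname"],
            "last_name": record["lname"],
        },
    }


# A record as it looks after the add_field step, taken from the first run.
record = {
    "id": "1", "fname": "john", "lname": "doe",
    "gender": "M", "full_name": "John Doe",
    "greeting": "Hello Mr. John Doe",
}
print(json.dumps(reshape(record)))
```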
Read on for a more detailed tutorial or check out the reference to see the different block types currently available.