Expression Language

DataYoga supports both SQL and JMESPath expressions. JMESPath are especially useful to handle nested JSON data, while SQL is more suited to flat row-like structures.

Notes

  • Dot notation in expression represents nesting fields in the object, for example name.first_name refers to { "name": { "first_name": "John" } }.
  • In SQL, use backticks to reference a nested column:

    expression: UPPER(`nested.field`)
    
  • In order to refer to a field that contains a dot in its name, escape it, for example name\.first_name refers to { "name.first_name": "John" }.

JMESPath Custom Functions

DataYoga adds the following custom functions to the standard JMESPath library:

Function Description Example Comments
base64_decode Decodes a base64(RFC 4648) encoded string Input: {"encoded": "SGVsbG8gV29ybGQh"}
Expression: base64_decode(encoded)
Output: Hello World!
 
capitalize Capitalizes all the words in the string Input: {"name": "john doe"}
Expression: capitalize(name)
Output: John Doe
 
concat Concatenates an array of variables or literals Input: {"fname": "john", "lname": "doe"}
Expression: concat([fname, ' ' ,lname])
Output: john doe
This is equivalent to the more verbose built-in expression: ' '.join([fname,lname])
filter_entries Filters entries in a dictionary (object) based on the given JMESPath predicate Input: { "name": "John", "age": 30, "country": "US", "score": 15}
Expression: filter_entries(@, `key == 'name' \|\| key == 'age'`)
Output:{"name": "John", "age": 30 }
 
from_entries Converts an array of objects with key and value properties into a single object Input: [{"key": "name", "value": "John"}, {"key": "age", "value": 30}, {"key": "city", "value": null}]
Expression: from_entries(@)
Output: {"name": "John", "age": 30, "city": null}
 
hash Calculates a hash using the hash_name hash function and returns its hexadecimal representation Input: {"some_str": "some_value"}
Expression: hash(some_str, `sha1`)
Output: 8c818171573b03feeae08b0b4ffeb6999e3afc05
Supported algorithms: sha1 (default), sha256, md5, sha384, sha3_384, blake2b, sha512, sha3_224, sha224, sha3_256, sha3_512, blake2s
in Checks if an element matches any value in a list of values Input: {"el": "b"}
Expression: in(el, [“a”, “b”, “c”])
Output: True
 
left Returns a specified number of characters from the start of a given text string Input: {"greeting": "hello world!"}
Expression: left(greeting, 5)
Output: hello
 
lower Converts all uppercase characters in a string into lowercase characters Input: {"fname": "John"}
Expression: lower(fname)
Output: john
 
mid Returns a specified number of characters from the middle of a given text string Input: {"greeting": "hello world!"}
Expression: mid(greeting, 4, 3)
Output: o w
 
json_parse Returns parsed object from the given json string Input: {"data": '{"greeting": "hello world!"}'}
Expression: parse_json(data)
Output: {"greeting": "hello world!"}
 
regex_replace Replaces a string that matches a regular expression Input: {"text": "Banana Bannnana"}
Expression: regex_replace(text, 'Ban\w+', 'Apple Apple')
Output: Apple Apple
 
replace Replaces all the occurrences of a substring with a new one Input: {"sentence": "one four three four!"}
Expression: replace(sentence, 'four', 'two')
Output: one two three two!
 
right Returns a specified number of characters from the end of a given text string Input: {"greeting": "hello world!"}
Expression: right(greeting, 6)
Output: world!
 
split Splits a string into a list of strings after breaking the given string by the specified delimiter (comma by default) Input: {"departments": "finance,hr,r&d"}
Expression: split(departments)
Output: ['finance', 'hr', 'r&d']
Default delimiter is comma - a different delimiter can be passed to the function as the second argument, for example: split(departments, ';')
time_delta_days Returns the number of days between a given dt and now (positive) or the number of days that have passed from now (negative) Input: {"dt": '2021-10-06T18:56:16.701670+00:00'}
Expression: time_delta_days(dt)
Output: 365
If dt is a string, ISO datetime (2011-11-04T00:05:23+04:00, for example) is assumed. If dt is a number, Unix timestamp (1320365123, for example) is assumed.
time_delta_seconds Returns the number of seconds between a given dt and now (positive) or the number of seconds that have passed from now (negative) Input: {"dt": '2021-10-06T18:56:16.701670+00:00'}
Expression: time_delta_days(dt)
Output: 31557600
If dt is a string, ISO datetime (2011-11-04T00:05:23+04:00, for example) is assumed. If dt is a number, Unix timestamp (1320365123, for example) is assumed.
to_entries Converts a given object into an array of objects with key and value properties Input: {"name": "John", "age": 30, "city": null}
Expression: to_entries(@)
Output: [{"key": "name", "value": "John"}, {"key": "age", "value": 30}, {"key": "city", "value": null}]
 
upper Converts all lowercase characters in a string into uppercase characters Input: {"fname": "john"}
Expression: upper(fname)
Output: JOHN
 
uuid Generates a random UUID4 and returns it as a string in standard format Input: None
Expression: uuid()
Output: 3264b35c-ff5d-44a8-8bc7-9be409dac2b7