Expression Language
DataYoga supports both SQL and JMESPath expressions. JMESPath are especially useful to handle nested JSON data, while SQL is more suited to flat row-like structures.
Notes
- Dot notation in expression represents nesting fields in the object, for example
name.first_name
refers to{ "name": { "first_name": "John" } }
. -
In SQL, use backticks to reference a nested column:
expression: UPPER(`nested.field`)
- In order to refer to a field that contains a dot in its name, escape it, for example
name\.first_name
refers to{ "name.first_name": "John" }
.
JMESPath Custom Functions
DataYoga adds the following custom functions to the standard JMESPath library:
Function | Description | Example | Comments |
---|---|---|---|
base64_decode | Decodes a base64(RFC 4648) encoded string | Input: {"encoded": "SGVsbG8gV29ybGQh"} Expression: base64_decode(encoded) Output: Hello World! | |
capitalize | Capitalizes all the words in the string | Input: {"name": "john doe"} Expression: capitalize(name) Output: John Doe | |
concat | Concatenates an array of variables or literals | Input: {"fname": "john", "lname": "doe"} Expression: concat([fname, ' ' ,lname]) Output: john doe | This is equivalent to the more verbose built-in expression: ' '.join([fname,lname]) |
filter_entries | Filters entries in a dictionary (object) based on the given JMESPath predicate | Input: { "name": "John", "age": 30, "country": "US", "score": 15} Expression: filter_entries(@, `key == 'name' \|\| key == 'age'`) Output: {"name": "John", "age": 30 } | |
from_entries | Converts an array of objects with key and value properties into a single object | Input: [{"key": "name", "value": "John"}, {"key": "age", "value": 30}, {"key": "city", "value": null}] Expression: from_entries(@) Output: {"name": "John", "age": 30, "city": null} | |
hash | Calculates a hash using the hash_name hash function and returns its hexadecimal representation | Input: {"some_str": "some_value"} Expression: hash(some_str, `sha1`) Output: 8c818171573b03feeae08b0b4ffeb6999e3afc05 | Supported algorithms: sha1 (default), sha256, md5, sha384, sha3_384, blake2b, sha512, sha3_224, sha224, sha3_256, sha3_512, blake2s |
in | Checks if an element matches any value in a list of values | Input: {"el": "b"} Expression: in(el, [“a”, “b”, “c”]) Output: True | |
left | Returns a specified number of characters from the start of a given text string | Input: {"greeting": "hello world!"} Expression: left(greeting, 5) Output: hello | |
lower | Converts all uppercase characters in a string into lowercase characters | Input: {"fname": "John"} Expression: lower(fname) Output: john | |
mid | Returns a specified number of characters from the middle of a given text string | Input: {"greeting": "hello world!"} Expression: mid(greeting, 4, 3) Output: o w | |
json_parse | Returns parsed object from the given json string | Input: {"data": '{"greeting": "hello world!"}'} Expression: parse_json(data) Output: {"greeting": "hello world!"} | |
regex_replace | Replaces a string that matches a regular expression | Input: {"text": "Banana Bannnana"} Expression: regex_replace(text, 'Ban\w+', 'Apple Apple') Output: Apple Apple | |
replace | Replaces all the occurrences of a substring with a new one | Input: {"sentence": "one four three four!"} Expression: replace(sentence, 'four', 'two') Output: one two three two! | |
right | Returns a specified number of characters from the end of a given text string | Input: {"greeting": "hello world!"} Expression: right(greeting, 6) Output: world! | |
split | Splits a string into a list of strings after breaking the given string by the specified delimiter (comma by default) | Input: {"departments": "finance,hr,r&d"} Expression: split(departments) Output: ['finance', 'hr', 'r&d'] | Default delimiter is comma - a different delimiter can be passed to the function as the second argument, for example: split(departments, ';') |
time_delta_days | Returns the number of days between a given dt and now (positive) or the number of days that have passed from now (negative) | Input: {"dt": '2021-10-06T18:56:16.701670+00:00'} Expression: time_delta_days(dt) Output: 365 | If dt is a string, ISO datetime (2011-11-04T00:05:23+04:00, for example) is assumed. If dt is a number, Unix timestamp (1320365123, for example) is assumed. |
time_delta_seconds | Returns the number of seconds between a given dt and now (positive) or the number of seconds that have passed from now (negative) | Input: {"dt": '2021-10-06T18:56:16.701670+00:00'} Expression: time_delta_days(dt) Output: 31557600 | If dt is a string, ISO datetime (2011-11-04T00:05:23+04:00, for example) is assumed. If dt is a number, Unix timestamp (1320365123, for example) is assumed. |
to_entries | Converts a given object into an array of objects with key and value properties | Input: {"name": "John", "age": 30, "city": null} Expression: to_entries(@) Output: [{"key": "name", "value": "John"}, {"key": "age", "value": 30}, {"key": "city", "value": null}] | |
upper | Converts all lowercase characters in a string into uppercase characters | Input: {"fname": "john"} Expression: upper(fname) Output: JOHN | |
uuid | Generates a random UUID4 and returns it as a string in standard format | Input: None Expression: uuid() Output: 3264b35c-ff5d-44a8-8bc7-9be409dac2b7 |