In this article, you will learn how to get the most out of the Synthetic Variables Engine using special functions or their combination to render more complex expressions. If you are not familiar with Synthetic Variables yet, please refer here to leverage the benefits of this functionality.
- An IoT Entrepreneur license or greater.
Table of Contents
- Creating a Synthetic Variable using context data.
- Creating a Synthetic Variable using timestamp.
- Examples of data range functions.
- Examples of special Synthetic Functions.
1. Creating a Synthetic Variable using context data
Variable Context is one of the best ways to store metadata that isn't necessarily a numerical value. The most common usage of variable context is for latitude and longitude GPS coordinates. While those values are indeed numerical, it comes in quite handy to send them in a single dot instead of 2 separate variables.
Context data can only be used within your synthetic expression if the context is a numerical value. Accessing it from the Synthetic editor is straightforward: use the dot '.' operator on the variable, call its context and then the context key identifier:
Note: To find more information about context and other features to store values, please refer to Ubidots API documentation.
Suppose that you need to use the Haversine formula to calculate the shortest distance between two locations over the earth's surface based on Latitude and Longitude data. This would be as follows:
Δlat = lat2 − lat1
Δlng = lng2 − lng1
a = sin²(Δlat/2) + cos lat1 ⋅ cos lat2 ⋅ sin²(Δlng/2)
c = 2 ⋅ arctan2( √a, √(1−a) )
d = R ⋅ c
where lat is latitude, lng is longitude and R is the Earth's radius (mean radius = 6,371 km); note that angles need to be in radians before they are passed to the trigonometric functions.
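As an offline cross-check of the formula above, here is a minimal Python sketch. It is not a synthetic expression and does not use the Synthetic Variables Engine; the function name `haversine_km` and the default radius of 6371 km (a commonly quoted mean value) are assumptions made for illustration, and any other radius can be passed in.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees.

    radius_km is an assumed mean Earth radius; pass another value if needed.
    """
    # The trigonometric functions expect radians, so convert first.
    lat1, lng1, lat2, lng2 = map(math.radians, (lat1, lng1, lat2, lng2))
    d_lat = lat2 - lat1
    d_lng = lng2 - lng1
    a = math.sin(d_lat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(d_lng / 2) ** 2
    # Clamp a to [0, 1] to avoid a negative square root caused by rounding.
    a = min(1.0, max(0.0, a))
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius_km * c


# Example: two points on the equator, a quarter of the globe apart.
print(haversine_km(0.0, 0.0, 0.0, 90.0))
```

Comparing a few values from this sketch against the output of your synthetic variable is a quick way to confirm that the degrees-to-radians conversion was not forgotten.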
Keep in mind that latitude and longitude are numerical data stored inside the context of a variable as shown below:
The synthetic expression to calculate the distance will look similar to this one in the synthetic variable editor:
lat1, lat2, lng1, and lng2 use the values inside the context to make the proper calculations.
IMPORTANT NOTE: The example above assumes that the variables have the same timestamp. If not, the fill_missing() expression needs to be included.
2. Creating a Synthetic Variable using timestamp
The timestamp can be accessed in the same way as context data: in Synthetic Variables, use the dot '.' operator on the variable.
Imagine you have a variable, like in the image above, storing a value every time a machine TURNS ON (1) or TURNS OFF (0), and you want to calculate how long the machine remains in an ON state (1). This calculation can be done using the variable timestamp as shown below:
previousTime: Timestamps in milliseconds when the machine was ON
actualTime: Timestamps in milliseconds when the machine was OFF
dV: Difference between the actual and previous value, used to detect a change in the machine's ON/OFF status.
dT: Time the machine remains ON.
Lastly, one needs to evaluate dV's sign to check for a negative value, which in turn indicates that the machine went from ON to OFF. To do so, there's the function where() which, based on an input condition, takes one of 2 actions for the True or False condition outcome. For the case herein, it serves well to evaluate the sign and, if True, save the time difference between the last 2 states, that is, dT.
Keep in mind this time is always in milliseconds [ms]. If needed, you can convert it to seconds, minutes or hours by multiplying by the correct factor.
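The logic described above can be sketched outside the platform in plain Python. This is only an illustration of the reasoning, not the synthetic expression itself: the function name `on_durations_ms` and the list-of-tuples input are hypothetical, while `previous_time`, `actual_time`, `dv` and `dt` mirror the names used in this section.

```python
def on_durations_ms(dots):
    """Return (timestamp_ms, on_time_ms) pairs for every ON -> OFF transition.

    dots is a list of (timestamp_ms, value) tuples sorted by timestamp,
    where value is 1 when the machine turns ON and 0 when it turns OFF.
    """
    durations = []
    # Walk over consecutive pairs of dots: (previous, actual).
    for (previous_time, previous_value), (actual_time, actual_value) in zip(dots, dots[1:]):
        dv = actual_value - previous_value  # negative only when going from ON (1) to OFF (0)
        dt = actual_time - previous_time    # elapsed time between the two states, in ms
        if dv < 0:
            durations.append((actual_time, dt))
    return durations


# Example: the machine is ON from t=1000 to t=4000 and from t=10000 to t=12500.
dots = [(0, 0), (1000, 1), (4000, 0), (10000, 1), (12500, 0)]
for timestamp, on_time in on_durations_ms(dots):
    print(timestamp, on_time / 1000, "seconds ON")
```

Dividing by 1000 converts the milliseconds to seconds, as the note above suggests.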
3. Examples of data range functions
Data range functions are a perfect fit for statistical analysis that identifies trends over fixed time periods. With them, you will not only be able to calculate a value every fixed period of time but also have some flexibility to select the timestamp at which data will be stored, and an initial offset as well, if needed.
All data range functions follow the below syntax:
function_name(variable, data_range, position="start", offset=0)
Where data_range can be any of the options described here; position (optional) can be either of 2 options, "start" or "end", which indicates whether the value is going to be timestamped at the beginning or end of the data_range; and finally, the parameter offset (optional), as its name suggests, shifts the starting point from which the function computes.
IMPORTANT NOTE: The data ranges "nW", "nM" and "nY" default to position="end" whereas the rest default to position="start".
mean(x, "1H")
The above synthetic expression calculates the mean of variable x every 1 hour ("1H"), and because none of the optional parameters are defined, the values will be timestamped at the beginning of every hour, that is, all values received from 00:00:00 to 01:00:00 will be mean-aggregated into a single value timestamped at 00:00:00.
mean(x, "1H", position="end")
Similarly to example 1, synthetic expression 2 will compute the mean value of variable x every hour, but in this case, because of the position="end" parameter, the engine will timestamp each value at the end of the data range, so for values received from 00:00:00 to 01:00:00, the mean-aggregated value is timestamped at 00:59:59.
mean(x, "8H", offset=6)
Let's suppose you're working in a factory with the following shift schedule: 6:00 AM – 2:00 PM, 2:00 PM – 10:00 PM and 10:00 PM – 6:00 AM. You'd like to average the variable x every 8 hours to match each shift's duration. However, the synthetic engine, by default, starts computing at 00:00:00 and not at the beginning of every shift. Here's where the last parameter, offset, comes in handy. Setting offset=6 tells the engine to start computing at 06:00:00 in periods of 8 hours, rendering 3 values per day timestamped at 06:00:00, 14:00:00 and 22:00:00.
mean(x, "8H", position="end", offset=6)
Based on the same scenario as in example 3, we now make use of all the parameters. In this case, the synthetic variable will compute starting at 06:00:00 in 8-hour periods, but because of the position="end" parameter, values will be timestamped at 13:59:59, 21:59:59 and 05:59:59 for the 6:00 AM, 2:00 PM and 10:00 PM shifts, respectively.
IMPORTANT NOTE: The offset parameter works only with values that evenly subdivide 1 day.
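To make the effect of position and offset in the examples above more tangible, the following Python sketch reproduces the bucketing logic offline. It is an approximation under stated assumptions, not the engine's implementation: the function name `windowed_mean` is hypothetical, timestamps are expressed in seconds since midnight for readability, and the "end" timestamp is taken as the last second of the window to match the 13:59:59 style shown in this section.

```python
def windowed_mean(dots, period_s, position="start", offset_s=0):
    """Average (timestamp_s, value) dots in fixed windows of period_s seconds.

    offset_s shifts where the windows begin; position selects whether each
    result is stamped at the start or at the last second of its window.
    """
    buckets = {}
    for timestamp, value in dots:
        # Start of the window this dot belongs to, honoring the offset.
        start = ((timestamp - offset_s) // period_s) * period_s + offset_s
        buckets.setdefault(start, []).append(value)

    result = []
    for start in sorted(buckets):
        values = buckets[start]
        stamp = start if position == "start" else start + period_s - 1
        result.append((stamp, sum(values) / len(values)))
    return result


HOUR = 3600
# Dots at 06:30, 07:00 and 15:00, expressed in seconds since midnight.
dots = [(6 * HOUR + 1800, 10.0), (7 * HOUR, 20.0), (15 * HOUR, 30.0)]
print(windowed_mean(dots, 8 * HOUR, offset_s=6 * HOUR))
print(windowed_mean(dots, 8 * HOUR, position="end", offset_s=6 * HOUR))
```

With an offset of 6 hours, the first two dots fall in the window starting at 06:00:00 and the third in the window starting at 14:00:00, which mirrors the shift schedule used in examples 3 and 4.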
4. Examples of special Synthetic Functions
Many Synthetic Expressions involve 2 or more variable time series. Because of the way the Synthetic Variables engine is built, the values being computed must share the same timestamp; otherwise, the engine won't compute or will output unexpected results. In most cases, however, the variables' time series do not comply with this rule. For such instances there is the fill_missing() function, which fills the gaps where a value is missing in any of the variables' time series used within the expression. This function follows the below default syntax:
fill_missing(expression, first_fill="ffill", last_fill="None", fill_value="None")
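where the only mandatory argument is expression. The others, first_fill, last_fill and fill_value, are optional. Furthermore, the optional arguments first_fill and last_fill can take any of 3 values: ffill (forward fill), bfill (back fill) or None, while fill_value can be None or any other value, such as the 0 used in the last example below.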
To better understand this function, let's suppose you have a set of variables that you want to sum together. As you can see in the below table, the variables A, B, C and D don't have data at all of the timestamps, so the fill_missing() function is needed to fill the gaps and ultimately perform the calculation. Examples 1 to 4 are based on said table:
fill_missing(A + B + C + D)
By default, the fill_missing() expression sets the first_fill parameter to ffill, meaning that the function will fill the gaps forward, starting at the point where it has enough data to fill the gaps for all the involved series. The result would be as follows:
fill_missing(A + B + C + D, first_fill="bfill")
On the other hand, first_fill can also be set to bfill. The gaps would be filled as shown below:
fill_missing(A + B + C + D, first_fill="ffill", last_fill="bfill")
Let's suppose you need to fill all the gaps of the variables. In that case you'd need to set both first_fill and last_fill in the expression, rendering the following:
fill_missing(A + B + C + D, fill_value=0)
Additionally, you can fill the gaps with a value. For this case, the function sets a "0" in all the gaps the variables have.