Analytics: Advanced Synthetic Variables

Learn how to take full advantage of synthetic variables using complex and special functions.

Written by Sergio M
Updated over a week ago

In this article, you will learn how to get the most out of the Synthetic Variables Engine by using special functions, alone or in combination, to build more complex expressions. If you are not yet familiar with Synthetic Variables, please review the basics of this functionality first.


1. Creating a Synthetic Variable using context data

Variable context is one of the best ways to store metadata that isn't necessarily a numerical value. The most common use of variable context is for latitude and longitude GPS coordinates. While those values are indeed numerical, it is convenient to send them in a single dot instead of two separate variables.

Context data can only be used within your synthetic expression if the context is a numerical value. Accessing it from the Synthetic editor is straightforward: use the dot '.' operator on the variable, call its context, and then the context key identifier:


Note: To find more information about context and other features to store values, please refer to Ubidots API documentation.


Suppose that you need to use the Haversine formula to calculate the shortest distance between two locations over the earth's surface based on Latitude and Longitude data. This would be as follows:

Δlat = lat2 − lat1
Δlng = lng2 − lng1
a = sin²(Δlat/2) + cos(lat1) ⋅ cos(lat2) ⋅ sin²(Δlng/2)
c = 2 ⋅ atan2( √a, √(1−a) )
d = R ⋅ c

Where lat is latitude, lng is longitude and R is the Earth's radius (mean radius = 6,371 km). Note that angles must be in radians before they are passed to the trigonometric functions.
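To check the formula outside the Synthetic editor, here is a minimal Python sketch of the same steps. It is not Ubidots synthetic syntax. The function name haversine_km and its parameters are hypothetical, and the Earth radius is passed in explicitly so you can use whichever value you prefer.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2, radius_km):
    """Great-circle distance between two points given in decimal degrees.

    radius_km is the sphere radius (Earth's radius in km for real coordinates).
    """
    # Trigonometric functions expect radians, so convert the degrees first.
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_lat = phi2 - phi1
    d_lng = math.radians(lng2 - lng1)

    # a = sin²(Δlat/2) + cos(lat1) · cos(lat2) · sin²(Δlng/2)
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lng / 2) ** 2)
    # c = 2 · atan2(√a, √(1−a))
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    # d = R · c
    return radius_km * c
```

With radius_km=1 the function returns the central angle in radians, which is a quick way to sanity-check it: two points on the equator 90° of longitude apart should be π/2 apart.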

Keep in mind that latitude and longitude are numerical data stored inside the context of a variable as shown below:

The synthetic expression to calculate the distance will look similar to this one in the synthetic variable editor:

Where lat1, lat2, lng1, and lng2 use the values stored in the context to make the proper calculations.

IMPORTANT NOTE: The example above assumes that the variables have the same timestamp. If not, the fill_missing() expression needs to be included.

2. Creating a Synthetic Variable using the Device properties

Device properties are a quick way to add metadata to your Ubidots devices.

It is possible to use device properties to calculate a synthetic variable as long as the property is a float number. To access a property from the synthetic editor, use the key {{device:<id>}} (remember to replace <id> with the id of the device whose property you want to access). Then, use the dot '.' operator on the device, call its properties, and then the key identifier:


Note: To find more information about context and other features to store values, please refer to Ubidots API documentation.


Suppose you have calibrated your device and saved the calibration value in a device property, and you want to perform operations with this information.

For example, calculate the difference between the current temperature and the calibration value.

The results obtained with the synthetic expression are shown below:

3. Creating a Synthetic Variable using timestamp

The timestamp can be accessed in the same way as context data: in Synthetic Variables, use the dot '.' operator on the variable.



Imagine you have a variable, like in the image above, storing a value every time a machine TURNS ON (1) or TURNS OFF (0), and you want to calculate how long the machine remains in an ON state (1). This calculation can be done using the variable timestamp as shown below:


  • previousTime : Timestamp in milliseconds when the machine was ON.

  • actualTime : Timestamp in milliseconds when the machine was OFF.

  • dV : Difference between the actual and previous values, used to detect a change in the machine's ON/OFF status.

  • dT : Time the machine remains ON.

Finally, evaluate the sign of dV: a negative value indicates that the machine went from ON to OFF. For this there is the where() function, which, based on an input condition, takes one of two actions depending on whether the condition is True or False. In this case, it evaluates the sign and, if True, saves the time difference between the last two states, that is, dT.

Keep in mind that this time is always in milliseconds [ms]. If needed, you can convert it to seconds, minutes or hours by dividing by 1,000, 60,000 or 3,600,000, respectively.
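The dV/dT logic described above can be reproduced offline with a short Python sketch. It is not synthetic expression syntax. The function name on_durations_ms is hypothetical, and it assumes the dots arrive as (timestamp_ms, value) pairs sorted by timestamp, with one dot stored per state change.

```python
def on_durations_ms(dots):
    """Return (timestamp_ms, duration_ms) for every ON -> OFF transition.

    dots: list of (timestamp_ms, value) tuples sorted by timestamp,
          where value is 1 (ON) or 0 (OFF).
    """
    results = []
    # Walk over consecutive pairs: (previous dot, actual dot).
    for (previous_time, previous_value), (actual_time, actual_value) in zip(dots, dots[1:]):
        d_v = actual_value - previous_value  # -1 means ON -> OFF
        d_t = actual_time - previous_time    # elapsed time in milliseconds
        # Equivalent of the where() condition: keep dT only when dV is negative.
        if d_v < 0:
            results.append((actual_time, d_t))
    return results


# Machine turns ON at 1 s, OFF at 61 s, ON at 70 s, OFF at 100 s.
dots = [(0, 0), (1_000, 1), (61_000, 0), (70_000, 1), (100_000, 0)]
durations = on_durations_ms(dots)
seconds_on = [d_t / 1_000 for _, d_t in durations]  # convert ms to seconds
```

Each result is stamped with the time of the OFF dot, so the duration appears at the moment the ON period ends.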

4. Example of data range functions

Data range functions are a perfect fit for statistical analysis that identifies trends over time periods. With them, you can not only calculate a value every fixed period of time but also select the timestamp at which the result is stored and, if needed, an initial offset.

All data range functions follow the below syntax:

function_name(variable, data_range, position="start", offset=0)

Where function_name and data_range can be any of the options described here. The optional position parameter can be either "start" or "end", and indicates whether the value is timestamped at the beginning or the end of the data_range. The optional offset parameter, as its name suggests, shifts the starting point from which the function computes.

IMPORTANT NOTE: The data ranges "W" and "M" default to position="end" whereas the rest to position="start".


mean(x, "1H")

The above synthetic expression calculates the mean of variable x every hour ("1H"). Because none of the optional parameters are defined, the values are timestamped at the beginning of every hour; that is, all values received from 00:00:00 to 01:00:00 are mean-aggregated into a single value timestamped at 00:00:00.


mean(x, "1H", position="end")

As in the previous example, this expression computes the mean value of variable x every hour. In this case, however, because of the position="end" parameter, the engine timestamps each value at the end of the data range, so for values received from 00:00:00 to 01:00:00, the mean-aggregated value is timestamped at 00:59:59.


mean(x, "8H", offset=6)

Let's suppose you're working in a factory with the following shift schedule: 6:00 AM to 2:00 PM, 2:00 PM to 10:00 PM and 10:00 PM to 6:00 AM. You'd like to average the variable x every 8 hours to match each shift's duration. However, the synthetic engine, by default, starts computing at 00:00:00 and not at the beginning of every shift. This is where the last parameter, offset, comes in handy. Setting offset=6 tells the engine to start computing at 06:00:00 in periods of 8 hours, rendering 3 values per day timestamped at 06:00:00, 14:00:00 and 22:00:00.


mean(x, "8H", position="end", offset=6)

Based on the same shift example, this expression makes use of all the parameters. In this case, the engine computes starting at 06:00:00 in periods of 8 hours, but because of the position="end" parameter, values are timestamped at 13:59:59, 21:59:59 and 05:59:59 for the 6:00 AM, 2:00 PM and 10:00 PM shifts, respectively.

IMPORTANT NOTE: offset parameter works only with values that evenly subdivide 1 day.
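To see how position and offset decide the timestamp of each aggregated value, here is a rough Python sketch of the bucketing described in this section. It is not the engine's actual implementation. The names range_timestamp and mean_by_range are hypothetical, timestamps are assumed to be integer seconds counted from midnight, each range is treated as half-open (start included, end excluded), and position="end" is assumed to label a range with its last second, as in the 00:59:59 example.

```python
from collections import defaultdict
from statistics import mean

HOUR = 3600  # seconds in one hour


def range_timestamp(ts, period_hours, position="start", offset=0):
    """Timestamp (in seconds) that labels the data range containing ts."""
    period = period_hours * HOUR
    shift = offset * HOUR
    # Floor ts to the beginning of its range, measured from the offset.
    start = (ts - shift) // period * period + shift
    if position == "end":
        return start + period - 1  # last second of the range (assumption)
    return start


def mean_by_range(dots, period_hours, position="start", offset=0):
    """Group (timestamp, value) dots by data range and average each group."""
    buckets = defaultdict(list)
    for ts, value in dots:
        label = range_timestamp(ts, period_hours, position, offset)
        buckets[label].append(value)
    return {label: mean(values) for label, values in sorted(buckets.items())}


# A dot at 07:00:00 with 8-hour shifts that start at 06:00:00:
shift_start = range_timestamp(7 * HOUR, 8, offset=6)                 # 06:00:00
shift_end = range_timestamp(7 * HOUR, 8, position="end", offset=6)   # 13:59:59
```

A dot at 05:00:00 belongs to the shift that began at 22:00:00 on the previous day, which in this sketch shows up as a negative number of seconds relative to midnight.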

5. Examples of special Synthetic Functions

Many synthetic expressions involve two or more variable time series. Because of the way the Synthetic Variables engine is built, each of the values to be computed must have the same timestamp; otherwise, the engine won't compute or will output unexpected results. In most cases, however, the variables' time series do not comply with this rule. For such cases there is the fill_missing() function, which fills the gaps where a value is missing in any of the time series used within the expression. This function follows the default syntax below:

fill_missing(expression, first_fill="ffill", last_fill="None", fill_value="None")

Where the only mandatory argument is expression. The others, first_fill, last_fill and fill_value, are optional. The optional arguments first_fill and last_fill can each take one of three values: ffill (forward fill), bfill (back fill) or None, and fill_value can be None or any numerical value.

To better understand this function, let's suppose you have a set of variables that you want to sum together. As you can see in the table below, the variables A, B, C and D don't have data at all of the timestamps, so the fill_missing() function is needed to fill the gaps and ultimately perform the calculation. The four examples below are based on said table:



fill_missing(A + B + C + D)

By default, the fill_missing() function sets the first_fill parameter to ffill, meaning that it fills the gaps forward, starting at the point where it has enough data to fill the gaps for all the series involved. The result would be as follows:


fill_missing(A + B + C + D, first_fill="bfill")

On the other hand, first_fill can also be set to bfill. The gaps would be filled as shown below:


fill_missing(A + B + C + D, first_fill="ffill", last_fill="bfill")

Let's suppose you need to fill all the gaps in the variables. In that case, you'd need to set both first_fill and last_fill in the expression, rendering the following:


fill_missing(A + B + C + D, fill_value=0)

Additionally, you can fill the gaps with a fixed value. In this case, the function sets a "0" in all the gaps the variables have.
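The four fill strategies can be compared side by side with a small Python sketch. It only approximates the behavior described in this section and is not the engine's code. The name fill_missing_sum is hypothetical, each series is assumed to be a list already aligned on the same timestamps with None marking a gap, and it is assumed that fill_value, when given, takes precedence over directional filling.

```python
def _ffill(series):
    """Forward fill: carry the last known value into following gaps."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def _bfill(series):
    """Back fill: forward fill applied to the reversed series."""
    return _ffill(series[::-1])[::-1]


_FILLERS = {"ffill": _ffill, "bfill": _bfill}


def fill_missing_sum(columns, first_fill="ffill", last_fill=None, fill_value=None):
    """Sum aligned series after filling gaps; None where a gap remains."""
    if fill_value is not None:
        # Assumption: a fixed value replaces every gap directly.
        columns = [[fill_value if v is None else v for v in col] for col in columns]
    else:
        # Apply first_fill, then last_fill to whatever gaps are left.
        for mode in (first_fill, last_fill):
            if mode is not None:
                columns = [_FILLERS[mode](col) for col in columns]
    # A row can only be summed when every series has a value.
    return [None if None in row else sum(row) for row in zip(*columns)]


A = [1, None, 3]
B = [None, 2, None]
forward = fill_missing_sum([A, B])                                        # [None, 3, 5]
backward = fill_missing_sum([A, B], first_fill="bfill")                   # [3, 5, None]
both = fill_missing_sum([A, B], first_fill="ffill", last_fill="bfill")    # [3, 3, 5]
zeros = fill_missing_sum([A, B], fill_value=0)                            # [1, 2, 3]
```

With forward fill alone, the first row stays empty because B has no earlier value to carry forward, which mirrors the idea of starting at the point where every series has enough data.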
