In this article, you will learn how to get the most out of the Synthetic Variables Engine by using special functions, alone or in combination, to build more complex expressions. If you are not yet familiar with Synthetic Variables, please read the Synthetic Variables basics article first.

## Requirements

- An IoT Entrepreneur license or greater.

## Table of Contents

- Creating a Synthetic Variable using context data.
- Creating a Synthetic Variable using timestamp.
- Examples of data range functions.
- Examples of special Synthetic Functions.

## 1. Creating a Synthetic Variable using context data

*Variable Context* is one of the best ways to store metadata that isn't necessarily a numerical value. Its most common use is storing latitude and longitude GPS coordinates. While those values are indeed numerical, it is convenient to send them in a single dot instead of in 2 separate variables.

Context data can only be used within your synthetic expression **if the context is a numerical value**. Accessing it from the Synthetic editor is straightforward: apply the dot '.' operator to the variable, call its context, and then give the context key identifier:

`{YOUR_VARIABLE}.context.context-key`
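
As a rough illustration of what such an expression reads, the Python sketch below models a single dot as a dictionary with a `value` and a `context`. The coordinates, the `data_point` name and the `context_value` helper are hypothetical and exist only for this example; they are not part of the Synthetic Variables Engine.

```python
# Hypothetical dot: the value is a placeholder and the context carries
# the GPS coordinates as numerical entries.
data_point = {
    "value": 1,
    "context": {"lat": 6.2442, "lng": -75.5812},
}


def context_value(point: dict, key: str) -> float:
    """Return a numerical context entry, mirroring `{variable}.context.<key>`.

    Raises TypeError when the entry is not a number, since only numerical
    context values can take part in a synthetic expression.
    """
    value = point["context"][key]
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"context key {key!r} is not numerical: {value!r}")
    return float(value)


latitude = context_value(data_point, "lat")
longitude = context_value(data_point, "lng")
```

A text entry such as `{"status": "open"}` would be rejected by this helper, which matches the restriction that only numerical context can be used.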

**Note**: To find more information about context and other features to store values, please refer to the Ubidots API documentation.

**Example:**

Suppose that you need to use the **Haversine** formula to calculate the shortest distance between two locations over the earth's surface based on Latitude and Longitude data. This would be as follows:

`Δlat = lat2 − lat1`

`Δlng = lng2 − lng1`

`a = sin²(Δlat/2) + cos(lat1) ⋅ cos(lat2) ⋅ sin²(Δlng/2)`

`c = 2 ⋅ arctan2( √a, √(1−a) )`

`d = R ⋅ c`

where `lat` is **latitude**, `lng` is **longitude** and `R` is the Earth's radius (`mean radius = 6,371 km`). Note that angles must be in radians before they are passed to the trigonometric functions.

Keep in mind that **latitude** and **longitude** are numerical data stored inside the context of a variable as shown below:

The synthetic expression to calculate the distance will look similar to this one in the synthetic variable editor:

where `lat1`, `lat2`, `lng1` and `lng2` use the values stored inside the context to make the proper calculations.

**IMPORTANT NOTE**: The example above assumes that the variables have the same timestamp. If not, the `fill_missing()` expression needs to be included.
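
The Haversine steps listed in this section can be checked outside the platform with a short Python sketch. It is an illustration only, not the expression used in the synthetic variable editor. The function name `haversine_km` and the default radius of 6371 km (a commonly cited mean Earth radius) are assumptions made for this example.

```python
import math

# Assumption: commonly cited mean Earth radius in kilometers.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    # The trigonometric functions expect radians, so convert first.
    lat1_r, lat2_r = math.radians(lat1), math.radians(lat2)
    d_lat = lat2_r - lat1_r
    d_lng = math.radians(lng2 - lng1)

    a = (math.sin(d_lat / 2) ** 2
         + math.cos(lat1_r) * math.cos(lat2_r) * math.sin(d_lng / 2) ** 2)
    # Guard against rounding pushing `a` slightly above 1.
    a = min(1.0, a)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius_km * c
```

Two easy sanity checks: identical points give a distance of 0, and two points on the equator 180° of longitude apart give half the circumference, that is, `π ⋅ R`.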

## 2. Creating a Synthetic Variable using timestamp

The timestamp can be accessed in the same way as context data: in Synthetic Variables, use the dot '.' operator.

`{YOUR_VARIABLE}.timestamp`

**Example:**

Imagine you have a variable, like in the image above, storing a value every time a machine **TURNS ON** (1) or **TURNS OFF** (0), and you want to calculate how long the machine remains in an ON state (1). This calculation can be done using the variable timestamp as shown below:

Where:

- `previousTime`: Timestamp in milliseconds when the machine was **ON**.
- `actualTime`: Timestamp in milliseconds when the machine was **OFF**.
- `dV`: Difference between the actual and previous values, used to detect a change in the machine's ON/OFF status.
- `dT`: Time the machine remains **ON**.

Finally, you need to evaluate the sign of `dV` to check for a negative value, which indicates that the machine went from ON to OFF. To do so, use the `where()` function, which, based on an input condition, takes one of 2 actions depending on whether the condition evaluates to `True` or `False`. In this case, it evaluates the sign and, if True, saves the time difference between the last 2 states, that is, `dT`.

Keep in mind that this time is always in milliseconds [ms]. If needed, you can convert it to seconds, minutes or hours by multiplying by the appropriate factor (for example, 1/1000 to obtain seconds).
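
The logic described in this section can be sketched in plain Python to see how `dV`, `dT` and the `where()` condition fit together. This is a minimal sketch under stated assumptions, not the synthetic expression itself: the function name `on_durations_ms` and the sample points are hypothetical, and the dots are assumed to be sorted by time with values of 1 (ON) or 0 (OFF).

```python
from typing import List, Tuple


def on_durations_ms(points: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (timestamp_ms, duration_ms) for every ON -> OFF transition.

    `points` holds (timestamp_ms, value) pairs sorted by timestamp.
    """
    durations = []
    # Walk over consecutive pairs: (previous dot, actual dot).
    for (previous_time, previous_value), (actual_time, actual_value) in zip(points, points[1:]):
        dv = actual_value - previous_value  # -1 means the machine went ON -> OFF
        dt = actual_time - previous_time    # elapsed time between both dots, in ms
        if dv < 0:                          # plays the role of the where() condition
            durations.append((actual_time, dt))
    return durations


# Hypothetical readings: ON at 0 ms, OFF at 5000 ms, ON at 9000 ms, OFF at 12000 ms.
sample = [(0, 1), (5000, 0), (9000, 1), (12000, 0)]
result = on_durations_ms(sample)
seconds = [duration / 1000 for _, duration in result]
```

Here the machine was ON for 5000 ms and then for 3000 ms. The OFF period between 5000 ms and 9000 ms is ignored because `dv` is positive for that pair.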

## 3. Examples of data range functions

Data range functions are a good fit for statistical analysis that identifies trends on a time-period basis. With them, you can not only calculate a value every fixed period of time, but also choose the timestamp at which the result is stored and, if needed, an initial offset.

All data range functions follow the syntax below:

`function_name(variable, data_range, position="start", offset=0)`

where `function_name` and `data_range` can be any of the options listed in the Synthetic Variables documentation. The optional `position` parameter can be either `"start"` or `"end"`, and indicates whether the value is timestamped at the beginning or at the end of the `data_range`. Finally, the optional `offset` parameter, as its name suggests, shifts the starting point from which the function computes.

**IMPORTANT NOTE:** The data ranges "nW", "nM" and "nY" default to `position="end"`, whereas the rest default to `position="start"`.

**Examples:**

**1.**

`mean(x, "1H")`

The above synthetic expression calculates the `mean` of variable `x` every 1 hour (`"1H"`). Because none of the optional parameters are defined, the values will be timestamped at the beginning of every hour, that is, all values received from 00:00:00 to 01:00:00 will be mean-aggregated into a single value timestamped at 00:00:00.

**2.**

`mean(x, "1H", position="end")`

Similarly to example **1.**, synthetic expression **2.** computes the mean value of variable `x` every hour. In this case, however, because of the `position="end"` parameter, the engine timestamps each value at the end of the data range, so for values received from 00:00:00 to 01:00:00, the mean-aggregated value is timestamped at 00:59:59.

**3.**

`mean(x, "8H", offset=6)`

Let's suppose you're working at a factory with the following shift schedule: 6:00 AM – 2:00 PM, 2:00 PM – 10:00 PM and 10:00 PM – 6:00 AM. You'd like to average the variable `x` every 8 hours to match each shift's duration. However, the synthetic engine, by default, starts computing at 00:00:00 and not at the beginning of every shift. This is where the last parameter, `offset`, comes in handy. Setting `offset=6` tells the engine to start computing at 06:00:00 in periods of 8 hours, rendering 3 values per day timestamped at 06:00:00, 14:00:00 and 22:00:00.

**4.**

`mean(x, "8H", position="end", offset=6)`

Based on the same example as in **3.**, this expression makes use of all the parameters. In this case, the synthetic variable will compute starting at 06:00:00 in 8-hour periods, but because of the `position="end"` parameter, values will be timestamped at 13:59:59, 21:59:59 and 05:59:59 for the 6:00 AM, 2:00 PM and 10:00 PM shifts, respectively.

**IMPORTANT NOTE:** The `offset` parameter works only with values that evenly subdivide 1 day.
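
To make the effect of `position` and `offset` easier to follow, the Python sketch below computes the timestamp that a single reading would be aggregated under. It is a simplified model and not the engine's implementation. The function name `bucket_timestamp` is hypothetical, only hour-based ranges are modelled, and the end of a range is assumed to be one second before the next range starts, matching the 00:59:59 style timestamps in the examples above.

```python
from datetime import datetime, timedelta


def bucket_timestamp(ts: datetime, period_hours: int,
                     position: str = "start", offset_hours: int = 0) -> datetime:
    """Return the timestamp assigned to the data range that contains `ts`."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    origin = midnight + timedelta(hours=offset_hours)  # where the ranges start
    period = timedelta(hours=period_hours)

    # Whole periods elapsed since the origin; floor division makes this
    # negative for readings earlier in the day than the offset.
    index = (ts - origin) // period
    start = origin + index * period

    if position == "end":
        # Assumption: the end timestamp is the last second of the range.
        return start + period - timedelta(seconds=1)
    return start


reading = datetime(2024, 1, 1, 7, 30)                  # during the 6:00 AM shift
shift_start = bucket_timestamp(reading, 8, offset_hours=6)
shift_end = bucket_timestamp(reading, 8, position="end", offset_hours=6)
```

A reading taken at 03:00 falls in the range that started at 22:00 on the previous day, so with `position="end"` it is stamped at 05:59:59 on the current day.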

## 4. Examples of special Synthetic Functions

A Synthetic Expression often involves 2 or more variable time-series. In those cases, because of the way the Synthetic Variables engine is built, each of the values to be computed must have the same timestamp; otherwise, the engine won't compute or will output unexpected results. In most cases, however, the variables' time-series won't comply with this rule. For such cases there is the `fill_missing()` function, which fills the gaps where there's a missing value in any of the variable time-series used within the expression. This function follows the default syntax below:

`fill_missing(expression, first_fill="ffill", last_fill="None", fill_value="None")`

where the only mandatory argument is `expression`. The others, `first_fill`, `last_fill` and `fill_value`, are optional. Furthermore, the optional arguments `first_fill` and `last_fill` can take any of 3 values: `ffill` (forward_fill), `bfill` (back_fill) or `None`, and `fill_value` can be `None` or any numerical value.

To better understand this function, let's suppose you have a set of variables that you want to sum up together. As you can see in the table below, the variables A, B, C and D don't have data in all of the timestamps, so the `fill_missing()` function is needed to fill the gaps and ultimately perform the calculation. Examples 1 to 4 are based on said table:

**Examples**

**1.**

`fill_missing(A + B + C + D)`

By default, the `fill_missing()` expression sets the `first_fill` parameter to `ffill`, meaning that the function will fill the gaps forward, starting at a point where it has enough data to fill the gaps for all the involved series. The result would be as follows:

**2.**

`fill_missing(A + B + C + D, first_fill="bfill")`

On the other hand, `first_fill` can also be set to `bfill`. The gaps would be filled as shown below:

**3.**

`fill_missing(A + B + C + D, first_fill="ffill", last_fill="bfill")`

Let's suppose you need to fill all the gaps of the variables. In that case, you'd need to set both `first_fill` and `last_fill` in the expression, rendering the following:

**4.**

`fill_missing(A + B + C + D, fill_value=0)`

Additionally, you can fill the gaps with a value. For this case, the function sets a "0" in all the gaps the variables have.
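
The four examples can be reproduced on a small scale with the Python sketch below. It is only a model of the behavior described in this section, not the engine's code. The function names and sample series are hypothetical, and two points are assumptions: timestamps that still have a gap after filling are dropped from the result, and `fill_value`, when given, takes precedence over `first_fill` and `last_fill`.

```python
from typing import Dict, List, Optional

Series = Dict[int, float]  # timestamp -> value


def _fill(column: List[Optional[float]], method: Optional[str]) -> List[Optional[float]]:
    """Fill None gaps forward ("ffill") or backward ("bfill")."""
    if method is None:
        return list(column)
    out = list(column)
    order = range(len(out)) if method == "ffill" else range(len(out) - 1, -1, -1)
    last = None
    for i in order:
        if out[i] is None:
            out[i] = last  # stays None until a known value has been seen
        else:
            last = out[i]
    return out


def fill_missing_sum(series: List[Series], first_fill: Optional[str] = "ffill",
                     last_fill: Optional[str] = None,
                     fill_value: Optional[float] = None) -> Series:
    """Align the series on all timestamps, fill the gaps, then add them up."""
    timestamps = sorted(set().union(*series))
    columns = [[s.get(t) for t in timestamps] for s in series]
    if fill_value is not None:
        # Assumption: a fixed fill value overrides the fill methods.
        columns = [[fill_value if v is None else v for v in c] for c in columns]
    else:
        for method in (first_fill, last_fill):
            columns = [_fill(c, method) for c in columns]
    result = {}
    for i, t in enumerate(timestamps):
        row = [c[i] for c in columns]
        if all(v is not None for v in row):  # skip timestamps that still have a gap
            result[t] = sum(row)
    return result


A = {1: 1.0, 3: 3.0}    # no value at timestamp 2
B = {2: 10.0, 3: 20.0}  # no value at timestamp 1
```

With these two series, the default forward fill yields sums only from timestamp 2 onward, because B has no earlier value to carry forward. Adding `last_fill="bfill"` or using `first_fill="bfill"` also produces a value at timestamp 1.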

## Other users also found helpful...

- Ubidots Events Engine
- Analytics: Synthetic Variables Basics
- UbiFunctions to create your own API