Arrow example

This example illustrates how to convert Arrow data that contains power-grid-model data to NumPy structured arrays, which is the format the power-grid-model requires.

It is by no means intended to provide complete documentation on the topic, but only to show how such conversions could be done.

In particular, this example restricts itself to PyArrow Tables. For more advanced cases, such as data that arrives in chunks, RecordBatches may be a better fit.

NOTE: To run this example, the optional examples dependencies are required:

pip install .[examples]
from IPython.display import display

from power_grid_model import PowerGridModel, initialize_array, CalculationMethod
import pyarrow as pa
import pandas as pd
import numpy as np

Model

For clarity, a simple network is created. More complex cases work similarly and can be found in the other examples:

node 1 ---- line 4 ---- node 2 ----line 5 ---- node 3
   |                       |                      |
source 6               sym_load 7             sym_load 8

Single symmetric calculations

Construct the input data for the model, then construct the model itself.

Arrow uses a columnar data format, while the power-grid-model uses a row-based data format with contiguous memory. Because of that, at least one copy is required.
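As a NumPy-only sketch of why this copy is needed, the block below builds an aligned structured dtype that mirrors the node input dtype printed further down (constructed locally as an assumption, not obtained from initialize_array) and compares the memory layout of one of its fields with that of a standalone column.

```python
import numpy as np

# Locally constructed dtype mirroring the printed node input dtype:
# id (int32) at offset 0, u_rated (float64) at offset 8, itemsize 16.
node_dtype = np.dtype([("id", "<i4"), ("u_rated", "<f8")], align=True)

# Row-based storage: every record holds all of its attributes side by side.
rows = np.zeros(3, dtype=node_dtype)

# Columnar storage (as in Arrow): each attribute is its own contiguous buffer.
u_rated_column = np.array([10500.0, 10500.0, 10500.0])

# Filling a field copies the column element by element into the records.
rows["u_rated"] = u_rated_column

# A field of a structured array is a strided view, not a contiguous buffer,
# so a columnar buffer cannot simply be reused in place.
print(rows["u_rated"].strides)  # (16,)
print(u_rated_column.strides)  # (8,)
print(rows["u_rated"].flags["C_CONTIGUOUS"])  # False
```

Because the attributes of a record are interleaved in memory, Arrow's per-column buffers cannot be handed over as-is; each column has to be copied into the records.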

List the power-grid-model data types

See which attributes exist for a given component and which data types they use.

node_input_dtype = initialize_array("input", "node", 0).dtype
line_input_dtype = initialize_array("input", "line", 0).dtype
source_input_dtype = initialize_array("input", "source", 0).dtype
asym_load_input_dtype = initialize_array("input", "asym_load", 0).dtype
print("node:", node_input_dtype)
print("line:", line_input_dtype)
print("source:", source_input_dtype)
print("asym_load:", asym_load_input_dtype)
node: {'names': ['id', 'u_rated'], 'formats': ['<i4', '<f8'], 'offsets': [0, 8], 'itemsize': 16, 'aligned': True}
line: {'names': ['id', 'from_node', 'to_node', 'from_status', 'to_status', 'r1', 'x1', 'c1', 'tan1', 'r0', 'x0', 'c0', 'tan0', 'i_n'], 'formats': ['<i4', '<i4', '<i4', 'i1', 'i1', '<f8', '<f8', '<f8', '<f8', '<f8', '<f8', '<f8', '<f8', '<f8'], 'offsets': [0, 4, 8, 12, 13, 16, 24, 32, 40, 48, 56, 64, 72, 80], 'itemsize': 88, 'aligned': True}
source: {'names': ['id', 'node', 'status', 'u_ref', 'u_ref_angle', 'sk', 'rx_ratio', 'z01_ratio'], 'formats': ['<i4', '<i4', 'i1', '<f8', '<f8', '<f8', '<f8', '<f8'], 'offsets': [0, 4, 8, 16, 24, 32, 40, 48], 'itemsize': 56, 'aligned': True}
asym_load: {'names': ['id', 'node', 'status', 'type', 'p_specified', 'q_specified'], 'formats': ['<i4', '<i4', 'i1', 'i1', ('<f8', (3,)), ('<f8', (3,))], 'offsets': [0, 4, 8, 12, 16, 40], 'itemsize': 64, 'aligned': True}
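The same information can also be inspected programmatically. The sketch below is hypothetical: it rebuilds the printed asym_load layout by hand instead of calling initialize_array, and the helper describe_dtype is not part of power-grid-model. It separates single-valued attributes (shape `()`) from per-phase attributes (shape `(3,)`), which is the distinction the asymmetric conversion later in this example relies on.

```python
import numpy as np

# Hand-built stand-in for initialize_array("input", "asym_load", 0).dtype,
# copied from the printed names, formats, offsets and itemsize.
asym_load_dtype = np.dtype(
    {
        "names": ["id", "node", "status", "type", "p_specified", "q_specified"],
        "formats": ["<i4", "<i4", "i1", "i1", ("<f8", (3,)), ("<f8", (3,))],
        "offsets": [0, 4, 8, 12, 16, 40],
        "itemsize": 64,
    }
)


def describe_dtype(dtype: np.dtype) -> list[tuple[str, str, tuple[int, ...]]]:
    """List (attribute name, scalar type, per-record shape) for a structured dtype."""
    description = []
    for name in dtype.names:
        field_dtype = dtype.fields[name][0]
        # field_dtype.base is the scalar type; field_dtype.shape is () for
        # single-valued attributes and (3,) for per-phase attributes.
        description.append((name, field_dtype.base.str, field_dtype.shape))
    return description


for name, scalar_type, shape in describe_dtype(asym_load_dtype):
    print(f"{name:12} {scalar_type:4} {shape}")
```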

Create the grid using Arrow tables

The Components page of the power-grid-model documentation describes which components are required and which ones are optional.

Construct the Arrow data as a table with the correct headers and data types.

nodes = pa.table(
    [
        pa.array([1, 2, 3], type=pa.int32()),  # id
        pa.array([10500.0, 10500.0, 10500.0], type=pa.float64()),  # u_rated
    ],
    names=("id", "u_rated"),
)
lines = pa.table(
    [
        pa.array([4, 5], type=pa.int32()),  # id
        pa.array([1, 2], type=pa.int32()),  # from_node
        pa.array([2, 3], type=pa.int32()),  # to_node
        pa.array([1, 1], type=pa.int8()),  # from_status
        pa.array([1, 1], type=pa.int8()),  # to_status
        pa.array([0.11, 0.15], type=pa.float64()),  # r1
        pa.array([0.12, 0.16], type=pa.float64()),  # x1
        pa.array([4.1e-05, 5.4e-05], type=pa.float64()),  # c1
        pa.array([0.1, 0.1], type=pa.float64()),  # tan1
        pa.array([0.01, 0.05], type=pa.float64()),  # r0
        pa.array([0.22, 0.06], type=pa.float64()),  # x0
        pa.array([4.1e-05, 5.4e-05], type=pa.float64()),  # c0
        pa.array([0.4, 0.1], type=pa.float64()),  # tan0
    ],
    names=("id", "from_node", "to_node", "from_status", "to_status", "r1", "x1", "c1", "tan1", "r0", "x0", "c0", "tan0"),
)
sources = pa.table(
    [
        pa.array([6], type=pa.int32()),  # id
        pa.array([1], type=pa.int32()),  # node
        pa.array([1], type=pa.int8()),  # status
        pa.array([1.0], type=pa.float64()),  # u_ref
    ],
    names=("id", "node", "status", "u_ref"),
)
sym_loads = pa.table(
    [
        pa.array([7, 8], type=pa.int32()),  # id
        pa.array([2, 3], type=pa.int32()),  # node
        pa.array([1, 1], type=pa.int8()),  # status
        pa.array([0, 0], type=pa.int8()),  # type
        pa.array([1.0, 2.0], type=pa.float64()),  # p_specified
        pa.array([0.5, 1.5], type=pa.float64()),  # q_specified
    ],
    names=("id", "node", "status", "type", "p_specified", "q_specified"),
)

nodes
# the tables of the other components can be printed similarly
pyarrow.Table
id: int32
u_rated: double
----
id: [[1,2,3]]
u_rated: [[10500,10500,10500]]

Convert the Arrow data to power-grid-model input data

There is no direct conversion from Arrow Tables to NumPy structured arrays, so a copy is always required.

To support optional attributes and to avoid locking the code to a specific power-grid-model version, it is recommended to create an empty power-grid-model data set using initialize_array and then fill it with the Arrow data.

def arrow_to_numpy(data: pa.lib.Table, data_type: str, component: str) -> np.ndarray:
    """Convert Arrow data to NumPy data."""
    result = initialize_array(data_type, component, len(data))
    for name, column in zip(data.column_names, data.columns):
        if name in result.dtype.names:
            result[name] = column.to_numpy()
    return result


node_input = arrow_to_numpy(nodes, "input", "node")
line_input = arrow_to_numpy(lines, "input", "line")
source_input = arrow_to_numpy(sources, "input", "source")
sym_load_input = arrow_to_numpy(sym_loads, "input", "sym_load")

node_input
array([(1, 10500.), (2, 10500.), (3, 10500.)],
      dtype={'names': ['id', 'u_rated'], 'formats': ['<i4', '<f8'], 'offsets': [0, 8], 'itemsize': 16, 'aligned': True})
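The benefit of starting from initialize_array can be illustrated without power-grid-model itself. In the sketch below, empty_with_nulls and fill_from_columns are hypothetical helpers that imitate initialize_array and arrow_to_numpy on plain NumPy columns. They assume that NaN marks a missing float (as the nan entries in the input data below show) and that the minimum integer value marks a missing integer. Columns unknown to the dtype are ignored, and attributes that are not provided keep their null value.

```python
import numpy as np

# Hypothetical stand-in for the line input dtype, reduced to a few attributes.
line_dtype = np.dtype(
    [("id", "<i4"), ("from_status", "i1"), ("r1", "<f8"), ("i_n", "<f8")], align=True
)


def empty_with_nulls(dtype: np.dtype, size: int) -> np.ndarray:
    """Imitate initialize_array: fill every attribute with a null sentinel."""
    result = np.empty(size, dtype=dtype)
    for name in dtype.names:
        base = dtype.fields[name][0].base
        if np.issubdtype(base, np.floating):
            result[name] = np.nan
        else:
            result[name] = np.iinfo(base).min
    return result


def fill_from_columns(columns: dict[str, np.ndarray], dtype: np.dtype) -> np.ndarray:
    """Copy only the columns the dtype knows about; the rest keep their null value."""
    size = len(next(iter(columns.values())))
    result = empty_with_nulls(dtype, size)
    for name, values in columns.items():
        if name in dtype.names:
            result[name] = values
    return result


# "extra" is not a line attribute and is ignored; "i_n" is not provided and stays NaN.
example_line_input = fill_from_columns(
    {"id": np.array([4, 5]), "r1": np.array([0.11, 0.15]), "extra": np.array([0, 0])},
    line_dtype,
)
print(example_line_input)
```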

Construct the complete input data structure

input_data = {
    "node": node_input,
    "line": line_input,
    "source": source_input,
    "sym_load": sym_load_input,
}

input_data
{'node': array([(1, 10500.), (2, 10500.), (3, 10500.)],
       dtype={'names': ['id', 'u_rated'], 'formats': ['<i4', '<f8'], 'offsets': [0, 8], 'itemsize': 16, 'aligned': True}),
 'line': array([(4, 1, 2, 1, 1, 0.11, 0.12, 4.1e-05, 0.1, 0.01, 0.22, 4.1e-05, 0.4, nan),
        (5, 2, 3, 1, 1, 0.15, 0.16, 5.4e-05, 0.1, 0.05, 0.06, 5.4e-05, 0.1, nan)],
       dtype={'names': ['id', 'from_node', 'to_node', 'from_status', 'to_status', 'r1', 'x1', 'c1', 'tan1', 'r0', 'x0', 'c0', 'tan0', 'i_n'], 'formats': ['<i4', '<i4', '<i4', 'i1', 'i1', '<f8', '<f8', '<f8', '<f8', '<f8', '<f8', '<f8', '<f8', '<f8'], 'offsets': [0, 4, 8, 12, 13, 16, 24, 32, 40, 48, 56, 64, 72, 80], 'itemsize': 88, 'aligned': True}),
 'source': array([(6, 1, 1, 1., nan, nan, nan, nan)],
       dtype={'names': ['id', 'node', 'status', 'u_ref', 'u_ref_angle', 'sk', 'rx_ratio', 'z01_ratio'], 'formats': ['<i4', '<i4', 'i1', '<f8', '<f8', '<f8', '<f8', '<f8'], 'offsets': [0, 4, 8, 16, 24, 32, 40, 48], 'itemsize': 56, 'aligned': True}),
 'sym_load': array([(7, 2, 1, 0, 1., 0.5), (8, 3, 1, 0, 2., 1.5)],
       dtype={'names': ['id', 'node', 'status', 'type', 'p_specified', 'q_specified'], 'formats': ['<i4', '<i4', 'i1', 'i1', '<f8', '<f8'], 'offsets': [0, 4, 8, 12, 16, 24], 'itemsize': 32, 'aligned': True})}
# Optional: validate the input data
from power_grid_model.validation import validate_input_data

validate_input_data(input_data)

Use the power-grid-model

For more extensive examples, visit the power-grid-model documentation.

# construct the model
model = PowerGridModel(input_data=input_data, system_frequency=50)

# run the calculation
sym_result = model.calculate_power_flow()

# use pandas to tabulate and display the results
pd_sym_node_result = pd.DataFrame(sym_result["node"])
display(pd_sym_node_result)
id energized u_pu u u_angle p q
0 1 1 1.000325 10503.410670 -0.000067 338777.246279 -3.299419e+06
1 2 1 1.002879 10530.228073 -0.002932 -1.000000 -5.000000e-01
2 3 1 1.004113 10543.184974 -0.004342 -2.000000 -1.500000e+00

Convert power-grid-model output data to Arrow output data

Using pandas DataFrames as an intermediate type, constructing Arrow data is straightforward.

pa_sym_node_result = pa.table(pd_sym_node_result)

# and similar for other components

pa_sym_node_result
pyarrow.Table
id: int32
energized: int8
u_pu: double
u: double
u_angle: double
p: double
q: double
----
id: [[1,2,3]]
energized: [[1,1,1]]
u_pu: [[1.000324825742982,1.0028788641128947,1.004112854674026]]
u: [[10503.410670301311,10530.228073185395,10543.184974077272]]
u_angle: [[-0.00006651843181519333,-0.0029317915196014274,-0.004341587216862399]]
p: [[338777.2462788447,-1.0000001549705169,-1.9999999440349978]]
q: [[-3299418.6613065186,-0.4999999565008232,-1.4999999075367236]]

Single asymmetric calculations

In asymmetric calculations, some attributes hold one value per phase, i.e., an array of three values per component, which makes them hard to convert directly to pandas DataFrames. Instead, one can look at the individual phases of those attributes and/or flatten the arrays to access all phases.
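Both options can be sketched with plain NumPy on a hand-made stand-in for an asymmetric node result (the dtype and values below are illustrative, not produced by power-grid-model):

```python
import numpy as np

# Hand-made stand-in for an asymmetric node result: u_angle holds one value per phase.
asym_node_dtype = np.dtype([("id", "<i4"), ("u_angle", "<f8", (3,))], align=True)
asym_nodes = np.zeros(2, dtype=asym_node_dtype)
asym_nodes["id"] = [1, 2]
asym_nodes["u_angle"] = [[0.0, -2.0944, 2.0944], [-0.003, -2.0973, 2.0915]]

# Option 1: look at one phase at a time; column 1 is phase b.
phase_b = asym_nodes["u_angle"][:, 1]

# Option 2: flatten the (n, 3) attribute into one column per phase.
flat_columns = {"id": asym_nodes["id"]}
for phase_index, phase in enumerate(("a", "b", "c")):
    flat_columns[f"u_angle_{phase}"] = asym_nodes["u_angle"][:, phase_index]

print(asym_nodes["u_angle"].shape)  # (2, 3)
print(list(flat_columns))  # ['id', 'u_angle_a', 'u_angle_b', 'u_angle_c']
```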

Asymmetric input

To illustrate the conversion, let’s consider a similar grid but with asymmetric loads.

node 1 ---- line 4 ---- node 2 ----line 5 ---- node 3
   |                       |                      |
source 6              asym_load 7            asym_load 8

asym_load_input_dtype = initialize_array("input", "asym_load", 0).dtype
print("asym_load:", asym_load_input_dtype)

asym_loads = pa.table(
    [
        pa.array([7, 8], type=pa.int32()),  # id
        pa.array([2, 3], type=pa.int32()),  # node
        pa.array([1, 1], type=pa.int8()),  # status
        pa.array([0, 0], type=pa.int8()),  # type
        pa.array([1.0, 2.0], type=pa.float64()),  # p_specified_a
        pa.array([1.0e-2, 2.5], type=pa.float64()),  # p_specified_b
        pa.array([1.1e-2, 4.5e2], type=pa.float64()),  # p_specified_c
        pa.array([0.5, 1.5], type=pa.float64()),  # q_specified_a
        pa.array([1.5e3, 2.5], type=pa.float64()),  # q_specified_b
        pa.array([0.1, 1.5e3], type=pa.float64()),  # q_specified_c
    ],
    names=("id", "node", "status", "type", "p_specified_a", "p_specified_b", "p_specified_c", "q_specified_a", "q_specified_b", "q_specified_c"),
)

asym_loads
asym_load: {'names': ['id', 'node', 'status', 'type', 'p_specified', 'q_specified'], 'formats': ['<i4', '<i4', 'i1', 'i1', ('<f8', (3,)), ('<f8', (3,))], 'offsets': [0, 4, 8, 12, 16, 40], 'itemsize': 64, 'aligned': True}
pyarrow.Table
id: int32
node: int32
status: int8
type: int8
p_specified_a: double
p_specified_b: double
p_specified_c: double
q_specified_a: double
q_specified_b: double
q_specified_c: double
----
id: [[7,8]]
node: [[2,3]]
status: [[1,1]]
type: [[0,0]]
p_specified_a: [[1,2]]
p_specified_b: [[0.01,2.5]]
p_specified_c: [[0.011,450]]
q_specified_a: [[0.5,1.5]]
q_specified_b: [[1500,2.5]]
q_specified_c: [[0.1,1500]]
def arrow_to_numpy_asym(data: pa.lib.Table, data_type: str, component: str) -> np.ndarray:
    """Convert asymmetric Arrow data to NumPy data.
    
    This function is similar to the arrow_to_numpy function, but also supports asymmetric data."""
    result = initialize_array(data_type, component, len(data))
    phases = ("a", "b", "c")
    for name, (dtype, _) in result.dtype.fields.items():
        if len(dtype.shape) == 0:
            # simple or symmetric data type
            if name in data.column_names:
                result[name] = data.column(name).to_numpy()
        else:
            # asymmetric data type
            for phase_index, phase in enumerate(phases):
                phase_name = f"{name}_{phase}"

                if phase_name in data.column_names:
                    result[name][:, phase_index] = data.column(phase_name).to_numpy()

    return result

asym_load_input = arrow_to_numpy_asym(asym_loads, "input", "asym_load")

asym_load_input
array([(7, 2, 1, 0, [1.0e+00, 1.0e-02, 1.1e-02], [5.0e-01, 1.5e+03, 1.0e-01]),
       (8, 3, 1, 0, [2.0e+00, 2.5e+00, 4.5e+02], [1.5e+00, 2.5e+00, 1.5e+03])],
      dtype={'names': ['id', 'node', 'status', 'type', 'p_specified', 'q_specified'], 'formats': ['<i4', '<i4', 'i1', 'i1', ('<f8', (3,)), ('<f8', (3,))], 'offsets': [0, 4, 8, 12, 16, 40], 'itemsize': 64, 'aligned': True})

Use the power-grid-model in asymmetric calculations

asym_input_data = {
    "node": node_input,
    "line": line_input,
    "source": source_input,
    "asym_load": asym_load_input,
}

validate_input_data(asym_input_data, symmetric=False)

# construct the model
asym_model = PowerGridModel(input_data=asym_input_data, system_frequency=50)

# run the calculation
asym_result = asym_model.calculate_power_flow(symmetric=False)

# use pandas to display the results; per-phase attributes are 2-D arrays, so display
# them one attribute at a time (columns 0, 1 and 2 are phases a, b and c)
pd.DataFrame(asym_result["node"]["u_angle"])
0 1 2
0 -0.000067 -2.094462 2.094328
1 -0.002930 -2.097322 2.091464
2 -0.004338 -2.098733 2.090057

Convert asymmetric power-grid-model output data to Arrow output data

def numpy_to_arrow(data: np.ndarray) -> pa.Table:
    """Convert NumPy data to Arrow data."""
    simple_data_types = []
    multi_value_data_types = []

    for name, (dtype, _) in data.dtype.fields.items():
        if len(dtype.shape) == 0:
            simple_data_types.append(name)
        else:
            multi_value_data_types.append(name)

    result = pa.table(pd.DataFrame(data[simple_data_types]))

    phases = ("a", "b", "c")
    for name in multi_value_data_types:
        column = data[name]

        assert column.shape[1] == len(phases), "Asymmetric data has 3 phase output"

        for phase_index, phase in enumerate(phases):
            sub_column = column[:, phase_index]
            result = result.append_column(f"{name}_{phase}", pa.array(sub_column))

    return result


pa_asym_node_result = numpy_to_arrow(asym_result["node"])

pa_asym_node_result
pyarrow.Table
id: int32
energized: int8
u_pu_a: double
u_pu_b: double
u_pu_c: double
u_a: double
u_b: double
u_c: double
u_angle_a: double
u_angle_b: double
u_angle_c: double
p_a: double
p_b: double
p_c: double
q_a: double
q_b: double
q_c: double
----
id: [[1,2,3]]
energized: [[1,1,1]]
u_pu_a: [[1.0003248257977395,1.0028803762176168,1.004114300817404]]
u_pu_b: [[1.000324376948685,1.0028710993140397,1.0041033583077168]]
u_pu_c: [[1.00032436416241,1.002873078902152,1.0041004935738533]]
u_a: [[6064.146978239599,6079.639179329459,6087.119449677851]]
u_b: [[6064.144257236812,6079.582941090295,6087.053114238259]]
u_c: [[6064.1441797241405,6079.594941705455,6087.035747712152]]
u_angle_a: [[-0.00006651848125692708,-0.0029298831864833634,-0.004337685507209539]]
u_angle_b: [[-2.0944615736658134,-2.0973219974462594,-2.098732840554144]]
...

Batch data

The power-grid-model supports batch calculations by providing an update_data argument to the calculation, as shown in the other examples.

Both update_data and the batch output are structured like the input_data and output data above, except that they have an additional dimension representing the batch index: the first index of the NumPy structured arrays.
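A NumPy-only sketch of that layout follows, using a hand-made stand-in for a sym_load update dtype (the real dtype would come from initialize_array with the "update" data type). It also shows one possible multi-index representation: a long format with an explicit batch column, and how to reshape it back.

```python
import numpy as np

# Hand-made stand-in for a sym_load update dtype (assumption, not the library's dtype).
update_dtype = np.dtype([("id", "<i4"), ("p_specified", "<f8")], align=True)

n_scenarios, n_loads = 3, 2
# First index: batch (scenario); second index: component.
batch_update = np.zeros((n_scenarios, n_loads), dtype=update_dtype)
batch_update["id"] = [7, 8]  # broadcast over all scenarios
batch_update["p_specified"] = np.arange(n_scenarios * n_loads).reshape(n_scenarios, n_loads)

# Long ("tidy") format: one row per (scenario, component) with an explicit batch column.
long_format = {
    "batch": np.repeat(np.arange(n_scenarios), n_loads),
    "id": batch_update["id"].ravel(),
    "p_specified": batch_update["p_specified"].ravel(),
}

# The inverse: reshape the flat columns back into (n_scenarios, n_loads).
restored = np.zeros((n_scenarios, n_loads), dtype=update_dtype)
for name in update_dtype.names:
    restored[name] = long_format[name].reshape(n_scenarios, n_loads)
```

The long format maps onto a single flat Arrow table (or a pandas MultiIndex on batch and id), at the cost of repeating the component ids for every scenario.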

This extra index can be represented in Arrow using a RecordBatch or using any other multi-index data format.