Integration - Power BI
The Power BI setup consists of two PBI projects: one report and one dataset. The report is straightforward and simply imports the dataset. The dataset is saved as a .pbit file and is included in the repo.
Partitioning
To ensure quick refreshes of the dataset, large tables are split into multiple partitions. For this purpose there is a Python mixin that can be included in SQLAlchemy models. This mixin automatically adds a column called Partition to the model. Furthermore, two indices are added: one to quickly load a single partition and one to detect partitions with changes.
The mixin can be found at app/database/models/mixins/has_partitions.py and can be used as follows:
```python
from app.database.models import Base
from app.database.models.mixins.has_partitions import HasPartitionsMixin
from ... import ...


class ModelA(Base, HasPartitionsMixin):
    __tablename__ = "table_a"

    FieldA: Mapped[int]
    FieldB: ...
    ...

    def set_partition(self):
        """
        Sets the Partition column of the model.
        """
        self.Partition = <PARTITION_LOGIC>
        return self

    @staticmethod
    def partition_sql() -> str:
        """
        SQL clause to update the Partition column directly in the database.
        It is inserted in: UPDATE table_a SET Partition = <INSERTED_HERE>;
        """
        return f"<PARTITION_LOGIC_IN_SQL_FORM>"

    @staticmethod
    def all_partitions() -> list[str] | None:
        """
        All possible partitions.
        """
        return ["partition1", "partition2", ...]
```
Some common implementations have been written in:
- PartitionedByUUIDMixin at app/database/models/mixins/partition_by_uuid.py
- PartitionedByFinancialYearPeriodMixin at app/integrations/exact/models/mixins/partition_by_finyearperiod.py
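As an illustration (not the actual implementation in the repo), a UUID-based partitioning scheme might bucket rows by the first hex character of the key, yielding up to 16 stable partitions that can be refreshed independently; the function names below are assumptions:

```python
# Illustrative sketch of a UUID-based partitioning scheme; the real
# PartitionedByUUIDMixin in the repo may work differently.
import uuid


def uuid_partition(value: uuid.UUID) -> str:
    """Bucket a UUID by the first hex character of its canonical form,
    giving at most 16 partitions with a roughly uniform distribution."""
    return f"partition_{value.hex[0]}"


def all_partitions() -> list[str]:
    """Enumerate every partition the scheme can produce, so tooling can
    detect and refresh each one individually."""
    return [f"partition_{c}" for c in "0123456789abcdef"]
```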
Power BI API
To authenticate with the Power BI API we make use of a service principal; see this page for more info. The implementation of the communication with the Power BI API can be found in the PowerBIAPI class located in app/integrations/powerbi/api.py.
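As a rough sketch of what service-principal authentication involves, the snippet below acquires a bearer token via the OAuth2 client-credentials grant against Azure AD. This is illustrative only; the actual PowerBIAPI class may use a library such as MSAL, and all IDs and secrets here are placeholders:

```python
# Hypothetical sketch: acquiring a service-principal token for the
# Power BI REST API via the OAuth2 client-credentials flow.
import json
import urllib.parse
import urllib.request

POWERBI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"


def token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple[str, dict]:
    """Build the Azure AD token endpoint URL and form payload."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": POWERBI_SCOPE,
    }
    return url, payload


def get_access_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """POST the request and return the bearer token for the Power BI API."""
    url, payload = token_request(tenant_id, client_id, client_secret)
    data = urllib.parse.urlencode(payload).encode()
    with urllib.request.urlopen(urllib.request.Request(url, data=data)) as resp:
        return json.loads(resp.read())["access_token"]
```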
Refresh dataset
To refresh the dataset (or a subset of tables/partitions) it is important to have enough resources. The table below lists the amount of available RAM and the number of parallel table/partition refreshes per SKU:
| SKU | RAM (GB) | Parallel processes |
|---|---|---|
| F2 | 3 | 1 |
| F4 | 3 | 2 |
| F8 | 3 | 5 |
| F16 | 5 | 10 |
| F32 | 10 | 20 |
| F64 | 25 | 40 |
Source
Note that the amount of RAM needs to be roughly 3 times the size of the database: 1 part for the current model, 1 part for the new data, and 1 part for any report views that occur at the same time.
Generate embed token
To get access to the report and apply Row Level Security (RLS) to the data, the portal needs a Power BI embed token. This token can be requested via the /api/v1/pbi/embed-token?division=&lt;CODE&gt; endpoint. The endpoint then requests an embed token from the Power BI API; to handle RLS, it adds the following identity to the request:
```json
{
    "username": "<CODE>",
    "roles": ["Division"],
    "datasets": ["<DATASETID>"]
}
```
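For illustration, the body of a GenerateToken request carrying such an identity could be built as follows. Note that the Power BI REST API expects roles and datasets as lists of strings; the function name is an assumption, not the repo's implementation:

```python
# Hypothetical sketch of the request body the endpoint could send to the
# Power BI REST API's GenerateToken call; IDs are placeholders.
def embed_token_body(division_code: str, dataset_id: str) -> dict:
    """Build a GenerateToken body that applies RLS via an effective
    identity whose username carries the division code."""
    return {
        "accessLevel": "View",
        "identities": [
            {
                "username": division_code,
                "roles": ["Division"],
                "datasets": [dataset_id],
            }
        ],
    }
```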
XMLA
To deploy changes in the .pbip project we make use of the XMLA endpoint. Communication with the XMLA endpoint is implemented in PowerShell scripts located in /app/integrations/powerbi/xmla/powershell_scripts, with Python wrappers in app/integrations/powerbi/xmla. The PowerShell scripts are essentially wrappers around the Microsoft.AnalysisServices library.
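A Python wrapper around such a script might look like the sketch below. The script name and parameters are hypothetical; the repo's actual wrappers may differ:

```python
# Hypothetical sketch of a Python wrapper that shells out to one of the
# XMLA PowerShell scripts; script names and parameters are assumptions.
import subprocess
from pathlib import Path

SCRIPTS_DIR = Path("app/integrations/powerbi/xmla/powershell_scripts")


def build_command(script: str, **params: str) -> list[str]:
    """Build the pwsh invocation for an XMLA script, turning keyword
    arguments into -Name value pairs."""
    cmd = ["pwsh", "-NoProfile", "-File", str(SCRIPTS_DIR / script)]
    for name, value in params.items():
        cmd += [f"-{name}", value]
    return cmd


def run_script(script: str, **params: str) -> str:
    """Run the script and return its stdout, raising on a nonzero exit."""
    result = subprocess.run(
        build_command(script, **params),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```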
Dataset deployment
To deploy the Power BI dataset to Power BI Service run the following command: baa powerbi xmla deploy. Note that this command prompts for confirmation and therefore requires a TTY.
Sometimes after the dataset is deployed some tables or partitions need to be refreshed. To get a list of all tables/partitions that need a refresh use the command: baa powerbi list-tables -n.
To actually refresh all tables that need a refresh, run: baa powerbi xmla list-tables -r