BigQuery Authorisation Best Practices: Secure Connection Guide
This document outlines the best practices for securely connecting Wondaris to your BigQuery data warehouse. Wondaris connects to BigQuery using Google Cloud Service accounts - using either Wondaris' core Service accounts or user provided service accounts, using a key stored in Google Cloud's Secret Manager within a customer-specific, isolated project. This guide emphasises data isolation and access control to ensure the security and integrity of your data.
Data Isolation
Data isolation is crucial for maintaining the confidentiality and integrity of your data. By implementing proper isolation techniques, you can minimise the risk of unauthorised access and ensure that Wondaris only accesses the data it needs.
Data Minimisation
The principle of data minimisation states that you should only collect and store the data that is absolutely necessary. Before exposing any data to Wondaris, carefully consider which data is required for your specific use case. Avoid granting access to entire tables or databases if only a subset of data is needed.
Views
The primary (and often easiest) method for data isolation with Wondaris is through the use of authorised views in BigQuery. Create views that specifically select only the necessary columns and rows from your underlying tables. This allows you to restrict the data that Wondaris can access without modifying your base tables.
In this data model, you will typically create a view on a single customer table and views on any number of event tables (like a transaction table). Wondaris handles the joining of these tables (e.g., customer and event tables) using primary/foreign key relationships. It is important that both the customer and event tables have primary keys defined.
Example:
Customer Table:
customer_tablewith columns likecustomer_id,email,loyalty_tier, etc.customer_idshould be the primary key.Event Tables:
event_table_1(e.g.,page_views) with columns likeevent_id,customer_id,event_time,page_url, etc., andevent_table_2(e.g.,transactions) with columns liketransaction_id,customer_id,transaction_timestamp,amount,product_id, etc.event_idandtransaction_idshould be the primary keys for their respective tables, andcustomer_idis the foreign key.
You would create a customer view:
CREATE VIEW customer_view AS SELECT customer_id, email, loyalty_tier FROM customer_table;And event views, selecting the relevant columns:
CREATE VIEW page_views_view AS SELECT event_id, -- Primary Key customer_id, event_time, page_url FROM event_table_1; CREATE VIEW transaction_view AS SELECT transaction_id, -- Primary Key customer_id, transaction_timestamp, amount, product_id FROM event_table_2;
You will need to ensure the views (or dataset which houses the views) is authorised to access the underlying data - this can be done with Authorised Views in BigQuery.
Column-Level Security ADVANCED
For even finer-grained control, consider using BigQuery’s column-level security features (e.g., masking policies). This allows you to mask or redact sensitive data within the view itself, further protecting it from unauthorised access.
Logical Data Segmentation
You can organise your data in BigQuery in a way that supports logical segmentation. This can involve using separate Google Cloud Projects, datasets or tables to store different types of data. By segmenting your data, you can more easily control access and ensure that Wondaris only has access to the relevant segments.
Separate Projects/Datasets: Consider creating a dedicated project or dataset in Google Cloud / BigQuery specifically for the data that will be accessed by Wondaris. This provides a clear separation of data and simplifies access control.
Regular Audits: Regularly review the data that is exposed to Wondaris to ensure that it is still necessary and that the access controls are still appropriate. Data requirements may change over time, so it's important to update your views and permissions accordingly.
Write Access
If you are wanting to use Wondaris enrichments (except for RFM), Wondaris will need somewhere to write data to. Ideally you would create a specific dataset JUST for this purpose and provide write access to the service accounts for this purpose.
For data that is sharded in BigQuery, you must create a view, dynamically selecting the shards you wish to include.
Data Access Permissions
Ensure data access is limited for all your data assets in GCP
If using Wondaris' service account for access:
Assign the service account only the permissions it needs (outlined below)
If using your own service account for access:
Make a dedicated service account ONLY for the purpose of allowing Wondaris to access the data
When creating a key
MAKE SURE YOU KEEP IT SAFE
Share it only via secure channels - preferably directly into Wondaris (as this is stored in Google Cloud’s secrets manager)
When assigning permissions the service account (either choice):
Assign least privilege permissions to achieve what is required
Depending on what is trying to be achieved, assign the following:
Read Only:
Permissions:
BigQuery Data ViewerBigQuery Job UserBigQuery Read Session User
Use cases:
Audience segmentation & activation
Audience data exploration
RFM Enrichments
Read & Write:
Permissions:
BigQuery Data EditorBigQuery Job UserBigQuery Read Session User
Use cases:
ML Enrichments
DBT Enrichments
Depending on the setup above in Data Isolation, you may assign the service accounts
BigQuery Data ViewerorBigQuery Data Editorrole to:The project you have isolated the data within
The Datasets you have isolated the data within
The tables or views you have isolated the data within
Note: if using authorised views, you may need to run through an authorisation process again if the view is changed
The
BigQuery Job UserMUST be assigned to the Project level (see: https://cloud.google.com/bigquery/docs/access-control#bigquery.jobUser )The
BigQuery Read Session UserMUST be assigned to the Project level (see: https://cloud.google.com/bigquery/docs/access-control#bigquery.readSessionUser )