BigQuery Authorisation Best Practices: Secure Connection Guide

BigQuery Authorisation Best Practices: Secure Connection Guide

This document outlines the best practices for securely connecting Wondaris to your BigQuery data warehouse. Wondaris connects to BigQuery using Google Cloud Service accounts - using either Wondaris' core Service accounts or user provided service accounts, using a key stored in Google Cloud's Secret Manager within a customer-specific, isolated project. This guide emphasises data isolation and access control to ensure the security and integrity of your data.

Data Isolation

Data isolation is crucial for maintaining the confidentiality and integrity of your data. By implementing proper isolation techniques, you can minimise the risk of unauthorised access and ensure that Wondaris only accesses the data it needs.

Data Minimisation

The principle of data minimisation states that you should only collect and store the data that is absolutely necessary. Before exposing any data to Wondaris, carefully consider which data is required for your specific use case. Avoid granting access to entire tables or databases if only a subset of data is needed.

Views

The primary (and often easiest) method for data isolation with Wondaris is through the use of authorised views in BigQuery. Create views that specifically select only the necessary columns and rows from your underlying tables. This allows you to restrict the data that Wondaris can access without modifying your base tables.

In this data model, you will typically create a view on a single customer table and views on any number of event tables (like a transaction table). Wondaris handles the joining of these tables (e.g., customer and event tables) using primary/foreign key relationships. It is important that both the customer and event tables have primary keys defined.

  • Example:

    • Customer Table: customer_table with columns like customer_id, email, loyalty_tier, etc. customer_id should be the primary key.

    • Event Tables: event_table_1 (e.g., page_views) with columns like event_id, customer_id, event_time, page_url, etc., and event_table_2 (e.g., transactions) with columns like transaction_id, customer_id, transaction_timestamp, amount, product_id, etc. event_id and transaction_id should be the primary keys for their respective tables, and customer_id is the foreign key.

    You would create a customer view:

    CREATE VIEW customer_view AS SELECT customer_id, email, loyalty_tier FROM customer_table;

    And event views, selecting the relevant columns:

    CREATE VIEW page_views_view AS SELECT event_id, -- Primary Key customer_id, event_time, page_url FROM event_table_1; CREATE VIEW transaction_view AS SELECT transaction_id, -- Primary Key customer_id, transaction_timestamp, amount, product_id FROM event_table_2;

You will need to ensure the views (or dataset which houses the views) is authorised to access the underlying data - this can be done with Authorised Views in BigQuery.

Column-Level Security ADVANCED

For even finer-grained control, consider using BigQuery’s column-level security features (e.g., masking policies). This allows you to mask or redact sensitive data within the view itself, further protecting it from unauthorised access.

Logical Data Segmentation

You can organise your data in BigQuery in a way that supports logical segmentation. This can involve using separate Google Cloud Projects, datasets or tables to store different types of data. By segmenting your data, you can more easily control access and ensure that Wondaris only has access to the relevant segments.

  • Separate Projects/Datasets: Consider creating a dedicated project or dataset in Google Cloud / BigQuery specifically for the data that will be accessed by Wondaris. This provides a clear separation of data and simplifies access control.

  • Regular Audits: Regularly review the data that is exposed to Wondaris to ensure that it is still necessary and that the access controls are still appropriate. Data requirements may change over time, so it's important to update your views and permissions accordingly.

Write Access

If you are wanting to use Wondaris enrichments (except for RFM), Wondaris will need somewhere to write data to. Ideally you would create a specific dataset JUST for this purpose and provide write access to the service accounts for this purpose.

For data that is sharded in BigQuery, you must create a view, dynamically selecting the shards you wish to include.

Data Access Permissions

Ensure data access is limited for all your data assets in GCP

If using Wondaris' service account for access:

  • Assign the service account only the permissions it needs (outlined below)

If using your own service account for access:

  • Make a dedicated service account ONLY for the purpose of allowing Wondaris to access the data

  • When creating a key

    • MAKE SURE YOU KEEP IT SAFE

    • Share it only via secure channels - preferably directly into Wondaris (as this is stored in Google Cloud’s secrets manager)

 

When assigning permissions the service account (either choice):

  • Assign least privilege permissions to achieve what is required

  • Depending on what is trying to be achieved, assign the following:

    • Read Only:

      • Permissions:

        • BigQuery Data Viewer

        • BigQuery Job User

        • BigQuery Read Session User

      • Use cases:

        • Audience segmentation & activation

        • Audience data exploration

        • RFM Enrichments

    • Read & Write:

      • Permissions:

        • BigQuery Data Editor

        • BigQuery Job User

        • BigQuery Read Session User

      • Use cases:

        • ML Enrichments

        • DBT Enrichments

  • Depending on the setup above in Data Isolation, you may assign the service accounts BigQuery Data Viewer or BigQuery Data Editor role to:

    • The project you have isolated the data within

    • The Datasets you have isolated the data within

    • The tables or views you have isolated the data within
      Note: if using authorised views, you may need to run through an authorisation process again if the view is changed

  • The BigQuery Job User MUST be assigned to the Project level (see: https://cloud.google.com/bigquery/docs/access-control#bigquery.jobUser )

  • The BigQuery Read Session User MUST be assigned to the Project level (see: https://cloud.google.com/bigquery/docs/access-control#bigquery.readSessionUser )