Copying your Skio BigQuery dataset into your own GCP project


Skio delivers your data into a BigQuery dataset inside Skio's GCP project (largedata-380204) with read-only access. This is intentional — you don't pay storage costs, and Skio maintains the source of truth. If you want to join Skio data with your own sources (Shopify, Klaviyo, ad platforms, etc.), build custom dashboards, or run unrestricted SQL, you'll need to copy the dataset into a project you own.
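
For example, once the data is copied into your project, cross-source joins become ordinary single-project queries. The sketch below is illustrative only: the shopify.orders table and the id, email, customer_email, and order_id columns are placeholder names, so substitute whatever your own schema actually uses.

-- Illustrative sketch: `shopify.orders`, `id`, `email`, `customer_email`,
-- and `order_id` are placeholder names for your own schema.
SELECT
  s.id AS subscription_id,
  COUNT(o.order_id) AS lifetime_orders
FROM `<your-project>.skio_replica.Subscription` AS s
JOIN `<your-project>.shopify.orders` AS o
  ON o.customer_email = s.email
GROUP BY s.id;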

There are two supported ways to do this:

| Method | Best for | Min. refresh | Effort |
| --- | --- | --- | --- |
| Method A: BigQuery Data Transfer Service (recommended) | Full dataset replication, auto-picks up new tables, minimal maintenance | Every 24 hours (12 hours for some same-region setups) | ~5 minutes, no SQL |
| Method B: Scheduled Query | Refreshing more often than every 12 hours, copying specific tables only, or transforming data on copy | Every 15 minutes | ~15 minutes, requires SQL |

Skio refreshes your dataset every ~4 hours. If you need your copy to stay close to that cadence, use Method B. If daily is fine, Method A is simpler.


Before you start

You should have received the following from your Skio point of contact. If not, reach out before proceeding.

  • Source project ID: largedata-380204

  • Source dataset name: skio_<your_merchant_slug> (e.g., skio_acme_brand)

  • Read access confirmed for either a user/group email you own or a service account in your GCP project

Note: BigQuery doesn't show shared datasets in the Explorer until you star the project. See the troubleshooting entry "The dataset doesn't appear in my BigQuery Explorer" below for steps.

Source dataset region

Skio's source dataset is in us-west1. Your destination dataset region affects cost and schedule options:

  • Same-region copy (destination also in us-west1) — cheapest, no egress fees, supports the fastest schedules

  • Cross-region copy (e.g., US multi-region, EU) — fully supported, but Google charges network egress and some schedule minimums apply

Unless you have a strong reason to choose another region, create your destination dataset in us-west1.
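
If you ever need to confirm where a dataset lives, you can list every dataset in your project with its region from INFORMATION_SCHEMA:

-- Lists each dataset in your project and the region it lives in.
SELECT schema_name, location
FROM `<your-project>`.INFORMATION_SCHEMA.SCHEMATA;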


Understanding permissions

BigQuery's cross-project permission model has two sides. Both must be in place before a copy job will succeed.

| What | Where it's granted | Who grants it |
| --- | --- | --- |
| BigQuery Data Viewer on skio_<your_slug> | Skio's project (largedata-380204) | Skio — already set up when you were onboarded |
| BigQuery Job User on your GCP project | Your GCP project | You |

The copy job runs from your project, so the identity running it needs BigQuery Job User in your project. It reads from Skio's project, where Skio has already granted Data Viewer access.

How to grant BigQuery Job User in your project

  1. In your GCP console, go to IAM & Admin > IAM.

  2. Click Grant Access.

  3. Add the user email or service account that will run the transfer.

  4. Assign the BigQuery Job User role (or BigQuery User, which includes Job User).

  5. Click Save.

If you're using a service account, make sure it's the same one Skio has whitelisted on the dataset. If not, send your Skio contact the service account email (<name>@<project-id>.iam.gserviceaccount.com) and they'll add it.
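
A quick way to confirm both grants are in place is to run a trivial query against the Skio dataset from your own project. A successful run proves the identity has Job User in your project (to create the job) and Data Viewer on Skio's dataset (to read the table). Site is used here as an example; any table in your dataset works:

-- Succeeds only if the running identity holds both permissions:
-- Job User in your project and Data Viewer on Skio's dataset.
SELECT COUNT(*) FROM `largedata-380204.skio_<your_merchant_slug>.Site`;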


Method A: BigQuery Data Transfer Service

Use this method if daily refresh is acceptable, you want a one-time setup that automatically picks up any new tables Skio adds, and you don't want to write SQL.

Step 1: Create your destination dataset

  1. Open BigQuery Studio in your GCP console.

  2. Click the three-dot menu next to your project name and select Create dataset.

  3. Fill in the following:

    • Dataset ID — e.g., skio_replica or skio_data

    • Location type: Region → us-west1

  4. Click Create dataset.
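
If you prefer SQL to the console, the equivalent DDL is a two-line statement (in BigQuery, CREATE SCHEMA creates a dataset):

-- Creates the destination dataset in us-west1, if it doesn't exist yet.
CREATE SCHEMA IF NOT EXISTS `<your-project>.skio_replica`
OPTIONS (location = 'us-west1');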

Step 2: Create the data transfer

  1. In the BigQuery left sidebar, click Data transfers.

  2. Click + Create Transfer.

  3. Configure the source:

    • Source type: Dataset Copy

    • Display name: Skio Replication (or any descriptive label)

    • Repeat frequency: Daily — or Custom → every 12 hours if your setup supports it

    • Start time: Any time works. If you want the freshest daily snapshot, schedule it for early morning in your timezone.

  4. Configure the data source:

    • Source project: largedata-380204

    • Source dataset: skio_<your_merchant_slug>

    • Destination dataset: the dataset you created in Step 1

  5. Check Overwrite destination tables — this keeps your copy in sync rather than appending duplicate rows.

  6. Click Save.

Step 3: Verify

The first run kicks off immediately (or on your schedule). Go to Data transfers, click your transfer, and check the run history. Once complete, confirm it worked:

SELECT COUNT(*) FROM `<your-project>.skio_replica.Subscription`;
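
For a stronger check, compare row counts between the source and your copy (Subscription shown here; any table works). Small gaps are expected if Skio has refreshed the source since your last transfer run:

-- Both counts should match right after a successful run.
SELECT
  (SELECT COUNT(*)
   FROM `largedata-380204.skio_<your_merchant_slug>.Subscription`) AS source_rows,
  (SELECT COUNT(*)
   FROM `<your-project>.skio_replica.Subscription`) AS copied_rows;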

Limitations

  • Minimum schedule is 24 hours for most configurations. Some same-region setups support 12-hour schedules, but BigQuery may block more frequent options. Use Method B if you need more frequent refreshes.

  • Schema changes on existing source tables may require a one-time fix: delete the destination table and let the next run recreate it. Checking Overwrite destination tables handles most cases automatically.


Method B: Scheduled Query

Use this method if you need refreshes more frequently than every 12 hours, you only want specific tables copied, or you want to transform the data (filter rows, rename columns, join sources) on the way in.

Step 1: Create your destination dataset

Same as Method A, Step 1. Create a dataset in your project — this guide uses skio_replica in us-west1.

Step 2: Write the query

In BigQuery Studio, open a new SQL editor tab and paste a statement like the one below. Replace skio_<your_merchant_slug> and the table list with your actual values.

-- Replicate Skio tables into your own dataset.
-- CREATE OR REPLACE rebuilds each table on every run,
-- keeping your copy in sync with Skio's source.
CREATE OR REPLACE TABLE `<your-project>.skio_replica.Site` AS (
  SELECT * FROM `largedata-380204.skio_<your_merchant_slug>.Site`
);
CREATE OR REPLACE TABLE `<your-project>.skio_replica.Subscription` AS (
  SELECT * FROM `largedata-380204.skio_<your_merchant_slug>.Subscription`
);
CREATE OR REPLACE TABLE `<your-project>.skio_replica.SubscriptionLineItem` AS (
  SELECT * FROM `largedata-380204.skio_<your_merchant_slug>.SubscriptionLineItem`
);
-- Add one CREATE OR REPLACE block per table you want to copy.
-- Ask Skio for the full table list if needed.

Run the query manually once to confirm it works before scheduling it. If you hit a permission error, re-check the permissions section above.

Step 3: Schedule the query

  1. With your query in the editor, click Schedule (top right).

  2. Click Create new scheduled query.

  3. Fill in:

    • Name: Skio Replication

    • Repeat frequency: Hours → every 4 hours (matches Skio's refresh cadence). You can go as low as every 15 minutes, but there's no data benefit below 4 hours.

    • Destination for query results: leave blank — the CREATE OR REPLACE TABLE statements handle writes explicitly.

    • Advanced options → Service account: (recommended) use a service account in your project that has BigQuery Data Viewer on the Skio dataset and BigQuery Job User in your project. This keeps the schedule working even if your user account changes.

  4. Click Save.

Step 4: Verify

Go to Scheduled queries in the left sidebar, click your query, and check the run history. Force a manual run to confirm it works end-to-end.
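
If the UI history is ambiguous, you can also inspect recent jobs via INFORMATION_SCHEMA (the region qualifier below assumes your dataset is in us-west1):

-- Recent jobs in your project, newest first; error_result is NULL on success.
SELECT creation_time, job_id, state, error_result
FROM `<your-project>`.`region-us-west1`.INFORMATION_SCHEMA.JOBS
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY creation_time DESC;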

Limitations

  • You have to maintain the SQL. If Skio adds a new table and you want it copied, you'll need to add a new CREATE OR REPLACE TABLE block. Ask Skio for the current table list periodically, or use Method A if you want automatic table discovery.

  • Full table rewrites are fine at typical Skio data volumes, but can increase compute costs on very large tables. If you're watching BigQuery costs, your Skio contact can advise on incremental merge patterns using _PARTITIONTIME or updatedAt; a sketch of one such pattern follows below.
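
As a rough illustration of the incremental idea, the sketch below refreshes only recently changed rows. It assumes each table has a stable primary key (id) and an updatedAt timestamp; confirm both column names with Skio before relying on it, and note that rows deleted at the source are never removed from your copy with this pattern:

-- Sketch only: `id` and `updatedAt` are assumed column names.
DECLARE cutoff TIMESTAMP DEFAULT TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);

-- Drop local rows that have a newer version at the source...
DELETE FROM `<your-project>.skio_replica.Subscription`
WHERE id IN (
  SELECT id FROM `largedata-380204.skio_<your_merchant_slug>.Subscription`
  WHERE updatedAt >= cutoff
);

-- ...then pull in the fresh versions.
INSERT INTO `<your-project>.skio_replica.Subscription`
SELECT * FROM `largedata-380204.skio_<your_merchant_slug>.Subscription`
WHERE updatedAt >= cutoff;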


Which method should I use?

  • You want a copy of all the data, refreshed daily, with minimal setup → Method A

  • You need data refreshed more frequently than every 12 hours → Method B

  • You want to filter or transform data on the way in → Method B

  • You want new tables Skio adds to show up automatically → Method A

When in doubt, start with Method A. You can always switch to Method B later.


Troubleshooting

"Access Denied: Dataset largedata-380204:skio_xxx. Permission bigquery.datasets.get denied"

The identity running the job doesn't have Data Viewer on Skio's dataset. Confirm with Skio that the exact email or service account address is on the whitelist. If you added it recently, permissions sync can take up to a few hours — retry before escalating.

"User does not have permission to query table largedata-380204:skio_xxx.YYY"

Usually the same root cause as above. Also confirm whether Skio has granted access to you as a user or as a group — Google treats these as distinct principal types, and a mismatch makes the grant fail silently.

"Permission bigquery.jobs.create denied in project"

The identity running the job is missing BigQuery Job User on your project. See the permissions section above.

The dataset doesn't appear in my BigQuery Explorer

Log in as the authorized user and go to console.cloud.google.com/bigquery?project=largedata-380204. Click the star next to largedata-380204. The project will now appear in your Explorer sidebar.

My Dataset Copy transfer won't let me schedule more frequently than every 12 or 24 hours

This is a BigQuery Data Transfer Service limitation. Switch to Method B if you need a tighter cadence.

My copy is stale — Skio's data has rows that mine doesn't

Check your transfer or scheduled query run history for failures. A failed run leaves the previous snapshot in place. Common causes: revoked permissions, a schema change on a Skio table, or paused GCP billing.


FAQ


How often does Skio refresh the source dataset?

Every ~4 hours, with about 2 hours of latency end-to-end. Data from 6:00 PM, for example, is typically queryable around 8:00 PM.

Will copying all the data cost a lot?

Storage in BigQuery is cheap — fractions of a cent per GB per month. The main variable cost is query compute (~$6.25 per TB scanned on on-demand pricing). A daily full-dataset copy is typically negligible unless your dataset is very large.
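
If you want hard numbers for your own copy, INFORMATION_SCHEMA.TABLE_STORAGE reports per-table sizes (the region qualifier assumes us-west1, and skio_replica is the dataset name this guide uses):

-- Logical size of each copied table, in GiB.
SELECT
  table_name,
  ROUND(total_logical_bytes / POW(1024, 3), 2) AS logical_gib
FROM `<your-project>`.`region-us-west1`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE table_schema = 'skio_replica'
ORDER BY logical_gib DESC;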

Can I query Skio's dataset directly without copying?

BigQuery supports cross-project queries when permissions are configured correctly. In practice, most BI tools and warehouse integrations assume everything lives in one project, which is why copying is recommended. If direct querying would work for your use case, reach out to your Skio contact.

Can Skio write data directly into my GCP project?

Not currently. Skio owns the source storage to maintain data integrity and cost control. The copy methods above are the supported integration paths.

What if I use Fivetran, Airbyte, or Hightouch?

These tools typically need write access to the dataset, which Skio doesn't grant by default. The supported paths are Method A and Method B. If you have a specific ETL tool in mind, flag it with your Skio contact and they can advise on compatibility.