This quick guide shows you how to split a column in BigQuery
When working with text data, quite often you need to split a column or trim excess characters to clean the data
This quick guide shows you how to do it. Using URL data from Google Analytics GA4 to create separate columns for the Clean URL and the URL parameter
We show how to split the column by the delimiter. The delimiter is the specific character to use for the split
Initial Set up
Lets start with a list of URLs from our web traffic
You can copy this code to try it
WITH URLs AS (SELECT 'https://www.selectdistinct.co.uk/?utm_campaign=directory&utm_medium=referral&utm_source=themanifest.com' as page_location UNION ALL SELECT 'https://www.selectdistinct.co.uk/business-intelligence-expert-services/power-bi-mentoring/?gclid=EAIaIQobChMI1I3E3aa_ggMVJz4GAB3jzgg-EAAYASAAEgKnwPD_BwE' as page_location UNION ALL SELECT 'https://www.selectdistinct.co.uk/2023/06/28/topn-in-power-bi/?fbclid=IwAR2WKv315a8lxzS3kBxuGd3sZSr7VXcZDkxxFOVDxI0fcevx-I506qLfYew' as page_location UNION ALL SELECT 'https://www.selectdistinct.co.uk/2023/04/27/rolling-averages-and-rolling-sums-in-power-bi/?fbclid=IwAR2_o-1lnKTuNynhgdSrBZkGLzRH5Sa3aNpqtDjFwH4mq2Z7ii0U0RRDO0c' as page_location UNION ALL SELECT 'https://www.selectdistinct.co.uk' as page_location ) select URLs.* from URLs
After running this code you will see 5 rows of URL data for testing
Step 1
The first step is to find the position of the delimiter, In our case it’s a question mark
select URLs.*, strpos(page_location, ‘?’) as String_Position
STRPOS tells us the character position of the question mark
Step 2
Next,
Find the length of the page location field
length(page_location) as URL_length from URLs
The LENGTH command tells us the total number of characters in the full name
adding this as a column to our SQL gives us
select URLs.*
, strpos(page_location, ‘?’) as String_Position
, length(page_location) as URL_length from URLs
Step 3
The next step is to find the characters that make up the clean url before the url parameter, by finding everything to the left of the question mark
case when strpos(page_location, ‘?’) = 0 then page_location
else left(page_location, strpos(page_location, ‘?’) -1) end as clean_url
Here we need to first check that the url data has as url parameter by checking for the question mark, if it does not have a question mark then the url is good to use. But if the data does contain a question mark we find everything to the left of that position
This takes the question mark position of 34 in the first example, subtracts 1 to remove the question mark itself, and returns the first 33 characters to give us a clean url
https://www.selectdistinct.co.uk/
Step 4
For the URL parameters,
we want everything to the right of the question mark
case when strpos(page_location, ‘?’) = 0 then NULL else
substring(page_location, strpos(page_location, ‘?’)+1, length(page_location) – strpos(page_location, ‘?’)) end as URL_Parameter
This takes the question mark position of 34 in the first example, adds 1 to remove the question mark itself, and returns the remaining characters by evaluating how many characters remain
This data can then be used for further analysis, linking the URL back to other data , whilst also being able to see the parameter details
With more work we could break down the URL parameter further and provide even more insights
Here is the full SQL code that you should have if you have been following along
WITH URLs AS
(SELECT ‘https://www.selectdistinct.co.uk/?utm_campaign=directory&utm_medium=referral&utm_source=themanifest.com’ as page_location
UNION ALL
SELECT ‘https://www.selectdistinct.co.uk/business-intelligence-expert-services/power-bi-mentoring/?gclid=EAIaIQobChMI1I3E3aa_ggMVJz4GAB3jzgg-EAAYASAAEgKnwPD_BwE’ as page_location
UNION ALL
SELECT ‘https://www.selectdistinct.co.uk/2023/06/28/topn-in-power-bi/?fbclid=IwAR2WKv315a8lxzS3kBxuGd3sZSr7VXcZDkxxFOVDxI0fcevx-I506qLfYew’ as page_location
UNION ALL
SELECT ‘https://www.selectdistinct.co.uk/2023/04/27/rolling-averages-and-rolling-sums-in-power-bi/?fbclid=IwAR2_o-1lnKTuNynhgdSrBZkGLzRH5Sa3aNpqtDjFwH4mq2Z7ii0U0RRDO0c’
as page_location
UNION ALL
SELECT ‘https://www.selectdistinct.co.uk’ as page_location
)
select URLs.*
, strpos(page_location, ‘?’) as String_Position
, length(page_location) as URL_length
, case when strpos(page_location, ‘?’) = 0 then page_location
else left(page_location, strpos(page_location, ‘?’) -1) end as clean_url
, case when strpos(page_location, ‘?’) = 0 then NULL else
substring(page_location, strpos(page_location, ‘?’)+1, length(page_location) – strpos(page_location, ‘?’)) end as URL_Parameter
from URLs
You can apply this same technique to split other types of data, or substitute the space for a different character
Subscribe to our channel to see more tips and timesavers
Select Distinct YouTube Channel
You may be interested in this
https://www.selectdistinct.co.uk/2023/04/13/split-a-column-in-sql-server/
Or find other useful SQL, Power BI or other business analytics timesavers in our Blog
Our Business Analytics Timesavers are selected from our day to day analytics consultancy work. They are the everyday things we see that really help analysts, SQL developers, BI Developers and many more people.
Our blog has something for everyone, from tips for improving your SQL skills to posts about BI tools and techniques. We hope that you find these helpful!