A Guide to Using UNION in SQL: Combining and Manipulating Data
Introduction: Understanding the Purpose and Functionality of UNION in SQL
The SQL UNION operator allows us to merge the results of two or more SELECT statements into a single result set.
It performs a set operation by combining rows from different tables based on their column names and data types. This functionality proves to be immensely useful when we need to consolidate related information from various sources into one cohesive dataset. For example you may need to consolidate sales data from different subsidiaries, each of which may be using different data sources.
By utilizing the UNION operator, we can effectively eliminate duplicate records while merging data from multiple tables. This ensures that our resulting dataset remains clean and concise without any redundant information.
Throughout this article we will explore its syntax, examine various use cases where it can be applied, and understand how it contributes to efficient data manipulation within relational databases.
SQL UNION is an essential tool for combining and merging data in SQL.
Let us start with a simple example
We will use two similar sales data sets, the first covers store numbers 1 to 5 (SalesA)
We also have a second dataset, covering store numbers 5 to 10 (SalesB)
You will notice that store 5 features in both datasets, you will see what happens with the duplicates shortly
Using UNION to Combine Data from Multiple Tables
The syntax of the UNION operator is SELECT columns FROM SalesA UNION SELECT columns FROM SalesB; If we apply this to our sample dataset our SQL code looks like this select [Product_ID], [Store_ID], [qty_sold] from SalesA union select [Product_ID], [Store_ID], [qty_sold] from SalesB
We can run this to get the results we are looking for
The Union operator has merged rows from the separate tables into a single dataset, and if you look carefully you will notice that the duplicate row for store 5 has been resolved
Eliminating rows is a key feature of the UNION operator
Key points to note when using the UNION method
- There must be a matching number of columns in both sides of the union
- The data types must be the same or compatible in the corresponding columns, (e.g. you cant mix text and integers unless you translate the integers values to text in one of queries)
- You must be sure that eliminating duplicate rows with UNION is the correct treatment you need
- You need to be sure that the columns are in the same order. in this example it would be easy to transpose the product ID column with the store ID column. The query would still work as the data types are the same but the data would be wrong and difficult to spot
Applying UNION ALL for Including Duplicate Rows in the Result Set
Eliminating duplicates is all well and good, but what if you need to keep the duplicates.
If we use the UNION ALL operator then all rows are retained including duplicates
The syntax for UNION ALL is essentially the same as UNION
lets see the result
We can clearly see the effect of retaining duplicate rows with UNION ALL
Leveraging the Power of Union to Manipulate and Combine Data Effortlessly in SQL Queries
Now that you understand the power of being able to combine data using UNION, you can go on and efficiently combine data from different sources
Watch this space as we will show how to do the same in Power Query in the coming weeks
Remember to make sure that your columns are in the correct order as lining up the columns incorrectly wouldn't necessarily fail or be identified if the data types are the same One final tip that we find useful if combining datasets is to add a source column, particularly when using UNION ALL, this way you can easily trace which table a row has been returned from Example select [Product_ID], [Store_ID], [qty_sold], 'SalesA' as Source from SalesA union ALL select [Product_ID], [Store_ID], [qty_sold], 'SalesB' from SalesB
Its also worth pointing out that the UNION ALL query inherits its field names from the first query, in this example we don’t need to name the last column in the second query
Subscribe to our channel to see more SQL tips and timesavers
Select Distinct YouTube Channel
Or find other useful SQL, Power BI or other business analytics timesavers in our Blog
Our Business Analytics Timesavers are selected from our day to day analytics consultancy work and are the everyday things we see that really help analysts, SQL developers, BI Developers and many more people. Our blog has something for everyone, from tips for improving your SQL skills to posts about BI tools and techniques. We hope that you find these helpful!