Author Archives: The SQL Mind Team

Why is Data Quality important

In recent years, Data quality is becoming more and more crucial for companies and business around the world, since it directly impacts decision-making, operational efficiency, and business outcomes. High-quality data ensures that businesses make informed decisions based on accurate and reliable information. Poor data can lead to errors, misinterpretations, and costly mistakes. Also, clean and consistent data reduces time spent correcting errors, resolving discrepancies, and reprocessing transactions. This improves productivity and reduces costs.

In regard to the Customer Satisfaction, accurate data helps businesses understand and serve their customers better, leading to improved customer experiences and stronger relationships. In the area of Regulatory Compliances, many industries are subject to strict regulations regarding data usage and reporting. High-quality data ensures compliance and reduces the risk of legal penalties or reputational damage.

Last but not least, the most recent revolution of AI is heavily related to Data Quality as well. AI models and machine learning algorithms depend on quality data for training and prediction. Poor data quality can lead to inaccurate models and flawed outputs.

By investing in data quality, organizations ensure their data is accurate, complete, consistent, and up to date, enabling them to operate efficiently and achieve their goals.

Finding duplicates in SQL Server

In a single column:

To get duplicates on a specific column, you can use the GROUP BY and COUNT(*) statements. For example, to find duplicates on the “ModifiedDate” column of the “Sales.SalesOrderDetails” table from the “AdventureWorks” sample database, you can use the following code:

SELECT ModifiedDate, COUNT(*) duplicates 
 FROM Sales.SalesOrderDetail
GROUP BY ModifiedDate
HAVING COUNT(*) > 1
ORDER BY COUNT(*) desc

Result:

In multiple columns:

Similarly, GROUP BY and COUNT(*) can be used too, but, in this case, we could convert each column to VARCHAR (so that they have all the same data type) and then concatenating them. For example, let’s use the “ModifiedDate” and “ProductID” fields from the same table:

select CONVERT(VARCHAR(100), ModifiedDate) + ' ' + CONVERT(VARCHAR(100),ProductID) columns,
  count(*) duplicates
from Sales.SalesOrderDetail
group by CONVERT(VARCHAR(100), ModifiedDate) + ' ' + CONVERT(VARCHAR(100),ProductID)
having count(*) > 1
order by count(*) desc

Result:

How to search for objects on all the SQL Server databases

You can seach for objects in all the SQL Server databases within one instance by using the built-in stored procedure: sp_MSforeachdb. For example, let say that you are looking for a stored procedure called: uspGetManagerEmployees but you don’t know in what databases it was created. You can use the following 3 lines to find it:

DECLARE @command varchar(1000);
SET @command = 'USE ? select DB_NAME() dbname, p.* from sys.procedures p where name like ''%uspGetManagerEmployees%''';
EXEC sp_MSforeachdb @command;

Similarly, you can search for tables, columns, etc. For example, the following script is looking for a table called SalesPerson:

DECLARE @command varchar(1000);
SET @command = 'USE ? SELECT c.* from INFORMATION_SCHEMA.COLUMNS c WHERE c.TABLE_NAME like ''SalesPerson'''; 
EXEC sp_MSforeachdb @command;

How to find queries, stored procedures and parameter’s values using SQL Profiler

Have you ever faced the problem that you want to see that query or stored procedure that was executed from the front-end application, but you are not able to see it because it’s running in the backend? Well, there is one way: through Microsoft SQL Server Profiler.

Microsoft SQL Server Profiler is a graphical user interface to SQL Trace for monitoring an instance of the Database Engine or Analysis Services. You can capture and save data about each event to a file or table to analyze later.

For example, you can monitor a production environment to see which stored procedures are affecting performance by executing too slowly. SQL Server Profiler is used for activities such as:

  • Stepping through problem queries to find the cause of the problem.
  • Finding and diagnosing slow-running queries.
  • Capturing the series of Transact-SQL statements that lead to a problem. The saved trace can then be used to replicate the problem on a test server where the problem can be diagnosed.
  • Monitoring the performance of SQL Server to tune workloads. For information about tuning the physical database design for database workloads, see Database Engine Tuning Advisor.
  • Correlating performance counters to diagnose problems.

To show you how to use this tool, let’s se the public SQL Server database: AdventureWorks2019.

Once the connection is stablished to the database in SQL Server Management Studio, open SQL Server Profiler by clicking in File -> Sql Server Profiler:

In the Trace Properties windows, click on Events Selection tab, then check Show All events checkbox:

The three most important events you can use to trace queries and stored procedures are:

  • RPC:Completed
  • SP:StmtCompleted
  • SQL:StmtCompleted

To select them, click on the TextData checkboxes:

Next, you can pause the trace to wait for the query/stored procedure execution. We also recommend clearing the traces (The Erase icon below the Replay menu option) to remove all the traces, so you can focus on the ones you are looking for:

For example, let’s run the following query from SSMS:

SELECT b.[ProductAssemblyID], b.[ComponentID], p.[Name], b.[PerAssemblyQty], p.[StandardCost], 
p.[ListPrice], b.[BOMLevel] , b.[StartDate]
FROM [Production].[BillOfMaterials] b
INNER JOIN [Production].[Product] p 
ON b.[ProductAssemblyID] = p.[ProductID] 

Now you can see it in SQL Profiler. (Don’t forget to pause the traces so you can focus on the query ones):

Let’s clear the traces using the erase icon, click on the Start Selected Trace icon and execute the following stored procedure with parameters in SSMS:

exec [dbo].[uspGetWhereUsedProductID] 505, '2010-12-20'

In SQL Server Profiler, you can see the script of the stored procedure:

The next SQL:StmtCompleted even will show the execution statement, including the parameter’s values. This is valuable when you want to see the name of the stored procedure and the list of parameters when it was executed from the application.

How to get the current running queries in SQL Server

To get the list of current running queries, you can use the following script:

SELECT sqltext.TEXT,
req.session_id,
req.status,
req.command,
req.cpu_time,
req.total_elapsed_time,
req.total_elapsed_time/1000 AS total_elapsed_time_in_secs,
req.total_elapsed_time/60000 AS total_elapsed_time_in_mins,
users.name
FROM sys.dm_exec_requests req
INNER JOIN sys.sysusers users on users.uid = req.user_id
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext
order by sqltext.text

How to insert into a one column (identity) table

Tables with just one column, which has been specified as identity columns, are not very common, and inserting data into them might be a bit tricky. To do that you need to use DEFAULT value without parenthesis. For example, let’s create the table IdentityTable:

CREATE TABLE IdentityTable
(
tableID INT PRIMARY KEY IDENTITY(1,1) NOT NULL
)

To insert data, you need to use the following statement:

INSERT INTO IdentityTable DEFAULT VALUES