Duplicate rows in SQL Server

Question

I am having duplicate rows with the same storeactivityid show up in my results... This is the primary key, so this should not happen. Is there something wrong with my joins that could cause this? I could use distinct, but that will not solve the issue here. Any tips or advice? There are 3 duplicates showing for each result!

select pd.storeactivityid,e.EMPLOYEENAME,c.ChainName,c.UserCode as ChainNumber,
          s.storenumber,s.StoreNameAndNumber,
          pd.startdatetime,
          pd.enddatetime,
          cast((datediff(s, pd.startdatetime, pd.enddatetime) / 3600.0) as decimal(9,2)) as duration,
          exceptioncodes,pe.Description,isnull(pd.approved, 0) as approved,
          isnull(pd.comment, '') as comment,
          pd.modifieddate
    from payrolldetail pd with (nolock)
          inner join payperiods pp with (nolock) on pd.enddatetime between pp.begindate and pp.enddate and pp.CompanyID = @companyid
          left join stores s with (nolock) on pd.storeid  = s.storeid
          left join chains c with (nolock) on c.chainid = s.chaincode
          left join employees e with (nolock) on pd.employeeid = e.employeeid
          inner join payrollexceptions pe with (nolock) on pd.ExceptionCodes = pe.Code
    where pd.companyid = @companyid 
    and cast(getdate() as date) between pp.begindate and pp.enddate
    and exceptioncodes = @exceptioncodes
    and pd.companyid = @companyid

One of your `JOINS` must have a 1-many relationship which could be returning what look like duplicated rows, when in fact, the entire row should be unique. If you want only a single `storeactivityid`, then how should the other data be displayed? — sgeddes, May 15 '14 at 16:13
Well one of the joined tables has > 1 row that satisfies the join condition. Comment them out one by one or build the query up one at a time and see at what point its stops behaving. — Alex K., May 15 '14 at 16:13
Any one of the joins could be responsible for creating multiple rows with the same storeactivityid. You have to check each relation and test it. Are you sure that for each pd.enddatetime you will get only a single payperiods row? And so on... — Frazz, May 15 '14 at 16:14
Changed the last join to: inner join payrollexceptions pe with (nolock) on pd.ExceptionCodes = pe.Code and pe.CompanyID = @COMPANYID It was not unique enough... That did the trick. I appreciate it everyone! — Tanner_Gram, May 15 '14 at 16:34

score 0 · Answer 1 · answered May 15 '14 at 16:13

i do not know what is in each of the tables, but the easiest way i found to debug something like his is

select [storeactivityid],count([storeactivityid]) as [count]
from [<table>]
<Start adding joins in one at a time>
where [count] > 1
group by [storeactivityid]

score 0 · Accepted Answer · answered May 15 '14 at 16:23

If it is a primary key, you can be certain that in the actual table you do not have duplicate rows with the same storeactivityid.

Your query returns rows with the same storeactivityid because at least one of the joined tables has the matches the condition specified in the join.

My best guess it is due to the followoing join:

inner join payperiods pp with (nolock) on pd.enddatetime between pp.begindate and pp.enddate and pp.CompanyID = @companyid

Is it possible that a company has multiple payrolldetails within the same range of dates specified in the payperiods table?

score 0 · Answer 3 · answered May 15 '14 at 16:26

If you use NOLOCK hint to be able to do 'dirty reads' (read uncommitted) and some modifications are happen in the same time on this tables, this may cause missing or double count of even unique rows! If there is no activity on the server, no updates/inserts/deletes, than there is something inside data of your tables that caused duplicating, as other guys have already said.

Duplicate rows in SQL Server

3 Answers3