Hi
How can i eliminate duplicate records from a single table, when i use a query that links multipla tables together?
There is a surrogate key TABLE1 and i just can't seem to wrap my head around the layout.
TABLE1
ID | memQuestionaireID | QuestscoreID | DateOfAssessment | etc.
TABLE2
memQuestionaireID | MemberID | Answer1 | Answer2 | etc.
TABLE3
MemberID | Name | Surname | IDNo | etc.
t1.memQuestionaireID = t2.memQuestionaireID
t2.MemberID = t3.MemberID
That's how the tables link, obvious enough. How can i delete duplicate records from TABLE1 where MemberID occurs more than once and the t1.DateOfAssessment = MAX(DateOfAssessment)
The ID is unique, but there are duplicate memberID's with different date of assessments, i want to take the last used date of assessment and keep only that one, and remove the rest from the table?
I'm sure the query is simple, i'm just too stupid to see how to solve it!!
Any help would be greatly appreciated.
I'm not very clued up on how to remove records or update records, Please help.
Kind Regards
Carel Greaves
See if this does what you need...
Code Snippet
DELETE t1
FROM Table1 t1
INNER JOIN Table2
on t1.memQuestionaireID = t2.memQuestionaireID
INNER JOIN
(
SELECT t2b.MemberID, max(t1b.DateOfAssessment) as DateOfAssessment
FROM table1 t1b
INNER JOIN table2 t2b
ON t1b.memQuestionaireID = t2b.memQuestionaireID
AND t2b.MemberID in
(
SELECT MemberID
FROM Table3
GROUP BY MemberID
HAVING count(*)>1
)
GROUP BY t2b.MemberID
) as list
ON t1.DateOfAssessment < list.DateOfAssessment
and t2.MemberID = list.MemberID
To view what will be delete w/o actually deleting, just change the first line from DELETE t1 to SELECT t1.*
|||
I'm a bit confused Carel,
Call me dense today, but it seems like the MemberID 'should' be in Table1 since it is the Member that has a memQuestionaireID, has a QuestScoreID and a DateOfAssessment. Then for Table2, it is the Questionaire that has a set of answers, etc. Please help me out here...
It seems like your Table1 has a one-to-many relationship with Table2. I'm assuming that a Member may have multiple instances of the same questionnaire (re-takes, etc.)
Is one questionnaire (MemQuestionnaireID) with a QuestScoreID and a DateOfAssessment directly linked to one or more set(s) of answers in Table2?
Is the QuestScoreID changed upon each re-take? (Shouldn't the score be directly related to a set of answers?)
I ask this because it seems from the design you presented that perhaps you want to also delete rows from Table2...
|||I know the database structure is a bit stupid. Not my design!
I am only trying to remove duplicate entries in the QuestScore table.
Problem:
A member fills in a Questionaire - MemQuestionaires
The Querstionaire's Score is placed in the QuestScore Table.
There are duplicate entries in the QuestScore table with regards to a questionaireID from the memQuestionaires Table
If a duplicate exists, then the members can't log into there profiles anymore, so i need to remove the duplicates so that the members can access their profiles again.
I'm quite new to all of this, so any help would be greatly appreciated
Kind Regards
Carel Greaves.
|||You gave us Table1, Table2, Table3...I'm guessing Table1 is QuestScore?
Did my code above work? If not, what was not quite right?
Are there multiple entries just in Table1, or is it Table3?
Please clarify and maybe post some sample data...
|||Okay, so i'm a bito f a moron :-) he he
Here is the code i used to find the duplicates
SELECT me.MemberID, /*me.QuestionaireID,*/ q.memQuestionaireID, q.*, me.*
FROM questscore q, memQuestionaires m, Members me
WHERE q.memquestionaireID = m.memquestionaireID
AND m.MemberiD = me.memberid
AND me.ClientID = 124
ANd m.questionaireid = 85
AND q.sectionid >= 1000000
ORDER BY me.memberID
Here is my results:
i have marked the duplicate values in red that i want to eliminate.
I need to eliminate only the record with the oldest date, and keep the newer one.
Hopefully when i've gone for my SQL course then i will only be half a moron :-) he he
Thanks for helping me out guys.
Kind Regards
Carel Greaves
|||Wow, talk about a 'bolixed' up database...
I'm still confused, in your query above, you have a WHERE clause that indicates the results should be filtered down to QuestionaireID = 85, but your duplicate records are for memQuestionaireID 11129. What is the difference between QuestionaireID and memQuestionaireID? How did that happen?
Also, the DateCompleted is obviously not a date, but a number of hours. Is that the number of hours since some event? How are the hours determined?
Which row is the 'newer one'?
It would be very helpful, and keep us from wasting our time, if you would post the DDL for each of the tables, as well as some sample data for each table (in the form of INSERT statements.)
As you seen, you have folks sincerely attempting to help you, and to this point wasting our time because we just don't have the 'full' picture. You've inherited a very 'odd' database, and you need to help us so we can better help you.
|||Here is sample data in the Tables.
Member Table
memQuestionaires Table
QuestScore Table
When i executed this statement, i only used these three tables.
SELECT me.MemberID, /*me.QuestionaireID,*/ q.memQuestionaireID, q.*, me.*
FROM questscore q, memQuestionaires m, Members me
WHERE q.memquestionaireID = m.memquestionaireID
AND m.MemberiD = me.memberid
AND me.ClientID = 124
ANd m.questionaireid = 85
AND q.sectionid >= 1000000
ORDER BY me.memberID
I am only trying to remove the duplicate records for where ClientID = 124
AND QuestionaireID = 85
as for the 1000000, there it is just to narrow down the search for myself, there are a lot of duplicates.
There are a lot of members -> Members Table
Each member can fill in a number of questionaires -> QuestionaireID on memQuestionaires Table in this case 85
And all the questionaire's data is stored in the QuestScore Table which is linked to the memQuestionaires table using the memQuestionairesID.
There seems to be duplicate values in the QuestScore Table referring to the members.
The date completed is an actual DATETIME field in the database, i used excel to transfer the data to here. The newer date will be a actual date with the time in the database i.e. 2007/05/23 02:05:11
All the fields get inserted into the database via a application which i haven't seen.
I'm new to the company and am trying to figure a lot of it out for myself too.
If there is any more information you need, please don't hesitate to ask.
|||try this one,DECLARE @.QuestScore TABLE (
QuestScoreID int
, memQuestionaireID int
, DateCompleted smalldatetime
)
DECLARE @.MemQuestionaire TABLE (
MemQuestionaireID int
, MemberID int
, QuestionaireID int
, UpdateDate smalldatetime
)
INSERT
INTO @.QuestScore
SELECT 1, 1, dateadd(day,-1, getdate()) UNION ALL
SELECT 2, 2, dateadd(day,-1, getdate()) UNION ALL
SELECT 3, 3, dateadd(day,-3, getdate())
INSERT
INTO @.MemQuestionaire
SELECT 1, 1, 4, dateadd(day,-1, getdate()) UNION ALL
SELECT 2, 1, 4, dateadd(day,-1, getdate()) UNION ALL
SELECT 3, 2, 5, dateadd(day,-3, getdate())
select *
from @.memquestionaire
select *
from @.questscore
declare @.tobedeleted table( -- create a table var for the data to be deleted
memquestionaireid int
, datecompleted smalldatetime
)
insert
into @.tobedeleted -- the memquestionaireid and datecompleted to be deleted
select MIN(qq.memquestionaireid) as memquestionaireid -- just in case if they have the same date, get the lowest id
, mq.datecompleted
from @.questscore qq inner join
(
select m.memberid
, m.questionaireid
, MIN(q.datecompleted) as datecompleted -- delete the lowest date
from @.memquestionaire m inner join
@.questscore q on m.memquestionaireid = q.memquestionaireid
group by
m.memberid
, m.questionaireid
having count(m.memberid) > 1 -- member having more than 1 quest score
) mq on qq.datecompleted = mq.datecompleted
group by
mq.datecompleted
delete @.questscore -- delete in questscore
from @.questscore q inner join
@.tobedeleted dq on q.memquestionaireid = dq.memquestionaireid
and q.datecompleted = dq.datecompleted
delete @.memquestionaire -- delete in memquestionaire
from @.memquestionaire m inner join
@.tobedeleted dq on m.memquestionaireid = dq.memquestionaireid
select *
from @.questscore
select *
from @.memquestionaire|||
Thanks, that's awesome, that's why i come to you guys for help.
Another quick question, that i think it might be easier to do, i just can't do it. (I just thought of it now)
If i added memQuestionaireID and SectionID together to create a unique field, and then filter out the duplicates by taking out the old dates within the duplicates it might be easier, but how would i go about accomplishing this?
I don't know how to compare the dates against itself to get keep the new date and get rid of the old date?
sorry for all of this i'm just trying to learn from all the things myself?
|||Thanks, that was a start.
Here is an example of the 'best' form for DDL and sample data. In this fashion, each person that wants to try and help you can just run this code and have 'your' problem to work with. Without this, folks are turned away because it takes so much time and effort to duplicate your situation -and helps to keep us from running off on tangents that don't really help you.
Now if you would only add to the sample data so that examples of the duplicate records you want to find and delete are represented, we 'should' be able to help you.
Code Snippet
DECLARE @.Members table
( med_aid int,
MemberID int,
ClientID int,
Name varchar(20),
Surname varchar(20),
Username varchar(100),
Password varchar(20)
)
INSERT INTO @.Members VALUES ( NULL, 10000000, 1, 'NiftyName', 'NiftySurname', 'his', 'test' )
INSERT INTO @.Members VALUES ( NULL, 10000001, 5, 'NiftyTest2', 'Surname2', 'nifty@.ybo.oa', 'JBCYWO' )
INSERT INTO @.Members VALUES ( NULL, 10000002, 5, 'KeithTest', 'WilsonTest', 'willi', 'willi' )
DECLARE @.memQuestionaires table
( memQuestionaireID int,
MemberID int,
QuestionaireID int,
LastPage int,
Paused int,
Complete int,
MaxPages int,
Sent int,
LastSaved int,
InitMail int,
ReceiveMail int,
UpdateDate datetime
)
INSERT INTO @.memQuestionaires VALUES ( 871, 10000000, 85, 1, 0, 0, 1, 1, 0, 1, 1, '2007/02/25 06:23' )
INSERT INTO @.memQuestionaires VALUES ( 872, 10000001, 85, 1, 0, 0, 1, 1, 0, 1, 1, '2007/02/25 06:23' )
INSERT INTO @.memQuestionaires VALUES ( 873, 10000002, 85, 1, 0, 0, 1, 1, 0, 1, 1, '2007/02/25 06:23' )
DECLARE @.QuestScore table
( QuestScoreID int,
memQuestionaireID int,
AnswerID int,
AnswerValue int,
SectionID int,
Page int,
QuestType int,
AnswerText int,
AnswerShort int,
Updatedate int,
DefaultValue int,
DateCompleted datetime
)
INSERT INTO @.QuestScore VALUES ( 4641, 871, 0, -9, 361, 1, 9, '', '', NULL, NULL, '2007/02/07 22:31' )
INSERT INTO @.QuestScore VALUES ( 4642, 872, 0, -5, 362, 1, 6, '', '', NULL, NULL, '2007/02/07 22:31' )
INSERT INTO @.QuestScore VALUES ( 4643, 873, 0, 0, 345, 1, 2, '', '', NULL, NULL, '2007/02/07 22:31' )
Sorry i'm sending you guys on a wild goose chase like this, and thanks a lot Arnie, i'd buy you a case of beer if i knew where you were.
I'm really learning a lot from this!
I just thought that it might be easier, if i had to add the memQuestionaireID and SectionID together in the QuestScore table to get a single unique value and remove the duplicates based on the newer unique value according to the Date, i.e. Only removing the old dates.
Sorry about this, i only saw it now.
That way i'm only using a single table.
|||Thanks,
And my desk is under the stuff somewhere, or so it was a few months ago...
And UPS serves Portland, OR! But it would be much more enjoyable to share, so if you can wrangle it, start lobbying management to let you come to the PASS (Professional Association for SQL Server) Conference in September. (http://sqlpass.org/)
|||Cummon guys, i'm a newbie at SQL Server in South Africa and it's late at night and my CEO's say i can't go home until i get this problem sorted out.
Please could you guys help me out quick?
No comments:
Post a Comment