Showing posts with label eliminate. Show all posts
Showing posts with label eliminate. Show all posts

Sunday, February 26, 2012

Eliminating rows from select results

Hi,

is there any way to eliminate a row from the results returned from a select statement?

I'm using some aggregate functions and doing some division and occassionally I get a divide by zero error. Is there a way I can leave the row out if the divisor is zero? Briefly, my query looks like this:

select sum(a*b)/sum(b), c
from TableA
group by c

If sum(b) is zero I want to leave that row out.

Thanks for your help!

WallaceDoes your database engine supporr the HAVING clause?

-PatP|||This is what HAVING is doing essentially:

SELECT sum_a / sum_b, c
FROM ( SELECT SUM(a*b), SUM(b), c
FROM tableA
GROUP BY c ) AS t(sum_a, sum_b, c)
WHERE sum_b <> 0|||Having should do the trick.

Thanks for your help!|||This is what HAVING is doing essentially:

SELECT sum_a / sum_b, c
FROM ( SELECT SUM(a*b), SUM(b), c
FROM tableA
GROUP BY c ) AS t(sum_a, sum_b, c)
WHERE sum_b <> 0

It is answers like the one you provided above that encourage poor software development, in this instance, database development.

There is no need to create an inline view and filter the result set by applying a where clause to the outer query, when in this instance, the exact same result can be produced by using the Having clause.

Using the having clause is the preferred method for applying criteria to aggregate results, it was designed for this exact purpose. Although the where clause can be used in certain situations to filter aggregates, it was not designed so.

To follow your example, perhaps when I provide examples I should replace all column names with some arbitrary name in an attempt to make the solution appear complex and worthy of admiration, when in fact these types of responses would only cause unnecessary confusion to the reader.|||robert, take it easy, he was showing what the HAVING clause does, not suggesting that you avoid using it|||I do apologise if my response appears rude and overtly frank. However, I consider it to be totally inappropriate to hijack a thread with the intention to be a "clever dick" and publish a random solution with complete disregard for the original question.

stolze posted an unnecessary solution to a simple question, which could have easily been answered using a HAVING clause, but did not consider it proper to provide in addition, a solution using the Having clause.

Stolze: The poster was not asking for ways to replace the functionality of the Having clause, but instead the question asked how to remove rows from a set, and in this particular instance, the condition was to be applied after the aggregation. In future, perhaps you could assist the poster with their question before trying to prove a point with respect to your technical ability.

You wouldn't speak to your fellow colleagues in such a patronising tone, so why do it here?|||You wouldn't speak to your fellow colleagues in such a patronising tone, so why do it here?so why are you doing it?|||My comments were not said with a patronising tone.|||perhaps you did not intend them to be, but that is how they came across|||I apologise to all those who may find my comments in this thread rude and patronising. My intent was just to point out that if a simple and easy to understand solution exists, then it should be provided first, before suggesting other less obvious methods.

That's all.|||select sum(a*b)/sum(b), c
from TableA
group by c

If sum(b) is zero I want to leave that row out.

As already suggested, the following is indeed what you need:SELECT sum(a*b)/sum(b), c
FROM TableA
GROUP BY c
HAVING sum(b) <> 0
Note that this also avoids zero division, i.e., the expression sum(a*b)/sum(b) is never executed when sum(b) is 0.|||I do apologise if my response appears rude and overtly frank. However, I consider it to be totally inappropriate to hijack a thread with the intention to be a "clever dick" and publish a random solution with complete disregard for the original question.

stolze posted an unnecessary solution to a simple question, which could have easily been answered using a HAVING clause, but did not consider it proper to provide in addition, a solution using the Having clause.

I just gave an explanation on how HAVING works. That's all...

Stolze: The poster was not asking for ways to replace the functionality of the Having clause, but instead the question asked how to remove rows from a set, and in this particular instance, the condition was to be applied after the aggregation. In future, perhaps you could assist the poster with their question before trying to prove a point with respect to your technical ability.

I have no idea why you are reacting so aggressively.

The OP apparently didn't know about the existence of HAVING and what it does. While the answer given by Pat was correct, I felt it is a good idea to provide more background information.

I had the chance to give some database courses at universities in the past. One thing I learned there is that students (they being university students or professionals doesn't matter in this respect) first have to learn basic concepts. Later you can use those basic concepts to explain other things. For example, once someone understands sub-selects, he/she will grasp the idea of having-clauses right away if you can show such a reformulated query. And that was my intention here as well. (I don't know where this was "patronizing".)

Also, I believe it is essential that people working with an RDBMS understand what is going on internally in order to avoid bad queries or to know what can be done for efficiently and what may not be such a great idea. For example, just look at the comments here: http://us2.php.net/odbc_num_rows All those solutions show a distinct lack of understanding of relational database systems. What's my point? It is that a good idea would be that people implement their own simple DBMS because it helps tremendously to understand how a DBMS works internally or how queries can be rephrased. A simple approach to deal with HAVING is to internally rewrite a query to the construct I posted - without loosing any functionality. (At the end of the day, the HAVING clause is very helpful but it just remains a bit syntactic sugar in SQL.) However, I'm fully aware that not everyone has the time or access to the necessary expertise to implement their own small DBMS...

I suggest that you read a few more threads in this forum and other newsgroup. You will find that questions range from very basic stuff on transactions, the relational model, etc. to the meaning of specific options or error messages and their possible causes. I found that additional explanations are a good way to answer question to the extent that the poster knows how to proceed.|||I just gave an explanation on how HAVING works. That's all...

A very handy explanation... I knew how HAVING works, and I'm sure if someone had asked me I could have shown how it's non-primitive, but I never actually broke it down like that.

I have no idea why you are reacting so aggressively.

Even if you could figure out why people say what they do, who cares? It's just words.

Eliminating Duplicate Records

Hi

How can i eliminate duplicate records from a single table, when i use a query that links multipla tables together?

There is a surrogate key TABLE1 and i just can't seem to wrap my head around the layout.

TABLE1

ID | memQuestionaireID | QuestscoreID | DateOfAssessment | etc.

TABLE2
memQuestionaireID | MemberID | Answer1 | Answer2 | etc.

TABLE3

MemberID | Name | Surname | IDNo | etc.

t1.memQuestionaireID = t2.memQuestionaireID

t2.MemberID = t3.MemberID

That's how the tables link, obvious enough. How can i delete duplicate records from TABLE1 where MemberID occurs more than once and the t1.DateOfAssessment = MAX(DateOfAssessment)

The ID is unique, but there are duplicate memberID's with different date of assessments, i want to take the last used date of assessment and keep only that one, and remove the rest from the table?

I'm sure the query is simple, i'm just too stupid to see how to solve it!!

Any help would be greatly appreciated.

I'm not very clued up on how to remove records or update records, Please help.

Kind Regards
Carel Greaves

See if this does what you need...

Code Snippet

DELETE t1

FROM Table1 t1

INNER JOIN Table2

on t1.memQuestionaireID = t2.memQuestionaireID

INNER JOIN

(

SELECT t2b.MemberID, max(t1b.DateOfAssessment) as DateOfAssessment

FROM table1 t1b

INNER JOIN table2 t2b

ON t1b.memQuestionaireID = t2b.memQuestionaireID

AND t2b.MemberID in

(

SELECT MemberID

FROM Table3

GROUP BY MemberID

HAVING count(*)>1

)

GROUP BY t2b.MemberID

) as list

ON t1.DateOfAssessment < list.DateOfAssessment

and t2.MemberID = list.MemberID

To view what will be delete w/o actually deleting, just change the first line from DELETE t1 to SELECT t1.*

|||

I'm a bit confused Carel,

Call me dense today, but it seems like the MemberID 'should' be in Table1 since it is the Member that has a memQuestionaireID, has a QuestScoreID and a DateOfAssessment. Then for Table2, it is the Questionaire that has a set of answers, etc. Please help me out here...

It seems like your Table1 has a one-to-many relationship with Table2. I'm assuming that a Member may have multiple instances of the same questionnaire (re-takes, etc.)

Is one questionnaire (MemQuestionnaireID) with a QuestScoreID and a DateOfAssessment directly linked to one or more set(s) of answers in Table2?

Is the QuestScoreID changed upon each re-take? (Shouldn't the score be directly related to a set of answers?)

I ask this because it seems from the design you presented that perhaps you want to also delete rows from Table2...

|||

I know the database structure is a bit stupid. Not my design!

I am only trying to remove duplicate entries in the QuestScore table.

Problem:

A member fills in a Questionaire - MemQuestionaires

The Querstionaire's Score is placed in the QuestScore Table.

There are duplicate entries in the QuestScore table with regards to a questionaireID from the memQuestionaires Table

If a duplicate exists, then the members can't log into there profiles anymore, so i need to remove the duplicates so that the members can access their profiles again.

I'm quite new to all of this, so any help would be greatly appreciated

Kind Regards

Carel Greaves.

|||

You gave us Table1, Table2, Table3...I'm guessing Table1 is QuestScore?

Did my code above work? If not, what was not quite right?

Are there multiple entries just in Table1, or is it Table3?

Please clarify and maybe post some sample data...

|||

Okay, so i'm a bito f a moron :-) he he

Here is the code i used to find the duplicates

SELECT me.MemberID, /*me.QuestionaireID,*/ q.memQuestionaireID, q.*, me.*

FROM questscore q, memQuestionaires m, Members me

WHERE q.memquestionaireID = m.memquestionaireID

AND m.MemberiD = me.memberid

AND me.ClientID = 124

ANd m.questionaireid = 85

AND q.sectionid >= 1000000

ORDER BY me.memberID

Here is my results:

MemberID memQuestionaireID QuestScoreID AnswerID AnswerValue DateCompleted 10005772 7615 273308 0 -500 50:59.7 10005789 11124 267771 0 -500 25:16.0 10005798 11129 259150 0 -500 44:41.1 10005798 11129 261032 0 -500 01:21.3 10005807 8214 237464 0 -500 59:18.1 10005819 11142 293317 0 -500 43:09.4

i have marked the duplicate values in red that i want to eliminate.

I need to eliminate only the record with the oldest date, and keep the newer one.

Hopefully when i've gone for my SQL course then i will only be half a moron :-) he he

Thanks for helping me out guys.

Kind Regards

Carel Greaves

|||

Wow, talk about a 'bolixed' up database...

I'm still confused, in your query above, you have a WHERE clause that indicates the results should be filtered down to QuestionaireID = 85, but your duplicate records are for memQuestionaireID 11129. What is the difference between QuestionaireID and memQuestionaireID? How did that happen?

Also, the DateCompleted is obviously not a date, but a number of hours. Is that the number of hours since some event? How are the hours determined?

Which row is the 'newer one'?

It would be very helpful, and keep us from wasting our time, if you would post the DDL for each of the tables, as well as some sample data for each table (in the form of INSERT statements.)

As you seen, you have folks sincerely attempting to help you, and to this point wasting our time because we just don't have the 'full' picture. You've inherited a very 'odd' database, and you need to help us so we can better help you.

|||

Here is sample data in the Tables.

Member Table

med_aid MemberID ClientID Name Surname Username Password NULL 10000000 1 NiftyName NiftySurname his test NULL 10000001 5 NiftyTest2 Surname2 nifty@.ybo.oa JBCYWO NULL 10000002 5 KeithTest WilsonTest willi willi

memQuestionaires Table

memQuestionaireID MemberID QuestionaireID LastPage Paused Complete MaxPages Sent LastSaved InitMail ReceiveMail UpdateDate 871 10000000 85 1 0 0 1 1 0 1 1 2007/02/2506:23 872 10000001 85 1 0 0 1 1 0 1 1 2007/02/2506:23 873 10000002 85 1 0 0 1 1 0 1 1 2007/02/2506:23

QuestScore Table

QuestScoreID memQuestionaireID AnswerID AnswerValue SectionID Page QuestType AnswerText AnswerShort Updatedate DefaultValue DateCompleted 4641 871 0 -9 361 1 9 NULL NULL

2007/02/0722:31

4642 872 0 -5 362 1 6 NULL NULL

2007/02/0722:31

4643 873 0 0 345 1 2 NULL NULL

2007/02/0722:31

When i executed this statement, i only used these three tables.

SELECT me.MemberID, /*me.QuestionaireID,*/ q.memQuestionaireID, q.*, me.*
FROM questscore q, memQuestionaires m, Members me
WHERE q.memquestionaireID = m.memquestionaireID
AND m.MemberiD = me.memberid
AND me.ClientID = 124
ANd m.questionaireid = 85
AND q.sectionid >= 1000000
ORDER BY me.memberID

I am only trying to remove the duplicate records for where ClientID = 124

AND QuestionaireID = 85

as for the 1000000, there it is just to narrow down the search for myself, there are a lot of duplicates.

There are a lot of members -> Members Table

Each member can fill in a number of questionaires -> QuestionaireID on memQuestionaires Table in this case 85

And all the questionaire's data is stored in the QuestScore Table which is linked to the memQuestionaires table using the memQuestionairesID.

There seems to be duplicate values in the QuestScore Table referring to the members.

The date completed is an actual DATETIME field in the database, i used excel to transfer the data to here. The newer date will be a actual date with the time in the database i.e. 2007/05/23 02:05:11

All the fields get inserted into the database via a application which i haven't seen.

I'm new to the company and am trying to figure a lot of it out for myself too.

If there is any more information you need, please don't hesitate to ask.

|||try this one,

DECLARE @.QuestScore TABLE (
QuestScoreID int
, memQuestionaireID int
, DateCompleted smalldatetime
)

DECLARE @.MemQuestionaire TABLE (
MemQuestionaireID int
, MemberID int
, QuestionaireID int
, UpdateDate smalldatetime
)

INSERT
INTO @.QuestScore
SELECT 1, 1, dateadd(day,-1, getdate()) UNION ALL
SELECT 2, 2, dateadd(day,-1, getdate()) UNION ALL
SELECT 3, 3, dateadd(day,-3, getdate())

INSERT
INTO @.MemQuestionaire
SELECT 1, 1, 4, dateadd(day,-1, getdate()) UNION ALL
SELECT 2, 1, 4, dateadd(day,-1, getdate()) UNION ALL
SELECT 3, 2, 5, dateadd(day,-3, getdate())

select *
from @.memquestionaire

select *
from @.questscore

declare @.tobedeleted table( -- create a table var for the data to be deleted
memquestionaireid int
, datecompleted smalldatetime
)

insert
into @.tobedeleted -- the memquestionaireid and datecompleted to be deleted
select MIN(qq.memquestionaireid) as memquestionaireid -- just in case if they have the same date, get the lowest id
, mq.datecompleted
from @.questscore qq inner join
(
select m.memberid
, m.questionaireid
, MIN(q.datecompleted) as datecompleted -- delete the lowest date
from @.memquestionaire m inner join
@.questscore q on m.memquestionaireid = q.memquestionaireid
group by
m.memberid
, m.questionaireid
having count(m.memberid) > 1 -- member having more than 1 quest score
) mq on qq.datecompleted = mq.datecompleted
group by
mq.datecompleted

delete @.questscore -- delete in questscore
from @.questscore q inner join
@.tobedeleted dq on q.memquestionaireid = dq.memquestionaireid
and q.datecompleted = dq.datecompleted

delete @.memquestionaire -- delete in memquestionaire
from @.memquestionaire m inner join
@.tobedeleted dq on m.memquestionaireid = dq.memquestionaireid

select *
from @.questscore

select *
from @.memquestionaire|||

Thanks, that's awesome, that's why i come to you guys for help.

Another quick question, that i think it might be easier to do, i just can't do it. (I just thought of it now)

If i added memQuestionaireID and SectionID together to create a unique field, and then filter out the duplicates by taking out the old dates within the duplicates it might be easier, but how would i go about accomplishing this?

I don't know how to compare the dates against itself to get keep the new date and get rid of the old date?

sorry for all of this i'm just trying to learn from all the things myself?

|||

Thanks, that was a start.

Here is an example of the 'best' form for DDL and sample data. In this fashion, each person that wants to try and help you can just run this code and have 'your' problem to work with. Without this, folks are turned away because it takes so much time and effort to duplicate your situation -and helps to keep us from running off on tangents that don't really help you.

Now if you would only add to the sample data so that examples of the duplicate records you want to find and delete are represented, we 'should' be able to help you.

Code Snippet


DECLARE @.Members table
( med_aid int,
MemberID int,
ClientID int,
Name varchar(20),
Surname varchar(20),
Username varchar(100),
Password varchar(20)
)


INSERT INTO @.Members VALUES ( NULL, 10000000, 1, 'NiftyName', 'NiftySurname', 'his', 'test' )
INSERT INTO @.Members VALUES ( NULL, 10000001, 5, 'NiftyTest2', 'Surname2', 'nifty@.ybo.oa', 'JBCYWO' )
INSERT INTO @.Members VALUES ( NULL, 10000002, 5, 'KeithTest', 'WilsonTest', 'willi', 'willi' )


DECLARE @.memQuestionaires table
( memQuestionaireID int,
MemberID int,
QuestionaireID int,
LastPage int,
Paused int,
Complete int,
MaxPages int,
Sent int,
LastSaved int,
InitMail int,
ReceiveMail int,
UpdateDate datetime
)


INSERT INTO @.memQuestionaires VALUES ( 871, 10000000, 85, 1, 0, 0, 1, 1, 0, 1, 1, '2007/02/25 06:23' )
INSERT INTO @.memQuestionaires VALUES ( 872, 10000001, 85, 1, 0, 0, 1, 1, 0, 1, 1, '2007/02/25 06:23' )
INSERT INTO @.memQuestionaires VALUES ( 873, 10000002, 85, 1, 0, 0, 1, 1, 0, 1, 1, '2007/02/25 06:23' )


DECLARE @.QuestScore table
( QuestScoreID int,
memQuestionaireID int,
AnswerID int,
AnswerValue int,
SectionID int,
Page int,
QuestType int,
AnswerText int,
AnswerShort int,
Updatedate int,
DefaultValue int,
DateCompleted datetime
)


INSERT INTO @.QuestScore VALUES ( 4641, 871, 0, -9, 361, 1, 9, '', '', NULL, NULL, '2007/02/07 22:31' )
INSERT INTO @.QuestScore VALUES ( 4642, 872, 0, -5, 362, 1, 6, '', '', NULL, NULL, '2007/02/07 22:31' )
INSERT INTO @.QuestScore VALUES ( 4643, 873, 0, 0, 345, 1, 2, '', '', NULL, NULL, '2007/02/07 22:31' )

|||very clean example arnie.. i wonder how your desk looks like |||

Sorry i'm sending you guys on a wild goose chase like this, and thanks a lot Arnie, i'd buy you a case of beer if i knew where you were.

I'm really learning a lot from this!

I just thought that it might be easier, if i had to add the memQuestionaireID and SectionID together in the QuestScore table to get a single unique value and remove the duplicates based on the newer unique value according to the Date, i.e. Only removing the old dates.

Sorry about this, i only saw it now.

That way i'm only using a single table.

|||

Thanks,

And my desk is under the stuff somewhere, or so it was a few months ago...

And UPS serves Portland, OR! But it would be much more enjoyable to share, so if you can wrangle it, start lobbying management to let you come to the PASS (Professional Association for SQL Server) Conference in September. (http://sqlpass.org/)

|||

Cummon guys, i'm a newbie at SQL Server in South Africa and it's late at night and my CEO's say i can't go home until i get this problem sorted out.

Please could you guys help me out quick?

Eliminating (x row(s) affected) message

I am sure this has been asked before. But, in searching for this message I
get a number of returns.
How do I eliminate the (x row(s) affected) message from queries in a stored
procedure?
Thank you.Kevin wrote on Fri, 2 Jun 2006 07:39:02 -0700:

> I am sure this has been asked before. But, in searching for this message I
> get a number of returns.
> How do I eliminate the (x row(s) affected) message from queries in a
> stored procedure?
> Thank you.
Use SET NOCOUNT ON at the start of the stored proc.
Dan|||Try:
SET NOCOUNT ON
... at the beginning of each proc.
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Toronto, ON Canada
.
"Kevin Burton" <KevinBurton@.discussions.microsoft.com> wrote in message
news:741BD8D2-F958-471B-95FD-F694ACFDD0AD@.microsoft.com...
I am sure this has been asked before. But, in searching for this message I
get a number of returns.
How do I eliminate the (x row(s) affected) message from queries in a stored
procedure?
Thank you.

Eliminating (x row(s) affected) message

I am sure this has been asked before. But, in searching for this message I
get a number of returns.
How do I eliminate the (x row(s) affected) message from queries in a stored
procedure?
Thank you.Kevin wrote on Fri, 2 Jun 2006 07:39:02 -0700:
> I am sure this has been asked before. But, in searching for this message I
> get a number of returns.
> How do I eliminate the (x row(s) affected) message from queries in a
> stored procedure?
> Thank you.
Use SET NOCOUNT ON at the start of the stored proc.
Dan|||Try:
SET NOCOUNT ON
... at the beginning of each proc.
--
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Toronto, ON Canada
.
"Kevin Burton" <KevinBurton@.discussions.microsoft.com> wrote in message
news:741BD8D2-F958-471B-95FD-F694ACFDD0AD@.microsoft.com...
I am sure this has been asked before. But, in searching for this message I
get a number of returns.
How do I eliminate the (x row(s) affected) message from queries in a stored
procedure?
Thank you.

Eliminating "Nan" in reports

I'd like to eliminate "Nan" in my reports and I've tried various versions of
IIF, but keep getting an error message.
This is what I want/have:
= Sum(Fields!SubWin.Value, "WinRate1_All")+ Sum(Fields!PrimeWin.Value,
"WinRate1_All")/ Sum(Fields!SubWin.Value, "WinRate1_All")+
Sum(Fields!PrimeWin.Value, "WinRate1_All")+ Sum(Fields!SubLoss.Value,
"WinRate1_All")+ Sum(Fields!PrimeLoss.Value, "WinRate1_All")
Thanks for your help!You are probably dividing by zero. Try something like
=IIF(Sum(Fields!SubWin.Value, "WinRate1_All")+
Sum(Fields!PrimeWin.Value, "WinRate1_All")+ Sum(Fields!SubLoss.Value,
"WinRate1_All")+ Sum(Fields!PrimeLoss.Value, "WinRate1_All") <> 0,
Sum(Fields!SubWin.Value, "WinRate1_All")+ Sum(Fields!PrimeWin.Value,
"WinRate1_All")/ Sum(Fields!SubWin.Value, "WinRate1_All")+
Sum(Fields!PrimeWin.Value, "WinRate1_All")+ Sum(Fields!SubLoss.Value,
"WinRate1_All")+ Sum(Fields!PrimeLoss.Value, "WinRate1_All"),0)
This way you are only performing the division if the denominator is not
zero. If the denominator is zero, then return a zero.|||That worked perfectly!
Thank you so very much for your response!
"dba56" wrote:
> You are probably dividing by zero. Try something like
> =IIF(Sum(Fields!SubWin.Value, "WinRate1_All")+
> Sum(Fields!PrimeWin.Value, "WinRate1_All")+ Sum(Fields!SubLoss.Value,
> "WinRate1_All")+ Sum(Fields!PrimeLoss.Value, "WinRate1_All") <> 0,
> Sum(Fields!SubWin.Value, "WinRate1_All")+ Sum(Fields!PrimeWin.Value,
> "WinRate1_All")/ Sum(Fields!SubWin.Value, "WinRate1_All")+
> Sum(Fields!PrimeWin.Value, "WinRate1_All")+ Sum(Fields!SubLoss.Value,
> "WinRate1_All")+ Sum(Fields!PrimeLoss.Value, "WinRate1_All"),0)
> This way you are only performing the division if the denominator is not
> zero. If the denominator is zero, then return a zero.
>|||That was perfect!
Thank you very much for your response!
"dba56" wrote:
> You are probably dividing by zero. Try something like
> =IIF(Sum(Fields!SubWin.Value, "WinRate1_All")+
> Sum(Fields!PrimeWin.Value, "WinRate1_All")+ Sum(Fields!SubLoss.Value,
> "WinRate1_All")+ Sum(Fields!PrimeLoss.Value, "WinRate1_All") <> 0,
> Sum(Fields!SubWin.Value, "WinRate1_All")+ Sum(Fields!PrimeWin.Value,
> "WinRate1_All")/ Sum(Fields!SubWin.Value, "WinRate1_All")+
> Sum(Fields!PrimeWin.Value, "WinRate1_All")+ Sum(Fields!SubLoss.Value,
> "WinRate1_All")+ Sum(Fields!PrimeLoss.Value, "WinRate1_All"),0)
> This way you are only performing the division if the denominator is not
> zero. If the denominator is zero, then return a zero.
>

Eliminate top and bottom percent of records

I have a temp table that is being populated and below is a representation of the data.Basically, for each mile_code and rep combination there will be ‘X’ number of records.(In the example below there are 30 records.)What I need to do is eliminate the top and bottom 10% (This could be any percentage and I want to be able to change that easily if I have to.) of DeltaT for each mile_code and rep combination and then average the remaining values in DeltaT for each mile_code and rep combination.

I’ve tried a bunch of different approaches and none give me what I’m looking for.Any ideas or solutions?

mile_code DeltaT rep close_date

Approach 1 Coleman 6/5/2007 16:52

Approach 1 COLEMAN 6/5/2007 14:32

Approach 1 COLEMAN 6/5/2007 9:26

Approach 21 COLEMAN 5/31/2007 14:07

Approach 14 COLEMAN 5/31/2007 9:53

Approach 7 Coleman 5/30/2007 16:12

Approach 36 COLEMAN 5/29/2007 15:53

Approach 8 COLEMAN 5/29/2007 13:54

Approach 8 COLEMAN 5/29/2007 13:23

Approach 8 COLEMAN 5/29/2007 9:00

… (20 more records)

Interview 1 Coleman 6/5/2007 16:52

Interview 1 COLEMAN 6/5/2007 14:32

Interview 1 COLEMAN 6/5/2007 9:26

Interview 8 COLEMAN 5/29/2007 13:54

Interview 8 COLEMAN 5/29/2007 9:00

Interview 14 COLEMAN 5/25/2007 8:37

Interview 61 COLEMAN 5/24/2007 10:57

Interview 14 COLEMAN 5/23/2007 9:32

Interview 26 COLEMAN 5/18/2007 9:39

Interview 23 Coleman 5/14/2007 13:25

… (20 more records)

Followed by 30 each of the other 3 milecodes for Coleman

Approach 0 DOLAN 6/6/2007 9:27

Approach 1 DOLAN 6/5/2007 13:39

Approach 1 DOLAN 6/5/2007 12:20

Approach 1 DOLAN 6/5/2007 11:37

Approach 92 DOLAN 6/5/2007 11:12

Approach 91 DOLAN 6/5/2007 9:44

Approach 2 Dolan 6/4/2007 11:53

Approach 2 DOLAN 6/4/2007 11:11

Approach 5 Dolan 6/1/2007 16:07

Approach 177 DOLAN 6/1/2007 10:58

… (20 more records)

Interview 96 DOLAN 6/5/2007 16:38

Interview 167 DOLAN 6/5/2007 14:24

Interview 2 Dolan 6/4/2007 14:54

Interview 2 DOLAN 6/4/2007 11:11

Interview 5 Dolan 6/1/2007 16:07

Interview 27 DOLAN 6/1/2007 14:48

Interview 323 DOLAN 6/1/2007 13:49

Interview 15 Dolan 6/1/2007 11:48

Interview 224 DOLAN 5/31/2007 16:40

Interview 126 Dolan 5/30/2007 15:02

… (20 more records)

Here is one way.

Code Snippet

select *

from tb t1

where t1.deltat in (select top 90 percent t2.deltat from tb t2 where t2.mile_code=t1.mile_code order by t2.deltat asc)

and t1.deltat in (select top 90 percent t3.deltat from tb t3 where t3.mile_code=t1.mile_code order by t3.deltat desc)

|||

What should be the criteria to eliminate the top and bottom 10 % of [DeltaT], other than [mile_code] and [rep]?

Should we sort by [mile_code], [rep] and [DeltaT] and then eliminate the top and bottom 10%?

AMB

|||I thought of this on the way to work this morning and realized the way I described what I wanted was incorrect. If the data is sorted by DeltaT for each mile_code and rep combination then in this example (30 records for each group and 10% gone from top to bottom) the first 3 records and last 3 records should be removed. So it doesn't matter what the values of DeltaT are. If there were fractions caused by the number of records chosen and/or the percentage then I'd want it to just round to the nearest whole number.|||

Try:

Code Snippet

createtable #t (

mile_code varchar(25)notnull,

rep varchar(25)notnull,

deltat intnotnull

)

go

declare @.i int, @.j int

declare @.mile_code varchar(25)

declare @.rep varchar(25)

set @.j = 1

while @.j <= 2

begin

set @.rep =

case

when @.j = 1 then'COLEMAN'

when @.j = 2 then'DOLAN '

end

set @.i = 1

while @.i <= 60

begin

set @.mile_code =

case

when @.i between 1 and 30 then'Approach'

when @.i between 31 and 60 then'Interview'

end

insertinto #t values(@.mile_code, @.rep,(abs(checksum(newid()))% 100)+ 1)

set @.i = @.i + 1

end

set @.j = @.j + 1

end

go

declare @.percent int

set @.percent = 10

;with cte

as

(

select

mile_code,

rep,

deltat,

((row_number()over(partitionby mile_code, rep orderby deltat))* 100.00)/(count(*)over(partitionby mile_code, rep))as pct

from

#t

)

select

mile_code,

rep,

avg(deltat)as avg_deltat

from

cte

where

pct between @.percent and (100.00 - @.percent)

groupby

mile_code,

rep

orderby

mile_code,

rep

go

droptable #t

go

AMB

|||

I am not perfectly clear about what you are trying to do, but here is a solution that removes the top 10 percent using a CTE and a couple of aggregates. Note this will only work in 2005...

;with rowCounts as (

select row_number() over (partition by mile_code,rep order by closeDate) as rowNum, --gives a row number per group, ordered by close date
count(*) over (partition by mile_code,rep) as totalRows, --gets total rows by mile_code and rep. This is how I do the work later
mile_code,rep, deltaT, closeDate
from #t) --this is the same as his #t, plus the close date. I will repeat it after this explanation

select mile_code,rep, deltaT, closeDate, rowNum, totalRows
from rowCounts
where rowNum > totalRows * (@.percent / 100.0) --removes the bottom 10 percent
and rowNum < totalRows - (totalRows * (@.percent / 100.0)) -- removes the top

The whole code (thanks to hunchback):

set nocount on
create table #t (
mile_code varchar(25) not null,
rep varchar(25) not null,
deltat int not null,
closeDate datetime
)
go
declare @.i int, @.j int
declare @.mile_code varchar(25)
declare @.rep varchar(25)
declare @.closeDate datetime
set @.j = 1
while @.j <= 2
begin
set @.rep =
case
when @.j = 1 then 'COLEMAN'
when @.j = 2 then 'DOLAN '
end
set @.i = 1

while @.i <= 60
begin
set @.mile_code =
case
when @.i between 1 and 22 then 'Approach'
when @.i between 23 and 60 then 'Interview'
end
insert into #t values(@.mile_code, @.rep, (abs(checksum(newid())) % 100) + 1,
dateadd(day,(abs(checksum(newid())) % 100) + 1,'20070101'))
set @.i = @.i + 1
end
set @.j = @.j + 1
end

go

declare @.percent int
set @.percent = 10
;with rowCounts as (
select row_number() over (partition by mile_code,rep order by closeDate) as rowNum,
count(*) over (partition by mile_code,rep) as totalRows,
mile_code,rep, deltaT, closeDate
from #t)
select mile_code,rep, deltaT, closeDate, rowNum, totalRows
from rowCounts
where rowNum > totalRows * (@.percent / 100.0)
and rowNum < totalRows - (totalRows * (@.percent / 100.0))

drop table #t

go

|||here's another alternative,

again, thanks to hunchback for the test data

create table #t (
mile_code varchar(25) not null,
rep varchar(25) not null,
deltat int not null
)
go

declare @.i int, @.j int
declare @.mile_code varchar(25)
declare @.rep varchar(25)
set @.j = 1

while @.j <= 2
begin
set @.rep =
case
when @.j = 1 then 'COLEMAN'
when @.j = 2 then 'DOLAN '
end

set @.i = 1

while @.i <= 60
begin
set @.mile_code =
case
when @.i between 1 and 30 then 'Approach'
when @.i between 31 and 60 then 'Interview'
end

insert into #t values(@.mile_code, @.rep, (abs(checksum(newid())) % 100) + 1)
set @.i = @.i + 1
end

set @.j = @.j + 1
end
go

SELECT IDENTITY(int,1,1) AS row_number
, *
INTO #a
FROM #t
ORDER BY
mile_code
, rep
, deltat

DECLARE @.percnt int
SET @.percnt = 10

SELECT a.mile_code
, a.rep
, a.deltat
, row_number - minrow AS grp_row_number
, maxrow * (@.percnt/100.0) as countpergrp
FROM #a a INNER JOIN
(
SELECT mile_code
, rep
, MIN(row_number) - 1 AS minrow
, COUNT(*) AS maxrow
FROM #a
GROUP BY
mile_code
, rep
) b on a.mile_code = b.mile_code
and a.rep = b.rep
WHERE row_number - minrow between (maxrow * (@.percnt/100.0)) and (maxrow - (maxrow * (@.percnt/100.0)))

DROP TABLE #t
DROP TABLE #a
GO|||

Thanks everyone. Still have a little more work to do to get the final result before putting it into producton (Have to add some other calculations but will do that by joining in another temp table or something.) but all of your input helped a lot. Here is what I ended up with thanks to your help:

DECLARE @.percent int,

@.recentindex int

SET @.percent = 10

SET @.recentindex = 45

Then crate temp table #miledata to get the last 'x' number of records for each rep in each mile_code that they have completed. Do this based off of the @.recentindex variable.

Then use either of the following to get results. Haven't decided which I'll put into production yet.

OPTION A

/*This creates a common table expression to run our aggregate query against.*/

;WITH cte_AveRecords AS

(

SELECT row_number() OVER(partition BY mile_code, rep ORDER BY deltaT) AS rowNum, --gives a row number per group, ordered by deltaT

COUNT(*) OVER (partition BY mile_code, rep) AS totalRows, --gets total rows by mile_code and rep.

mile_code,

rep,

deltaT,

recentindex

FROM#miledata

)

/*This deterimines the average per mile_code and rep by selecting the rows that fall above the low percent and less than or equal to the upper percent.*/

SELECT c.mile_code,

c.rep,

CAST(AVG(CAST(a.deltaT AS Decimal(5,1))) AS Decimal(5,1)) AS AvgTime

FROM

(

SELECT mile_code, rep, deltaT, recentindex, rowNum, totalRows

FROM cte_AveRecords

WHERE rowNum > totalRows * (@.percent / 100.0) --removes the bottom 'x' percent

AND rowNum < totalRows - (totalRows * (@.percent / 100.0)) -- removes the top

)a

JOIN cte_AveRecords c ON a.rep = c.rep AND a.mile_code = c.mile_code

GROUP BY c.mile_code,

c.rep

ORDER BY c.rep,

c.mile_code

OPTION B

/*This creates a common table expression to run our aggregate query against.*/

;WITH cte_AveRecords AS

(

SELECT

mile_code,

rep,

deltat,

/*This assigns a percentage to each row number in the mile_code and rep group.See explaination A below*/

((row_number() over(partition by mile_code, rep order by deltat)) * 100.00) / (count(*) over(partition by mile_code, rep)) as pct

FROM #miledata

)

/*This deterimines the average per mile_code and rep based on what we populated the CTE with..*/

SELECT mile_code,

rep,

CAST(AVG(CAST(deltat AS Decimal(5,1))) AS Decimal(5,1)) AS avg_deltat

FROM cte_AveRecords

WHERE pct > @.percent and pct < (100.00 - @.percent)

GROUP BY mile_code,

rep

ORDER BY rep,

mile_code

GO

eliminate the Return charactor in a column

Hi,
We have a column defined as ntext for a column and we need to transfer this
column to oralce platform. Before transferring, we convert the data to
varchar. Some of data content "RETURN" charactor, and it caused issue for
providing report or printing.
Please suggest a way to eliminate the "RETURN" charactor in this column and
we can keep all the data in the nice order. Thanks.
Best Regards,
Lynnhttp://www.aspfaq.com/2445
"Lynn" <Lynn@.discussions.microsoft.com> wrote in message
news:3F6073C5-BF70-4DFA-BAE0-DD4EFA30A8AA@.microsoft.com...
> Hi,
> We have a column defined as ntext for a column and we need to transfer
> this
> column to oralce platform. Before transferring, we convert the data to
> varchar. Some of data content "RETURN" charactor, and it caused issue for
> providing report or printing.
> Please suggest a way to eliminate the "RETURN" charactor in this column
> and
> we can keep all the data in the nice order. Thanks.
> --
> Best Regards,
> Lynn

Friday, February 24, 2012

Eliminate Rows with Redundant Columns

I would like my query to return the KeyID from row 4 but from only one of
the first 3 rows where the address data is redundant. Since use of the key
eliminates the DISTINCT operator, is there another method?
DECLARE @.tAddress TABLE (KeyID int, Address varchar(10), City varchar(10),
ST varchar(2))
INSERT @.tAddress
SELECT 1, '100 Main', 'Boston', 'MA'
UNION ALL
SELECT 2, '100 Main', 'Boston', 'MA'
UNION ALL
SELECT 3, '100 Main', 'Boston', 'MA'
UNION ALL
SELECT 4, '200 Main', 'Boston', 'MA'
Thanks!SELECT Address, City, ST, MIN(KeyID)
FROM @.tAddress
GROUP BY Address, City, ST
Or, if you really only wanted the ID column:
SELECT MIN(KeyID)
FROM @.tAddress
GROUP BY Address, City, ST
Roy Harvey
Beacon Falls, CT
On Wed, 14 Jun 2006 09:47:32 -0700, "Mike Harbinger"
<MikeH@.Cybervillage.net> wrote:

>I would like my query to return the KeyID from row 4 but from only one of
>the first 3 rows where the address data is redundant. Since use of the key
>eliminates the DISTINCT operator, is there another method?
>DECLARE @.tAddress TABLE (KeyID int, Address varchar(10), City varchar(10),
>ST varchar(2))
>INSERT @.tAddress
>SELECT 1, '100 Main', 'Boston', 'MA'
>UNION ALL
>SELECT 2, '100 Main', 'Boston', 'MA'
>UNION ALL
>SELECT 3, '100 Main', 'Boston', 'MA'
>UNION ALL
>SELECT 4, '200 Main', 'Boston', 'MA'
>Thanks!
>|||Simple and elegant; I should have seen that. Many thanks Roy!
"Roy Harvey" <roy_harvey@.snet.net> wrote in message
news:csf092h8em6sabqjre5jf7jsbmrhroovtc@.
4ax.com...
> SELECT Address, City, ST, MIN(KeyID)
> FROM @.tAddress
> GROUP BY Address, City, ST
> Or, if you really only wanted the ID column:
> SELECT MIN(KeyID)
> FROM @.tAddress
> GROUP BY Address, City, ST
> Roy Harvey
> Beacon Falls, CT
> On Wed, 14 Jun 2006 09:47:32 -0700, "Mike Harbinger"
> <MikeH@.Cybervillage.net> wrote:
>

Eliminate records

Hello,
The output of the following query returns two identical
records, i need to eliminate all records that are
identical but if i use the "distinct" before the substring
function its returned one error.
This is the query that i'm using:
select EvClassDesc,
STime,
substring(Textdata,patindex('%exec%',Tex
tdata),217)
as [TextData],
objid
into DestinationT
from OriginalT
where objid = 1111111 and EvClassDesc = 'SP:Star'
Thanks,
Best regardsDid you try using:
Select DISTINCT Derived.* into DestinationT FROM
( select EvClassDesc,
STime,
substring(Textdata,patindex('%exec%',Tex
tdata),217) as [TextData],
objid
from OriginalT
where objid = 1111111 and EvClassDesc = 'SP:Star') Derived
... (untested)
WIll this help?
HTH,
Vinod Kumar
MCSE, DBA, MCAD, MCSD
http://www.extremeexperts.com
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
"CC&JM" <anonymous@.discussions.microsoft.com> wrote in message
news:17fe001c44a19$f48dfb80$a301280a@.phx
.gbl...
> Hello,
> The output of the following query returns two identical
> records, i need to eliminate all records that are
> identical but if i use the "distinct" before the substring
> function its returned one error.
> This is the query that i'm using:
> select EvClassDesc,
> STime,
> substring(Textdata,patindex('%exec%',Tex
tdata),217)
> as [TextData],
> objid
> into DestinationT
> from OriginalT
> where objid = 1111111 and EvClassDesc = 'SP:Star'
> Thanks,
> Best regards|||Thanks Vinod

>--Original Message--
>Did you try using:
>Select DISTINCT Derived.* into DestinationT FROM
>( select EvClassDesc,
> STime,
> substring(Textdata,patindex('%
exec%',Textdata),217) as [TextData],
> objid
> from OriginalT
> where objid = 1111111 and EvClassDesc = 'SP:Star')
Derived
>... (untested)
>WIll this help?
>--
>HTH,
>Vinod Kumar
>MCSE, DBA, MCAD, MCSD
>http://www.extremeexperts.com
>Books Online for SQL Server SP3 at
>http://www.microsoft.com/sql/techin...ctdoc/2000/book
s.asp
>
>"CC&JM" <anonymous@.discussions.microsoft.com> wrote in
message
> news:17fe001c44a19$f48dfb80$a301280a@.phx
.gbl...
substring[vbcol=seagreen]
exec%',Textdata),217)[vbcol=seagreen]
>
>.
>

Eliminate records

Hello,
The output of the following query returns two identical
records, i need to eliminate all records that are
identical but if i use the "distinct" before the substring
function its returned one error.
This is the query that i'm using:
select EvClassDesc,
STime,
substring(Textdata,patindex('%exec%',Textdata),217 )
as [TextData],
objid
into DestinationT
from OriginalT
where objid = 1111111 and EvClassDesc = 'SP:Star'
Thanks,
Best regards
Did you try using:
Select DISTINCT Derived.* into DestinationT FROM
( select EvClassDesc,
STime,
substring(Textdata,patindex('%exec%',Textdata),217 ) as [TextData],
objid
from OriginalT
where objid = 1111111 and EvClassDesc = 'SP:Star') Derived
... (untested)
WIll this help?
HTH,
Vinod Kumar
MCSE, DBA, MCAD, MCSD
http://www.extremeexperts.com
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
"CC&JM" <anonymous@.discussions.microsoft.com> wrote in message
news:17fe001c44a19$f48dfb80$a301280a@.phx.gbl...
> Hello,
> The output of the following query returns two identical
> records, i need to eliminate all records that are
> identical but if i use the "distinct" before the substring
> function its returned one error.
> This is the query that i'm using:
> select EvClassDesc,
> STime,
> substring(Textdata,patindex('%exec%',Textdata),217 )
> as [TextData],
> objid
> into DestinationT
> from OriginalT
> where objid = 1111111 and EvClassDesc = 'SP:Star'
> Thanks,
> Best regards
|||Thanks Vinod

>--Original Message--
>Did you try using:
>Select DISTINCT Derived.* into DestinationT FROM
>( select EvClassDesc,
> STime,
> substring(Textdata,patindex('%
exec%',Textdata),217) as [TextData],
> objid
> from OriginalT
> where objid = 1111111 and EvClassDesc = 'SP:Star')
Derived
>... (untested)
>WIll this help?
>--
>HTH,
>Vinod Kumar
>MCSE, DBA, MCAD, MCSD
>http://www.extremeexperts.com
>Books Online for SQL Server SP3 at
>http://www.microsoft.com/sql/techinf...tdoc/2000/book
s.asp
>
>"CC&JM" <anonymous@.discussions.microsoft.com> wrote in
message[vbcol=seagreen]
>news:17fe001c44a19$f48dfb80$a301280a@.phx.gbl...
substring[vbcol=seagreen]
exec%',Textdata),217)
>
>.
>

Eliminate records

Hello,
The output of the following query returns two identical
records, i need to eliminate all records that are
identical but if i use the "distinct" before the substring
function its returned one error.
This is the query that i'm using:
select EvClassDesc,
STime,
substring(Textdata,patindex('%exec%',Textdata),217)
as [TextData],
objid
into DestinationT
from OriginalT
where objid = 1111111 and EvClassDesc = 'SP:Star'
Thanks,
Best regardsDid you try using:
Select DISTINCT Derived.* into DestinationT FROM
( select EvClassDesc,
STime,
substring(Textdata,patindex('%exec%',Textdata),217) as [TextData],
objid
from OriginalT
where objid = 1111111 and EvClassDesc = 'SP:Star') Derived
... (untested)
WIll this help?
--
HTH,
Vinod Kumar
MCSE, DBA, MCAD, MCSD
http://www.extremeexperts.com
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp
"CC&JM" <anonymous@.discussions.microsoft.com> wrote in message
news:17fe001c44a19$f48dfb80$a301280a@.phx.gbl...
> Hello,
> The output of the following query returns two identical
> records, i need to eliminate all records that are
> identical but if i use the "distinct" before the substring
> function its returned one error.
> This is the query that i'm using:
> select EvClassDesc,
> STime,
> substring(Textdata,patindex('%exec%',Textdata),217)
> as [TextData],
> objid
> into DestinationT
> from OriginalT
> where objid = 1111111 and EvClassDesc = 'SP:Star'
> Thanks,
> Best regards|||Thanks Vinod
>--Original Message--
>Did you try using:
>Select DISTINCT Derived.* into DestinationT FROM
>( select EvClassDesc,
> STime,
> substring(Textdata,patindex('%
exec%',Textdata),217) as [TextData],
> objid
> from OriginalT
> where objid = 1111111 and EvClassDesc = 'SP:Star')
Derived
>... (untested)
>WIll this help?
>--
>HTH,
>Vinod Kumar
>MCSE, DBA, MCAD, MCSD
>http://www.extremeexperts.com
>Books Online for SQL Server SP3 at
>http://www.microsoft.com/sql/techinfo/productdoc/2000/book
s.asp
>
>"CC&JM" <anonymous@.discussions.microsoft.com> wrote in
message
>news:17fe001c44a19$f48dfb80$a301280a@.phx.gbl...
>> Hello,
>> The output of the following query returns two identical
>> records, i need to eliminate all records that are
>> identical but if i use the "distinct" before the
substring
>> function its returned one error.
>> This is the query that i'm using:
>> select EvClassDesc,
>> STime,
>> substring(Textdata,patindex('%
exec%',Textdata),217)
>> as [TextData],
>> objid
>> into DestinationT
>> from OriginalT
>> where objid = 1111111 and EvClassDesc = 'SP:Star'
>> Thanks,
>> Best regards
>
>.
>

Eliminate duplicates from only one column

--
Randycan you expand your query...'
or try this...
select distinct(one_column) from table_with_one_column|||GROUP BY clause
"randy1200" <randy1200@.newsgroups.nospam> wrote in message
news:A063AE9C-6B3A-4FB8-B6D4-2BEFC7A5FF41@.microsoft.com...
> --
> Randy

Eliminate duplicates from datasource

Hi All,

I have a SP, which i run inside a for loop.

I am running the SP for all the products in a listbox.

So for each product i am having the feature extracted through the SP

But some features are the same for 2, 3 products.

So in the datatable, i am getting the featrues repeated.

IS there any way to eliminate the duplicates from datatable, from server side?

Hope i am not confusing.

Eg: product1 -- test1, test2, test3

product2 -- test2, test4

so the datatable has -- test1, test2, test3, test2, test4

-- i have to eliminate one test2 from this.

Any ideas?

Thanks

Can you just use a SELECT DISTINCT in your stored procedure or whatever you generate your datatable from?

Jeff

eliminate duplicate results

Hi,

Here is a brief description of a problem I'm currently experiencing:

We have a phone line that employees can call to report their status. There is no limit on how often they can call. Each time a call is completed a record is inserted into a table with the following info: the employees unique id, theire current status, and the location of an audio file should they have chosen to leave a message. When I query this table I'd like to only return the most recent call for each employee, disregarding all the rest.

So...

When I do a simple query like:

SELECT MAX(DISTINCT r.ircDateStamp),
r.entityId
FROM tblInboundRollCall r
WHERE (r.ircLineId = 13) AND (r.ircDateStamp BETWEEN startDate AND endDate)
GROUP BY r.entityId

I get only one record, the most recent one, for each employee who has called. However as soon as I try to get another field I get one record for each different value in that field.

a second query:

SELECT MAX(DISTINCT r.ircDateStamp),
r.entityId,
r.ircStatus
FROM tblInboundRollCall r
WHERE (r.ircLineId = 13) AND (r.ircDateStamp BETWEEN startDate AND endDate)
GROUP BY r.entityId, r.ircStatus

This returns one record for each status value that each employee has.

I guess my question is how do I retrieve all of the fields in a record with the most recent date for each unique employee.

Thanks for the help.Read this thread

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=31147

And change MIN to MAX|||Thanks for the quick reply.

Which post in that thread am I looking at?|||Just got it working. With a little tweaking of the second last post in the linked thread everything is perfect

Thanks Brett!

Eliminate Duplicate Records

Hi

How can i eliminate duplicate records from a single table?

There is a surrogate key in the table and i just can't seem to wrap my head around the layout.

ID | MemberID | Name | Surname | DateOfAssessment | etc.

The ID is unique, but there are duplicate memberID's with different date of assessments, i want to take the last used date of assessment and keep only that one, and remove the rest from the table?

I'm sure the query is simple, i'm just too stupid to see how to solve it!!

Any help would be greatly appreciated.

Kind Regards
Carel Greaves

This should give you an idea of one method to accomplish your goal.

Code Snippet


SET NOCOUNT ON


DECLARE @.MyTable table
( [ID] int IDENTITY,
MemberID int,
[Name] varchar(20),
SurName varchar(20),
DateOfAssessment datetime
)


INSERT INTO @.MyTable VALUES ( 2, 'Bill', 'Jones', '2007/05/25' )
INSERT INTO @.MyTable VALUES ( 3, 'Mary', 'Smith', '2007/05/26' )
INSERT INTO @.MyTable VALUES ( 4, 'Susy', 'Williams', '2007/05/22' )
INSERT INTO @.MyTable VALUES ( 2, 'Bill', 'Jones', '2007/05/26' )
INSERT INTO @.MyTable VALUES ( 2, 'Bill', 'Jones', '2007/04/26' )


SELECT *
FROM @.MyTable


DELETE @.MyTable
FROM @.MyTable m
JOIN (SELECT
MemberID,
DateOfAssessment = max( DateOfAssessment )
FROM @.MyTable
GROUP BY MemberID ) dt
ON ( m.MemberID = dt.MemberID
AND m.DateOfAssessment <> dt.DateOfAssessment
)


SELECT *
FROM @.MyTable

ID MemberID Name SurName DateOfAssessment
--
2 3 Mary Smith 2007-05-26 00:00:00.000
3 4 Susy Williams 2007-05-22 00:00:00.000
4 2 Bill Jones 2007-05-26 00:00:00.000

|||
Thanks you very much.

Eliminate duplicate records

department table got duplicate data......

when i insert data from department to employee table, i want to eliminate duplicate and insert unique data.

department

emp_id emp_name emp_address

10 mary melville,ny

10 mary longisland,ny

11 linsy sugarland,tx

12 sam fairfax,va

12 sam dice,va

i want result in employee table....How can i get it with fastest query...i got tons of record to insert.

emp_id emp_name emp_address

10 mary melville,ny

11 linsy sugarland,tx

12 sam fairfax,va

insert into employee(emp_id,emp_name,emp_address)
select emp_id, max(emp_name), max(emp_address)
from department

where emp_id is not null

If you are using SQL Server 2005, give a look to the ROW_NUMBER() functon in books online. One technique is to create a "sequence number" for each grouping using the ROW_NUMBER() function and to select the records that have a generated "sequence number" of 1.

Also, something like this can be done:

Code Snippet

declare @.mockup table
( emp_id integer,
emp_name varchar(10),
emp_address varchar(25)
)
insert into @.mockup
select 10, 'mary', 'melville,ny' union all
select 10, 'mary', 'longisland,ny' union all
select 11, 'linsy', 'sugarland,tx' union all
select 12, 'sam', 'fairfax,va' union all
select 12, 'sam', 'dice,va'
--select * from @.mockup

select emp_id,
emp_name,
( select max(emp_address)
from @.mockup b
where a.emp_id = b.emp_id
) as emp_address
from @.mockup a
group by emp_id, emp_name

/*
emp_id emp_name emp_address
-- - -
10 mary melville,ny
11 linsy sugarland,tx
12 sam fairfax,va
*/

|||

For SS2005:

Code Snippet

create table #dept( emp_id int, emp_name varchar(50), emp_address varchar(50) )

insert into #dept

select 10, 'mary', 'melville,ny'

union all select 10, 'mary', 'longisland,ny'

union all select 11, 'linsy', 'sugarland,tx'

union all select 12, 'sam', 'fairfax,va'

union all select 12, 'sam', 'dice,va'

select emp_id, emp_name, emp_address

from

(

select *, row_number() over (partition by emp_id order by emp_id, emp_name desc, emp_address desc) as rn

from #dept

) emp

where rn = 1

For SS2000:

Code Snippet

select emp_id, max(emp_name) as emp_name, max(emp_address) as emp_address

from #dept

group by emp_id

|||

The trick here is to get a list of the records you want to keep. However, using MAX on character fields isn't the way to go (unless you don't care which record to keep). Hopefully you've got a column which gives some sort of order to the data (eg datecreated) so you can choose the most recent one, otherwise its a guessing game as to which one is "keepable"

So assuming you have one... ;-)

INSERT INTO employee (emp_id, name, emp_address)

SELECT d.emp_id, d.name, d.emp_address
FROM department d

INNER JOIN

(SELECT emp_id, MAX(datecreated) AS dt

FROM department

GROUP BY emp_id) AS dist

ON d.emp_id = dist.emp_id and d.datecreated = dist.dt

HTH!

|||

when i use max function....i get this error....

i get an error like this.....
Server: Msg 8152, Level 16, State 9, Line 1
String or binary data would be truncated.
The statement has been terminated.

|||When you try the insert? Check that your column definitions in the department and employee tables match. You may have a smaller field in the destination table...|||

The problem isn't with max(), it's a mismatch between column definitions in your source and your target.

|||

Getting this error...

Server: Msg 207, Level 16, State 3, Line 1
Invalid column name 'datecreated'.
Server: Msg 207, Level 16, State 1, Line 1
Invalid column name 'datecreated'.

|||

I guess you don't have a column matching that in your table ;-)

My example was based upon the presumption that you had a field such as that in your table. You won't be able to just copy and paste it and get results. It was based upon the following DDL

CREATE TABLE departments
(emp_id int, emp_name varchar(100), emp_address varchar(100), datecreated datetime)


Basically, if you do have a field (in my example datecreated) that tells you which record is the one you want to keep then you can use a variation on my example. If you don't have one and you're happy for the choice to be made based arbitarilly then go with Dales option. His can be used straight out of the box.

Sorry for any confusion.

|||

Is there any internal field of table record i can use?.

like in oracle, do we have system variable/ records created along with insert of data in a row in sql server?.

so can i use that date created field?.

|||

I'm afraid not. If you don't have an explicit date field you can't use my example.


If the ddl of your table is the same as in Dales example, you should go with one of his excellent suggestions.

|||

i used dale solution...max thing.

select emp_id, max(emp_name) as emp_name, max(emp_address) as emp_address

from #dept

group by emp_id

works fine today.

Eliminate Duplicate ID's in this StoredProcedure

I have a stored procedure that I use for Monthly Billing

delete from BillingCurrent

insert into BillingCurrent([Name],Address,City,State,Zip,InvoiceID,CustomerID,[Date],InvoiceTotal)

SELECT Customers.Name,Customers.Address,Customers.City,Customers.State,Customers.Zip,Invoices.InvoiceID,Customers.CustomerID,Invoices.Date,Invoices.InvoiceTotal

FROM Invoices INNER JOIN

Customers ON Invoices.CustomerID = Customers.CustomerID

WHERE CONVERT(varchar(15), Invoices.Date, 112) Between CONVERT(varchar(15),dateadd (d,-30,GETDATE()), 112)and CONVERT(varchar(15), GETDATE(), 112)

This works great, but if a customer has more than one invoice open it adds that Customer again (for each invoice that is not 0.00. How can I change this SP to only add each Customer once regardless of how many invoices they have?

In your query you tried to work with Invoice.Date and Invoice.InvoiceTotal fields.

If you have following data in your Invoices table for CustomerId=1:

Date InvoiceTotal

1.1.2006 1000

1.1.2007 2000

What do you want to save into BillingCurrent?

Which SQL Server version do you use?

|||

Cammyr:

You state that you want to reduce the number of rows to 1 whenever a customer has more than one invoice during a month. However, what I don't understand is what we are supposed to put in these three fields if we are returning only one row: (1) InvoiceID, (2) Date, and (3) InvoiceTotal. I have therefore put together two methods to return only one row per customer. The first method simply returns this information for the last invoice posted for a given customer. There is a good chance that this is not what you want but would rather have the total of all invoices summed into the InvoiceTotal column:

--
-- This method uses uses (1) CROSS APPLY and (2) ROW_NUMBER() to
-- find the "last" invoice posted to a particular customers account to
-- display that information with the customer information.
--

SELECT c.Name,
c.Address,
c.City,
c.State,
c.Zip,
i.InvoiceID,
c.CustomerID,
i.Date,
i.InvoiceTotal
FROM Customers c
cross apply
( select invoiceId,
row_number ()
over ( order by Date desc,
invoiceId desc
)
as seq,
Date,
InvoiceTotal
from Invoices p
where p.customerId = c.customerId
and CONVERT(varchar(15), p.Date, 112)
Between CONVERT(varchar(15),dateadd (d,-30,GETDATE()), 112)
and CONVERT(varchar(15), GETDATE(), 112)
) i
where i.seq = 1

-- Name Address City State Zip InvoiceID CustomerID Date InvoiceTotal
-- -- -- -- -- --
-- Dave Mugambo 1318 Mockingbird Lane Munster In 46321 4 1 2007-02-28 00:00:00.000 14.92


--
-- This method instead posts the SUM of all invoices into the InvoiceTotal
-- column and chooses the highest InvoiceID for the InvoiceID column and
-- the highest Date for the Date column. It is not clear what is requried
-- for these fields
--

SELECT c.Name,
c.Address,
c.City,
c.State,
c.Zip,
max (i.InvoiceID) as InvoiceID,
c.CustomerID,
max (i.Date) as Date,
sum (i.InvoiceTotal) as InvoiceTotal
FROM Invoices i
INNER JOIN Customers c
ON i.CustomerID = c.CustomerID
WHERE CONVERT(varchar(15), i.Date, 112)
Between CONVERT(varchar(15),dateadd (d,-30,GETDATE()), 112)
and CONVERT(varchar(15), GETDATE(), 112)
group by c.CustomerId,
c.Name,
c.Address,
c.City,
c.State,
c.Zip

-- Name Address City State Zip InvoiceID CustomerID Date InvoiceTotal
-- -- -- -- -- --
-- Dave Mugambo 1318 Mockingbird Lane Munster In 46321 4 1 2007-02-28 00:00:00.000 94.76

( I completely agree with Konstantin's questions. )

|||

I want to save Name,Address,City,State,Zip,InvoiceID,CustomerID in BillingCurrent

But really I guess I don't need InvoiceID or InvoiceTotal as I have a relationship between BillingCurrent and InvoiceDetails On CustomerID Column that will get all the InvoiceDetails for each CustomerID.

I'm using vb.express with sql.express

|||

You are both right there were quite a few fields in my insert statement that were not needed. All I really needed was CustomerID,Name,Address,City and Zip because I can use the getchildrows method to get all of the corresponding invoicedetails for each customerID.

This is what I came up with, it seems to be working fine.

insert into BillingCurrent (CustomerID,[Name],Address,City,State,Zip)

SELECT Distinct Customers.CustomerID, Customers.Name,Customers.Address, Customers.City, Customers.State, Customers.Zip

FROM Invoices INNER JOIN

Customers ON Invoices.CustomerID = Customers.CustomerID

WHERE (CONVERT(varchar(15), Invoices.Date, 112) BETWEEN CONVERT(varchar(15), DATEADD(d, - 30, GETDATE()), 112) AND CONVERT(varchar(15),

GETDATE(), 112))

I'm sure yours works for that situation Kent, so I'm marking yours as the answer.

Thanks