department table got duplicate data......
when i insert data from department to employee table, i want to eliminate duplicate and insert unique data.
department
emp_id emp_name emp_address
10 mary melville,ny
10 mary longisland,ny
11 linsy sugarland,tx
12 sam fairfax,va
12 sam dice,va
i want result in employee table....How can i get it with fastest query...i got tons of record to insert.
emp_id emp_name emp_address
10 mary melville,ny
11 linsy sugarland,tx
12 sam fairfax,va
insert into employee(emp_id,emp_name,emp_address)
select emp_id, max(emp_name), max(emp_address)
from department
where emp_id is not null
If you are using SQL Server 2005, give a look to the ROW_NUMBER() functon in books online. One technique is to create a "sequence number" for each grouping using the ROW_NUMBER() function and to select the records that have a generated "sequence number" of 1.
Also, something like this can be done:
Code Snippet
declare @.mockup table
( emp_id integer,
emp_name varchar(10),
emp_address varchar(25)
)
insert into @.mockup
select 10, 'mary', 'melville,ny' union all
select 10, 'mary', 'longisland,ny' union all
select 11, 'linsy', 'sugarland,tx' union all
select 12, 'sam', 'fairfax,va' union all
select 12, 'sam', 'dice,va'
--select * from @.mockup
select emp_id,
emp_name,
( select max(emp_address)
from @.mockup b
where a.emp_id = b.emp_id
) as emp_address
from @.mockup a
group by emp_id, emp_name
/*
emp_id emp_name emp_address
-- - -
10 mary melville,ny
11 linsy sugarland,tx
12 sam fairfax,va
*/
For SS2005:
Code Snippet
create table #dept( emp_id int, emp_name varchar(50), emp_address varchar(50) )
insert into #dept
select 10, 'mary', 'melville,ny'
union all select 10, 'mary', 'longisland,ny'
union all select 11, 'linsy', 'sugarland,tx'
union all select 12, 'sam', 'fairfax,va'
union all select 12, 'sam', 'dice,va'
select emp_id, emp_name, emp_address
from
(
select *, row_number() over (partition by emp_id order by emp_id, emp_name desc, emp_address desc) as rn
from #dept
) emp
where rn = 1
For SS2000:
Code Snippet
select emp_id, max(emp_name) as emp_name, max(emp_address) as emp_address
from #dept
group by emp_id
|||
The trick here is to get a list of the records you want to keep. However, using MAX on character fields isn't the way to go (unless you don't care which record to keep). Hopefully you've got a column which gives some sort of order to the data (eg datecreated) so you can choose the most recent one, otherwise its a guessing game as to which one is "keepable"
So assuming you have one... ;-)
INSERT INTO employee (emp_id, name, emp_address)
SELECT d.emp_id, d.name, d.emp_address
FROM department d
INNER JOIN
(SELECT emp_id, MAX(datecreated) AS dt
FROM department
GROUP BY emp_id) AS dist
ON d.emp_id = dist.emp_id and d.datecreated = dist.dt
HTH!
|||when i use max function....i get this error....
i get an error like this.....
Server: Msg 8152, Level 16, State 9, Line 1
String or binary data would be truncated.
The statement has been terminated.
The problem isn't with max(), it's a mismatch between column definitions in your source and your target.
|||Getting this error...
Server: Msg 207, Level 16, State 3, Line 1
Invalid column name 'datecreated'.
Server: Msg 207, Level 16, State 1, Line 1
Invalid column name 'datecreated'.
I guess you don't have a column matching that in your table ;-)
My example was based upon the presumption that you had a field such as that in your table. You won't be able to just copy and paste it and get results. It was based upon the following DDL
CREATE TABLE departments
(emp_id int, emp_name varchar(100), emp_address varchar(100), datecreated datetime)
Basically, if you do have a field (in my example datecreated) that tells you which record is the one you want to keep then you can use a variation on my example. If you don't have one and you're happy for the choice to be made based arbitarilly then go with Dales option. His can be used straight out of the box.
Sorry for any confusion.
|||Is there any internal field of table record i can use?.
like in oracle, do we have system variable/ records created along with insert of data in a row in sql server?.
so can i use that date created field?.
|||I'm afraid not. If you don't have an explicit date field you can't use my example.
If the ddl of your table is the same as in Dales example, you should go with one of his excellent suggestions.
i used dale solution...max thing.
select emp_id, max(emp_name) as emp_name, max(emp_address) as emp_address
from #dept
group by emp_id
works fine today.
No comments:
Post a Comment