请高手过招

shiyiming · 发表于 2012-2-16 09:23:20

假设一个数据集有两个变量：child 和 parent
DATA sample;
INPUT child $ parent $;
DATALINES;
a b
c b
d c
e g
f e
;
run;

例如:
a 的 parent 是 b,
c 的 parent 是 b,
d 的 parent 是 c,
e 的 parent 是 g,
f 的 parent 是 e,
等等。。。

现在需要创建第三个变量highest_parent, 使得最终的数据集成为：
DATA sample;
INPUT child $ parent $ highest_parent $;
DATALINES;
a b b
c b b
d c b
e g g
f e g
;
run;

a 的 parent 是 b,highest_parent 是 b
c 的 parent 是 b,highest_parent 是 b
d 的 parent 是 c,但 c 的 parent 是 b, 所以highest_parent 是 b，
相似的
e 的 parent 是 g,highest_parent 是 g
f 的 parent 是 e,highest_parent 是 g
等等。。。
如何用SAS实现?

shiyiming · 发表于 2012-2-16 13:59:25

[code:3a6kqgig]
proc sql noprint;
create table sample2 as
select s1.child,s1.parent,coalesce(s2.hp,s1.parent) as hp
from sample as s1 left join sample(rename=(parent=hp)) as s2
on s1.parent=s2.child
order by 1;
quit;
[/code:3a6kqgig]

shiyiming · 发表于 2012-2-16 16:40:51

过招似乎是比武大擂台的意思？

shiyiming · 发表于 2012-2-16 16:43:36

1. Use breadth first algorithm until you find the deepest.
2. you don't have to rename at first inorder to avoid name conflict and data overwite in sql. this method is only used in data step.
[quote="zhangzachary":13uv3f5j][code:13uv3f5j]
proc sql noprint;
create table sample2 as
select s1.child,s1.parent,coalesce(s2.hp,s1.parent) as hp
from sample as s1 left join sample(rename=(parent=hp)) as s2
on s1.parent=s2.child
order by 1;
quit;
[/code:13uv3f5j][/quote:13uv3f5j]

shiyiming · 发表于 2012-2-16 17:21:21

[quote:cv0ke04g]e 的 parent 是 c,
...
e 的 parent 是 g, [/quote:cv0ke04g]
e的parent是g还是b?

shiyiming · 发表于 2012-2-16 22:40:07

[quote="jeozu":39plzahb]1. Use breadth first algorithm until you find the deepest.
2. you don't have to rename at first inorder to avoid name conflict and data overwite in sql. this method is only used in data step.
[quote="zhangzachary":39plzahb][code:39plzahb]
proc sql noprint;
create table sample2 as
select s1.child,s1.parent,coalesce(s2.hp,s1.parent) as hp
from sample as s1 left join sample(rename=(parent=hp)) as s2
on s1.parent=s2.child
order by 1;
quit;
[/code:39plzahb][/quote:39plzahb][/quote:39plzahb]

嗯，对，好像不用rename，直接用s2.parent就行……我比较习惯用data step, 所以在想用SQL解决一些问题的时候容易混淆一些东西……
对于第一条，意思是要找更好的算法？还是要研究透了把最优的东西写出来？~ 我只是刚入门，第一目的是解决问题，优化算法等还需要多多学习啊~~

shiyiming · 发表于 2012-2-17 03:12:01

[quote="hopewell":tfwi7vu5][quote:tfwi7vu5]e 的 parent 是 c,
...
e 的 parent 是 g, [/quote:tfwi7vu5]
e的parent是g还是b?[/quote:tfwi7vu5]

sorry for your confusion. corrections have been made. actually the sample data set contains a hierarchy structure

shiyiming · 发表于 2012-2-17 03:21:43

[quote="zhangzachary":2ic1wsjg][code:2ic1wsjg]
proc sql noprint;
create table sample2 as
select s1.child,s1.parent,coalesce(s2.hp,s1.parent) as hp
from sample as s1 left join sample(rename=(parent=hp)) as s2
on s1.parent=s2.child
order by 1;
quit;
[/code:2ic1wsjg][/quote:2ic1wsjg]

After you add two more data lines, for example,
DATA sample;
INPUT child $ parent $;
DATALINES;
a b
c b
d c
e g
f e
[b:2ic1wsjg]t d
s t[/b:2ic1wsjg];
run;

it doesn't work.
For the last row, s->t , t->d, d->c, c->b, so the highest_parent for s is b.

shiyiming · 发表于 2012-2-17 10:04:01

[code:kmi4n4ym]data regular;
set sample;
rename child=c parent=highest_parent;
run;
data out(drop=rc c);
length c highest_parent $8;
if _n_=1 then do;
      declare hash h(dataset:'regular');
      rc=h.defineKey('c');
      rc=h.defineData('highest_parent');
      rc=h.defineDone();
      call missing(c,highest_parent);
end;
set sample;
highest_parent=parent;
do while(h.find(key:highest_parent)=0);
end;
run;[/code:kmi4n4ym]

shiyiming · 发表于 2012-2-17 13:47:17

[quote="jerrya00":3dy4xkum][quote="zhangzachary":3dy4xkum][code:3dy4xkum]
proc sql noprint;
create table sample2 as
select s1.child,s1.parent,coalesce(s2.hp,s1.parent) as hp
from sample as s1 left join sample(rename=(parent=hp)) as s2
on s1.parent=s2.child
order by 1;
quit;
[/code:3dy4xkum][/quote:3dy4xkum]

After you add two more data lines, for example,
DATA sample;
INPUT child $ parent $;
DATALINES;
a b
c b
d c
e g
f e
[b:3dy4xkum]t d
s t[/b:3dy4xkum];
run;

it doesn't work.
For the last row, s->t , t->d, d->c, c->b, so the highest_parent for s is b.[/quote:3dy4xkum]
好吧~ 的确没有考虑到位~ 以后写程序的确要了解透问题，考虑全面才行啊~
hopewell大大的 do while 用得很精彩，学习了！~

		自动登录	找回密码
密码			立即注册

请高手过招

请高手过招

Re: 请高手过招

Re: 请高手过招

Re: 请高手过招

Re: 请高手过招

Re: 请高手过招

Re: 请高手过招

Re: 请高手过招

Re: 请高手过招

Re: 请高手过招