SAS中文论坛

标题: Count Data Models in SAS® [打印本页]

作者: shiyiming    时间: 2012-3-14 04:21
标题: Count Data Models in SAS®
From LCChien's blog on blogspot

原文載點:[<a href="http://www.google.com/url?sa=t&amp;rct=j&amp;q=&amp;esrc=s&amp;source=web&amp;cd=1&amp;cts=1331653599525&amp;ved=0CCkQFjAA&amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.176.1766%26rep%3Drep1%26type%3Dpdf&amp;ei=jGVfT4CNM6mnsAKy19j7Bw&amp;usg=AFQjCNHGgPrkhBj4iPh30Ig5Hh251j5jQg&amp;sig2=lxnDgMUyWfiKn0hC1L4vvw">LINK</a>]<br /><br />在統計應用領域,我們經常會遇到所謂的"計數資料"(count data),比方說就診人數,死亡人數,顧客人數等等。在分析這些計數資料時,最常使用的模型莫過於卜瓦松模型(Poisson model)。但在實際的應用上,使用者經常需要面對兩大問題:一是過度離散(overdispersion)問題,二是觀測值有太多零(excess zeros)的問題。這些問題通常都會導致卜瓦松模型估計的結果有偏誤。因此 WenSui Liu 和 Jimmy Cela 在 2008 年 SAS Global Forum 發表了一篇技術文件,介紹一些其他比較適合處理計數資料的模型。<br /><br /><a name='more'></a><b><span style="font-size: large;">。資料</span></b><br />該文使用的資料是 Deb and Trivedi (1997) 所發表的一篇關於醫療保健在老人上的利用研究所使用的資料。裡面包含了:<br />應變數&nbsp;-<br />1. HOSP = 住院次數<br />因變數 -<br />2. EXCLHLTH = 自我健康評量(1=極佳,0=其他)<br />3. POORHLTH = 自我健康評量(1=極差,0=其他)<br />4. NUMCHRON = 慢性病數目<br />5. AGE = 年齡<br />6. MALE = 性別(1=男性,0=女性)<br />7. SCHOOL = 教育程度(以年為單位)<br />8. PRIVINS = 保險(1=有健保,0=沒有健保)<br /><br />由於應變數 HOSP 的變異數是平均值的兩倍,這表示該筆資料有過度離散的問題。此外,4406 份樣本中,有 3541 人沒有任何住院記錄,所以這筆資料也有太多零的問題。我們從下面這張圖也可以看出一些端倪:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://www.diigo.com/item/t/2043829_111676671_6438414" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="344" src="http://www.diigo.com/item/t/2043829_111676671_6438414" width="640" /></a></div><br />這張圖是將樣本的觀測機率(紅線)以及實際的卜瓦松分配的機率密度函數(藍線)拿來比較。我們可以發現,在 HOSP = 1 處,紅線和藍線有著極大的差距。因此,接下來將這筆資料利用數個不同的模型來配飾,來比較估計參數的不同以及觀測機率和預測機率的差異。<br /><br /><b><span style="font-size: large;">。卜瓦松模型</span></b><br />程式:<br /><code>/* METHOD 1: PROC NLMIXED &nbsp;*/<br />proc nlmixed data = tblNMES;<br />&nbsp; parms b0 = 0 b1 = 0 b2 = 0 b3 = 0 b4 = 0 b5 = 0 b6 = 0 b7 = 0;<br />&nbsp; mu = exp(b0 + b1 * EXCLHLTH + b2 * POORHLTH + b3 * NUMCHRON + b4 * AGE + <br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;b5 * MALE + b6 * SCHOOL + b7 * PRIVINS);<br />&nbsp; ll = -mu + HOSP * log(mu) - log(fact(HOSP));<br />&nbsp; model HOSP ~ general(ll);<br />&nbsp; predict mu out = poi_out (rename = (pred = Yhat)); <br />run;<br /><br />/* METHOD 2: PROC COUNTREG */<br />proc countreg data = tblNMES type = poisson; <br />&nbsp; model HOSP = EXCLHLTH POORHLTH NUMCHRON AGE MALE SCHOOL PRIVINS;<br />run; </code><br /><br />原作者使用比較複雜的 PROC NLMIXED 和 PROC GOUNTREG 程序來完成模型配飾。這兩種方法與 PROC GENMOD 做出來的結果應該會一致。簡單來講,在 PROC NLMIXED 程序裡面,要先設定每個參數的初始值(params),然後把還沒加上 link function 前的公式打出來(mu),然後把 log-likelihood 寫好(ll),最後才是配飾模型(model)。而 PROC COUNTREG 程序則比較單純,只要把後面的 type= 設定成 poisson 即可。<br /><br />報表:<br /><code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Model Fit Summary<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Log Likelihood &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-3046<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; AIC &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;6108<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; SBC &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;6159<br /><br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Parameter Estimates<br /><br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Standard &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Approx<br />&nbsp; &nbsp; &nbsp; &nbsp; Parameter &nbsp; &nbsp; &nbsp; &nbsp;Estimate &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Error &nbsp; &nbsp;t Value &nbsp; &nbsp;Pr &gt; |t|<br />&nbsp; &nbsp; &nbsp; &nbsp; Intercept &nbsp; &nbsp; &nbsp; -3.329044 &nbsp; &nbsp; &nbsp; &nbsp;0.339728 &nbsp; &nbsp; &nbsp;-9.80 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp; &nbsp; &nbsp; &nbsp; exclhlth &nbsp; &nbsp; &nbsp; &nbsp;-0.723412 &nbsp; &nbsp; &nbsp; &nbsp;0.175644 &nbsp; &nbsp; &nbsp;-4.12 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp; &nbsp; &nbsp; &nbsp; poorhlth &nbsp; &nbsp; &nbsp; &nbsp; 0.626157 &nbsp; &nbsp; &nbsp; &nbsp;0.067858 &nbsp; &nbsp; &nbsp; 9.23 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp; &nbsp; &nbsp; &nbsp; numchron &nbsp; &nbsp; &nbsp; &nbsp; 0.264462 &nbsp; &nbsp; &nbsp; &nbsp;0.018277 &nbsp; &nbsp; &nbsp;14.47 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp; &nbsp; &nbsp; &nbsp; age &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.186406 &nbsp; &nbsp; &nbsp; &nbsp;0.042014 &nbsp; &nbsp; &nbsp; 4.44 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp; &nbsp; &nbsp; &nbsp; male &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.103186 &nbsp; &nbsp; &nbsp; &nbsp;0.056274 &nbsp; &nbsp; &nbsp; 1.83 &nbsp; &nbsp; &nbsp;0.0667<br />&nbsp; &nbsp; &nbsp; &nbsp; school &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.000206 &nbsp; &nbsp; &nbsp; &nbsp;0.007871 &nbsp; &nbsp; &nbsp;-0.03 &nbsp; &nbsp; &nbsp;0.9791<br />&nbsp; &nbsp; &nbsp; &nbsp; privins &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.108652 &nbsp; &nbsp; &nbsp; &nbsp;0.069251 &nbsp; &nbsp; &nbsp; 1.57 &nbsp; &nbsp; &nbsp;0.1167</code><br /><br /><br />觀測機率與預測機率比較圖:<br /><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="http://www.diigo.com/item/t/2043829_111695391_6439481" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="336" src="http://www.diigo.com/item/t/2043829_111695391_6439481" width="640" /></a></div><br />毫不意外地,在沒有用任何調整的情況下用卜瓦松模型去估計,則預測機率值在 HOSP = 0 和 HOSP = 1 處明顯地可發現與實際機率值有差異。當然利用這種圖來評量過度離散是比較主觀。作者提到了兩種簡單的方法來檢定。在此不額外說明演算過程。重點是這兩種檢定方法會產生一個 p-value 供判斷使用。只要出現 p-value &lt; 0.05 就表示過度離散的情況存在。<br /><br /><b>(1) Cameron and Trivedi (1996)</b><br />程式:<br /><code>data ols_tmp;<br /><br />&nbsp; set poi_out; <br />&nbsp; dep = ((HOSP - Yhat) ** 2 - HOSP) / Yhat; <br />run;<br /><br />proc reg data = ols_tmp;<br />&nbsp; model dep = Yhat / noint; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/* FIT A OLS REGRESSION WITHOUT INTERCEPT */<br />run; quit;</code><br /><br />報表:<br /><code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Parameter Estimates<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Parameter &nbsp; &nbsp; Standard<br />&nbsp; Variable &nbsp; Label &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;DF &nbsp; &nbsp; Estimate &nbsp; &nbsp; &nbsp; &nbsp;Error &nbsp;t Value &nbsp;Pr &gt; |t|<br />&nbsp; Yhat &nbsp; &nbsp; &nbsp; Predicted Value &nbsp; 1 &nbsp; &nbsp; &nbsp;1.63419 &nbsp; &nbsp; &nbsp;0.22609 &nbsp; &nbsp; 7.23 &nbsp; &nbsp;<b><span style="color: red;">&lt;.0001</span></b></code><br /><br />用此法檢定出來的 p-value &lt; .0001,所以表示過度離散的現象存在於這筆資料中。<br /><br /><b>(2) Greene (2002)</b><br />程式:<br /><code>proc iml;<br />&nbsp; use poi_out; <br />&nbsp; read all var {HOSP} into y;<br />&nbsp; read all var {Yhat} into yhat;<br />&nbsp; close poisson_out;<br />&nbsp; e &nbsp; &nbsp; &nbsp;= (y - yhat);<br />&nbsp; n &nbsp; &nbsp; &nbsp;= nrow(y);<br />&nbsp; ybar &nbsp; = y`[, :];<br />&nbsp; LM &nbsp; &nbsp; = (e` * e - n * ybar) ** 2 / (2 * yhat` * yhat); <br />&nbsp; Pvalue = 1 - probchi(LM, 1);<br />&nbsp; print LM Pvalue;<br />quit;</code><br /><br />報表:<br /><code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LM &nbsp; &nbsp;PVALUE<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 794.14707 &nbsp; &nbsp; &nbsp; &nbsp; <b><span style="color: red;">0</span></b></code><br /><br />同樣地,用此法算出來的 p-value 逼近於零,所以最後預測機率值和觀測機率值出現落差是完全不意外。<br /><br /><b><span style="font-size: large;">。負二項模型</span></b><br />負二項模型的 PROC NLMIXED 程序內的寫法跟卜瓦松模型很像,唯一要注意的就是他的 log-likelihood 長得不一樣。看起來是複雜些。所以還是用 PROC COUNTREG 程序來做比較簡單。&nbsp;程式如下:<br /><code>/* METHOD 1: PROC NLMIXED &nbsp;*/<br />proc nlmixed data = tblNMES;<br />&nbsp; parms b0 = 0 b1 = 0 b2 = 0 b3 = 0 b4 = 0 b5 = 0 b6 = 0 b7 = 0;<br />&nbsp; mu = exp(b0 + b1 * EXCLHLTH + b2 * POORHLTH + b3 * NUMCHRON + b4 * AGE + <br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;b5 * MALE + b6 * SCHOOL + b7 * PRIVINS);<br />&nbsp; <span style="color: red;">ll = lgamma(HOSP + 1 / alpha) - lgamma(HOSP + 1) - lgamma(1 / alpha) +<br />&nbsp; &nbsp; &nbsp; &nbsp;HOSP * log(alpha * mu) - <br />&nbsp; &nbsp; &nbsp; &nbsp;(HOSP + 1 / alpha) * log(1 + alpha * mu);</span><br />&nbsp; model HOSP ~ general(ll);<br />&nbsp; predict mu out = nb_out (rename = (pred = Yhat)); <br />run;<br /><br />/* METHOD 2: PROC COUNTREG &nbsp; &nbsp; */<br />proc countreg data = tblNMES type = negativebinom method = qn; <br />&nbsp; model HOSP = EXCLHLTH POORHLTH NUMCHRON AGE MALE SCHOOL PRIVINS;<br />run;</code><br /><br />報表:<br /><code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Model Fit Summary<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Log Likelihood &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-2857<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; AIC &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;5731<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; SBC &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;5789<br /><br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Parameter Estimates<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Standard &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Approx<br />&nbsp; &nbsp; &nbsp; &nbsp; Parameter &nbsp; &nbsp; &nbsp; &nbsp;Estimate &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Error &nbsp; &nbsp;t Value &nbsp; &nbsp;Pr &gt; |t|<br />&nbsp; &nbsp; &nbsp; &nbsp; Intercept &nbsp; &nbsp; &nbsp; -3.752640 &nbsp; &nbsp; &nbsp; &nbsp;0.446835 &nbsp; &nbsp; &nbsp;-8.40 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp; &nbsp; &nbsp; &nbsp; exclhlth &nbsp; &nbsp; &nbsp; &nbsp;-0.697875 &nbsp; &nbsp; &nbsp; &nbsp;0.193318 &nbsp; &nbsp; &nbsp;-3.61 &nbsp; &nbsp; &nbsp;0.0003<br />&nbsp; &nbsp; &nbsp; &nbsp; poorhlth &nbsp; &nbsp; &nbsp; &nbsp; 0.613926 &nbsp; &nbsp; &nbsp; &nbsp;0.095392 &nbsp; &nbsp; &nbsp; 6.44 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp; &nbsp; &nbsp; &nbsp; numchron &nbsp; &nbsp; &nbsp; &nbsp; 0.289418 &nbsp; &nbsp; &nbsp; &nbsp;0.026470 &nbsp; &nbsp; &nbsp;10.93 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp; &nbsp; &nbsp; &nbsp; age &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.238444 &nbsp; &nbsp; &nbsp; &nbsp;0.055265 &nbsp; &nbsp; &nbsp; 4.31 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp; &nbsp; &nbsp; &nbsp; male &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.153862 &nbsp; &nbsp; &nbsp; &nbsp;0.073033 &nbsp; &nbsp; &nbsp; 2.11 &nbsp; &nbsp; &nbsp;0.0351<br />&nbsp; &nbsp; &nbsp; &nbsp; school &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.002271 &nbsp; &nbsp; &nbsp; &nbsp;0.010203 &nbsp; &nbsp; &nbsp;-0.22 &nbsp; &nbsp; &nbsp;0.8238<br />&nbsp; &nbsp; &nbsp; &nbsp; privins &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.093922 &nbsp; &nbsp; &nbsp; &nbsp;0.090494 &nbsp; &nbsp; &nbsp; 1.04 &nbsp; &nbsp; &nbsp;0.2993<br />&nbsp; &nbsp; &nbsp; &nbsp; _Alpha &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1.766727 &nbsp; &nbsp; &nbsp; &nbsp;0.160492 &nbsp; &nbsp; &nbsp;11.01 &nbsp; &nbsp; &nbsp;&lt;.0001</code><br /><br />觀測機率與預測機率比較圖:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://www.diigo.com/item/t/2043829_111681693_6438630" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="348" src="http://www.diigo.com/item/t/2043829_111681693_6438630" width="640" /></a></div>從上圖可以明顯發現,用負二項模型來估計顯然是比較好的,因為觀測機率值和預測機率值兩條線幾乎重疊在一起。<br /><br /><b><span style="font-size: large;">。跨欄(Hurdle)模型</span></b><br />跨欄模型是 Dr. Mullahy 於 1986 年研發,他是假設當應變數是零,則服從二項分配,當大於零時,則服從另一個機率密度函數長得很像卜瓦松分配的分配(該分配沒有實際名字)。其機率密度函數可以用下面公式來表示:<br /><br /><div style="text-align: center;"><a href="http://www.codecogs.com/eqnedit.php?latex=\large%20f(Y_{i}|X_{i})=\left\{\begin{matrix}%20\theta_{i}%20&amp;%20for\;%20Y_{i}=0\\%20&amp;%20\frac{(1-\theta_{i})exp(-u_{i})u_{i}^{Y_{i}}}{(1-exp(-u_{i})Y_{i}!)}\;%20for\;%20Y_{i}%3E0%20\end{matrix}\left." target="_blank"><img src="http://latex.codecogs.com/gif.latex?\large f(Y_{i}|X_{i})=\left\{\begin{matrix} \theta_{i} &amp; for\; Y_{i}=0\\ &amp; \frac{(1-\theta_{i})exp(-u_{i})u_{i}^{Y_{i}}}{(1-exp(-u_{i})Y_{i}!)}\; for\; Y_{i}&gt;0 \end{matrix}\left." title="\large f(Y_{i}|X_{i})=\left\{\begin{matrix} \theta_{i} &amp; for\; Y_{i}=0\\ &amp; \frac{(1-\theta_{i})exp(-u_{i})u_{i}^{Y_{i}}}{(1-exp(-u_{i})Y_{i}!)}\; for\; Y_{i}&gt;0 \end{matrix}\left." /></a></div><br />因此其 log-likelihood 就變得非常複雜:<br /><div style="text-align: center;"><a href="http://www.codecogs.com/eqnedit.php?latex=\large%20LL=\sum_{i=1}^{n}[I(Y_{i}=0)log(\theta_{i})@plus;I(Y_{i}%3E0)(log(1-\theta_{i})-u_{i}@plus;Y_{i}log(u_{i})-log(1-exp(-u_{i}))-log(Y_{i}!))]" target="_blank"><img src="http://latex.codecogs.com/gif.latex?\large LL=\sum_{i=1}^{n}[I(Y_{i}=0)log(\theta_{i})+I(Y_{i}&gt;0)(log(1-\theta_{i})-u_{i}+Y_{i}log(u_{i})-log(1-exp(-u_{i}))-log(Y_{i}!))]" title="\large LL=\sum_{i=1}^{n}[I(Y_{i}=0)log(\theta_{i})+I(Y_{i}&gt;0)(log(1-\theta_{i})-u_{i}+Y_{i}log(u_{i})-log(1-exp(-u_{i}))-log(Y_{i}!))]" /></a></div><br />所以這個模型只能用 PROC NLMIXED 來處理。程式如下:<br /><code>/* METHOD 1: PROC NLMIXED &nbsp;*/<br />proc nlmixed data = tblNMES tech = dbldog;<br />&nbsp; parms a0 = 0 a1 = 0 a2 = 0 a3 = 0 a4 = 0 a5 = 0 a6 = 0 a7 = 0<br />&nbsp; &nbsp; &nbsp; &nbsp; b0 = 0 b1 = 0 b2 = 0 b3 = 0 b4 = 0 b5 = 0 b6 = 0 b7 = 0;<br />&nbsp; eta0 = a0 + a1 * EXCLHLTH + a2 * POORHLTH + a3 * NUMCHRON + a4 * AGE + <br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;a5 * MALE + a6 * SCHOOL + a7 * PRIVINS;<br />&nbsp; exp_eta0 = exp(eta0);<br />&nbsp; p0 = exp_eta0 / (1 + exp_eta0);<br />&nbsp; etap = b0 + b1 * EXCLHLTH + b2 * POORHLTH + b3 * NUMCHRON + b4 * AGE + <br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;b5 * MALE + b6 * SCHOOL + b7 * PRIVINS;<br />&nbsp; exp_etap = exp(etap);<br />&nbsp; if HOSP = 0 then ll = log(p0);<br />&nbsp; else ll = log(1 - p0) - exp_etap + HOSP * etap - lgamma(HOSP + 1)<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - log(1 - exp(-exp_etap));<br />&nbsp; model HOSP ~ general(ll);<br />&nbsp; predict exp_etap out = hdl_out1 (keep = pred HOSP rename = (pred = Yhat));<br />&nbsp; predict p0 out = hdl_out2 (keep = pred rename = (pred = p0));<br />run;</code><br /><br />報表:<br /><code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Fit Statistics<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-2 Log Likelihood &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 5758.4<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AIC (smaller is better) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 5790.4<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AICC (smaller is better) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;5790.6<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;BIC (smaller is better) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 5892.7<br /><br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Parameter Estimates<br /><br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Standard<br />&nbsp;Parameter &nbsp; Estimate &nbsp; &nbsp; &nbsp;Error &nbsp; &nbsp; DF &nbsp; t Value &nbsp; Pr &gt; |t| &nbsp; &nbsp;Alpha<br />&nbsp;a0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;4.2311 &nbsp; &nbsp; 0.4889 &nbsp; 4406 &nbsp; &nbsp; &nbsp;8.65 &nbsp; &nbsp; &lt;.0001 &nbsp; &nbsp; 0.05<br />&nbsp;a1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.5826 &nbsp; &nbsp; 0.1991 &nbsp; 4406 &nbsp; &nbsp; &nbsp;2.93 &nbsp; &nbsp; 0.0035 &nbsp; &nbsp; 0.05<br />&nbsp;a2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -0.6953 &nbsp; &nbsp; 0.1073 &nbsp; 4406 &nbsp; &nbsp; -6.48 &nbsp; &nbsp; &lt;.0001 &nbsp; &nbsp; 0.05<br />&nbsp;a3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -0.3078 &nbsp; &nbsp;0.02890 &nbsp; 4406 &nbsp; &nbsp;-10.65 &nbsp; &nbsp; &lt;.0001 &nbsp; &nbsp; 0.05<br />&nbsp;a4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -0.2752 &nbsp; &nbsp;0.06061 &nbsp; 4406 &nbsp; &nbsp; -4.54 &nbsp; &nbsp; &lt;.0001 &nbsp; &nbsp; 0.05<br />&nbsp;a5 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -0.1948 &nbsp; &nbsp;0.08008 &nbsp; 4406 &nbsp; &nbsp; -2.43 &nbsp; &nbsp; 0.0151 &nbsp; &nbsp; 0.05<br />&nbsp;a6 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.00593 &nbsp; &nbsp;0.01126 &nbsp; 4406 &nbsp; &nbsp; -0.53 &nbsp; &nbsp; 0.5982 &nbsp; &nbsp; 0.05<br />&nbsp;a7 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.01924 &nbsp; &nbsp;0.09944 &nbsp; 4406 &nbsp; &nbsp; -0.19 &nbsp; &nbsp; 0.8466 &nbsp; &nbsp; 0.05<br />&nbsp;b0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -0.4693 &nbsp; &nbsp; 0.5627 &nbsp; 4406 &nbsp; &nbsp; -0.83 &nbsp; &nbsp; 0.4043 &nbsp; &nbsp; 0.05<br />&nbsp;b1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -0.9422 &nbsp; &nbsp; 0.4949 &nbsp; 4406 &nbsp; &nbsp; -1.90 &nbsp; &nbsp; 0.0570 &nbsp; &nbsp; 0.05<br />&nbsp;b2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.3373 &nbsp; &nbsp; 0.1008 &nbsp; 4406 &nbsp; &nbsp; &nbsp;3.35 &nbsp; &nbsp; 0.0008 &nbsp; &nbsp; 0.05<br />&nbsp;b3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.1426 &nbsp; &nbsp;0.02967 &nbsp; 4406 &nbsp; &nbsp; &nbsp;4.81 &nbsp; &nbsp; &lt;.0001 &nbsp; &nbsp; 0.05<br />&nbsp;b4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.01229 &nbsp; &nbsp;0.06834 &nbsp; 4406 &nbsp; &nbsp; -0.18 &nbsp; &nbsp; 0.8573 &nbsp; &nbsp; 0.05<br />&nbsp;b5 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.03854 &nbsp; &nbsp;0.09227 &nbsp; 4406 &nbsp; &nbsp; -0.42 &nbsp; &nbsp; 0.6762 &nbsp; &nbsp; 0.05<br />&nbsp;b6 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.01815 &nbsp; &nbsp;0.01290 &nbsp; 4406 &nbsp; &nbsp; -1.41 &nbsp; &nbsp; 0.1597 &nbsp; &nbsp; 0.05<br />&nbsp;b7 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.2589 &nbsp; &nbsp; 0.1139 &nbsp; 4406 &nbsp; &nbsp; &nbsp;2.27 &nbsp; &nbsp; 0.0231 &nbsp; &nbsp; 0.05</code><br /><br /><br />觀測機率與預測機率比較圖:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://www.diigo.com/item/t/2043829_111682160_6438659" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="348" src="http://www.diigo.com/item/t/2043829_111682160_6438659" width="640" /></a></div><br />上圖顯示預測機率值跟觀測機率值也相當接近。<br /><br /><b><span style="font-size: large;">。零膨脹(Zero-inflated: ZIP)模型</span></b><br />以下皆簡稱 ZIP 模型。此模型是專門處理觀測值裡面有太多零的問題。在 PROC NLMIXED 和 PROC COUNTREG 程序裡面都可以處理。特別注意的是,在 PROC COUNTREG 程序裡面,除了要把 type 設定成 zip 外,還要在 model 後面加上額外的指令。如下面紅色程式碼所示。<br /><br />程式:<br /><code>/* METHOD 1: PROC COUNTREG */<br />proc countreg data = tblNMES type = zip;<br />&nbsp; model HOSP = EXCLHLTH POORHLTH NUMCHRON AGE MALE SCHOOL PRIVINS<br />&nbsp; / <span style="color: red;">zi(link = logistic, var = EXCLHLTH POORHLTH NUMCHRON AGE MALE SCHOOL PRIVINS)</span>;<br />run;<br /><br />/* METHOD 2: PROC NLMIXED &nbsp;*/<br />proc nlmixed data = tblNMES tech = dbldog;<br />&nbsp; parms a0 = 0 a1 = 0 a2 = 0 a3 = 0 a4 = 0 a5 = 0 a6 = 0 a7 = 0<br />&nbsp; &nbsp; &nbsp; &nbsp; b0 = 0 b1 = 0 b2 = 0 b3 = 0 b4 = 0 b5 = 0 b6 = 0 b7 = 0;<br />&nbsp; eta0 = a0 + a1 * EXCLHLTH + a2 * POORHLTH + a3 * NUMCHRON + a4 * AGE + <br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;a5 * MALE + a6 * SCHOOL + a7 * PRIVINS;<br />&nbsp; exp_eta0 = exp(eta0);<br />&nbsp; p0 = exp_eta0 / (1 + exp_eta0);<br />&nbsp; etap = b0 + b1 * EXCLHLTH + b2 * POORHLTH + b3 * NUMCHRON + b4 * AGE + <br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;b5 * MALE + b6 * SCHOOL + b7 * PRIVINS;<br />&nbsp; exp_etap = exp(etap);<br />&nbsp; if HOSP = 0 then ll = log(p0 + (1 - p0) * exp(-exp_etap));<br />&nbsp; else ll = log(1 - p0) + HOSP * etap - exp_etap - lgamma(HOSP + 1);<br />&nbsp; model HOSP ~ general(ll);<br />&nbsp; predict exp_etap out = zip_out1 (keep = pred HOSP rename = (pred = Yhat));<br />&nbsp; predict p0 out = zip_out2 (keep = pred rename = (pred = p0));<br />run;</code><br /><br /><br />報表:<br /><code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Model Fit Summary<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Log Likelihood &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-2878<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AIC &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;5788<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;SBC &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;5890<br /><br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Parameter Estimates<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Standard &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Approx<br />&nbsp;Parameter &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Estimate &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Error &nbsp; &nbsp;t Value &nbsp; &nbsp;Pr &gt; |t|<br />&nbsp;Intercept &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -0.366506 &nbsp; &nbsp; &nbsp; &nbsp;0.572032 &nbsp; &nbsp; &nbsp;-0.64 &nbsp; &nbsp; &nbsp;0.5217<br />&nbsp;exclhlth &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.919990 &nbsp; &nbsp; &nbsp; &nbsp;0.458460 &nbsp; &nbsp; &nbsp;-2.01 &nbsp; &nbsp; &nbsp;0.0448<br />&nbsp;poorhlth &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.324926 &nbsp; &nbsp; &nbsp; &nbsp;0.101157 &nbsp; &nbsp; &nbsp; 3.21 &nbsp; &nbsp; &nbsp;0.0013<br />&nbsp;numchron &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.127746 &nbsp; &nbsp; &nbsp; &nbsp;0.033867 &nbsp; &nbsp; &nbsp; 3.77 &nbsp; &nbsp; &nbsp;0.0002<br />&nbsp;age &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -0.024359 &nbsp; &nbsp; &nbsp; &nbsp;0.068806 &nbsp; &nbsp; &nbsp;-0.35 &nbsp; &nbsp; &nbsp;0.7233<br />&nbsp;male &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.059629 &nbsp; &nbsp; &nbsp; &nbsp;0.099133 &nbsp; &nbsp; &nbsp;-0.60 &nbsp; &nbsp; &nbsp;0.5475<br />&nbsp;school &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.012473 &nbsp; &nbsp; &nbsp; &nbsp;0.013520 &nbsp; &nbsp; &nbsp;-0.92 &nbsp; &nbsp; &nbsp;0.3562<br />&nbsp;privins &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.229208 &nbsp; &nbsp; &nbsp; &nbsp;0.114004 &nbsp; &nbsp; &nbsp; 2.01 &nbsp; &nbsp; &nbsp;0.0444<br />&nbsp;Inf_Intercept &nbsp; &nbsp; &nbsp; &nbsp;4.265976 &nbsp; &nbsp; &nbsp; &nbsp;0.971218 &nbsp; &nbsp; &nbsp; 4.39 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp;Inf_exclhlth &nbsp; &nbsp; &nbsp; &nbsp;-0.369944 &nbsp; &nbsp; &nbsp; &nbsp;0.717395 &nbsp; &nbsp; &nbsp;-0.52 &nbsp; &nbsp; &nbsp;0.6061<br />&nbsp;Inf_poorhlth &nbsp; &nbsp; &nbsp; &nbsp;-0.589745 &nbsp; &nbsp; &nbsp; &nbsp;0.195174 &nbsp; &nbsp; &nbsp;-3.02 &nbsp; &nbsp; &nbsp;0.0025<br />&nbsp;Inf_numchron &nbsp; &nbsp; &nbsp; &nbsp;-0.280116 &nbsp; &nbsp; &nbsp; &nbsp;0.062396 &nbsp; &nbsp; &nbsp;-4.49 &nbsp; &nbsp; &nbsp;&lt;.0001<br />&nbsp;Inf_age &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -0.405962 &nbsp; &nbsp; &nbsp; &nbsp;0.119765 &nbsp; &nbsp; &nbsp;-3.39 &nbsp; &nbsp; &nbsp;0.0007<br />&nbsp;Inf_male &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.334773 &nbsp; &nbsp; &nbsp; &nbsp;0.162429 &nbsp; &nbsp; &nbsp;-2.06 &nbsp; &nbsp; &nbsp;0.0393<br />&nbsp;Inf_school &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.019390 &nbsp; &nbsp; &nbsp; &nbsp;0.022126 &nbsp; &nbsp; &nbsp;-0.88 &nbsp; &nbsp; &nbsp;0.3808<br />&nbsp;Inf_privins &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.224859 &nbsp; &nbsp; &nbsp; &nbsp;0.196133 &nbsp; &nbsp; &nbsp; 1.15 &nbsp; &nbsp; &nbsp;0.2516</code><br /><br /><br />觀測機率與預測機率比較圖:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://www.diigo.com/item/t/2043829_111682382_6438671" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="344" src="http://www.diigo.com/item/t/2043829_111682382_6438671" width="640" /></a></div><br />從上圖看來 ZIP 模型的表現也不錯!<br /><br /><b><span style="font-size: large;">。ZIP-tau 模型</span></b><br />這是上面 ZIP 模型的延伸。他的表現比 ZIP 模型好一點點,因為 AIC 比較小,但也沒有差太多。程式如下:<br /><code>/* METHOD 1: PROC NLMIXED &nbsp;*/<br />proc nlmixed data = tblNMES;<br />&nbsp; parms b0 = 0 b1 = 0 b2 = 0 b3 = 0 b4 = 0 b5 = 0 b6 = 0 b7 = 0 tau = 1;<br />&nbsp; eta0 = tau * (b0 + b1 * EXCLHLTH + b2 * POORHLTH + b3 * NUMCHRON + b4 * AGE + <br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; b5 * MALE + b6 * SCHOOL + b7 * PRIVINS);<br />&nbsp; exp_eta0 = exp(eta0);<br />&nbsp; p0 = exp_eta0 / (1 + exp_eta0);<br />&nbsp; etap = b0 + b1 * EXCLHLTH + b2 * POORHLTH + b3 * NUMCHRON + b4 * AGE + <br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;b5 * MALE + b6 * SCHOOL + b7 * PRIVINS;<br />&nbsp; exp_etap = exp(etap);<br />&nbsp; if HOSP = 0 then ll = log(p0 + (1 - p0) * exp(-exp_etap));<br />&nbsp; else ll = log(1 - p0) + HOSP * etap - exp_etap - lgamma(HOSP + 1);<br />&nbsp; model HOSP ~ general(ll);<br />&nbsp; predict exp_etap out = zip_out1 (keep = pred HOSP rename = (pred = Yhat));<br />&nbsp; predict p0 out = zip_out2 (keep = pred rename = (pred = p0));<br />run; </code><br /><br /><br />報表:<br /><code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Fit Statistics<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-2 Log Likelihood &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 5768.7<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AIC (smaller is better) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 5786.7<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AICC (smaller is better) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;5786.7<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;BIC (smaller is better) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 5844.2<br /><br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Parameter Estimates<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Standard<br />&nbsp;Parameter &nbsp; Estimate &nbsp; &nbsp; &nbsp;Error &nbsp; &nbsp; DF &nbsp; t Value &nbsp; Pr &gt; |t| &nbsp; &nbsp;Alpha<br />&nbsp;b0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -1.3944 &nbsp; &nbsp; 0.2698 &nbsp; 4406 &nbsp; &nbsp; -5.17 &nbsp; &nbsp; &lt;.0001 &nbsp; &nbsp; 0.05<br />&nbsp;b1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -0.2685 &nbsp; &nbsp;0.09606 &nbsp; 4406 &nbsp; &nbsp; -2.80 &nbsp; &nbsp; 0.0052 &nbsp; &nbsp; 0.05<br />&nbsp;b2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.3223 &nbsp; &nbsp;0.05980 &nbsp; 4406 &nbsp; &nbsp; &nbsp;5.39 &nbsp; &nbsp; &lt;.0001 &nbsp; &nbsp; 0.05<br />&nbsp;b3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.1391 &nbsp; &nbsp;0.02195 &nbsp; 4406 &nbsp; &nbsp; &nbsp;6.34 &nbsp; &nbsp; &lt;.0001 &nbsp; &nbsp; 0.05<br />&nbsp;b4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0.1040 &nbsp; &nbsp;0.02789 &nbsp; 4406 &nbsp; &nbsp; &nbsp;3.73 &nbsp; &nbsp; 0.0002 &nbsp; &nbsp; 0.05<br />&nbsp;b5 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.07254 &nbsp; &nbsp;0.03383 &nbsp; 4406 &nbsp; &nbsp; &nbsp;2.14 &nbsp; &nbsp; 0.0321 &nbsp; &nbsp; 0.05<br />&nbsp;b6 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-0.00039 &nbsp; 0.004641 &nbsp; 4406 &nbsp; &nbsp; -0.08 &nbsp; &nbsp; 0.9331 &nbsp; &nbsp; 0.05<br />&nbsp;b7 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.04292 &nbsp; &nbsp;0.04216 &nbsp; 4406 &nbsp; &nbsp; &nbsp;1.02 &nbsp; &nbsp; 0.3087 &nbsp; &nbsp; 0.05<br />&nbsp;tau &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;-1.8406 &nbsp; &nbsp; 0.4585 &nbsp; 4406 &nbsp; &nbsp; -4.01 &nbsp; &nbsp; &lt;.0001 &nbsp; &nbsp; 0.05</code><br /><br />觀測機率與預測機率比較圖:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://www.diigo.com/item/t/2043829_111682570_6438680" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="348" src="http://www.diigo.com/item/t/2043829_111682570_6438680" width="640" /></a></div><br />另外,作者提到一個名為 Vuong test 的檢定方法,可以提供數值化的證據來比較 ZIP 模型和其他模型的差異。最主要的用意是,我們已經可以從數個比較圖發現,無論是負二項模型、跨欄模型還是 ZIP 模型,表現的都比卜瓦松模型要好,但如果有個比較科學的證據來證明,而不是主觀用圖形來判定的話,會比較能夠信服。Vuong test的算法如下:<br /><br /><div style="text-align: center;"><a href="http://www.codecogs.com/eqnedit.php?latex=\large%20m_{i}=log(\frac{P_{1}(Y_{i}|X_{i})}{P_{2}(Y_{i}|X_{i})})" target="_blank"><img src="http://latex.codecogs.com/gif.latex?\large m_{i}=log(\frac{P_{1}(Y_{i}|X_{i})}{P_{2}(Y_{i}|X_{i})})" title="\large m_{i}=log(\frac{P_{1}(Y_{i}|X_{i})}{P_{2}(Y_{i}|X_{i})})" /></a></div><br />上式是把每個觀測值用兩個模型都算出預測值出來,然後相除取對數。之後再用這些m值去計算一個V統計量:<br /><br /><div style="text-align: center;"><a href="http://www.codecogs.com/eqnedit.php?latex=\large%20V=\frac{\sqrt{n}(\frac{1}{n}\sum_{i=1}^{n}m_{i})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(m_{i}-\bar{m})^{2}}}" target="_blank"><img src="http://latex.codecogs.com/gif.latex?\large V=\frac{\sqrt{n}(\frac{1}{n}\sum_{i=1}^{n}m_{i})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(m_{i}-\bar{m})^{2}}}" title="\large V=\frac{\sqrt{n}(\frac{1}{n}\sum_{i=1}^{n}m_{i})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(m_{i}-\bar{m})^{2}}}" /></a></div><br />當 V &gt; 1.96 時,就表示第一個模型比較好,當 V &lt; -1.96 時,就表示第二個模型比較好。若介於 -1.96 和 1.96 之間,就表示兩模型沒有顯著差異。以卜瓦松模型和ZIP模型比較為例,其SAS程式如下:<br /><code>data poi_pred (keep = poi_prob); <br />&nbsp; set poi_out; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /* OUTPUT FROM POISSON REGRESSION */<br />&nbsp; do i = 0 to 8;<br />&nbsp; &nbsp; poi_prob = pdf('poisson', i , Yhat);<br />&nbsp; &nbsp; if hosp = i then output;<br />&nbsp; end;<br />run;<br /><br />data zip_pred (keep = zip_prob);<br />&nbsp; merge zip_out1 zip_out2; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; /* OUTPUT FROM ZIP REGRESSION */<br />&nbsp; do i = 0 to 8;<br />&nbsp; &nbsp; if i = 0 then zip_prob = p0 + (1 - p0) * pdf('poisson', i, Yhat);<br />&nbsp; &nbsp; else zip_prob = (1 - p0) * pdf('poisson', i, Yhat);<br />&nbsp; &nbsp; if hosp = i then output;<br />&nbsp; end;<br />run;<br /><br />data compare;<br />&nbsp; merge poi_pred zip_pred;<br />&nbsp; m = log(zip_prob / poi_prob);<br />run;<br /><br />proc sql;<br />select<br />&nbsp; mean(m) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; as mbar,<br />&nbsp; std(m) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;as s,<br />&nbsp; sqrt(count(*)) * mean(m) / std(m) as v<br />from<br />&nbsp; compare;<br />quit;</code><br /><br />報表:<br /><code>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; mbar &nbsp; &nbsp; &nbsp; &nbsp; s &nbsp; &nbsp; &nbsp; &nbsp; v<br />&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.038138 &nbsp;0.375956 &nbsp;<span style="color: red;">6.733444 </span></code><br /><br />結果顯示 V 統計量為 6.73 大於 1.96,所以表示 ZIP 模型比卜瓦松模型要來的好。<br /><br /><b><span style="font-size: large;">。比較</span></b><br />把所有估計參數值並排在一起比較:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://www.diigo.com/item/t/2043829_111685699_6438913" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="268" src="http://www.diigo.com/item/t/2043829_111685699_6438913" width="640" /></a></div><br />從 AIC 這個最直接的證據來看,都可以顯示出直接用卜瓦松模型去估計參數是最不適合的。而負二項模型擁有最小的 AIC 值,所以最好。<br /><b><br /></b><br /><b>CONTACT INFORMATION&nbsp;</b><br />Wensui Liu, Statistical Project Manager<br />ChoicePoint Precision Marketing, Alpharetta, GA<br />Email: <!-- e --><a href="mailto:wensui.liu@choicepoint.com">wensui.liu@choicepoint.com</a><!-- e --><br /><br />Jimmy Cela, AVP<br />ChoicePoint Precision Marketing, Alpharetta, GA<br />Email: <!-- e --><a href="mailto:jimmy.cela@choicepoint.com">jimmy.cela@choicepoint.com</a><!-- e --><div class="blogger-post-footer"><img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6268919072942670865-7910254849827852366?l=sugiclub.blogspot.com' alt='' /></div>




欢迎光临 SAS中文论坛 (http://www.mysas.net/forum/) Powered by Discuz! X3.2