*Kelsey Drotning *2.20.20 *Stata do file *ATUS2018 to AHTUS Harmonization File version 15.1 capture log close clear all //2018 ATUS respondent file from BLS cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atusresp-2018" import delimited "atusresp_2018.dat" svyset, clear keep tucaseid tuyear tumonth tudiarydate tudiaryday tufinlwgt sum tuyear tumonth tudiarydate tudiaryday rename tucaseid pid rename tuyear year rename tumonth month rename tudiaryday diaryday tostring tudiarydate, gen(tudiarydatestr) //extract calendar day from date variable gen cday = substr(tudiarydatestr, 7, 2) tab cday destring cday, replace sort pid keep pid diaryday cday month year tufinlwgt cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" save datewght, replace clear all //2018 ATUS roster file from BLS cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atusrost-2018" import delimited "atusrost_2018.dat" drop if terrp > 19 rename tucaseid pid rename teage age rename tesex sex sum sex desc age sort pid keep pid sex age cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" save dem, replace clear all //2018 ATUS roster file from BLS, create flags for children cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atusrost-2018" import delimited "atusrost_2018.dat" *Check for missing age or relationship tab teage, miss tab terrp, miss mdesc gen u5=0 replace u5=1 if teage >-1 & teage<5 gen u18=0 replace u18=1 if teage >-1 & teage<18 sum u5 u18 tab u5 u18 rename tucaseid pid sort pid tulineno keep pid tulineno teage u5 u18 cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" save kidflag, replace clear all //Make who else present variables *First bring in 2018 Who file from BLS **# cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atuswho-2018" import delimited "atuswho_2018.dat" 
tab tuwho_code, missing rename tucaseid pid rename tuactivity_n epnum sort pid epnum tulineno *merge with kidflag file cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" merge m:1 pid tulineno using kidflag *note, this creates 3,669 lines that have several missing values. *They are household members who did not participate in an activity *with the respondnet on the diary day. gen alone=0 gen infant=0 //I think this is a poor label. Refers to children under 5. gen child=0 gen sppart=0 gen clsfam=0 gen hhadult=0 gen cowork=0 gen wellknw=0 gen otherp=0 gen unknwp=0 sum alone infant child sppart clsfam hhadult cowork wellknw otherp unknwp //create code for alone replace alone=1 if tuwho_code==18 | tuwho_code==19 tab tuwho_code alone tab3way tuwho_code u5 u18 preserve tempfile alone keep if tuwho_code < 0 sum teage u5 u18 restore *This checks for spouses or respondents who are under 18 so that they are not coded as child. **Probably something to discuss. Shouldn't they still be coded a child if they are the respondent? replace infant=1 if tuwho_code>21 & u5==1 replace child=1 if tuwho_code>21 & u18==1 sum infant child tab child infant tab tuwho_code child tab u18 child tab u5 child *verify that all cases aged <18 without a child code are either the respondent or a spouse preserve tempfile under18 keep if child==0 & u18==1 tab tuwho_code restore //create spouse or unmarried partner present dummy replace sppart=1 if tuwho_code==20 | tuwho_code==21 tab sppart tab tuwho_code sppart //create close family code. See note from CTUR in 2012 harmonization syntax on why we cannot *distinguish between hh close family and non hh close family. replace clsfam=1 if (tuwho_code > 19 & tuwho_code < 26) | tuwho_code==27 tab tuwho_code clsfam //create hh adult. This is not a count of household adults and does not include close family. replace hhadult=1 if tuwho_code==26 | (tuwho_code > 27 & tuwho_code < 40) tab tuwho_code hhadult //create cowork. 
Note: this includes customers. replace cowork=1 if tuwho_code > 58 tab tuwho_code cowork //create wellknown. Note: this includes acquaintances. replace wellknw=1 if (tuwho_code > 19 & tuwho_code < 57) tab tuwho_code wellknw //create other people code replace otherp=1 if tuwho_code==57 | tuwho_code==58 tab tuwho_code otherp //create unknown, excludes activities where with whom was not asked replace unknwp=1 if trwhona==0 & tuwho_code < 0 tab tuwho_code unknwp sum tuwho_code hhadult cowork wellknw otherp unknwp mdesc sum alone infant child sppart clsfam hhadult cowork wellknw otherp unknwp sum pid epnum mdesc //collapse data across observations collapse (max) alone infant child sppart clsfam hhadult cowork wellknw otherp unknwp, by (pid epnum) sum alone infant child sppart clsfam hhadult cowork wellknw otherp unknwp sum pid epnum mdesc cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" save addwho, replace clear all //2018 ATUS Activity file from BLS -- set up start, stop, and duration cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atusact-2018" import delimited "atusact_2018.dat" keep tucaseid tuactivity_n tustarttim tustoptime sort tucaseid tuactivity_n //This part is a lot different than the SPSS code, so check carefully. See command line 210 in CTUR file. rename tucaseid pid rename tuactivity_n epnum gen sthr = substr(tustarttim, 1, 2) gen stmin = substr(tustarttim, 4, 2) gen stsec = substr(tustarttim, 7, 2) gen finhr = substr(tustoptime, 1, 2) gen finmin = substr(tustoptime, 4, 2) gen finsec = substr(tustoptime, 7, 2) destring sthr stmin finhr finmin, replace sort pid epnum gen clockst = (sthr*100) + stmin tab clockst //edit clock to match previous diaries. See CTUR syntax. gen cst=. replace cst=clockst-400 if sthr > 3 replace cst=clockst+2000 if sthr < 4 gen cend=. 
replace cend = (finhr*100 + finmin) - 400 if finhr > 3 replace cend = (finhr*100 + finmin) + 2000 if finhr < 4 //confirm that there were 9,592 diaries recorded. It worked! egen maxep = max(epnum), by(pid) gen lastep=0 replace lastep=1 if epnum==maxep tab lastep replace cend=2400 if lastep==1 sum cend cst sort pid epnum //I think this is computing duration gen start = trunc(cst/100)*60 + (cst - (trunc(cst/100))*100) gen end = trunc(cend/100)*60 + (cend - (trunc(cend/100))*100) gen time = end - start sum start end time cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" keep pid epnum clockst start end time maxep lastep save sstime, replace //TEST preserve tempfile test keep if lastep==1 tab end //all cases should show 1440 restore preserve tempfile test2 keep if epnum==1 tab start //all cases should show 0 restore clear all //2018 ATUS Activity file from BLS -- process the activity file cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atusact-2018" import delimited "atusact_2018.dat" mdesc //no missing values, install mdesc package if not already installed. rename tucaseid pid rename tuactivity_n epnum //generate episode details variables gen main=-5 gen sec=0 gen inout=-5 gen eloc=-5 gen mtrav=-5 gen survey=7 gen wave=15 //this doesn't appear to be in IPUMS except for 1975? cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" merge m:1 pid epnum using sstime, gen(_sstime) mdesc merge m:1 pid using datewght, gen(_datewght) mdesc merge m:1 pid epnum using addwho, gen(_addwho) mdesc //diagnosing problem, when the addwho data are added, creates 2,386 missing cases. //problem is because in the who file, there are people who are in the household, but were not present during an activity. 
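*Sketch, not part of the CTUR syntax: one way to confirm the diagnosis above.
*Assumes _addwho follows Stata's standard merge coding
*(1 = master only, 2 = using only, 3 = matched).
*Rows that come only from addwho should carry no activity code.
tab _addwho
count if _addwho==2
count if _addwho==2 & !missing(trcode)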
/*preserve tempfile miss keep if _addwho==2 tab epnum tab alone*/ save merge_1, replace **# ******************************************************************************** cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" use merge_1 keep if epnum==. save xtrahhmembers, replace clear all cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" use merge_1 tab epnum, miss drop if epnum==. //There are 2,386 lines where everything is missing. //this is because there are household members who were not present during an episode. //not sure what to do with them yet. But I think they are messing up the edits of locations. //Putting them in seperate file in case they are needed later. tab trcode tewhere //note, in CTUR file this is actcode. Don't see where the recode trcode to actcode. preserve tempfile tewhere keep if tewhere==-2 | tewhere==-1 | tewhere==89 //don't know, blank, unspecified tab trcode //all activity codes that list location as -2, -1, or 89 should be kept next, n=35,809 restore preserve tempfile act keep if trcode==10101 | trcode==10102 | trcode==10199 | trcode==10201 | trcode==10299 | /// trcode==10401 | trcode==10499 | trcode==30111 | trcode==30112 | trcode==30503 | /// trcode==30504 | trcode==40111 | trcode==40112 | trcode==40507 | trcode==40508 | /// trcode==120301 | trcode==130131 | trcode==140102 | trcode==500101 | trcode==500105 | /// trcode==500106 tab trcode //n=43,333 restore *From CTUR: all instances of sleep (10101); sleeplessnes (10102); sleeping (nec) (10199) *washing & dressing (10201); grooming nec (10299); personal care (10401); *personal care nec (10499), 500106 (do not remember), and 500105 *(none of your business) have no recorded location. Recode the *location of these activities (except for the do not remember) *as the location where they are before starting the activity (unless *this is travelling) or as the next location unless the next *location is travelling or the end of the diary. 
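*Sketch, not part of the CTUR syntax: the "location before" and "location after"
*described above can also be read with by-group subscripts instead of tsset.
*sk_prevloc and sk_nextloc are hypothetical scratch names and are dropped again.
*First and last episodes get missing (.) here rather than -5.
bysort pid (epnum): gen sk_prevloc = tewhere[_n-1]
bysort pid (epnum): gen sk_nextloc = tewhere[_n+1]
list pid epnum tewhere sk_prevloc sk_nextloc in 1/10, sepby(pid)
drop sk_prevloc sk_nextloc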
egen newid = group(pid) *Following code is targeting each diaries last episode //there are 9,593 diaries. In the CTUR code, this sets the last episodes are coded -5 for location and activity gen negepnum=-epnum //this is to sort epnum in descending order instead of ascending tsset pid negepnum gen trcode2=trcode //just to be able to clearly see the lag shift gen nextact=-5 gen tewhere2=tewhere gen nextloc=-5 replace nextact=l.trcode if pid==l.pid replace nextloc=l.tewhere if pid==l.pid tab nextact nextloc preserve tempfile nextact keep if nextact==-5 tab lastep restore sort pid epnum gen next=-5 replace next=5 if lastep==1 replace next=1 if nextloc==-1 | nextloc==89 replace next=3 if nextact>500000 & ((nextloc>0 & nextloc<12) | nextloc==30 | nextloc==31 | nextloc==32) replace next=4 if nextact<180000 & ((nextloc>0 & nextloc<12) | nextloc==30 | nextloc==31 | nextloc==32) replace next=2 if ((nextloc>11 & nextloc<30) | nextloc==99 | (nextact>17999 & nextact < 500101) | nextact==500103) #delimit ; label def next 1 "next still misisng" 2 "travel next" 3 "next act missing but next location known" 4 "next act known and not travel and location known" 5 "end of diary and missing location" ; #delimit cr label val next next tab next //2,386 lines still say -5. These are the hh members who were not included in a diary activity. *Following code is targeting each diaries first episode. Each episode 1 should be prevact=-5 gen trcode3=trcode gen prevact=-5 gen tewhere3=tewhere gen prevloc=-5 tsset, clear sort pid epnum tsset pid epnum replace prevact=l.trcode if pid==l.pid replace prevloc=l.tewhere if pid==l.pid tab prevact prevloc //format trcode(f4.0) is supposed to limit to four decimal places but this doesn't really apply to activity codes? 
preserve
tempfile epnum
keep if epnum==1
tab prevact
restore

gen prev=-5
replace prev=5 if epnum==1
replace prev=1 if prevloc==-1 | prevloc==89
replace prev=3 if prevact>500000 & ((prevloc>0 & prevloc<12) | prevloc==30 | prevloc==31 | prevloc==32)
replace prev=4 if prevact<180000 & ((prevloc>0 & prevloc<12) | prevloc==30 | prevloc==31 | prevloc==32)
replace prev=2 if ((prevloc>11 & prevloc<30) | prevloc==99 | (prevact>17999 & prevact < 500101) | prevact==500103)

#delimit ;
label def prev 1 "previous still missing"
2 "previous travel"
3 "previous act missing but previous location known"
4 "previous act known and not travel and location known"
5 "start of diary and missing location" ;
#delimit cr
label val prev prev
tab prev //2,387 lines say -5.

preserve
tempfile laststep
keep if lastep==1 | epnum==1
tab tewhere
restore
*From CTUR: most of the first and last episodes have missing location.
*Seems to still be true for 2018

*****Now, going to use transformations to create complete location coding*******
*From CTUR (ln 446): Where will be used to look at the transformations to ensure the adjustments
*work as expected. Before I start:
* 35,262 (19.15% of cases) = -1
* 4 cases = -2
* 543 (.29% of cases) = 89
* 0 cases = 99
gen where=tewhere
gen where2=tewhere
tab where

*location is missing, activity is sleeping/napping. impute location as previous/next location.
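*Sketch, not part of the CTUR syntax: inlist() is a shorter way to write the
*list of activity codes used in the imputation conditions that follow.
*The count is only a preview and changes nothing in the data.
count if inlist(trcode, 10101, 10102, 10201, 10299, 10401, 10499, 500106) & inlist(prev, 3, 4)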
*CTUR line 452
tab where if pid==l.pid & (prev==3 | prev==4) & (trcode==10101 | trcode==10102 | ///
    trcode==10201 | trcode==10299 | trcode==10401 | trcode==10499 | trcode==500106) //552 cases
replace where=prevloc if pid==l.pid & (prev==3 | prev==4) & (trcode==10101 | trcode==10102 | ///
    trcode==10201 | trcode==10299 | trcode==10401 | trcode==10499 | trcode==500106)
tab where if pid==l.pid & (next==3 | next==4) & (trcode==10101 | trcode==10102 | ///
    trcode==10201 | trcode==10299 | trcode==10401 | trcode==10499 | trcode==500106) //321 cases
replace where=nextloc if pid==l.pid & (next==3 | next==4) & (trcode==10101 | trcode==10102 | ///
    trcode==10201 | trcode==10299 | trcode==10401 | trcode==10499 | trcode==500106)
tab where
tab tewhere where
*34,411 (18.69% of cases)=-1, 543 (0.3% of cases)=89

gen eptest=3
replace eptest=1 if epnum==1
replace eptest=2 if epnum==2
replace eptest=4 if epnum==maxep - 1
replace eptest=5 if lastep==1
replace eptest=. if epnum==.
label def eptest 1 "first episode" 2 "second episode" 3 "middle episode" 4 "next from last episode" 5 "last episode"
label val eptest eptest
tab eptest

preserve
tempfile where
keep if where==-1 | where==89
tab eptest
restore
//of those locations which remain missing, this is a lot different than 2012.
//25.58% are the first episode
//8.70% are second episode
//33.09% are a middle episode
//6.93% are the penultimate episode
//25.70% are the last episode.
/*format lastep to eptest (f4.0) [spss]*/ //not sure why this is here

gen prev2=0
replace prev2=l.where if where==-1
replace prev2=-5 if where==-1 & _n==1
tsset, clear
tsset pid negepnum
sort pid negepnum
gen next2=0
replace next2=l.where if where==-1
replace next2=-5 if where==-1 & _n==1
tab next2
tsset, clear
tsset pid epnum
sort pid epnum //pay close attention to ascending vs. descending

*From CTUR: second filling-in iteration - where there have been two slots in
*a row with the location missing, fill in the location as the
*previous or next location so long as the previous or next location
*is not travelling.
tab where if pid==l.pid & (where==-1 | where==89) & ///
    ((prev2>0 & prev2<12) | prev2==30 | prev2==31 | prev2==32) //16,925 cases
replace where=prev2 if pid==l.pid & (where==-1 | where==89) & ///
    ((prev2>0 & prev2<12) | prev2==30 | prev2==31 | prev2==32)
tab where if (pid==l.pid | epnum==1) & (where==-1 | where==89) & ///
    ((next2>0 & next2<12) | next2==30 | next2==31 | next2==32)
replace where=next2 if (pid==l.pid | epnum==1) & (where==-1 | where==89) & ///
    ((next2>0 & next2<12) | next2==30 | next2==31 | next2==32)
tab where where2
tab where //now 8,375 cases (4.55%) = -1; 543 (0.3%) = 89

gen prev3=0
replace prev3=l.where if where==-1
replace prev3=-5 if where==-1 & _n==1
tab prev3
tsset, clear
tsset pid negepnum
sort pid negepnum
gen next3=0
replace next3=l.where if where==-1
replace next3=-5 if where==-1 & _n==1
tab next3
tsset, clear
tsset pid epnum
sort pid epnum
replace where=prev3 if pid==l.pid & (where==-1 | where==89) & ///
    ((prev3>0 & prev3<12) | prev3==30 | prev3==31 | prev3==32)
replace where=next3 if (pid==l.pid | epnum==1) & (where==-1 | where==89) & ///
    ((next3>0 & next3<12) | next3==30 | next3==31 | next3==32)
tab where where2
tab where //4,121 cases (2.24%) = -1; 543 (0.3%) = 89

preserve
tempfile eptest
keep if where==-1 | where==89
tab eptest
restore //31.84% first ep, 20.37% second ep, 23.16% middle ep, 6.58% pen ep, 18.05% last ep

gen test=0
replace test=1 if lastep==1 & where==-1
tab test

preserve
tempfile test
keep if epnum==2 & where==-1
tab3way prevact nextact nextloc //not sure why this is helpful
restore

gen mark=0 //CTUR Syntax, line 579
//the upper bound is on nextact (the activity code), not on the next category variable
replace mark=1 if epnum==2 & trcode==10201 & prevact <10405 & (nextact>130103 & nextact<190000)
label def mark 1 "marker of sleep then wash then cycle/walk or travel"
label val mark mark
gen mark2=0
replace mark2=1 if lastep==1 & (prevact>180000 & prevact<500000) & where==-1 & trcode<10500
tab mark mark2

gen locs=0
gen locf=0
replace locs=where if epnum==1
replace locf=where if lastep==1

**#
cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization"
save temploc, replace //just in case the egen feature doesn't work the same way as the collapse in CTUR syntax
use temploc

egen pattern1 = max(mark), by(pid)
egen pattern2 = max(mark2), by(pid)
egen locstart = max(locs), by(pid)
egen locend = max(locf), by(pid)
tab3way pattern1 locend pattern2
tab pattern2 locstart

replace where=1 if pattern2==1 & lastep==1 & (where==-1 | where==89) & locstart==1
replace where=1 if pattern1==1 & epnum==1 & (where==-1 | where==89) & locend==1
replace where=1 if pattern1==1 & epnum==2 & (where==-1 | where==89) & locend==1
tab where //now 2460 (1.34%) cases have location = -1 and 543 are not remembered.

tsset, clear
tsset pid negepnum
sort pid negepnum
capture drop nextloc
gen nextloc=-5
replace nextloc=l.where if pid==l.pid
tsset, clear
tsset pid epnum
sort pid epnum

cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization"
save tempfile1, replace

clear all
**#
cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization"
use tempfile1
tab where

capture drop next
gen next=-5
replace next=5 if lastep==1
replace next=1 if nextloc==-1 | nextloc==89
replace next=3 if nextact>500000 & ((nextloc>0 & nextloc<12) | nextloc==30 | nextloc==31 | nextloc==32)
replace next=4 if nextact<180000 & ((nextloc>0 & nextloc<12) | nextloc==30 | nextloc==31 | nextloc==32)
replace next=2 if ((nextloc>11 & nextloc<30) | nextloc==99 | (nextact>17999 & nextact < 500101) | nextact==500103)
/*
#delimit ;
label def next 1 "next still missing"
2 "travel next"
3 "next act missing but next location known"
4 "next act known and not travel and location known"
5 "end of diary and missing location" ;
#delimit cr
*/
label val next next
tab next
tab where

capture drop prevloc
gen prevloc=-5
replace prevloc=l.tewhere if pid==l.pid

capture drop prev
gen prev=-5
replace prev=5 if epnum==1
replace prev=1 if prevloc==-1 | prevloc==89
replace prev=3 if prevact>500000 & ((prevloc>0 & prevloc<12) | prevloc==30 | prevloc==31 | prevloc==32)
replace prev=4 if prevact<180000 & ((prevloc>0 & prevloc<12) | prevloc==30 | prevloc==31 | prevloc==32)
replace prev=2 if ((prevloc>11 & prevloc<30) | prevloc==99 | (prevact>17999 & prevact < 500101) | prevact==500103)
/*
#delimit ;
label def prev 1 "previous still missing"
2 "previous travel"
3 "previous act missing but previous location known"
4 "previous act known and not travel and location known"
5 "start of diary and missing location" ;
*/
label val prev prev
tab prev
tab where

replace where=prevloc if pid==l.pid & (trcode==10101 | trcode==10102 | ///
    trcode==10201 | trcode==10299 | trcode==10401 | trcode==10499 | trcode==500106) ///
    & (prev==3 | prev==4)
//when the next episode is the usable one, fill from nextloc
replace where=nextloc if pid==l.pid & (trcode==10101 | trcode==10102 | ///
    trcode==10201 | trcode==10299 | trcode==10401 | trcode==10499 | trcode==500106) ///
    & (next==3 | next==4)
tab where //this did not change the number of = -1 cases

//prev2 is read in ascending episode order, next2 in descending order
capture drop prev2
gen prev2=0
tsset, clear
tsset pid epnum
sort pid epnum
replace prev2=l.where if where==-1
replace prev2=-5 if where==-1 & _n==1
capture drop next2
gen next2=0
tsset, clear
tsset pid negepnum
sort pid negepnum
replace next2=l.where if where==-1
replace next2=-5 if where==-1 & _n==1
tsset, clear
tsset pid epnum
sort pid epnum

replace where=prev2 if pid==l.pid & (where==-1 | where==89) & ///
    ((prev2>0 & prev2<12) | prev2==30 | prev2==31 | prev2==32)
replace where=next2 if (pid==l.pid | epnum==1) & (where==-1 | ///
    where==89) & ((next2>0 & next2<12) | next2==30 | next2==31 | next2==32)
tab where //2,057 cases==-1 (1.12 percent)

preserve
tempfile eptest1
keep if where==89
tab eptest
restore //4 second episode diaries are don't remember, the rest (539) are middle

preserve
tempfile eptest2
keep if where==-1
tab eptest
restore

preserve
tempfile look
sum prevact trcode nextact prevloc nextloc
restore

********seems like this section may be specific to patterns in 2012*************
*going to replicate for now, but may need to be adjusted later
//the share of cases with location not reported varies by episode position; it is lowest for second and penultimate episodes
//From CTUR:
*one common pattern here is travel for child care or adult help (own household &
*other household), followed by drop-off or pick up adult/child, followed
*by adult help or child care travel. These cases are not at home or at the
*workplace, unlikely to be school as school runs are more likely to be
*remembered. The AHTUS location codes are less detailed than the ATUS
*codes, and the location is going to be away from home and in transit,
*so these locations set to other.
replace where=11 if where==89 & (prevact>180000 & prevact<190000) & ///
    (trcode==30111 | trcode==30112 | trcode==30503 | trcode==30504 | trcode==40111 ///
    | trcode==40112 | trcode==40507 | trcode==40508 ) & (nextact>180000 & nextact<190000)

preserve
tempfile walking
keep if where==89 & trcode==130131
tab prevloc nextloc
restore
*From CTUR:
*nearly half of these cases are walks returning to the same place (more home than
*elsewhere), or transition walks between home or other home and elsewhere. Code such
*cases as outdoors away from home. Other cases in between forms of transport - and
*walking could be inside, but are not at home or specified place, so code these as
*other place.
replace where=9 if where==89 & trcode==130131 & ((prevloc> -1 & prevloc<12) | ///
    (nextloc> -1 & nextloc<12))
replace where=11 if where==89 & trcode==130131
tab where
*74 cases now == 89 and 2,057==-1

*From CTUR:
*these remaining 61 cases all involve waiting for or dropping off an adult
*or child, so set these to other location on same principle as the previous
*transformation.
replace where=11 if where==89 & (trcode==30111 | trcode==30112 | trcode==30504 ///
    | trcode==40112 | trcode==40507)

preserve
tempfile waiting
tab prevloc nextloc
restore

capture drop test
gen test=0
replace test=1 if epnum==1 & where==-1
replace test=l.test+1 if where==-1 & pid==l.pid
tab test
tab where

preserve
tempfile four
keep if test>3
tab newid
restore
//11 cases with missing data in 4 or more strings. 328, 1050, 1538, 1809, 2291, 3147, 3861, 5616, 5801, 6253, 8442
//CTUR practice was to visually examine these diaries and fill them in based on what you see. Going to come back to this.

save missingstrings, replace //extra saved file to be able to return to this point
**#
use missingstrings

replace where=prevloc if where==-1 & (prevloc>-1 & prevloc<12) & ///
    (nextloc>-1 & nextloc <12) & prevloc==nextloc //this doesn't change anything

preserve
tempfile locend
keep if where==-1 & epnum<4
tab locstart locend
restore
//I think this is another place we may want to make edits. All locations for first three episodes coded at home if diary ended at home.
replace where=1 if where==-1 & epnum<4 & locend==1
tab where //Now N=1,541 (0.84%) for -1 and N=26 for 89

preserve
tempfile negone
keep if where==-1
sum prevact nextact trcode
restore

capture drop test
gen test=0
replace test=1 if (trcode==10101 | trcode==10102) & where==-1

preserve
keep if test==1
tab epnum lastep
restore
//673 (252+421) out of 802 cases where sleep remains in missing location are the
//first and last episodes of the day. From CTUR: Visually look at cases.
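*Sketch, not part of the CTUR syntax: one way to look at diaries by eye.
*Lists every episode of the 11 diaries noted earlier as having four or more
*consecutive missing locations (newid values taken from that note).
list newid epnum trcode where prevloc nextloc time ///
    if inlist(newid, 328, 1050, 1538, 1809, 2291, 3147, 3861, 5616, 5801, 6253, 8442), ///
    sepby(newid) noobs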
capture drop mark mark2 locs locf
gen mark=0
gen mark2=0
gen locs=0
gen locf=0
replace locs=where if epnum==1
replace locf=where if lastep==1
replace mark=1 if where==1
replace mark2=3 if where==3 //where = 3 is someone else's home
tab mark2

//substitution for the Aggregate command, line 844 in CTUR code
egen anyhome = max(mark), by(pid)
egen anyohome = max(mark2), by(pid)
egen locstart2 = max(locs), by(pid)
egen locend2 = max(locf), by(pid)

preserve
tempfile test
tab locstart2 locend2
restore

*From CTUR: where the person starts or ends the day at home and
*the location where they are asleep is unknown at the
*other end of the diary day, code the sleep to being
*at home. This is probably something we want to modify in the future.
replace where=1 if test==1 & epnum==1 & locend2==1
replace where=1 if test==1 & lastep==1 & locstart2==1
tab where //1,286 cases for where = -1

capture drop test
gen test=0
replace test=1 if (trcode==10101 | trcode==10102) & where==-1
replace where=3 if test==1 & epnum==1 & locend2==3
replace where=3 if test==1 & lastep==1 & locstart2==3
tab where //1,237 cases for where = -1

capture drop test
gen test=0
replace test=1 if (trcode==10101 | trcode==10102) & where==-1

preserve
tempfile home
keep if test==1
tab anyhome anyohome //checking understanding - if case has both their own home and another home, they spent time in both places
restore

preserve
tempfile test
keep if test==1
sum time
restore

capture drop test2
gen test2=0
replace test2=1 if time<61 & test==1 //coding naps as sleep of one hour or less with location unknown
tab test2

*From CTUR (ln 887): if the person spends any time at home, assume these sleep and nap
*episodes are at home. If no time spent at home but time spent at
*someone else's home, then the sleeping location set to other home.
replace where=1 if anyhome==1 & test==1 replace where=3 if anyhome==0 & anyohome==3 & test==1 tab where //865 cases have where==-1 capture drop test gen test=0 replace test=1 if (trcode==10101 | trcode==10102) & where==-1 tab test //now only 126 cases of sleep with unidentified location. Code these as "other" location. gen atoth=0 replace atoth=1 if where==9 | where==11 egen anyother = max(atoth), by(pid) egen nolocslp = max(test), by(pid) preserve keep if test==1 tab anyother nolocslp restore replace where=11 if where==-1 & anyother==1 tab where //375 cases have where==-1 preserve keep if where==-1 tab prevact tab nextact tab trcode restore //41 cases of sleep with missing location preserve keep if where==-1 tab time restore preserve keep if test==1 tab pid restore *save data before filtering out other cases. //doesn't say whaat cases they are filtering out though so not sure what to do here. //CTUR code line 939 /*select if pid== list each case*/ egen aggtest = max(test), by (pid) sort aggtest pid epnum *sort cases by aggtest pid epnum. *most of the remainder between travel gaps, remainder left missing. cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" save tempfile2, replace *now that the location information filled in wherever possible, *make AHTUS variables. ***************** *AHTUS variables* *****************. 
cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" use tempfile2 sum newid //verify number of cases tab inout replace inout=1 if where!=18 & where!=19 & (where==9 | where==17 | trcode==20401 | /// trcode==20402 | trcode==20499 | trcode==20501 | trcode==20502 | trcode==20599 | /// trcode==70102 | trcode==40502 | trcode==40105 | trcode==130102 | trcode==130104 | /// trcode==130106 | trcode==130108 | trcode==130110 | trcode==130112 | trcode==130113 | /// trcode==130114 | trcode==130116 | trcode==130118 | trcode==130121 | trcode==130123 | /// trcode==130124 | trcode==130126 | trcode==130127 | trcode==130129 | (where==14 & /// (trcode==30103 | trcode==30105 | trcode==30112 | trcode==30199 | trcode==40109 | /// trcode==40401 | trcode==40199 | trcode==500103 | trcode==500104 | (trcode>180100 & trcode<500101)))) tab inout replace inout=2 if inout==-5 & (where<12 | (where>29 & where<89)) tab inout replace inout=3 if inout==-5 & (where==12 | where==13 | where==15 | where==16 | where==18 | /// where==19 | where==20 | where==21 | where==99 | (trcode==30103 | trcode==40507 | /// trcode==110203 | trcode==150202 | (trcode>180100 & trcode<500101) | trcode==30112 | /// trcode==30503 | trcode==40112)) tab inout replace inout=2 if inout==-5 & (trcode==20101 | trcode==20301 | trcode==20399 | trcode==20104 | /// trcode==150302 | trcode<20000 | trcode==130219 | trcode==130215 | trcode==130207 | /// trcode==130205 | trcode==130119 | trcode==130115 | trcode==130107 | trcode==130105) tab inout replace inout=1 if inout==-5 & where==14 tab inout replace inout=-8 if inout==-5 tab inout preserve keep if inout<0 tab trcode restore replace inout=2 if inout<0 tab inout preserve keep if inout==3 & tewhere==14 tab newid //*If true this gives warning. If not, examine the cases where *the respondent was recorded as walking in a vehicle. 
newid == 21, newid==8040 restore preserve keep if inout==1 //outside tab trcode tewhere restore preserve keep if inout==2 //inside tab trcode tewhere restore preserve keep if inout==3 //in vehicle tab trcode tewhere restore ****** *eloc* ****** tab eloc replace eloc=1 if where==1 //home replace eloc=2 if where==3 //someone else's home replace eloc=3 if where==2 //work replace eloc=4 if where==8 //school replace eloc=5 if where==6 | where==7 | where==10 | where==30 | where==31 | where==32 //errand type locations replace eloc=6 if where==4 //restaurant replace eloc=7 if where==5 //worship replace eloc=8 if (where>11 & where<22) | (trcode>179999 & trcode<189999) //transportation replace eloc=9 if where==9 | where==11 //outdoors or other replace eloc=-8 if eloc==-5 tab eloc tab where eloc tab eloc inout preserve keep if eloc==-8 & inout==2 tab trcode where restore replace inout=1 if eloc==8 & inout==2 & where==14 replace inout=3 if eloc==8 & inout==2 & where==11 //1 change made preserve keep if eloc==-8 tab trcode restore ******* *mtrav* ******* tab mtrav replace mtrav=1 if where==12 | where==13 | where==19 //in a car or taxi replace mtrav=2 if where==15 | where==16 | where==18 | where==20 //in another vehicle replace mtrav=3 if where==14 | trcode==130131 //walking replace mtrav=4 if where==17 | trcode==130104 //bicycle replace mtrav=5 if mtrav==-5 & (where==21 | inout==3 | eloc==8) //other mode of transportation replace mtrav=-7 if mtrav==-5 tab mtrav tab eloc mtrav tab tewhere mtrav tab mtrav inout *************************** *mtrav and inout checklist* ***************************. *From CTUR: * (1) most cases of mtrav=3(walk) should be inout=1(outside). * (2) mtrav=3(walking) should never be inout=3 (in vehicle). * (3) all mtrav=1(in car) should be inout=3(in vehicle). * (4) vast majority of cycling must be inout=1(outside). * (5) mtrav=5(travel by unknown means) should be inout=3(in a vehicle, almost all of it will be *in a vehicle). 
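*Sketch, not part of the CTUR syntax: counts for the checklist above.
*Nonzero counts point to cases worth inspecting; rules 1, 4 and 5 are
*"mostly" rules, so some exceptions are expected there.
count if mtrav==3 & inout!=1 //rule 1: walking not coded outside
count if mtrav==3 & inout==3 //rule 2: walking coded in a vehicle
count if mtrav==1 & inout!=3 //rule 3: car or taxi not coded in a vehicle
count if mtrav==4 & inout!=1 //rule 4: cycling not coded outside
count if mtrav==5 & inout!=3 //rule 5: other mode not coded in a vehicle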
preserve
keep if mtrav==1 & inout==1
tab trcode //three cases, one is lawn/garden, one is hunting, one is vehicle touring -- I don't agree with these corrections but making them re: CTUR
restore
//the correction runs after restore so that it is kept in the working data
replace inout=3 if mtrav==1 & inout==1

preserve
keep if mtrav==3 & inout==3
tab trcode where
restore //two cases, one is dropping off/picking up a child and one is dropping off/picking up a nhh adult

tab mtrav inout
replace inout=3 if mtrav==5 & inout==1 // following the code, but not sure why we did this

preserve
keep if mtrav==-7 & eloc==9 & (trcode>179999 & trcode<=189999)
tab trcode tewhere
restore

preserve
keep if eloc==-8
tab3way inout mtrav trcode
restore
replace eloc=8 if eloc==-8 & inout==3

*******
*child*
*******
tab trcode child
//this code flags people as having a child if they did an activity that is coded as helping a hh child
//not sure about this because it's weird the child isn't included on the roster
tab child
replace child=1 if child==0 & (trcode==30101 | trcode==30102 | trcode==30103 | ///
    trcode==30104 | trcode==30105 | trcode==30106 | trcode==30107 | trcode==30109 | ///
    trcode==30112 | trcode==30199 | trcode==30201 | trcode==30203 | trcode==30299 | ///
    trcode==30301 | trcode==30302 | trcode==30399 | trcode==40101 | trcode==40102 | ///
    trcode==40103 | trcode==40104 | trcode==40105 | trcode==40106 | trcode==40107 | ///
    trcode==40109 | trcode==40112 | trcode==40199 | trcode==40201 | trcode==40203 | ///
    trcode==40299 | trcode==40301 | trcode==40302 | trcode==40399)
tab child

********
*animal*
********
gen animal=0
replace animal=1 if trcode==20601 | trcode==20699 | trcode==80701 | trcode==80702 | ///
    trcode==90301 | trcode==130110 | trcode==130121
tab animal

preserve
keep if animal==1
tab trcode
restore

*********
*shoprof*
*********
//not really sure what shoprof is -- seems like it's about purchasing services from other people?
//I think the 2012 CTUR code has an error, they had 10103 when they should have 100103 gen shoprof=0 replace shoprof=1 if trcode==30202 | trcode==80201 | trcode==80202 | trcode==80301 | /// trcode==80401 | trcode==80402 | trcode==80501 | trcode==80601 | trcode==80701 | /// trcode==80801 | trcode==80899 | trcode==90101 | trcode==90103 | trcode==90201 | /// trcode==90301 | trcode==90401 | trcode==90501 | trcode==100101 | trcode==100102 | /// trcode==100103 | trcode==100401 | trcode==100499 | (eloc==6 & (trcode==110101 | /// trcode==110199)) | trcode==110202 | trcode==120401 | trcode==120402 | trcode==120403 | /// trcode==120404 | trcode==120405 | trcode==130401 | trcode==130402 | trcode==130499 | /// (eloc>2 & (trcode==70101 | trcode==70103 | trcode==70104 | trcode==70105 | trcode==90102)) tab shoprof drop test test2 save tempfile3, replace ****** *main* ****** **# cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" use tempfile3 tab main //where are 2 and 4? replace main=1 if trcode==500105 | trcode==10401 | trcode==10499 | trcode==10599 | trcode==19999 tab main replace main=3 if trcode==10101 | trcode==10102 tab main replace main=5 if trcode==10199 tab main //only one person in 5 replace main=6 if trcode==10201 | trcode==10299 tab main replace main=7 if trcode==10301 | trcode==10399 | trcode==10501 tab main replace main=8 if trcode==50202 tab main replace main=9 if trcode==110201 | trcode==110203 | trcode==110204 | trcode==119999 | /// (eloc!=6 & (trcode==110101 | trcode==110199 | trcode==110299)) tab main replace main=10 if tewhere!=1 & (trcode==50101 | trcode==50299 | trcode==50205 | /// trcode==50299 | trcode==59999) tab main replace main=11 if tewhere==1 & (trcode==50101 | trcode==50199 | trcode==50299 | /// trcode==59999) tab main replace main=12 if trcode==50102 | trcode==50301 | trcode==50302 | trcode==50303 | /// trcode==50304 | trcode==50399 | trcode==50305 tab main replace main=13 if trcode==50201 | trcode==50203 tab main replace 
main=14 if trcode==50103 | trcode==50204 | trcode==50104 tab main replace main=15 if (trcode>50400 & trcode<50406) | trcode==50499 | trcode==180504 tab main replace main=16 if trcode==60101 | trcode==60103 | trcode==60104 | trcode==60199 tab main replace main=17 if trcode==60301 | trcode==60302 | trcode==60303 | trcode==60399 tab main replace main=18 if trcode==60102 | trcode==60299 tab main replace main=19 if trcode==60201 | trcode==60202 | trcode==60203 | trcode==60204 | /// trcode==60401 | trcode==60402 | trcode==60403 | trcode==60499 | trcode==69999 | /// trcode==160103 tab main replace main=20 if trcode==20201 | trcode==20202 tab main replace main=21 if trcode==20203 | trcode==20299 tab main replace main=22 if trcode==20101 | trcode==20301 | trcode==20399 | trcode==20401 tab main replace main=23 if trcode==20102 | trcode==20103 tab main replace main=24 if trcode==20302 | trcode==20303 | trcode==20402 | trcode==20499 | /// trcode==20502 | trcode==20701 | trcode==20799 | trcode==20801 | trcode==20899 tab main replace main=25 if trcode==20104 | trcode==20199 | trcode==20901 | trcode==20902 | /// trcode==20905 | trcode==20999 | trcode==29999 tab main replace main=26 if trcode==70101 | trcode==70103 | trcode==70104 | trcode==70105 | /// trcode==90102 tab main replace main=27 if trcode==70102 | trcode==70199 | trcode==70201 | trcode==70299 | /// trcode==70301 | trcode==70399 | trcode==79999 tab main replace main=28 if trcode==80501 | trcode==80502 | trcode==80599 | trcode==160105 tab main replace main=29 if trcode==80401 | trcode==80402 | trcode==80403 | trcode==80499 | /// trcode==80701 | trcode==80702 | trcode==80799 tab main replace main=30 if trcode==90101 | trcode==90103 | trcode==90104 | trcode==90199 | /// trcode==90201 | trcode==90202 | trcode==90299 | trcode==90301 | trcode==90302 | /// trcode==90399 | trcode==90401 | trcode==90402 | trcode==90499 | trcode==90501 | /// trcode==90502 | trcode==90599 | trcode==99999 | trcode==160106 tab main //33 and 34 seem 
off to me replace main=31 if trcode==80201 | trcode==80202 | trcode==80203 | trcode==80299 | /// trcode==100101 | trcode==100102 | trcode==100103 | trcode==100199 | trcode==100301 | /// trcode==100302 | trcode==100304 | trcode==100399 | trcode==100401 | trcode==100499 | /// trcode==109999 | trcode==160108 | trcode==160104 tab main replace main=32 if trcode==80301 | trcode==80302 | trcode==80399 | trcode==80601 | /// trcode==80602 | trcode==80699 | trcode==80801 | trcode==80899 | trcode==89999 tab main replace main=33 if infant==1 & (trcode==30101 | trcode==30108 | trcode==30109 | /// trcode==30199 | trcode==80101 | trcode==80102 | trcode==80199 | trcode==160107) tab main replace main=34 if main==-5 & (child==1 & (trcode==30101 | trcode==30108 | trcode==30109 | /// trcode==30199 | trcode==80199)) | trcode==80101 | trcode==80102 | trcode==160107 | /// trcode==80199 tab main replace main=34 if main==-5 & trcode==30108 //not sure why this one is here tab main replace main=35 if trcode==30301 | trcode==30302 | trcode==30303 | trcode==30399 tab main replace main=36 if trcode==30103 tab main replace main=37 if trcode==30104 | trcode==30107 | trcode==30201 | trcode==30202 | /// trcode==30203 | trcode==30204 | trcode==30299 tab main replace main=38 if trcode==30102| trcode==30106 tab main replace main=39 if trcode==30110 | trcode==30111 | trcode==30112 | trcode==39999 tab main replace main=40 if (trcode>30400 & trcode<30406) | trcode==30499 | (trcode>30500 /// & trcode<30505) | trcode==30599 | (trcode>40100 & trcode<40105) | (trcode>40105 /// & trcode<40113) | trcode==40199 | (trcode>40200 & trcode<40205) | trcode==40299 /// | (trcode>40300 & trcode<40304) | trcode==40399 | (trcode>40400 & trcode<40406) /// | trcode==40499 | (trcode>40500 & trcode<40509) | trcode==40599 | trcode==49999 tab main //where are 43 - 48? 
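*Sketch (an alternative notation, not the CTUR code): long chains of trcode==...
*conditions can be written with inlist(), and open ranges such as
*(trcode>30400 & trcode<30406) with inrange(). The counts below only check that
*the two notations agree for the main 35 codes and for the first range in main 40.
count if (trcode==30301 | trcode==30302 | trcode==30303 | trcode==30399) ///
	!= inlist(trcode, 30301, 30302, 30303, 30399) //expect 0
count if (trcode>30400 & trcode<30406) != inrange(trcode, 30401, 30405) //expect 0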
replace main=41 if (trcode>150100 & trcode<150107) | trcode==150199 | (trcode>150200 /// & trcode<150205) | trcode==150299 | trcode==150301 | trcode==150302 | trcode==150399 | /// trcode==150401 | trcode==150402 | trcode==150499 | trcode==150501 | trcode==150599 | /// trcode==150601 | trcode==150602 | trcode==150699 | trcode==150701 | trcode==150799 | /// trcode==150801 | trcode==150899 | trcode==159999 tab main replace main=42 if trcode==100201 | trcode==100299 | trcode==100303 | trcode==100305 tab main replace main=49 if (trcode>140100 & trcode<140106) | trcode==149999 tab main replace main=50 if trcode==120405 | trcode==120499 | trcode==120504 tab main //where is 55, 58, 59? replace main=51 if (trcode>130200 & trcode<130300) | trcode==130302 | trcode==130399 | /// trcode==130402 | trcode==130499 tab main replace main=52 if trcode==120403 tab main replace main=53 if trcode==120401 tab main replace main=54 if trcode==120402 tab main replace main=56 if main==-5 & (trcode==110202 | trcode==110101 | trcode==110199 | trcode==120404 | /// trcode==110299) tab main replace main=57 if trcode==120201 | trcode==120202 | trcode==120299 tab main replace main=60 if (trcode>130100 & trcode<130104) | trcode==130105 | trcode==130107 | /// (trcode>130108 & trcode<130112) | (trcode>130112 & trcode<130116) | trcode==130117 | /// (trcode>130118 & trcode<130131) | (trcode>130131 & trcode<130200) | trcode==130301 | /// trcode==130401 | trcode==139999 tab main //where is 61 replace main=62 if trcode==130131 | (trcode>500000 & tewhere==14) tab main replace main=63 if trcode==130104 tab main replace main=64 if trcode==130108 tab main replace main=65 if trcode==30105 | trcode==40105 tab main replace main=66 if trcode==130106 | trcode==130112 | trcode==130116 | trcode==130118 tab main replace main=67 if trcode==20501 | trcode==20599 tab main replace main=68 if trcode==20601 | trcode==20602 | trcode==20699 tab main //no 70 or 71 or 74 or 79 or 80 replace main=72 if main==-5 & (trcode==120101 
| trcode==120199) tab main replace main=73 if main==-5 & (trcode==120307 | trcode==129999) tab main replace main=75 if main==-5 & trcode==120313 tab main replace main=76 if main==-5 & trcode==120309 tab main replace main=77 if main==-5 & (trcode==120310 | trcode==120311) tab main replace main=78 if main==-5 & (trcode==120301 | trcode==120302 | trcode==120399 | /// trcode==120501 | trcode==120502 | trcode==120503 | trcode==120599) tab main //no 82, 83 replace main=81 if main==-5 & trcode==120312 tab main replace main=84 if main==-5 & trcode==120306 tab main replace main=85 if main==-5 & trcode==120305 tab main replace main=86 if main==-5 & (trcode==120303 | trcode==120304) tab main replace main=87 if main==-5 & trcode==20903 tab main replace main=88 if main==-5 & (trcode==160101 | trcode==160102 | trcode==169999 | /// trcode==160199 | trcode==160201 | trcode==160299) tab main replace main=89 if main==-5 & (trcode==20904 | trcode==120308) tab main replace main=90 if main==-5 & trcode==500103 tab main replace main=91 if main==-5 & (trcode==180101 | trcode==180199 | (trcode>181100 /// & trcode<181200) | trcode==180302 | trcode==180304 | trcode==180305 | /// (trcode>180400 & trcode<180500)) tab main replace main=92 if main==-5 & trcode==180502 tab main replace main=93 if main==-5 & (trcode==180501 | trcode==180599 | trcode==180503) tab main replace main=94 if main==-5 & (trcode==180601 | trcode==180602 | trcode==180699 | /// trcode==180603 | trcode==180604) tab main replace main=95 if main==-5 & ((trcode>180200 & trcode<180300) | (trcode>180700 /// & trcode<180800) | trcode>180800 & trcode<180900) | (trcode>180900 & trcode<181002) /// | (trcode>181002 & trcode<181100) tab main replace main=96 if main==-5 & (trcode==180301 | trcode==180303 | trcode==180399) tab main replace main=97 if main==-5 & ((trcode>181400 & trcode<181500) | trcode==181002 | /// (trcode>181500 & trcode<181600)) tab main replace main=98 if main==-5 & (trcode==189999 | (trcode>181200 & trcode<181400) 
| /// (trcode>181600 & trcode<181700) | (trcode>181800 & trcode<181900)) tab main replace main=-8 if main==-5 & (trcode==500101 | trcode==500102 | trcode==500104 | /// trcode==500106 | trcode==500107 | trcode==509999) tab main *re: CTUR, *any -5 codes will be cases of new codes that were not assigned to *a location. *if all ok, this will produce a warning that there are no such cases. preserve keep if main==-5 tab trcode restore //no -5 codes remaining *the series of preserve codes below are to verify the trcodes are matched with //// *the right main codes preserve keep if main<11 tab trcode main restore preserve keep if main>10 & main<21 tab trcode main restore preserve keep if main>20 & main<31 tab trcode main restore preserve keep if main>30 & main<41 tab trcode main restore preserve keep if main>40 & main<51 tab trcode main restore preserve keep if main>50 & main<61 tab trcode main restore preserve keep if main>60 & main<71 tab trcode main restore preserve keep if main>70 & main<81 tab trcode main restore preserve keep if main>80 & main<91 tab trcode main restore preserve keep if main>90 & main<100 tab trcode main restore ************************************ *set up variable for quality checks* ************************************. 
gen dqcheck1=0
label var dqcheck1 "miss main, mode of transport or in restaurant"
label def dqcheck1 0 "ordinary diary" 1 "includes pattern"
label val dqcheck1 dqcheck1

gen dqcheck2=0
label var dqcheck2 "miss main, 2nd act reported"
label def dqcheck2 0 "ordinary diary" 1 "includes pattern"
label val dqcheck2 dqcheck2

gen dqcheck3=0
label var dqcheck3 "miss 20 min before/after travel"
label def dqcheck3 0 "ordinary diary" 1 "includes pattern"
label val dqcheck3 dqcheck3

preserve
keep if main==-8
tab eloc
restore

replace dqcheck1=1 if main==-8 & eloc==8
tab dqcheck1
replace main=90 if main==-8 & eloc==8
replace dqcheck1=1 if main==-8 & eloc==6
tab dqcheck1
replace main=56 if main==-8 & eloc==6
tab dqcheck1

*Note that TRTCCTLN is the summation of TRTCC_LN and TRTCOCLN.
*fill in what secondary activity is possible to identify.
tab sec
replace sec=34 if trtcc_ln>0 | trtcoc_ln>0
replace sec=90 if main<90 & eloc==8 & sec!=34 & trtec_ln<=0
tab sec

preserve
keep if trtec_ln>0
tab sec
restore

*Beginning in 2012 secondary elderly care is also asked.
*Investigate the cases where two secondary care activities were
*reported simultaneously.
gen secflag=0
replace secflag=1 if sec==34 & trtec_ln>0
//119 cases where both secondary childcare and elderly care are reported at the same time
preserve
keep if secflag==1
tab pid
restore

gen testb=0
replace testb=2 if trtec_ln!= time
preserve
keep if secflag==1
tab testb
restore

*alternatively
preserve
keep if secflag==1
tab trtec_ln time
restore
//the replace has to run after restore; inside the preserve block it would be undone
replace sec=40 if trtec_ln>0

*Both tests show that original elderly care time is equal to time variable we computed.
*Code those cases as elderly care and create a supplementary file with flag that
*shows the cases where both secondary childcare and secondary elderly care were reported.
preserve keep pid year epnum secflag save US2018xincare, replace restore *************************Code below here is correcting missingness based on *where someone was or what they were doing before or after the missing value**** ******************************************************************************** * From CTUR *If the main activity is missing and the location is *valid and not at home or another home and the *previous location is valid, not travelling, and *different from the present location, the main *activity is coded to impute travel * *if the main activity is valid and not travel and *both the current and previous locations are valid *and the previous location is not travelling and the *current and previous location differ, the secondary *activity is coded as imputed travel * *if the main activity is missing and the location is *at home or another home and the previous location is *travel, main activity is coded as unknown personal *or household care * *if the main activity is missing and the previous location *is travel and the present location is valid but not at *a home, the main activity is coded as imputed out of *home activity. sort pid epnum replace dqcheck3=1 if pid==l.pid & main<0 & ((eloc==1 | eloc==2) & l.eloc==8) replace main=2 if pid==l.pid & main<0 & ((eloc==1 | eloc==2) & l.eloc==8) replace main=58 if pid==l.pid & main<0 & ((eloc>2 & eloc<8) | eloc==9) & l.eloc==8 replace main=90 if pid==l.pid & main<0 & ((eloc>0 & eloc!=8 & l.eloc>0 &l.eloc!=8) /// & eloc!=l.eloc) replace sec=90 if (main>0 & main<90) & sec==0 & (((eloc>1 & eloc<6) | eloc==7) /// & ((l.eloc>1 & l.eloc<6 | l.eloc==7 & eloc~=l.eloc))) tab sec preserve keep if main==-8 | main==2 | main==58 | main==90 tab main restore *if main activity still missing and the location is *valid and not at home, code the main activity as *imputed activity away from home. 
replace main=58 if main<0 & eloc>2
preserve
keep if main==-8 | main==58
tab main
restore

preserve
keep if main==-8
tab trcode
restore
//remaining are 500101 (insufficient detail), 500106 (gap/can't remember), and
//500107 (unable to code at 1st tier)
//unlike 2012, none of these are 500104 so skipping this coding (ln1672-1702)

*now see if any of the remaining missing cases happen at the end of the diary day.
gen test=0
replace test=1 if lastep==1 & main==-8 & l.main>0
preserve
keep if test==1
tab3way time eloc trcode
restore
//all cases, respondents are in their own home
gen test2=0
replace test2=l.main if test==1
tab test2
replace main=4 if test==1
preserve
keep if main==-8 | main==2 | main==4 | main==58 | main==90
tab main
restore

*now look at cases where the first activity is missing
drop test
gen test=0
replace test=1 if epnum==1 & main==-8
tab test
preserve
keep if test==1
tab time eloc
restore
//5 cases, 1 is missing location, all others are at home
drop test2
gen test2=0
replace test2=main if l.test==1
tab test2
tab test2 eloc
//4 at home, 1 transportation
replace main=4 if test==1 & test2<90
replace main=2 if time<91 & test==1 & test2>89
preserve
keep if main==-8 | main==2 | main==4 | main==58
tab main
restore

*look at cases that remain missing that happen after sleep
drop test
gen test=0
replace test=1 if pid==l.pid & main==-8 & (l.main==3 | l.main==4)
preserve
keep if test==1
tab eloc time
restore
//103 cases, 99 at own home
*CTUR: those coded as less than an hour are coded as imputed personal or household care
replace main=2 if time<61 & test==1
preserve
keep if main==-8 | main==2
tab main
restore

*look at cases where time gap follows eating or food prep
drop test
gen test=0
replace test=1 if main==-8 & (l.main==9 | l.main==20 | l.main==21)
preserve
keep if test==1
tab time eloc
restore
*316 cases where the time gap in a diary follows eating
*or food preparation or setting table/putting away
*dishes; all at home or another's home.
60% an hour or less, *some gaps very long. Set gaps of an hour or less to *imputed personal or household care. replace main=2 if test==1 & (eloc==1 | eloc==2) & time<61 preserve keep if main==-8 | main==2 tab main restore preserve keep if trcode>500000 tab main restore *at this point, 1186 of the 2617 cases coded with missing codes *in the original data are still coded as missing - 34 now general *personal care; 411 imputed personal or household care, 8 imputed *sleep; 16 restaurant; 449 imputed time away from home; 32 walking; *481 imputed travel. preserve keep if main==-8 tab time eloc, col restore *remaining gaps range from 1 minute to 180 minutes. Majority *are an hour or less. No further effort made to fill in gaps *using other diary information. *Moving on to eating tab mtrav replace mtrav=5 if (main>89 | sec>89) & mtrav==-7 tab mtrav *impute eating where no main activity eating but *main activity food preparation or setting table reported. gen eatdr=0 replace eatdr=time if main==8 | main==9 | main==56 egen anyeat = sum(eatdr), by (pid) tab anyeat if epnum==1 *449 respondents (4.68 percent) reported no eating preserve keep if anyeat==0 tab maxep main restore *some really low episode diaries with missing eating. *There are some diaries with food preparation activities *but no eating. replace sec=2 if anyeat==0 & (main==20 | main==21) & sec==0 & maxep>9 tab sec sort pid epnum save tempfile4, replace **# **hfep is being saved here for the first time** ***********************************************. 
cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization"
use tempfile4

preserve
keep survey wave pid diaryday cday month year time clockst start end epnum main ///
	sec inout eloc mtrav alone infant child sppart clsfam hhadult animal shoprof ///
	cowork wellknw otherp unknwp tufinlwgt
save USA2018hfep, replace
restore

egen dqcheckone = max(dqcheck1), by(pid)
egen dqchecktwo = max(dqcheck2), by (pid)
egen dqcheckthree = max(dqcheck3), by (pid)
save USA2018marker, replace

clear all

**********************************using CPS file********************************
*extract the state information to construct a separate weight
*excluding those states not found in all studies in the AHTUS.

//prepare respondent file for merge
cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atusresp-2018"
import delimited "atusresp_2018.dat"
save atusresp_2018.dta, replace
clear all

//prepare cps file for merge
cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atuscps-2018"
import delimited "atuscps_2018.dat"
save atuscps_2018.dta, replace
keep if tratusr==1 //only respondents to atus
keep tucaseid gestfips prtage

//merge
cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atusresp-2018"
merge 1:1 tucaseid using atusresp_2018, gen(cpsmark)
keep tucaseid tuyear gestfips prtage
mdesc

//this is recoding statefip, I don't think this is really necessary since ATUS uses fip
tab gestfips
#delimit ;
recode gestfips (1=1 "Alabama") (2=2 "Alaska") (4=3 "Arizona") (5=4 "Arkansas")
	(6=5 "California") (8=6 "Colorado") (9=7 "Connecticut") (10=8 "Delaware")
	(11=9 "District of Columbia") (12=10 "Florida") (13=11 "Georgia") (15=12 "Hawaii")
	(16=13 "Idaho") (17=14 "Illinois") (18=15 "Indiana") (19=16 "Iowa")
	(20=17 "Kansas") (21=18 "Kentucky") (22=19 "Louisiana") (23=20 "Maine")
	(24=21 "Maryland") (25=22 "Massachusetts") (26=23 "Michigan") (27=24 "Minnesota")
	(28=25
"Mississippi") (29=26 "Missouri") (30=27 "Montana") (31=28 "Nebraska") (32=29 "Nevada") (33=30 "New Hampshire") (34=31 "New Jersey") (35=32 "New Mexico") (36=33 "New York") (37=34 "North Carolina") (38=35 "North Dakota") (39=36 "Ohio") (40=37 "Oklahoma") (41=38 "Oregon") (42=39 "Pennsylvania") (44=40 "Rhode Island") (45=41 "South Carolina") (46=42 "South Dakota") (47=43 "Tennessee") (48=44 "Texas") (49=45 "Utah") (50=46 "Vermont") (51=47 "Virginia") (53=48 "Washington") (54=49 "West Virginia") (55=50 "Wisconsin") (56=51 "Wyoming"), gen(state) ; #delimit cr gen exclude=0 replace exclude=1 if state==2 | state==8 | state==12 | state==13 | state==17 | /// state==27 | state==29 | state==30 | state==32 | state==35 | state==40 | /// state==46 | state==51 label var exclude "exclude states to exclude in xtimewt" label def exclude 0 "states in all samples" 1 "states only in most recent samples" label val exclude exclude tab state exclude rename tucaseid pid keep pid exclude cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" save excludestates, replace clear all *create variables for summary time use file cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" use USA2018hfep tab main //0.64 percent of main activity time missing in this file sort survey pid gen t0pcare=0 gen t1paidw=0 gen t2ed=0 gen t3unpaid=0 gen t4acvol=0 gen t5outhm=0 gen t6exerc=0 gen t7inhm=0 gen t8media=0 gen t9trav=0 gen tmiss=0 gen outside=0 gen inveh=0 gen inside=0 gen locunk=0 gen athome=0 gen atwrksc=0 gen elsewhr=0 gen lunk=0 gen walone=0 gen wchild=0 gen wsppart=0 gen wclsfam=0 gen wother=0 gen withunk=0 replace t0pcare=time if main>=1 & main<10 replace t1paidw=time if main>9 & main<16 replace t2ed=time if main>15 & main<20 replace t3unpaid=time if main>19 & main<40 replace t4acvol=time if main>39 & main<50 replace t5outhm=time if main>49 & main<60 replace t6exerc=time if main>59 & main<70 replace t7inhm=time if main>69 & main<80 replace t8media=time if 
main>79 & main<90 replace t9trav=time if main>89 & main<100 replace tmiss=time if main==-8 replace outside=time if inout==1 replace inside=time if inout==2 replace inveh=time if inout==3 replace locunk=time if inout==-8 replace athome=time if eloc==1 replace atwrksc=time if eloc==3 | eloc==4 replace elsewhr=time if eloc==2 | eloc>4 replace lunk=time if eloc==-8 replace walone=time if alone==1 replace wchild=time if child==1 replace wsppart=time if sppart==1 replace wclsfam=time if clsfam==1 replace wother=time if otherp==1 | cowork==1 | shoprof==1 | hhadult==1 | (clsfam==0 /// & wellknw==1) replace withunk=time if unknwp==1 //replaces vector and do repeat code from CTUR foreach x of numlist 1/98 { gen tmain`x'=0 replace tmain`x'=time if main==`x' } foreach x of numlist 1/98 { gen tsc`x'=0 replace tsc`x'=time if main==`x' & (sec>32 & sec<40) } sum t0pcare t1paidw t2ed t3unpaid t4acvol t5outhm t6exerc t7inhm t8media t9trav /// tmiss outside inside inveh locunk athome atwrksc elsewhr lunk walone wchild /// wsppart wclsfam wother withunk *compute a test variable to ensure the 1 and 2 digits *sets of time use codes add up to 1440. both cases need *to include missing time in the sum. gen tdiga = t0pcare + t1paidw + t2ed + t3unpaid + t4acvol + t5outhm + t6exerc + t7inhm /// + t8media + t9trav + tmiss egen t1dig = total(tdiga), by(pid) tab t1dig egen tdigb = rsum(tmain*) gen t2digmiss = tdigb+tmiss egen t2dig = total(t2digmiss), by(pid) tab t2dig save tempfile5, replace sum tsc* sum tmain* *the variables that have no labels do not exist as AHTUS codes *(tmain47, tmain59, tmain61, tmain69, tmain79 and tmain80). *The activities in which no minutes were reported are: *tmain(43, 44, 45, 46, 48, 55, 70, 71, 74, 82, 83) *tsc(3, 4, 5, 8, 13, 33, 35, 36, 37, 38, 43, 44, 45, 46, 48, 55, 64, * 70, 71, 74, 82, 83, 92 (0=-9). *set all those variable codes which could not be constructed *to -9 for missing. 
recode tmain43 tmain44 tmain45 tmain46 tmain48 tmain55 tmain70 tmain71 ///
	tmain74 tmain82 tmain83 ///
	tsc3 tsc4 tsc5 tsc8 tsc13 tsc33 tsc35 tsc36 tsc37 tsc38 tsc43 tsc44 tsc45 ///
	tsc46 tsc48 tsc55 tsc64 tsc70 tsc71 tsc74 tsc82 tsc83 tsc92 (0=-9)

//prepare summary file using collapse?
//tmain92 is summed with the other tmain variables because it is not in the
//zero-minute list above; variables left out of collapse are dropped
#delimit ;
collapse (max) diaryday cday month year numep=epnum
	tmain43 tmain44 tmain45 tmain46 tmain48 tmain55 tmain70 tmain71 tmain74
	tmain82 tmain83
	tsc3 tsc4 tsc5 tsc8 tsc13 tsc33 tsc35 tsc36 tsc37 tsc38 tsc43 tsc44 tsc45
	tsc46 tsc48 tsc55 tsc64 tsc70 tsc71 tsc74 tsc82 tsc83 tsc92
	tufinlwgt t1dig t2dig
	(sum) tottime=time
	t0pcare t1paidw t2ed t3unpaid t4acvol t5outhm t6exerc t7inhm t8media t9trav tmiss
	tmain1 tmain2 tmain3 tmain4 tmain5 tmain6 tmain7 tmain8 tmain9 tmain10
	tmain11 tmain12 tmain13 tmain14 tmain15 tmain16 tmain17 tmain18 tmain19 tmain20
	tmain21 tmain22 tmain23 tmain24 tmain25 tmain26 tmain27 tmain28 tmain29 tmain30
	tmain31 tmain32 tmain33 tmain34 tmain35 tmain36 tmain37 tmain38 tmain39 tmain40
	tmain41 tmain42 tmain49 tmain50 tmain51 tmain52 tmain53 tmain54 tmain56 tmain57
	tmain58 tmain60 tmain62 tmain63 tmain64 tmain65 tmain66 tmain67 tmain68 tmain72
	tmain73 tmain75 tmain76 tmain77 tmain78 tmain81 tmain84 tmain85 tmain86 tmain87
	tmain88 tmain89 tmain90 tmain91 tmain92 tmain93 tmain94 tmain95 tmain96 tmain97
	tmain98
	tsc1 tsc2 tsc6 tsc7 tsc9 tsc10 tsc11 tsc12 tsc14 tsc15 tsc16 tsc17 tsc18 tsc19
	tsc20 tsc21 tsc22 tsc23 tsc24 tsc25 tsc26 tsc27 tsc28 tsc29 tsc30 tsc31 tsc32
	tsc34 tsc39 tsc40 tsc41 tsc42 tsc49 tsc50 tsc51 tsc52 tsc53 tsc54 tsc56 tsc57
	tsc58 tsc60 tsc62 tsc63 tsc65 tsc66 tsc67 tsc68 tsc72 tsc73 tsc75 tsc76 tsc77
	tsc78 tsc81 tsc84 tsc85 tsc86 tsc87 tsc88 tsc89 tsc90 tsc91 tsc93 tsc94 tsc95
	tsc96 tsc97 tsc98
	outside inside inveh locunk athome atwrksc elsewhr lunk
	walone wchild wsppart wclsfam wother withunk
	, by (survey wave pid) ;
#delimit cr

save USA2018hfsum, replace

**********************************************************************************************
*HHID - originally this was set to the variable hrhhid, which is the CPS household identifier* *in early year of the ATUS, this variable permitted a match back to the CPS variables, but * *values are recycled in later years of the survey, making the match back to the CPS more * *complex, better for users to use the ATUS-X system to extract CPS variables for use with the* *AHTUS variables https://www.atusdata.org/index.shtml * **********************************************************************************************. *Note: the code regarding hrhhid is commented out in the CTUR file so not sure if it is needed. clear all cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atuscps-2018" use atuscps_2018 keep tucaseid tulineno hrhhid huinttyp sort tucaseid tulineno cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atusresp-2018" merge 1:1 tucaseid tulineno using atusresp_2018, gen(hhid_mark) mdesc //only the diarists are matched. Hopefully that's correct. keep if hhid_mark==3 //matched rename tucaseid pid destring hrhhid, replace //already numeric collapse (max) hrhhid, by(pid) rename hrhhid hhid save tempa1, replace *extract marker of whether household children present. cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files/atusresp-2018" use atusresp_2018 keep tucaseid trhhchild rename tucaseid pid cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" save tempa2, replace ****************************************************** *now ready to finalize files, start with summary file* ******************************************************. clear all cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization" use USA2018hfsum *check that variables sum to 1440 in aggregated file. sum t1dig t2dig tottime *all diaries add up to the correct 1440 minutes for *both the 1 and 2 digit levels of activity code. 
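*Sketch (an assumed extra check, not in the CTUR file): count the diaries whose
*totals differ from 1440 instead of reading them off the summary table. All three
*counts should be 0 if the statement above holds. Nothing is changed in the data.
count if t1dig!=1440
count if t2dig!=1440
count if tottime!=1440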
//adding merge step to add hhid, not in CTUR file merge 1:1 pid using tempa1, gen(_hhid) //next two merges are in CTUR file merge 1:1 pid using dem, gen(_dem) merge 1:1 pid using tempa2, gen(_tempa2) tab age *all ages present sort hhid pid save weights, replace ********* *badcase* ********* use USA2018hfep tab main merge m:1 pid using tempa1, gen(_hhid) sort hhid pid **compute misbasic variables** gen tmiss2=0 gen aetdr=0 gen asleep=0 gen apcare=0 gen atrav=0 gen anycare=0 replace tmiss2=time if main==-8 sum tmiss2 //indication here that secondary activities are not fully coded by CTUR. *only has values for 0, 2, 34, 40, 90 but many more codes below. Also wondering *if tmain and tsc creation are off because it was >32 and <40, but I think 40 *should have been included. Also, based on what is documented on AHTUS website, *sec should basically match main in labels but doesn't? *include the diaries with food preparation activities *but no eating. replace aetdr=1 if main==8 | main==9 | main==20 | main==21 | main==56 | sec==2 | /// sec==8 | sec==9 | sec==20 | sec==21 | sec==56 | eloc==6 *includes do nothing, think, time out or work break. replace asleep=1 if main==3 | main==4 | main==5 | main==13 | main==78 | sec==3 | /// sec==4 | sec==5 | sec==13 | sec==78 *includes purchase of per'l care and imputed p'l or hhold care. replace apcare=1 if main==1 | main==2 | main==6 | main==7 | main==28 | sec==1 | /// sec==6 | sec==7 | sec==28 *includes sports/exercise, walking, cycling, outdoor recreation, *gardening, petcare, hunting. 
replace atrav=1 if (main>89 & main<99) | main==60 | main==62 | main==63 | ///
	main==64 | main==65 | main==66 | main==67 | main==68 | (sec>89 & sec<99) | ///
	sec==60 | sec==62 | sec==63 | sec==64 | sec==65 | sec==66 | sec==67 | ///
	sec==68

*to create carer flag
replace anycare=1 if (main>32 & main<41) | (sec>32 & sec<41)

egen eatdr = max(aetdr), by (pid)
egen sleep = max(asleep), by (pid)
egen pcare = max(apcare), by (pid)
egen trav = max(atrav), by (pid)
sum eatdr sleep pcare trav

*compute test variable to identify the diarists who stayed at home all day.
//this acts like hhid has a value, but it doesn't currently because it is set to zero.
capture drop test
gen test=0
sort pid epnum
replace test=1 if eloc!=l.eloc & (hhid==l.hhid & pid==l.pid) & trav==0
tab test
//88 respondents stayed home all day

capture drop test2
gen test2=0
replace test2=1 if (hhid==l.hhid & pid==l.pid) & l.asleep==1 & (atrav==1 | ///
	(main>9 & main<70) | main>89) & pcare==0
tab test2

egen pcarex = max(test2), by(pid)
egen travx = max(test), by(pid)
replace pcare=1 if pcare==0 & pcarex==1
replace trav=1 if trav==0 & travx==1
sum eatdr sleep pcare trav

drop test
gen test=1
replace test=time if eloc==1 | eloc==2
egen athome = sum(test), by (pid)
egen maxep = max(epnum), by(pid)
sum athome maxep

*Diaries that include only 2 of the basic activities but have at least
*12 episodes, where the diarist reports being at home all day
*and otherwise meets the other 4 good diary criteria, count as
*good diaries.
*What are these 4 criteria?
replace trav=1 if trav==0 & maxep>11 & athome>1000 tab trav collapse (sum) eatdr sleep pcare tmiss2 trav (max) anycare epnum diaryday, /// by(survey wave hhid pid) gen misbasic=0 replace misbasic=misbasic+1 if trav==0 replace misbasic=misbasic+1 if sleep==0 replace misbasic=misbasic+1 if pcare==0 replace misbasic=misbasic+1 if eatdr==0 label var misbasic "number basic activities not recorded" tab misbasic tab anycare misbasic *diaries of carers who otherwise meet the 4 good diary criteria count as good diaries. replace misbasic=1 if (epnum>7 | anycare==1) & misbasic==2 tab misbasic preserve keep if misbasic>1 tab epnum tmiss2 restore *we need sex and age information sort hhid pid //CTUR calls a sexage_all file here, but as far as I can tell, that is not a file //I have created or that is available at BLS so I don't know what that file is. //But sex and age information are also in the weights file so going to proceed //with that merge. merge 1:1 pid using weights, gen(_weights) gen lowqual=0 replace lowqual=1 if tmiss2>90 | epnum<7 | misbasic>1 tab lowqual *490 (5.11%) bad diaries, 9,103 good ones. recode tmiss2 (0/90=0) (91/200=1) tab tmiss2 recode epnum (0/6=1) (7/75=0), into(nep) tab nep recode misbasic (0/1=0) (2/4=1) sum tmiss2 nep misbasic age sex mdesc tmiss2 nep misbasic age sex *As no sex or age missing, only the diary quality variables *determine bad case in this case. tab3way tmiss2 nep misbasic /*490 bad diaries. 0 diaries are low quality on all 3 counts. 64 bad on 2 counts. 63 missing basic acts and low episode count 1 is missing basic acts and missing 91+ minutes Others bad on 1 count 90 have low number of episodes 310 missing 91+ minutes 26 missing 2+ basic acts */ **# ********** *caremflg* ********** *If the person is missing 2+ basic acts but providing care he/she is flagged as carer. *The person flagged with caremflg is likely to combine one of the basic acts with care. 
tab anycare misbasic
gen caremflg=0
replace caremflg=1 if anycare==1 & misbasic==1 & epnum>6
tab caremflg
//only one person, this seems off
sort hhid pid
save weights2, replace

*********
*weights*
*********.
use weights2
svyset [pweight=tufinlwgt]
tab diaryday
svyset, clear
*tufinlwgt corrects for survey and day of the week.

gen childm=0
replace childm=1 if age<18
tab childm
mean age, over(childm)

gen origwght = tufinlwgt
gen owghtflg = 3
gen xtimewt=0
gen infltwt=0
gen recwght=0
*From CTUR: Note that recwght should sum to the number of diary cases (good and bad)
*and its mean should be 1.
*Also, the number of cases when you weight by the original inflated weight should be equal
*to the number of cases when you weight by infltwt.

gen baddem=0
replace baddem=1 if sex<1 | age<15
//no baddem
tab lowqual baddem

#delimit ;
recode age (15/17 = 0)
    (18/24 = 1)
    (25/34 = 2)
    (35/44 = 3)
    (45/54 = 4)
    (55/64 = 5)
    (65/74 = 6)
    (75/90 = 7), into(agegp) ;
#delimit cr

#delimit ;
recode age (15/17 = 0)
    (18/24 = 1)
    (25/34 = 2)
    (35/44 = 3)
    (45/54 = 4)
    (55/64 = 5)
    (65/74 = 6)
    (75/90 = 7), into(agegp2) ;
#delimit cr

replace agegp=8 if lowqual==1 | baddem==1 | diaryday==-1
tab agegp trhhchild

*need to deflate to sampled number.
sum origwght
*the final weight weights up to the population size.
*9,593 diarists, mean 9964744, but this weight artificially
*increases the child diarists compared to the adults.
*this weighting process makes me think the 2012 weights might be off based on
*the small magnitude of the under 18 weight.
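*Sketch (an addition, not part of the CTUR syntax): the two group means of
*origwght that are typed in by hand in the next step can also be captured
*from r(mean), which avoids retyping them for each survey year. The local
*names mean_u18 and mean_o17 are hypothetical; nothing later depends on them,
*and these lines only display values without changing the data.
quietly summarize origwght if age<18
local mean_u18 = r(mean)
quietly summarize origwght if age>17
local mean_o17 = r(mean)
display "mean origwght, age<18: " %20.8f `mean_u18'
display "mean origwght, age>17: " %20.8f `mean_o17'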
preserve
keep if age<18
sum origwght
return li
restore
*the mean weight for young people is 17650656.47932331
preserve
keep if age>17
sum origwght
return li
restore
*the mean weight for adults is 9745547.015492655

gen origwu18=origwght/17650656.47932331
gen origwo17=origwght/9745547.015492655
gen origw2=0
replace origw2 = origwu18 if age<18
replace origw2 = origwo17 if age>17
tab origw2
sum origw2
//this seems like a super weird process to me

gen tempagesexgp=agegp + 10*(sex-1)
gen tempagesex2=agegp2 + 10*(sex-1)
sum tempagesexgp tempagesex2

#delimit ;
recode tempagesexgp (0=1 "men 15-17")
    (10=2 "women 15-17")
    (1=21 "men 18-24")
    (2=22 "men 25-34")
    (3=23 "men 35-44")
    (4=24 "men 45-54")
    (5=25 "men 55-64")
    (6=26 "men 65-74")
    (7=27 "men 75+")
    (11=31 "women 18-24")
    (12=32 "women 25-34")
    (13=33 "women 35-44")
    (14=34 "women 45-54")
    (15=35 "women 55-64")
    (16=36 "women 65-74")
    (17=37 "women 75+")
    (8=90 "men miss age/bad diary")
    (18=91 "women miss age/bad diary"), gen(agesexgp) ;
#delimit cr

#delimit ;
recode tempagesex2 (0=1 "men 15-17")
    (10=2 "women 15-17")
    (1=21 "men 18-24")
    (2=22 "men 25-34")
    (3=23 "men 35-44")
    (4=24 "men 45-54")
    (5=25 "men 55-64")
    (6=26 "men 65-74")
    (7=27 "men 75+")
    (11=31 "women 18-24")
    (12=32 "women 25-34")
    (13=33 "women 35-44")
    (14=34 "women 45-54")
    (15=35 "women 55-64")
    (16=36 "women 65-74")
    (17=37 "women 75+")
    (8=90 "men miss age/bad diary")
    (18=91 "women miss age/bad diary"), gen (agesex2) ;
#delimit cr

tab agesexgp sex
tab agesexgp agegp
tab agesex2 sex
tab agesex2 agegp
mdesc
replace origw2=0 if agesexgp>89

*From CTUR: The older datasets covered samples of most contiguous states +
*Washington DC, but did not draw samples from 11 states:
*Delaware, Idaho, Kansas, Montana, Nevada, New Hampshire,
*New Mexico, North Dakota, Rhode Island, Vermont, and Wyoming.
*The 1992-94, 1995, 1998-1999 and 1999-2001 samples did cover
*these states, but did not include Alaska and Hawaii. The ATUS
*covers all 50 states.
*It is possible that with some analysis,
*an apparent difference across time may reflect the range of
*states included. For this reason, we construct two weights
*in addition to our recommended weight, one which excludes
*the states not covered in the older samples, and a second
*which includes all states and inflates to the national population.
sort pid
merge 1:1 pid using excludestates, gen(_exclude)
tab exclude
*8.5 percent of diarists are in states covered only by the most recent samples

preserve
keep if lowqual==0 & exclude==0
tab agesexgp
restore
preserve
keep if agesexgp<5 & exclude==0
tab agesexgp
restore
preserve
keep if (agesexgp>5 & agesexgp<90) & exclude==0
tab agesexgp
restore
mdesc

gen origw3=origw2
replace origw3=0 if agesexgp>89 | exclude==1
preserve
keep if agesexgp>89 | exclude==1
sum origw3
restore
preserve
keep if origw3>0
sum origw3
restore
svyset [pweight=origw3]
tab agesexgp childm
svyset, clear

sort agesexgp diaryday
preserve
collapse (sum) grouptot=origw2, by(agesexgp)
save group, replace
restore
gen dayn=0
preserve
collapse (sum) daytot=origw2 (count) dayn, by(agesexgp diaryday)
save day, replace
restore
merge m:1 agesexgp using group, gen(_group)
merge m:1 agesexgp diaryday using day, gen(_day) update replace

sort agesexgp exclude diaryday
preserve
collapse (sum) groupt3=origw3, by(agesexgp exclude)
save group3, replace
restore
gen dayn3=0
preserve
collapse (sum) daytot3=origw3 (count) dayn3, by(agesexgp exclude diaryday)
save day3, replace
restore
merge m:1 agesexgp exclude using group3, gen(_group3)
merge m:1 agesexgp exclude diaryday using day3, gen(_day3) update replace

gen exptot=grouptot/7
gen exptot3=groupt3/7
replace recwght=(exptot/daytot)/(dayn/daytot) if lowqual==0 & baddem==0
tab recwght
replace xtimewt=(exptot3/daytot3)/(dayn3/daytot3) if lowqual==0 & baddem==0 & exclude==0
tab xtimewt
sum recwght xtimewt

gen under18=0
replace under18=1 if age<18
tab exclude under18
tab age under18 if exclude==0
tab age under18 if exclude==1
*340 child diaries in
*states in all samples (305-35).
*9253 adult diaries in states in all samples (8473-780).
gen suma=0
gen sumc=0
replace suma=recwght if age>17
replace sumc=recwght if age<18
gen xsuma=0
gen xsumc=0
replace xsuma=xtimewt if age>17
replace xsumc=xtimewt if age<18
egen recadsum = sum(suma), by (survey)
egen recchsum = sum(sumc), by (survey)
egen xadsum = sum(xsuma), by (survey)
egen xchildsum = sum(xsumc), by(survey)
tab recadsum
tab recchsum
tab xadsum
tab xchildsum
*the sum of adult weights is 8894.406 - it should be 8473
*the sum of child weights is 259.5324 - it should be 305
*the sum of xtime adult weights is 8250.161 - it should be 9253
*the sum of xchild weights is 236.7677 - it should be 340

**************************CHECK FOR ERRORS**************************************
//second number in each of these is wrong. I'm not sure where the number is coming from
replace recwght=recwght*(8473/11490.98) if age>17
replace recwght=recwght*(305/4500917504.00) if age<18
sum recwght
//mean should be 1
replace xtimewt=xtimewt*(9253/10825.13) if age>17
replace xtimewt=xtimewt*(340/4226043741.45) if age<18
sum xtimewt
preserve
keep if exclude==0
sum xtimewt
restore
*mean among the states in all years should be 1

svyset [pweight=recwght]
tab diaryday agesexgp
svyset, clear

preserve
keep if age<18
sum origwght
restore
return li
*the mean weight for young people is 17650656.47932331
preserve
keep if age>17
sum origwght
restore
return li
*the mean weight for adults is 9745547.015492655
*the numbers are means of original weights for adults and children
replace infltwt=recwght*17650656.47932331 if age<18
replace infltwt=recwght*9745547.015492655 if age>17
sum infltwt
***************************after this should be okay****************************

sort survey wave hhid pid
preserve
drop sleep t1dig t2dig tufinlwgt anycare sex age trhhchild misbasic agegp ///
    agegp2 agesex2 exclude origw3 grouptot daytot dayn groupt3 daytot3 dayn3 ///
    exptot exptot3
save USA2018youthsum, replace
restore
sort pid
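*Sketch (an addition, not part of the CTUR syntax): a quick check of the CTUR
*note that recwght should sum to the number of diary cases (good and bad) and
*have a mean of 1. The adult and child sums are shown so they can be compared
*with the targets recorded in the comments above. These lines only display
*values and change nothing in the data.
quietly count
display "diary cases: " r(N)
quietly summarize recwght
display "recwght sum: " r(sum) "   mean: " r(mean)
quietly summarize recwght if age>17
display "adult recwght sum: " r(sum)
quietly summarize recwght if age<18
display "child recwght sum: " r(sum)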
preserve
keep hhid pid lowqual baddem caremflg origwght owghtflg xtimewt infltwt recwght childm
save addlow, replace
restore

sort survey wave hhid pid
preserve
//keep if childm==0 WRONG
keep survey wave hhid pid diaryday cday month year age ///
    tottime numep t0pcare t1paid t2ed t3unpaid t4acvol ///
    t5outhm t6exerc t7inhm t8media t9trav tmiss tmain1 tmain2 ///
    tmain3 tmain4 tmain5 tmain6 tmain7 tmain8 tmain9 ///
    tmain10 tmain11 tmain12 tmain13 tmain14 tmain15 tmain16 ///
    tmain17 tmain18 tmain19 tmain20 tmain21 tmain22 tmain23 ///
    tmain24 tmain25 tmain26 tmain27 tmain28 tmain29 tmain30 ///
    tmain31 tmain32 tmain33 tmain34 tmain35 tmain36 tmain37 ///
    tmain38 tmain39 tmain40 tmain41 tmain42 tmain43 tmain44 ///
    tmain45 tmain46 tmain48 tmain49 tmain50 tmain51 tmain52 ///
    tmain53 tmain54 tmain55 tmain56 tmain57 tmain58 tmain60 ///
    tmain62 tmain63 tmain64 tmain65 tmain66 tmain67 tmain68 ///
    tmain70 tmain71 tmain72 tmain73 tmain74 tmain75 tmain76 ///
    tmain77 tmain78 tmain81 tmain82 tmain83 tmain84 tmain85 ///
    tmain86 tmain87 tmain88 tmain89 tmain90 tmain91 ///
    tmain93 tmain94 tmain95 tmain96 tmain97 tmain98 tsc1 ///
    tsc2 tsc3 tsc4 tsc5 tsc6 tsc7 tsc8 tsc9 tsc10 ///
    tsc11 tsc12 tsc13 tsc14 tsc15 tsc16 tsc17 tsc18 tsc19 ///
    tsc20 tsc21 tsc22 tsc23 tsc24 tsc25 tsc26 tsc27 tsc28 ///
    tsc29 tsc30 tsc31 tsc32 tsc33 tsc34 tsc35 tsc36 tsc37 ///
    tsc38 tsc39 tsc40 tsc41 tsc42 tsc43 tsc44 tsc45 tsc46 ///
    tsc48 tsc49 tsc50 tsc51 tsc52 tsc53 tsc54 tsc55 tsc56 ///
    tsc57 tsc58 tsc60 tsc62 tsc63 tsc64 tsc65 tsc66 tsc67 ///
    tsc68 tsc70 tsc71 tsc72 tsc73 tsc74 tsc75 tsc76 tsc77 ///
    tsc78 tsc81 tsc82 tsc83 tsc84 tsc85 tsc86 tsc87 tsc88 ///
    tsc89 tsc90 tsc91 tsc92 tsc93 tsc94 tsc95 tsc96 tsc97 ///
    tsc98 outside inveh inside locunk athome atwrksc elsewhr ///
    lunk walone wchild wsppart wclsfam wother withunk lowqual ///
    baddem caremflg origwght owghtflg xtimewt infltwt recwght
cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files"
save USA2018hfsum, replace
//careful, this has the same file name as above so saved in a different folder
restore

clear all
cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization"
use USA2018hfep
merge m:1 pid using addlow, gen(_addlow)
save USA2018fepA, replace

preserve
//keep if childm==0 THIS IS WHAT IS REMOVING CHILDREN
keep survey wave hhid pid diaryday cday month year time clockst start end ///
    epnum main sec inout eloc mtrav alone infant child sppart clsfam hhadult ///
    animal shoprof cowork wellknw otherp unknwp lowqual baddem caremflg ///
    origwght owghtflg xtimewt infltwt recwght
cd "/Users/kelseydrotning/Box Sync/GA Assignments/IPUMS GA/Harmonization/2018 AHTUS files"
save USA2018hfep, replace
*Now USA2018hfep only includes adult files. THIS IS WRONG
restore

******************************************
*check prevalence of secondary activities*
******************************************.
gen anysec=0
gen sectime=0
gen secep=0
gen seccc=0
gen seccctm=0
gen secccep=0
replace anysec=1 if sec>0
replace sectime=time if sec>0
replace secep=1 if sec>0
replace seccc=1 if (sec>32 & sec<40) | sec==96 | sec==99
replace seccctm=time if (sec>32 & sec<40) | sec==96 | sec==99
replace secccep=1 if (sec>32 & sec<40) | sec==96 | sec==99
sum anysec secep seccc secccep
sum sectime seccctm

collapse (max) anysec seccc epnum lowqual (sum) sectime seccctm secccep secep, by(hhid)
save aggr, replace
gen propsec = secep/epnum*100
gen pcc = secccep/epnum*100
keep if lowqual==0
tab anysec seccc
sum sectime seccctm propsec pcc
preserve
keep if seccc==1
sum seccctm pcc
restore

*stopping here
*ln 2797 in CTUR syntax file