General Questions 
What can I find in the webtool that is not in the written report?

The written report profiles and analyzes metro-specific levels of educational attainment for a selected number of occupations, including computer software developers, engineers, management analysts, registered nurses, and primary school teachers. Our webtool allows users to access the entire database of more than 300 occupations and customize their own search and analysis. The webtool also incorporates Bureau of Labor Statistics (BLS) data on average annual salary by occupation for each metro in 2011. (Why do the salaries differ from the original BLS data? Jump to the explanation.)

Why do you provide educational profiles by occupation for just the 50 most populous metros? 

We compiled the dataset using the Census Bureau's Public-Use Microdata Samples (PUMS). The PUMS files contain a sample of anonymous individual records collected by the Census Bureau. The rule of thumb is that the larger the metro, the more records available to calculate metro-specific statistics. This makes the numbers more reliable (with a smaller margin of error). The reliability issue is of particular concern when we go beyond a few general occupational categories and drill down to more than 300 occupations. What is the reliability issue? Jump to the explanation.

Why can't I find certain degree levels for the occupation I am interested in? 

Because different occupations have different educational profiles and entry requirements, some levels of educational attainment are not relevant for certain occupations. (One obvious example is that a high school graduate cannot practice medicine.) We chose the most pertinent levels of education for each occupation based on its national distribution in 2010. How is this done? Jump to the explanation.

Why do you focus on workers who are at least 25 years old? 

Part of the workforce, particularly part-time workers, consists of young adults who are still pursuing degrees while working. To avoid penalizing metros with younger workforces when we compared and ranked educational attainment, we focused on workers who are at least 25, which is the average age for university graduation among developed countries. However, in the statistical analysis of our written report, we estimate returns on education by regressing the average years of schooling among those who are at least 16 years old on a number of explanatory and control variables. We are not concerned about penalizing metros with younger workforces in this case because we can control for the cross-metro differences in that regard.

Why don't you use the Standard Occupational Classification (SOC) and the North American Industry Classification System (NAICS) for your occupation and industry classifications, respectively?  

The current official classification systems such as SOC and NAICS are revised regularly and did not even exist in 1990. (Although the SOC was initially introduced in 1977, it was completely restructured in 1998.) This makes it a treacherous task to compare educational attainment by occupation, and each occupation's top hiring industries, across the three decennial years. We circumvented this issue by adopting the time-consistent occupation and industry classifications provided by the Integrated Public Use Microdata Series (IPUMS). For more details, jump to Data Construction.

How do you define "worker"? And how did you assign each worker to a metro? 

We defined workers as those who were currently "at work," whether full-time or part-time. Workers on disability leave, for example, were excluded from our calculation. This way, we measure the educational attainment of the portion of the workforce that is directly involved in the operations of a regional economy. We also wanted to assign workers to the metro where they work, as opposed to where they live, and in the PUMS files the place of work can only be identified for those who are currently at work.
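As a hedged sketch of how such a restriction might be applied to an IPUMS extract, the snippet below filters to at-work workers with an identified place-of-work metro. The AT_WORK flag is a hypothetical indicator derived from the employment-status variable (the actual IPUMS codes differ by file), and the treatment of PWMETRO code 0 as "not identified" is an assumption; AGE and PWMETRO are IPUMS variable names.

```python
# Hedged sketch: restrict a PUMS/IPUMS extract to workers "at work" and keep
# only those whose place-of-work metro is identified. `AT_WORK` is a
# hypothetical 0/1 flag; treating PWMETRO == 0 as unidentified is an assumption.
import pandas as pd

def workers_by_workplace_metro(df: pd.DataFrame, min_age: int = 25) -> pd.DataFrame:
    at_work = df["AT_WORK"] == 1        # currently at work, full- or part-time
    old_enough = df["AGE"] >= min_age   # age cutoff used for the attainment profiles
    has_metro = df["PWMETRO"] > 0       # place-of-work metro is identified
    return df[at_work & old_enough & has_metro]
```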


Methodology
How did you construct the dataset? 

We constructed the data set using the Census Bureau's Public-Use Microdata Samples (PUMS). Specifically, we downloaded from IPUMS the 5 percent census files for 1990 and 2000, and the 2006-2010 American Community Survey five-year file. Each file contains a number of individual records equal to 5 percent of the U.S. population. The major advantage of this approach is that IPUMS provides alternative occupation and industry variables (OCC1990 and IND1990, respectively) that are consistently defined over time. In addition, the PWMETRO variable can be readily used to identify the metropolitan area where a worker is employed.

However, there are two major concerns about the PWMETRO variable. First, to construct the panel data set used in the statistical analysis in the written report, we needed to match the metropolitan areas defined by PWMETRO with the current definition of metropolitan statistical areas (MSAs) adopted by the other data series we used for the analysis. Our matching strategy was predominantly by MSA name. When conflicts arose, or when there was no match by name, we used the county components of each metro. Finally, because the matching process was not necessarily one-to-one, we consolidated metros that could not be uniquely matched in either definition file. At the end of the process, we had matched 299 metropolitan areas, some of which did not contain observations for certain years. In the finalized panel data set, we kept 261 metropolitan areas in total, with a small fraction of them lacking observations in 1990.
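The matching strategy can be summarized in a short sketch. This is an illustration only: the two input frames and their column names (name, msa_code, county_fips, pwmetro_code) are hypothetical stand-ins for the PWMETRO and current-MSA definition files, and the consolidation step is reduced to a flag.

```python
# Hedged sketch of the metro-matching strategy: match by MSA name first, fall
# back to county components, and flag metros that cannot be uniquely matched.
# `msa` is assumed to have one row per county component of each MSA, and `pw`
# is assumed to store each metro's county components as a list in `county_fips`.
import pandas as pd

def build_crosswalk(pw: pd.DataFrame, msa: pd.DataFrame) -> dict:
    """Map each PWMETRO metro to an MSA code: by name first, then by counties."""
    name_to_msa = dict(zip(msa["name"], msa["msa_code"]))
    county_to_msa = dict(zip(msa["county_fips"], msa["msa_code"]))

    crosswalk = {}
    for _, row in pw.iterrows():
        if row["name"] in name_to_msa:                  # 1) match by MSA name
            crosswalk[row["pwmetro_code"]] = name_to_msa[row["name"]]
            continue
        # 2) fall back to the metro's county components
        candidates = {county_to_msa[c] for c in row["county_fips"] if c in county_to_msa}
        if len(candidates) == 1:
            crosswalk[row["pwmetro_code"]] = candidates.pop()
        else:
            # 3) no unique match: flag the metro for consolidation
            crosswalk[row["pwmetro_code"]] = None
    return crosswalk
```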

Second, IPUMS stresses that PWMETRO does not completely identify a metropolitan area in many cases, with some cases more severe than others. That is, some individuals who are employed in a metropolitan area are not assigned the corresponding PWMETRO code because of a technical issue in the identification process. Since the unidentified individuals may be very different from the rest of the employed, this can bias the estimates of the educational attainment distribution. To assess how serious this bias may be, we compared, for all metropolitan areas, our estimated percentage distribution across levels of educational attainment (less than high school graduate, high school graduate or equivalent, some college or associate's degree, bachelor's degree, and graduate or professional degree) with the distribution provided in the Census 2000 EEO data, a special tabulation prepared for the Equal Employment Opportunity Commission. We found that the two data series provided very similar estimates and that the deviations from the EEO data were more likely caused by issues other than the incomplete identification of metropolitan areas.

The computation of educational attainment proceeded in the following steps. First, we calculated the population counts for each sub-group (e.g., the number of employed management analysts holding at least a master's degree in Washington-Arlington-Alexandria, DC-VA-MD-WV), using the individual sampling weight for each observation in the data set. Based on the total and sub-group population counts, we then calculated the percentage distribution of different levels of educational attainment. Finally, we calculated the average years of schooling by assigning to each level of educational attainment the total years of schooling listed in the following table.
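As a rough illustration of these steps, the sketch below computes weighted shares and average years of schooling for a single metro-occupation cell. PERWT is the IPUMS person weight; the EDUC_LEVEL column name and the years-of-schooling values are illustrative assumptions, not the mapping in the table referenced above.

```python
# Illustrative computation of attainment shares and average years of schooling
# for one metro-occupation cell. Column names (other than PERWT) and the
# years-of-schooling mapping are placeholders for the report's actual values.
import pandas as pd

YEARS_OF_SCHOOLING = {              # illustrative values only
    "less_than_hs": 10,
    "hs_graduate": 12,
    "some_college": 14,
    "bachelors": 16,
    "graduate_or_professional": 18,
}

def attainment_profile(cell: pd.DataFrame) -> pd.Series:
    """cell: records for one metro-occupation sub-group, with PERWT weights."""
    total = cell["PERWT"].sum()                                  # weighted population count
    shares = cell.groupby("EDUC_LEVEL")["PERWT"].sum() / total   # weighted shares by level
    avg_years = sum(shares.get(level, 0.0) * years               # weighted average years
                    for level, years in YEARS_OF_SCHOOLING.items())
    result = (shares * 100).to_dict()
    result["avg_years"] = avg_years
    result["weighted_count"] = total
    return pd.Series(result)
```

Applied to each (PWMETRO, OCC1990) group of an extract, a function of this kind would yield the kind of metro-by-occupation statistics shown in the webtool.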

What is a margin of error (MOE)? Why should I be concerned when reading metro rankings?  

MOE measures the sampling error in PUMS and helps us gauge the reliability of the educational attainment statistics that we calculated from these census files. The smaller the MOE is, the more reliable the corresponding statistic, or point estimate. And the more observations we have in a sub-sample (e.g., engineers holding at least a bachelor's degree in San Jose-Sunnyvale-Santa Clara, CA), the smaller the MOE. Generally speaking, national statistics are more reliable than those for large metros, which are more reliable than those for small metros. Similarly, statistics for more aggregate occupational groups tend to have smaller MOEs than statistics for detailed occupations do.

We obtained the margin of error for each estimate using the design factor method detailed in the technical documentation the Census Bureau provides for users of PUMS files. We used the design factors determined at the national level and calculated each MOE at the 90 percent confidence level.
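For readers who want the mechanics, the sketch below applies the generic design-factor form for an estimated percentage from a 5 percent sample, SE = DF × sqrt((19 / B) × p × (100 − p)) with B the weighted base, scaled by 1.645 for a 90 percent confidence level. These constants follow the general Census PUMS accuracy formulas and are an assumption here, not a quote of the exact formula applied in the report.

```python
# Hedged sketch of a design-factor MOE for an estimated percentage. The 19/B
# factor (5 percent sample) and the 1.645 multiplier (90 percent confidence)
# are assumptions based on the generic Census PUMS accuracy formulas.
import math

def moe_percentage(p: float, base: float, design_factor: float) -> float:
    """90 percent MOE for an estimated percentage p (0-100) with weighted base `base`."""
    se_unadjusted = math.sqrt((19.0 / base) * p * (100.0 - p))
    return 1.645 * design_factor * se_unadjusted

# Example: a 42.2 percent estimate on a weighted base of 80,000 workers,
# with a design factor of 1.3 (all numbers illustrative).
print(round(moe_percentage(42.2, 80_000, 1.3), 2))
```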

The MOE has important implications for interpreting metro rankings. We ranked metros based on the point estimates at each relevant level of educational attainment. In general, comparing closely ranked metropolitan areas should be done with caution because the differences in the estimated share of workers may not be statistically significant. The larger the margin of error is, the more caution is required, even for distantly ranked metros.
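One common rule of thumb, offered here as an illustration rather than as the report's own test, is to treat two metros' estimates as statistically distinguishable only when their difference exceeds the MOE of the difference, which combines the two individual MOEs.

```python
# Rule-of-thumb check for whether two metros' estimated shares differ
# significantly; an illustrative convention, not the report's own procedure.
import math

def significantly_different(est_a: float, moe_a: float,
                            est_b: float, moe_b: float) -> bool:
    moe_diff = math.sqrt(moe_a**2 + moe_b**2)   # MOE of the difference
    return abs(est_a - est_b) > moe_diff

# Two metros with shares 71.7(+/-2.6) and 69.9(+/-3.1) percent: not distinguishable.
print(significantly_different(71.7, 2.6, 69.9, 3.1))
```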

How do you determine the relevant degree levels for each occupation? 

We chose the most pertinent levels of educational attainment for each occupation based on its national distribution in 2010. We ordered all individuals by level of education, from doctoral degree down to less than high school. We then identified the educational attainment of the individuals located at the 5th, 10th, 15th, 20th, 25th, 50th, and 75th percentiles. We examined more percentiles at the upper end because we wanted to identify advanced degree holders.

Take physicians' assistants, for example. The individuals at the 5th through 10th percentiles have a professional degree, those at the 15th through 25th percentiles have a master's degree, the individual at the 50th percentile has a bachelor's degree, and the one at the 75th percentile has an associate's degree. Therefore, users will find four national statistics for this occupation: 12.62(±4.65) percent have completed at least a professional degree, 42.22(±3.78) percent have completed at least a master's degree, 71.71(±2.64) percent have completed at least a bachelor's degree, and 84.74(±1.94) percent have completed at least an associate's degree (for year 2010). For some other occupations, however, because the vast majority of workers have similar levels of educational attainment, users will find as few as two national statistics. For example, among physicians, 13.02(±1.53) percent have completed a doctoral degree, while 95.42(±0.35) percent have at least a professional degree.
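The selection step can be sketched as follows. The ordering of degree levels, the column names, and the use of cumulative weights to locate percentiles are written generically and should be read as assumptions for illustration.

```python
# Sketch of picking the relevant degree levels for one occupation from its
# weighted national education distribution. Level names, their ordering, and
# column names are illustrative assumptions.
import pandas as pd

EDUC_ORDER = ["doctoral", "professional", "masters", "bachelors",
              "associates", "some_college", "hs_graduate", "less_than_hs"]
PERCENTILES = [0.05, 0.10, 0.15, 0.20, 0.25, 0.50, 0.75]  # denser at the upper end

def relevant_levels(occ: pd.DataFrame) -> set:
    """occ: national records for one occupation, with EDUC level and PERWT weight."""
    # Order individuals from the highest degree down to the lowest.
    occ = occ.copy()
    occ["rank"] = occ["EDUC"].map({lvl: i for i, lvl in enumerate(EDUC_ORDER)})
    occ = occ.sort_values("rank")

    # Locate each target percentile in the cumulative weight distribution.
    cum_share = occ["PERWT"].cumsum() / occ["PERWT"].sum()
    levels = set()
    for p in PERCENTILES:
        level_at_p = occ.loc[cum_share >= p, "EDUC"].iloc[0]
        levels.add(level_at_p)
    return levels
```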

Why do the 2011 occupational salaries differ from the original BLS data?  

As explained in Data Construction, we defined metros and occupations somewhat differently than the official systems do. Therefore, we needed to match our classifications with the metros and occupations provided by the BLS in its 2011 Occupational Employment Statistics. When multiple occupations in the BLS data matched a single occupation in our definition, we calculated the latter's average salary as the employment-weighted average of the corresponding BLS occupations' salaries, using the estimated employment and average salary reported for each. A similar procedure was applied when multiple metros in the BLS data matched a single metro in our definition. (We did not find cases where one BLS occupation or metro matched multiple occupations or metros in our definition.)
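For concreteness, the aggregation amounts to a weighted average. The sketch below assumes hypothetical field names for the BLS employment and salary estimates.

```python
# Employment-weighted average salary for one of our occupations that maps to
# several BLS occupations (field names and numbers are hypothetical).

def combined_salary(bls_matches: list[dict]) -> float:
    """bls_matches: e.g. [{"employment": 12000, "avg_salary": 91000}, ...]"""
    total_employment = sum(m["employment"] for m in bls_matches)
    return sum(m["employment"] * m["avg_salary"] for m in bls_matches) / total_employment

# Two BLS occupations mapping to one of our occupations (illustrative numbers):
print(combined_salary([{"employment": 12000, "avg_salary": 91000},
                       {"employment": 3000, "avg_salary": 76000}]))
```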