Social Statistics and Data

Data Parameters

Before looking for data, it often helps to define the specific parameters of the statistics you need. These include:

  • Subjects: Exactly what do you want counted?
  • Time Coverage: What time period(s) should the statistics cover?
  • Geographic Coverage: What geographic area(s) should the statistics cover?
  • Cross-tabulation: Do you want the data broken down by characteristics like sex, age, income, occupation, or industry?

It often helps to draw up an example of the table you're looking for. What elements appear on the X axis and the Y axis? What units of measurement do you want? Do you need some parameters grouped in specific increments (for example, ages in five-year or ten-year brackets)? You may not find numbers that exactly match your example, but knowing which elements you want will help your search tremendously.
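As a rough illustration of drawing up such a table, the sketch below groups ages into brackets and counts records by age bracket and sex. The bracket boundaries, field names, and sample records are all invented for illustration; they are assumptions, not figures or categories from any real source.

```python
from collections import Counter

# Hypothetical age brackets (low, high, label); adjust to the increments you need.
AGE_BRACKETS = [(0, 17, "0-17"), (18, 34, "18-34"), (35, 64, "35-64"), (65, 200, "65+")]


def age_bracket(age):
    """Return the label of the bracket that contains `age`."""
    for low, high, label in AGE_BRACKETS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def cross_tabulate(records):
    """Count records by (age bracket, sex)."""
    return Counter((age_bracket(r["age"]), r["sex"]) for r in records)


def format_table(counts, row_labels, col_labels):
    """Render counts as plain text: age brackets down the side, sex across the top."""
    lines = ["Age".ljust(8) + "".join(c.rjust(8) for c in col_labels)]
    for row in row_labels:
        cells = "".join(str(counts.get((row, c), 0)).rjust(8) for c in col_labels)
        lines.append(row.ljust(8) + cells)
    return "\n".join(lines)


# Invented placeholder records, not real statistics.
sample = [
    {"age": 12, "sex": "F"},
    {"age": 25, "sex": "M"},
    {"age": 31, "sex": "F"},
    {"age": 70, "sex": "F"},
]
counts = cross_tabulate(sample)
print(format_table(counts, [b[2] for b in AGE_BRACKETS], ["F", "M"]))
```

Writing the table shell first makes it clear which rows, columns, and groupings a published source would have to supply.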

Time Coverage

Do you need the numbers broken down by decade, year, or month? Are you looking for the most recent data or historical data? Things to watch out for:

  • Most government statistics provide snapshots, not historical time series, so you may have to look in multiple volumes to get data over time.
  • There's usually a lag between data collection and publication. For example, the Census is conducted every ten years, but the most detailed reports take 3–5 years to be compiled and published. In many cases up-to-the-minute statistics are not available.
  • Definitions can change over time, even in the same statistical source. (See Defining Race below for a good example.) Be sure to check the exact definitions of what's being counted in each source you consult.

Geographic Coverage

Typical geographic areas include nations, states/provinces, regions, metropolitan areas, counties, and cities. Things to watch out for:

  • Boundaries can change over time, so be sure to check associated definitions or maps to ensure consistency.
  • Statistics for small areas (neighborhoods, zip codes, or census tracts) present unique challenges. See the guide for Neighborhood Research and Community Analysis.

Defining Race in the US

Breaking down data by race is one of the most common forms of cross-tabulation researchers look for and, unfortunately, one of the most complicated. Most government statistical agencies in the US use the racial categories defined by the Census Bureau, but those categories have changed dramatically over the years. For example, early censuses recognized only two races (white and black), and race designations were made by the census-takers who gathered the statistics. In contrast, the 2000 Census had six main race categories (American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, White, and Some Other Race). It determined race solely by self-identification (a "white person" was anyone who checked the box labelled "White") and was the first census to allow people to check multiple race boxes.

Another source of confusion is Hispanic origin. With a few historical exceptions, the Census Bureau has not recognized a racial category for Latinos. Instead, in recent decades the census has asked a separate question about Hispanic origin. As a result, anyone who is "of Hispanic origin" is also counted under one of the main racial categories.
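The practical consequence is that race and Hispanic origin are two independent tallies of the same population. A minimal sketch, using invented placeholder records and hypothetical field names rather than any real census file layout:

```python
from collections import Counter

# Invented placeholder records for illustration only; not real census data.
people = [
    {"race": "White", "hispanic": True},
    {"race": "White", "hispanic": False},
    {"race": "Black or African American", "hispanic": True},
    {"race": "Asian", "hispanic": False},
]

# Tally the same people two different ways.
by_race = Counter(p["race"] for p in people)
by_origin = Counter("Hispanic" if p["hispanic"] else "Not Hispanic" for p in people)

# Cross-tabulating the two questions shows how they overlap.
by_both = Counter((p["race"], p["hispanic"]) for p in people)

# Each tally covers everyone, so both sum to the full population.
assert sum(by_race.values()) == sum(by_origin.values()) == len(people)

print(by_race)
print(by_origin)
print(by_both)
```

Because each person appears once in each tally, adding a "Hispanic" count to the race counts would double-count people.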

It's always best to double-check the subject definitions used by the Census Bureau in each separate census publication. Those definitions are usually available in the back of the book (for print products) or in separate technical documentation files (for digital products).

Occupation and Industry Categories in the US

Occupation refers to the type of job an individual holds. Industry refers to the type of business. The two don't necessarily coincide. For example, a computer programmer may work for an advertising agency, and an advertising sales agent may work for a software company.

Occupations are usually classified by Standard Occupational Classification (SOC) categories, developed by the Department of Labor. Each occupation is assigned a 6-digit code. The first two digits represent a broad occupational classification, and additional digits represent more detailed sub-divisions. For example:

  • 11 = Management Occupations
  • 11-10 = Top Executives
  • 11-1030 = Legislators
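The prefix structure in the example above can be sketched in code. This is a minimal illustration, assuming codes are written as two digits, a hyphen, and four digits; the function names and the small lookup table are hypothetical, and a real table would come from the published SOC documentation.

```python
import re

# Labels copied from the example above; anything else is not in this sample.
SOC_LABELS = {
    "11": "Management Occupations",
    "11-10": "Top Executives",
    "11-1030": "Legislators",
}

# Assumed written form of a SOC code: NN-NNNN.
SOC_PATTERN = re.compile(r"\d{2}-\d{4}")


def soc_prefixes(code):
    """Return the broad-to-detailed prefixes of a SOC code, as in the example above."""
    if not SOC_PATTERN.fullmatch(code):
        raise ValueError(f"not a SOC code: {code!r}")
    return [code[:2], code[:5], code]


def in_group(code, prefix):
    """Return True if `code` falls under the group identified by `prefix`."""
    return code.startswith(prefix)


for prefix in soc_prefixes("11-1030"):
    print(prefix, "=", SOC_LABELS.get(prefix, "(label not in this sample)"))
```

Filtering on a prefix with `in_group` is one way to pull every detailed occupation that belongs to a broader group.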

Historically, the US government classified businesses using Standard Industrial Classification (SIC) codes. Beginning in 1997, however, most data products switched to the North American Industry Classification System (NAICS). Concordance tables are available for converting between the two systems. Both SIC and NAICS codes are hierarchical: the first two digits represent a broad industry category, and each additional digit represents a sub-division. For example, in NAICS:

  • 44 = Retail Trade
  • 445 = Food and Beverage Stores
  • 4452 = Specialty Food Stores
  • 44523 = Fruit and Vegetable Markets
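Because each additional digit narrows the category, the broader categories that contain a NAICS code can be read off its prefixes. A minimal sketch, assuming codes are stored as digit strings of two to six characters; the function names are hypothetical, and the lookup table holds only the four labels from the example above.

```python
# Labels copied from the example above; a full table would come from the
# published NAICS documentation.
NAICS_LABELS = {
    "44": "Retail Trade",
    "445": "Food and Beverage Stores",
    "4452": "Specialty Food Stores",
    "44523": "Fruit and Vegetable Markets",
}


def naics_ancestors(code):
    """Return every prefix of `code` from the 2-digit level down to the code itself."""
    if not code.isdigit() or not 2 <= len(code) <= 6:
        raise ValueError(f"not a NAICS code: {code!r}")
    return [code[:n] for n in range(2, len(code) + 1)]


def describe(code, labels=NAICS_LABELS):
    """Pair each level of `code` with its label, if the label is known."""
    return [(p, labels.get(p, "(label not in this sample)")) for p in naics_ancestors(code)]


for prefix, label in describe("44523"):
    print(prefix, "=", label)
```

The same prefix logic lets you aggregate detailed figures up to a broader level when a source reports only the finer categories.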

The US Census Bureau uses its own sets of codes for both occupations and industries, but it provides mappings between these and the standard SOC and NAICS codes.