- Determine the size of your target population. Let’s say you want to survey pediatricians in the United States. A quick Google search (search terms = “how many U.S. general pediatricians”) points to the American Academy of Pediatrics Division of Workforce and Medical Education Policy webpage, which reports a total of 58,726 U.S. general pediatricians (based on data from the 2006 American Medical Association Masterfile). So, in this example, the target population size = 58,726.
- Determine how big a sample is needed to represent the target population. Thankfully, there’s an abundance of free sample size calculators online. Four things are needed to calculate sample size: 1) margin of error, 2) confidence level, 3) population size, and 4) response distribution. In practice, the only thing you really need to know is population size (which for U.S. pediatricians is 58,726). Just as we all accept P < .05 as the benchmark for statistical significance, the standard values for margin of error, confidence level, and response distribution are 5%, 95%, and 50%, respectively. Entering these standard values and a population of 58,726 U.S. pediatricians into a sample size calculator gives a recommended sample size of 382.
- Before you start surveying, there’s one more important (and often overlooked) step: pulling a random sample from your target population for your survey pool. To do this, you’ll need to estimate your survey response rate. The best way to estimate your survey’s response rate is to see what’s been achieved in other studies. Relevant to our pediatrician example, a quick PubMed search (search terms = email + pediatricians + survey) identifies the following:
- McMahon SR, Iwamoto M, Massoudi MS, et al. Comparison of e-mail, fax, and postal surveys of pediatricians. Pediatrics. 2003;111:e299-303.
This study of pediatricians in Georgia reported a 26% response rate to an email survey (after two invitations). So if I’m expecting a 26% response rate (assuming I’m doing a web-based survey of pediatricians) and my recommended sample size is 382, then I will need to randomly select 1,469 U.S. pediatricians from the AMA Masterfile (solving 0.26x = 382 gives x = 382 / 0.26 ≈ 1,469). A 26% response rate from 1,469 U.S. pediatricians randomly selected from the AMA Masterfile should yield roughly the 382 responses I need (0.26 × 1,469 ≈ 382).
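The arithmetic in these steps can be sketched in Python. This is a minimal sketch, assuming the calculator uses the common normal-approximation formula with a finite population correction (individual online calculators may round or approximate slightly differently); `sample_size` and `invitations_needed` are hypothetical helper names, not part of any calculator.

```python
import math
from statistics import NormalDist


def sample_size(population: int, margin: float = 0.05,
                confidence: float = 0.95, p: float = 0.5) -> int:
    """Required sample size for a proportion, with a finite population correction.

    Assumes the standard formula; some online calculators differ slightly.
    """
    # z-score for a two-sided interval (about 1.96 at 95% confidence)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction for a population of size N
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def invitations_needed(required: int, response_rate: float) -> int:
    """Number of people to invite so the expected responses reach `required`."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # Solve response_rate * x = required, rounding up to a whole person
    return math.ceil(required / response_rate)


n = sample_size(58_726)
print(n, invitations_needed(n, 0.26))  # prints 382 1470
```

Rounding 382 / 0.26 up gives 1,470 rather than 1,469. The difference is negligible, but rounding up keeps the expected number of responses (0.26 × 1,470 ≈ 382.2) from falling just below 382.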
You need to pull a random sample to reduce concerns such as self-selection bias (i.e., respondents’ decision to participate in your survey may be correlated with traits that affect the study, making the participants a nonrepresentative sample). There are a number of ways to pull a random sample, as well as a number of factors that dictate which method to use (Wikipedia’s article on sampling in statistics gives a good summary).
Here’s one method for pulling a simple random sample using MS Excel:
- Start with an Excel spreadsheet containing everyone in your target population (continuing with our example, that would be 58,726 U.S. pediatricians).
- Create a new column (call it “random sample”), and type this formula in the first cell: =RAND(). This returns a random number greater than or equal to 0 and less than 1.
- Copy and paste this formula into every cell in that column, down to the last pediatrician. You now have a random number in each row.
- Sort the entire worksheet by this column (ascending or descending). Select as many rows from the top as your survey pool requires (in our case, the first 1,469). This is your survey pool. Note that after the sort, the random number in each row will recalculate, making it look as if the column was never sorted. Ignore this: the rows were sorted by the original random values, and only then did the values recalculate. Because RAND() is a volatile function, this column will recalculate every time the worksheet recalculates (for example, whenever you edit a cell).
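The spreadsheet steps above can also be sketched in Python, for anyone whose population list lives outside Excel. This is a minimal sketch, not the only way to do it: `random_sort_sample` is a hypothetical helper, and the IDs below are made-up placeholders rather than real Masterfile data. It mirrors the Excel method: attach a random number to every record, sort by that number, and keep the first k records.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def random_sort_sample(population: Sequence[T], k: int,
                       seed: Optional[int] = None) -> List[T]:
    """Simple random sample of k records, mirroring the Excel RAND()-and-sort method."""
    if not 0 <= k <= len(population):
        raise ValueError("k must be between 0 and the population size")
    rng = random.Random(seed)
    # Like a =RAND() helper column: attach a uniform [0, 1) number to each record
    keyed = [(rng.random(), record) for record in population]
    # Sort on the random key only, then keep the top k rows as the survey pool
    keyed.sort(key=lambda pair: pair[0])
    return [record for _, record in keyed[:k]]


# Hypothetical stand-in for the Masterfile: one made-up ID per pediatrician
population = [f"PED-{i:05d}" for i in range(58_726)]
survey_pool = random_sort_sample(population, 1_469, seed=2024)
print(len(survey_pool), len(set(survey_pool)))  # prints 1469 1469
```

Unlike the Excel column, these random keys are generated once and never recalculate. Python’s standard library also offers `random.sample(population, k)`, which draws a simple random sample in a single call.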