Ever been questioned about your data? Not questioned one-on-one sitting in your office, but more of an "on-the-spot, everyone I know is watching" kind of questioning? Yeah, not fun. As a healthcare data analyst, you aim to provide data that is accurate, representative, and helps the right people make the right decisions. So what happens when someone calls out your data in front of everyone and claims that it doesn't represent reality?
Yeah, I know how that feels. I've been there. But how do you really know whether what you are providing actually represents reality? (Psst - if you aren't sure it represents what is really happening, how can you expect others to trust it?)
There are (unfortunately) many ways that data can become misaligned with what is actually happening and paint an inaccurate picture: formula errors, data extraction errors, data entry errors, and so on. The one I want to talk about in this blog is unrepresentative sampling, and how random sampling helps ensure data accuracy.
Random Sampling, what's that?
Random sampling is a technique used when it is not feasible or practical to obtain and analyze an entire population of data. In statistics, a population is the complete set of data for the question of interest. We can use random sampling to obtain a subset of data from the whole population in order to estimate what the entire population is telling us. That's a mouthful, I know.
Let's say you wanted to estimate the average length of stay of a hospital inpatient over the past six months. If you could easily obtain all of the patient length of stay data, you could just use software to add up all the individual "lengths of stay", divide by the total number of patients, and "presto!" you'd have the average length of stay of the population (hospital inpatients in the last six months).
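To make that "add it all up and divide" step concrete, here is a minimal Python sketch. The `lengths_of_stay` list and the `average_length_of_stay` function are made up for illustration; in practice the values would come from your own report or data extract.

```python
# Hypothetical lengths of stay in days, one value per inpatient.
# These numbers are invented for illustration only.
lengths_of_stay = [2, 5, 3, 7, 1, 4, 6, 4]


def average_length_of_stay(stays):
    """Add up the individual stays and divide by the number of patients."""
    if not stays:
        raise ValueError("need at least one length of stay")
    return sum(stays) / len(stays)


population_mean = average_length_of_stay(lengths_of_stay)
print(f"Average length of stay: {population_mean} days")
```

Because every patient in the population is included, this is the population average itself and not an estimate.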
In the age of electronic medical records, the data for the whole statistical population is becoming more readily available electronically, which greatly simplifies our data collection. But what if you were interested in learning more about length of stay and its underlying causes, maybe something that is not available in a report and would require a chart audit or some other manual data collection process? Reviewing every chart is rarely practical, and that is where sampling comes in.
How can you ensure that your sampling is representative of the whole?
When you sample, the key is to make sure your sampling is random. That means you can't just take 15 patients from Unit A and 15 from Unit B and 15 from Unit C and so on, because units of different sizes would end up over- or under-represented. Nor should you just list all of your patients in order and take "every 10th" patient, because any pattern in how the list is ordered can creep into your sample. You should have a method to select randomly, free from any selection bias.
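As a rough sketch of what "free from selection bias" can look like in code, the Python below draws a simple random sample in which every patient has the same chance of being picked. The patient IDs, the sample size of 45, and the `simple_random_sample` function are all hypothetical, and this is only one way to do it, not the Excel approach mentioned in this post.

```python
import random


def simple_random_sample(patient_ids, sample_size, seed=None):
    """Pick sample_size patients without replacement, each equally likely.

    A fixed seed makes the draw repeatable, which helps if someone later
    asks how the charts were chosen.
    """
    rng = random.Random(seed)
    return rng.sample(patient_ids, sample_size)


# Hypothetical list of 500 inpatient IDs across all units (P0001 to P0500).
patient_ids = [f"P{n:04d}" for n in range(1, 501)]

# Draw 45 charts to audit; the seed value is arbitrary.
chart_audit_sample = simple_random_sample(patient_ids, 45, seed=2024)
print(chart_audit_sample[:5])
```

Note that the sample is drawn from the full patient list and not unit by unit, so a larger unit will naturally tend to contribute more charts.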
Microsoft Excel has a really easy formula for taking a truly random sample from a data set. Check out the video below and I'll show you how it works.
Christopher M. Spranger, MBA, ASQ MBB