The first thing is to not automatically accept the conclusions written in the abstract at face value. An abstract is like a sound bite – it needs to grab the attention of the reader, enough that they’ll then go looking for (and buy) the full paper. It needs to be exciting enough to get published somewhere. And it will probably reflect the biases of the researchers. Ideally, a study is done without bias – yes, the researchers are testing a hypothesis (or theory) about something, or looking for connections that they suspect are there – but they are supposed to be open to whatever the data shows. Unfortunately, in my experience, many researchers are really looking to support what they already believe…so a grain of skepticism is definitely in order, even when the conclusions are what YOU are looking for! It can be dangerous to draw conclusions about a given topic based only on the available abstracts. To really understand the research you need to read the full studies.
So, what do I look for in the body of the abstract or, better yet, the complete study? First thing – how many women were in the study groups? The bigger the better – a study might show that there’s no difference in maternal outcomes between VBAC and elective repeat cesarean, but if they only compared 100 women in each group, they probably didn’t look at enough women to detect a difference even if one exists. I don’t completely discount a small study; sometimes that’s all you have, but you need to be aware of the pitfalls. Hopefully, the authors will address this in the full paper…but sometimes it’s pretty hidden and often there’s a lot of statistical manipulation involved (more on statistics later).
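To get a feel for why 100 women per group is usually too few, here is a rough sketch of a power calculation. The two complication rates (0.5% and 1%) are numbers I made up purely for illustration, not figures from any study, and the function name is my own. It uses a textbook normal approximation, which is itself shaky when the events are this rare, so treat the output as a ballpark and not a precise answer.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate chance that a two-sided z-test detects a real
    difference between two rates, given n women in each group."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error if the two groups really had the same rate
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error if the two groups really differ
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return NormalDist().cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


# Hypothetical complication rates, invented for this example only
RATE_GROUP_A = 0.005
RATE_GROUP_B = 0.010

for n in (100, 1_000, 10_000):
    power = power_two_proportions(RATE_GROUP_A, RATE_GROUP_B, n)
    print(f"{n:>6} women per group: power is about {power:.0%}")
```

Under these made-up rates, a study with 100 women per group would miss a real doubling of risk most of the time, which is why "no difference found" from a small study tells you very little.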
Next, I look at the methods – most studies in obstetrics are “retrospective,” which means the researchers looked back at events that already occurred, comparing two groups of women (the study group and the control group, hopefully matched such that they are “the same” in every way except for the one event of interest). I look to see whether they relied on discharge papers (the billing paperwork for insurance) or actually went through patient records. A study is much more accurate if the second method is used, but it is much more time consuming to do it that way (and thus, uncommon). There’s been research showing that insurance coding (the codes used by the insurance industry to identify procedures) is often very inaccurate. So, a woman might be coded as a uterine rupture when in fact she had an extension of the incision during surgical delivery – similar events but definitely not the same if you are studying the incidence of uterine rupture! Most retrospective studies depend on these codes, called ICD-9 codes. If a study is done this way, be aware that the data may be flawed from the beginning.
Every once in a while, you’ll have a study that’s prospective – that means that the study groups are set up ahead of time and then followed. This is supposed to be a more trustworthy way to do a study and in many cases it is, but in the area of obstetrics, it’s important to realize that rarely are women going to be randomly assigned to a group (and random assignment is what makes a prospective study so much better than a retrospective one). A good example of this is the now infamous Hannah Breech Trial. Supposedly, women with babies in breech presentation were randomly assigned to one of two groups – surgical delivery and vaginal birth. The problem is, that’s not really what happened – it wasn’t random and the fact that it wasn’t did impact the results. So, if a study claims to be “prospective,” look very carefully at the criteria used to make the selections. The gold standard for research is a double-blinded, randomly assigned study. This would mean, in the case of a breech study, that neither the women nor the obstetrician knew whether she was in the surgery group or the vaginal birth group. Obviously, this isn’t going to happen. And if the people involved know the assignment, biases can change the results.
Similar to this is the need to look at what’s considered the study group and what’s considered the control group. Periodically, there will be a minor splash in the media about some paper that has once again shown that epidurals don’t increase the cesarean rate. Just about any anesthesiologist will tell you this with absolute certainty. The problem is that the control group (the group that didn’t get an epidural) was not what most people expect (drug free). These women were simply women who used other forms of pharmaceutical pain control. So, it may be true that getting an epidural doesn’t increase your chances of a cesarean compared to a woman who’s getting IV injections of narcotics, but the research says nothing about how it compares to a drug-free labor. This is almost never evident in popular press versions of research and if you don’t read carefully, you might miss it in an abstract too. Remember, researchers come to the research with their own biases – anesthesiologists don’t have any reason to study women who aren’t using anesthesia.
The next thing that’s very important in reading obstetrical research is to look at what procedures and measurements are “controlled for” (that means they documented them and included them in the analysis) and what underlying assumptions are made. For example, there is research indicating that macrosomia (“big baby”) is a risk for uterine rupture. This leads to the conclusion that women with a suspected big baby should just have a repeat cesarean. The problem is, these studies didn’t ask whether these women with the suspected big babies were induced or not. We know that typically, when a woman is suspected to have a big baby, she is induced, often early. Other research shows that induction is a real risk factor for uterine rupture. If you don’t ask whether these women who had big babies were induced, then you really don’t know what caused the uterine rupture. Turns out, when you do control for induction, a big baby isn’t a risk for uterine rupture. With regard to underlying assumptions, another problem with the Hannah Breech Trial was the selection of the obstetricians. The assumption was that they were all equally skilled in breech vaginal birth. This proved to be incorrect and thus invalidated the data on whether cesarean delivery is safer than vaginal birth. When reading the research, you need to pay close attention to what was “controlled for,” what wasn’t, and what assumptions are made. Missing controls show up far too often, particularly in the research that looks at risk factors for a particular poor outcome (failed trial of labor, shoulder dystocia, etc.) – factors like induction, malpositions (e.g. posterior presentation), or epidural use are often not taken into account, and this throws the results into doubt.
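Here is a small sketch of how the big baby and induction tangle can play out in the numbers. Every count below is invented for illustration (it is not data from any study), and I’ve rigged it so that rupture risk depends only on induction while big babies are simply induced more often. The names are my own.

```python
# Hypothetical counts, invented for illustration only.
# Key: (big_baby, induced) -> (ruptures, total women)
COUNTS = {
    (True, True): (16, 800),    # big baby, induced
    (True, False): (1, 200),    # big baby, not induced
    (False, True): (4, 200),    # average baby, induced
    (False, False): (4, 800),   # average baby, not induced
}


def risk(big_baby, induced=None):
    """Rupture risk for a group; induced=None lumps both strata together."""
    strata = [induced] if induced is not None else [True, False]
    ruptures = sum(COUNTS[(big_baby, s)][0] for s in strata)
    total = sum(COUNTS[(big_baby, s)][1] for s in strata)
    return ruptures / total


# Ignoring induction: big babies look more than twice as risky
crude_rr = risk(True) / risk(False)

# Controlling for induction: compare like with like inside each stratum
rr_induced = risk(True, induced=True) / risk(False, induced=True)
rr_not_induced = risk(True, induced=False) / risk(False, induced=False)

print(f"Crude risk ratio (induction ignored): {crude_rr:.2f}")
print(f"Risk ratio among induced women:       {rr_induced:.2f}")
print(f"Risk ratio among non-induced women:   {rr_not_induced:.2f}")
```

In this made-up data the apparent risk from a big baby disappears once you compare induced women with induced women and non-induced with non-induced. Real data is never this tidy, but it is the same question to ask of any risk-factor study: what happens to the result when the obvious companion intervention is accounted for?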
If you have the full study, the section called “Conclusions” can be very enlightening. This tends to be where the underlying biases come shining through, especially when the results were not what the authors were looking for. In a recent study on suture layers in the uterine closure after a cesarean, the authors were looking at whether a single-layer closure was a risk factor for uterine rupture. A previous study had indicated that it was, fairly dramatically. But this study found exactly the opposite. The authors spent a lot of time trying to figure out how and why this could be true. Finally, they concluded that it was not a risk factor but made sure they mentioned that the incidence of non-clinical dehiscences (a benign separation of the previous incision) was actually higher in the single-layer group and then speculated that maybe this is important, even if their study didn’t show it. Obviously they didn’t get the results they expected (or wanted) and this was their way of salvaging something from the study.
Mark Twain said, “Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies and statistics.’” I’ve taken 3 different statistics courses, 2 of them at the graduate/doctorate level. And I can’t keep up. With the advent of easy and high powered desktop computing, it is possible to do analyses that were impossible even 10 years ago. So…do we trust them? I really don’t know. I was taught that the analysis is only as good as the data you put in and obviously, I have a problem with a lot of the data. I will say that I do have concerns with what I refer to as research mills – research groups that have a single large group of women as a data pool and then analyze that same group in multiple different ways to get multiple different conclusions and thus publish multiple different papers. So, what do I do about my inability to determine the legitimacy of statistical results?
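Part of my unease with research mills comes down to simple arithmetic. If each analysis has a 5% chance of turning up a “significant” result by luck alone, then the more ways you slice the same pool of women, the more likely at least one slice comes up significant by accident. The sketch below assumes the analyses are independent of each other, which analyses of a single data pool are not, so take it as a rough illustration of the direction of the problem and not an exact figure. The function name is my own.

```python
def chance_of_false_alarm(num_analyses, alpha=0.05):
    """Probability that at least one of several independent analyses
    looks 'significant' purely by chance, when nothing real is going on."""
    return 1 - (1 - alpha) ** num_analyses


for k in (1, 5, 10, 20):
    print(f"{k:>2} analyses: {chance_of_false_alarm(k):.0%} chance of at least one fluke")
```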
I call it the bullshit test, or the intuitive test if you prefer. Remember that all these statistics do is tell you how strongly 2 things appear to be linked and how easily chance alone could produce a link that strong. When a result is “significant” at the usual 95% level, it means that if event A and event B were truly unrelated, you would see a connection this strong less than 5% of the time through random chance. That is not the same as a 95% chance that the connection is real, and statistics certainly do not prove cause and effect. They do not say that there’s a 95% chance that event A causes event B. There is a study on the use of evening primrose oil (EPO) in primiparous women to see if it has the effect of shortening gestation and labor. The results of the study indicated that no, it doesn’t; in fact, the women who took it had longer gestations and more complications during labor. The abstract indicated that women who used EPO during pregnancy were more likely to need pitocin and more likely to need vacuum assistance during labor. The findings were statistically significant. But what does it mean? How does EPO “cause” a woman to have vacuum assistance? How does it “cause” her to have more pitocin? Yes, there’s strong correlation between those things but my red flags go up when I see them reported as “significant” in an abstract because honestly it just sounds silly. Which leads me to believe that the authors really wanted to prove that EPO isn’t effective and were willing to blame just about anything on it, as long as it was statistically significant. Maybe EPO is a problem, I don’t know. I do know that the authors lost credibility in my eyes, making me suspicious of their motives and thus, not trusting of their results (or their interpretation of their results).
The final thing to remember is that research only goes so far. It is about groups of women, not individuals. It can be used to identify risk factors, to counter the common misinformation about pregnancy and birth that many women receive in standard prenatal care, to get a sense of “the odds” for a particular outcome. But what it can’t tell you is what you will experience. Ultimately, what is important is what a woman knows in herself. That knowing can include research in its formation but it should also include factors that aren’t so measurable. There comes a point when the research has to be let go, and trust needs to grow. Use the research but don’t let it control you or your decision making. And remember, the largest study group ever, all the women who’ve come before you, has proved that most of the time, for most women, birth works. It really is as safe as life gets.
*To access Medline, I use PubMed: http://www.ncbi.nlm.nih.gov/entrez/ It takes a bit of practice to figure out the best search terms to use and it is easy to find yourself there for hours on end but I’d encourage anyone to go try it.
April 28, 2006