Part 1: My Thesis Introduction
This post is the first in a series based on portions of my PhD thesis. The second one is here, the third one is here, and the fourth one is here.
My PhD was a grueling, exhausting experience, mostly because I pushed myself pretty hard, but also because nothing went to plan: datasets never arrived from external sources, and what I thought were the important things to focus on yielded a less than fruitful harvest. Out of this chaos, however, came enough material to make a contribution to the scientific community, or at least my thesis examiners thought so. Because my PhD work wasn’t straightforward and things didn’t materialize until the end, I didn’t end up publishing any papers. One was submitted early in my candidature but was rejected twice, and as I will discuss in a future blog post, this paper eventually formed the backbone of my thesis analysis. Publishing papers is the lifeblood of an academic or scientific researcher. The process of getting a paper through peer review is very tough and the comments from reviewers can often seem puzzling, but I am no longer in academic research, so I don’t deal with that regularly.
Being free of the academic publishing process, I thought I’d try a series of blog posts on my thesis work to make my research more approachable for my friends and family who haven’t spent years in scientific research and may find scientific writing dreary and stultifying. As with my favorite part of my academic career – teaching – I have tried to present the basic concepts as straightforwardly as possible, while mixing in stories and anecdotes. If you are still trapped working in academia, this blog is not peer reviewed and could be construed as borderline ranting. But if you find something useful here, you can reference my PhD thesis, which was and is free to access! I put very few citations in these posts and reference Wikipedia (because I can), so head to my thesis if you want a proper bibliography. I also published all of the MATLAB code used in my thesis here. Hopefully this work will help you in your research and perhaps get your paper published, although, as any PhD student or Postdoc knows, most landlords still don’t accept paper citations in lieu of rent money.
Diffusion MRI
Prior to my PhD, I worked in industry for fifteen years in hardware and software development. After a brief sojourn in academia getting a Masters degree in optics, I started my PhD research, and on day three I was holding in my hand a human prostate, just extracted via surgery. I have to say the experience was surreal, and spending time in the pathology lab was fascinating for someone with no prior exposure to that world. It was also sobering: the work we were doing was very real and affected people’s lives. Prostate cancer is a horrible disease that affects hundreds of thousands of men; it is the second most common cancer in men and the third most lethal. Prostate cancer is difficult to diagnose; for example, the once-promising PSA blood test now has a recommendation against it. Its anatomical location makes it difficult to access for imaging methods, and the methods of obtaining tissue biopsies are, let’s just say, uncomfortable.
Medical imaging technology is improving, though, as are our methods of treatment. One technology that has allowed for significant improvements in prostate cancer diagnosis is MRI. There are many resources out there to learn more about MRI, better than I can explain, and the Wikipedia page is pretty good. In basic MRI protocols, the signal pulses of the scanner are varied to highlight a specific, desired tissue. Building on that method, these pulses can be modified so that the scanner images how far the water molecules in a specific volume move over time. This is known as Diffusion MRI. A small volume of still water, say 1 cubic millimeter, contains “billions and billions” of molecules all bouncing off each other, a process known as diffusion. If we had a system that allowed us to measure how each water molecule moved over a given period of time, we could start a stopwatch and, after a certain amount of time (on the order of milliseconds), figure out the net distance in a given direction that one molecule had moved after bouncing off its neighboring molecules. If we grouped all of the molecules in this volume and combined the distances that each molecule moved in a given direction, we would find that these distances form a Gaussian distribution, also known as the normal distribution or the “bell curve”. If we increased the time over which we tracked the molecules, we would find that this distribution is still Gaussian but is flatter and wider than the previous one. This makes sense, since allowing more time for the molecules to move means the distances traveled will increase.
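To make this concrete, here is a small Python sketch (not taken from my thesis code, and with arbitrary units) that simulates free diffusion as a simple random walk. The spread of the net displacements grows with the observation time, just as described above.

```python
# A minimal sketch (illustrative only): simulate free diffusion as a one-dimensional
# random walk of many molecules and check that the net displacements spread out
# (Gaussian-like, by the central limit theorem) and widen as the observation time grows.
import numpy as np

rng = np.random.default_rng(0)
n_molecules = 100_000

for n_steps in (100, 400):                # two observation "times"
    position = np.zeros(n_molecules)
    for _ in range(n_steps):
        # each molecule takes an independent +1/-1 step; its net displacement
        # along this axis is the running sum of those steps
        position += rng.choice([-1.0, 1.0], n_molecules)
    print(f"time {n_steps:3d}: mean displacement = {position.mean():6.2f}, "
          f"spread (std) = {position.std():6.2f}")
```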
Of course, a system that tracks individual molecules would require an infinitely complex mechanism far beyond anything we currently have. While we have the ability to look at individual molecules with various microscopes, this requires extraction of the substance, which isn’t always beneficial when discussing a living, breathing human being. Our bodies are made up mostly of liquids, most of which allow us to live, and pulling all of these liquids out of our bodies kind of defeats the purpose. Besides, in the context of prostate cancer and other cancers in organs and soft tissue, we aren’t so much concerned with the liquids themselves as with how they are sloshing around. If we were looking at a volume of pure water as illustrated above, the water molecules would only be bouncing off of each other. In the human body, there are a myriad of obstacles that water molecules interact with: they could be inside cells, outside cells, passing through cellular membranes, and so on. If we could conduct the same experiment on these water molecules in cellular tissue, we would find that these various obstacles hinder or restrict the motion of the molecules. Over time, the distances these molecules travel would be shorter than in pure water. Think of a sample group of humans trying to get to various places in an open park versus a crowded bar.
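Continuing the sketch above (again with invented parameters), we can confine the same random walk between reflecting “membranes” and watch the spread of displacements shrink relative to free water:

```python
# Sketch of restricted diffusion: the same random walk, but confined between
# reflecting barriers at +/- `barrier` (an invented spacing). The spread of the
# displacements grows more slowly than in the free-water case above.
import numpy as np

rng = np.random.default_rng(1)
n_molecules, n_steps, barrier = 100_000, 400, 10.0

position = np.zeros(n_molecules)
for _ in range(n_steps):
    position += rng.choice([-1.0, 1.0], n_molecules)
    # reflect any molecule that stepped past a barrier back inside it
    position = np.where(position > barrier, 2 * barrier - position, position)
    position = np.where(position < -barrier, -2 * barrier - position, position)

print(f"free-water spread would be ~{np.sqrt(n_steps):.1f}; "
      f"restricted spread = {position.std():.1f}")
```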
We make use of this effect in diffusion MRI. In the case of prostate cancer, when cancerous tissue invades normal tissue, the orderly cellular structure is replaced by a tightly packed, amorphous mass – a tumor. This can be seen in the classification of prostate tissue via the Gleason grading system. So, an increase in cancer is associated with an increase in cellular density in a given volume. This leads to more cellular restriction and thus a decrease in the total distance that the water molecules in these cells will move over time.
Inverse Problems and the Effects of Noise
Medical imaging is often categorized as an inverse problem. We just described the forward problem – an increase in the density of the cellular structure leads to a decrease in the distance that the water molecules travel in a given time. When we stick a person in an MRI scanner, we are interested in the opposite: we want to measure the distances that the water molecules move in a given time and, based on those distances, determine whether there is cancerous tissue present. Of course, inverse problems are orders of magnitude more difficult, and various sources of uncertainty complicate matters. For example, in a real, live person, the water is not only diffusing through cells but also flowing through the body as blood and lymph. Additionally, while it’s important to keep still during an MRI scan so that the small volumes of tissue we are interested in don’t move during the course of the scan, our organs are busy digesting breakfast. The process of absorbing nutrients and moving waste through the intestines involves a lot of involuntary movement, which will affect the accuracy of our water molecule measurements. On top of that, we also have uncertainty contributions from the scanner itself. Fluctuations in the magnetic field and background noise from the electronics are just a couple of the noise sources in the scanning process.
These noise/uncertainty sources affect the acquisition process and can make it difficult to determine whether what we are detecting is a true underlying phenomenon (the signal) or a ghost where uncertainty has significantly affected a measurement (the noise). Thus, we want to improve our signal-to-noise ratio (SNR). For example, trying to see faint stars at night in the middle of a large city, with streetlights all around us, can be nearly impossible. Move to a desert location, far away from any local light sources, and suddenly millions of faint stars are visible. Relating this back to MRI, a single molecule, bouncing around by itself, has its own magnetic spin, and this will contribute a minuscule blip to the MRI scanner. This molecular signal, however, is massively overwhelmed by the amount of noise in the acquisition process and is therefore undetectable. If we combine the contributions of trillions of molecules together, they give a signal strong enough to be detected over the level of the noise. As our MRI scanner technology has improved, the background noise level has decreased. This allows us to detect contributions from fewer molecules, meaning we can image smaller volumes, which gives us images with better resolution. We can get great resolution now using special scanners that image small tissue samples, but again this involves extracting the tissue from the person; something we’d rather not do unless we were sure it was cancerous (the tissue, not the person).
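A toy sketch of this argument (all numbers are invented for illustration): each molecule contributes a tiny, identical blip, while the acquisition noise level stays fixed, so the combined signal only rises above the noise once enough molecules contribute.

```python
# Illustrative SNR sketch: a fixed noise level versus a signal that grows with
# the number of contributing molecules. The per-molecule "blip" size is made up.
import numpy as np

rng = np.random.default_rng(2)
blip, noise_std = 1e-9, 1.0           # per-molecule contribution vs. noise level

for n_molecules in (1, 10**6, 10**12):
    signal = n_molecules * blip                        # contributions add together
    measurement = signal + rng.normal(0.0, noise_std)  # one noisy acquisition
    print(f"{n_molecules:>14,d} molecules: signal = {signal:10.3e}, "
          f"SNR = {signal / noise_std:10.3e}, one reading = {measurement:8.3f}")
```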
So, while we can’t get the infinite precision we’d like to have, we do have a measurement of the combined contribution from a particular volume of molecules. This is a loss of information in that we know the signal from a group of molecules, but we no longer have any idea what the signal of any individual molecule is. (Dramatic Voice) We are now entering the realm of statistics. We encounter statistical measures, and this same loss of information, in our everyday lives. For example, while we might not want to know individual measurements for all 330 million people in the USA, we can combine the values and easily refer to mean measures such as life expectancy, height, or income. A mean height measure could allow us to compare, say, height differences between countries. We could then break down the data and look at the mean heights for all fifty states, individual counties, and so on. Height measurements are typically normally distributed, so the mean is the center point, or peak, of the bell curve. Another useful statistical measure is the standard deviation, which is related to the width of this curve. We can relate this standard deviation, a width measure, back to the example of the molecular distance distribution above. If we measure over a longer time period, the distance distribution becomes wider, and thus the standard deviation becomes larger. When our molecules follow this Gaussian distribution, the variance of the distribution (another statistical measure, which is simply the square of the standard deviation) is proportional to a variable called D times t, the time interval over which we track the molecules. D is known as the diffusivity or diffusion coefficient and is a property of the molecule we are measuring (H2O) as well as of the ambient temperature. Using a diffusion-specific MRI pulse sequence, this diffusion coefficient can be related to the measured signal with a specific equation:
S/S0 = exp(-b∙D)
In this equation, S is the signal measured with the extra pulses added for diffusion, S0 is the signal measured with these extra pulses turned off, exp denotes the exponential function (often written as just e), and b is a combination of several factors in the MRI scanner sequence. This relationship is an exponential decay: as the value of b or D gets bigger, the value of S/S0 gets smaller, and when plotted, the graph looks like a curved ramp sloping downward. Again, there are various sources to find out more about diffusion MRI, and I give a more in-depth description in my thesis, if you are interested. The most important concept to realize here is that we now have a mathematical relationship between the width of the distribution (how much the molecules move in a given time) and the measured signal at the MRI scanner.
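As a small illustration (with invented values, not thesis code), here is how the relationship lets us recover D from just two measurements, one with the diffusion pulses on and one with them off:

```python
# Sketch of inverting the monoexponential relationship. The b-value, S0, and
# diffusivity below are illustrative numbers only.
import numpy as np

b = 1000.0        # s/mm^2, a typical diffusion weighting
D_true = 2.0e-3   # mm^2/s, an illustrative diffusivity

S0 = 100.0                      # signal without diffusion weighting
S = S0 * np.exp(-b * D_true)    # signal with diffusion weighting (noise-free here)

# invert the model: D = -ln(S / S0) / b
D_estimate = -np.log(S / S0) / b
print(f"recovered D = {D_estimate:.2e} mm^2/s")
```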
Models
This relationship can also be called a mathematical model, with the variables in the equation known as model parameters. This model reduces our physical problem to a mathematical abstraction, on which we can easily perform calculations. A more familiar example of a mathematical model would be: we can drive on the highway at 60 miles per hour (rate) and we can drive 6 hours today (time), so how far will we travel (distance)? Since we know our mathematical model in this case is rate x time = distance, we can multiply the values together and get 360 miles. In our diffusion MRI problem, if we are measuring an unknown free liquid, we know the parameters for our scan (b) and we measure two signals, S (with diffusion pulses) and S0 (no diffusion pulses), so we can plug in our values to get D, the diffusion coefficient of the liquid. This works very well in practice, and these were among the earliest experiments demonstrated in diffusion MRI. However, the model above hasn’t accounted for an additional component in our measurement: noise. Thus, to better describe our problem, we will add a noise component to our model, the Greek letter epsilon, or ε, giving
S/S0 = exp(-b∙D) + ε
This noise term helps us remember that there is a source of uncertainty in this model. The value of ε may be different for each measurement, but if we took a bunch of measurements, we would see a distribution for ε, which is very often normally distributed. Because of this, to gain a more accurate value of D, we want to take multiple measurements and average them. Because this noise distribution has a mean of zero, increasing the number of measurements means the average value of epsilon tends toward zero, minimizing its contribution to the equation and giving us a better estimate of D. This estimate of D is known as the estimate with the highest, or maximum, likelihood of being the true value. While it may not be the true value, it’s the best estimate we have with the number of signal measurements we obtained, so we’ll go with it. While we do have some uncertainty associated with our estimate of D, when measuring free water our value of D does represent the actual phenomenon happening: a Gaussian distribution of molecular displacements. Going back to our previous example of molecular distributions in the human body, however, the numerous tissue restrictions mean that our molecular displacement distribution is almost certainly not Gaussian. Thus, if we performed the same experiment and used the same model on human tissue, we would very likely not be measuring anything real. Yet during the early days of medical applications of diffusion MRI, many researchers realized that this model was still effective at measuring changes in the estimated diffusion coefficient that correlated with changes in human tissue. So, when using this model on human tissue, researchers often denote the parameter D instead as the Apparent Diffusion Coefficient (ADC), since apparently it works! This ADC model has proved very useful over the years and has become a clinical standard in the diagnosis of stroke, for example; in my area of research it is part of a clinical standard for prostate imaging, known as PI-RADS.
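A quick sketch of why averaging helps (the noise level and b-value here are invented): repeat the noisy measurement of S/S0 many times, average, and then invert the model; the estimate of D settles closer to the true value as the number of measurements grows.

```python
# Sketch of averaging noisy measurements of S/S0 before inverting the model.
import numpy as np

rng = np.random.default_rng(3)
b, D_true, noise_std = 1000.0, 1.0e-3, 0.05

true_ratio = np.exp(-b * D_true)            # noise-free S/S0

for n_measurements in (1, 16, 256):
    noisy = true_ratio + rng.normal(0.0, noise_std, n_measurements)  # S/S0 + epsilon
    D_hat = -np.log(noisy.mean()) / b       # invert the averaged measurement
    print(f"{n_measurements:4d} measurements: D estimate = {D_hat:.2e} "
          f"(true {D_true:.0e})")
```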
As MRI scanner technology has improved, so have the capabilities of the ADC model. The examples presented thus far have discussed determining how far molecules move along a single axis. Researchers have expanded on this and combined measurements along multiple axes in three dimensions to get useful information, especially in the brain. For example, computationally combining the measurements from neighboring volumes allows researchers to create beautiful 3D pictures of the neural bundles in the brain, a process known as tractography. Additionally, researchers have expanded on the above monoexponential ADC model by adding more parameters. Two models will be examined in future blog posts: the biexponential model, a sum of two exponential decay components, and the kurtosis model, which adds a component that attempts to estimate how the shape of the distribution differs from a Gaussian. Studies have analyzed data with even more complex models, with several parameters all targeting specific phenomena. If these models provide us with more information and allow us to ascertain additional detail, why wouldn’t we always use these models?
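For reference, here is a sketch of what these extended models look like as functions of b (the parameter names are my own shorthand for this post, not taken from the thesis code):

```python
# Sketch of the three signal models discussed in this series, written as plain
# functions of the diffusion weighting b.
import numpy as np

def monoexponential(b, S0, D):
    """Standard ADC model: S = S0 * exp(-b*D)."""
    return S0 * np.exp(-b * D)

def biexponential(b, S0, f, D_fast, D_slow):
    """Sum of two decays, with f the fraction of the fast-decaying component."""
    return S0 * (f * np.exp(-b * D_fast) + (1 - f) * np.exp(-b * D_slow))

def kurtosis(b, S0, D, K):
    """Monoexponential plus a quadratic-in-b term; K measures non-Gaussianity."""
    return S0 * np.exp(-b * D + (b**2 * D**2 * K) / 6.0)

b_values = np.array([0.0, 500.0, 1000.0, 2000.0])   # s/mm^2, illustrative
print(kurtosis(b_values, S0=100.0, D=1.0e-3, K=1.0))
```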
The answer also relates to the earlier discussion on noise and SNR. If, again, we had near-infinite precision, we could make our models as complicated as we wanted. However, given our current technology, there is only so much we can do, and that includes the number of measurements we can make. While increasing the number of measurements allows us to get a more precise signal, in the context of an in vivo MRI acquisition this increases the measurement time. Thus, the person has to lie still in the scanner for longer, which is more difficult, and any movement may give us a poorer result in the end. Using a more complex model may seem great because it is more flexible, but with a limited number of noisy measurements this flexibility can backfire, causing a phenomenon known as overfitting. The more complex model ends up producing estimates based on the noise, so the values of these estimates don’t reflect anything real. Thus, in measurements with a lot of noise, sometimes it’s best to use a straight line, a model which is robust against noise. The monoexponential ADC estimate is, in fact, a straight line if we take the logarithm of both sides of its model equation above. In essence, when attempting to model a specific phenomenon under certain measurement conditions, we are looking for the Goldilocks solution – what is just right. Trying to find this optimal solution isn’t always easy and falls under the realm of model selection – a topic I will look at in a future post.
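To show the straight-line point concretely, here is a sketch (with invented b-values, diffusivity, and noise level) where taking the logarithm of the monoexponential model turns the fit into a simple linear one:

```python
# Sketch of the "straight line" trick: ln(S) is linear in b for the
# monoexponential model, so D can be estimated with a simple linear fit.
import numpy as np

rng = np.random.default_rng(4)
b = np.array([0.0, 250.0, 500.0, 750.0, 1000.0])   # s/mm^2
S0, D_true, noise_std = 100.0, 1.5e-3, 1.0

S = S0 * np.exp(-b * D_true) + rng.normal(0.0, noise_std, b.size)  # noisy signals

slope, intercept = np.polyfit(b, np.log(S), 1)     # straight-line fit in log space
print(f"D estimate = {-slope:.2e} mm^2/s (true {D_true:.0e})")
```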
Statistical Inference
I took my first statistics-related course years ago in my undergrad education and really liked the concepts there. I loved the idea of probability and odds, related both to physics and to games of chance. The statistical concepts and measures, while difficult to grasp at first, are straightforward in what they do. Drawing inferences from these measures, well, that is a whole ’nother kettle of fish. When you start out doing scientific research, you learn how these statistical measures allow you to tell a good story about how your new model is better because, for example, the mean value of its distribution is higher than that of the one you are comparing against. When measuring human tissue, perhaps our model targets a specific phenomenon, so if we see it in our data we cry out victory and refer to it as a biomarker, because this statistical measurement MUST be due to these tissue changes. Of course, to get a paper published, we are also convincing two or more human reviewers, so we embellish the language to show the importance of our findings. If it happens to get published and the press picks up on it, well, correlation then BECOMES causation (see here), and our study’s importance rises to stratospheric heights.
This is the state of the scientific publishing process we currently find ourselves in, and since academic careers ride on how many papers get published with new and interesting findings, statistical inference gets pushed to the limit. In fact, strains and cracks are beginning to show in this huge body of scientific evidence. Over the last several years, the field of social psychology has developed a replication crisis, with researchers going back to replicate earlier studies and finding no effects whatsoever. This isn’t confined to psychology research either, with one recent estimate putting the amount of irreproducible preclinical biomedical research at about $28 billion.
There are many blogs, articles, and papers out there that attempt to get to the heart of this crisis, and I encourage you to read as much as you can find. The first statistical measure that is often targeted in these articles is the p-value. The p-value is a beautiful measure in that it does one thing really well. Suppose you are comparing two hypotheses (say “Nothing will be seen when performing this test” and “A real effect will be seen when performing this test”). If the nothing, or null, hypothesis is really true, the p-value tells you how surprising your data are: it is the probability of seeing a result at least as extreme as yours purely by chance. So, if you only call results significant when the p-value falls under the magical value of 0.05, then over repeated experiments where there is no true effect, you would see false positives in fewer than 1 in 20 experiments. If you are performing such an experiment and happen to get a p-value less than 0.05, you might interpret this as a surprising result, so maybe you want to conduct additional research and do more severe testing to determine whether there is a real effect in your research.
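Here is a small simulation of that long-run guarantee (the group sizes and the number of repeats are arbitrary): when the null hypothesis really is true, p-values below 0.05 turn up in roughly 5% of experiments.

```python
# Sketch of the p-value's long-run behavior under a true null hypothesis:
# both groups are drawn from the same distribution, so any "effect" is noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_experiments, n_per_group = 10_000, 30
false_positives = 0

for _ in range(n_experiments):
    a = rng.normal(0.0, 1.0, n_per_group)   # same population for both groups
    b = rng.normal(0.0, 1.0, n_per_group)
    _, p = stats.ttest_ind(a, b)            # two-sample t-test
    false_positives += (p < 0.05)

print(f"false positive rate: {false_positives / n_experiments:.3f} (expected ~0.05)")
```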
The current scientific literature seems to have bypassed this element of surprise and given the p-value near-mythical powers. Obtaining a p-value less than 0.05 for a given statistical test seems to be the be-all and end-all of scientific publishing today. Meeting this 0.05 significance level means your paper is demonstrating a significant result and is therefore worthy of publication. The p-value is also often taken as evidence that your hypothesis of a real effect is now true – a power which the p-value does not have. There are many statistical resources out there where you can find out more about p-values, but a good starting point is the recent statement by the American Statistical Association with recommendations on the use of p-values (link). It gives a good synopsis of the current problems with p-values and statistical inference and of what p-values can and cannot do. I also highly recommend reading the supplemental material that accompanies this publication, which has several commentaries from various statistical researchers. My favorite of these is by Prof. Andrew Gelman, which I found so useful that I added two quotes from it to my thesis conclusion section:
“…we tend to take the ‘dataset’ and even the statistical model as given, reducing statistics to a mathematical or computational problem of inference and encouraging students and practitioners to think of their data as given. Even when we discuss the design of surveys and experiments, we typically focus on the choice of sample size, not on the importance of valid and reliable measurements. The result is often an attitude that any measurement will do, and a blind quest for statistical significance.”
and
“…it seems to me that statistics is often sold as a sort of alchemy that transmutes randomness into certainty, an ‘uncertainty laundering’ that begins with data and concludes with success as measured by statistical significance…This is what is expected—demanded—of subject-matter journals. Just try publishing a result with p = 0.20”
I like Gelman’s commentary (as well as his blog) because it doesn’t demonize the p-value but instead points out that we attribute magical powers to statistical measures. Statistics doesn’t necessarily provide powers of causation, yet we treat it as such: if one statistical measure or model doesn’t give me the results I want, I’ll keep trying another and another until I get something that works. This process leaves students bewildered and always asking – is this an OK thing to do? Does anyone know? What is known is that variation and uncertainty will probably not get your paper published, so while you are trying to prepare a careful analysis, some other research team has just submitted similar material with decrees of significance to a journal. Guess who’s likely to get grant money in the future…
Outro
I’ll finish this post by commenting on another portion of statistical analysis – the software. Like statistics itself, statistical software is often taken as a means of producing significant results. With the current processing power of computers, I can take all of my data, crank them through several different algorithms and models in a day, and analyze my results. I take my interesting results that demonstrate some sort of effect and publish. Why look at the data? The statistical measures show significance – what could be wrong? Prior to my academic research, I spent over ten years developing software in various languages and got quite handy at debugging. The blog posts that follow are based on the results of my “snooping around” in the inner workings of the code of various statistical algorithms. From some of these investigations, I determined that some models and algorithms currently being used in the scientific literature may be leading researchers to false conclusions. If you are interested in learning more, feel free to continue on to my second post.