Learning from data

In most fields of human endeavor, there has been a sea change, a revolution in technology, over the last decade that has gone largely unrecognized and unacknowledged outside of the IT industry. It has been in the area known as either machine learning or data mining. These are different tactics for accomplishing the same goal – learning from data.

What makes Google such a formidable competitor in the ads space is machine learning. What makes my bank now able to do such a good job of warning me about possible fraud is machine learning. What makes travel companies so good at pricing is data mining and machine learning. If I were giving advice to any aspiring student going to a university to study computer engineering, it would be to focus on this area. It is almost like magic. We see it in subtle ways like Netflix movie recommendations, but this is just the tip of the iceberg. Beneath the waves, almost every field is moving in this direction. And these systems are dynamic and rapid. They are constantly learning and constantly improving.

There has been one notable exception. Health care. Machine learning and data mining do require a lot of data. Since you aren’t able to do controlled, double-blind, randomized experiments, you need enough data to make the conclusions statistically significant in a messy data world. But given enough data, learning can and does happen. We are poised at the beginning of a similar sea change in health care. As vast amounts of personal health care data start to get collected, we will start to learn what is actually effective, what isn’t, and for whom. This is really a prerequisite for personalized health. The term is used loosely to mean giving people the personalized advice/treatment that they need based on their data. But the only way to personalize is to know what’s effective for whom. Some of this will doubtless be based on genomic information. But far more will just be based on looking at what is working for whom based on their conditions, ongoing test results, and treatments. And this is key. The human body varies tremendously based both on environment and on inheritance. One size doesn’t fit all.
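To make the point about needing enough data concrete, here is a minimal sketch in Python. It is only an illustration under assumed numbers, not an analysis of any real data: the response rates (55% versus 50%) and the function name `two_proportion_z` are hypothetical. It uses a standard pooled two-proportion z-test, and it ignores the confounding that messy observational data would add, to show how the same small difference in outcomes goes from statistical noise to a clear signal as the number of patients grows.

```python
import math


def two_proportion_z(p_treated, p_control, n_per_group):
    """Pooled two-proportion z statistic for two equal-sized groups.

    All inputs are hypothetical illustration values, not real clinical data.
    """
    # Pooled response rate across both groups (equal sizes, so a plain mean).
    pooled = (p_treated + p_control) / 2
    # Standard error of the difference between the two observed rates.
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n_per_group))
    return (p_treated - p_control) / std_err


# Assumed effect: 55% of treated patients improve versus 50% of untreated.
for n in (100, 1_000, 10_000):
    z = two_proportion_z(0.55, 0.50, n)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"n per group = {n:>6}: z = {z:.2f} ({verdict} at the 5% level)")
```

With these assumed rates, 100 patients per group is not enough to distinguish the effect from chance, while the larger groups are, because the standard error shrinks with the square root of the group size.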

Until recently, a lot of machine learning from health data has been stillborn for three reasons:

  1. It has been too hard to translate what is known about personalized medicine from research into clinical practice. This is known as the “translation” problem. But online tools that do know these things are going to rapidly change this failure in translation in the decade to come.
  2. There hasn’t been nearly enough data because almost no data was automated and, even when it was, it wasn’t tracked against the individual and their treatment plan. Instead, orders were tracked against the insurance number and the practice because that’s where the money was. Between ARRA’s meaningful use mandates, which are going to force tracking against the patient, and the burgeoning consumer movement to take charge of their own health as the system increasingly limits their access to continuous care from physicians, this lack of data is going to change at least as profoundly in the decade to come.
  3. There was no money in giving consumers personalized treatment, and indeed there were movements against it, both because of the population studies (witness the debates right now about diabetics being told to lower their blood sugar) and because the doctors weren’t paid for outcomes. But consumers are going to demand the treatment for the best outcome. Also, we’re learning that it will often cost less. Often the standard care given is too much treatment, as the book “Overtreated” so brilliantly calls out, and, paradoxically, your outcomes are better as the cost goes down, not up. Back surgery tends to be a poster child for this, also called out well in the book “Flatlined”. We are going to be forced to figure out how to be more cost-effective, and more effective in general, in treating illness.

All the systems emerging to help consumers get personalized advice and information about their health are going to be incredible treasure troves of data about what works. And this will be a virtuous cycle. As the systems learn, they will encourage consumers to increasingly flow data into them for better, more personalized advice, and encourage physicians to do the same, and then this data will help these systems to learn even more rapidly. I predict now that within a decade, no practicing physician will consider treating their patients without the support/advice of the expertise embodied in the machine learning that will have taken place. And finally, we will truly move to an evidence-based health care system.

14 Responses to Learning from data

  1. From reading your writings, I have gathered that we share to some degree the notion that it will take a more holistic approach to getting on-demand electronic health records implemented…including getting the patients (mass public) pushing for it. To date, most of the EHR movement has been directed by vendors, politicians, etc, toward larger healthcare providers (read: hospitals) and insurance companies, where it is a comparatively easy sell based on cost savings to them.

    In my work, I have dealt with a significant number of small private healthcare practices (clinics and physicians’ offices) who simply don’t perceive an ROI for themselves, especially given what appears to be a very expensive investment. I’m sure I’m not telling you something you don’t know about here, and that your vision provides for a way such primary and specialty care providers can participate in EHR.

    But what may turn out to be the primary driver for EHR adoption are the patients themselves, IF they learn the benefits firsthand in a meaningful way. SO…providing patients with the means of “taking” their health records with them everywhere they go, and showing them to any healthcare provider when they need to, may do the trick. How to do this? A viral way of accomplishing that may be providing patients a way to log into and display their EHRs via a smartphone web browser, or from their doctor’s web browser. And perhaps you get patients to start doing this because (1) it’s free, (2) it’s relevant info, (3) it’s easy, (4) it evokes excitement, humor, passion, or some other appropriate emotion (i.e., marketing 101), and (5) it ultimately benefits them personally. As more people become exposed to the idea of online health info on the patient side, more pressure to implement naturally occurs on the provider side via politics/regulation, technologies/solutions, and clear ROI factors.

    BTW, on point (4) above, something healthcare-related that is an analog to what you find at http://www.propel.com is one such tool that evokes passion; Propel is a retail distributor of biofuels, and they help their customers “feel good” about the carbon reduction they’re accomplishing by tracking their biodiesel purchases and calculating how much less CO2 they produced and equivalent number of trees for consumption. Perhaps someone can think of a way to dynamically track one or more benefits of shared EHRs that can be displayed to users as a feedback metric on how great this all is.

    • adambosworth says:

      Totally agree that the patients themselves are going to drive EHRs as they find that they are actually useful.

  2. John Phelan says:

    I totally agree with the sentiment, and the passion. Creating this change is tough and continues to be a huge challenge. Why did the stimulus package not incent consumers to have their own health records? How about a $150 per year tax break for those consumers who can prove that they have their records with Keas, Zweena, or the myriad of other companies. The key here is incentive. Most consumers just do not know what they are missing until they have the opportunity to become one of our customers. Then it is like the cell phone: how did I ever live without it!

    Keep up the great work!

    • adambosworth says:

      I’d love to see an incentive program for people, but I’d tie it to them actually doing the work, and ideally making progress, not just being enrolled. The cost of having an unhealthy lifestyle is one we all pay for, and as with car insurance, where being a good driver and doing the right things lowers your premium, shouldn’t we consider something similar for having a healthy lifestyle and doing the right things?

  3. Neil says:

    Isn’t one of the lessons we are learning about ‘personalized health care’ that data mining might be less useful in these situations?

    For example, if a cholesterol drug works for 80% of the people, but not for me, due to some unique genetics, I’m likely to be lost in the noise. Data mining relies to some extent on generalizability (e.g., most people type “britney spears” correctly), but presumably this is exactly the opposite of where medical/pharmaceutical research is heading.

    Although problematic, it seems expert systems and the like still have some role to play.

    Maybe what you are saying is that a) without more data, we can’t even tell whether that is the case and b) even ‘personalized’ treatments will still help a large enough group that data mining can detect it.

  4. Rob Cowan says:

    One of the problems we have found in translating data mining for populations into data mining for individual patients has been the assumption that there is an economy of scale. By this I mean that, as presently conceived, and as you quite correctly point out, the quality of the data mining is dependent on having a lot of data. But for an individual patient, there isn’t a lot of data, and the process of generalizing to increasingly larger groups of “similar” patients actually decreases efficiency rather than increases it, as it does with Netflix preferences. Thus, it might be that the reason for such slow progress with respect to healthcare data mining is that the model is wrong.
    My colleagues and I have been developing a program that collects a high-level headache history, using AI technology, to generate a narrative history that can be presented to a primary care physician as if it had been collected and formulated by a headache specialist. We then take this data set and use it first to obtain a best fit and anomaly picture using formalized diagnostic criteria, and then to fit components for treatment options (a large data set), taking into account comorbid conditions, best practices, etc. Then, the patient monitors headache frequency, severity, duration, and interventions with a computer-based diary, which feeds back into the headache history to suggest modification of the treatment plan and diagnosis over time.
    In this way, rather than generalizing to a larger group and losing specificity (the goal in developing an individual treatment plan), we are able to use data mining of large sets (treatment options, drug databases, psychological profiling, etc.) in combination with a slowly accumulating smaller data set for the individual. Obviously, this model should work for other chronic diseases as well, but we are well along in the development of this for headache, and it seems to be working well. I think that efforts to find the largest group of patients for a standardized treatment protocol will have significant limitations unless we cone down at the same time.

  5. […] that wants to help people understand their health data, set health goals and pursue them. He has a new blog post where he talks about machine learning in the context of health care. He (probably rightly) sees […]

  6. Martin Khun says:

    It is inevitable. It is already happening in insurers and government payers. The driver is not individuals, but the cost of delivering healthcare. The process in hospitals and clinics with repeat diagnostics, waiting for information handling, and low nurse and doctor productivity is killing us. If you look at the health delivery process, it is the paved cow path of the 1930’s with doctors operating as cottage industries.

    I suspect that in the next 5 years, we will see electronic patient records established. The question will be: will they be accessible to enable learning? A new model of sharing de-identified data will be required or we may be in a world of information islands. The privacy rules and sensitivity of data will be the biggest challenge and may slow the process.

    The alternative is the Facebook of health, where people self-divulge health records. This is also happening in many ways. It will be a very different database and create a very different set of values and issues. Enough groupthink and we will create a new disease, or treatments – a bit like rising sea levels and the yo-yo craze – or we will uncover observational disease data that reveal insights beyond any basic science, especially in areas where science has few answers today, like MS and autoimmune diseases.

    Most likely, both will occur. How exciting. Need help?


  7. Brian Baum says:

    At the risk of inflaming the medical community – for which I have the utmost respect – I had the opportunity to participate in a conference at Duke a few years back. Dr. Ralph Snyderman, then Chancellor, presided over the meeting. There were perhaps fifty people – mostly physicians ranging from specialists to PCPs. The general topic of the conference was IT. A proposition was put forth – the practice of medicine is 95% science and 5% art. The premise was that in roughly 95% of patient cases, the symptoms, lab values, history, images, etc. could be submitted into an automated system and the correct diagnosis and treatment protocol could be accurately and immediately presented, and that in only about 5% of cases would the “finesse” of the human mind (the physician’s mind) be required to integrate non-standard experiences, diagnoses, etc. and extrapolate the patient’s condition.

    Of course this was presented against the backdrop of data that indicates a patient visiting a physician has basically 50/50 odds of being accurately diagnosed and prescribed the most effective course of treatment. (As you can imagine this ignited a lively debate.)

    The reality – for the human mind to process all known symptoms, conditions, etc., and align them with emerging research and innovations in protocols is, quite simply, impossible. This is what IT does so well. Where is the physician’s talent best utilized – at the very front end, processing “raw” data from their patient, or at the back end, receiving refined data and then applying their insights and experiences to the proposed diagnosis and treatment plan?

    If we extend this one level – we have in excess of 300,000,000 people in the U.S. and a much smaller population of physicians to provide care for every condition from the acute, to the chronic to the preventive. The ratio doesn’t work. Intelligent systems have the capacity to provide guidance and insight to consumers and direct them to the appropriate level of care. In some cases – self directed – with trended monitoring, in others clinical care. The foundation is the tracking of accurate, personal and automatically updated health information. Once this information is stored – IT and associated applications can work their data mining magic and create a new generation of health consumers and providers.

    An overnight cure for healthcare for 100% of the population – absolutely not. The foundation for comprehensive long term “reform” – quite possibly.

  8. Great post. I don’t want to rain on the parade, but I think you’ve missed a fourth reason why machine learning from health data hasn’t happened on a large scale, and that reason is privacy. Amassing and mining health care data on a large scale is a two-edged sword – on one hand, the possibilities to generate new knowledge are tremendous, but the privacy implications are scary. It doesn’t even matter if the data are stripped of any personal information – linking different databases and publicly available information will allow re-identification of individuals. I am not a privacy zealot and am all for patient empowerment and PHRs (I’ve written whole books about it), but still I think this is the top reason why this vision (which some of us have had for decades) isn’t as easy to achieve as it sounds. Putting these data into the hands of private corporations raises some additional issues, as these are not bound by professional codes the way hospitals and physicians are.

  9. Adam –

    Thanks for the thought-provoking piece. Here’s what it, um, provoked from me:

    The “Safeway” amendment currently included in the latest health reform draft potentially does just what you suggest in your reply to JP – allow payers to offer reduced premiums based upon good behaviors (smoking cessation, weight reduction, etc.). Given that motivation, people (or their employers or insurers) may look to tools like Keas to empower individuals to achieve their health goals. We’ll see what comes out of the other end of the sausage-making process.

    While I agree with the eventual result you suggest, I wonder if the timing is right. Ten years seems like too short a time to effect a sea change of the magnitude you present. ARRA Meaningful Use incentives for EHR adoption (and the ARRA enhancements to the language of HIPAA that give patients the right to request an _electronic_ copy of their records and have them sent to a designated repository) will certainly move us down this path in a significant way. But there are still significant impediments to moving us to true evidence-based medicine that leverages information exchange, data mining and machine learning.

    – Even given (a big, hairy given, mind you) sufficient data that is accessible, aggregated, longitudinal, and attributable to specific individuals (whether “identified” or not), finding the “evidence” takes time because most changes in medicine happen over time. We’ve learned an immense amount about heart disease and related illnesses through the Framingham Study data, but it’s taken decades of data collection to follow the progression of disease and reach a measurable clinical endpoint. Personal Health Records will help get us there because only the individual person is the common link between a lifetime of health-related actions, encounters and outcomes. That may seem obvious, but most data sources are provider, payer or disease-state centric, and don’t cover a lifespan. So the data will accumulate, but it will take time for the data stores to collect sufficient baseline conditions (genomics, environmental, etc.), interventions, and outcomes to show the evidence.

    – Once the evidence is clear, you still need to change perceptions around the veracity of “wild” data compared with the gold standard of double-blind, placebo-controlled, prospective, randomized clinical studies that control for everything except the thing you’re trying to prove or disprove. Acceptance is growing, but not necessarily in a balanced way. Take the Vioxx example. Individual studies are equivocal regarding the effect of Vioxx and other Cox-2 Inhibitors on heart disease. But the meta-analysis of all of those studies suggests an association. Kaiser digs into their 9 million patient lives’ worth of data, shows the association, and shares the data with the FDA. FDA goes public and Vioxx is pulled off the market (Celebrex and Bextra too). Score one for data mining. But what if, instead of showing an increased mortality rate, Kaiser’s data had shown a beneficial effect on heart disease? Do you think the FDA would have given Merck a new indication for Vioxx based on this evidence? Not a chance. They would have said: go spend $20 million and five years to prove it in trials, then come back with the “real” evidence.

    – Once you’ve got the evidence and have convinced everyone that it is valid and useful, then you have to change behavior – specifically, clinician behavior. Docs are not known for their rapid adoption of new approaches that are supported by evidence (unless, it seems, they are detailed by preternaturally attractive sales reps and offered samples – but that’s another discussion).

    What I do believe is that this will ultimately be easier to achieve with consumers/patients/people than it will be with healthcare providers and regulators. That’s great and we’ll see some marked improvements in these capabilities through precisely what you’ve described. But there are an awful lot of conditions and interventions that can only be managed through the healthcare system (you know, where that frighteningly huge percentage of our GDP gets spent?). Changing that will take some serious time.

    Nice WSJ video interview, btw. Great to see Keas finally taking wing…

  10. Ram Duriseti says:

    I speak as an active clinician and someone with a doctorate in computational modeling and in depth experience with the application of machine learning techniques to clinical data.

    Until EHRs support computable data that covers subjective clinical data with objective metrics, we will never “bend the cost curve”. Clinical decision making is a highly connected graph whose complex joint probabilities are simply unknown. Even large studies fail to provide much insight into these complex joint distributions due to data fragmentation.

    I look forward to innovations at Keas. Two years ago I worked for a company called Enfold (funded briefly by MDV) to create a similar application. Technology was not the problem. Financial support for the initiative from prospective customers was the failure point, even though they all “saw the value”. Hopefully Mr. Bosworth’s name recognition will create some sustained traction.

  11. AJ Chen says:

    Hi Adam, you are right on! We’ve got to get patient data, clinical data, and bio-med research data onto computers. This will allow machine learning and other AI approaches to discover patterns and facts that will help individuals to better manage their health.

    Using intelligence to help people live healthier and longer is also my passion. I approached this by first working on whole-genome analysis technologies, starting in 2000. But I soon realized that the dream of individualized medicine wouldn’t be possible without a large amount of clinical data and patient data available electronically. In fact, there is a more fundamental problem, as you pointed out, which is that our whole health care system lacks data that can be shared and analyzed at large scale.

    So, I have been working on a healthcare search engine for consumers for the past several years. In particular, I apply semantic technologies and NLP to integrate structured and unstructured data and make them easily searchable by consumers. I hope that by bringing a large amount of health care data online, we will then be able to create intelligent systems to deliver healthier choices to everyone.

  12. […] Please read Adam Bosworth’s post on Learning from data. […]
