March 21, 2014: Weekly Roundup for Big Data in Medical Science

 

Image of the Week

Fast Upload

This week: Apple is rumored to be entering the healthcare market, medical conspiracy theories without any data, Google wants your DNA sequences in the cloud for scientific discovery but didn’t get its flu predictions right, CMS proposes releasing more Medicare Part D data for research, and is 2014 the year of the wearable device?

 

Bits and Bytes

 

Upcoming Events

Event | When | What | Where
MIWC 2014 | April 28-29, 2014 | Medical Informatics World Conference | Boston, MA
BDM 2014 | May 21-23, 2014 | Big Data in Biomedicine Conference | Stanford, CA
ASE BDS 2014 | May 27-31, 2014 | Second ASE International Conference on Big Data Science and Computing | Stanford, CA
HCI-KDD@AMT 2014 | August 11, 2014 | Special Session on Advanced Methods in Interactive Data Mining for Personalized Medicine | Warsaw, Poland
BigR&I 2014 | August 27-29, 2014 | International Symposium on Big Data Research and Innovation | Barcelona, Spain
ICHI 2014 | September 15-17, 2014 | IEEE International Conference on Healthcare Informatics | Verona, Italy

Personal Biosensors and the Internet of Medical Things


There is a tsunami of new devices and apps that will help you record everything from the number of steps you took in a day to calories and caffeine ingested, sleep quality, weight, blood pressure, and blood glucose levels. The next revolution in medicine will be the Internet of Medical Things (IoMT): uniquely tagged devices that help monitor blood pressure, blood glucose, physical activity, temperature, sleep, and even motion. Along with patient-entered data from tablets, mobile devices, and conventional desktop computers, data from these devices will change the face of medicine, increase our ability to engage patients in their own health behaviors, and provide massive amounts of data for population health studies on an unprecedented scale.

Personal biosensor devices (PBDs) like Fitbit and Jawbone have become all the rage, with many corporations looking to provide PBDs to employees with the goal of improving employee health. Often the devices are paired with financial incentives to motivate people to change their behavior. As reported last year in Wired, the company +Citizen has a program where employees have voluntarily agreed to share their fitness, productivity, and happiness data. Many vendors, such as FitLinxx, SparkPeople, and Endomondo, specifically offer employer packages.

Mobile apps are branching out and rapidly linking with these devices, allowing geospatial and biometric data to be coupled. The data generated by these devices, already in use by hundreds of thousands, if not millions, of people, will be staggering. This past year, clinical research studies and clinical trials started to incorporate data from smartphones and PBDs.

At present, it is unclear whether apps or PBDs will alter health behavior. Despite their ubiquity, there is little data on whether diabetics who use such mobile software to manage their blood sugars achieve better glucose control. Do weight loss and calorie counting apps really achieve their goals? I think it’s fair to say that anecdotal evidence suggests great promise in many cases. From a personal standpoint, my Fitbit has made me more aware of my sedentary computer habits, and motivated me to take more steps and get out running more. My favorite recent awareness-raising app, pointed out to me by my colleague Joshua Schwimmer, is UpCoffee by Jawbone. I had no idea of the half-life of caffeine before I downloaded the app!

The impact of PBDs and apps may not be all good, or all predictable. Sometimes, personal bio-sensing apps can actually lead to bad outcomes. An article by Alice Gregory in the New Republic last year describes how calorie-counting mobile fitness apps can worsen eating disorders. Given the studies that have described the addictive properties of electronic devices and the internet, and the underlying biology, it is not surprising that these problems can be exacerbated in people with addictive or compulsive tendencies or illnesses.

Where all this leads, we don’t know yet. Certainly to very large data sets, and to something far beyond telemedicine. Something exciting is happening in medicine and research. I hope that this will lead to the ability to crowdsource population health research questions and studies beyond our wildest imagination. What would you study if you had access to data from a million PBDs?

Revealing Healthcare Networks Using Insurance Claims Data

 

As I noted in my post last week, every accountable care organization (ACO) in the United States is trying to understand provider networks. Common questions include:

  • What is the “leakage” from our network?
  • What medical practices should we acquire?
  • What are the referral patterns of providers within the network?
  • Does the path that a patient takes through our network of care affect outcomes?
  • Where should we build the next outpatient clinic?

Much of this analysis is being done using insurance claims data, and this post is about how such data is turned into a provider network analysis. Here, I’ll discuss how billing or referral data is turned into graphs of provider networks. Most of us are now familiar with social networks, which describe how a group of people are “connected”. A common example is Facebook, where apps like TouchGraph show who you are friends with, whether your friends are friends with each other, and so on. These networks are built on a simple concept: that of a relationship.

To describe a physician network, we first make a table from claims data that shows which physicians (D) billed for visits or procedures on which patients (P). This is shown in the figure below. Next, we tally which physicians billed for seeing the same patient, and how many times, giving a common billing matrix. The billing does not have to happen at the same visit or for the same problem, just over the course of the measurement period. Notice that the matrix is symmetric, with the diagonal giving the total number of patient encounters for each doctor. This type of matrix is a similarity (co-occurrence) matrix: the larger an entry, the more patients two doctors share.

[Figure: billing table of physicians (D) and patients (P), and the resulting common billing matrix]
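A minimal Python sketch of the two steps above, assuming claims arrive as hypothetical (doctor, patient) rows, one row per billed visit or procedure. The post does not pin down how "how many times" is counted for a shared patient, so this sketch assumes, for each shared patient, the smaller of the two doctors' encounter counts. With that assumed convention the matrix comes out symmetric and each diagonal entry equals that doctor's total encounters, matching the description.

```python
from collections import Counter, defaultdict
from itertools import combinations_with_replacement


def common_billing_matrix(claims):
    """Build a symmetric doctor-by-doctor common billing matrix.

    claims: iterable of (doctor_id, patient_id) rows, one per billed encounter.
    matrix[i][j] sums, over patients billed by both doctor i and doctor j, the
    smaller of their two encounter counts (an assumed convention). The diagonal
    is therefore each doctor's total number of encounters.
    """
    # Step 1: the doctor/patient billing table, as encounter counts.
    encounters = Counter(claims)

    # Regroup by patient: patient -> {doctor: encounters with that patient}.
    by_patient = defaultdict(dict)
    for (doctor, patient), n in encounters.items():
        by_patient[patient][doctor] = n

    doctors = sorted({doctor for doctor, _ in encounters})
    index = {d: i for i, d in enumerate(doctors)}
    matrix = [[0] * len(doctors) for _ in doctors]

    # Step 2: tally every pair of doctors (including each doctor with itself)
    # who billed for the same patient at any point in the measurement period.
    for seen_by in by_patient.values():
        for a, b in combinations_with_replacement(sorted(seen_by), 2):
            shared = min(seen_by[a], seen_by[b])
            i, j = index[a], index[b]
            matrix[i][j] += shared
            if i != j:
                matrix[j][i] += shared  # mirror the entry to keep it symmetric
    return doctors, matrix


# Hypothetical claims: D1 saw P1 twice; D2, D3, and D4 all saw P3.
claims = [
    ("D1", "P1"), ("D1", "P1"), ("D2", "P1"),
    ("D1", "P2"), ("D3", "P2"),
    ("D2", "P3"), ("D3", "P3"), ("D4", "P3"),
]
doctors, matrix = common_billing_matrix(claims)
for name, row in zip(doctors, matrix):
    print(name, row)
# D1 [3, 1, 1, 0]
# D2 [1, 2, 1, 1]
# D3 [1, 1, 2, 1]
# D4 [0, 1, 1, 1]
```

A plain-Python nested list keeps the sketch dependency-free; for millions of claims a sparse matrix would be the more natural container.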

The provider network graph plotted from the above example shows the network relationship between four doctors. The size of each circle shows the total number of patients billed for by that doctor, and the width of each line shows the strength of the shared patient connection.

[Figure: provider network graph of the four doctors]
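Going from the matrix to the drawing is mostly bookkeeping: the diagonal gives the circle sizes and the off-diagonal entries give the line widths. A small sketch, using a hypothetical four-doctor matrix; only the upper triangle is read because the matrix is symmetric.

```python
def to_graph(doctors, matrix):
    """Turn a symmetric shared-billing matrix into node sizes and weighted edges."""
    # Node size: the diagonal entry for each doctor.
    node_size = {d: matrix[i][i] for i, d in enumerate(doctors)}
    # Edge width: off-diagonal shared counts. Read only i < j (the matrix is
    # symmetric) and skip pairs that share no patients.
    edges = [
        (doctors[i], doctors[j], matrix[i][j])
        for i in range(len(doctors))
        for j in range(i + 1, len(doctors))
        if matrix[i][j] > 0
    ]
    return node_size, edges


# Hypothetical shared-billing matrix for four doctors.
doctors = ["D1", "D2", "D3", "D4"]
matrix = [
    [3, 1, 1, 0],
    [1, 2, 1, 1],
    [1, 1, 2, 1],
    [0, 1, 1, 1],
]
node_size, edges = to_graph(doctors, matrix)
print(node_size)  # {'D1': 3, 'D2': 2, 'D3': 2, 'D4': 1}
print(edges)
```

The resulting (a, b, weight) triples can be handed to a graph library for layout and drawing; NetworkX, for example, accepts them through `Graph.add_weighted_edges_from`.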

Now, if we have this data for a large network, we can look at a number of measures using standard methods. In the above example, we can see that the two orange providers are probably members of a group practice, sharing many of the same patients and referring to many of the same providers. See this humorous post by Kieran Healy identifying Paul Revere as the ringleader of the American Revolution using a similar analysis! Providers in red are “out-of-network” and are connected to only a single in-network physician. However, the graph itself does not reveal why these out-of-network providers share patients with the in-network provider. It could be that the out-of-network group offers a service not available within the network, such as gastric bypass, pediatric hepatology, or kidney transplantation.
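Spotting the red providers programmatically is a simple filter over the weighted edges, given a set of in-network provider IDs. A sketch with hypothetical IDs; as noted above, the output says which out-of-network providers hang off a single in-network physician, not why.

```python
from collections import defaultdict


def out_of_network_contacts(edges, in_network):
    """Map each out-of-network provider to the in-network providers it shares patients with.

    edges: iterable of (provider_a, provider_b, shared_count) tuples.
    in_network: set of provider ids that belong to the network.
    """
    contacts = defaultdict(dict)
    for a, b, shared in edges:
        # Check the edge in both orientations: either end may be the outsider.
        for outside, inside in ((a, b), (b, a)):
            if outside not in in_network and inside in in_network:
                contacts[outside][inside] = contacts[outside].get(inside, 0) + shared
    return dict(contacts)


# Hypothetical network: G1 and G2 behave like a group practice; X1-X3 are outside.
edges = [
    ("G1", "G2", 40), ("G1", "D3", 5), ("G2", "D3", 4),
    ("X1", "D3", 7), ("X2", "D3", 2),
    ("X3", "G1", 3), ("X3", "D3", 1),
]
in_network = {"G1", "G2", "D3"}
contacts = out_of_network_contacts(edges, in_network)
single_link = sorted(p for p, inside in contacts.items() if len(inside) == 1)
print(single_link)  # ['X1', 'X2']
```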

It is not difficult to see that you could create network representations using many types of data. Referral data would allow you to add directionality to the network graph. You could also look at total charges for shared patients, as opposed to visits or procedures, to get a sense of the financial connectedness of providers or practices. Linking by lab tests or procedures can show common practice patterns. Many other variations are possible. The network grows more complex as your claims data cover more providers and patients.
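Two of those variations, direction and charges, amount to changing what gets counted. A sketch assuming hypothetical referral records of the form (referring provider, receiving provider, charge in dollars): keying on the ordered pair keeps A to B separate from B to A, and summing the charge column gives a financial edge weight instead of a visit count.

```python
from collections import Counter


def directed_edges(referrals):
    """Aggregate referral records into directed edge weights.

    referrals: iterable of (referring_provider, receiving_provider, charge).
    Returns two Counters keyed by the ordered pair (referring, receiving):
    the number of referrals and the total charges attached to them.
    """
    counts = Counter()
    charges = Counter()
    for src, dst, charge in referrals:
        counts[(src, dst)] += 1
        charges[(src, dst)] += charge
    return counts, charges


# Hypothetical referral records with charges in dollars.
referrals = [
    ("D1", "D2", 250.0),
    ("D1", "D2", 400.0),
    ("D2", "D1", 120.0),
    ("D3", "D2", 900.0),
]
counts, charges = directed_edges(referrals)
# Direction matters: D1 -> D2 and D2 -> D1 are separate edges.
print(counts[("D1", "D2")], counts[("D2", "D1")])  # 2 1
print(charges[("D1", "D2")])                       # 650.0
```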

These simple graphs are just the beginning. Couple the network graph with the geospatial locations of providers, and you add another layer of complexity. Add city bus routes, and you can see how patients might get to your next office location. Add census data, and you can look at the relationship between medical practice density, referral patterns, and the average income within a zip code area. The possibilities are incredible!
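A first geospatial layer can be as simple as attaching a distance to every edge. The sketch below uses the haversine great-circle formula with made-up provider coordinates; straight-line distance is only a rough proxy, and travel time by bus would need actual transit route data, which is not modeled here.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def add_distances(edges, locations):
    """Attach the straight-line distance between endpoints to each (a, b, weight) edge."""
    return [
        (a, b, w, haversine_km(*locations[a], *locations[b]))
        for a, b, w in edges
    ]


# Hypothetical edges and made-up (latitude, longitude) office locations.
edges = [("D1", "D2", 12), ("D2", "D3", 4)]
locations = {"D1": (42.36, -71.06), "D2": (42.34, -71.10), "D3": (42.39, -71.12)}
for a, b, w, km in add_distances(edges, locations):
    print(f"{a}-{b}: {w} shared patients, {km:.1f} km apart")
```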

So why is this big data? To build a large and accurate network, you need to analyze millions of insurance claims, lab tests, or other connection data. Analyzing data of this size requires large amounts of computer memory and, often, cluster computers and distributed computing software such as Hadoop (more on this in a future post). We owe a very large debt to “Healthcare Hacker” Fred Trotter, who created the first such open source, very large network graph from 2011 Medicare claims data for the entire United States, called DocGraph. The dataset can be downloaded from NotOnly Dev for $1 here. This graph has 49 million connections between almost a million providers. Ryan Weald created a beautiful visualization of the entire DocGraph dataset, which I will leave you with here.
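Why does this job distribute well? Shared-patient counting decomposes naturally into map, shuffle, and reduce steps. The sketch below simulates that decomposition in a single Python process; it does not use Hadoop or any Hadoop API, and the claims are hypothetical. It counts distinct shared patients per doctor pair, a simpler weight than encounter counts. Each patient's group can be reduced independently, which is what lets a cluster split the work.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(claims):
    """Map: re-key each (doctor, patient) claim by patient."""
    for doctor, patient in claims:
        yield patient, doctor


def shuffle(mapped):
    """Shuffle: group values by key (a distributed framework does this between phases)."""
    groups = defaultdict(set)
    for patient, doctor in mapped:
        groups[patient].add(doctor)
    return groups


def reduce_patient(doctors):
    """Reduce, per patient: emit a count of 1 for every pair of doctors who saw them."""
    for pair in combinations(sorted(doctors), 2):
        yield pair, 1


def shared_patient_counts(claims):
    """Run map, shuffle, and reduce, then sum the emitted counts per doctor pair."""
    totals = defaultdict(int)
    for doctors in shuffle(map_phase(claims)).values():
        for pair, one in reduce_patient(doctors):
            totals[pair] += one  # this summation is effectively a second reduce
    return dict(totals)


# Hypothetical claims; on a cluster, each patient group could be reduced on a different machine.
claims = [
    ("D1", "P1"), ("D2", "P1"),
    ("D1", "P2"), ("D2", "P2"), ("D3", "P2"),
    ("D2", "P3"), ("D3", "P3"), ("D4", "P3"),
]
counts = shared_patient_counts(claims)
print(counts[("D1", "D2")])  # 2
```

Note that a patient seen by k doctors emits k(k-1)/2 pairs, so the reduce output can be much larger than the input; that blow-up is part of why memory and clusters come into play.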

[Figure: Ryan Weald’s visualization of the entire DocGraph dataset]

8 March 2014: Weekly Roundup for Big Data in Medicine and Science

 

Image of the Week

[Image: HIV map, from Young, Rivers, and Lewis (2014)]

Young S, Rivers C, Lewis B (2014) Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes. Preventive Medicine. http://dx.doi.org/10.1016/j.ypmed.2014.01.024

 

Fast Upload

This week: big data breaches at LA County medical facilities, and more US healthcare delivery companies explore the use of data mining and analytics. At the Healthcare Information and Management Systems Society (HIMSS) meeting this week: “… all healthcare data is big data, and it’s only going to be getting bigger”.

 

Bits and Bytes

 

Upcoming Events

Event | When | What | Where
BDM 2014 | May 21-23, 2014 | Big Data in Biomedicine Conference | Stanford, CA
MIWC 2014 | April 28-29, 2014 | Medical Informatics World Conference | Boston, MA
ASE BDS 2014 | May 27-31, 2014 | Second ASE International Conference on Big Data Science and Computing | Stanford, CA
HCI-KDD@AMT 2014 | August 11, 2014 | Special Session on Advanced Methods in Interactive Data Mining for Personalized Medicine | Warsaw, Poland
BigR&I 2014 | August 27-29, 2014 | International Symposium on Big Data Research and Innovation | Barcelona, Spain
ICHI 2014 | September 15-17, 2014 | IEEE International Conference on Healthcare Informatics | Verona, Italy

Healthcare Data Privacy and Self-Insured Employers


In the rush to control healthcare costs, many employers are self-insuring. As part of this move, most self-insured networks have become intensely interested in analyzing their own claims and medication cost data. This type of analysis can be highly informative. For example, Fred Trotter has created an enormous Medicare referral network graph (DocGraph) for all physicians and providers in the United States. Essentially, he took Medicare claims data and counted the number of times that two physicians billed for care of the same patients. Physicians were identified by their unique National Provider Identifier (NPI) numbers, which are publicly available here. Through some very simple matrix manipulation of this very large data set of 2011 Medicare claims, he created DocGraph. The resulting data is very simple: {provider #1, provider #2, number of instances where P#1 billed for seeing patients that P#2 also saw at some point}, but very large (49 million relationships). This graph can be used to identify referral “cliques” (who refers to whom) and other patterns. The bottom line is that any organization that has claims data, big data storage and processing capabilities, and some very simple analytics can do this. Similar analyses can be done for medication prescribing patterns, disability claim numbers, and other care-delivery metrics.
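Because each record is just a triple, a file of this kind can be processed as a stream rather than loaded whole. The sketch assumes a headerless CSV with exactly three columns (provider 1, provider 2, count), which is how the record is described above; check the layout of the actual download before relying on it. The provider IDs are hypothetical placeholders, not real NPIs.

```python
import csv
import io
from collections import Counter


def load_triples(lines):
    """Stream (provider_1, provider_2, count) rows and aggregate per-provider totals.

    Assumes a headerless, three-column CSV. Returns two Counters:
    the number of relationships each provider appears in, and the sum of
    the shared-instance counts across those relationships.
    """
    degree = Counter()
    shared = Counter()
    for npi_1, npi_2, count in csv.reader(lines):
        n = int(count)
        for npi in (npi_1, npi_2):
            degree[npi] += 1
            shared[npi] += n
    return degree, shared


# Hypothetical rows standing in for the real file.
sample = io.StringIO("prov_a,prov_b,12\nprov_a,prov_c,3\nprov_b,prov_c,5\n")
degree, shared = load_triples(sample)
print(shared.most_common(1))  # [('prov_b', 17)]
```

For the full dataset you would pass an open file handle, for example `open(path, newline="")`, so rows are read one at a time. If the file lists each relationship in both directions, the counts above would be doubled.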

Now, this can be a good thing from a business standpoint.  For example, to contain costs, you want most of your patients treated by providers in your network where you have negotiated contracts.  Out-of-network treatments are termed “leakage” by the industry. Network “leakage” analysis can rapidly identify which physicians are referring out-of-network and how often.   Assuming that the equivalent services are available in-network, and this is the key question, you could make these physicians aware of the resources and craft a referral process that makes it easier for them and their patients to access care.
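A basic leakage report just needs referral records and a list of in-network provider IDs. The sketch below uses hypothetical IDs and computes, for each in-network referring physician, the share of referrals that leave the network. It cannot answer the key question above, whether an equivalent service was available in-network, so a high rate is a prompt for a conversation, not a verdict.

```python
from collections import Counter


def leakage_by_physician(referrals, in_network):
    """Share of each in-network physician's referrals that go out of network.

    referrals: iterable of (referring_physician, receiving_provider) pairs.
    in_network: set of provider ids in the network.
    Returns {physician: (out_of_network, total, rate)}.
    """
    total = Counter()
    outside = Counter()
    for src, dst in referrals:
        if src not in in_network:
            continue  # only referrals from in-network physicians can "leak"
        total[src] += 1
        if dst not in in_network:
            outside[src] += 1
    return {p: (outside[p], total[p], outside[p] / total[p]) for p in total}


# Hypothetical referrals; X1 and X2 are out-of-network providers.
in_network = {"D1", "D2", "D3"}
referrals = [
    ("D1", "D2"), ("D1", "X1"), ("D1", "X1"),
    ("D2", "D3"),
    ("D3", "X2"), ("D3", "D1"), ("D3", "D2"), ("D3", "X2"),
]
leakage = leakage_by_physician(referrals, in_network)
for p, (out, tot, rate) in sorted(leakage.items(), key=lambda kv: kv[1][2], reverse=True):
    print(f"{p}: {out}/{tot} referrals out of network ({rate:.0%})")
```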

You can also identify physicians who are the “hubs” of your network: practitioners who are widely connected to others through patient care. These may be the movers and shakers of care standards, and the group that you want to involve in developing new patient care strategies. For a great example, see this innovative social network analysis of physicians in Italy and their attitudes towards evidence-based medicine.
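One simple way to surface candidate hubs is weighted degree: sum each provider's shared-patient weight across all of their connections. The sketch below uses hypothetical edges; it is one of several possible centrality measures, and libraries such as NetworkX also provide others (for example `degree_centrality` and `betweenness_centrality`) that capture different notions of "widely connected".

```python
from collections import Counter


def rank_hubs(edges, top=3):
    """Rank providers by weighted degree (total shared-patient weight across edges).

    edges: iterable of (provider_a, provider_b, weight) tuples.
    Returns [(provider, number_of_partners, weighted_degree), ...] for the top providers.
    """
    strength = Counter()
    partners = Counter()
    for a, b, w in edges:
        strength[a] += w
        strength[b] += w
        partners[a] += 1
        partners[b] += 1
    return [(p, partners[p], s) for p, s in strength.most_common(top)]


# Hypothetical shared-patient edges; H is connected to most of the network.
edges = [
    ("H", "A", 10), ("H", "B", 8), ("H", "C", 6),
    ("A", "B", 2), ("C", "D", 1),
]
print(rank_hubs(edges))  # [('H', 3, 24), ('A', 2, 12), ('B', 2, 10)]
```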

These types of analyses are not without problems and could be used unwisely.  For example, physicians who prescribe expensive, non-generic medications may be highly informed specialists.  Programs that do not take such information into account may unfairly penalize network providers.  In addition, some services may not be available in-network, so providers referring out of network in these cases are actually providing the best care for their patients.  Finally, these analytics could easily be used to identify “high utilizers” of healthcare services, and to better manage their healthcare.  Network analytics are really good at such pattern recognition.  As we move forward, a balanced approach to such analytics is needed, especially to prevent premature conclusions from being drawn from the data.

There is also a larger issue lurking beneath the surface: employee discrimination based on healthcare data. Some healthcare networks are triple agents: healthcare provider, employer, and insurer. It may be tempting from a business standpoint to use complex analytics to hire or promote employees based on a combined analysis of performance, healthcare, and other data. Google already uses such “people analytics” for hiring. Some businesses may try to use such profiling, including internal healthcare claims data, to shape their workforce. Even if individual health data is not used by a company, it seems likely that businesses will use de-identified healthcare data to develop HR management systems. See Don Peck’s article in the Atlantic for some interesting reading on “people management” systems.

As a last thought, it’s a bit ironic that we, as a healthcare system in the United States, will be spending hundreds of millions of dollars analyzing whether our patients are going “out-of-network” for care, and designing strategies to keep them in network, when this problem does not exist for single-payer national healthcare systems…