Weekly Roundup for Big Data in Medical Science: April 21-28, 2014

 

Image of the Week

Data graphic created from the Institute for Health Metrics and Evaluation web app showing the number of years people with chronic kidney disease live with their disability after diagnosis.

Fast Upload

This week: IBM launches Watson-based big data services for clinical care; Persephone, the real-time genome browser; and yet another online flu web-page-view correlation: Wikipedia usage estimates prevalence of influenza-like illness in the United States in near real time.

 

Bits and Bytes

Upcoming Events

Link | When | What | Where
MIWC 2014 | April 28-29, 2014 | Medical Informatics World Conference | Boston, MA
BDM 2014 | May 21-23, 2014 | Big Data in Biomedicine Conference | Stanford, CA
ASE BDS 2014 | May 27-31, 2014 | Second ASE International Conference on Big Data Science and Computing | Stanford, CA
HCI-KDD@AMT 2014 | August 11, 2014 | Special Session on Advanced Methods in Interactive Data Mining for Personalized Medicine | Warsaw, Poland
BigR&I 2014 | August 27-29, 2014 | International Symposium on Big Data Research and Innovation | Barcelona, Spain
ICHI 2014 | September 15-17, 2014 | IEEE International Conference on Healthcare Informatics | Verona, Italy

Weekly Roundup for Big Data in Medical Science: April 4-11, 2014

 

Image of the Week

Evolution and Genomics Workshop: Circos diagram

Fast Upload

This week, the big buzz about the Medicare release of the complete physician reimbursement data set.  Will privacy concerns derail collection of large personalized data sets for genomics research?  New bioinformatics methods for SNP research in epidemiology.

Bits and Bytes

 

The Big Medicare Payment Data Release

Today Medicare released payment data for over 880,000 healthcare providers, including charge and payment information, provider specialties and addresses, billing codes, and other specific information.  The Medicare data set is downloadable here.  The Medicare web site describes the data set as:

“Provider Utilization and Payment Data: Physician and Other Supplier Public Use File (Physician and Other Supplier PUF), with information on services and procedures provided to Medicare beneficiaries by physicians and other healthcare professionals.  The Physician and Other Supplier PUF contains information on utilization, payment (allowed amount and Medicare payment), and submitted charges organized by National Provider Identifier (NPI), Healthcare Common Procedure Coding System (HCPCS) code, and place of service. This PUF is based on information from CMS’s National Claims History Standard Analytic Files. The data in the Physician and Other Supplier PUF covers calendar year 2012 and contains 100% final-action physician/supplier Part B non-institutional line items for the Medicare fee-for-service population.”

There are some notable caveats to drawing conclusions from the data, which have been extensively outlined by docgraph.org.  Problems such as payer mix and specialty bias should be considered.  For example, pediatricians will have many fewer Medicare patients, while specialties with patients 65+ or special Medicare programs, such as Nephrology (Disclosure:  this is my sub-specialty), may have a higher proportion of Medicare-insured patients.

How will this large data set help us understand healthcare practices in the United States?  Several promising analyses come to mind:

  • Analysis of varying payment amounts for similar procedures – Because the same medical procedure can be billed under several different codes that account for the complexity of care provided, there is the opportunity for the “Lake Wobegon Effect” – where all the procedures have above-average difficulty.  In some cases it might be true that a particular physician specializes in the most difficult cases (e.g. advanced chemotherapy using an implantable pump for liver cancer), but this is the exception rather than the rule.

 

  • Network analysis of unusual billing patterns – Here is where coupling this database with DocGraph (see my previous post here), a network graph database of Medicare referral patterns covering providers across the US, may yield very interesting findings.  Some networks of physicians may have unusual billing patterns compared with others.  In some cases, this will be a sign of efficiency and great medical care delivery.  In others, it may be a sign of inefficiency or, in rare cases, something more ominous such as a pattern of fraud among a group or organization of providers.

 

  • Network analysis of procedure frequency – More useful will be the ability to study types of procedures and visits among providers in different geographic areas, and the reimbursement variations.  Already, USA Today has posted a map of average reimbursement by state.  While some sophisticated analysis will be needed to reach thoughtful conclusions about regional variations in care, this will certainly spur a great deal of analysis and hopefully some good healthcare policy.
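The first analysis in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the rows are made-up (NPI, HCPCS code, service count) tuples rather than the actual layout of the Medicare file, and the established-patient office visit codes 99211 to 99215 stand in for a family of codes of increasing complexity. The function name `top_level_share` is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (npi, hcpcs_code, service_count). Not real Medicare data.
rows = [
    ("1001", "99213", 300), ("1001", "99214", 150), ("1001", "99215", 50),
    ("1002", "99213", 20), ("1002", "99214", 80), ("1002", "99215", 400),
]

# Established-patient office visit codes, lowest to highest complexity.
VISIT_FAMILY = {"99211", "99212", "99213", "99214", "99215"}
TOP_CODE = "99215"

def top_level_share(rows):
    """Return, per provider, the fraction of visits billed at the top code."""
    total = defaultdict(int)
    top = defaultdict(int)
    for npi, code, count in rows:
        if code in VISIT_FAMILY:
            total[npi] += count
            if code == TOP_CODE:
                top[npi] += count
    # Assumes every provider listed has at least one visit in the family.
    return {npi: top[npi] / total[npi] for npi in total}

shares = top_level_share(rows)
for npi, share in sorted(shares.items()):
    print(npi, f"{share:.0%}")
```

A provider with an unusually high share is only a starting point for questions, given the payer-mix and specialty caveats noted earlier.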

 

So, a good day for data transparency in healthcare delivery, and I say that as somebody whose Medicare practice is in the database! Let’s hope that high-quality data analytics and thoughtful research follow.

 

Weekly Roundup: March 28 – April 5, 2014

 

Image of the Week

Fast Upload

This week, a mesoscale connectome of the mouse brain, Merck uses Hadoop to optimize vaccine production, hospitals turn to big data to reduce readmission rates, and another philanthropic gift for data science.

 

Bits and Bytes

 


March 21, 2014: Weekly Roundup for Big Data in Medical Science

 

Image of the Week

Fast Upload

This week, Apple is rumored to enter the healthcare market, medical conspiracy theories without any data, Google wants your DNA sequences in the cloud for scientific discovery but didn’t get the flu predictions right, CMS proposes releasing more Medicare Part D data for research, and 2014: year of the wearable device?

 

Bits and Bytes

 


Personal Biosensors and the Internet of Medical Things

IoMT

There is a tsunami of new devices and apps out that will help you record everything from the number of steps you took in a day to calories and caffeine ingested, sleep quality, weight, blood pressure and blood glucose levels.  The next revolution in Medicine will be the Internet of Medical Things (IoMT), uniquely tagged devices that help monitor blood pressure, blood glucose, physical activity, temperature, sleep, and even motion.  Along with patient-entered data from tablets, mobile devices, and conventional desktop computers, data from these devices will change the face of medicine, increase our ability to engage patients in their own health behaviors, and provide massive amounts of data for population health study on an unprecedented scale.

Personal biosensor devices (PBD’s) like Fitbit and Jawbone have become the rage, with many corporations looking to provide PBD’s to employees, with the goal of improving employee health.  Often the devices are paired with financial incentives to motivate people to change behavior.  As reported last year in Wired, the company Citizen has a program where employees have voluntarily agreed to share their fitness, productivity and happiness data.  Many vendors, such as FitLinxx, SparkPeople, and Endomondo, specifically offer employer packages.

Mobile apps are branching out and rapidly linking with these devices, allowing coupling of geospatial and biometric data.  The data to be generated by these devices, already in use by hundreds of thousands, if not millions, of people, will be staggering.  This past year, clinical research and clinical trials started to incorporate data from smartphones and PBD’s.

At present, it is unclear whether apps or PBD’s will alter health behavior.  Despite their ubiquity, there is little data on improvement in glucose control by diabetics who use such mobile software to manage their blood sugars.  Do weight loss and calorie counting apps really achieve their goals?  I think that it’s fair to say that anecdotal evidence suggests great promise in many cases.  From a personal standpoint, my Fitbit has made me more aware of my sedentary computer habits, and motivated me to take more steps and run out more.  My favorite recent awareness-raising app, pointed out to me by my colleague Joshua Schwimmer, is UpCoffee by Jawbone.  I had no idea of the half-life of caffeine before I downloaded the app!

The impact of PBD’s and apps may not be all good, or all predictable.  Sometimes, personal bio-sensing apps can actually lead to bad outcomes.  An article by Alice Gregory in the New Republic last year describes how calorie counting mobile fitness apps can worsen eating disorders.  Given the studies that have described the addictive properties of electronic devices and the internet, and the underlying biology, it is not surprising that these problems can be exacerbated in people with addictive or compulsive behavior tendencies or illnesses.

Where all this leads, we don’t know yet.  Certainly to very large data sets and something far beyond telemedicine.  Something exciting is happening in medicine and research.  I hope that this will lead to the ability to crowdsource population health research questions  and studies beyond our wildest imagination.  What would you study if you had access to data from a million PBD’s?

Revealing Healthcare Networks Using Insurance Claims Data

 

As I noted in my post last week, every healthcare accountable care organization in the United States is trying to understand provider networks. Common questions include:

  • What is the “leakage” from our network?
  • What medical practices should we acquire?
  • What are the referral patterns of providers within the network?
  • Does the path that a patient takes through our network of care affect outcomes?
  • Where should we build the next outpatient clinic?

Much of this analysis is being done by using insurance claims data, and this post is about how such data is turned into a provider network analysis.  Here, I’ll discuss how billing or referrals data is turned into graphs of provider networks.  Most of us are now familiar with social networks, which describe how a group of people are “connected”.  A common example is Facebook, where apps like TouchGraph show who you are friends with, whether your friends are friends, and so on.  These networks are built with a simple concept, that of a relationship.

To describe a physician network, we first make a table from claims data that shows which physicians (D) billed for visits or procedures on which patients (P).  This is shown in the figure below.  Next, we tally which physicians billed for seeing the same patient, and how many times, giving a common billing matrix.  The billing does not have to happen at the same visit or for the same problem, just over the course of the measurement period. Notice that the matrix is symmetrical, with the diagonal giving the total number of patient encounters for each doctor.  This type of matrix is referred to as a distance or similarity matrix.

BillingNetwork
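The tallying step can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used to produce the figure: the claims are made-up (doctor, patient) pairs, the function name `shared_patient_matrix` is hypothetical, and each patient is counted once per doctor, so the diagonal here holds distinct patients rather than total encounters.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical claims: one (doctor, patient) pair per billed encounter.
claims = [
    ("D1", "P1"), ("D1", "P2"), ("D1", "P3"),
    ("D2", "P1"), ("D2", "P2"),
    ("D3", "P3"),
    ("D4", "P3"), ("D4", "P4"),
]

def shared_patient_matrix(claims):
    """Build a symmetric doctor-by-doctor matrix of shared patient counts."""
    patients_by_doctor = defaultdict(set)
    for doctor, patient in claims:
        patients_by_doctor[doctor].add(patient)

    doctors = sorted(patients_by_doctor)
    matrix = {a: {b: 0 for b in doctors} for a in doctors}

    # Diagonal: number of distinct patients billed by each doctor.
    for a in doctors:
        matrix[a][a] = len(patients_by_doctor[a])

    # Off-diagonal: patients billed by both doctors in the period.
    for a, b in combinations(doctors, 2):
        shared = len(patients_by_doctor[a] & patients_by_doctor[b])
        matrix[a][b] = matrix[b][a] = shared
    return doctors, matrix

doctors, matrix = shared_patient_matrix(claims)
for d in doctors:
    print(d, [matrix[d][other] for other in doctors])
```

Comparing every pair of doctors is fine for a handful of providers; at the scale of millions of claims, the same tally would normally be done as a sparse matrix product or a distributed join.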

The provider network graph plotted from the above example shows the network relationship between four doctors.  The size of the circle shows the total number of patients billed for by that doctor, and the width of the line shows the strength of the shared patient connection.

Network

Now, if we have this data for a large network, we can look at a number of measures using standard methods.  In the above example, we can see that the two orange providers are probably members of a group practice, sharing many of the same patients and referring to many of the same providers. See this humorous post by Kieran Healy identifying Paul Revere as the ringleader of the American Revolution using a similar analysis!  Providers in red are “out-of-network”, and have connections to a single in-network physician.  However, the graph itself does not reveal the reason that these out-of-network providers share patients with the in-network provider.  It could be that the out-of-network group offers a service not available within the network, such as gastric bypass, pediatric hepatology, or kidney transplantation.

It is not difficult to see that you could create network representations using many types of data.  Referral data would allow you to add directionality to the network graph.  You could also look at total charges in shared patients, as opposed to visits or procedures, to get a sense of the financial connectedness of providers or practices.  Linking by lab tests or procedures can show common practice patterns.  Many other variations are possible. The complexity of the network increases with the number of providers and patients in your claims data.

These simple graphs are just the beginning.  Couple the network graph with geospatial locations of providers, and you add another layer of complexity.  Add city bus routes, and you can see how patients might get to your next office location.  Add census data, and you can look at the relationship between medical practice density, referral patterns, and the average income within a zip code area.  The possibilities are incredible!

So why is this big data?  To build a large and accurate network, you need to analyze millions of insurance claims, lab tests, or other connection data.  Analyzing data of this size requires large amounts of computer memory and, often, cluster computers and distributed computing software such as Hadoop (more on this in a future post).  We owe a very large debt to the “Healthcare Hacker” Fred Trotter, who created the first such open source, very large network graph from 2011 Medicare claims data for the entire United States, called DocGraph. The dataset can be downloaded from NotOnly Dev for $1 here.  This graph has 49 million connections between almost a million providers.  Ryan Weald created a beautiful visualization of the entire DocGraph dataset, which I will leave you with here.

DocGraph

8 March 2014: Weekly Roundup for Big Data in Medicine and Science

 

Image of the Week

HIVmap_gr2

Young S, Rivers C, Lewis B (2014) Methods of using real-time social media technologies for detection and remote monitoring of HIV outcomes.  Preventive Medicine.  http://dx.doi.org/10.1016/j.ypmed.2014.01.024

 

Fast Upload

This week, big data breaches at LA County medical facilities, and more US healthcare delivery companies explore use of data mining and analytics.  At the Healthcare Information and Management Systems Society (HIMSS) meeting this week, “… all healthcare data is big data, and it’s only going to be getting bigger”.

 

Bits and Bytes

 


Healthcare Data Privacy and Self-Insured Employers

Merge Data

In the rush to control healthcare costs, many employers are self-insuring.  As part of this move, most self-insured networks have become intensely interested in analyzing their own claims and medication cost data.  This type of analysis can be highly informative.  For example, Fred Trotter has created an enormous Medicare referral network graph (DocGraph) for all physicians and providers in the United States.  Essentially, he took Medicare claims data and counted the number of instances that two physicians billed for care on the same patients.  Physicians were identified by a unique National Provider Identifier (NPI) number, which is publicly available here.  By some very simple matrix manipulation on this very large data set of 2011 Medicare claims, he created DocGraph. The resulting data is very simple:  {provider #1, provider #2, number of instances where provider #1 billed for seeing patients that provider #2 also saw at some point}, but very large (49 million relationships).  This graph can be used to identify referral “cliques” (who refers to whom), and other patterns.  The bottom line is that any organization that has claims data, big data storage and processing capabilities, and some very simple analytics can do this.  Similar analyses can be done for medication prescribing patterns, disability claim numbers, and other care-delivery metrics.

Now, this can be a good thing from a business standpoint.  For example, to contain costs, you want most of your patients treated by providers in your network where you have negotiated contracts.  Out-of-network treatments are termed “leakage” by the industry. Network “leakage” analysis can rapidly identify which physicians are referring out-of-network and how often.   Assuming that the equivalent services are available in-network, and this is the key question, you could make these physicians aware of the resources and craft a referral process that makes it easier for them and their patients to access care.
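A leakage tally over an edge list of this shape might look like the sketch below. Everything in it is an assumption made for illustration: the provider labels, the counts, the `in_network` set, and the function name `leakage_by_provider` are all hypothetical, and "leakage" is defined here simply as the share of a provider's shared-patient volume that involves out-of-network partners.

```python
from collections import defaultdict

# Hypothetical edges: (provider_a, provider_b, shared_patient_count).
edges = [
    ("A", "B", 40),
    ("A", "X", 10),
    ("B", "C", 30),
    ("C", "Y", 30),
]
in_network = {"A", "B", "C"}  # assumed roster of in-network providers

def leakage_by_provider(edges, in_network):
    """Fraction of each in-network provider's volume shared out of network."""
    total = defaultdict(int)
    leaked = defaultdict(int)
    for a, b, count in edges:
        # Shared-patient edges are undirected, so credit both endpoints.
        for provider, partner in ((a, b), (b, a)):
            if provider in in_network:
                total[provider] += count
                if partner not in in_network:
                    leaked[provider] += count
    return {p: leaked[p] / total[p] for p in total}

rates = leakage_by_provider(edges, in_network)
for provider, rate in sorted(rates.items()):
    print(provider, f"{rate:.0%}")
```

The number alone does not say whether the out-of-network service was available in-network, which is the key question raised above.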

You can also identify physicians who are the “hubs” of your network, practitioners who are widely connected to others by patient care. These may be the movers-and-shakers of care standards, and the group that you want to involve in development of new patient care strategies.  For a great example, see this innovative social network analysis of physicians in Italy and their attitudes towards evidence-based medicine.
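One simple way to look for hubs is to rank providers by how many distinct partners they share patients with, breaking ties by shared-patient volume. The sketch below assumes a made-up edge list and a hypothetical function name `rank_hubs`; degree is only one of several centrality measures that could be used here.

```python
from collections import defaultdict

# Hypothetical edges: (provider_a, provider_b, shared_patient_count).
edges = [
    ("A", "B", 40),
    ("A", "C", 25),
    ("A", "D", 5),
    ("B", "C", 30),
    ("D", "E", 10),
]

def rank_hubs(edges):
    """Rank providers by number of partners, then by shared-patient volume."""
    partners = defaultdict(set)
    volume = defaultdict(int)
    for a, b, count in edges:
        partners[a].add(b)
        partners[b].add(a)
        volume[a] += count
        volume[b] += count
    # Most partners first; ties broken by volume, then by label.
    return sorted(partners, key=lambda p: (-len(partners[p]), -volume[p], p))

ranking = rank_hubs(edges)
print(ranking)
```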

These types of analyses are not without problems and could be used unwisely.  For example, physicians who prescribe expensive, non-generic medications may be highly informed specialists.  Programs that do not take such information into account may unfairly penalize network providers.  In addition, some services may not be available in-network, so providers referring out of network in these cases are actually providing the best care for their patients.  Finally, these analytics could easily be used to identify “high utilizers” of healthcare services, and to better manage their healthcare.  Network analytics are really good at such pattern recognition.  As we move forward, a balanced approach to such analytics is needed, especially to prevent premature conclusions from being drawn from the data.

There is a larger issue also lurking beneath the surface:  employee discrimination based on healthcare data.  Some healthcare networks are triple agents:  healthcare provider, employer, and insurer.  It may be tempting from a business side to use complex analytics to hire or promote employees based on a combined analysis of performance, healthcare and other data.  Google already uses such “people analytics” for hiring.  Some businesses may try to use such profiling, including internal healthcare claims data, to shape their workforce.  Even if individual health data is not used by a company, it seems likely that businesses will use de-identified healthcare data to develop HR  management systems.  See Don Peck’s article in the Atlantic for some interesting reading on “people management” systems.

As a last thought, it’s a bit ironic that we, as a healthcare system in the United States, will be spending hundreds of millions of dollars analyzing whether our patients are going “out-of-network” for care, and designing strategies to keep them in network, when this problem does not exist for single-payer National Healthcare Systems…

Primary Care Genomics: The Next Clinical Wave?

DNA Double_Helix

Is the main barrier in healthcare analyzing and connecting the massive amounts of data present in electronic medical records, or is it generating the right data at the right level?  To really move healthcare forward, argue Michael Groner, VP of engineering and chief architect, and Trevor Heritage, we need to move research-level testing (whole exome sequencing, genomics, clinical proteomics) outside of the research environment and make it widely available to primary care physicians.  According to Groner, only when we amass large collections of such data will the true value of big data analytics methods be realized in medicine.

“It’s untenable to expect every physician or health care provider interested in improving patient care through the use of genomics testing to make the costly capital and other investments required to make this science a practical reality that impacts day-to-day patient care. Instead, the aim should be to connect the siloed capabilities associated with genomics testing into a simple, physician-friendly workflow that makes the best services accessible to every provider, regardless of geography or institutional size or affiliation…The true barrier to clinical adoption of genomic medicine isn’t data volume or scale, but how to empower physicians from a logistical and clinical genomics knowledge standpoint, while proving the fundamental efficacy of genomics medicine in terms of improved patient diagnosis, treatment regimens, outcomes and improved patient management.”

It’s a great dream, and parts of it will be realized in the future, but it ignores many of the realities of in-the-trenches medical practice and medical science.  Genomics medicine will simply not improve the diagnostic acumen for many clinical problems; it’s just the wrong method.  Some examples include fractures, appendicitis, stroke, heart attacks, and many others.  Sequencing my genome will not diagnose my diverticulitis.  This has nothing to do with making genomic science and whole genome analytics a practical reality, but rather with matching the tools to the appropriate medical problem and scale.  Genomics is quite good at providing information about genetic risk of conditions, but not necessarily at diagnosing them.  Knowing that somebody has the BRCA1 breast cancer gene mutation does not tell you if they actually have breast cancer, and, if they do, which breast it’s in, whether it has metastasized, and where.

Groner’s larger point about the need to use data science to make personalized medicine a real-time reality, however, is well taken.  For example, the new guidelines for treatment of cholesterol abnormalities with statins, powerful cholesterol-lowering drugs, are based on a risk score that no provider can calculate in their head.  Personalized medicine could evolve to generate a personalized risk assessment, based on a risk score for cardiovascular disease.  Beyond this, one could imagine the risk score being modified by a proteomics analysis of subtle serum proteins and their associated contributions to cardiovascular risk, and a genomic analysis of hereditary risk.  Integrating this evidence, and providing clinicians with some measure of how to weight the predicted risk factors when making treatment decisions, are true growth areas for medical genomics and health informatics.