Revealing Healthcare Networks Using Insurance Claims Data


As I noted in my post last week, every healthcare accountable care organization in the United States is trying to understand provider networks. Common questions include:

  • What is the “leakage” from our network?
  • What medical practices should we acquire?
  • What are the referral patterns of providers within the network?
  • Does the path that a patient takes through our network of care affect outcomes?
  • Where should we build the next outpatient clinic?

Much of this analysis is done with insurance claims data, and in this post I'll discuss how billing or referral data is turned into graphs of provider networks. Most of us are now familiar with social networks, which describe how a group of people are “connected”. A common example is Facebook, where apps like TouchGraph show who you are friends with, whether your friends are friends with each other, and so on. These networks are built on a simple concept: the relationship.

To describe a physician network, we first make a table from claims data that shows which physicians (D) billed for visits or procedures on which patients (P). This is shown in the figure below. Next, we tally which physicians billed for seeing the same patient, and how many times, giving a common billing matrix. The billing does not have to happen at the same visit or for the same problem, just over the course of the measurement period. Notice that the matrix is symmetric, with the diagonal giving the total number of patient encounters for each doctor. This type of matrix is a similarity matrix: unlike a distance matrix, larger values mean two providers are more closely connected.
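The tallying step can be sketched in a few lines of Python. This is a minimal illustration, not the method used to make the figure: it assumes claims arrive as hypothetical `(doctor_id, patient_id)` pairs and counts *distinct* shared patients, so its diagonal is the number of distinct patients per doctor rather than raw encounter totals.

```python
from collections import defaultdict
from itertools import combinations


def co_billing_matrix(claims):
    """Build a symmetric doctor-by-doctor shared-patient matrix.

    claims: iterable of (doctor_id, patient_id) pairs, one per billed claim
    (a hypothetical input layout). Repeated claims for the same pair are
    collapsed, so:
      off-diagonal [a][b] = distinct patients billed by both a and b
      diagonal     [a][a] = distinct patients billed by a
    """
    patients_by_doctor = defaultdict(set)
    for doctor, patient in claims:
        patients_by_doctor[doctor].add(patient)

    doctors = sorted(patients_by_doctor)
    matrix = {a: {b: 0 for b in doctors} for a in doctors}

    # Diagonal: how many patients each doctor billed for.
    for a in doctors:
        matrix[a][a] = len(patients_by_doctor[a])

    # Off-diagonal: size of the overlap between two doctors' patient sets.
    # Filling both [a][b] and [b][a] keeps the matrix symmetric.
    for a, b in combinations(doctors, 2):
        shared = len(patients_by_doctor[a] & patients_by_doctor[b])
        matrix[a][b] = matrix[b][a] = shared

    return doctors, matrix
```

To count encounters instead of distinct patients, the per-doctor sets could be replaced with counters; the overlap rule would then need its own definition.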


The provider network graph plotted from this example shows the relationships among four doctors. The size of each circle shows the total number of patients billed for by that doctor, and the width of each line shows the strength of the shared-patient connection.
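One way to turn such a matrix into a picture like this is to emit a Graphviz DOT description, scaling node `width` by the diagonal and edge `penwidth` by the shared-patient counts. This sketch assumes the matrix is a nested dict keyed by doctor ID (a hypothetical layout); the scaling constants are arbitrary choices, not anything from the original figure.

```python
def to_dot(doctors, matrix, max_node_inches=1.5, max_pen=8.0):
    """Render a shared-patient matrix as a Graphviz DOT string.

    Node width is proportional to the diagonal (patients per doctor);
    edge penwidth is proportional to the off-diagonal shared-patient count.
    Pairs with no shared patients get no edge.
    """
    # Largest values, used to normalise sizes (fall back to 1 to avoid /0).
    biggest_node = max((matrix[d][d] for d in doctors), default=0) or 1
    biggest_edge = max(
        (matrix[a][b] for i, a in enumerate(doctors) for b in doctors[i + 1:]),
        default=0,
    ) or 1

    lines = ["graph providers {", "  node [shape=circle, fixedsize=true];"]
    for d in doctors:
        width = max_node_inches * matrix[d][d] / biggest_node
        lines.append(f'  "{d}" [width={width:.2f}];')

    # Visit each unordered pair once, since the matrix is symmetric.
    for i, a in enumerate(doctors):
        for b in doctors[i + 1:]:
            if matrix[a][b] > 0:
                pen = max_pen * matrix[a][b] / biggest_edge
                lines.append(f'  "{a}" -- "{b}" [penwidth={pen:.2f}];')

    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be saved to a `.dot` file and laid out with a Graphviz engine such as `neato`, which suits undirected graphs.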


Now, if we have this data for a large network, we can compute a number of measures using standard network-analysis methods, such as how many providers each doctor is connected to and how strong those connections are. In the above example, we can see that the two orange providers are probably members of a group practice, sharing many of the same patients and referring to many of the same providers. See this humorous post by Kieran Healy identifying Paul Revere as the ringleader of the American Revolution using a similar analysis! Providers in red are “out-of-network”, and each is connected to a single in-network physician. However, the graph itself does not reveal why these out-of-network providers share patients with the in-network provider. It could be that the out-of-network group offers a service not available within the network, such as gastric bypass, pediatric hepatology, or kidney transplantation.
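As a rough illustration of such measures, the sketch below computes each provider's degree (number of distinct neighbors) and strength (sum of shared-patient counts), and flags out-of-network providers whose only in-network tie is one physician, like the red providers above. The function name and the input layout, an undirected edge dict plus a set of in-network IDs, are assumptions made for this example.

```python
from collections import defaultdict


def network_summary(edges, in_network):
    """Simple per-provider measures on an undirected shared-patient graph.

    edges: dict mapping (provider_a, provider_b) -> shared-patient count,
           with each unordered pair listed only once (hypothetical layout).
    in_network: set of in-network provider IDs.
    """
    degree = defaultdict(int)          # number of distinct neighbors
    strength = defaultdict(int)        # total shared patients over all ties
    in_net_neighbors = defaultdict(set)

    for (a, b), weight in edges.items():
        if weight <= 0:
            continue  # no shared patients means no edge
        # Undirected: credit both endpoints.
        for x, y in ((a, b), (b, a)):
            degree[x] += 1
            strength[x] += weight
            if y in in_network:
                in_net_neighbors[x].add(y)

    # Out-of-network providers tied to exactly one in-network physician.
    single_tie = {
        p: next(iter(nbrs))
        for p, nbrs in in_net_neighbors.items()
        if p not in in_network and len(nbrs) == 1
    }
    return dict(degree), dict(strength), single_tie
```

As the paragraph notes, a flag like this only says *that* a provider depends on one in-network contact, not *why*.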

It is not difficult to see that you could create network representations using many types of data. Referral data would allow you to add directionality to the network graph. You could also look at total charges for shared patients, as opposed to visits or procedures, to get a sense of the financial connectedness of providers or practices. Linking by lab tests or procedures can show common practice patterns. Many other variations are possible. The network becomes more complex as your claims data cover more providers and patients.
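Directionality changes what you can measure. In this sketch, referral records are assumed to be hypothetical `(referring_id, receiving_id)` pairs; they become weighted directed edges, so A→B and B→A are counted separately. It also computes one possible definition of “leakage” (an assumption, not a standard): the share of referrals made by in-network providers that go to out-of-network providers.

```python
from collections import Counter


def referral_graph(referrals, in_network):
    """Build a weighted directed referral graph and a simple leakage rate.

    referrals: iterable of (referring_id, receiving_id) records.
    in_network: set of in-network provider IDs.
    Returns (edges, leakage) where edges maps (src, dst) -> referral count
    and leakage is the fraction of in-network referrals sent out of network.
    """
    edges = Counter(referrals)  # direction matters: (A, B) != (B, A)

    from_in_network = sum(
        n for (src, _dst), n in edges.items() if src in in_network
    )
    leaked = sum(
        n for (src, dst), n in edges.items()
        if src in in_network and dst not in in_network
    )
    leakage = leaked / from_in_network if from_in_network else 0.0
    return edges, leakage
```

Swapping the count for a summed charge amount per referral would give the “financial connectedness” view mentioned above, under the same directed structure.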

These simple graphs are just the beginning. Couple the network graph with the geospatial locations of providers, and you add another layer of complexity. Add city bus routes, and you can see how patients might get to your next office location. Add census data, and you can look at the relationship between medical practice density, referral patterns, and the average income within a zip code area. The possibilities are incredible!

So why is this big data? To build a large and accurate network, you need to analyze millions of insurance claims, lab tests, or other connection data. Analyzing data of this size requires large amounts of computer memory and often cluster computers running distributed computing software such as Hadoop (more on this in a future post). We owe a very large debt to the “Healthcare Hacker” Fred Trotter, who created the first such open source, very large network graph from 2011 Medicare claims data for the entire United States, called DocGraph. The dataset can be downloaded from NotOnly Dev for $1 here. This graph has 49 million connections between almost a million providers. Ryan Weald created a beautiful visualization of the entire DocGraph dataset, which I will leave you with here.
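Before reaching for a cluster, some summaries can be computed by streaming the edge list instead of loading it all into memory. The sketch below assumes a headerless CSV whose first three columns are two provider IDs and a shared-patient count; that layout is an assumption, so check the actual DocGraph file before relying on it. It follows the map/reduce pattern that Hadoop distributes: “map” each row to both endpoints, then “reduce” by summing per provider.

```python
import csv
from collections import Counter


def weighted_degree_stream(lines):
    """Stream an edge list and accumulate each provider's weighted degree.

    lines: any iterable of CSV text lines (an open file works), assumed to be
    headerless rows of provider_a, provider_b, shared_count[, extra columns].
    Memory grows with the number of distinct providers, not with the number
    of edges, because rows are processed one at a time.
    """
    strength = Counter()
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        a, b = row[0].strip(), row[1].strip()
        count = int(row[2])
        # "Map" step: emit the count for both endpoints.
        # "Reduce" step: the Counter sums counts per provider.
        strength[a] += count
        strength[b] += count
    return strength
```

For a real file, pass a handle opened with `newline=""` and call `.most_common(10)` on the result to list the most connected providers. Joins and multi-hop queries over tens of millions of edges are where distributed tools start to pay off.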