Image of the Week
Data graphic created from the Institute for Health Metrics and Evaluation web app showing the number of years people with chronic kidney disease live with their disability after diagnosis.
This week: IBM launches Watson-based big data services for clinical care; Persephone, the real-time genome browser; and yet another correlation between web page views and flu: Wikipedia usage estimates the prevalence of influenza-like illness in the United States in near real time.
Bits and Bytes
April 28-29, 2014 | Medical Informatics World Conference
May 21-23, 2014 | Big Data in Biomedicine Conference
May 27-31, 2014 | Second ASE International Conference on Big Data Science and Computing (ASE BDS 2014)
August 11, 2014 | Special Session on Advanced Methods in Interactive Data Mining for Personalized Medicine
August 27-29, 2014 | International Symposium on Big Data Research and Innovation
September 15-17, 2014 | IEEE International Conference on Healthcare Informatics
How much unstructured big data is there in the EMR? Unstructured data is data that doesn't fit into neat columns on a spreadsheet, or into fields and look-up tables in a database; the narrative text in a history of present illness (HPI) is a typical example. It used to be that we sat down with a pen and the paper chart and wrote our progress notes in the office and in the clinic. Or we dictated the notes, which were then transcribed. But with the advent of the EMR, templates have crept in, as has the widespread and controversial practice of copying and pasting text from a previous encounter (see the recent NYT article).
This is interesting in a quirky way. As physicians, nurse practitioners, and other providers have become reluctant data entry clerks, they have adopted many shortcuts so that they still have time to take care of patients: templates with stylized or constrained vocabularies, self-generated "smart phrases", and patient-specific narratives that can be recalled and modified. The remainder of the note is populated with structured data already in the system (labs, test results, x-ray results). Because a patient's condition often does not change dramatically from one day to the next, the genuinely novel unstructured content in a note may be only a tiny fraction of its total bytes, and the change between the current and the previous note probably carries as much information as the content itself. But when people get hurried or sloppy, information that is no longer current gets carried along unchanged in the notes. So the key information extraction problem is to identify the true changes, separate them from the relatively static or outdated data that is carried along, and extract the novel information.
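As a rough illustration of "identifying the true changes", here is a minimal Python sketch that compares two consecutive notes sentence by sentence using the standard library's difflib. The function names, the naive sentence splitter, and the two example notes are all made up for illustration; a real system would need clinical text processing far beyond this.

```python
import difflib
import re


def split_sentences(note: str) -> list[str]:
    """Naive splitter (an assumption): break after ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", note.strip())
    return [p for p in parts if p]


def novel_sentences(previous_note: str, current_note: str) -> list[str]:
    """Return sentences of current_note that were changed or added since previous_note."""
    prev = split_sentences(previous_note)
    curr = split_sentences(current_note)
    matcher = difflib.SequenceMatcher(a=prev, b=curr, autojunk=False)
    novel = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" mark text in the current note with no verbatim match.
        if tag in ("replace", "insert"):
            novel.extend(curr[j1:j2])
    return novel


# Hypothetical notes from two consecutive days (invented for this example).
day1 = (
    "Patient admitted with community-acquired pneumonia. "
    "Started on ceftriaxone and azithromycin. "
    "Creatinine 1.1. "
    "Afebrile overnight."
)
day2 = (
    "Patient admitted with community-acquired pneumonia. "
    "Started on ceftriaxone and azithromycin. "
    "Creatinine 1.8. "
    "Afebrile overnight. "
    "Nephrology consulted."
)

changes = novel_sentences(day1, day2)
carried_over = len(split_sentences(day2)) - len(changes)
print("Changed or new:", changes)
print("Carried over unchanged:", carried_over)
```

Note what this does not do: a sentence that is carried over verbatim looks identical whether it is still true or long out of date, so a diff alone cannot flag stale information. It only narrows down where the new content is.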
How is this relevant to big data analytics in medicine? If much of the content is captured by a stylized vocabulary and filled with structured data already present in data tables, how much independent information will there be in a medical note? If the data has dependencies because of this stylized nature and these controlled vocabularies, how does that affect data mining and statistical analysis? I am not sure whether this type of problem has a formal technical term in machine learning, but if not, it is likely to get one soon!
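One crude way to put a number on "how much independent information" is to ask how many extra compressed bytes a note costs once the previous note is already known. The Python sketch below uses zlib as a stand-in for a proper statistical model; the function names and example notes are hypothetical, and the result is only a rough proxy, not a formal measure of information.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def novel_bytes_estimate(previous_note: str, current_note: str) -> int:
    """Extra compressed bytes needed for current_note when previous_note comes first."""
    joint = compressed_size(previous_note + "\n" + current_note)
    return joint - compressed_size(previous_note)


# Hypothetical notes from two consecutive days (invented for this example).
yesterday = (
    "Patient admitted with community-acquired pneumonia. "
    "Started on ceftriaxone and azithromycin. "
    "Creatinine 1.1. "
    "Afebrile overnight."
)
today = (
    "Patient admitted with community-acquired pneumonia. "
    "Started on ceftriaxone and azithromycin. "
    "Creatinine 1.8. "
    "Afebrile overnight. "
    "Nephrology consulted."
)

standalone = compressed_size(today)
incremental = novel_bytes_estimate(yesterday, today)
print("Compressed size of today's note alone:", standalone)
print("Extra bytes given yesterday's note:", incremental)
```

The gap between the two numbers gives a feel for how much of the note is repetition. Treating every note as an independent observation ignores that repetition, which is exactly the dependency the questions above are getting at.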