I attended a short conference event organised by the CEEDs project earlier this month entitled “Making Sense of Big Data.” CEEDs is an EU-funded project under the Future and Emerging Technology (FET) Initiative. The project is concerned with the development of novel technologies to support human experience. The event took place at the Google Campus in London and included a range of speakers talking about the use of data to capture human experience and behaviour. You can find a link about the event here that contains full details and films of all the talks, including a panel discussion. My own talk was a general introduction to physiological computing and a statement of our latest project work.
It was a thought-provoking day because it was an opportunity to view the area of physiological computing from a different perspective. The main theme was that we are entering the age of ‘big data’, in the sense that passive monitoring of people using mobile technology grants access to a wide array of data concerning human behaviour. Of course this is hugely relevant to physiological monitoring systems, which tend towards high-resolution data capture and may represent the richest vein of big data to index the human experience.
The phrase “data deluge” came up several times during the day, meaning the idea that we can access more data than we can possibly assimilate (but that’s why we have machine learning algorithms, right?). The other issue that ran through the session was an open question about the utility of big data: does the data associated with pervasive, passive monitoring deliver great insight into human experience? How does big data allow technology to support human behaviour?
There seemed to be agreement during the session that big data had sufficient potential to support human activity and decision-making. In the area of physiological computing, I have previously argued that monitoring the user has the potential to make systems much smarter in terms of software adaptation. The bodyblogger project also provides a nice example of how physiological data can provide insight into lifestyle factors for the individual.
If we imagine a future where physiological data (heart rate, for example) is available from the majority of the population via a mobile platform, we can scale up the bodyblogger concept and consider the benefits and challenges of “big” physiological data. The most obvious beneficiaries would be those people working in health epidemiology, especially for the understanding of the link between routine heart function and health (which will benefit us all in the long term). There is also the potential for crowd-sourced experiments where individuals are randomly selected to adapt lifestyle or diet in order to study the impact on the heart. Then there is the ‘human sensor’ idea, where changes in the weather or environment (e.g. air pollution) can be quantified in terms of the human cardiovascular system. This type of data takes us into the realm of intelligent cities, where human physiological data represents another potential source of information.
There are at least two dimensions to big data according to the picture that I’ve just described: big data generated by populations or crowds (cross-sectional data where N = many) and big data collected over a period of time that is specific to one person (longitudinal data where N = 1). Of course the two are not exclusive, because heart rate data can be represented in space (tagged to a GPS coordinate), in time, or in both at once.
One challenge associated with this “data deluge” is the need for effective data visualisation to grant insight into the patterns that emerge from physiological sensors. The bodyblogger uses a simple colour-coding system anchored by the days of the week and hours of the day to convey thousands of hours of ECG data. The same scheme could be used to provide a “hotspot” analysis of a spatial location, substituting x-y coordinates or even latitude and longitude to anchor the data. These dimensions provide essential context for the heart rate data. The advantage of this type of visualisation is that it is scalable: the viewer can zoom in and out of the data to examine changes in heart rate that occur over weeks, days, hours or minutes.
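To make the day-by-hour idea concrete, here is a minimal sketch in Python of how heart rate samples might be binned into a grid of weekdays by hours before any colour is applied. This is not the bodyblogger code; the function names, the colour labels and the thresholds of 70 and 100 beats per minute are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def weekly_heart_rate_grid(samples):
    """Average (timestamp, bpm) samples into a 7 x 24 grid.

    Rows are weekdays (0 = Monday), columns are hours of the day.
    Cells with no data are left as None.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (weekday, hour) -> [sum, count]
    for timestamp, bpm in samples:
        cell = totals[(timestamp.weekday(), timestamp.hour)]
        cell[0] += bpm
        cell[1] += 1

    grid = [[None] * 24 for _ in range(7)]
    for (day, hour), (total, count) in totals.items():
        grid[day][hour] = total / count
    return grid


def colour_band(mean_bpm, calm_below=70.0, elevated_from=100.0):
    """Map a mean heart rate to a colour label (hypothetical thresholds)."""
    if mean_bpm is None:
        return "grey"  # no data for this cell
    if mean_bpm < calm_below:
        return "blue"
    if mean_bpm < elevated_from:
        return "amber"
    return "red"


samples = [
    (datetime(2013, 7, 1, 9, 0), 60),    # a Monday morning
    (datetime(2013, 7, 1, 9, 30), 80),
    (datetime(2013, 7, 2, 18, 0), 100),  # Tuesday evening
]
grid = weekly_heart_rate_grid(samples)
```

Swapping the weekday and hour keys for rounded latitude and longitude would give the spatial version of the same grid, and changing the bin size (minutes instead of hours, say) is what zooming in and out amounts to.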
Issues surrounding the visualisation of big physiological data are connected to the challenge of interpretation. This was one theme that emerged from the talk by Paul Verschure about the crucial role of the scientific method when attempting to extract meaning from large data sets. Big data must be approached with a definite question or set of hypotheses in mind if we are to gain any genuine insight into the behaviour of ourselves or others.
Context also plays an enormous role. If we were to capture heart rate from the population of a large city over 24 hours, we would see lots of hotspots where heart rate was high for everyone. That sounds interesting but it probably isn’t. Heart rate hotspots would typically cluster around stairs and slopes as people exert themselves. The subtle reactivity of the heart to psychological stimuli would be effectively washed out. Of course we could add an accelerometer to the hypothetical ECG sensors worn by the population and factor out the influence of movement. My point is (and this is related to the wider point made by Paul about the scientific method) that we need a question and we need to control context if this type of big data is going to yield any kind of genuine insight.
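As a rough illustration of what “factoring out” movement could mean, the sketch below fits a straight line between an accelerometer-derived activity score and heart rate, then keeps only the part of heart rate that the line does not explain. This is a deliberately simple assumption on my part: the function name is hypothetical, the relationship between movement and heart rate is not really linear, and heart rate lags behind exertion, so a real analysis would need something more careful.

```python
def movement_adjusted_heart_rate(heart_rate, movement):
    """Return heart rate residuals after a least-squares fit on movement.

    heart_rate: list of bpm values
    movement:   list of activity scores (e.g. accelerometer magnitude),
                one per heart rate sample
    Positive residuals mean heart rate was higher than movement alone
    would predict under this simple linear assumption.
    """
    n = len(heart_rate)
    if n != len(movement) or n < 2:
        raise ValueError("need two equal-length series with at least 2 samples")

    mean_hr = sum(heart_rate) / n
    mean_mv = sum(movement) / n
    var_mv = sum((m - mean_mv) ** 2 for m in movement)

    if var_mv == 0:
        # No variation in movement: nothing to factor out except the mean.
        return [hr - mean_hr for hr in heart_rate]

    slope = sum(
        (m - mean_mv) * (hr - mean_hr) for m, hr in zip(movement, heart_rate)
    ) / var_mv

    return [
        (hr - mean_hr) - slope * (m - mean_mv)
        for m, hr in zip(movement, heart_rate)
    ]


# Heart rate that rises in step with movement leaves almost no residual...
explained = movement_adjusted_heart_rate([60, 70, 80, 90], [0, 1, 2, 3])
# ...whereas an extra jump in the last sample survives the adjustment.
unexplained = movement_adjusted_heart_rate([60, 70, 80, 95], [0, 1, 2, 3])
```

The residuals are only as meaningful as the question behind them: they show where heart rate departs from what exertion predicts, not why.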
The final challenge that came up during the panel discussion was the impact of big physiological data on the privacy of the individual. The recent revelations about the extent of data harvesting of electronic communication conducted by government agencies place this debate in a new and sinister context. For the individual, there is safety in numbers when monitoring physiology across large groups of people, i.e. anonymity within big data, provided that individuals cannot be identified. However, in order to gain insights about the group or the individual, we need context, which has the potential to threaten privacy.
It’s a difficult balance to strike and makes me wonder about the benefits and risks of big physiological data.