As part of our ongoing work in EU4CHILD, we have been laying the groundwork for what an Artificial Intelligence (AI) ecosystem for action on childhood cancer could look like. This is the driving force behind what will ultimately become EU4CHILD’s roadmap for the implementation of AI applications in pediatric cancer. During these first six months, we have been exploring the possibilities of this ecosystem in light of the needs, constraints and requirements of children and adolescents with cancer, young survivors, their families and caregivers, clinicians and medical professionals, IT professionals and data scientists, and many more.
Without further ado, join us in this journey through the seemingly endless potential of AI in pediatric oncology and its most immediate challenges.
The clinical perspective: ethics first
The ethical implications of an ecosystem such as the one we are trying to create for EU4CHILD are many and very diverse. First and foremost, this ecosystem will be designed for pediatric cancer patients, a vulnerable demographic due to their young age, while not losing sight of the far-reaching ramifications regarding responsibility, transparency and agency that arise when bringing Artificial Intelligence into clinical decision-making in childhood cancer.
Explainability lies at the core of these requirements: young cancer patients, survivors and their caregivers deserve to be informed and adequately involved in their own care. Information about treatment and follow-up options should be constant, clear and concise. The caveat: these basic directives clash with the black-box nature of many AI methods. Even when the input data is well known, the logic that leads an AI algorithm to a certain outcome may be entirely opaque to healthcare professionals and patients, making the whole process hardly auditable. To shed some light into this black box, decision-support systems should provide a human-recognizable sequence of arguments showing how the AI system arrives at its final decision on diagnosis or treatment options.
What would make a model explainable for clinicians, then? This simple question ties in nicely with the topic of accountability. From a clinical standpoint, explainability has traditionally been viewed as a means of justifying decision-making in a given situation. In other words, more than knowing what results a hypothetical AI model yields, we are fundamentally concerned with how those results came to be.
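As a toy illustration of what such a human-readable sequence of arguments could look like, consider an inherently interpretable model such as a shallow decision tree, whose prediction path can be printed as explicit rules. This is only a minimal sketch: the feature names, data and risk labels below are hypothetical placeholders, not drawn from any real pediatric oncology dataset.

```python
# Minimal sketch: an interpretable decision-support model whose reasoning
# can be rendered as explicit, auditable rules a clinician can follow.
# All feature names, values and labels are hypothetical placeholders.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data: [age_years, tumor_size_mm, marker_level]
X = [
    [4, 12.0, 0.8],
    [7, 30.5, 2.1],
    [3, 8.2, 0.5],
    [9, 45.0, 3.4],
]
y = [0, 1, 0, 1]  # 0 = low risk, 1 = high risk (illustrative only)

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X, y)

# export_text turns the fitted tree into a step-by-step sequence of
# arguments, showing exactly how each prediction came to be.
print(export_text(model, feature_names=["age_years", "tumor_size_mm", "marker_level"]))
```

For more complex, genuinely black-box models, post-hoc explanation techniques would be needed instead, but the goal remains the same: an argument trail that a clinician can inspect and challenge.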
The problem with data sharing
Even though the intersection of AI and healthcare is still fraught with ethical and legal challenges, only one of them specifically targets the foundation of all AI systems: data sharing. Building and validating AI applications for a field such as pediatric oncology takes the issue of data sharing, already a widely recognized hurdle for research as a whole, to new levels of complexity. In the realm of AI, data is critical to train and fine-tune algorithms. Without quality data, we can only arrive at subpar AI models and, inevitably, poor outcomes.
These limitations are only magnified when dealing with data on pediatric cancer, a rare disease by definition: apart from data scarcity and fragmentation (both inherent to rare diseases), we must also ponder the dilemma of data privacy and protection. The GDPR allows the sharing of anonymized data, since truly anonymized data falls outside its scope. However, can we confidently speak of achieving full anonymity in the realm of childhood cancer when a given patient can potentially be re-identified from their collected data?
The answer is “barely”. This is where pseudonymization and informed consent come into play, and new ethical ramifications arise. The moment a child is diagnosed with cancer is a very delicate one, in which their family and caregivers are expected to process a large volume of critical information under immense stress and uncertainty. Consequently, the process of informing them about diagnosis and treatment options, and requesting their consent to collect, process and share their child’s medical data for medical purposes, must be handled with the utmost care.
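To make the re-identification concern concrete, here is a toy k-anonymity check: for each combination of quasi-identifiers (attributes that are individually harmless but jointly identifying), we count how many records share it. The column names and data are invented for the example; in a rare disease, these groups easily shrink to a single child.

```python
# Toy k-anonymity check: how many records share each combination of
# quasi-identifiers? A group of size 1 means a patient is unique in the
# dataset and potentially re-identifiable. All data is hypothetical.
import pandas as pd

records = pd.DataFrame({
    "age": [4, 4, 7, 12],
    "postcode": ["1010", "1010", "2020", "3030"],
    "diagnosis": ["ALL", "ALL", "neuroblastoma", "osteosarcoma"],
})

quasi_identifiers = ["age", "postcode", "diagnosis"]
group_sizes = records.groupby(quasi_identifiers).size()

k = group_sizes.min()  # the dataset is k-anonymous for this k
print(f"k-anonymity: {k}")
print("Unique (re-identifiable) combinations:")
print(group_sizes[group_sizes == 1])
```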
The technical dimension
In order to successfully streamline the development and adoption of meaningful AI solutions in healthcare, we need to fill in the blanks of what exactly we want to achieve (the application per se) and how it can be accomplished in a privacy-preserving manner. Finding the right approach and model for a given task (among all available AI techniques) is not a straightforward process; it takes time and effort.
On this topic, Prediction Models as a Service seems a reasonable design choice to balance data-security requirements with flexibility. In this scenario, prediction models are offered as a service by a data processor, while the data controller maintains full control over both the data fed to the model and its preferred degree of pseudonymization. Only the data controller keeps the link between pseudonym and patient identity, which reduces the likelihood and severity of a potential data breach.
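A minimal sketch of what this split of responsibilities could look like in code: the controller derives pseudonyms with a secret key it never shares, sends only pseudonymized features to the prediction service, and resolves the returned predictions back to patients locally. The function names, payload shape and scores are hypothetical; no specific service API is implied.

```python
# Minimal sketch of controller-side pseudonymization for a
# "prediction model as a service" setup. Names and payloads are
# hypothetical; no real service API is implied.
import hmac
import hashlib

SECRET_KEY = b"kept-only-by-the-data-controller"  # never leaves the controller

def pseudonymize(patient_id: str) -> str:
    """Derive a stable pseudonym; without SECRET_KEY it cannot be reversed."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

# The controller keeps the pseudonym -> patient link strictly on its side.
patients = {"patient-001": {"age": 6, "marker_level": 1.4}}
link_table = {pseudonymize(pid): pid for pid in patients}

# Only pseudonymized records are sent to the data processor's model service.
payload = [{"id": pseudonymize(pid), **features} for pid, features in patients.items()]
predictions = {rec["id"]: 0.72 for rec in payload}  # stand-in for the remote call

# Predictions are re-linked to real patients only on the controller's side.
for pseudonym, score in predictions.items():
    print(link_table[pseudonym], "->", score)
```

The design choice worth noting: even if the processor's side is breached, the attacker obtains pseudonyms and features, not identities, because the key and the link table never left the controller.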
To expand on these privacy nuances, a distributed approach seems a natural fit for the healthcare sector. Novel techniques such as Federated Learning and Swarm Learning embody this distributed approach, often complemented by Differential Privacy for additional protection. In such solutions, patient data does not have to leave the clinic and can be processed on site; this requires on-premise computation resources, but also strengthens the data protection targets even further. At their core, all these approaches rely on the idea of collaboratively developing, sharing and refining AI models while keeping patient data secure.
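The following is a bare-bones sketch of the federated idea, in the spirit of federated averaging (FedAvg): each clinic trains on its own data, and only the model parameters, never the patient records, are shared and aggregated. It is deliberately simplified, with synthetic data and a toy linear model; real frameworks add secure aggregation, weighting by sample counts and many more safeguards.

```python
# Bare-bones federated averaging sketch: each clinic updates a shared
# linear model on local data; only the weight vectors leave the premises.
# Data and update rule are deliberately simplistic and synthetic.
import numpy as np

rng = np.random.default_rng(0)

def local_training(global_weights, X, y, lr=0.1, epochs=20):
    """One clinic's local gradient-descent update on its private data."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

# Two clinics with private datasets that never leave the site.
clinic_data = [
    (rng.normal(size=(30, 3)), rng.normal(size=30)),
    (rng.normal(size=(40, 3)), rng.normal(size=40)),
]

global_weights = np.zeros(3)
for round_ in range(5):
    # Each site trains locally; only the resulting weights are sent back.
    local_weights = [local_training(global_weights, X, y) for X, y in clinic_data]
    global_weights = np.mean(local_weights, axis=0)  # server-side averaging

print("aggregated model weights:", global_weights)
```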
Data is the way forward
Fueled by the shared realization of the enormous potential medical data can hold, the collective spotlight is now fixed on how to unlock this ‘dormant’ value in health-related data. This ambition is not trivial, however: the healthcare sector (at both EU and global level) is rich in data but poor in information, and there is a distinct lack of evidence-based information to inform research, policy decisions and regulations.
There is also the current siloed approach to health data collection and management to contend with, and the overall disparity of data sources severely limits the value that can be extracted from them. Hospitals may store data on the same clinical pathologies in widely different formats, choosing to populate specific data fields and disregard others, so that even when data is accessible, it is difficult for end users to discover and understand its underlying structure and meaning.
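A small illustration of this disparity: two hospitals describing the same pathology under different field names, units and granularity, which must be mapped to a common schema before the records become comparable. Every field name below is invented for the example.

```python
# Two hypothetical hospital exports describing the same pathology with
# different field names and units, mapped to one common schema.
hospital_a = {"dx_code": "C71.9", "pat_age_months": 54, "tumour_mm": 32}
hospital_b = {"diagnosis": "C71.9", "age_years": 4.5, "lesion_size_cm": 3.2}

def from_hospital_a(rec):
    return {
        "diagnosis_icd10": rec["dx_code"],
        "age_years": rec["pat_age_months"] / 12,
        "tumor_size_mm": rec["tumour_mm"],
    }

def from_hospital_b(rec):
    return {
        "diagnosis_icd10": rec["diagnosis"],
        "age_years": rec["age_years"],
        "tumor_size_mm": rec["lesion_size_cm"] * 10,  # cm -> mm
    }

print(from_hospital_a(hospital_a))
print(from_hospital_b(hospital_b))
```

Multiply these two mapping functions by hundreds of hospitals and thousands of fields, and the scale of the harmonization effort becomes apparent.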
Into this already chaotic mix goes yet another ingredient: data quality. Available data within a clinical setting may be incomplete or not stored routinely, hampering its potential for sharing and reuse and highlighting the pressing need for broadly applicable standards that make it interpretable. And it is precisely here that the next frontier in healthcare awaits: in how to unravel the heterogeneity of health data through computer-based modelling (i.e. in silico models) to pave the way towards personalized medicine.
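Before any in silico modelling can happen, even a crude completeness profile of the available fields makes the quality problem tangible. A toy sketch, with hypothetical field names and records:

```python
# Crude data-quality profile: share of non-missing values per field.
# Field names and records are hypothetical.
import pandas as pd

data = pd.DataFrame({
    "diagnosis_icd10": ["C71.9", "C40.2", None, "C71.9"],
    "tumor_size_mm": [32.0, None, None, 18.0],
    "molecular_subtype": [None, None, "A", None],
})

completeness = 1 - data.isna().mean()  # fraction of filled-in values per column
print(completeness.sort_values())
```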
That frontier is especially relevant in pediatric oncology, where there appears to be a threshold for cure in many cancer types: despite high survival rates, further advances are hard to achieve through clinical trials alone. It has become essential to merge research results from biomolecular findings, imaging studies and scientific literature with clinical data from patients, expanding our capability to generate, analyze and share progressively larger datasets. Even a modest increase in the molecular data collected from individual patients could grant invaluable insights into the variability of pediatric cancer and help us push the boundaries of how we care not only for young cancer patients, but for all those suffering from a rare disease.
When we shift our focus to the individual, all diseases become unique.