NEWS & RESULTS
The following results are for demonstration purposes only. More general validity will become clear as the quality and quantity of our data improves.
02/08/2022: Out of office! We are taking a summer break through August to recharge. Lots of research is taking place with several papers being written on a whole range of topics to be released over the next few months.
01/08/2022: PhD competition! The BloodCounts! consortium is offering two fully funded PhD positions for collaborators in our four African centres (The Gambia, Ghana, Kenya and South Africa).
This will ensure BloodCounts! contributes towards closing the gap between high-, middle- and low-income countries by enabling more equitable access to AI solutions in healthcare.
27/07/2022: AIX-COVNET at MIUA 2022! The Medical Image Understanding and Analysis (MIUA) conference is being hosted in Cambridge for the first time and AIX-COVNET are heavily involved. Prof. Carola-Bibiane Schönlieb and Dr. Michael Roberts are organisers, and both Dr. Anna Breger and Dr. Ian Selby are presenting posters and an abstract at the conference.
03/07/2022: We have a reproducibility crisis! In our systematic review and our recently submitted imputation paper, we have identified systemic issues with reproducibility at many levels, including for datasets, codebases, models and analysis pipelines. Dr. Michael Roberts will be speaking at a workshop titled 'The Reproducibility Crisis in ML-based Science' on 28 July about many of the reproducibility issues we have identified through AIX-COVNET, along with solutions to these.
Link: https://sites.google.com/princeton.edu/rep-workshop
Update! Here is a link to the talk: https://www.youtube.com/watch?v=7Bxcz7Z_tY0
08/06/2022: Paper announcement! After two years of experiments and data analysis, we are proud to have submitted our latest paper, titled 'Classification of datasets with imputed missing values: does imputation quality matter?'. Imputation of incomplete datasets is a common technique to prepare a dataset for machine learning... but we find that many state-of-the-art methods do not reproduce the distribution of the dataset faithfully. This has implications for both model performance and interpretability.
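As a toy illustration of the distributional issue (entirely synthetic data, not taken from the paper): even simple mean imputation, which keeps the column mean roughly intact, visibly shrinks the variance of the imputed column.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic feature column with ~30% of values missing completely at random.
x = rng.normal(loc=5.0, scale=2.0, size=10_000)
missing = rng.random(x.size) < 0.3
observed = x[~missing]

# Mean imputation: replace every missing value with the observed mean.
imputed = x.copy()
imputed[missing] = observed.mean()

# The imputed column keeps roughly the right mean, but its variance shrinks,
# so the distribution of the data is not reproduced faithfully.
print(x.std(), imputed.std())
```

More sophisticated imputation methods can fail in subtler ways, but the same question applies: does the imputed dataset still look like the original one?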
26/04/2022: AIX-COVNET in Brussels! We are in Brussels for the first in-person catch-up of the DRAGON consortium.
17/04/2022: Happy Easter All! Lots of exciting work coming to an end now and publications to be submitted in the next few months.
04/04/2022: MIUA 2022 in Cambridge! We are hosting the Medical Image Understanding and Analysis (MIUA) 2022 conference in Cambridge! We encourage everyone to submit papers and posters, as the expression-of-interest deadline is tomorrow!
30/03/2022: We've been recruiting! BloodCounts! has been actively recruiting through this year and we now have three new PhD students, a new postdoc and an excellent project coordinator. Lots of projects and papers in the pipeline, exciting times ahead!
20/03/2022: AIX-COVNET @ SIAM Imaging 2022. Dr. Anna Breger, Dr. Sören Dittmer and Dr. Michael Roberts are at SIAM Imaging Science 2022, held virtually in Berlin. Alongside their contributions, there will be lots of interesting talks!
11/03/2022: Anna @ the CIA. Dr. Anna Breger is giving a talk at the CIA titled "On data representations, evaluation and applications in medical imaging". This will be very interesting!
01/02/2022: Expanding the team! Dr Anna Breger joins the AIX-COVNET team to push forward our CXR analysis.
30/01/2022: Book chapter submitted! The AIX-COVNET team have submitted a book chapter reviewing the AI methods used in the COVID-19 pandemic applied to point-of-care imaging (i.e. CXR, CT and ultrasound). We will update when it appears.
17/01/2022: New publication! Led by Imperial and the DRAGON consortium, AIX-COVNET have contributed to: Data harmonisation for information fusion in digital healthcare: A state-of-the-art systematic review, meta-analysis and future research directions. This article summarises and reviews current approaches to data harmonisation, in multiple domains. This forms the backbone of creating usable datasets for machine learning.
05/01/2022: Happy new year! We have a busy start to the year, with two new PhD students and a new postdoc joining immediately followed by a project coordinator and two postdocs for the BloodCounts! project. There are many of our projects nearing publications, expect a nice harvest in the next few months from all of our core themes!
24/12/2021: Implementation in the hospital is underway! In the same week we have received the hardware, kindly provided by Lenovo and Intel, and also have approval from Information Governance at Addenbrooke's to implement this hardware inside the hospital. This will allow for deployment of our algorithms prospectively inside a real hospital. Watch this space!
15/12/2021: Publication of our federated machine learning paper in Nature Machine Intelligence! Through collaboration between AIX-COVNET and colleagues Mr. Hanchen Wang, Dr. Adrian Weller and Prof. Joan Lasenby in the Department of Engineering at the University of Cambridge, we have developed a system for encrypted federated learning and applied it to COVID-19 CT scans from the UK and China, improving generalisability.
14/12/2021: AIX-COVNET at NeurIPS. Michael Roberts, representing the AIX-COVNET collaboration, presented at a NeurIPS workshop about the challenges of developing machine learning methods in a pandemic and the lessons we can learn. In addition, the differences between using benchmarking and real-world data were explored.
17/11/2021: Congratulations to Dr Lei Zhu! We are delighted to announce that AIX-COVNET postdoc Dr. Lei Zhu has secured a permanent position at the Chinese University of Hong Kong as a Lecturer of Applied Mathematics. Although it is a shame he will be leaving our team, we are delighted for him and his career. Good luck Lei!
09/11/2021: Inspiring the next generation about imaging. Michael Roberts presented to secondary school children about the work AIX-COVNET is doing and how mathematics can be applied to medical imaging problems.
01/11/2021: We're hiring! We have adverts live for two postdocs and a project coordinator for the BloodCounts! project to allow us to expand the pandemic detection proof-of-concept into a fully fledged product.
Link: [Adverts now closed]
29/09/2021: Presentation to the strategic partners of the University of Cambridge. Michael Roberts gave an overview of the developments in the AIX-COVNET collaboration demonstrating the power of mathematics and our research to the strategic partners of the University of Cambridge including Google, Microsoft and Aviva.
07/09/2021: AIX-COVNET and MIUA 2022. AIX-COVNET are proud to be a partner to the Medical Image Understanding and Analysis conference 2022 which will be hosted in Cambridge. There will be one workshop at the conference devoted to the developments in the analysis of the medical images of COVID-19 patients.
01/08/2021: Out of office. We are taking a break for August to recharge and relax.
25/06/2021: Our BloodCounts! project is announced as a £1m prizewinner for the Trinity Challenge! We have won joint second place in the Trinity Challenge. This will allow us to develop our 'tsunami-like' early warning system for detection of new pandemics.
Link: https://solve.mit.edu/challenges/the-trinity-challenge/solutions/39133
15/06/2021: Spotlight on Incorporation Bias. Mr. Derek Driggs highlights a finding from our systematic review, a problem that is also systemic in the wider machine learning literature. Incorporation bias is introduced when the outcome labels are not independent of the predictors. Find out more at the link below.
Link: https://gateway.newton.ac.uk/presentation/2021-05-20/30487
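A minimal synthetic sketch of incorporation bias (hypothetical numbers, for illustration only): when the outcome label is partly derived from a predictor, a model built on that predictor looks far better than it really is.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# A hypothetical imaging finding scored for each patient...
finding = rng.random(n)
# ...and an outcome label that was (wrongly) defined partly from that same
# finding, e.g. COVID-19 status recorded on the basis of the radiology report.
label = finding + 0.1 * rng.random(n) > 0.55

# A 'model' that simply thresholds the finding now looks spuriously strong,
# because the reference standard was never independent of the predictor.
accuracy = ((finding > 0.5) == label).mean()
print(accuracy)
```

With an independent reference standard (e.g. PCR), the same thresholding rule would score far lower; the inflated number reflects the circular label definition, not clinical performance.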
15/06/2021: Lessons from the Pandemic for Machine Learning and Medical Imaging. Dr Michael Roberts presented our latest update on the key lessons from this pandemic and how we can be more prepared for next time.
Link: https://gateway.newton.ac.uk/presentation/2021-05-20/30018
02/06/2021: Our collaboration is featured in SIAM News. The article provides an overview of the particular imaging challenges for machine learning during a pandemic and the lessons that have been learned.
Link: https://sinews.siam.org/Portals/Sinews2/Issue%20Pdfs/sn_June2021.pdf
01/06/2021: Our BloodCounts! project is a finalist for the Trinity Challenge! A proposal, led by our team, to use the complete/full blood count as a pandemic surveillance tool has been selected in the top 16 of 380+ entries to the Trinity Challenge.
Link: https://solve.mit.edu/challenges/the-trinity-challenge/solutions/39133
19/05/2021: We are featured in the New Scientist! In the wake of our systematic review, Michael Roberts penned an opinion piece discussing the complexities of applying machine learning to a pandemic and some fundamental issues for applying machine learning in healthcare.
30/04/2021: AIX-COVNET contributes to a Joint Biosecurity Centre and Turing Institute panel. Dr. Michael Roberts appeared as a panellist discussing the shortcomings of image-based machine learning in the response to the current pandemic.
26/04/2021: Pandemic Machine Learning Pitfalls. Derek Driggs, of our collaboration, appeared on the Data Skeptic podcast and discussed in detail the issues with existing machine learning models for COVID-19 prognosis and diagnosis along with how these could be improved.
Link: https://dataskeptic.com/blog/episodes/2021/pandemic-machine-learning-pitfalls
24/03/2021: Our editorial is published in Radiology: Artificial Intelligence! An editorial, in which we consider how researchers can contribute positively to the COVID-19 machine learning community, has been published.
15/03/2021: Systematic review is published in Nature Machine Intelligence! Our extensive review, identifying systemic pitfalls in the machine learning for COVID-19 literature is online. We make detailed recommendations to ensure that future models are held to a high standard and are of potential clinical utility.
02/02/2021: Press release from Cambridge University Hospitals (CUH) about AIX-COVNET. Our collaboration is discussed in a press release from CUH describing how our algorithms will save the lives of patients.
Link: https://www.cuh.nhs.uk/news/artificial-intelligence-covid-breakthrough-will-save-lives/
28/01/2021: Media reporting of AIX-COVNET. Our collaboration was recently featured in several government and NHSX press releases and in several news articles.
UK Government: https://www.gov.uk/government/news/ai-at-the-forefront-of-efforts-to-treat-coronavirus-patients
NHSX: https://twitter.com/NHSX/status/1350840137509445635?s=20
i newspaper: https://inews.co.uk/news/health/coronavirus-latest-ai-imaging-algorithm-nhs-improve-treatment-covid-patients-833379
15/01/2021: Systematic review accepted by Nature Machine Intelligence. We are delighted to announce that our systematic review of the literature for machine learning models to diagnose and prognosticate for COVID-19 has been accepted by Nature Machine Intelligence and will appear as an Open Access publication shortly. The accepted version can be found at https://arxiv.org/abs/2008.06388.
01/01/2021: Two new additions to the team. Through the support of the DRAGON consortium and Intel we have recruited Sören Dittmer and Lei Zhu to join AIX-COVNET as Research Associates focussing on Chest X-Ray and CT analysis model development, respectively.
26/11/2020: Einstein Healthcare join the collaboration! The Einstein Healthcare Network, a private non-profit healthcare organisation based in Philadelphia, Pennsylvania, has joined AIX-COVNET and is contributing expertise and data from the US to help assess the generalisability of developed algorithms.
15/11/2020: Third and final iteration of the living systematic review. An update to our systematic review for machine learning methods for COVID-19 diagnosis and prognostication is now available at https://arxiv.org/abs/2008.06388. We have reviewed the literature up to and including 3rd October 2020.
13/10/2020: Second iteration of the living systematic review. An update to our systematic review for machine learning methods for COVID-19 diagnosis and prognostication is now available at https://arxiv.org/abs/2008.06388. We have reviewed the literature up to and including 14th August 2020.
30/09/2020: AIX-COVNET presents their project at IPEM 2020. See the video below with Dr. Michael Roberts giving an overview and update on the work of the AIX-COVNET project, our ambitions and some preliminary results.
14/08/2020: Systematic review submitted!
We have now completed and submitted our systematic review titled "Machine learning for COVID-19 detection and prognostication using chest radiographs and CT scans: a systematic methodological review". The preprint can be found here: https://arxiv.org/abs/2008.06388.
Whilst the paper is under review, we are continuing our thorough search of the literature and will provide updates every three weeks to ensure the review remains a 'living' document and is always a timely reflection of the current literature.
21/07/2020: Systematic review of AI methods using radiological imaging for diagnosis and prognosis of COVID-19.
Over the last two months our researchers have been completing a systematic methodological review of all articles (pre-prints and published) from 1 Jan 2020 to 24 June 2020 that develop models for diagnosis or prognosis prediction from CT and chest X-ray images. We believe it is useful to the community to publish our current findings here before we submit our review article. First, we discuss the process taken to isolate the final 29 papers from 952 candidates, and then we give some preliminary findings.
Review process
Our filtering process, shown in the flowchart on the right, was as follows:
1. We initially used broad search criteria to obtain any machine learning or deep learning papers containing words associated with COVID-19; this highlighted 952 papers, of which 927 were unique articles.
2. After considering the titles and abstracts, and rejecting papers that did not develop machine learning or deep learning models for COVID-19 diagnosis or prognosis from chest X-ray or CT, we retained 213 papers. A full-text review was then performed and we removed any papers that were ineligible, leaving 168 papers for the next stage.
3. Next, we removed from consideration papers that had not documented their methodologies in sufficient detail to allow reproduction. For deep learning papers we used the Checklist for Artificial Intelligence in Medical Imaging [1], excluding papers that did not fulfil certain mandatory checklist items. For machine learning papers we used the Radiomics Quality Score [2] and excluded papers scoring below a pre-defined threshold. This excluded the majority of papers, with only 29 retained for final review.
4. For each remaining paper, we performed a risk-of-bias review following the PROBAST guidance [3] to determine whether there are underlying biases in the 29 remaining papers, along with extracting the important data from those papers.
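The screening funnel above can be summarised with the counts given in the text (the helper function is purely illustrative bookkeeping):

```python
# Stage names and counts are taken from the review process described above.
stages = [
    ("search results", 952),
    ("unique articles", 927),
    ("title/abstract screen", 213),
    ("full-text review", 168),
    ("quality screen (CLAIM / quality score)", 29),
]

def exclusions(stages):
    """Papers excluded at each transition between consecutive stages."""
    return [(after[0], before[1] - after[1])
            for before, after in zip(stages, stages[1:])]

for stage, n_excluded in exclusions(stages):
    print(f"{n_excluded} papers excluded at {stage}")
```

Note that the quality screen removes by far the most full-text papers (139 of 168), which is itself one of the review's headline findings about documentation standards.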
Review contributions
Our systematic review article is currently in preparation for submission in the next few weeks, and will make the following contributions to the community:
Highlight the publications which are most reproducible and review their model performance.
Identify systematic issues with the current literature.
Discuss the datasets used throughout the literature, assess them in detail and identify potential issues.
Identify pressing clinical questions that the literature does not currently address.
Make recommendations for how future papers should be written and define criteria for documentation of methodology that they should aim to fulfil.
Preliminary (unpublished) findings
Below we display a series of unpublished tables with the key information we have extracted for each of the 29 papers:
* the authors state that the "tool will be made publicly available".
** the authors state that the data is available on request but not that the code or models are available.
References
[1] Mongan, John, Linda Moy, and Charles E. Kahn Jr. "Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A guide for authors and reviewers." (2020): e200029.
[2] Lambin, Philippe, et al. "Radiomics: the bridge between medical imaging and personalized medicine." Nature reviews Clinical oncology 14.12 (2017): 749-762.
[3] Wolff, Robert F., et al. "PROBAST: a tool to assess the risk of bias and applicability of prediction model studies." Annals of internal medicine 170.1 (2019): 51-58.
[4] Acar, Erdi, Engin ŞAHİN, and İhsan Yilmaz. "Improving effectiveness of different deep learning-based models for detecting COVID-19 from computed tomography (CT) images." medRxiv (2020).
[5] Amyar, Amine, Romain Modzelewski, and Su Ruan. "Multi-task Deep Learning Based CT Imaging Analysis For COVID-19: Classification and Segmentation." medRxiv (2020).
[6] Ardakani, Ali Abbasian, et al. "Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks." Computers in Biology and Medicine (2020): 103795.
[7] Bai, Harrison X., et al. "AI augmentation of radiologist performance in distinguishing COVID-19 from pneumonia of other etiology on chest CT." Radiology (2020): 201491.
[8] Ghoshal, Biraja, and Allan Tucker. "Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection." arXiv preprint arXiv:2003.10769 (2020).
[9] Georgescu, Bogdan, et al. "Machine Learning Automatically Detects COVID-19 using Chest CTs in a Large Multicenter Cohort." arXiv (2020): arXiv-2006.
[10] Chassagnon, Guillaume, et al. "AI-Driven CT-based quantification, staging and short-term outcome prediction of COVID-19 pneumonia." arXiv preprint arXiv:2004.12852 (2020).
[11] Chen, Xiaofeng, et al. "A diagnostic model for coronavirus disease 2019 (COVID-19) based on radiological semantic and clinical features: a multi-center study." European radiology (2020): 1.
[12] Ezzat, Dalia, and Hassan Aboul Ella. "GSA-DenseNet121-COVID-19: a hybrid deep learning architecture for the diagnosis of COVID-19 disease based on gravitational search optimization algorithm." arXiv preprint arXiv:2004.05084 (2020).
[13] Luz, Eduardo, et al. "Towards an efficient deep learning model for covid-19 patterns detection in x-ray images." arXiv preprint arXiv:2004.05717 (2020).
[14] Tartaglione, Enzo, et al. "Unveiling COVID-19 from Chest X-ray with deep learning: a hurdles race with small data." arXiv preprint arXiv:2004.05405 (2020).
[15] Shi, Feng, et al. "Large-scale screening of covid-19 from community acquired pneumonia using infection size-aware classification." arXiv preprint arXiv:2003.09860 (2020).
[16] Kana, Evariste Bosco Gueguim, et al. "A web-based Diagnostic Tool for COVID-19 Using Machine Learning on Chest Radiographs (CXR)." medRxiv (2020).
[17] Guiot, Julien, et al. "Development and validation of an automated radiomic CT signature for detecting COVID-19." medRxiv (2020).
[18] Jin, Shuo, et al. "AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system in four weeks." medRxiv (2020).
[19] Cohen, Joseph Paul, et al. "Predicting covid-19 pneumonia severity on chest x-ray with deep learning." arXiv preprint arXiv:2005.11856 (2020).
[20] Ko, Hoon, et al. "COVID-19 Pneumonia Diagnosis Using a Simple 2D Deep Learning Framework With a Single Chest CT Image: Model Development and Validation." Journal of Medical Internet Research 22.6 (2020): e19569.
[21] Lassau, Nathalie, et al. "AI-based multi-modal integration of clinical characteristics, lab tests and chest CTs improves COVID-19 outcome prediction of hospitalized patients." medRxiv (2020).
[22] Mei, Xueyan, et al. "Artificial intelligence–enabled rapid diagnosis of patients with COVID-19." Nature Medicine (2020): 1-5.
[23] Heidari, Morteza, et al. "Improving performance of CNN to predict likelihood of COVID-19 using chest X-ray images with preprocessing algorithms." arXiv preprint arXiv:2006.12229 (2020).
[24] Bassi, Pedro RAS, and Romis Attux. "A Deep Convolutional Neural Network for COVID-19 Detection Using Chest X-Rays." arXiv preprint arXiv:2005.01578 (2020).
[25] Pu, Jiantao, et al. "Any unique image biomarkers associated with COVID-19?." European Radiology (2020): 1.
[26] Qi, Xiaolong, et al. "Machine learning-based CT radiomics model for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 infection: A multicenter study." medRxiv (2020).
[27] Wang, Shuai, et al. "A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19)." MedRxiv (2020).
[28] Wang, Shuo, et al. "A fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis." European Respiratory Journal (2020).
[29] Chen, Xiaocong, et al. "Momentum Contrastive Learning for Few-Shot COVID-19 Diagnosis from Chest CT Images." arXiv preprint arXiv:2006.13276 (2020).
[30] Zhu, Xiaofeng, et al. "Joint Prediction and Time Estimation of COVID-19 Developing Severe Symptoms using Chest CT Scan." arXiv preprint arXiv:2005.03405 (2020).
[31] Li, Xin, and Dongxiao Zhu. "Covid-xpert: An ai powered population screening of covid-19 cases using chest radiography images." arXiv preprint arXiv:2004.03042 (2020).
[32] Zokaeinikoo, Maryam, et al. "AIDCOV: An Interpretable Artificial Intelligence Model for Detection of COVID-19 from Chest Radiography Images." medRxiv (2020).
17/07/2020: We're recruiting! Following on from our collaboration being awarded grants totalling £1.07m, we are looking for two excellent research associates to join the group. We are looking for candidates who have experience in inverse problems, machine learning, neural networks, and/or (medical) image analysis, especially knowledge of integrating multi-stream medical data in clinical settings. Details of the two posts and information on how to apply can be found here. The closing date is 14th August 2020. Vacancy closed.
28/06/2020: AIX-COV-NET collaboration gets access to novel datasets. We are excited to announce that within the space of one fantastic week, our collaboration has been granted ethical approval for access to all imaging data (X-ray and CT) and clinical data for PCR-tested patients at Addenbrooke's and Papworth hospitals in Cambridge, along with access to the NHSX National COVID-19 Chest Image Database (NCCID). Between the Cambridge and NCCID datasets, we will have access to tens of thousands of images from multiple hospitals throughout the UK, giving the collaboration a large, diverse dataset on which to train and validate algorithms.
22/06/2020: Media features about AIX-COV-NET. Our collaboration has been the subject of two articles in the last few weeks, firstly from the University of Cambridge and secondly from Plus magazine.
01/06/2020: External funding from IMI and Intel. We are delighted to announce that the AIX-COV-NET collaboration has been awarded two grants totalling £1.07m.
Firstly, from the EU-funded Innovative Medicines Initiative we have received £950k as part of the DRAGON consortium led by Oncoradiomics, which received a total award of €11.4m. This funding will support two post-doctoral researchers for three years each, along with the extraction of imaging, clinical and lab data from hospitals and the compute and storage capacity required for algorithm development.
Secondly, Intel have kindly granted £120k funding for two post-doctoral researchers for one year each to work on algorithm development, validation and deployment.
15/05/2020: AIX-COVNET presents their project at the MICCAI Imaging AI based Management of COVID-19 Webinar Series on the 15th of May. See the video below with Prof. Carola-Bibiane Schönlieb giving an overview of the AIX-COVNET project, our collaboration, ambitions and some preliminary results.
07/05/2020: Literature review for the use of AI based solutions for COVID-19 diagnosis and prognostication from Chest X-Ray images
We are delighted to release the first version of our review of the existing AI solutions for COVID-19 diagnosis and prognostication from X-Ray images. As the research landscape changes on a daily basis, this is a live review article and will be updated periodically to reflect new papers describing AI based solutions for X-ray diagnosis and prognosis. This will be followed by a future literature review which considers AI solutions for diagnosis and prognostication from CT imaging.
28/04/2020: X-Ray Update
Work has begun in various areas regarding the use of chest X-ray (CXR) images in our efforts to diagnose and prognosticate for COVID-19 and related pathologies. Being a high-resolution format (often around 6-12 million pixels in the raw DICOM files), CXR has the potential to encode far more than just the signals we care about.
A common pitfall in this field is overfitting to a data source (e.g. a particular hospital) or a particular device manufacturer. This happens because samples of a particular class label (e.g. positive for COVID-19, or non-survival) are often over-represented in the data from a given hospital or a particular country.
While neural networks are wonderfully powerful, in a sense they are also lazy: they will 'cheat' in any way they can. If recognising the hospital the data came from is easier than recognising the disease, and the bias in the data means that is enough to pass their test, then neural networks will learn the easy thing first. They might do something obvious, like learning the font of the text labels printed on the X-ray, or something less obvious, like spotting the level of noise in the background of the image.
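A tiny synthetic sketch of this 'cheating' (all numbers hypothetical): when a hospital-identifying artefact correlates with the label, a classifier keying on the artefact outscores one using the genuine but noisy disease signal.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

# Hypothetical setup: hospital A contributes mostly positives, hospital B
# mostly negatives, and each hospital's scanner leaves an easy-to-read
# artefact (standing in for a printed text label or background noise level).
label = rng.random(n) < 0.5
from_hospital_a = np.where(label, rng.random(n) < 0.9, rng.random(n) < 0.1)
artefact = from_hospital_a.astype(float)           # the trivial shortcut
disease_signal = label + rng.normal(0.0, 2.0, n)   # real signal, but noisy

# A lazy classifier that keys on the artefact beats one using the hard
# disease signal, despite learning nothing about the disease itself.
acc_shortcut = ((artefact > 0.5) == label).mean()
acc_signal = ((disease_signal > 0.5) == label).mean()
print(acc_shortcut, acc_signal)
```

The shortcut accuracy would of course collapse on data from a new hospital, which is exactly the generalisation failure we are trying to avoid.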
Pre-processing
As we know that a neural network can ‘cheat’ by learning to connect irrelevant features to the outcome, our initial step has been to try and reduce some of that ‘giveaway’ information.
Firstly, we use an open source lung segmentation algorithm [1], which delineates the boundary of the lung parenchyma, to crop the lung parenchyma from the X-ray as in Figure 1(b).
Secondly, we enhanced the segmented lung tissue using adaptive histogram equalisation and various filtering techniques on the masks produced, such as maximum and median filtering. This gives cleaner, fuller lung extractions and enhances the imaging features we want the algorithm to focus on, such as the structure of the bronchi and anomalous attenuations within the lung. The results of this enhancement are displayed as a false-colour image in Figure 1(c).
Note: the text label embedded in the top right of the original image is no longer available in the processed image, i.e. it has been removed from the network’s input and so cannot be used to ‘cheat’.
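As a rough sketch of the masking-plus-equalisation idea on toy data (plain, non-adaptive histogram equalisation; our actual pipeline uses the segmentation model [1], adaptive equalisation and mask filtering):

```python
import numpy as np

def equalise(img, mask):
    """Histogram-equalise grey levels using only the masked lung pixels.

    A stand-in for the adaptive variant described above; CLAHE and the
    max/median filtering of the mask would slot in at this stage.
    """
    levels, counts = np.unique(img[mask], return_counts=True)
    cdf = np.cumsum(counts) / counts.sum()
    # Everything outside the lung mask stays zero, so embedded text labels
    # and other background cues are removed from the network's input.
    out = np.zeros(img.shape, dtype=float)
    for level, value in zip(levels, cdf):
        out[(img == level) & mask] = value
    return out

# Toy 8-bit 'CXR' and a toy rectangular lung mask.
rng = np.random.default_rng(3)
cxr = rng.integers(0, 256, size=(64, 64))
lung = np.zeros((64, 64), dtype=bool)
lung[8:56, 8:56] = True

enhanced = equalise(cxr, lung)
```

Computing the intensity mapping only over lung pixels means the contrast stretch is driven by parenchymal intensities rather than by bright background regions.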
Pre-training
Overfitting can also be seen as 'learning the examples, not the features': if a big, powerful neural network gets too few samples to learn from, it quickly reaches around 100% accuracy on the training data, but only on that data. Try it on new data it has not seen before and its accuracy drops dramatically. Instead of learning what COVID-19 looks like, it will simply have learned what the few examples you gave it look like. Avoiding this behaviour is therefore very valuable indeed!
In addition to the pre-processing methods described previously, another common way to avoid overfitting to our training data is to pre-train on a large amount of relevant data. This ensures that the features extracted from the images are relevant. We then 'fine-tune' the network to the task at hand.
In our case, we do this by taking abundant data (100K+ CXR images – but with no COVID amongst them) and training the network to predict abundant labels (e.g. conditions such as pneumonia, cardiomegaly, pneumothorax, etc…). During this process, the feature extracting layers of the neural network will learn image patterns that are descriptive of these pathologies. Since these visual findings are also likely related to either diagnosis or prognosis of COVID-19, they will come in handy later.
Once the features (visual patterns) are extracted, we can remove the part of the network which performs the actual disease classification for this pre-training dataset. We replace it with a classifier layer which we can train on COVID data.
As much of the hard work is in extracting the features from the pre-training dataset, we can do that while awaiting usable, anonymised COVID-19 CXR images. Another benefit is that the heavy processing is done on the large pre-training dataset, which helps the network avoid overfitting to a relatively small sample of COVID-19 CXR images.
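The head-swap mechanics can be sketched in a few lines of numpy (a toy stand-in, not our actual architecture: the 'network' here is a single frozen random layer, and one ridge-regression solve stands in for SGD fine-tuning):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-in for a pre-trained network: one frozen feature layer
# (learned on the large non-COVID CXR dataset) plus a task head.
W_feat = rng.normal(size=(1024, 128))          # frozen after pre-training
W_head_pretrain = rng.normal(size=(128, 14))   # 14 pathology labels; discarded

def features(x):
    # Frozen feature extraction (ReLU); only the new head below is trained.
    return np.maximum(x @ W_feat, 0.0)

# Fine-tuning: bolt on a fresh COVID-19 classifier head and fit only its
# weights on the small COVID sample (toy random data here).
x_covid = rng.normal(size=(200, 1024))
y_covid = (rng.random(200) < 0.5).astype(float)

phi = features(x_covid)
W_head_covid = np.linalg.solve(phi.T @ phi + 1e-3 * np.eye(128),
                               phi.T @ y_covid)   # only 128 parameters fitted
```

The key point is the parameter count: the frozen extractor carries almost all of the capacity, while the small COVID dataset only has to determine the new head.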
So far, we’ve utilised the dataset [2] used by CheXNet [3] for our pre-training. This dataset has unfortunate, well-documented, issues in its labelling [4]. However, it still remains a valuable asset for pre-training, due to its size and availability.
We have also begun the process of further training on the dataset used by CheXpert (448GB of jpegs) [5]. This dataset is purported to have solved or mitigated many of the concerns around CheXNet's data labels, through improved NLP labelling and testing. This improvement is far from perfect, yet still very significant, notably in the quality of the 'no finding' category [6].
Although they are freely available, we have used neither of the algorithms associated with these two open-source projects [7, 8]. Primarily, this is because our challenge is a little more subtle: distinguishing COVID-19 from other forms of pneumonia, for example, is more visually challenging than simply spotting pneumonia. As such, we have trained networks at significantly higher resolution than those used by the aforementioned projects.
It’s too early to release any results yet, but we’re going as quickly and carefully as we can. Watch this space!
[1]: https://github.com/imlab-uiip/lung-segmentation-2d
[2]: https://nihcc.app.box.com/v/ChestXray-NIHCC
[3]: Rajpurkar, Pranav, et al. "Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning." arXiv preprint arXiv:1711.05225 (2017).
[4]: https://lukeoakdenrayner.wordpress.com/2017/12/18/the-chestxray14-dataset-problems/
[5]: Johnson, Alistair EW, et al. "MIMIC-CXR: A large publicly available database of labeled chest radiographs." arXiv preprint arXiv:1901.07042 1.2 (2019).
[6]: https://lukeoakdenrayner.wordpress.com/2019/02/25/half-a-million-x-rays-first-impressions-of-the-stanford-and-mit-chest-x-ray-datasets/
[7]: https://github.com/arnoweng/CheXNet
[8]: https://github.com/gaetandi/cheXpert
Contact: Philip Teare, philip.teare@astrazeneca.com
12/04/2020: Graph clustering of COVID-19 feature vectors
Using a publicly available Kaggle dataset of X-rays from COVID-19, viral pneumonia and healthy patients, we aim to determine whether the imaging features of the X-rays allow us to cluster the scans by diagnosis. The images are passed through a pre-trained CheXNet CNN [1] and we obtain the feature vectors from the final layer (before the classifier). After feature reduction, we cluster the feature vectors by Euclidean distance, using off-the-shelf Laplacian regularisation algorithms such as [2].
Important caveat: the Kaggle dataset is potentially biased and performance may therefore be overstated. Performance of the model will become clearer once we work with the data received from collaborating hospitals.
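For intuition, here is a minimal numpy spectral-clustering relative of this Laplacian-based approach, run on synthetic two-cluster 'feature vectors' (toy data and parameters, not our actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy stand-in for reduced CNN feature vectors: two groups of 20 scans.
feats = np.vstack([rng.normal(0.0, 0.3, (20, 4)),
                   rng.normal(1.5, 0.3, (20, 4))])

# Gaussian affinity from pairwise Euclidean distances (bandwidth sigma = 1).
d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 2.0)
np.fill_diagonal(W, 0.0)

# Unnormalised graph Laplacian; the sign of its second eigenvector (the
# Fiedler vector) yields a two-way partition. This is the simplest relative
# of the Laplacian-based semi-supervised method of Zhu et al. [2].
L = np.diag(W.sum(axis=1)) - W
_, eigvecs = np.linalg.eigh(L)
clusters = (eigvecs[:, 1] > 0).astype(int)
```

In the semi-supervised setting of [2], the known diagnoses additionally pin the values of labelled nodes, and the Laplacian propagates those labels along the graph.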
[1] Rajpurkar, Pranav, et al. "CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning." arXiv preprint arXiv:1711.05225 (2017).
[2] Zhu, Xiaojin, Zoubin Ghahramani, and John Lafferty. "Semi-supervised learning using Gaussian fields and harmonic functions." ICML (2003).
Contact: Matthew Thorpe, matthew.thorpe-2@manchester.ac.uk
05/04/2020: Lung and Ground Glass Opacity (GGO) Segmentation
To build on the lung detection discussed in the 01/04/2020 update below, we also segment both the lung parenchyma and GGO regions in the CT scans to focus the CNN on these regions. Using the model of [1] we can automatically segment the lung parenchyma from CT scans, the model being particularly useful for lungs with diseased tissue. The GGO segmentation model is currently under development, but preliminary results are shown below.
[1] https://github.com/JoHof/lungmask
Contact: Johannes Hofmanninger, johannes.hofmanninger@meduniwien.ac.at
Figure legend: blue = GGO, yellow = consolidation.
01/04/2020: Lung Detection
We first detect the lung regions of the CT scans. This is principally motivated by three problems:
(1) As CT scans are 3D images, they are relatively large, commonly with 50M+ voxels and taking 200MB+ storage. A large number of these voxels are occupied by air outside the patient, the table the patient lies on or anatomy which is outside of the lung. It is advantageous to isolate only the lung voxels of a CT scan to reduce the storage requirement for a large dataset of thousands of CT scans.
(2) Convolutional neural networks (CNNs) consist of millions or billions of parameters and are liable to overfit to a training dataset, learning to connect irrelevant image features to an outcome. In our case, for COVID-19 detection and prognostication, these could be features outside of the body, or specific to a certain scanner. Therefore, to remove biases which can be introduced by inclusion of irrelevant voxels, we will detect the lungs and train the CNN only on the portion of the CT volume which contains the lungs.
(3) GPU computing with 3D CT volumes is still intractable at full resolution for a thoracic CT scan. Therefore, we typically need to split the volume into large cubes/patches and train on these, and it is advantageous to train the algorithm only on the voxels which are relevant to the outcome of interest.
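The steps above (crop to the lung bounding box to discard irrelevant voxels, then tile into fixed-size cubes for the GPU) can be sketched as follows, on a toy volume and mask:

```python
import numpy as np

# Toy CT volume and lung mask; in practice the mask would come from a lung
# detection/segmentation step as described above.
rng = np.random.default_rng(6)
ct = rng.normal(size=(64, 96, 96)).astype(np.float32)
lung = np.zeros(ct.shape, dtype=bool)
lung[10:50, 20:80, 15:85] = True

# (1)/(2): crop to the lung bounding box, discarding air, table and anatomy
# outside the lungs.
zs, ys, xs = np.nonzero(lung)
crop = ct[zs.min():zs.max() + 1, ys.min():ys.max() + 1, xs.min():xs.max() + 1]

# (3): split the cropped volume into fixed-size cubes for GPU training.
def patches(vol, size=32):
    """Zero-pad to a multiple of `size`, then tile into size^3 cubes."""
    pz, py, px = (int(np.ceil(s / size)) for s in vol.shape)
    padded = np.zeros((pz * size, py * size, px * size), dtype=vol.dtype)
    padded[:vol.shape[0], :vol.shape[1], :vol.shape[2]] = vol
    tiled = padded.reshape(pz, size, py, size, px, size)
    return tiled.transpose(0, 2, 4, 1, 3, 5).reshape(-1, size, size, size)

cubes = patches(crop)
print(crop.shape, cubes.shape)
```

In a real pipeline one would also zero out non-lung voxels inside the bounding box and skip cubes containing no lung tissue, so the network only ever sees relevant voxels.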
Contact: Michael Roberts, mr808@cam.ac.uk