How good are skin screening apps at detecting skin cancer?

More than 200 dermatology smartphone apps are out there. They can be used for self-surveillance, disease management, UV recommendations and teledermatology. Among the most popular apps in dermatology are the ones that help patients identify malignant skin lesions or moles and refer them to a specialist. A specialist would inspect suspicious skin lesions, excise them and study them under a microscope to look for and confirm malignancy.

Skin cancer is the most common cancer worldwide, with more than 3 million cases. But not all types of skin cancer are dangerous and life-threatening. Out of 100 people with skin cancer, 80 would have basal cell carcinoma, 16 cutaneous squamous cell carcinoma, and 4 melanoma. Basal cell carcinoma (BCC) and cutaneous squamous cell carcinoma (SCC) have the highest prevalence among all skin cancer types and a very low risk of metastasizing, i.e. building secondary tumors or malignancies in neighboring regions. They are easy to diagnose and have a good outcome if treated early. Melanomas, on the other hand, are dangerous skin tumors that form metastases in neighboring skin regions and lymph nodes. When advanced, melanoma can metastasize to the lungs, liver, and brain. Around 75% of deaths due to skin cancer result from melanomas. Melanomas are not that easy to diagnose because they are quite variable. They can have characteristic pigmentation and structural changes that help identify them, but sometimes they are not pigmented, and further analysis using dermoscopy or a biopsy is often needed to rule out malignancy. The role of skin screening apps should be to help patients and doctors accurately identify skin tumors and reduce delays in diagnosis and the start of therapy.

But skin screening apps are not as good as you might think, even those with a medical product certification (CE). A systematic review from 2020 examined the diagnostic accuracy of AI-based skin cancer apps and reached some very surprising results.

Skin cancer apps either forward images from smartphones to specialists for review, which is essentially teledermatology, or directly identify malignant lesions using built-in software. With the rising hype around machine learning and digital medicine apps, companies started to develop built-in AI-powered algorithms that directly classify images as high, moderate or low risk for skin cancer. When distributed to the public, such apps need to be regulated and certified as medical products.

To get regulatory approval, such apps undergo evaluation studies to prove that they are effective at fulfilling their intended purpose. You may remember such evaluation studies from the COVID-19 vaccinations. Normally, good evaluation studies:

Include thousands of participants
Include a placebo group (a group without the studied therapy or intervention)
Are double blinded (neither patients nor investigators know who is receiving placebo and who is receiving the medication)
Are randomized (investigators don’t choose who gets placebo and who gets the medication)
Are peer reviewed (other unbiased and uninvolved fellow scientists review such studies to give feedback about bias, study design and evaluation errors, and to lend more credibility to the study).

When studying a new medication, vaccine or interventional approach in medicine, such studies are a standard and indispensable part of any approval process. They are essential to prove the safety and efficacy of any drug or medical intervention. But in digital medicine the bar isn’t as high. Studies don’t normally include thousands of participants, aren’t double blinded and often aren’t peer reviewed. I believe the reason is that digital medicine apps aren’t as life-threatening as drugs. They don’t normally have side effects, they don’t chemically influence the physiology of our body and in most cases they are not treated as therapeutics. They are more likely to inform, motivate and monitor, and to supplement doctors’ visits.

So what were the results of the systematic review of skin screening apps?

In the 2020 systematic review, only 9 studies qualified for further evaluation. In those 9 studies, the number of participants ranged from 31 to 256 patients. The number of studied skin lesions ranged from 15 to 199. Some studies even reported making 5 to 10 attempts to obtain an adequate image of each lesion. Participants were patients who attended a national skin day held by university medical centres, or patients who attended follow-up screenings with their dermatologists. Other evaluation studies recruited only patients who were undergoing excision of suspicious skin lesions.

When apps failed to return a risk assessment of a skin lesion, the images were excluded either from the study or from the skin analysis. Benign skin lesions with clinical features similar to melanoma, which the apps could falsely identify as skin cancer, were also excluded from evaluation studies. Furthermore, the studies showed that skin screening apps are not applicable to amelanotic melanomas (melanomas that don’t produce pigment), nor are they able to identify other, more common types of skin cancer such as basal cell carcinoma (BCC) (80% of all skin cancer cases) or squamous cell carcinoma (SCC) (16% of all skin cancer cases).

When applied to pigmented and non-pigmented lesions, one app had a sensitivity of 71%, i.e. out of 100 malignant lesions the app was able to identify 71 as malignant or premalignant. However, the specificity was only 56%, i.e. out of 100 benign skin lesions only 56 were identified as low risk or harmless, and the remaining 44 were identified as malignant or high-risk skin cancer lesions. That is only 6 percentage points better than asking a chimpanzee for reassurance about my skin lesion.
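
For readers who want to check the arithmetic, here is a minimal Python sketch of how sensitivity and specificity are calculated. The counts are simply the per-100 illustration from the paragraph above (71 of 100 malignant lesions flagged, 56 of 100 benign lesions cleared), not the raw data of any study, and the function and variable names are my own.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of malignant lesions that the app correctly flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of benign lesions that the app correctly clears."""
    return true_negatives / (true_negatives + false_positives)


# Illustrative counts only: 100 malignant and 100 benign lesions.
tp, fn = 71, 29  # malignant lesions: flagged vs. missed
tn, fp = 56, 44  # benign lesions: cleared vs. false alarms

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

print(f"Sensitivity: {sens:.0%}")                            # Sensitivity: 71%
print(f"Specificity: {spec:.0%}")                            # Specificity: 56%
print(f"Specificity above a coin flip: {spec - 0.5:.0%}")    # 6%
```

The last line is where the "6 points better than chance" comparison comes from: a coin flip would clear about 50 of 100 benign lesions.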

But the numbers don’t tell the whole story. Most of the images included in these studies weren’t obtained by the patients themselves! In most cases, the participants’ skin lesions were photographed by researchers under optimal conditions (including optimal light and high image quality). Other studies used dermatology databases to acquire images of excised lesions. Assisted photography, enhanced lighting and the selective use of optimal-quality images lead to an overestimate of diagnostic accuracy. Since skin screening apps are typically targeted at the general population, they are meant to be used by people themselves on their own smartphones. Evaluation studies ought to replicate these conditions and include more representative samples to simulate the intended use of the apps. They shouldn’t create optimal conditions that in the majority of cases don’t exist in real life.

Digital medicine applications will definitely be part of the future of healthcare. They add value to the patient’s journey. They also have the potential to restructure our current healthcare practice. Large investments and the high pace of development in digital medicine are driving this field to become one of the essential pillars of healthcare. Digital health providers should apply widely recognised and accepted scientific and medical standards when evaluating their products, to increase their acceptance not only among patients but in the scientific community as well.