A new study has found that a direct-to-consumer machine learning model for detecting skin cancers incorrectly classified rare and aggressive cancers as low risk.
Findings presented today at the 30th EADV Congress suggest that making applications based on such models available directly to the public, without transparent performance metrics for rare but life-threatening skin cancers, is ethically questionable.
Direct-to-consumer skin cancer screening apps fail to detect life-threatening cancers, new study finds
Researchers in London focused on two types of skin cancer, Merkel cell carcinoma (MCC) and amelanotic melanoma, both rare but particularly aggressive cancers that tend to grow quickly and require early treatment. They created a dataset of 116 images of these rare cancers and of two benign lesions, seborrheic keratosis and hemangioma, and evaluated the images with two machine learning models.
About Merkel cell carcinoma
Merkel cell carcinoma (MCC) is a rare non-melanoma cancer that is very aggressive and fast growing. It starts in the hormone-producing Merkel cells, which are usually found in the top layer of the skin and in the hair follicles. MCC presents as bluish-red lumps on the skin, which are often found on the head, neck, arms and legs, but can spread to other parts of the body. It is primarily associated with ultraviolet light, from prolonged sun exposure and tanning beds, as well as with conditions or treatments that weaken the immune system and with polyomavirus infections. The prevalence of MCC is 0.2 to 0.45 cases per 100,000 inhabitants.
About amelanotic melanoma
Melanoma is a type of skin cancer that develops in cells called melanocytes. It affects older people more frequently. Amelanotic melanoma is rare, accounting for about 8% of all melanomas. Amelanotic means without melanin, a dark-colored skin pigment. Unlike other melanomas, amelanotic melanomas are usually red or skin-colored rather than dark. They are often difficult to diagnose because of their lack of color and can be confused with other skin conditions. Because their diagnosis is usually delayed, they are associated with a poor prognosis.
The first model studied was a certified medical device, sold directly to the public through the app store and advertised as capable of diagnosing 95% of skin cancers (Model 1). The second model was available for research purposes only and was used as a reference (Model 2).
The results showed that Model 1 incorrectly classified 17.9% of MCCs and 22.9% of amelanotic melanomas as low risk. Conversely, it classified 62.2% of benign lesions as high risk.
For the detection of malignancy, the sensitivity of Model 1 was 79.4% [95% confidence interval (CI): 69.3-89.4%] and the specificity was 37.7% [95% CI: 24.7-50.8%].
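Sensitivity and specificity are simple proportions: the share of malignant lesions flagged as high risk, and the share of benign lesions flagged as low risk. The Python sketch below shows how such figures and their 95% confidence intervals can be computed. It rests on two assumptions that are not stated in the article: the per-class counts are hypothetical values chosen only because they are consistent with the 116-image total and the reported percentages, and the interval uses the normal-approximation (Wald) formula, which may not be the method the researchers used.

```python
import math


def proportion_with_wald_ci(hits, total, z=1.96):
    """Return (proportion, lower, upper) using the Wald normal approximation."""
    p = hits / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot near the extremes.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# ASSUMED counts, not reported in the article: picked to match the
# 116-image total and the published percentages.
MALIGNANT_TOTAL = 63     # hypothetical number of MCC + amelanotic melanoma images
MALIGNANT_FLAGGED = 50   # hypothetical number of those rated high risk
BENIGN_TOTAL = 53        # hypothetical number of benign lesion images
BENIGN_CLEARED = 20      # hypothetical number of those rated low risk

sensitivity = proportion_with_wald_ci(MALIGNANT_FLAGGED, MALIGNANT_TOTAL)
specificity = proportion_with_wald_ci(BENIGN_CLEARED, BENIGN_TOTAL)

for name, (p, low, high) in (("Sensitivity", sensitivity),
                             ("Specificity", specificity)):
    print(f"{name}: {p:.1%} (95% CI: {low:.1%}-{high:.1%})")
```

With these assumed counts the output agrees with the reported figures to within about a tenth of a percentage point. The wide intervals reflect how few images of rare cancers were available, which is the data gap the researchers highlight.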
For Model 2, MCC was not included in the top five diagnoses for any of the 28 MCC images analyzed, raising the possibility that the model had not been trained on this disease class.
The high rate of false positives in Model 1 has potentially negative consequences on a personal and societal level.
The results raise a larger question about the safety of other artificial intelligence (AI) skin cancer detection models available on the market.
Lloyd Steele, lead author of the study at the Blizard Institute at Queen Mary University of London, UK, explains: “To improve, machine learning model evaluations need to consider the spectrum of diseases that will be seen in practice. At the moment, most of the performance of those models is based on available imaging data, which is particularly sparse when it comes to rare skin cancers.”
A global collaboration between research groups and hospitals may be a step toward closing the skin cancer imaging data gap, since such data are crucial for high-performing machine learning models.
Marie-Aleth Richard, EADV board member and professor at La Timone University Hospital, Marseille, said: “The number of skin cancer screening apps available for consumer use is growing, but as demonstrated in this research, there must be more transparency about the safety and effectiveness of these applications. Additionally, these devices detect only what they are shown to analyze and do not perform a systematic analysis of the entire skin surface. Not being transparent could put lives at risk.”