Global Technical Excellence Manager, Colorcon, Inc., Harleysville, Pennsylvania, United States
Purpose: Color is a critical quality attribute of solid oral dosage forms. It supports brand identity, helps reduce medication errors, and influences patient acceptability. Traditional color selection is iterative and time-consuming, and it depends on the film coating formulation, the target markets, and the therapeutic indication. To streamline this complex process, this study aimed to develop an artificial intelligence (AI) model that predicts CIELAB color values for film coating systems from raw material inputs while accounting for regulatory compliance, with a potential correlation to therapeutic areas.
Methods: The AI model was developed using a comprehensive, proprietary databank of more than 50,000 unique film coating formulations (Colorcon, Inc.). This dataset included more than 300 distinct raw materials, pigments, and opacifiers, and it provided broad coverage of the CIELAB color space, as shown in Figure 1. The model was trained to predict L*, a*, and b* values by minimizing the mean squared error (MSE), and the color difference (ΔE) was used to validate its predictions. Model performance was validated twice: first on a set of formulations held out at the time of model development (test-set), and then in a final assessment of over 3000 new formulations developed prior to model development (assessment-set). The optimized model was then used to generate over 30 million distinct colors to explore expanding the color space available for pharmaceutical and nutraceutical film coating applications.
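The training objective (MSE) and the validation metric (ΔE) described above can be contrasted in a short sketch. It assumes ΔE is the CIE76 definition (Euclidean distance in L*a*b* space), because the abstract does not state which ΔE formula was used. The function names are hypothetical and the numbers are illustrative, not data from the study.

```python
import math
from typing import Sequence, Tuple

# A color as (L*, a*, b*) coordinates in CIELAB space.
Lab = Tuple[float, float, float]


def _check(pred: Sequence[Lab], actual: Sequence[Lab]) -> None:
    if not pred or len(pred) != len(actual):
        raise ValueError("pred and actual must be non-empty and the same length")


def mse(pred: Sequence[Lab], actual: Sequence[Lab]) -> float:
    """Per-component mean squared error over L*, a*, b* (a typical regression loss)."""
    _check(pred, actual)
    sq_err = sum((p - a) ** 2 for pl, al in zip(pred, actual) for p, a in zip(pl, al))
    return sq_err / (3 * len(pred))


def delta_e_76(pred: Lab, actual: Lab) -> float:
    """CIE76 color difference (assumed): Euclidean distance between two L*a*b* points."""
    return math.dist(pred, actual)


def mean_delta_e(pred: Sequence[Lab], actual: Sequence[Lab]) -> float:
    """Average ΔE over a validation set, e.g. a held-out test-set."""
    _check(pred, actual)
    return sum(delta_e_76(p, a) for p, a in zip(pred, actual)) / len(pred)


def fraction_below(pred: Sequence[Lab], actual: Sequence[Lab], threshold: float = 4.0) -> float:
    """Share of predictions whose ΔE falls below a visual-tolerance threshold."""
    _check(pred, actual)
    hits = sum(delta_e_76(p, a) < threshold for p, a in zip(pred, actual))
    return hits / len(pred)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    predicted = [(70.0, 5.0, 40.0), (50.0, 0.0, 0.0)]
    measured = [(72.0, 4.0, 38.0), (50.0, 0.0, 3.0)]
    print(mse(predicted, measured))             # 3.0
    print(mean_delta_e(predicted, measured))    # 3.0
    print(fraction_below(predicted, measured))  # 1.0
```

Under the CIE76 assumption, the mean of the squared ΔE values equals 3 × the per-component MSE, so minimizing MSE directly minimizes the mean squared ΔE. The reported average ΔE is a mean of unsquared distances, so the two quantities are related but not interchangeable, and formulas such as CIEDE2000 weight lightness, chroma, and hue differently, which breaks this direct link.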
Results: The ΔE value, derived from the approximately perceptually uniform CIELAB color space, quantifies the total color difference between two samples. Typically, a ΔE value below 4 is imperceptible or difficult for the human eye to detect, depending on the specific color and shade. The AI model showed high predictive accuracy, with an average ΔE of 2.93 on the test-set. This indicates that the model reliably predicts colors within a perceptually acceptable range across a wide variety of film coating systems and acceptable raw materials. Figure 2 compares predicted and actual CIELAB values to show how the model captures the influence of pigments on final color; as a use case, it illustrates the effect of yellow iron oxide in a polyvinyl alcohol (PVA)-based film coating system. These results confirm that the model captures the quantitative contributions of individual ingredients to the color of film coating systems. In the final validation on over 3000 newly developed formulations, the model maintained strong performance with an average ΔE of 2.99, underscoring its robustness and consistency. Figure 3 further demonstrates this generalizability across a broad range of film coating formulations, including substantial variations in base systems, opacifiers (TiO₂, CaCO₃, and others), and pigments. With confidence in the model's performance, we explored an expanded color space using the more than 30 million formulations in the AI color library. As shown in Figure 1, this significantly expanded the known design space compared with the historical formulation library.
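For reference, the color difference and the averaged metric reported above can be written out explicitly. The abstract does not specify which ΔE formula was used; the equations below assume the CIE76 definition, and more recent formulas such as CIE94 or CIEDE2000 would give different numeric values for the same pair of colors.

```latex
\Delta E^{*}_{ab} = \sqrt{\left(L^{*}_{\mathrm{pred}} - L^{*}_{\mathrm{act}}\right)^{2}
  + \left(a^{*}_{\mathrm{pred}} - a^{*}_{\mathrm{act}}\right)^{2}
  + \left(b^{*}_{\mathrm{pred}} - b^{*}_{\mathrm{act}}\right)^{2}}

\overline{\Delta E} = \frac{1}{N} \sum_{i=1}^{N} \Delta E^{*}_{ab,i}
```

Here N is the number of formulations in a validation set, so the reported values of 2.93 and 2.99 correspond to \overline{\Delta E} over the test-set and the assessment-set, respectively.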
Conclusion: In this study, an AI-driven model was developed and evaluated to predict color from diverse film coating formulation compositions, achieving low ΔE values. The model also enabled expansion of the color space available for regulated markets. This model can accelerate product development and minimize material waste from repeated testing, and it supports proactive adherence to changing regulatory landscapes.
Figure 1. Comprehensive color space covered by existing databank.
Figure 2. Predicted vs. Actual CIELAB values for yellow iron oxide film coating (test-set).
Figure 3. Predicted vs. Actual CIELAB values for assessment set with 3000 formulations, showing model generalizability.