AI in Compliance Services: Opportunities, Challenges and Risks
The growing digitalization across industries has raised expectations that AI can simplify many compliance tasks in the pharmaceutical and medical device sectors. Solution providers often claim that AI will revolutionize development methods, accelerate the approval of drugs and medical devices, and significantly increase the chances of passing audits. While self-learning systems for image analysis in diagnostics and production processes are already close to becoming standard, their use in compliance must be assessed with caution. It is essential to critically evaluate whether the promised efficiency gains are truly realistic and reliable. This article explores potential applications of AI in compliance, highlighting opportunities as well as challenges and risks.
Keywords / Focus Topics:
Cybersecurity, Medical Devices, EU MDR (2017/745), EU IVDR (2017/746), EU GDPR (2016/679), German Medical Device Regulations (MPG, MPSV, MPDG, MPAMIV), MDS2, Privacy by Design, MDCG 2019-16, BSI CS-E 132, IEC 81001-5-1:2021, ISO 13485, IEC TR 60601-4-5:2021, Threat Modeling, CAPEC, OWASP, Common Vulnerability Scoring System (CVSS).
Current technological capabilities do not yet allow self-learning systems and artificial intelligence (AI) to be applied efficiently to compliance-related tasks. Globally deployed AI applications tend to deliver generalized results. While these may be useful for introductory purposes, they are not capable of addressing specific compliance questions with the required level of quality. So far, the anticipated potential and benefits of AI in this field have not been demonstrated.
More promising at this stage are specialized AI applications designed for clearly defined tasks. However, before developing such solutions, fundamental expectations and requirements should be documented in a detailed specification and illustrated with representative examples. Any specific AI application must undergo validation before it is put into use, with clear evidence that all requirements are met in a safe and traceable manner.
Self-learning systems represent an intermediate step toward general artificial intelligence (AI). At its core, AI simulates human behavior through electronic data processing. It is often defined as the ability of a machine to imitate human capabilities such as logical reasoning, learning, planning, and creativity. AI enables technical systems to perceive their environment, process what they perceive, and solve problems in order to achieve a specific goal. AI systems can adjust their actions by analyzing the outcomes of previous actions and operate autonomously [1].
In past decades, software solutions that implemented predefined rule structures and decision trees in their algorithms were often described as AI. A simple example is a decision tree that applies the classification rules of Annex VIII of the EU Medical Device Regulation (MDR).
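To make this concrete, the sketch below shows how such a decision tree might be encoded. It is illustrative only: the rules and thresholds are drastically simplified, and MDR Annex VIII actually comprises 22 rules with numerous qualifiers and exceptions, so real classification must be performed by qualified regulatory personnel.

```python
# Illustrative sketch of a rule-based classifier in the spirit of the
# example above. The rules are drastically simplified and NOT a faithful
# implementation of MDR Annex VIII.

def classify_device(invasive: bool, duration: str) -> str:
    """duration: 'transient' (<60 min), 'short_term' (60 min to 30 days),
    or 'long_term' (>30 days), per the MDR duration definitions."""
    if not invasive:
        # Rule 1 (simplified): non-invasive devices default to class I
        # unless a more specific rule applies.
        return "Class I"
    # Rule 5 (simplified): devices invasive with respect to body orifices,
    # graded by duration of use.
    return {
        "transient": "Class I",
        "short_term": "Class IIa",
        "long_term": "Class IIb",
    }[duration]

print(classify_device(invasive=True, duration="short_term"))  # Class IIa
```

Because every branch is explicit, the logic is fully traceable, which is precisely the property that makes such rule-based solutions attractive for compliance tasks.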
Today, however, such rule-based solutions are no longer classified as artificial intelligence, since they lack the ability to self-train and optimize independently. Under certain circumstances, they may still offer advantages over self-learning systems, particularly when applied to compliance tasks.
Since the introduction of ChatGPT by OpenAI, self-learning systems have gained widespread attention. These systems learn patterns from existing datasets. They make decisions either by evaluating a situation based on available data (e.g., object recognition in medical imaging) or by deriving recommended actions from the data (e.g., therapy suggestions).
The terms AI and self-learning systems are currently used almost indiscriminately. In any serious collaboration, it is advisable to clarify at the outset what the parties involved actually mean by these terms and, where no shared understanding exists, to establish one.
ChatGPT is a so-called Large Language Model (LLM) trained on vast amounts of text. Because of this training, it can produce answers to a wide range of questions in remarkably high linguistic quality. This alone often leads users to perceive its responses as plausible. When asked about potential applications of self-learning systems in compliance services for medical devices [2], ChatGPT suggests areas such as:
Risk assessment and management: Self-learning algorithms could be used to analyze and evaluate risks, ensuring that medical devices meet mandatory safety standards.
Quality control in manufacturing: Automated monitoring systems with self-learning algorithms could help identify quality issues during device production.
Document management and review: AI can be applied to check large volumes of documents for compliance and accuracy, improving the efficiency of compliance reviews.
Supplier management: Algorithms may support the effective monitoring of suppliers and their adherence to regulatory standards.
Monitoring the regulatory landscape: Self-learning systems could track evolving regulatory requirements and notify companies when compliance measures need adjustment.
Patient safety and incident reports: AI may analyze adverse events and patient feedback to verify compliance with safety standards.
Validation of medical device software: Self-learning algorithms could assist in validating software to ensure it meets regulatory requirements.
Post-market surveillance: AI can process large datasets from post-marketing activities to detect potential safety concerns at an early stage.
Internal training and compliance education: Self-learning algorithms could provide personalized training and monitor employee progress in following compliance guidelines.
Clinical trial data analysis: AI could support the evaluation of clinical trial data to ensure adherence to protocols and regulatory requirements.
Audit-trail analysis: Automated audit-trail monitoring using AI may help detect irregularities or security breaches in relation to medical devices.
Automated compliance reporting: Self-learning algorithms could streamline compliance reporting by reducing reliance on manual processes.
For pharmaceuticals, additional potential applications are listed [3]:
Patient safety and pharmacovigilance: AI could support the analysis of adverse events and side effects to ensure drug safety standards are met.
Validation of manufacturing processes: Self-learning algorithms may help validate pharmaceutical production processes to ensure regulatory compliance.
Clinical trial oversight: AI could assist in monitoring clinical trial data, ensuring that conduct and reporting follow regulatory requirements.
Product traceability: Algorithms can support end-to-end traceability of pharmaceutical products across the supply chain to ensure compliance with required standards.
Post-market drug safety monitoring: Self-learning systems could analyze real-time safety data to identify potential risks after market launch.
The following section discusses the opportunities, challenges, and risks of using self-learning systems in compliance services, illustrated through the example of risk management.
Risk management for pharmaceuticals and medical devices is essential to ensure both the safety and effectiveness of these products. The goal is to identify and assess key risks throughout the product lifecycle at an early development stage, and, where necessary, control them through appropriate risk-mitigation measures.
Opportunities for applying AI in risk management are primarily seen in the automated analysis of large datasets, for example, historical process data, related error information, or data collected across different projects. Expectations include that such datasets could help identify previously unknown risks, detect root causes and effects of failures more quickly and precisely, autonomously assess and classify risks, and significantly simplify risk management documentation through AI-driven text generation [4].
All parties involved in the lifecycle of a pharmaceutical or medical device—manufacturers, authorities, and users alike—expect risk management to reliably identify critical product risks and initiate measures to control them. This ensures the product’s safety, functionality, effectiveness, and reliability. To date, this requires that the individuals involved in risk management bring in-depth expertise and continuously refine their knowledge of products and processes. Risk management therefore becomes a backbone for documenting and transferring manufacturer-specific design and process know-how. It safeguards competitiveness and provides essential input for the development of new products.
If the opportunities outlined earlier are to be realized, the following challenges must be addressed (see Figure 1):
How can the training of AI models ensure that AI is able to efficiently support risk management and provide real added value for product and process design?
How can it be guaranteed that manufacturer-specific know-how remains confidential?
How can AI meet the expectation of reliably identifying previously unknown risks from large datasets?
How can misinformation, poor data quality, and inaccurate assessments be reliably identified when applying AI in risk management?
Figure 1: Challenges in applying self-learning systems to risk management (numbers correspond to the questions above)
On Point 1
One of the main challenges in training self-learning systems is the composition of the training data. This concerns both the data volume required to capture risk scenarios comprehensively and the representativeness and quality of the information describing those scenarios. Risk assessments, for example, require a representative classification of data. This is only possible if the dataset includes representatives of all classes, reflecting their real frequency distribution, and if the data within each class are representative of the intended application.
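As a minimal sketch of what such a representativeness requirement could look like in practice (the class names, target shares, and tolerance below are hypothetical), the following snippet compares the class frequencies of a risk-scenario training set against the distribution expected in real operation:

```python
# Sketch of one representativeness check: flag classes whose share in the
# training data deviates from the expected real-world distribution.
from collections import Counter

EXPECTED_SHARE = {"mechanical": 0.50, "electrical": 0.30, "software": 0.20}

def distribution_gaps(labels, expected, tolerance=0.05):
    counts = Counter(labels)
    total = len(labels)  # assumed non-empty
    gaps = {}
    for cls, share in expected.items():
        observed = counts.get(cls, 0) / total
        if abs(observed - share) > tolerance:
            gaps[cls] = {"observed": observed, "expected": share}
    return gaps  # classes deviating beyond the tolerance

training_labels = ["mechanical"] * 70 + ["electrical"] * 20 + ["software"] * 10
print(distribution_gaps(training_labels, EXPECTED_SHARE))
```

Any class flagged by such a check would require additional data collection before the dataset could be considered representative.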
Table 1 illustrates the requirements for data quality. The example represents the variation commonly encountered in practice when describing the same issue. The more general, vague, or ambiguous the information, the lower its specific informational value. The lower the informational value, the more likely AI-supported queries will generate generalized outputs of questionable usefulness for product-specific risk management. For this reason, before developing an AI application for risk management, requirements should be defined in a specification document and illustrated with representative examples. Such examples help derive data quality requirements and establish effective scenarios for validating whether AI is suitable for the intended purpose. A side effect of this process is that an early cost–benefit analysis of the AI application becomes possible.
Table 1: Example of information quality in failure chains (root cause → failure → consequence). The same issue is described with varying specificity.
On Point 2
Once expectations for the AI application are defined (see Point 1), handling sensitive and confidential company know-how becomes another major challenge. There is usually a strong interest in controlling how this knowledge is used and preventing its uncontrolled dissemination. The AI configuration employed (see also Figure 1) has a decisive influence on this. The more extensive the access required to global data pools, the higher the risk of unintended leakage of proprietary knowledge (see Table 2).
Table 2: Impact of AI configuration on the protection of company know-how.
On Point 3
The expectation that AI will identify previously unknown product or process risks is based on the assumption that self-learning systems can detect hidden patterns in datasets and place information into logical, previously unseen relationships. To our knowledge at Grünewald GmbH, however, there is as yet no reliable evidence that AI identifies new risks more efficiently than the structured approach of expert groups. Research by Frederic Vester in the 1990s on complex systems [5] suggests that expert-driven methods (such as FMEA) may still be considerably more effective than AI-based analysis of large datasets.
On Point 4
At present, reliably identifying misinformation, poor data quality, and faulty assessments in risk management is not technically feasible. Two main reasons are:
AI applications link information based on the frequency with which similar connections and patterns appear in the dataset. However, it is unclear at what frequency a result can be considered “true” or “validated” (a toy illustration follows after this list).
The outcome of an AI query depends heavily on the user’s expertise in formulating good prompts, as well as on the conversation history. Reference [6] even describes a case where ChatGPT’s answer length varied depending on the promised financial reward—hardly acceptable when reliability is required in compliance-related contexts.
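The first point can be made tangible with a toy calculation (all data and the threshold are invented): deriving the “confidence” of a cause-and-effect link purely from co-occurrence frequency forces an arbitrary cut-off.

```python
# Toy illustration: accept a cause -> effect link once its relative
# frequency exceeds a threshold. The threshold of 0.6 is arbitrary;
# nothing justifies it over 0.5 or 0.9, which is exactly the problem.
from collections import Counter

observations = [
    ("seal worn", "leak"), ("seal worn", "leak"),
    ("seal worn", "noise"), ("motor fault", "leak"),
]

pair_counts = Counter(observations)
cause_counts = Counter(cause for cause, _ in observations)

THRESHOLD = 0.6
for (cause, effect), n in pair_counts.items():
    confidence = n / cause_counts[cause]
    status = "accepted" if confidence >= THRESHOLD else "rejected"
    print(f"{cause} -> {effect}: confidence {confidence:.2f} ({status})")
```

Note that "motor fault -> leak" is accepted with full confidence on the basis of a single observation, which shows how poorly frequency alone substitutes for validation.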
Example:
When prompted with “Perform a risk analysis for 3D printing a personalized ceramic knee implant for a 70-year-old patient. List the three main risks and corresponding mitigation measures,” ChatGPT produced the following answer [7]:
Risk Analysis for 3D Printing a Personalized Ceramic Knee Implant:
Material quality and integrity:
Risk: Inadequate ceramic material quality could lead to fractures or structural problems.
Measures: Use high-quality, certified ceramics; monitor and test quality throughout the 3D-printing process.
Durability and long-term performance:
Risk: Wear, fatigue, or material degradation could reduce implant functionality over time.
Measures: Conduct stress testing and simulations; ensure long-term follow-up and monitoring.
Biocompatibility:
Risk: Immune reactions to the implant may cause inflammation or rejection.
Measures: Select biocompatible materials; perform preclinical tests to confirm compatibility.
While the answer may appear useful at first glance, it is clearly labeled a simplified representation. Moreover, the identified risks relate more to implant use than to the 3D-printing process itself. Only after refining the question to “Perform a risk analysis of the production process for 3D printing…” did ChatGPT list issues such as poor print quality, data loss in personalization, and inadequate cleaning and sterilization. Even then, the answers remained generic, and some risks (e.g., sterilization) are not directly part of the printing process but rather downstream steps. Criteria for selecting the “three main risks” remained unclear, even when asked explicitly.
Conclusion
As the discussion above shows, the quality of the AI configuration and the dataset is crucial for successful application in risk management. At present, proprietary AI configurations tailored to company-specific know-how appear to be the most promising approach (see Table 3). However, the effort required for implementation is considerable, and for small and medium-sized enterprises, the cost–benefit ratio is likely still unfavorable.
Table 3: Comparison of critical quality features across different AI configurations.
The development of AI applications for risk management coincides with a time in which the subjects of analysis are becoming increasingly complex, compliance requirements and documentation burdens continue to rise, and the specialists needed for risk analysis often have less and less time available. The reasons for this are diverse, ranging from outdated corporate processes, staff shortages, poor handling of compliance requirements, and a lack of willingness to assume responsibility, to insufficient methodological expertise, inefficient meeting culture, and excessive internal documentation demands.
Against this backdrop, the temptation is strong to compensate for the shortage of expertise by using AI in risk management. Marketing promises such as “AI-generated failure analysis: simply upload your design or process description and generate your functional safety analysis with our AI-supported process, the most efficient way to perform your FMEA” [8] highlight this trend.
However, considering the challenges described in Section 4.2, the following application risks are evident:
Unchecked compliance documents: AI-generated compliance documents may not be reviewed for accuracy or completeness due to time pressure. Instead, they are simply “approved” so that documentation requirements, such as those defined in ISO 14971 for medical devices, are met quickly and with minimal effort.
Perceived plausibility: Because AI-generated documents often appear plausible, users may see no reason to:
have the content reviewed by the specialists responsible for it, or
question the “decision-making patterns” the AI used to compile the information.
False assumptions in approval processes: Approval authorities may assume that relevant specialists were involved and that the documentation was compiled in a transparent and traceable manner.
Lack of transparency: It is often not evident that the documents were generated by AI, leading to the mistaken belief that subject-matter experts contributed to the content.
All of these risks share a common danger: critical health-related risks may go unidentified and uncontrolled. Errors would then only surface in real-world application, potentially causing significant harm and leading to severe regulatory and economic consequences for the manufacturer.
In April 2023, Grünewald GmbH tested ChatGPT with nine questions from the pharmaceutical and MedTech fields to gain an initial impression of whether the AI could handle compliance-related queries efficiently and professionally. The test was repeated in January 2024.
Table 4: ChatGPT test results on compliance questions [9]
The summarized results, shown in Table 4, reveal the following characteristics:
The information provided is mostly general in nature, with limited relevance to specific cases.
Some information is incorrect and lacks traceability.
Recent developments are not reflected; ChatGPT’s knowledge base lags behind by roughly one year.
Notably, the results from January 2024 did not meet expectations of improvement. On the contrary, output quality appeared to decline.
The conclusions outlined in Section 4 regarding the use of AI in risk management are confirmed by these findings:
Global AI applications may be suitable for basic compliance-related queries to gain an initial overview. They can also serve as a starting point for drafting letters, statements, evaluation reports, or presentations. However, all outputs must be carefully reviewed for accuracy and completeness.
Professional use of AI in compliance requires specialized applications tailored to clearly defined subject areas. High-quality datasets are essential for training such systems. In addition, the AI application must be validated to provide evidence that requirements are met in a safe and traceable manner.
Creating a Requirements Specification for AI Applications
We support you in defining the fundamental expectations and requirements for an AI application in compliance. Together with representative examples, this forms the necessary basis for evaluating both the expected benefits and the cost-effectiveness of the AI solution. Where relevant, we also assess whether a new AI application may compete with existing database structures developed through rule-based compliance software.
Conducting a Gap Analysis of Training Data
The quality of training data directly determines the performance of any AI application. We assist in defining requirements for data quality and carry out a gap analysis. The resulting report outlines the concrete steps needed to achieve the required level of data quality.
Stress Testing AI Applications
Before an AI application can be validated for compliance tasks, it should be subjected to a stress test. We provide support in developing the test plan and executing the tests. The final report evaluates the maturity of the AI in relation to the requirements defined in the specification.
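As a minimal sketch of one building block of such a stress test (the prompts, expected terms, and the `query_ai` adapter are hypothetical), a fixed catalogue of compliance questions can be run against the system under test and each answer checked for required key points:

```python
# Minimal stress-test building block: run a catalogue of compliance prompts
# against the AI application and verify that each answer contains the
# required key points. `query_ai` is a placeholder for the system under test.
from typing import Callable

TEST_CASES = [
    {"prompt": "Which standard governs risk management for medical devices?",
     "must_contain": ["ISO 14971"]},
    {"prompt": "Which EU regulation applies to in-vitro diagnostic devices?",
     "must_contain": ["2017/746"]},
]

def run_stress_test(query_ai: Callable[[str], str]) -> list[tuple[str, bool]]:
    results = []
    for case in TEST_CASES:
        answer = query_ai(case["prompt"])
        passed = all(term in answer for term in case["must_contain"])
        results.append((case["prompt"], passed))
    return results

# Dummy adapter for demonstration; replace with a call to the real system.
print(run_stress_test(lambda prompt: "See ISO 14971 and EU 2017/746."))
```

A real stress test would add negative cases (deliberately misleading prompts), repeated queries to measure answer stability, and human review of flagged outputs.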
Validating AI Applications
An AI application in compliance should only be deployed once its suitability for the intended task has been demonstrated. We take care of the full validation process—creating the validation plan, conducting and evaluating tests, and preparing the validation report. The outcome is an inspection-ready validation documentation package.
References
[1] -
[2] “Name 12 examples of compliance services in the lifecycle of a medical device in which the service is provided or efficiently supported by self-learning algorithms.” https://chat.openai.com/, ChatGPT version 3.5, queried 22 Jan 2024
[3] “Name 12 examples of compliance services in the lifecycle of a pharmaceutical product in which the service is provided or efficiently supported by self-learning algorithms.” https://chat.openai.com/, ChatGPT version 3.5, queried 22 Jan 2024
[4] A selection of sources: https://www.techtag.de/startups/wie-die-industrie-jetzt-millionen-an-qualitaetskosten-spart/ ; https://visuresolutions.com/de/Blog/KI-und-maschinelles-Lernen-f%C3%BCr-das-Risikomanagement/
[5] F. Vester, Die Kunst vernetzt zu denken – Ideen und Werkzeuge für einen neuen Umgang mit Komplexität; Der neue Bericht an den Club of Rome, 1999
[6] -
[7] “Perform a risk analysis for 3D printing a personalized ceramic knee implant for a 70-year-old patient. Name the 3 main risks and the corresponding risk-control measures.” ChatGPT version 3.5, queried 30 Jan 2024
[8] Queried on 30 Jan 2024
[9] Internal results report by Grünewald GmbH on testing ChatGPT on compliance topics
Medical care facilities such as physician practices and hospitals are generally classified as critical infrastructure, which means they must meet particularly high requirements for the resilience of their IT systems.
The standard DIN EN ISO/IEC 27001:2017 defines the requirements for Information Security Management Systems (ISMS) and establishes the recognized state of the art.
In Germany, for example, the requirements of § 8a “IT Security of Critical Infrastructures” of the BSI Act were implemented in the sector-specific security standard (B3S) for hospital care. According to the German Social Code (SGB V), since January 1, 2022, hospitals have been legally required to implement IT security in line with the state of the art (§ 75c SGB V).
Similarly, data protection requirements must be given high priority. These include compliance with the EU General Data Protection Regulation (GDPR, EU 2016/679) and national laws for the protection of electronic patient data within the telematics infrastructure, such as Germany’s Patient Data Protection Act (PDSG) or the U.S. Health Insurance Portability and Accountability Act (HIPAA).
A central aspect of operator responsibility is cybersecurity in the IT integration of medical devices, as described in DIN EN 80001-1:2011 “Application of Risk Management for IT-Networks Incorporating Medical Devices – Part 1: Roles, Responsibilities and Activities” and the draft E DIN EN 80001-1:2018.
A prerequisite for this is that customers and operators receive the necessary cybersecurity information about medical devices. Cybersecurity thus becomes a procurement criterion and a relevant factor for downstream marketing and sales.
The Manufacturer Disclosure Statement for Medical Device Security (MDS2) facilitates the exchange of all (cyber)security-related features and functions of medical devices in a standardized, machine-readable format.
MDS2 is expected to establish itself as a global standard and should therefore already be integrated into R&D, marketing, and sales today. (Source: Expertenkreis CyberMed)
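To illustrate why a machine-readable format matters for procurement, consider the following sketch. The data structure is a hypothetical simplification, not the official MDS2 form: it merely shows how a buyer could automatically filter device profiles against mandatory security capabilities.

```python
# Hypothetical, simplified representation of MDS2-style security information.
# This is NOT the official MDS2 schema, only an illustration of how a
# standardized, machine-readable format eases procurement comparisons.
from dataclasses import dataclass

@dataclass
class SecurityCapability:
    question_id: str   # an MDS2-style question identifier (illustrative)
    description: str
    supported: bool
    note: str = ""

device_profile = [
    SecurityCapability("AUTH-1", "Unique user authentication supported", True),
    SecurityCapability("ENCR-1", "Data-at-rest encryption supported", False,
                       "planned for next release"),
]

# A buyer's script can now check vendors against mandatory capabilities:
MANDATORY = {"AUTH-1"}
meets_policy = all(c.supported for c in device_profile
                   if c.question_id in MANDATORY)
print(meets_policy)  # True
```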
In the first part of our [insights] series Connectivity and Cybersecurity in Medical Devices in the MDR & IVDR Era (Part 1: Stakeholder Requirements), we outlined the key user and business cases for medical device connectivity.
Displaying the availability of a new software version on the GUI of a PEMS and implementing automated software distribution may appear to be obvious requirements in the incremental and agile development of medical devices. From a risk management perspective, downloads can often take place during regular operation. The actual execution of updates, however, is typically tied to a service mode, in which the user confirms that no patient is connected.
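A minimal sketch of this gating logic (all names are hypothetical): downloading a new version is permitted in any mode, while installing it requires service mode plus explicit confirmation that no patient is connected.

```python
# Sketch of update gating for a PEMS: download in regular operation,
# install only in service mode with patient-disconnect confirmation.
from enum import Enum, auto

class Mode(Enum):
    OPERATION = auto()
    SERVICE = auto()

class UpdateManager:
    def __init__(self):
        self.downloaded = False

    def download(self, mode: Mode) -> bool:
        # Downloading runs in the background and is permitted in any mode.
        self.downloaded = True
        return True

    def install(self, mode: Mode, no_patient_confirmed: bool) -> bool:
        if not self.downloaded:
            return False
        if mode is not Mode.SERVICE or not no_patient_confirmed:
            return False  # risk control: never update during patient use
        return True

mgr = UpdateManager()
mgr.download(Mode.OPERATION)
print(mgr.install(Mode.OPERATION, no_patient_confirmed=False))  # False
print(mgr.install(Mode.SERVICE, no_patient_confirmed=True))     # True
```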
The obligation to provide user training for new functions, as outlined for example in Annex 1 of the German Medical Devices Operator Ordinance (MPBetreibV), can be addressed in a straightforward way—for instance, by integrating a video link accessible via QR code.
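For illustration, generating such a QR code takes only a few lines with the third-party `qrcode` package (the URL below is a placeholder):

```python
# Generate a QR code linking to an instruction video for the new function.
# Requires: pip install qrcode[pil]
import qrcode

img = qrcode.make("https://example.com/training/new-function-video")
img.save("training_qr.png")  # embed this image in the GUI or release notes
```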
In the following sections, we will highlight less obvious aspects of PEMS architecture that are particularly relevant for cybersecurity in medical devices.
Authors and Feedback
Dipl.-Ing. Martin Zierau is Quality Management Officer and Senior Compliance Specialist/Consultant at Grünewald GmbH. The content of this blog post was critically reviewed by Georg Bunzel, Head of Software Engineering at ATEM SD GmbH in Remscheid.
We welcome your feedback at martin.zierau@gruenewald-gmbh.de.
GRÜNEWALD Consulting & Services on the Topic
The GRÜNEWALD team offers comprehensive consulting and hands-on services in Lean Innovation & Requirements Engineering as well as in the agile specification, development, and approval of medical devices in line with MDR, IVDR, and FDA requirements.
An overview of our services can be found on the page Support for Development and Approval of Medical Devices & R&D on our website.