7.4 Data Compliance and Privacy Protection Driven by AI
Data Boundaries: Challenges of Data Compliance and Privacy Protection in the Age of AI
Artificial intelligence (AI) operates much like a high-performance precision engine needing a continuous supply of quality fuel; its most crucial “fuel” is data. Whether training complex machine learning models to “learn” subtle patterns and make accurate predictions, or driving various AI applications to provide users with personalized, intelligent, and contextualized services, neither can function without the collection, storage, processing, analysis, and utilization of massive, multi-dimensional data. It can be said that data is the cornerstone and lifeblood of artificial intelligence.
However, when the data processed by AI systems involves various types of information that can, alone or in combination with other information, identify a specific natural person or reflect their activities—i.e., personal information (or personal data)—the application of AI must operate within an increasingly strict, detailed, and legally binding framework. The core of this framework is data compliance and personal information protection.
With the global awakening of privacy awareness and the enactment and enforcement of stringent regional and national data protection laws carrying heavy penalties—represented by the EU’s General Data Protection Regulation (GDPR), the US California Consumer Privacy Act (CCPA) and its amendment (the CPRA), and China’s highly influential Cybersecurity Law, Data Security Law, and Personal Information Protection Law (PIPL)—a core question has taken shape: how can organizations fully leverage the data value inherent in AI technology while ensuring that personal information processing activities always meet fundamental requirements such as lawfulness, fairness, necessity, integrity, transparency, accuracy, and security? This has become the most central and severe legal compliance challenge facing every organization that develops, deploys, operates, or uses AI systems. It especially concerns legal service providers such as law firms and corporate legal departments, which inevitably handle large amounts of personal information relating to clients, employees, and even opposing parties in their daily work.
This section will delve into the main legal requirements, core risk challenges, and key considerations that need close attention in legal practice concerning data compliance and personal information protection in scenarios where AI technology (particularly data-driven machine learning and large language models) is widely applied.
1. Obtaining a Lawful Basis for Processing Personal Information: The “Entry Permit” for AI Applications
Before any AI application begins to collect, process, or utilize personal information in any way, the foremost and most fundamental compliance requirement is to ensure that the processing activity has a clear, valid, and legally defensible lawful basis (or legal basis for processing). Without a lawful basis, all subsequent processing activities, no matter how advanced the technology or “benevolent” the purpose, are non-compliant from the start, akin to driving without a license, carrying immense risk.
- Core Principle: “Informed Consent” - The Most Common but Challenging Basis:
- Legal Status & Core Requirements: The “informed consent” principle is the primary and most emphasized lawful basis for processing personal information established by most modern data protection laws worldwide (including GDPR Article 6(1)(a) and PIPL Article 13(1)(i)). Its core requirement involves two interconnected steps:
- Duty to Inform: Before collecting personal information (or within a reasonable time after processing begins in certain exceptions), the data controller/processor must provide the data subject with all legally required information about the processing activity proactively, in a conspicuous manner, using clear and plain language (considering the subject’s comprehension, avoiding jargon). According to PIPL Article 17 and similar provisions, this typically includes:
- The processor’s name and contact details.
- The purpose(s) of processing (must be specific, explicit, not vague), manner of processing (collection, storage, use, etc.).
- The categories of personal information processed.
- The retention period (minimum necessary for the purpose).
- The data subject’s rights under the law (access, rectification, erasure, withdrawal of consent, portability, etc.) and the methods for exercising them.
- (If applicable) Name/contact of recipients if data is shared, their purpose, manner, and data categories.
- (If applicable) Information regarding cross-border transfers (recipient details, purpose, manner, data types, rights exercise channels, risks, safeguards).
- Potential security risks and main security measures taken.
- Obtaining Valid Consent: After providing sufficient information, the processor must obtain the data subject’s specific, unambiguous indication of agreement, given freely and based on adequate information. Valid consent must meet several conditions:
- Must be an affirmative act: Consent cannot be implied or default. Methods like pre-ticked boxes, or inferring consent from silence, inaction, or continued use of service are generally invalid. Users must actively indicate consent through actions like checking a box, clicking an ‘agree’ button, or other clear affirmative behavior.
- Must be specific and purpose-bound: Consent cannot be “bundled” or “all-or-nothing.” Requiring consent for processing non-essential data or for multiple distinct purposes (e.g., core service delivery + personalized ads + model training) as a condition for accessing the core service is generally prohibited. Separate consent should ideally be sought for different purposes.
- Must be freely withdrawable: Individuals must be provided with easy and accessible ways (no more difficult than giving consent) to withdraw their consent at any time. The processor must cease processing based on that consent immediately upon withdrawal (without affecting prior lawful processing) and delete the data upon request if no other lawful basis exists.
- “Separate Consent” required for specific high-risk processing: For higher-risk activities significantly impacting individual rights, many laws (notably China’s PIPL) mandate “separate consent.” This means for processing sensitive personal information, providing personal information to other processors, making personal information public, using it for fully automated decision-making with significant impact, or transferring it cross-border, processors cannot rely on bundled consent in general privacy policies. They must obtain distinct, explicit consent specifically for each of these processing activities, often through pop-ups, dedicated consent pages, or similar prominent methods.
- Specific Challenges in AI Application Scenarios: Strictly applying the “informed consent” principle to complex AI applications faces numerous challenges:
- Transparency Difficulty in “Sufficient Notice”: The internal workings, data processing logic, and specific factors influencing predictions or decisions of AI systems (especially complex deep learning models like LLMs, recommendation algorithms) are often highly complex and opaque (“black box” problem). This makes it extremely difficult to explain clearly, accurately, and understandably to ordinary users precisely how their personal data will be processed, which specific algorithms and decision processes it will feed into, and what concrete impacts these activities might have on them. Overly technical explanations are incomprehensible; overly simplified ones may not be “sufficient.”
- Potential Conflict between “Purpose Specification” and AI Model Training/Iteration Needs: AI applications (especially general-purpose large models or platform-level applications needing continuous learning and improvement) often have multiple potential, sometimes not fully defined future uses at the time of data collection (e.g., vague claims of collecting data “to improve our models,” “optimize algorithms,” “enhance user experience”). This purpose ambiguity and potential expansion creates inherent, deep tension with the data protection principle requiring processing purposes to be specific, explicit at collection, and subsequent processing not exceeding the initial scope. Ensuring AI data processing (especially using user data for retraining or developing new features) stays within the bounds of original consent, or obtaining valid new consent for expanded uses, is a key ongoing compliance challenge.
- Complexity and Cost of Consent Recording and Management: For AI applications involving long-term, continuous data processing, constant model iteration, and vast user bases, the processor must accurately record when each user consented to which specific purposes and data types, track whether that consent remains valid, respond to withdrawal requests in a timely way, and ensure, both technically and procedurally, that all subsequent processing strictly follows the user’s latest preferences. This can require extremely complex and costly consent management systems and internal processes (a minimal sketch of such a consent record follows this list).
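One way to make the record-keeping burden concrete is a minimal consent-ledger data model. The sketch below is a hypothetical Python illustration (the names `ConsentRecord`, `ConsentLedger`, and `has_valid_consent` are invented for this example, not drawn from any law or product); a production consent-management platform would also need durable storage, audit trails, and versioned privacy-notice references.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class ConsentRecord:
    """Hypothetical record of one user's consent for one specific purpose."""
    user_id: str
    purpose: str                    # e.g. "core_service", "model_training", "personalized_ads"
    data_categories: list[str]      # categories of personal information covered
    granted_at: datetime
    notice_version: str             # which privacy notice the user actually saw
    is_separate_consent: bool       # e.g. sensitive data or cross-border transfer
    withdrawn_at: Optional[datetime] = None

@dataclass
class ConsentLedger:
    """In-memory ledger; a real system would persist and audit these events."""
    records: list[ConsentRecord] = field(default_factory=list)

    def grant(self, record: ConsentRecord) -> None:
        self.records.append(record)

    def withdraw(self, user_id: str, purpose: str, when: datetime) -> None:
        # Withdrawal must be as easy as granting; processing on this basis must then stop.
        for r in self.records:
            if r.user_id == user_id and r.purpose == purpose and r.withdrawn_at is None:
                r.withdrawn_at = when

    def has_valid_consent(self, user_id: str, purpose: str) -> bool:
        # Downstream pipelines (training, analytics, marketing) should gate on this check.
        return any(
            r.user_id == user_id and r.purpose == purpose and r.withdrawn_at is None
            for r in self.records
        )
```

Gating each downstream use (a training run, an analytics job, a marketing push) on a per-purpose check like `has_valid_consent` is what turns the legal requirement of purpose-bound, withdrawable consent into an enforceable engineering control.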
- Other Potential Lawful Bases and Their Limitations in AI Scenarios: Besides “consent,” PIPL Article 13 (and similar provisions like GDPR Article 6) typically lists other grounds for processing personal information without consent under specific conditions. However, their application is usually more restricted and not suitable for all AI scenarios:
- Necessary for Concluding or Performing a Contract to which the Individual is a Party: Requires the processing to be objectively and directly necessary to fulfill the core purpose of that contract. E.g., an e-commerce platform processing shipping address/contact info for delivery. But is using browsing/purchase history to train a personalized recommendation AI also “necessary for performing the e-commerce service contract”? This is often highly debatable, as personalization is usually seen as an ancillary, not core, service. AI applications need to carefully justify the direct necessity of their data processing for the core contract purpose.
- Necessary for Performing Statutory Duties or Obligations: E.g., financial institutions conducting KYC/AML checks as required by law; employers processing employee social security info per labor law. If laws explicitly mandate a specific AI application (e.g., government using AI for required statistical analysis), this basis applies. Cannot be broadly interpreted.
- Necessary for Responding to Public Health Emergencies or Protecting Natural Persons’ Life, Health, and Property Safety in Emergencies: Applies to very specific, urgent situations, e.g., using AI to analyze population movement data during a pandemic for contact tracing (still subject to necessity and security requirements).
- Processing Personal Information within a Reasonable Scope to Carry Out News Reporting, Public Opinion Supervision, etc., for Public Interest: Mainly applies to media organizations, requires processing within a “reasonable scope,” and usually needs balancing public interest against individual rights. AI assisting news writing or public opinion analysis might touch on this, but boundaries must be carefully observed.
- Processing Personal Information Already Disclosed by the Individuals Themselves or Otherwise Lawfully Disclosed, within a Reasonable Scope according to PIPL: Allows processing of lawfully public information, but must not have a major impact on individual rights, and processing should cease if the individual explicitly refuses. Training AI models on public data might partially rely on this, but interpretations of “reasonable scope” and “major impact” remain crucial and need clarification.
- Other Circumstances Stipulated by Laws and Administrative Regulations: A catch-all provision for future legislation.
- (Absence of GDPR’s “Legitimate Interests” Basis in Chinese Law): Notably, GDPR provides a relatively flexible but controversial basis: “processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject” (GDPR Art. 6(1)(f)), requiring a strict, case-by-case three-part balancing test (LIA). China’s PIPL did not explicitly adopt a directly corresponding “legitimate interests” basis. While some PIPL provisions (like processing public info) reflect similar balancing ideas, one cannot simply apply GDPR’s legitimate interest analysis directly under Chinese law. In China, for many commercial AI data processing activities (especially those not necessary for core services), obtaining valid user consent typically remains the primary and safest compliance path.
2. Core Data Processing Principles: Seven “Red Lines” Throughout the AI Application Lifecycle
Regardless of the lawful basis, all AI application processing activities involving personal information must strictly adhere to a set of core principles that apply throughout the entire data lifecycle, from collection to deletion. These principles form the basic framework and behavioral guidelines for data compliance; violating any can lead to compliance risks. These principles (mainly referencing PIPL Articles 5-9, highly aligned with GDPR Article 5 core principles) include:
1. Purpose Limitation & Lawfulness, Fairness, Transparency
- Core Requirement: The purpose for processing personal information must be lawful, legitimate, necessary, and clearly, specifically communicated to the individual at collection (no vague terms like “to improve service quality”). The manner of processing must be directly related to the stated purpose and should be the least intrusive means necessary. Subsequent processing must not exceed the scope of the purpose initially disclosed and consented to (or covered by another lawful basis).
- Meaning for AI Applications: Requires clearly defining the specific, lawful purpose for each data processing activity in AI applications (training, inference, analysis) and ensuring this purpose can be transparently communicated to users. Be vigilant against “Purpose Creep”—using data collected for one purpose for unrelated purposes without authorization (e.g., using core service data to train a separate advertising model).
2. Data Minimization
- Core Requirement: Collect and process personal information strictly limited to the minimum scope necessary to achieve the stated processing purpose. Do not collect excessive personal information unrelated to the purpose or beyond necessity. The retention period should also be the shortest time necessary for the purpose (see point 5).
- Major Challenge for AI: Modern AI (esp. deep learning) is often “data-hungry,” with performance typically improving with more training data. This technical characteristic creates deep, inherent tension with the legal principle of data minimization. AI developers and users must be able to clearly justify with evidence that every piece of data, every feature variable collected and processed is objectively necessary to achieve the specific, lawful AI function or goal claimed (e.g., providing a specific intelligent service, reaching an acceptable performance/accuracy level), and that no less intrusive alternative data or method exists. The mindset of “collecting just in case it might be useful later” or “more data is always better for the model” must be resolutely avoided. Implementing data minimization often requires careful trade-offs between model performance and privacy protection.
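A practical way to evidence that “every feature is necessary” is to measure how much each input actually contributes to model performance and treat low-contribution features as candidates for removal. The sketch below uses scikit-learn’s permutation importance on a toy model purely as an illustration; the synthetic dataset, generic feature names, and 0.01 threshold are assumptions, and real minimization decisions would also weigh intrusiveness and sensitivity, not just accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy dataset standing in for user features; in practice these would be named fields.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance: how much does shuffling each feature degrade held-out performance?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

for i, (mean_drop, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    verdict = "likely necessary" if mean_drop > 0.01 else "candidate for removal"
    print(f"feature_{i}: accuracy drop {mean_drop:.3f} +/- {std:.3f} -> {verdict}")
```

Documenting this kind of analysis (whatever tool is used) creates the evidence trail the minimization principle expects: each retained feature can be tied to a measurable contribution to the stated purpose.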
3. Transparency
- Core Requirement: Data processors must publish their rules regarding personal information processing (usually via a Privacy Policy / Privacy Notice) in a clear, accurate, complete, easily accessible, and understandable manner. This policy must fully disclose all legally required key information: purposes, means, scope, duration, security measures, potential sharing/transfers, data subject rights, and how to exercise them. Changes to the rules must also be communicated.
- Specific Challenge for AI: AI systems (especially complex “black box” models like LLMs, deep recommendation systems, sophisticated risk models) pose huge challenges to achieving genuine transparency. How to explain clearly, accurately, and without causing misunderstanding to ordinary users (lacking technical expertise) how these complex algorithms work? Which specific personal information features they utilize? How these features influence the final decisions or recommendations? And what potential risks or uncertainties this process entails? This is an extremely difficult communication and design problem. Simply providing lengthy, jargon-filled privacy policies is insufficient. Exploring more effective, intuitive, multi-layered information presentation methods (e.g., layered notices, visual explanation tools, interactive Q&A) is needed to truly empower users’ right to know.
4. Accuracy & Completeness
- Core Requirement: Processors should take reasonable steps to ensure the personal information they process is accurate and complete, and should rectify or supplement it promptly upon the data subject’s request or based on factual circumstances.
- Importance for AI: Data accuracy and completeness are critical for AI system performance and reliability. The “Garbage In, Garbage Out” (GIGO) principle is particularly salient in AI. If the input data used for training AI models or as the basis for real-time AI decisions is itself erroneous, inaccurate, incomplete, or outdated, then the resulting AI output (prediction, classification, risk score, generated content) will inevitably be unreliable, potentially causing severe unfairness or harm to individuals (e.g., denying loans based on wrong credit history; giving incorrect diagnostic suggestions based on inaccurate health data). Therefore, ensuring input data quality from the source and establishing mechanisms for users to easily correct their inaccurate information are crucial for the effectiveness, fairness, and responsibility of AI applications.
5. Storage Limitation / Retention Period
- Core Requirement: The retention period for personal information should be the shortest time necessary to fulfill the purpose(s) for which it was initially collected. Once the purpose is achieved, cannot be achieved, or processing is no longer necessary, unless laws require a mandatory longer minimum retention period (e.g., certain financial records, accounting vouchers, evidence related to statutes of limitation), the processor must proactively and promptly delete the personal information held, or render it effectively anonymous (so it can no longer identify individuals and cannot be reversed). Indefinitely retaining personal information unnecessarily, based on vague reasons like “might be useful in the future” or “maybe for training the next model,” is prohibited.
- Challenge for AI Training & Model Iteration: Training AI models (esp. large foundation models) often requires extremely large historical datasets. Furthermore, for continuous optimization, iteration, or future development of new features, AI developers and platforms often have strong incentives to retain training data and user interaction data for as long as possible, potentially indefinitely. This incentive creates a clear and profound conflict with the legal principle of storage limitation. Organizations must be able to provide well-reasoned justifications, based on specific processing purposes, for the retention periods set for each category of personal information. They also need to establish strict, automated (if possible) data deletion or anonymization mechanisms upon expiry, subject to oversight and audit. Technical convenience or potential commercial value cannot justify indefinite hoarding of personal information.
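Storage limitation only works if expiry is enforced automatically rather than left to memory. The following is a minimal sketch of a scheduled retention sweep; the table names and retention periods are hypothetical, the tables (including the `deletion_log` audit table) are assumed to already exist, and a real deployment would also cover backups, logs, derived datasets, and legal holds, anonymizing rather than deleting where a statute requires longer retention.

```python
from datetime import datetime, timedelta, timezone
import sqlite3

# Hypothetical per-purpose retention policy (days); legal holds would override deletion.
RETENTION_DAYS = {
    "support_tickets": 365,
    "chat_transcripts": 90,
    "model_training_snapshots": 180,
}

def purge_expired(conn: sqlite3.Connection) -> None:
    """Delete rows whose retention period for their purpose has elapsed, and log the sweep."""
    now = datetime.now(timezone.utc)
    for table, days in RETENTION_DAYS.items():
        cutoff = (now - timedelta(days=days)).isoformat()
        # Assumes each table has a `collected_at` ISO-8601 timestamp column.
        deleted = conn.execute(
            f"DELETE FROM {table} WHERE collected_at < ?", (cutoff,)
        ).rowcount
        conn.execute(
            "INSERT INTO deletion_log (table_name, cutoff, rows_deleted, run_at) VALUES (?, ?, ?, ?)",
            (table, cutoff, deleted, now.isoformat()),
        )
    conn.commit()
```

Logging each sweep also feeds the accountability principle: the organization can later show when, why, and how much data was removed.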
6. Ensuring Data Security (Integrity and Confidentiality / Security) - (Core requirements detailed in Sec 6.2)
- Core Requirement: Data processors must fulfill their legal data security obligations, taking all necessary and appropriate technical measures (encryption, pseudonymization, access control, security audits, vulnerability management, disaster recovery, etc.) and organizational measures (internal security policies, designated security roles, employee training, incident response drills, etc.) to safeguard the confidentiality (preventing unauthorized access/disclosure), integrity (preventing unlawful alteration/destruction), and availability (ensuring lawful access when needed) of the personal information they process. Effective measures must be taken to prevent and respond to security risks like data breaches, tampering, loss, destruction, or unlawful use.
- High Importance in AI Context: As AI systems often process larger volumes, more dimensions, and potentially more sensitive data, and their system architectures can be more complex with more internal/external interactions, securing their data becomes even more critical and challenging. AI application security needs to cover the entire lifecycle and technology stack: from secure management of training data, to protecting the model itself (especially parameters) from theft/tampering, to hardening the application systems (APIs, front-ends), and ensuring overall security of the operating environment (local or cloud). A lapse in any link can lead to severe security consequences.
7. Accountability
- Core Requirement: Data processors are ultimately responsible for all their personal information processing activities and must be able to demonstrate that their activities consistently comply with legal requirements and the core principles outlined above.
- Practical Meaning for AI Applications: Merely “claiming” compliance is insufficient. Organizations must be able to prove compliance through actions and documentation. This requires processors to:
- Establish comprehensive internal personal information protection management systems and procedures.
- Designate responsible departments and personnel for data protection.
- Conduct regular data protection training for employees.
- Proactively conduct risk assessments (e.g., PIAs/DPIAs) before processing.
- Implement effective technical and organizational security measures.
- Establish processes and mechanisms to respond to data subject rights requests.
- Develop data security incident response plans.
- Maintain necessary processing activity records and audit logs.
- Be able to effectively demonstrate and prove fulfillment of legal obligations when required (e.g., during regulatory investigations, handling user complaints, or in litigation). Accountability is the foundational guarantee ensuring all other principles are effectively implemented.
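Accountability ultimately turns on documentation. A lightweight way to keep the required record of processing activities is a structured register that every new AI use case must be entered into before launch. The sketch below is a hypothetical schema (field names are illustrative, loosely modeled on what PIPL/GDPR-style processing records typically contain), not an official template.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ProcessingActivityRecord:
    """One entry in an internal record of processing activities (ROPA-style register)."""
    activity_name: str            # e.g. "contract-review assistant"
    purpose: str                  # specific, explicit purpose
    lawful_basis: str             # e.g. "consent", "contract necessity"
    data_categories: list[str]    # what personal information is involved
    data_subjects: list[str]      # e.g. ["clients", "employees"]
    recipients: list[str]         # internal teams, vendors, overseas recipients
    cross_border: bool
    retention: str                # stated retention period and justification
    security_measures: list[str]
    pia_completed_on: Optional[date]  # prior impact assessment, where required
    owner: str                    # accountable role or department

def ready_to_launch(record: ProcessingActivityRecord) -> list[str]:
    """Return blocking issues; an empty list means the basic paper trail exists."""
    issues = []
    if not record.lawful_basis:
        issues.append("no lawful basis documented")
    if record.cross_border and record.pia_completed_on is None:
        issues.append("cross-border transfer without a prior impact assessment")
    if not record.security_measures:
        issues.append("no security measures recorded")
    return issues
```

A register like this does not make processing lawful by itself, but it is the kind of artifact that lets an organization demonstrate compliance to regulators, clients, or courts rather than merely assert it.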
3. Processing Sensitive Personal Information: Treading Carefully in Legal and Ethical “High-Voltage Zones” in AI Applications
When AI applications need to collect, process, or utilize sensitive personal information (or special categories of personal data)—data which, if leaked or unlawfully used, could easily lead to harm to an individual’s personal dignity or to serious harm to personal or property safety—the legal compliance requirements become much stricter, and potential risks escalate dramatically. This is akin to entering a “high-voltage zone” in data compliance and privacy protection that demands special caution.
- Defining Sensitive Personal Information (Ref: PIPL Art. 28 & GDPR Art. 9): Specific definitions vary slightly by jurisdiction, but the core idea is to identify inherently high-risk data types. Under China’s PIPL, sensitive personal information explicitly includes:
- Biometric information: e.g., facial images/recognition features, fingerprints, palm prints, iris scans, voiceprints, gait recognition features. (Unique, permanent, unchangeable; high risk if compromised).
- Religious beliefs.
- Specific identity: e.g., (though not exhaustively listed, generally understood to include) race or ethnic origin, political opinions, trade union membership.
- Medical health information: e.g., medical records, genetic data, physical exam results, medication history, mental health status.
- Financial account information: e.g., bank account numbers, payment account details, transaction passwords, credit records, detailed income/expenditure data.
- Location tracking information: e.g., precise or approximate movement data collected over time via GPS, cell towers, WiFi, cameras.
- Personal information of minors under the age of 14: Given their vulnerability, their data receives special protection, generally treated as sensitive.
- Stricter Compliance Requirements for Processing Sensitive Personal Information: Besides meeting all basic principles, processing sensitive personal information typically requires adhering to these additional, stricter conditions under PIPL (and similar laws like GDPR):
- Must Have a “Specific Purpose” and “Sufficient Necessity”: Processing must be for a very specific, clearly defined, lawful purpose, and absolutely necessary to achieve that purpose, with no less intrusive alternatives. Cannot collect sensitive data just because it “might be useful.” Necessity requires stronger justification.
- “Separate Consent” Generally Required: PIPL explicitly requires obtaining the individual’s “separate consent” for processing sensitive personal information. (Very limited exceptions might exist, e.g., vital interests during public health emergencies, interpreted strictly). This means consent cannot be bundled; specific, clear notice and distinct, explicit consent must be obtained for each instance of sensitive data processing, often via pop-ups or dedicated screens.
- (If involving minors’ info) Consent from Guardian Required: Processing data of minors under 14 must have explicit consent from a parent or guardian.
- Prior Personal Information Protection Impact Assessment (PIA) Mandatory: PIPL Article 55 mandates conducting a PIA before processing sensitive personal information. PIA systematically assesses risks to individual rights and verifies the effectiveness of protection measures. Reports must be kept for at least three years.
- Higher Level of Security Measures Required: Laws (like PIPL Art. 51) and standards implicitly or explicitly require stricter, enhanced technical and organizational security measures for sensitive data compared to general personal information, e.g., stronger encryption, tighter access controls, more frequent audits, specialized security training.
- Enhanced Notification Duties: Before obtaining separate consent, must additionally inform the individual about the necessity of processing the sensitive information and the specific potential impact on their rights and interests.
- Typical High-Risk Scenarios Involving Sensitive Personal Information in AI Applications: Many cutting-edge, high-potential AI applications inherently require processing sensitive data, placing them squarely in the compliance “high-voltage zone”:
- All AI applications based on biometric technology: E.g., facial recognition (for verification, surveillance, even emotion analysis), gait recognition, voiceprint identification, iris scanning.
- AI-driven medical diagnosis, health management & genetic analysis: E.g., AI analyzing medical images, electronic health records, genetic sequencing data for diagnosis, risk prediction, personalized treatment plans, online health consultations.
- AI in financial risk control & precise profiling: E.g., AI for high-precision credit scoring (potentially using detailed financial account info), fraud detection models (analyzing transaction behavior, biometrics), or deep user profiling based on multi-dimensional data to infer financial status or risk appetite.
- AI applications requiring precise location information: E.g., autonomous vehicles (needing real-time high-precision location), intelligent traffic management, or certain Location-Based Services (LBS) if collecting detailed movement tracks.
- Any AI-driven application specifically targeting minors under 14 and potentially collecting their personal information (e.g., smart tutoring apps, children’s entertainment/social platforms).
- Applications using AI for emotion recognition or psychological state analysis: (Scientific validity often questioned, high ethical risks) If attempting to infer user emotions, stress levels, or potential mental health issues by analyzing facial expressions, voice tone, text content, or physiological signals.
- Special Considerations for the Legal Industry Handling Sensitive Information & Using AI: Law firms, corporate legal departments, and judicial bodies inevitably handle vast amounts of various types of personal information, including large quantities of sensitive personal information (e.g., details of property division/emotional privacy/domestic violence evidence in divorce cases; detailed medical records/disability reports in personal injury cases; criminal records/victim privacy/minor information in criminal cases; employee salary/health/performance data in labor disputes; core technical personnel info in trade secret cases; etc.).
- Therefore, when considering using any AI tool to assist in processing this case-related data (for document review, information extraction, research, communication), legal service providers must give the highest level of attention and strictest protection to the sensitive personal information potentially involved.
- Clear internal processes and technical means are needed to effectively identify, flag, and (where possible) segregate sensitive information.
- Any AI tool planned for processing sensitive personal information (whether developed in-house or procured externally) must undergo the most rigorous security review and Personal Information Protection Impact Assessment (PIA) to ensure its technical and organizational measures fully meet all legal and compliance requirements for handling sensitive data (including obtaining necessary separate consent, if applicable).
- When in doubt about ensuring full security and compliance, it is better to refrain from using AI for sensitive information than to take any risks.
4. Compliance of Automated Decision-Making and Protection of User Rights: Ensuring Algorithmic Fairness, Transparency, and Accountability
AI, particularly machine learning models, is increasingly used for Automated Decision-Making. This typically refers to relying solely or significantly on machine algorithms to automatically analyze personal data, evaluate certain aspects of an individual (e.g., credit risk, job performance, behavior, interests, health status), and make decisions based on this analysis that produce legal effects or similarly significant impacts on the individual (e.g., denying a loan, giving a performance rating, pushing specific content/services, or even assisting judicial/administrative decisions).
While automation offers unprecedented efficiency and consistency (assuming unbiased algorithms), it also raises deep societal concerns about lack of transparency (“algorithmic black box”), potential unfairness (“algorithmic discrimination”), and individuals losing control over decisions affecting their lives without recourse. Modern data protection laws (like GDPR Article 22, PIPL Article 24) therefore usually impose specific, stricter rules on automated decision-making, aiming to safeguard individual rights and provide checks on algorithmic power.
- Key Legal Regulatory Points & Individual Rights Protection:
- Ensuring Transparency & Right to Explanation:
- Laws generally require processors using automated decision-making involving personal information to ensure considerable transparency. This means informing individuals about the existence of automated decision-making, the main logic or factors involved (even if full algorithm details aren’t disclosed), and the significant effects the decision may have on them.
- Data subjects typically have the right to request an explanation from the processor regarding automated decisions affecting their rights. This implies processors need some capability to articulate, in an understandable way, the basic rationale and primary influencing factors behind their algorithmic decisions, rather than simply claiming “it’s a black box.”
- Providing the Right to Object to Solely Automated Decisions:
- Many laws (notably GDPR and PIPL) grant individuals an important right to object to decisions based solely on automated processing (i.e., without any human intervention) which produce legal effects or similarly significantly affect them. (E.g., a fully automated decision leading to loss of a job opportunity, denial of essential credit, rejection from critical social benefits).
- This right is not absolute in all cases. Laws usually provide exceptions, e.g., if the decision is necessary for entering into or performing a contract (like auto-calculating shipping fees), based on the individual’s prior explicit consent, or authorized by specific laws (like automated traffic violation enforcement). However, even in these exceptions, processors usually still need to safeguard other rights (like the right to human intervention below).
- Ensuring the Right to Human Intervention:
- In critical automated decision-making scenarios, even if legally permissible under exceptions, laws (like GDPR explicitly, PIPL implicitly) often require providing individuals affected by the decision with access to human intervention. This means individuals have the right to request a review of the purely algorithmic decision by a qualified human employee, the right to express their point of view and reasoning, and the right to contest the decision and demand a human review (appeal). This right is a crucial safeguard against “algorithmic tyranny,” ensuring final decisions still involve human judgment and accountability.
- Explicit Prohibition of Unreasonable Differential Treatment:
- China’s PIPL (Article 24) specifically addresses the widely criticized phenomenon of “big data price discrimination”. It mandates that when using automated decision-making for information push or commercial marketing, processors must simultaneously provide options not targeted to personal characteristics or convenient ways to refuse. More importantly, it stipulates that when using automated decision-making to determine transaction prices or other terms, it must be fair and just, and must not impose unreasonable differential treatment on individuals. This is a direct legal prohibition against discriminatory pricing or service provision via algorithms.
- Profound Implications for AI Application Design, Deployment, and Governance: These legal requirements have critical design and compliance implications for all applications planning to use AI for automated (or significantly assisted) decision-making:
- Creates Hard Requirement for Model Explainability (XAI): To fulfill the legal obligation of providing explanations for decisions, the relevant AI models (even complex “black boxes”) must possess some degree of explainability. Developers need to invest in researching and applying various XAI techniques (e.g., LIME, SHAP, rule extraction, surrogate models) to, at minimum, identify and communicate to the user the main input features or factors that influenced a specific decision and roughly how they did so (a minimal attribution sketch follows this list). Systems unable to provide any meaningful explanation face huge compliance risks.
- Must Design and Embed Effective Human Review & Appeal Processes: For all automated decision systems potentially having significant impacts on individuals, mechanisms must be built into the system design and business processes allowing qualified human employees (e.g., credit managers, HR specialists, senior customer service reps) to review, intervene in, modify, or even override the automated results. Furthermore, clear, accessible, effective appeal channels and procedures for requesting human review must be provided to individuals adversely affected by automated decisions. This human element is the ultimate risk control and rights protection valve.
- Requires Continuous, Rigorous Fairness Auditing & Monitoring: To ensure automated decisions are fair, just, non-discriminatory, and meet requirements against unreasonable differential treatment, organizations need to establish mechanisms for regularly and systematically auditing the relevant AI decision systems for algorithmic bias and fairness. This requires collecting necessary data (compliantly), using multiple fairness metrics for evaluation, identifying potential disparate impacts on different groups, and taking timely corrective actions.
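For the explainability point above, even a simple, interpretable model can show what “naming the main factors behind a specific decision” looks like in practice. The sketch below fits a logistic regression on synthetic data and reports each feature’s contribution to one individual’s score; the data and feature names are invented for illustration, and for complex models purpose-built attribution tools such as SHAP or LIME would play this role instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

feature_names = ["income", "debt_ratio", "years_employed", "late_payments"]  # illustrative

# Synthetic applicants standing in for real credit data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

def explain_decision(applicant: np.ndarray) -> list[tuple[str, float]]:
    """Per-feature contribution to this applicant's log-odds, largest magnitude first."""
    z = scaler.transform(applicant.reshape(1, -1))[0]
    contributions = model.coef_[0] * z
    return sorted(zip(feature_names, contributions), key=lambda kv: -abs(kv[1]))

for name, value in explain_decision(X[0]):
    print(f"{name}: {value:+.2f}")
```

A per-decision breakdown of this kind is the raw material for the plain-language explanation the law expects; it still has to be translated into terms an affected individual can actually understand.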
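For the fairness-auditing point, a recurring first check is whether favorable outcomes are distributed very unevenly across groups. The sketch below computes group-level approval rates and a disparate-impact ratio on synthetic data; the 0.8 threshold echoes the US “four-fifths rule” and is used here only as an illustrative flag, not a legal standard under PIPL or GDPR.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic audit set: model decisions (1 = approve) and a protected attribute per individual.
decisions = rng.integers(0, 2, size=1000)
group = rng.choice(["A", "B"], size=1000, p=[0.7, 0.3])

rates = {g: decisions[group == g].mean() for g in ("A", "B")}
disparate_impact = min(rates.values()) / max(rates.values())

print(f"approval rates by group: {rates}")
print(f"disparate impact ratio: {disparate_impact:.2f}")
if disparate_impact < 0.8:  # illustrative flag only
    print("flag for review: approval rates differ substantially between groups")
```

Real audits go much further (multiple fairness metrics, intersectional groups, statistical significance, and scrutiny of the features driving any gap), but even this minimal check makes “unreasonable differential treatment” something that can be measured and monitored rather than merely asserted.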
- Application Prospects & Limits in Legal Services & Judicial Scenarios:
- Direct Automated Legal Judgment Highly Unlikely: Currently, using AI to directly make final, legally binding judicial decisions determining parties’ rights/obligations or imposing criminal sentences (e.g., an “AI Judge” automatically deciding simple civil cases or sentencing) is impermissible in most jurisdictions (including China) and faces nearly insurmountable ethical and legal hurdles: such a system lacks a human judge’s independent judgment, value weighing, due process guarantees, and accountability.
- Strict Application in Decision Support Scenarios: However, AI can be, and increasingly is being explored for, assisting human judicial officers (judges, prosecutors) in their decision-making process (e.g., providing sentencing references, recidivism risk scores, analogous case suggestions with relevance scores, evidence correlation hints). In such “decision support” scenarios, the aforementioned legal principles governing automated decision-making (esp. transparency, explanation, human oversight, fairness) still apply, perhaps even more stringently. It must be ensured that AI “suggestions” or “scores” do not improperly or excessively influence or replace the judicial officer’s final human decision based on the full case record and independent deliberation. Beware of the potential impact of “automation bias” on judicial judgment.
- Lawyers Advising Clients on Automated Decision Compliance: In the commercial sector (e.g., financial institutions using AI for loan approvals; insurers for risk assessment/pricing; large employers for hiring/performance management; internet platforms for content recommendation/user management), these automated decision activities must strictly comply with relevant laws. A key role for legal professionals (data compliance lawyers, corporate counsel) is to deeply understand and help clients navigate and comply with these complex requirements, design compliant processes, and manage related risks.
5. The Compliance Maze of Cross-Border Data Transfers: Navigating Complex Rules for Globalized AI Applications
The training, optimization, deployment, and service delivery of modern AI models often involve a highly globalized collaborative process. For example, an AI model might be trained by a US company using global data, deployed on cloud servers in Europe or Asia, and offered via APIs to users worldwide, including China. This transnational technical architecture and business model makes cross-border data transfers the norm for many advanced AI applications to function.
However, due to considerations of national security, cyber sovereignty, economic interests, and ensuring adequate protection for citizens’ personal information abroad, countries worldwide (especially China, the EU, and increasingly others) have established increasingly strict, complex, and distinct regulatory rules and approval/filing mechanisms for the export of critical data and personal information. This presents huge, sometimes business-impeding compliance challenges for AI applications (both developers and users) needing to transfer data across borders.
- Main Regulatory Pathways & Compliance Requirements (Focusing on China’s PIPL, with comparisons to GDPR etc.):
- In China, if an AI application or its operator (personal information processor) needs to transfer personal information collected and generated within mainland China to recipients outside the PRC (whether to overseas affiliates, third-party service providers, or just using overseas servers for storage/processing), according to PIPL Article 38, they must first satisfy one of the following four statutory preconditions for the data to be lawfully exported:
- Passing a Security Assessment organized by the Cyberspace Administration of China (CAC): This pathway has the highest threshold, strictest scrutiny, and most complex procedure. According to the “Measures for Security Assessment of Data Exports,” this is mandatory in several situations:
- Processors who are Critical Information Infrastructure Operators (CIIOs) exporting personal information or important data. (Scope of CIIOs determined by relevant authorities).
- Processors exporting Important Data. (“Important Data” identification criteria and catalogs are being developed by regions/sectors; generally refers to data whose compromise could harm national security, economy, social stability, public health/safety).
- Processors handling personal information in volumes reaching the thresholds set by the CAC. Under the 2022 Measures, these are: processors handling the personal information of more than 1 million individuals; or processors that, cumulatively since January 1 of the previous year, have exported the personal information of more than 100,000 individuals or the sensitive personal information of more than 10,000 individuals. (A simple illustration of how these triggers combine appears after this subsection.)
- Other situations where the CAC requires a security assessment. The assessment itself requires submitting a detailed self-assessment report and related materials to the CAC (usually via provincial CAC offices) and undergoing a substantive, comprehensive security and compliance review.
- Obtaining Personal Information Protection Certification from a professional institution according to CAC provisions: For processors not falling under the mandatory security assessment scope (e.g., handling smaller data volumes), they can choose to obtain personal information protection certification from a nationally recognized, designated professional body as the lawful basis for export. Relevant certification rules and bodies are gradually being established. This may offer a relatively standardized path for certain types of transfers.
- Concluding a Standard Contract formulated by the CAC with the overseas recipient: This is currently the primary, relatively convenient, and commonly used path for most SMEs or general business scenarios not triggering mandatory assessment and not involving CIIOs or important data export. The “Measures on the Standard Contract for Export of Personal Information” have been issued, providing a template standard contract. Processors need to:
- Fully and accurately sign this official standard contract with the overseas recipient (substantive changes generally not allowed).
- Before signing, conduct a Personal Information Protection Impact Assessment (PIA), focusing on the legal environment of the recipient’s country, recipient’s security capabilities, type/scale/sensitivity of data, ensuring risks are manageable.
- Within 10 working days after the standard contract takes effect, file it with the provincial CAC office (along with the PIA report). Filing is not approval but a mandatory requirement. The standard contract details the rights, obligations, and responsibilities of both the domestic processor and overseas recipient regarding data protection (e.g., adhering to purpose/manner limits, security measures, cooperation with rights requests, liability).
- Meeting other conditions stipulated by laws, administrative regulations, or the CAC: An open-ended catch-all clause for future rules, international agreements, or special circumstances (e.g., transfers under treaties or judicial assistance).
- Other Core Requirements Besides Meeting One of the Four Conditions:
- Obtain “Separate Consent” after Full Notification: PIPL Article 39 explicitly requires fully informing individuals about the overseas recipient’s details, purpose, manner, data types, retention period, rights exercise methods, etc., and obtaining their “separate consent” before exporting personal information.
- Conduct Prior Personal Information Protection Impact Assessment (PIA): PIPL Article 55 lists cross-border transfer as a mandatory scenario requiring a prior PIA, assessing legality, necessity, impact, risks, and safeguards.
- Ensure Overseas Processing Meets PIPL Standards: PIPL Article 38(2) requires domestic processors to take necessary measures to ensure the overseas recipient’s processing activities meet the protection standards stipulated by PIPL. This usually involves contractual obligations (e.g., in the standard contract) and potentially ongoing oversight.
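As a rough illustration of how the triggers summarized above interact, the helper below encodes the volume thresholds as a first-pass routing check. The numbers reflect the 2022 assessment measures as described in this section, the function name is invented, and an actual determination would require the current regulations, the important-data analysis, and legal review.

```python
def export_pathway_hint(
    is_ciio: bool,
    exports_important_data: bool,
    total_individuals_handled: int,
    individuals_exported_since_jan1_prev_year: int,
    sensitive_individuals_exported_since_jan1_prev_year: int,
) -> str:
    """First-pass hint at the PIPL export pathway; illustrative only, not legal advice."""
    if (
        is_ciio
        or exports_important_data
        or total_individuals_handled > 1_000_000
        or individuals_exported_since_jan1_prev_year > 100_000
        or sensitive_individuals_exported_since_jan1_prev_year > 10_000
    ):
        return "CAC security assessment required"
    return "standard contract filing or certification likely available"

print(export_pathway_hint(False, False, 250_000, 30_000, 2_000))
```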
- Typical Cross-Border Data Compliance Challenges for AI Applications:
- Accurately Identifying and Mapping Data Export Scenarios: For complex AI applications (esp. those relying on global cloud services or involving multinational teams), the first step is to carefully map all data processing flows to accurately identify if and how personal information is transferred from mainland China overseas. This can be complex:
- Directly using AI model services or APIs hosted overseas: E.g., a Chinese company calling OpenAI’s GPT-4 API hosted in the US, sending text containing Chinese customer/employee info.
- Storing collected Chinese user data on overseas cloud servers: E.g., backing up or storing app usage data on AWS, Azure, or Google Cloud regional centers outside China.
- Allowing overseas affiliates, parent companies, or third-party service providers (e.g., global analytics teams, model training centers, tech support, customer service) to remotely access or download databases containing personal information stored within China.
- Using seemingly local software that integrates features calling overseas AI services (e.g., a document translation feature calling a foreign translation API). Accurately identifying and documenting all potential export paths is step one.
- Choosing Appropriate Compliance Paths and Completing Procedures: For each identified export activity, determine the correct path (security assessment? certification? standard contract?) based on data type (sensitive? important?), volume (thresholds?), processor identity (CIIO?), and recipient details. Then, invest necessary resources (time, personnel, budget) to diligently complete all required procedures: comprehensive self-assessment (PIA), negotiating/signing compliant legal documents (standard contract), obtaining separate consent, filing/reporting to regulators. This is often a complex, potentially lengthy process requiring close collaboration between legal, tech, and business teams, often needing professional support.
- Addressing Potential “Data Localization” Requirements in Other Jurisdictions: Note that other countries or sectors (e.g., Russia, India, Vietnam for certain data types; finance, health sectors in some countries) might also have mandatory data localization requirements, mandating certain data be stored/processed domestically. AI applications operating in these regions must design systems/data flows accordingly.
- Special Considerations for the Legal Industry Regarding AI Data Transfers:
- Compliance of Using Overseas AI Tools for Domestic Data: If Chinese law firms or legal departments plan to use powerful overseas AI tools (e.g., advanced US contract review platforms, European legal research databases, global LLM APIs) to assist in processing case materials containing Chinese client/employee personal information or sensitive case details, they must first extremely cautiously assess and resolve the resulting cross-border data transfer compliance issues. Need to determine: Does this constitute personal information export? Is security assessment or standard contract needed? Can valid separate consent be obtained from all relevant individuals (clients, staff, opposing parties)? Does it increase risks of confidentiality breach? Such overseas services should not be readily used for sensitive domestic data unless full compliance and risk control are assured.
- Advising Multinational Clients on AI Cross-Border Compliance: As global data protection laws tighten and converge (while still differing significantly), and multinational corporations increasingly use AI globally (e.g., shared customer databases, centralized AI analytics platforms, overseas R&D for model training), ensuring their AI applications and related data transfers involving China fully comply with Chinese law is a major challenge. Providing expert legal advice to these clients on AI cross-border compliance strategy, risk assessment, pathway selection (assessment, standard contract), internal process development, and communication with regulators is becoming an increasingly important and challenging practice area for data protection lawyers and international legal counsel. This requires deep expertise not only in China’s PIPL and related rules but also a thorough understanding and comparative analysis capability regarding GDPR, relevant US laws, and data export rules in other key jurisdictions where the client operates.
6. Lawfulness and Content Compliance of AI Training Data: Governing Risks and Biases from the “Source”
The performance, reliability, safety, and even fairness (or bias) of AI models depend critically (arguably decisively) on the quality, scale, diversity, and compliance of the training data they “learn” from or are “fed.” Therefore, ensuring the data used to train AI models (especially those with broad societal impact or used for high-risk decisions) has lawful origins, compliant content, and has undergone necessary cleaning, de-biasing, and quality control is an extremely important “source” governance step for responsible AI development and deployment. Governing data well at the source is foundational to controlling downstream AI risks.
- Legality of Training Data Sources & Copyright Compliance Challenges:
- Lawful Data Acquisition?: Were the original sources of the training data obtained lawfully?
- If from public web scraping/crawling: did the crawler respect the site’s robots.txt protocol (a minimal check is sketched after this list)? Did it violate website terms of service prohibiting automated access or scraping? Could large-scale, high-frequency scraping interfere with website operation, constituting unfair competition or violating cybersecurity laws against unauthorized network access or interference?
- If from third-party data vendors (purchase/license), was the vendor’s original acquisition lawful and compliant? Did they have the right to resell/license for AI training? Does the license clearly define scope and limitations?
- If using internal organizational data (historical business data, customer interactions), was the original purpose of collection compatible with the new purpose of AI training? Or was new valid consent obtained for this secondary use (if personal information involved)?
- Infringement Risk from Copyrighted Works in Training Data: The most contentious issue currently (discussed in detail in Section 7.3). Training large AI models (esp. LLMs, image/audio generators) almost inevitably involves using massive amounts of data containing vast quantities of copyrighted works (text, images, code, music). Does this large-scale copying and use for training (esp. commercial) AI models without explicit permission from most rights holders constitute copyright infringement? Or can it fall under exceptions like “fair use” or “TDM exceptions”? Global legal rules are currently unclear, litigation outcomes highly uncertain, posing huge legal risks for all AI developers and users.
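For the web-scraping point above, the first technical control is simply checking whether the target site’s robots.txt permits automated fetching of a given URL. The Python standard library covers this; the sketch below is a minimal illustration (the URL and user-agent string are placeholders), and respecting robots.txt is necessary but not sufficient, since terms of service, anti-unfair-competition rules, and personal information law still apply to whatever is collected.

```python
from urllib import robotparser
from urllib.parse import urlsplit

def may_fetch(url: str, user_agent: str = "example-training-crawler") -> bool:
    """Check the site's robots.txt before scraping a page for training data."""
    parts = urlsplit(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()  # network call; handle errors and timeouts in real code
    return rp.can_fetch(user_agent, url)

print(may_fetch("https://example.com/articles/some-page"))
```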
- Compliance and Quality Review of Training Data Content:
- Filtering Illegal & Harmful Content: Does the training data (esp. from the internet) potentially contain large amounts of illegal information (inciting violence, terrorism, hate speech), harmful content (pornography, gambling, extremist views), or significant discriminatory biases and disinformation? Before feeding this data to models, were effective technical means (content filters, sensitive word lists) and necessary manual review used for thorough cleaning, filtering, and de-biasing? Models trained on “toxic” data are highly likely to generate illegal, harmful, or biased outputs. Ensuring “clean” and “compliant” training data is the first step in responsible AI development.
- Compliance for Secondary Use of Internal Data: If planning to use internal historical data (e.g., law firm using past case files for internal analysis model; legal dept using past contracts for internal review tool), special attention is needed:
- Is the “secondary use” purpose of AI training compatible with the original purpose stated when collecting the data from relevant parties (clients, employees)? If not, is new explicit authorization or consent required?
- Does this secondary use violate confidentiality clauses or data use restrictions in agreements with relevant parties (esp. clients)?
- Before use, was the internal data (esp. containing client info, case details, trade secrets) thoroughly and effectively anonymized or pseudonymized to minimize privacy/confidentiality risks? (And is the anonymized data still useful for training?).
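On the anonymization/pseudonymization point just above, a common minimum step before reusing internal documents for training is replacing direct identifiers with keyed pseudonyms and stripping obvious contact details. The sketch below is a deliberately simple, regex-based illustration with a hypothetical secret key; genuine anonymization is much harder, since names, case facts, and rare combinations of details can still re-identify people, so output like this should be treated as pseudonymization at best.

```python
import hashlib
import hmac
import re

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical key, kept out of source control

def pseudonym(identifier: str) -> str:
    """Stable keyed pseudonym so the same person maps to the same token across documents."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"PERSON_{digest[:10]}"

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"\b\d{3}[- ]?\d{4}[- ]?\d{4}\b")  # crude pattern, for illustration only

def scrub(text: str, known_names: list[str]) -> str:
    """Replace known client names and obvious contact details before any secondary use."""
    for name in known_names:
        text = text.replace(name, pseudonym(name))
    text = EMAIL_RE.sub("[email removed]", text)
    text = PHONE_RE.sub("[phone removed]", text)
    return text

print(scrub("Contact Zhang San at zhang.san@example.com or 138-1234-5678.", ["Zhang San"]))
```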
- Accuracy, Consistency & Potential Bias in Data Labeling:
- Label Quality is Key for Supervised Learning: For AI models requiring Supervised Learning (e.g., training a model to identify risky contract clauses needs expert-labeled examples), the quality of training data labels (accuracy, consistency) directly determines the upper limit of the final model’s performance.
- Challenges in Ensuring Label Quality: Requires establishing clear, unambiguous labeling guidelines; adequately training labelers (internal staff or outsourced teams) on standards and relevant expertise; implementing effective quality control and cross-validation processes (e.g., multiple labelers, measured inter-annotator agreement, and discrepancy resolution; see the sketch after this list) to ensure label accuracy and consistency.
- Beware of Annotator Bias: Need to be mindful of how labelers’ own potential biases, stereotypes, or subjective judgments might influence their labeling. If the labeling process itself is systematically biased, even objective raw data will result in a biased training set, leading to a biased model. Actively identifying and mitigating annotator bias in guideline design, training, and QA is crucial.
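A standard way to quantify the labeling consistency mentioned above is inter-annotator agreement, for example Cohen’s kappa between two reviewers labeling the same sample of clauses. The sketch below uses scikit-learn on made-up labels; the categories and the 0.6 threshold are illustrative, and low agreement should prompt tighter guidelines or annotator retraining rather than serve as a pass/fail rule.

```python
from sklearn.metrics import cohen_kappa_score

# Two annotators labeling the same 12 contract clauses (labels are illustrative).
annotator_a = ["risk", "ok", "ok", "risk", "ok", "risk", "ok", "ok", "risk", "ok", "risk", "ok"]
annotator_b = ["risk", "ok", "risk", "risk", "ok", "risk", "ok", "ok", "ok", "ok", "risk", "ok"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")
if kappa < 0.6:  # illustrative threshold for "needs attention"
    print("agreement is low: clarify labeling guidelines and resolve discrepancies")
```

Tracking agreement over time, per annotator and per label category, also helps surface the systematic annotator bias the preceding point warns about.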
Conclusion: Data Compliance and Privacy Protection are Prerequisites for Safely Unlocking AI’s Value and Responsibilities Legal Professionals Must Uphold
The immense potential and profound risks of AI-driven data processing activities are two sides of the same coin. To ensure AI technology can safely, reliably, and sustainably unleash its vast value in the legal field and society at large, it must be navigated carefully within a strict, robust, and dynamically adaptive legal compliance and ethical governance framework.
Within this framework, data compliance and personal information protection undoubtedly occupy the most central and challenging position. From ensuring every step of personal information processing has a clear, valid lawful basis (especially strict implementation of “informed consent”), to rigorously adhering to all core data processing principles throughout the AI application lifecycle (purpose limitation, data minimization, transparency, accuracy, storage limitation, security, accountability), to imposing stricter regulations on high-risk scenarios like processing sensitive personal information, automated decision-making, and cross-border data transfers, and governing the legality and compliance of training data from the very “source”—each link is filled with complex legal details, potential risk pitfalls, and value conflicts requiring careful balancing.
For those of us in the legal profession, our role and responsibility are twofold:
- First, as potential users or deployers of AI technology, legal service providers (law firms, legal departments) and judicial bodies must hold themselves to the highest industry standards when exploring and applying AI to enhance internal efficiency and external services. They must ensure their own operations and technology use fully comply with all relevant data compliance and privacy laws, regulations, and professional ethics, setting a leading example of compliance and responsibility for clients and society.
- Second, as professional legal service providers, we also bear the important responsibility of providing our clients (whether cutting-edge AI developers, platform providers, or traditional businesses actively embracing AI) with professional, accurate, timely, and forward-looking legal advice on AI data compliance and privacy protection. We need to help them accurately understand the increasingly complex global data protection legal landscape, identify the various compliance risks latent in their AI applications, design and implement effective compliance management systems and risk control measures, ensuring their business activities and technological innovations operate safely and sustainably within the bounds of the law.
As AI technology continues its rapid evolution and global data protection regulations constantly refine and converge (while inevitably retaining key national differences), data compliance and privacy protection will undoubtedly remain the most central, active, and challenging focal point in AI governance. We legal professionals must maintain high sensitivity, engage in continuous learning and research, actively participate in relevant rule-making and practical explorations, so that in this data-driven intelligent era, we can both help our clients and ourselves safely and effectively seize the tremendous development opportunities brought by AI, while steadfastly safeguarding fundamental individual rights, business ethical norms, and the rule of law foundation of our entire society.