
10.4 Glossary of Core Terms

This glossary compiles and concisely explains key terms frequently encountered in discussions of artificial intelligence (AI), machine learning (ML), large language models (LLMs), legal technology (LegalTech), and related legal and ethical issues. It aims to help readers better understand the content of this encyclopedia and the professional literature in related fields.

  • Accountability: In AI ethics and governance, the principle and mechanisms ensuring that it is possible to determine who is responsible, and in what way, for the design, deployment, and outcomes of AI systems, especially when errors occur or harm is caused. It involves transparency, traceability, and establishing clear chains of responsibility.
  • Adversarial Attack: Maliciously designed inputs intended to deceive an AI model into making incorrect judgments or behaving improperly. Examples include adding imperceptible perturbations to images to cause misclassification, or using prompt injection to bypass LLM safety constraints.
  • Algorithm: A finite sequence of well-defined instructions or computational steps designed to solve a specific problem or perform a specific task. AI relies heavily on various complex algorithms.
  • Algorithmic Bias: Systematically unfair or discriminatory outcomes produced by an AI system against certain groups. Can originate from biased data, algorithm design, or deployment context.
  • Alignment / AI Alignment: A core challenge in AI research and ethics, referring to the goal of ensuring that the objectives, values, and behaviors of increasingly powerful and autonomous AI systems remain consistent with human designers’ intentions and the best interests of humanity. It involves effectively defining, communicating, evaluating, and embedding human values into AI systems to prevent unintended or harmful behaviors. RLHF is a key technique currently used for alignment.
  • Artificial Intelligence (AI): A broad field of computer science concerned with the theories, methods, technologies, and application systems used to simulate, extend, and augment human intelligence. An umbrella term encompassing machine learning, deep learning, etc.
  • Artificial Neural Network (ANN): Computational models inspired by the structure of biological brains, composed of numerous interconnected nodes (artificial neurons) organized in layers. The foundation of deep learning.
  • Artificial General Intelligence (AGI): Hypothetical AI possessing human-equivalent or superior cognitive abilities across a wide range of tasks, with general problem-solving skills, autonomous learning, and possibly consciousness. Does not currently exist; all existing AI is narrow AI (ANI).
  • API (Application Programming Interface): A set of predefined rules and protocols that allow different software systems to communicate and exchange data with each other.
  • Attention Mechanism: A mechanism in deep learning that allows a model, when processing sequential data, to dynamically weigh the importance of different parts of the input sequence, assigning each part a different weight. Self-Attention is core to the Transformer architecture.
  • Automated Decision-Making: Decisions made solely by algorithms or AI systems without meaningful human intervention, potentially having significant effects on individuals’ rights. Subject to specific regulations under data protection laws (like GDPR, PIPL), often requiring transparency, right to explanation, and right to object.
  • Autonomous Weapons Systems (LAWS / Lethal Autonomous Weapons Systems): Weapon systems capable of independently searching for, identifying, selecting, and attacking targets. They raise serious controversies in ethics and international law (esp. the laws of war), and their regulation is a major international issue.
  • Backpropagation: The core algorithm for training artificial neural networks. It calculates the gradient of the loss function with respect to the network’s parameters and propagates the error signal backward through the network to update the weights.
  • Benchmark: Standardized datasets and evaluation metrics (e.g., accuracy, F1-score, BLEU score) used to evaluate and compare the performance of different AI models on specific tasks (text classification, image recognition, machine translation). Helps objectively understand relative model capabilities.
  • Bias: See “Algorithmic Bias.” Also refers to statistical bias in machine learning models (the difference between the average prediction and the true average value), one component of model error (cf. Bias-Variance Tradeoff).
  • Big Data: Massive datasets characterized by the “Vs” (Volume, Velocity, Variety, Veracity, sometimes Value). Processing and analyzing big data requires new methods beyond traditional database technologies. The availability of big data is a key fuel driving modern AI, especially deep learning.
  • Black Box: A term describing AI models, especially deep neural networks, whose internal workings and decision-making processes are extremely complex and difficult for humans to directly observe, understand, or explain. A major cause of challenges in AI explainability, trust-building, and liability attribution.
  • Chatbot: An AI program capable of conducting conversations with human users via text or voice interfaces. Commonly used for customer service, information retrieval, initial consultations, task execution. Modern chatbots are often built upon LLMs.
  • Chain-of-Thought (CoT): A prompt engineering technique in which the prompt explicitly instructs the model, or demonstrates through examples, to output its intermediate reasoning steps, logical chain, or thought process before giving the final answer. This can significantly improve the accuracy and reliability of large language models (LLMs) on tasks requiring multi-step reasoning, mathematical calculation, or complex logical analysis. (See the code sketch following this glossary)
  • Classification: A core task in Supervised Learning. The goal is to assign an input data instance (e.g., a document, email, image) to one of several predefined, discrete categories. E.g., classifying contracts as lease, service, or confidentiality agreements. (See the code sketch following this glossary)
  • CLIP (Contrastive Language-Image Pre-training): A powerful multimodal model developed by OpenAI. Trained on massive image-text pairs using contrastive learning, it learns a shared embedding space that captures deep semantic relationships between images and text. Foundational for many advanced text-to-image (e.g., DALL-E 2) and visual question answering models.
  • Cloud Computing: A model for delivering scalable computing resources (servers, storage, databases, networking, software, AI services) over the internet on demand. Cloud platforms (AWS, Azure, GCP, Alibaba Cloud, Tencent Cloud) significantly lower the barrier for accessing the powerful compute needed to train and run large AI models, acting as crucial infrastructure for AI adoption.
  • Clustering: A core task in Unsupervised Learning. The goal is to automatically group samples in a dataset into several “clusters” based on their inherent similarity (often distance in feature space), without predefined labels, such that samples within a cluster are similar, and samples between clusters are dissimilar. E.g., clustering judgments based on reasoning style or topic.
  • Computer Vision (CV): A major field of AI dedicated to enabling computers to “see” and “understand” visual information from digital images or videos, performing tasks like recognition, detection, tracking, segmentation, and scene understanding, similar to human vision.
  • Convolutional Neural Network (CNN): A type of deep learning model particularly adept at processing data with a grid-like topology, most notably images. Uses Convolutional Layers to effectively extract spatial hierarchies of features (from edges/textures to parts to objects) and often Pooling Layers for dimensionality reduction and robustness. Foundational for modern CV tasks.
  • Context Window / Context Length: The maximum length of text (usually measured in Tokens) that a large language model (LLM) can effectively consider and utilize in a single interaction (processing input prompt and generating output). Information beyond the context window is ignored. A critical performance bottleneck for tasks involving long legal documents. Context windows are rapidly expanding (from thousands to millions of tokens).
  • Continual Learning / Lifelong Learning: An AI capability where a system can continuously learn from new data streams or adapt to environmental changes after deployment, without catastrophically forgetting previously learned knowledge. A key challenge towards more human-like, adaptive AI, still an active research area.
  • ControlNet: A powerful mechanism designed for Diffusion Models (like Stable Diffusion) that allows users to provide an additional “control image” (e.g., human pose skeleton, room depth map, object edge map, sketch, semantic segmentation map) during image generation, enabling highly precise control over the final image’s spatial layout, subject pose, object shape, or overall structure. Significantly enhances the controllability and utility of AI image generation.
  • Data Augmentation: A technique used in training machine learning models (esp. deep learning) to artificially increase the size and diversity of the training dataset by applying various minor, meaning-preserving transformations to existing data (e.g., rotating/cropping/color-jittering images; synonym replacement/back-translation for text). Helps improve model generalization, enhance robustness, and mitigate overfitting due to insufficient data.
  • Data Annotation / Labeling: The process in Supervised Learning of adding informative labels or metadata to raw data (images, text, audio), typically representing the “correct answers” or “target outputs” the model needs to learn to predict. E.g., drawing bounding boxes and assigning class labels in object detection; tagging customer reviews as “positive” or “negative.” Crucial prerequisite for supervised training; often time-consuming and expensive.
  • Data Mining: The process of automatically or semi-automatically discovering useful, previously unknown, non-trivial patterns, associations, trends, or anomalies from large datasets. Often uses techniques from statistics and machine learning.
  • Data Masking / De-identification / Anonymization: Technical processes applied to data containing personally identifiable information (PII) or other sensitive info to remove, replace, encrypt, or obscure it, reducing privacy risks while trying to preserve data utility. An important compliance measure when processing sensitive data in AI applications (esp. training or using third-party services).
  • Decision Tree: A basic, intuitive Supervised Learning model. Learns a series of “IF-THEN” rules based on features, partitioning data hierarchically from a root node down to leaf nodes, which provide the classification result or regression prediction. Highly interpretable but prone to overfitting. Often used as base learners in Ensemble Methods.
  • Deepfake: Highly realistic, difficult-to-detect fake or manipulated audio, video, or image content created using deep learning (esp. generative models like GANs, Diffusion Models). Examples include swapping faces in videos (face-swapping), synthesizing speech in someone’s voice (voice cloning). Poses severe threats to information authenticity, personal reputation, social trust, and even national security.
  • Deep Learning (DL): A core subfield of Machine Learning characterized by using Artificial Neural Networks (ANNs) with multiple processing layers (“deep”). Its key advantage is the ability to automatically learn hierarchical feature representations from raw data without extensive manual feature engineering. Achieved breakthrough success on high-dimensional, unstructured data (images, speech, text), driving the current AI wave.
  • Diffusion Models: A class of deep learning generative models that have achieved state-of-the-art results in high-quality generation tasks (esp. images, audio, video). Core idea involves two processes: a forward (diffusion) process that gradually adds noise to real data until it becomes pure noise, and a reverse (denoising) process where a neural network (often U-Net) learns to precisely reverse this, starting from noise and iteratively removing it to generate realistic samples. Core tech behind tools like Stable Diffusion, DALL-E 2/3, Midjourney, Imagen.
  • Dimensionality Reduction: A class of techniques in Machine Learning aimed at transforming high-dimensional data (many features) into a lower-dimensional representation while preserving as much important information or structure as possible. Helps combat the “curse of dimensionality,” reduce computational cost, remove redundant features, and facilitate visualization. Common methods include PCA, t-SNE, LDA.
  • Digital Divide: The gap and inequality between different social groups in their access to, use of, and ability to benefit from digital technologies (internet, computers, smartphones, AI applications). The spread of AI, if not inclusive, risks exacerbating the digital divide, further marginalizing vulnerable groups.
  • e-Discovery / Electronic Discovery: The process in legal proceedings or investigations involving the identification, collection, preservation, processing, review, analysis, and production of Electronically Stored Information (ESI) (emails, documents, databases, social media, mobile data, etc.). AI technologies (esp. TAR/Predictive Coding) play an increasingly vital role in handling the massive volumes of data involved.
  • Electronic Personality / Legal Personhood for AI: A cutting-edge, controversial legal and philosophical concept exploring whether and how highly autonomous and intelligent AI systems might be granted some form of independent legal status, enabling them to hold rights, bear obligations, and be held liable. Not currently accepted by mainstream legal systems globally. (See Section 8.3)
  • Embedding: In ML (esp. NLP, CV), the mathematical representation of high-dimensional, often discrete or unstructured data objects (like words, sentences, documents, images, users) as dense, lower-dimensional vectors in a continuous vector space. Embeddings are designed to capture latent semantic relationships, similarities, or other important features. E.g., semantically similar words have closer vectors. A foundational input representation for many modern AI models. (See the code sketch following this glossary)
  • Ensemble Methods: A powerful ML paradigm that combines the predictions of multiple base learners (often of the same or different types) to achieve better, more robust performance than any single base learner. Common strategies include Bagging (Random Forest), Boosting (AdaBoost, GBDT, XGBoost, LightGBM), and Stacking.
  • Ethical Impact Assessment (EIA): A systematic, forward-looking process to identify, analyze, and evaluate the potential ethical risks and societal impacts (e.g., on fairness, privacy, autonomy, human rights) of a new technology, policy, or project (esp. AI applications), and to develop corresponding mitigation measures or governance strategies. Considered an important responsible-governance practice for high-risk AI deployments.
  • Evaluation Metrics: Standards used to quantify and measure the performance of ML models on specific tasks. Choosing appropriate metrics is crucial for understanding model capabilities, comparing models, and determining fitness for purpose. Examples include, for classification: Accuracy, Precision, Recall, F1-Score, AUC, Confusion Matrix; for regression: MSE, MAE, R-squared; for generation: BLEU (translation), ROUGE (summarization), FID (images).
  • Explainability / Interpretability (XAI): The degree to which a human can understand the reasons behind an AI model’s specific prediction or decision. Crucial for building trust, debugging models, ensuring fairness, enabling accountability, and meeting regulatory requirements. Providing meaningful explanations for complex “black box” models remains a major technical challenge. (See Section 6.4)
  • Expert System: An early form of AI (symbolic AI). Attempted to capture the knowledge, experience, and reasoning rules of human experts in a specific domain (medicine, chemistry) into a large “IF-THEN” rule base with an Inference Engine to mimic expert problem-solving. Had success in narrow, rule-based domains but faced limitations (knowledge acquisition bottleneck, handling uncertainty/common sense, lack of learning), largely superseded by ML approaches (though rule-based systems still have uses).
  • Fairness: A highly complex and central ethical and technical concept in AI. Aims to ensure AI systems’ processes and outcomes do not produce systemic, unjust discrimination or favoritism against individuals based on protected characteristics (race, gender, age, etc.). However, “fairness” has no single, universally agreed-upon mathematical definition. Multiple, often conflicting, fairness metrics exist (group vs. individual fairness; equal opportunity vs. equal treatment vs. demographic parity). Choosing which fairness notion to prioritize involves value judgments and trade-offs. (See Section 6.4)
  • Feature: In ML, a measurable, typically numerical variable extracted from raw input data used to describe some property or characteristic of that data point. Models learn relationships between features and target outputs. E.g., features for spam detection might include sender domain, presence of certain words, number of links.
  • Feature Engineering: A critical and often time-consuming step in traditional ML pipelines. Refers to the process where human experts (using domain knowledge and data analysis) manually design, extract, select, and transform raw data into features deemed most effective and predictive for the target task, before feeding them to the learning algorithm. Deep learning’s key advantage is largely automating this process.
  • Federated Learning: An emerging, privacy-preserving paradigm for distributed machine learning. Allows multiple parties (e.g., phones, hospitals, banks) to collaboratively train a global ML model without sharing their raw local data. Basic process: central server distributes initial model; parties train locally on their data, compute parameter updates; parties send only encrypted/aggregated updates (not raw data) back; server aggregates updates to improve global model; repeat. Promising for privacy-sensitive collaborative modeling.
  • Few-Shot Learning / Zero-Shot Learning: The ability of an ML model (esp. deep learning) to generalize and perform effectively on a specific task after being exposed to only a very small number (Few-Shot, e.g., 1-5 per class) or even zero (Zero-Shot) labeled training examples for that task. Relies heavily on the model leveraging broad general knowledge and pattern recognition capabilities learned during large-scale pre-training. Modern LLMs often exhibit strong zero-shot and few-shot learning abilities.
  • Fine-tuning: A common and effective technique for training large AI models (esp. Foundation Models). Involves taking a large model already pre-trained on massive general data (possessing broad base capabilities) and further training it specifically on a smaller, task-relevant or domain-specific dataset. Fine-tuning allows the general model to “adapt” better to the specific needs, knowledge, or instruction style of the downstream task or domain, achieving better performance. (See Section 2.4)
  • Foundation Model: Refers to large AI models trained on vast, diverse data (often unlabeled via self-supervised learning) that develop powerful general-purpose capabilities (understanding, generation, reasoning) and can be easily adapted (e.g., via fine-tuning) to a wide range of specific downstream tasks. Examples include LLMs (GPT-4, BERT, Llama), large vision models (CLIP, ViT), and multimodal foundation models. Their emergence is changing AI application development paradigms.
  • Function Calling / Tool Use: An advanced capability of LLMs allowing them not just to generate text, but also to understand user requests needing external tools or APIs (search engines, calculators, databases, calendars, e-commerce), generate structured requests (e.g., JSON) to call these functions, receive the function’s results, and integrate these results into generating a final, more accurate, actionable response. Greatly expands LLM applicability by connecting them to the external world.
  • Generative Adversarial Network (GAN): A powerful framework for generative modeling introduced by Ian Goodfellow et al. in 2014. Core idea: two competing neural networks—a Generator learns the distribution of real data and tries to create realistic fake samples, and a Discriminator learns to distinguish real samples from fake ones. They improve together in an adversarial “zero-sum game,” ideally resulting in a generator producing highly realistic data. GANs achieved great success in image generation (e.g., StyleGAN) but face training instability challenges.
  • Generative AI (GenAI): A broad class of AI systems capable of creating new, seemingly original content (rather than just analyzing or predicting). Generated content can be text, images, audio, video, code, music, 3D models, even molecular structures. The content typically resembles the patterns, styles, and structures learned from training data. Core technologies are usually based on large deep learning generative models like GANs, VAEs, Transformers (esp. LLMs), and Diffusion Models. (See Section 6.6)
  • General-Purpose AI Model (GPAI): AI models or systems designed with broad general capabilities applicable to multiple different purposes and downstream scenarios (contrasting with narrow AI for single tasks). LLMs are typical GPAIs. Due to their wide potential impact, regulatory frameworks like the EU AI Act are introducing specific governance requirements for them (esp. large GPAIs with “systemic risk”).
  • Gradient Descent: The core, most common optimization algorithm for training ML models (esp. neural networks). Basic idea: compute the gradient of the Loss Function with respect to all model parameters (the gradient points in the direction of steepest ascent); update the parameters iteratively by moving in the opposite direction of the gradient (steepest descent) with a step size called the Learning Rate, aiming to find parameters that minimize the loss. Practical variants like SGD, Adam, and RMSprop are typically used for efficiency and stability. (See the code sketch following this glossary)
  • Graphics Processing Unit (GPU): Specialized hardware originally designed for accelerating computer graphics rendering. Found to be extremely well-suited for the massive parallel computations (matrix/vector operations) involved in training and running deep learning models (esp. neural networks) due to their architecture with thousands of simple parallel cores. Wide availability of GPUs is considered a key hardware enabler of the modern deep learning revolution, allowing training of much larger, more complex models.
  • Hallucination: A common and extremely dangerous phenomenon in Generative AI (especially LLMs). Refers to the model confidently and fluently generating information that is factually incorrect, inaccurate, contradicts the input prompt or known facts, or is entirely fabricated, yet presented in a way that seems plausible, professional, even authoritative. E.g., making up non-existent legal cases, citing wrong statutes, distorting historical events. Caused by the LLM’s nature as a statistical pattern generator rather than a fact-checker. Identifying and mitigating hallucinations is a core challenge in serious applications like law. (See Sections 2.8, 6.1, 6.6)

  • Human-AI Collaboration / Human-in-the-Loop / Human-on-the-Loop: Various models of humans and AI systems working together.

    • Human-in-the-Loop (HITL): Humans play an essential, active role at critical decision points or stages in the AI process (e.g., reviewing AI outputs, correcting errors, providing feedback to improve the model).
    • Human-on-the-Loop (HOTL): Humans primarily act as supervisors, monitoring the AI system’s operation and intervening only when necessary (e.g., system encounters uncertainty, flags an issue, output confidence is low) or making the final decision.
    • Human-AI Collaboration: Broader concept emphasizing leveraging the respective strengths of humans (complex judgment, creativity, ethics) and AI (information processing, pattern recognition, automation) to achieve synergistic outcomes (“1+1>2”). In law, where final human judgment and responsibility are emphasized, collaboration is the fundamental model for responsible AI use.
  • Hyperparameter: Parameters of a machine learning model that are set by the human developer or user before the training process begins (as opposed to Model Parameters learned from data during training). Hyperparameter choices directly influence the training process and final model performance. Examples include: learning rate, number of layers/neurons in a neural network, size/number of convolutional filters, regularization strength (L1/L2 coefficients), choice of optimizer (Adam, SGD), number of training epochs, batch size, etc. Finding optimal hyperparameter settings (Hyperparameter Tuning) often requires extensive experimentation and experience.

  • Information Extraction (IE): An important task in NLP. Aims to automatically identify, extract, and structure specific types of information from unstructured (free text) or semi-structured (webpages) text, such as entities (people, places, organizations), relationships between entities, or specific types of events and their participants. IE is a key prerequisite for many downstream NLP applications (knowledge graph construction, Q&A, relation extraction) and has wide use in legal document processing (extracting key terms from contracts, case elements from judgments).
  • Instruction Fine-tuning: A key fine-tuning technique for improving the capabilities and behavior of LLMs. Core idea: collect large, diverse datasets of “instruction-output” pairs (e.g., <“Translate this to French”, “French translation”>, <“Summarize main points”, “Summary”>, <“Write a poem about autumn”, “Poem”>). Use these examples for further supervised training of a pre-trained LLM. This helps the model better understand the intent behind various natural language instructions and learn to generate outputs that are more aligned with user expectations, more helpful, and safer. InstructGPT (basis for early ChatGPT) and many modern conversational LLMs heavily utilize instruction fine-tuning.
  • Intellectual Property (IP): A category of legal rights granting creators or owners exclusive rights over their creations of the mind, such as inventions, literary and artistic works, designs, symbols, names, and images used in commerce. Main types include Patents, Copyrights, Trademarks, and Trade Secrets. AI development poses profound challenges to traditional IP law, especially regarding training data use, ownership of AI-generated content, and patentability of AI-related inventions. (See Section 7.3)
  • Interpretability: Same as “Explainability (XAI)”. The degree to which a human can understand the reasons behind an AI model’s decision.
  • Large Language Model (LLM): Refers to deep learning models, typically based on the Transformer architecture, trained via large-scale self-supervised pre-training on extremely massive datasets (billions or trillions of tokens) of text and code, and possessing a huge number of learnable parameters (billions to trillions). LLMs exhibit remarkable capabilities in natural language understanding, generation, reasoning, translation, summarization, Q&A, and even coding, serving as the core engine driving the current generative AI wave. Examples: OpenAI’s GPT series, Google’s Gemini/PaLM, Anthropic’s Claude, Meta’s Llama, etc. (See Sections 2.4, 3.1)
  • Latent Space / Latent Representation: In certain ML models (esp. generative models like VAEs, some GANs, Latent Diffusion Models), refers to a lower-dimensional, abstract vector space. The model learns to Encode high-dimensional input data (images, text) into a point (Latent Vector) in this space, capturing the data’s core underlying features or “semantics.” Conversely, the model can sample a point from the latent space and use a Decoder to map it back to the original data space, generating new samples. Operations in latent space (interpolation, addition) often allow meaningful control over generated content.
  • Legal Tech / Legal Technology: An umbrella term for the industry, practice, and tools utilizing technology (including AI, big data, cloud, blockchain, etc.) to provide legal services, improve legal workflows, enhance judicial system efficiency, or increase access to legal information. A product of deep fusion between technology and law, increasingly reshaping the legal industry.
  • LoRA (Low-Rank Adaptation): A Parameter-Efficient Fine-Tuning (PEFT) technique, especially useful for fine-tuning large pre-trained models (LLMs, diffusion models for images). Core idea: freeze most of the original model’s massive weights; insert two small, low-rank trainable matrices (A, B) alongside certain key layers (e.g., attention, feed-forward); only train these small matrices. During inference, their product (A*B, still low-rank) is added to the original weights. Achieves effective model adaptation with minimal parameter increase. Widely used in open-source communities like Stable Diffusion for personalization (learning specific styles, characters) due to fast training, low memory use, and small adapter file size. (See the code sketch following this glossary)
  • Loss Function / Cost Function / Objective Function: In ML model training (esp. supervised), a mathematical function that quantifies the discrepancy or error between the model’s predictions and the corresponding true labels (Ground Truth). The core goal of training is to adjust model parameters to Minimize the value of this loss function. Specific form depends on task type (e.g., Mean Squared Error (MSE) for regression, Cross-Entropy Loss for classification). (See the code sketch following this glossary)
  • Machine Learning (ML): A core subfield and method of Artificial Intelligence. Focuses on algorithms that allow computer systems to automatically “learn” patterns or regularities from data and subsequently improve their performance on specific tasks based on that learning, without being explicitly programmed for every rule. Foundation of modern AI. (See Sections 1.3, 2.2)
  • Mel-Frequency Cepstral Coefficients (MFCCs): A type of acoustic feature once most commonly used in Speech Recognition (STT). Combines the Mel Scale (mimicking human non-linear frequency perception) and Cepstrum Analysis (decoupling source/filter) to effectively represent the spectral envelope of speech, capturing key aspects of timbre.
  • Model: In ML context, a mathematical representation or structure learned by a learning algorithm from training data. Encapsulates patterns, relationships, or input-output mappings discovered from data, used to make predictions, classifications, generate content, or perform other tasks on new, unseen data. Can take various forms (linear regression coefficients, decision tree structure, SVM hyperplane, deep neural network with numerous weights/biases).
  • Model Risk Management (MRM): Primarily in highly regulated industries like finance, a systematic set of policies, procedures, and practices to identify, assess, monitor, and control the various risks (financial loss, compliance risk, reputational risk, operational risk) arising from errors, flaws, misuse, or failure of models (including traditional statistical models and modern AI/ML models). Typically covers the entire model lifecycle from development/validation to deployment/use and monitoring/retirement. (See Section 8.5, Fintech part)
  • Multimodal AI: AI systems capable of processing, understanding, relating, and generating information from multiple different modalities simultaneously. Modalities refer to different forms of information, such as text, images, audio, video, tabular data, sensor signals. Aims to mimic human ability to integrate multiple senses to understand the world. A hot research area, examples include image-text understanding (CLIP, GPT-4V), Visual Question Answering (VQA), text-to-image/video generation. (See Section 2.7)
  • Named Entity Recognition (NER): A fundamental task in NLP. Aims to automatically identify and classify predefined categories of named entities within text, such as persons (PER), organizations (ORG), locations (LOC), dates/times (TIME), proper nouns, etc. An important preprocessing step for many downstream NLP applications (information extraction, knowledge graph construction, Q&A).
  • Natural Language Processing (NLP): An important interdisciplinary field of AI, computer science, and linguistics focused on enabling computers to understand, interpret, process, manipulate, and generate human natural language (like English, Chinese). Key for enabling human-computer interaction via language and extracting knowledge from vast text data. The most foundational technology area for legal AI applications. (See Section 1.3)
  • Neural Network: See “Artificial Neural Network (ANN)”.
  • Neural Vocoder: In modern Text-to-Speech (TTS), the component responsible for synthesizing the final, audible, high-quality audio waveform from intermediate acoustic feature representations (e.g., Mel-spectrograms) generated by the acoustic model. Neural vocoders (like WaveNet, WaveGlow, HiFi-GAN) were a key breakthrough significantly improving the naturalness and fidelity of synthetic speech. (See Section 2.6)
  • N-gram Model: A simple yet common statistical language model (SLM). Based on a simplified Markov assumption: the probability of a word depends only on the preceding N-1 words. Estimates sequence probabilities by counting frequencies of N-grams (sequences of N words) in large text corpora. Common choices are N=2 (bigram) and N=3 (trigram). Simple and efficient, but struggles with long-range dependencies. Widely used in NLP before deep learning models (RNNs, Transformers). (See the code sketch following this glossary)
  • Optical Character Recognition (OCR): Technology using computer software to recognize printed or (sometimes) handwritten characters within scanned document images or pictures, converting the text into machine-readable, editable, searchable electronic text format. A crucial foundational step for digitizing paper legal documents and enabling subsequent AI text analysis. Modern OCR often combines CV (text location, preprocessing) and NLP (character recognition, contextual correction).
  • Overfitting: A common problem in ML model training. Occurs when a model performs extremely well on the training data (low loss, high accuracy) but significantly worse on new, unseen data (test set, real-world application). Usually happens when the model is too complex relative to the data/task, causing it to “memorize” noise, random fluctuations, or unrepresentative details in the training data instead of learning generalizable underlying patterns. Preventing overfitting (using techniques like Regularization, Early Stopping, Data Augmentation, Dropout) is a core challenge in ML.
  • Parameters: In ML models (esp. neural networks), the variables whose values are learned and optimized from data during the training process. E.g., connection Weights and node Biases in neural networks. The model’s capability and behavior are embodied in the specific values of these parameters. Large models can have billions or trillions of parameters.
  • Personal Information Protection Impact Assessment (PIA / Data Protection Impact Assessment, DPIA): A structured, proactive risk assessment process designed to identify, analyze, and evaluate the potential risks to data subjects’ rights and freedoms arising from a planned personal information processing activity (especially those involving new technologies, sensitive data, automated decision-making, or otherwise likely high risk), and to develop effective measures to mitigate those risks. Conducting a PIA/DPIA is a mandatory legal requirement under many data protection laws (like GDPR Article 35, PIPL Article 55) for certain types of high-risk processing.
  • Privacy-Enhancing Technologies (PETs): A class of technologies and methods designed to protect personal data privacy and confidentiality while enabling data processing or analysis. They aim to minimize exposure of raw sensitive information. Examples include Differential Privacy, Homomorphic Encryption, Secure Multi-Party Computation (SMPC), Zero-Knowledge Proofs, and Federated Learning. Using PETs in AI applications (esp. with sensitive data) helps meet data protection requirements.
  • Prompt: The input text provided by a user to a generative AI model (especially LLM), containing instructions, questions, text to be completed, or contextual information and examples to guide the generation process. The prompt is the trigger and basis for the AI’s output. Prompt quality heavily influences output quality. (See Section 4.1)
  • Prompt Engineering: An emerging, highly practical discipline and skill focused on designing, constructing, testing, optimizing, and iterating on prompts to maximally guide LLMs or other generative AI models to understand user intent and generate high-quality, relevant, accurate, safe, and requirement-compliant outputs. Key to effectively utilizing and controlling modern generative AI. (See Part 4)
  • Prompt Injection: A novel security attack targeting LLM-based applications. Attackers embed hidden, malicious instructions within seemingly harmless user input (prompts) to try to override, bypass, or manipulate the developer’s original system prompt or safety guardrails. Aims to trick the LLM into performing unintended or harmful actions, like leaking sensitive internal knowledge, generating prohibited content, or executing malicious commands via connected external tools (Function Calling). A major security risk for LLM applications. (See Section 6.1, Adversarial Attacks part)
  • Random Forest: A powerful and widely used Ensemble Learning algorithm of the Bagging type. Constructs many Decision Trees, introducing randomness during each tree’s training (e.g., random sample subsets, random feature subsets). Aggregates predictions from all trees (e.g., voting for classification, averaging for regression) for the final output. Generally exhibits high accuracy, good robustness, resistance to overfitting, and can estimate feature importance.
  • Retrieval-Augmented Generation (RAG): A key technical framework aimed at improving LLM answer accuracy, reducing hallucinations, and enabling them to utilize specific external knowledge. Core idea is “retrieve then generate”: When an LLM receives a question, it first uses a Retriever to fetch the most relevant text snippets (Context) from an external, trusted knowledge base (internal documents, regulations, case files). Then, this retrieved context is “augmented” into the prompt along with the original question. Finally, the LLM is instructed to generate the answer primarily based on this provided context. One of the most important and effective paradigms for applying LLMs in enterprise/professional domains currently. (See Section 4.6 and the code sketch following this glossary)
  • Regression: A core task in Supervised Learning. The goal is to predict a continuous numerical output value. E.g., predicting house prices, future stock values, time required for a legal project.
  • Regularization: A class of techniques in ML used to prevent model Overfitting. Core idea: add a Penalty Term to the model’s Loss Function, related to model complexity (e.g., size of parameters). Minimizing loss + penalty encourages simpler models, improving generalization. Common methods: L1 (Lasso), L2 (Ridge) regularization.
  • Reinforcement Learning (RL): One of the three main ML paradigms. Studies how an Agent learns an optimal Policy (behavior strategy) by interacting with an Environment, taking Actions, receiving feedback (Rewards or Punishments), and learning through Trial-and-Error to maximize cumulative long-term reward. Suited for Sequential Decision Making tasks with potentially delayed feedback. (See Section 2.2)
  • Reinforcement Learning from Human Feedback (RLHF): A key AI Alignment technique widely used to optimize LLM behavior to be more aligned with human preferences (helpful, honest, harmless). Typical process: (1) Collect human preference data by having humans rank multiple AI responses to prompts. (2) Train a separate Reward Model (RM) to predict human preference scores. (3) Use Reinforcement Learning (e.g., PPO algorithm) to fine-tune the LLM, using the RM’s score as the reward signal, guiding the LLM to generate higher-scoring (human-preferred) responses. Core tech behind models like ChatGPT. (See Section 2.4, Fine-tuning part)
  • Robotic Process Automation (RPA): Use of software robots (Bots) to mimic human user actions on computer interfaces (clicking, typing, copy-pasting, logging in, opening apps) to automate rule-based, repetitive business processes spanning multiple systems. RPA itself isn’t necessarily complex AI, but often integrates with AI technologies (OCR, NLP, ML) to handle more complex tasks (e.g., extracting data from scans into ERP).
  • Robustness: The ability of an AI model to maintain its performance (e.g., prediction accuracy, decision stability) when faced with minor perturbations or noise in input data, or new data slightly different from its training distribution (Out-of-Distribution Data). Many deep learning models exhibit relative fragility in robustness, susceptible to Adversarial Attacks. (See Section 2.8)
  • Semantic Search: Information retrieval technique based on understanding the meaning (semantics) of queries and documents, rather than just matching keywords. Better handles synonyms, related concepts, complex natural language queries, providing more relevant results. Often relies on vector embeddings and similarity calculations.
  • Self-Attention Mechanism: The core innovation and foundation of the Transformer architecture. A special type of attention mechanism allowing the model, when processing each element (e.g., word) in an input sequence (e.g., sentence), to simultaneously compute and attend to the relevance or importance of all other elements (including itself) in the sequence, dynamically generating a context-rich representation for each element. Effectively captures long-range dependencies and is highly parallelizable, key to LLM success. (See Section 2.4 and the code sketch following this glossary)
  • Self-supervised Learning (SSL): An ML paradigm (sometimes seen between supervised/unsupervised) where models learn from large amounts of unlabeled data. Instead of human-provided labels, “supervision” comes from “pretext tasks” designed so the model can automatically generate pseudo-labels from the data itself. E.g., in NLP pre-training for LLMs: Masked Language Modeling (MLM) (predicting masked words from context) or Next Token Prediction (predicting the next word given prefix). Allows learning rich representations (language structure, world knowledge) without costly human labeling. Key for training large Foundation Models.
  • Sequence-to-Sequence (Seq2Seq): A deep learning architecture designed for tasks where both input and output are variable-length sequences. Typically consists of an Encoder (reads input sequence, compresses to context vector) and a Decoder (generates output sequence step-by-step from context vector). Was mainstream (using RNNs/LSTMs) for tasks like machine translation, summarization, dialogue, largely superseded by Transformer but the basic encode-decode idea remains influential.
  • Speech Synthesis / Text-to-Speech (TTS): AI technology converting input Text into natural-sounding, expressive human Speech. Modern TTS uses deep learning for high realism. (See Section 2.6)
  • Speech Recognition / Speech-to-Text (STT): AI technology converting human Speech signals into corresponding written Text. Key for voice interaction and audio data processing. (See Section 2.6)
  • Supervised Learning: One of the three main ML paradigms. Algorithm learns from a training dataset containing “labeled” examples with “correct answers.” Goal is to learn a mapping function (model) to predict outputs for new inputs. Main tasks: Classification and Regression. (See Section 2.2)
  • Support Vector Machine (SVM): A powerful, classic Supervised Learning algorithm for classification (also regression). Core idea: find the optimal hyperplane in feature space that best separates data points of different classes, typically by maximizing the margin (distance) between the hyperplane and the closest points of each class (the Support Vectors). Performs well in high dimensions, handles non-linear data via the Kernel Trick.
  • Technology Assisted Review (TAR) / Predictive Coding: A technique widely used in e-Discovery, based on machine learning (usually supervised). Core process: human lawyers review and code a small seed set of documents for relevance; an AI model is trained on these codes; the model then predicts relevance scores for the remaining large volume of unreviewed documents; documents are ranked by predicted relevance, prioritizing the most likely relevant ones for human review. TAR can significantly reduce manual review volume, improving efficiency and cost-effectiveness. (See Section 5.3)
  • Temperature: A sampling parameter used in generative AI models (esp. LLMs) to control the randomness or creativity of the output text. Lower temperature values (near 0) make the model favor the highest probability next token, resulting in more deterministic, focused, conservative text. Higher temperature values (>1) increase the likelihood of sampling lower probability tokens, making the output more random, diverse, creative (but potentially less coherent or relevant). Tuning temperature balances predictability and creativity. (See the code sketch following this glossary)
  • Tensor Processing Unit (TPU): Application-Specific Integrated Circuit (ASIC) hardware designed by Google specifically to accelerate machine learning computations (esp. large-scale tensor/matrix operations in deep learning). Similar to GPUs, TPUs use massive parallelism but are hardware-optimized for neural network ops (esp. matrix multiplication, convolution), often offering higher performance per watt for certain AI tasks. Primarily available via Google Cloud.
  • Token: In NLP, typically the smallest meaningful unit into which text is divided for processing and generation by models. Can be a word, a subword (e.g., “lawyer” -> “law”, “##yer”), or sometimes a character. The process is Tokenization. LLM context window lengths and API costs are usually measured in tokens. Understanding tokens is crucial for using LLMs. (See the code sketch following this glossary)
  • Training Data: The dataset used to train a machine learning model. The model learns patterns and relationships from this data to perform its task. The quantity, quality, representativeness, and (for supervised learning) label accuracy of training data critically determine the final model’s performance and reliability.
  • Transformer: A revolutionary deep learning model architecture introduced by Google researchers in 2017 (“Attention Is All You Need”). It completely abandons recurrence (RNNs) and convolution (CNNs) traditionally used for sequences, relying entirely on the powerful Self-Attention Mechanism. Transformers effectively capture long-range dependencies in sequence data and are highly parallelizable, enabling efficient training on massive datasets. They transformed the field of NLP and are the foundation for nearly all modern LLMs and many advanced models in other modalities (vision, speech). (See Section 2.4)
  • Transfer Learning: An important ML strategy. Core idea: leverage knowledge/capabilities learned by a model pre-trained on a source task (usually with abundant data) and apply them to a different but related target task (which might have limited data). Typically involves reusing part or all of the pre-trained model’s structure/parameters and fine-tuning on target task data. Transfer learning significantly reduces data/time needs for the target task and improves performance. Core to the Foundation Model application paradigm.
  • Transparency: The degree to which an AI system’s internal mechanisms, data usage, decision processes, and related performance and risk information are visible, understandable, and accessible. Essential for building trust, enabling accountability, and supporting effective regulation, but challenging to achieve for complex AI models. (See Section 6.4)
  • Turing Test: A famous thought experiment proposed by Alan Turing in 1950 to provide an operational criterion for whether a machine can “think.” Basic form: a human judge communicates via text with both a human and a machine. If the judge cannot reliably distinguish which is which after sufficient time, the machine is said to have passed the Turing Test, exhibiting intelligent behavior indistinguishable from a human. Influential for AI development, but debated as a true measure of intelligence (focuses on imitation, not understanding).
  • Unsupervised Learning: One of the three main ML paradigms. Algorithm learns from data without any “labels” or “correct answers.” Goal is to discover inherent structures, patterns, associations, or distributions within the data itself, e.g., grouping similar data points (Clustering), reducing data dimensionality (Dimensionality Reduction), or finding frequent co-occurring items (Association Rule Mining). (See Section 2.2)
  • User Interface (UI): The medium through which human users interact and communicate with a computer system or software. Can be graphical (GUI: windows, buttons), command-line (CLI), voice-based (VUI), or natural language-based (LUI: chat interface). Good UI design is crucial for AI tool usability.
  • User Experience (UX): A person’s overall perceptions, feelings, and responses resulting from using a product, system, or service (e.g., legal AI software). Encompasses not just interface aesthetics or functionality, but also task flow smoothness, ease of use, alignment with expectations, and overall satisfaction or frustration with the interaction. Considering UX is important in AI tool selection and design.
  • Validation Data / Validation Set: A separate dataset (distinct from training and test sets) used during ML model training. Not used to directly train model parameters (weights), but to evaluate model performance at different training stages for purposes like tuning Hyperparameters (learning rate, network size, regularization) and monitoring for Overfitting to decide when to apply Early Stopping. Crucial for building models that generalize well.
  • Vector Database / Vector Store: A type of database system specifically designed for efficiently storing, indexing, and querying high-dimensional vector data (like embeddings generated by AI). Core capability is performing fast Approximate Nearest Neighbor (ANN) search: quickly finding vectors most similar (e.g., closest distance) to a given query vector within a massive collection. Key infrastructure for large-scale semantic search, recommendation systems, and Retrieval-Augmented Generation (RAG). (See the code sketch following this glossary)
  • Vocoder: In Text-to-Speech (TTS), the component that synthesizes the final, audible, 1D audio waveform from intermediate acoustic feature representations (e.g., Mel-spectrograms) generated by the acoustic model. Vocoder quality directly impacts the naturalness and fidelity of the synthesized speech. Modern Neural Vocoders have greatly improved TTS quality. (See Section 2.6)
  • Word Embedding: An important Embedding technique in NLP. Maps each word in a vocabulary to a dense, low-dimensional, continuous vector space such that words with similar meanings or usage patterns have nearby vectors. Captures semantic relationships (unlike traditional One-hot Encoding) and serves as foundational input representation for many deep learning NLP models. Famous methods include Word2Vec, GloVe, FastText.
  • Explainable AI (XAI): A major research area and practical direction in AI focused on developing techniques and methods that enable humans (developers, users, regulators, affected individuals) to understand, interpret, and trust the predictions or decisions made by AI systems, especially complex “black box” models. Aims to answer “Why did the AI reach this result?” to enhance transparency, fairness, accountability, and human-AI collaboration. (See Section 6.4)
  • Zero-Shot Learning: See “Few-Shot Learning / Zero-Shot Learning”.
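
The short, self-contained sketches below illustrate some of the entries above in code. They are minimal, hedged examples under stated assumptions rather than production implementations: all data values, document snippets, and parameter choices are invented for demonstration, and Python with NumPy (plus scikit-learn and tiktoken where noted) is assumed to be available.

Chain-of-Thought (CoT): the sketch contrasts a direct prompt with a CoT-style prompt for the same question. No model is called; only the prompt text illustrates the technique, and the figures in the question are made up.

```python
# Chain-of-Thought prompting: the same question phrased two ways.
question = (
    "A contract imposes a penalty of 0.5% of the outstanding amount per week of delay. "
    "Delivery was 6 weeks late and the outstanding amount is USD 40,000. "
    "What is the total penalty?"
)

direct_prompt = question + "\nAnswer with the amount only."

cot_prompt = (
    question
    + "\nThink step by step: first state the weekly penalty amount, "
    "then multiply it by the number of weeks of delay, and only then give the final figure."
)

print(direct_prompt)
print("---")
print(cot_prompt)
```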
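Classification (with a Decision Tree): a toy supervised classifier over hand-made contract features, assuming scikit-learn is installed. The three binary features, the labels, and the tiny training set are invented purely for illustration.

```python
# Toy contract classification with a decision tree (scikit-learn assumed installed).
# Features per document: [mentions_rent, mentions_services, mentions_confidentiality]
from sklearn.tree import DecisionTreeClassifier

X_train = [[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1]]
y_train = ["lease", "lease", "service", "service", "nda", "nda"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)                       # learn IF-THEN splits from labeled examples

print(clf.predict([[1, 0, 0], [0, 0, 1]]))      # expected: something like ['lease' 'nda']
```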
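Embedding: cosine similarity between embedding vectors, computed with NumPy. The 4-dimensional vectors are invented; real embeddings from a language model typically have hundreds or thousands of dimensions.

```python
# Cosine similarity between (toy) embedding vectors.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

contract = np.array([0.9, 0.1, 0.3, 0.0])
agreement = np.array([0.8, 0.2, 0.4, 0.1])
banana = np.array([0.0, 0.9, 0.1, 0.7])

print(cosine_similarity(contract, agreement))  # high: related meanings
print(cosine_similarity(contract, banana))     # low: unrelated meanings
```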
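Gradient Descent: a bare-bones descent loop fitting a single weight by least squares, showing the compute-gradient / step-against-gradient cycle. The data points are fabricated to lie roughly on y = 2x.

```python
# Gradient descent on a one-parameter least-squares problem.
# loss(w) = mean((w * x - y)^2); its gradient is 2 * mean(x * (w * x - y)).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])    # roughly y = 2x

w = 0.0               # initial parameter
learning_rate = 0.01  # step size (a hyperparameter)

for step in range(200):
    grad = 2 * np.mean(x * (w * x - y))  # gradient of the loss w.r.t. w
    w -= learning_rate * grad            # move against the gradient

print(round(w, 3))    # close to 2.0
```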
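LoRA (Low-Rank Adaptation): the core arithmetic only. The layer width and rank are illustrative, and the usual scaling factor and the training loop itself are omitted.

```python
# LoRA reduced to its core arithmetic: W stays frozen, only A and B would be trained.
import numpy as np

d = 1024          # layer width (illustrative)
r = 8             # LoRA rank, much smaller than d

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))         # frozen pre-trained weights: d*d = 1,048,576 values
A = rng.standard_normal((d, r)) * 0.01  # small trainable factor
B = np.zeros((r, d))                    # small trainable factor, initialised to zero

W_effective = W + A @ B                 # weights actually used at inference time

trainable = A.size + B.size             # 2 * d * r = 16,384 values, roughly 1.6% of d*d
print(W_effective.shape, trainable)
```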
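Loss Function (and Regularization): computing two common losses, MSE for regression and cross-entropy for classification, then adding an L2 penalty of the kind described under Regularization. All predictions and weights are made up.

```python
# Two common loss functions, plus an L2 (ridge) regularisation penalty.
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.5, 2.0])
mse = np.mean((y_true - y_pred) ** 2)             # Mean Squared Error (regression)

p_true = np.array([1.0, 0.0, 0.0])                # one-hot "correct class"
p_pred = np.array([0.7, 0.2, 0.1])                # model's predicted probabilities
cross_entropy = -np.sum(p_true * np.log(p_pred))  # classification loss

weights = np.array([0.5, -1.2, 3.0])
l2_penalty = 0.01 * np.sum(weights ** 2)          # penalty term added to the loss

print(mse, cross_entropy, mse + l2_penalty)
```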
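N-gram Model: a bigram (N=2) model estimated by counting over a one-sentence toy corpus.

```python
# A tiny bigram language model: P(word | previous word) from raw counts.
from collections import Counter, defaultdict

corpus = "the court held that the contract was void because the contract was unsigned".split()

bigram_counts = defaultdict(Counter)
for prev, curr in zip(corpus, corpus[1:]):
    bigram_counts[prev][curr] += 1

def bigram_prob(prev, curr):
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][curr] / total if total else 0.0

print(bigram_prob("the", "contract"))  # 2 of the 3 occurrences of "the" are followed by "contract" -> about 0.67
print(bigram_prob("contract", "was"))  # 1.0: "contract" is always followed by "was" in this corpus
```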
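Retrieval-Augmented Generation (RAG): the retrieve-then-generate skeleton. A real system would use embedding-based retrieval and a live LLM; here the retriever is a naive word-overlap scorer, the three "clauses" are invented, and the final step only assembles the augmented prompt that would be sent to a model.

```python
# RAG skeleton: retrieve relevant context, then build an augmented prompt.
import re

knowledge_base = [
    "Clause 7.2: Either party may terminate with 30 days written notice.",
    "Clause 9.1: Liability is capped at the fees paid in the preceding 12 months.",
    "Clause 3.4: Payment is due within 45 days of invoice date.",
]

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=1):
    q = words(question)
    scored = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return scored[:k]

question = "How many days notice are required to terminate?"
context = retrieve(question, knowledge_base)

augmented_prompt = (
    "Answer using only the context below. If the answer is not there, say so.\n"
    "Context:\n" + "\n".join(context) + "\n"
    "Question: " + question
)
print(augmented_prompt)   # this is what would be sent to the LLM
```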
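Self-Attention Mechanism: scaled dot-product attention for a single head, with random toy matrices standing in for learned projections.

```python
# Scaled dot-product self-attention (the core computation inside a Transformer).
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d_model = 4, 8                        # 4 tokens, 8-dimensional representations
rng = np.random.default_rng(0)
X = rng.standard_normal((seq_len, d_model))    # token representations

W_q = rng.standard_normal((d_model, d_model))  # learned projections (random here)
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_model)            # how much each token attends to every other token
weights = softmax(scores)                      # each row sums to 1
output = weights @ V                           # context-aware representation of each token

print(weights.round(2))
print(output.shape)                            # (4, 8)
```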
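Temperature: how dividing the model's scores (logits) by the temperature before the softmax sharpens or flattens the distribution over candidate next tokens. The three logits are invented.

```python
# Effect of the temperature parameter on next-token sampling probabilities.
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.array(logits) / temperature
    scaled = scaled - scaled.max()
    e = np.exp(scaled)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]   # model scores for three candidate next tokens

print(softmax_with_temperature(logits, 0.2).round(3))  # near-deterministic: roughly [0.99, 0.01, 0.00]
print(softmax_with_temperature(logits, 1.0).round(3))  # default scaling
print(softmax_with_temperature(logits, 2.0).round(3))  # flatter: more diverse sampling
```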
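Token: counting tokens with the tiktoken package (assumed installed). Different models use different tokenizers, so counts and sub-word splits vary between providers.

```python
# Tokenization: words vs. the sub-word tokens a model actually processes.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "The indemnification clause survives termination."
token_ids = enc.encode(text)

print(len(text.split()), "words ->", len(token_ids), "tokens")
print([enc.decode([t]) for t in token_ids])  # the sub-word pieces behind the token ids
```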
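Vector Database / Semantic Search: the underlying operation, nearest-neighbour search over embedding vectors, done here by brute force with random stand-in vectors. A vector database accelerates exactly this lookup with approximate-nearest-neighbour (ANN) indexes.

```python
# Brute-force nearest-neighbour search over (random stand-in) embedding vectors.
import numpy as np

rng = np.random.default_rng(1)
corpus_vectors = rng.standard_normal((1000, 64))           # 1,000 stored document embeddings
corpus_vectors /= np.linalg.norm(corpus_vectors, axis=1, keepdims=True)

query = rng.standard_normal(64)
query /= np.linalg.norm(query)

similarities = corpus_vectors @ query                      # cosine similarity (unit vectors)
top_k = np.argsort(similarities)[::-1][:5]                 # indices of the 5 most similar documents

print(top_k, similarities[top_k].round(3))
```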

(This glossary aims to provide a quick reference for core concepts. Readers are encouraged to consult the main text sections for deeper understanding. As technology and applications evolve, new terms will emerge, and this glossary will be updated accordingly.)