2.1 AI Principles and Development History
The Origins and Evolution of AI Thinking: From Philosophical Speculation to Computational Realization
Artificial Intelligence (AI), the technological concept currently sparking immense imagination and profound change, did not appear magically out of thin air. Its birth and development represent a grand narrative spanning centuries, interwoven with philosophical exploration, mathematical foundations, engineering practices, and commercial ups and downs. For legal professionals in this intelligent era, tracing AI’s intellectual origins and evolutionary path is not merely a matter of satisfying curiosity. It helps dispel the mystical aura surrounding the technology and fosters a deeper understanding of modern AI’s capability boundaries, internal logic, potential risks, and possible future trajectories.
This section will take readers on a journey through time, recounting AI’s long march from ancient dreams to computational reality.
I. Sparks of Thought: Philosophical and Logical Foundations (Before Mid-20th Century)
Long before the advent of computers, contemplation on the nature of “intelligence” resonated within the halls of philosophy.
- Ancient Philosophical Inquiry: As early as Ancient Greece, pioneers like Aristotle studied formal logic and rules of inference, attempting to symbolize and regularize rigorous human thought processes. This was the earliest attempt to deconstruct the seemingly mysterious process of “thinking.” In the early modern period, René Descartes’ “Cogito, ergo sum” (“I think, therefore I am”) sparked profound discussions on the mind-body relationship, while Gottfried Wilhelm Leibniz dreamed of creating a “universal language” and a “calculus ratiocinator,” hoping to resolve all disputes through computation. These philosophical speculations about whether thought could be formalized and whether machines could possess intelligence planted the intellectual seeds for AI’s eventual emergence.
- Forging the Language of Mathematics: For machines to simulate thought, a precise language describing thought processes was needed first. From the 19th to the early 20th century, George Boole’s Boolean algebra, the modern predicate logic developed by Gottlob Frege, Bertrand Russell, and others, and the formalist program championed by David Hilbert collectively laid the mathematical foundations of formal and mathematical logic. This work allowed complex reasoning processes to be expressed using rigorous mathematical symbols and explicit rules of inference, paving the way for later simulation of logical reasoning via computer programs.
- Birth of Computation Theory: Alan Turing, the father of computer science and artificial intelligence, proposed the abstract model of the “Turing Machine” in 1936. This not only theoretically defined what constitutes a “computable” problem but also provided the core blueprint for the design philosophy of modern general-purpose computers. Even more significantly, in his landmark 1950 paper “Computing Machinery and Intelligence,” Turing boldly proposed the famous “Turing Test”: if a machine’s responses in written conversation cannot be reliably distinguished from a human’s, the machine can reasonably be said to exhibit intelligent behavior.
Turing’s work, along with contributions from other pioneers like John von Neumann on computer architecture, formed the theoretical and engineering bedrock upon which AI research could begin.
II. The Birth of AI and the Early “Golden Age” (1950s - 1970s)
With the advent of modern computers, the conditions for putting philosophical speculation and logical theory into practice matured, and AI officially emerged as an independent academic field.
- Dartmouth Workshop: AI’s “Inauguration” (1956): A historic workshop held at Dartmouth College in the summer of 1956 is widely recognized as the birth of AI as an independent discipline. Visionary scientists including John McCarthy (who coined the term “Artificial Intelligence” in the proposal for the workshop), Marvin Minsky, Claude Shannon (father of information theory), and Herbert Simon (later a Nobel laureate in Economics) gathered there. Filled with immense enthusiasm and optimism, they discussed the feasibility of using machines to simulate human learning, thinking, decision-making, and other intelligent behaviors, establishing the grand goal of AI research: making machines think like humans. The workshop not only gave the field its name but also outlined its early research landscape, ushering in AI’s first “Golden Age.”
- Symbolicism: Intelligence as Symbol Manipulation (GOFAI): The dominant paradigm in early AI research was Symbolicism, also known as “Good Old-Fashioned AI” (GOFAI). Its core philosophical belief was that intelligent human behavior essentially involves representing information about the external world symbolically and manipulating those symbols according to logical rules (reasoning). Therefore, intelligence could be replicated on a computer if appropriate symbolic representations and powerful logical reasoning engines could be developed. This idea resonated strongly with the intuitive human experience of thinking through language, mathematics, and other symbolic systems.
Milestones of Early Symbolicism
Guided by symbolic thinking, early AI researchers achieved a series of remarkable successes, significantly boosting confidence in AI’s potential:
- Logic Theorist (1956): Developed by Allen Newell, Herbert Simon, and Cliff Shaw, widely considered the first true AI program. It successfully proved 38 theorems from Chapter 2 of Russell and Whitehead’s Principia Mathematica, demonstrating machines’ capability for logical reasoning.
- General Problem Solver (GPS, 1959): Also developed by Newell and Simon, GPS attempted to mimic the general strategies humans use to solve various problems, particularly “means-ends analysis”—comparing the current state with the goal state and selecting operations that reduce the difference (a toy sketch of this idea appears after this list). Although its “generality” was limited, it represented an early exploration of general intelligence mechanisms.
- Early Natural Language Understanding Attempts: Terry Winograd’s SHRDLU system (1972) was representative. Operating in a virtual micro-world of blocks (“Blocks World”), it could understand user commands in natural language (e.g., “Put the red block on top of the green cube”), plan and execute corresponding actions, and answer questions about the world’s state. This demonstrated AI’s potential for language processing and planning/reasoning in constrained environments.
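To make “means-ends analysis” concrete, the following is a minimal, purely illustrative sketch in Python: the states, operators, and goal are hypothetical toy constructs, not a reconstruction of GPS itself.

```python
# Toy means-ends analysis: repeatedly pick an operator that reduces the
# difference between the current state and the goal state.
# States, operators, and the "difference" measure here are hypothetical.

GOAL = {"at_library", "has_book"}

OPERATORS = {
    "walk_to_library": (set(), {"at_library"}),      # (preconditions, effects)
    "borrow_book": ({"at_library"}, {"has_book"}),
}

def difference(state: set[str], goal: set[str]) -> int:
    """Number of goal conditions not yet satisfied."""
    return len(goal - state)

def means_ends(state: set[str], goal: set[str]) -> list[str]:
    plan = []
    while difference(state, goal) > 0:
        # Consider only operators whose preconditions hold and that shrink the gap.
        candidates = [
            (name, effects) for name, (pre, effects) in OPERATORS.items()
            if pre <= state and difference(state | effects, goal) < difference(state, goal)
        ]
        if not candidates:
            raise RuntimeError("no operator reduces the difference")
        name, effects = candidates[0]
        state = state | effects
        plan.append(name)
    return plan

print(means_ends(set(), GOAL))  # ['walk_to_library', 'borrow_book']
```

The loop embodies the core strategy: at each step, apply an operator whose preconditions are met and that shrinks the gap between the current state and the goal.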
Expert Systems: The Glory and Limitations of Rules
The most significant commercial application resulting from symbolic thinking was Expert Systems, which emerged in the 1970s and peaked in the 1980s.
- Core Idea: Expert systems attempted to encode the knowledge, experience, and reasoning processes of human experts in specific domains (like medical diagnosis, chemical analysis, equipment troubleshooting, and even legal consultation) into numerous “IF condition THEN conclusion/action” rules. The system comprised a large knowledge base (storing rules and facts) and an inference engine (performing logical deductions based on input information and the rules in the knowledge base to reach conclusions or recommendations); a minimal code sketch of this pattern follows this list.
- Early Exploration in Law: Expert systems seemed naturally suited to certain aspects of legal reasoning, particularly judgments based on explicit statutory provisions. Researchers developed early legal expert system prototypes, such as:
- Systems simulating tax experts determining an individual’s tax residency status.
- Systems assisting in drafting or reviewing simple contract clauses (e.g., leases).
- Systems providing advice based on specific regulations (e.g., welfare eligibility).

These systems demonstrated some value in providing consistent, transparent decision support for structured, rule-based legal problems. Their ability to clearly show their reasoning steps was particularly important for the law, which demands explainability.
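The following is a minimal Python sketch of the knowledge base plus inference engine pattern (forward chaining). It is illustrative only: the rules, facts, and the simplified tax-residency scenario are hypothetical and do not reflect any real legal test or historical system.

```python
# Minimal forward-chaining inference engine in the spirit of 1980s expert systems.
# The rules and facts are hypothetical simplifications, for illustration only.

# Each rule: IF all conditions are present in the fact base, THEN add the conclusion.
RULES = [
    ({"days_in_country>183", "has_permanent_home"}, "tax_resident"),
    ({"tax_resident", "has_foreign_income"}, "must_declare_foreign_income"),
    ({"days_in_country>183"}, "filing_obligation_possible"),
]

def infer(facts: set[str]) -> set[str]:
    """Repeatedly apply rules until no new facts can be derived (forward chaining)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
                # Recording each step lets the system explain its reasoning.
                print(f"Rule fired: {sorted(conditions)} -> {conclusion}")
    return derived

if __name__ == "__main__":
    known = {"days_in_country>183", "has_permanent_home", "has_foreign_income"}
    print("Conclusions:", sorted(infer(known)))
```

The printed “rule fired” trace mirrors the step-by-step justifications that made expert systems appealing in settings where explainability matters.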
III. Winter Arrives & New Paths Emerge: The Rise and Fall of Connectionism (Late 1970s - 1980s)
Despite the early successes and media hype, researchers had made many overly optimistic predictions (e.g., suggesting general AI could be achieved within decades). Reality soon proved that intelligence was far more complex than anticipated.
- Expectations Dashed & First AI Winter: By the mid-to-late 1970s, it became clear that early AI systems often only worked on very narrow, highly simplified “toy problems.” Confronting real-world complexity, challenges like combinatorial explosion (computation growing exponentially with problem size) and common sense knowledge representation (how to endow machines with self-evident human knowledge) proved insurmountable. High-profile projects (like early machine translation initiatives) fell far short of expectations. These difficulties led to drastic cuts in research funding, and public/industry interest cooled, marking the first “AI Winter.”
- Resurgence of Connectionism & Parallel Exploration: While Symbolicism faced bottlenecks, another long-standing but previously quieter technical approach—Connectionism—began regaining attention. Unlike Symbolicism’s top-down attempt to simulate high-level cognitive functions, Connectionism drew direct inspiration from simulating the structure of biological neural networks in the brain.
- Early Foundations: Warren McCulloch and Walter Pitts proposed the first formal model of a neuron (the MP model) in 1943, proving simple neural networks could perform logical operations. Frank Rosenblatt invented the Perceptron in 1958, a learnable single-layer neural network model capable of solving simple pattern classification problems, sparking the first wave of neural network research.
- Major Setback: In 1969, AI pioneers Marvin Minsky (himself leaning towards Symbolicism) and Seymour Papert published the book Perceptrons, which rigorously proved that single-layer perceptrons cannot solve linearly inseparable problems (like the classic XOR problem) and pessimistically suggested that multi-layer networks would also be difficult to train. This book dealt a heavy blow to neural network research at the time, plunging it into a slump for over a decade.
- Key to Revival: The Backpropagation Algorithm: The resurgence of neural network research owed much to the rediscovery and popularization of the Backpropagation (BP) algorithm. Although its core idea dates back earlier (e.g., Paul Werbos’s 1974 PhD thesis), it wasn’t until 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams published a paper in Nature systematically explaining how BP could effectively train multi-layer neural networks (also known as Multi-Layer Perceptrons, MLPs), that the flames of the Connectionist revival were truly ignited. Backpropagation solved the problem of efficiently adjusting connection weights throughout the network based on output error, making it feasible to train deeper, more complex neural networks and laying the crucial algorithmic foundation for the later deep learning revolution.
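The XOR limitation and the backpropagation remedy fit in a few lines of code. Below is a compact sketch (using NumPy) of a two-layer network learning XOR via backpropagation; the layer sizes, learning rate, and iteration count are arbitrary illustrative choices, not the settings of the original 1986 work.

```python
import numpy as np

# XOR: the classic task a single-layer perceptron cannot learn,
# but a two-layer network trained with backpropagation can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden layer (4 units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output layer

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 2.0  # learning rate (arbitrary choice for this toy problem)

for step in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)      # hidden activations
    out = sigmoid(h @ W2 + b2)    # network output

    # Backward pass: propagate the output error back through both layers
    d_out = (out - y) * out * (1 - out)    # gradient at the output (squared-error loss)
    d_h = (d_out @ W2.T) * h * (1 - h)     # gradient at the hidden layer

    # Gradient-descent updates of all connection weights
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2).ravel())  # should approach [0, 1, 1, 0]
```

The essential step is the backward pass: the error measured at the output is propagated through the hidden layer, so every weight in the network, not just those at the output, can be adjusted in proportion to its contribution to the error.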
IV. Another Chill: Expert System Bubble & Stealthy Progress of Machine Learning (Late 1980s - Early 1990s)
Despite the hope brought by backpropagation, AI’s overall development was not smooth. The commercial wave of expert systems in the 1980s ultimately proved unsustainable.
- Bursting of the Expert System Commercial Bubble: Encouraged by early successes, numerous companies invested heavily in developing and deploying expert systems in the 1980s, hoping to replace or augment expensive human experts. Various expert system “shells” and development tools emerged. However, the boom was short-lived as the inherent flaws of expert systems became apparent:
- High Maintenance Costs: Knowledge bases needed constant updates to reflect domain changes, but manually maintaining rules was difficult and expensive.
- Knowledge Acquisition Bottleneck Remained: Even with better tools, extracting and formalizing knowledge from experts’ heads remained the biggest hurdle.
- Limited Application Domains: Expert systems typically performed well only in highly specialized, clearly defined domains and were hard to scale to broader, fuzzier problems.
- Integration Difficulties: Often standalone systems, difficult to integrate with existing corporate information systems.
- Overly High User Expectations: Marketing often exaggerated the systems’ actual capabilities.

These problems led to many expert system project failures, eroding market confidence and causing investment to plummet. The AI field entered its second, longer “winter.” Many former AI star companies went bankrupt or were forced to pivot.
- Steady Advance of Machine Learning: Against the backdrop of AI’s damaged reputation and tight funding, Machine Learning (ML), as a key branch of AI, did not completely stagnate during this period. Instead, it focused more on practicality and mathematical rigor. Researchers developed and refined a range of algorithms capable of automatically learning patterns from data, without relying on large-scale manual knowledge encoding. Examples include:
- Decision Trees: Such as the ID3 and C4.5 algorithms.
- Support Vector Machines (SVM): Proposed by Vapnik and others in the 1990s, becoming one of the most powerful classification algorithms of the time.
- Bayesian Networks: Used for reasoning under uncertainty.

These ML algorithms began finding practical applications in Data Mining, Pattern Recognition, and Statistical Learning. Although not pursuing general intelligence like early AI, they accumulated important techniques and methodologies for AI’s later resurgence; a brief sketch contrasting this data-driven style with the earlier rule-based approach follows.
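To make the contrast with hand-coded rules concrete, here is a minimal sketch (assuming scikit-learn is available) in which a decision tree is induced from a handful of entirely hypothetical examples instead of being written by hand.

```python
# A decision tree learned from data, in contrast to the hand-written rules above.
# The feature values and labels are entirely hypothetical, for illustration only.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features per (made-up) case: [days_in_country, has_permanent_home (0/1)]
X = [[200, 1], [10, 0], [190, 0], [90, 1], [300, 1], [50, 0]]
y = [1, 0, 1, 0, 1, 0]  # 1 = treated as a "resident" in these toy examples

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Unlike an expert system, the rules are induced from examples,
# though a small tree can still be inspected after training:
print(export_text(model, feature_names=["days_in_country", "has_permanent_home"]))
print(model.predict([[185, 1]]))  # classify a new, hypothetical case
```

The trade-off is the one this section describes: learned models scale to messy data far better than hand-maintained rule bases, but their behavior depends entirely on the examples they are trained on.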
V. Modern Renaissance: Convergence of Big Data, Computing Power & Algorithms (Early 21st Century - Present)
Entering the 21st century, especially after 2010, the field of AI experienced unprecedented explosive growth, entering a new “spring,” driven by the confluence of three key factors:
- Big Data: The “Fuel” for Intelligence: The popularization of the internet, the rise of social media, the widespread adoption of mobile smart devices, and the development of the Internet of Things (IoT) led to society generating massive amounts of diverse data (text, images, video, audio, sensor data, etc.) at an unprecedented speed and scale. This Big Data provided ample “fuel” for machine learning algorithms, especially deep learning models, which need to learn patterns from vast numbers of examples. Without sufficient data, even the most powerful algorithms struggle.
- Computing Power Revolution: From CPU to GPU/Cloud Computing: Training machine learning models, particularly deep learning ones, requires extremely intensive computation. Traditional Central Processing Units (CPUs), while versatile, are inefficient for large-scale parallel computations. Fortuitously, Graphics Processing Units (GPUs), originally designed for rendering video game graphics, were found to possess thousands of small computing cores, making them ideally suited for the massive matrix and vector operations involved in neural network training. GPUs dramatically reduced the training time for deep learning models (from weeks or months to days or even hours), making it feasible to train larger, deeper, and more complex models. Furthermore, the rise of Cloud Computing platforms allowed researchers and developers to access powerful computing resources on demand at relatively low cost, further lowering the barrier to AI R&D.
- Algorithmic Breakthroughs: The Era of Deep Learning & Transformers:
- The Return of Deep Learning (c. 2006 onwards): Work by Geoffrey Hinton, Yann LeCun, and Yoshua Bengio (who later shared the 2018 Turing Award), particularly Hinton’s team’s introduction of Deep Belief Networks (DBNs) and effective unsupervised pre-training methods, helped overcome long-standing obstacles to training deep neural networks (those with many layers), such as vanishing/exploding gradients. This marked the true rise of Deep Learning.
- Convolutional Neural Networks (CNNs) achieved revolutionary breakthroughs in image recognition. For instance, in 2012, the AlexNet model (developed by Hinton’s student Alex Krizhevsky et al.) won the prestigious ImageNet Large Scale Visual Recognition Challenge with accuracy far exceeding traditional methods, stunning the academic and industrial worlds and initiating the golden age of deep learning in computer vision.
- Recurrent Neural Networks (RNNs) and improved variants like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs) demonstrated powerful capabilities in processing sequential data, powering tasks like speech recognition, machine translation, and natural language understanding.
- Transformer Architecture & the Wave of Large Language Models (c. 2017 onwards): The Transformer model, introduced by Google in the 2017 paper “Attention Is All You Need,” and its core Self-Attention Mechanism completely changed the landscape of Natural Language Processing (NLP). The Transformer architecture better captures long-range dependencies in text and is highly parallelizable (a minimal sketch of the self-attention computation appears after this list). Based on this architecture, researchers were able to train extremely large Pre-trained Language Models (PLMs) with hundreds of millions to hundreds of billions of parameters, such as Google’s BERT, OpenAI’s GPT series (GPT-3, GPT-4), and Meta’s Llama series. These Large Language Models (LLMs) achieved astonishing performance on nearly all NLP tasks and fueled the current Generative AI (GenAI) wave, enabling fluent conversation, article writing, code generation, and, alongside related generative models, image creation.
- Major Advances in Reinforcement Learning (RL): DeepMind’s AlphaGo program, combining deep learning, reinforcement learning, and Monte Carlo tree search, defeated world champion Go player Lee Sedol in 2016. This became another landmark event in AI history, showcasing the immense potential of Deep Reinforcement Learning (Deep RL) for complex decision-making and strategic planning problems.
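To ground the term “Self-Attention,” the following is a minimal single-head sketch in NumPy. The dimensions and values are arbitrary; real Transformers add multiple heads, masking, positional information, and learned projections inside far larger networks.

```python
import numpy as np

def self_attention(X: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention for one sequence (single head, no masking)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise relevance of every token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax: attention weights per token
    return weights @ V                                # each output mixes all tokens, weighted by relevance

# Toy example: 4 "tokens" with 8-dimensional embeddings (random, illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one contextualized vector per token
```

The point to take away is that each token’s output is a weighted mixture of every token in the sequence, computed in parallel, which is what lets the architecture capture long-range dependencies efficiently.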
VI. The Contemporary Mainstream: An Era Led by Machine Learning
Looking back at AI’s history, it’s clear that while symbolic ideas and methods (like rule engines, knowledge graphs) still hold a place and continue to develop in specific scenarios (e.g., domains requiring high transparency, strong logical guarantees, or leveraging existing structured knowledge), the mainstream technological paradigm in AI today has undeniably shifted towards methods centered around Machine Learning, particularly Deep Learning.
The powerful capabilities demonstrated by modern AI systems in handling complex, high-dimensional, unstructured real-world data (like natural language text, images, sound) are primarily attributed to their ability to automatically learn features and patterns from massive data, rather than relying on humans hand-crafting exhaustive rules. This has expanded the application domains of AI like never before.
VII. Echoes of History: Insights for Legal Professionals
Understanding the long and winding journey of AI from philosophical speculation to computational reality is not just about satisfying curiosity for legal professionals; it holds significant practical meaning:
- Demystification and Rational Understanding: AI is not “magic” that appeared overnight but the culmination of wisdom from generations of scientists, with a history full of peaks, troughs, expectations, and reality checks. Understanding this helps us view the current AI hype rationally. We should neither excessively deify its capabilities nor fall into unnecessary panic, but recognize the phased nature of technological development and its inherent limitations.
- Insight into Technical Essence: Understanding the core ideas, strengths, and weaknesses of different technical paradigms like Symbolicism, Connectionism, Machine Learning, and Deep Learning helps legal professionals better comprehend the underlying logic of various AI legal tools on the market. For example, distinguishing between a rule-based expert system and a deep learning-based document review tool allows for more accurate assessment of their suitable scenarios, reliability, interpretability, and potential risks.
- Grasping the Developmental Context: AI history is full of paradigm shifts and breakthroughs. Knowing the past challenges (like the knowledge acquisition bottleneck, common sense reasoning problem, AI winters) and key drivers of progress (data, computing power, algorithms) can help us more keenly anticipate the future direction of AI evolution. This allows us to contemplate its long-term impact on legal service models, professional ethics, and even the legal system itself, enabling us to better embrace change and navigate challenges.
In the next section, we will focus on the core engine of modern AI—Machine Learning—delving into its fundamental principles, major types, and key concepts, building a more solid foundation for legal professionals to understand current mainstream AI technologies.