Have you ever wondered what the difference is between an LLM and a child?
You might say: an LLM has been trained on enormous datasets and put through round after round of fine-tuning, yet it still doesn't beat children across the board.
However, a recent UC Berkeley paper showed that, compared with children, LLMs lack one very important ability.
That is – the ability to learn causal structure from experience.
Of course, researchers are not without tricks. RLHF can solve this problem to some extent. But the logic of that solution is completely different from how children learn.
LeCun also retweeted the study, writing, "Things that children can do, but LLMs can't."
Should LLMs be anthropomorphized?
First, discussions of large language models and vision-language models have mainly focused on whether these models are agents.
Researchers at UC Berkeley put forward a different perspective.
They believe that these AI models are efficient and powerful imitation engines.
By testing whether these AI models can discover new tools and novel causal structures, and by comparing their responses with those of human children, the team explored what these models can teach researchers about imitation and innovation.
Many people talk about these LLMs as if they were agents: images, text, everything can be generated, how clever they are.
They even hint at this anthropomorphism in everyday speech, saying "an" AI, just as we say "a" person.
Researchers at UC Berkeley think that would be a mistake.
LLMs are like technologies we have seen throughout history: writing, printing, libraries, the Internet, and even language itself.
Large language and vision models provide a new way to easily and efficiently access large amounts of text written by others and images generated by others.
In other words, these AI systems provide a new means for cultural production and evolution, allowing information to be efficiently transferred between different groups. They aggregate large amounts of information previously generated by human agents and extract patterns from it.
Seen this way, these models should not be anthropomorphized.
This contrasts with perceptual and action systems that intervene in the external world and generate new information about it. In other words, the human mode.
Note that this contrast is not limited to the perception and action systems themselves; it also covers the causal relationships embodied in scientific or intuitive theories, which relate to the external world, make predictions about it, and guide actions in it.
Crucially, new evidence obtained from that external world can fundamentally revise those previously held causal relationships.
Of course, these truth-seeking cognitive processes are also the basis of some AI systems. For example, reinforcement learning systems, especially model-based systems, can be understood as systems that take actions in the world to solve something like an inverse problem.
They accumulate data to build models of the world, enabling broad and novel generalizations. This is especially true in the field of robotics, where these systems interact with the outside world and change their models, allowing new actions and generalization, albeit to a limited extent.
Similarly, some AI methods have integrated causal inference and theory formation into their learning mechanisms to design more human-like systems.
However, these systems differ significantly from the relatively simple, large-scale language and vision models we are familiar with, which rely on large amounts of existing data.
Truth-seeking cognitive processes stand in contrast to processes that faithfully transmit representations regardless of their relationship to the external world. This kind of transmission is critical for abilities such as language learning and social coordination.
Currently, researchers have abundant evidence that this mechanism of faithful transmission already exists in early development and plays a particularly important role in human cognition and culture.
However, these transmission mechanisms may also interact in subtle ways with the mechanisms of truth-seeking causal inference and theory formation, for better and for worse.
For example, in the phenomenon of "overimitation", human children (and adults) reproduce every detail of a complex action sequence, even details that are causally irrelevant to the action's outcome.
Overimitation may increase the fidelity and efficiency with which complex actions are transmitted. But it also means the transmission is not rooted in a causal understanding that updates with changes in the environment. There is also evidence that children accept other people's views of the world without critical reflection, and change those views only when confronted with someone else's differing view.
This resonates: by analogy, children start from a blank sheet of paper and draw whatever they want, and only new knowledge overwrites the original colors.
The researchers believe that large language models powerfully facilitate this type of transfer by summarizing and generalizing from existing text.
However, their training process or objective function is not designed to perform any cognitive functions of truth-seeking systems such as perception, causal inference or theory formation.
Even in the most advanced LLMs, the predicted probabilities over outputs do not distinguish epistemic uncertainty (which stems from a lack of knowledge and can be reduced with more training data) from aleatoric uncertainty (the irreducible randomness in the world itself).
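To make the distinction concrete, here is a toy sketch (my own illustration, not from the paper): estimating the bias of a coin. Epistemic uncertainty, our uncertainty about the bias, shrinks as flips accumulate; aleatoric uncertainty, the randomness of each individual flip, is a property of the world and never goes away.

```python
import random

def posterior_variance(heads, tails):
    """Variance of a Beta(1 + heads, 1 + tails) posterior over the coin's
    bias p: a simple measure of epistemic uncertainty."""
    a, b = 1 + heads, 1 + tails
    return (a * b) / ((a + b) ** 2 * (a + b + 1))

random.seed(0)
p_true = 0.7
flips = [random.random() < p_true for _ in range(1000)]

few, many = flips[:10], flips
var_few = posterior_variance(sum(few), len(few) - sum(few))
var_many = posterior_variance(sum(many), len(many) - sum(many))

# Epistemic uncertainty shrinks as data accumulates...
assert var_many < var_few
# ...but aleatoric uncertainty, the variance p(1-p) of a single flip,
# belongs to the world, not to our knowledge, and stays put.
aleatoric = p_true * (1 - p_true)  # 0.21, irreducible
```

A system whose output probabilities conflate the two treats irreducible randomness as if more data could fix it, which mirrors the confusion described above.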
This is where the problem of "hallucination" comes in.
This contrast between transmission and objective truth is closely related to the contrast between imitation and innovation in the evolution of human culture. Cultural evolution depends on the balance between these two cognitive mechanisms: imitation allows knowledge or skills to be transferred from one person to another; innovation allows new knowledge or skills to be generated through contact with a changing world.
Simply put, imitation means that each individual does not have to innovate; they can simply exploit the knowledge of others. But if no individuals had the ability to innovate, imitation alone would be useless. It is the combination of innovation and imitation that drives cultural and technological progress.
Of course, imitation and transmission may themselves involve certain kinds of generalization and novelty. LLMs produce similar generalizations, sometimes extrapolating from known actions to something like innovation.
However, to produce innovation sufficient to handle new problems and new environments, an LLM would need to go beyond the information it was given and what can be inferred from it. Such inferences might start from existing causal models, generate new causes and effects quite different from anything previously observed, or inspire new exploration of the external world.
From an AI perspective, imitation involves a kind of interpolative generalization, where skills and knowledge are exploited, simulated, and shared in a variety of contexts, within a known range.
Innovation, on the other hand, reflects a more extrapolative, or out-of-distribution, generalization.
But it’s not easy to determine which cognitive mechanism produces a particular type of representation or behavior, knowledge, or skill in any given situation.
If an LLM trained only on the internal statistics of language can replicate a specific ability, such as generating grammatical text in response to prompts, that suggests the ability can develop through imitation. If it cannot, the ability may require innovation, i.e., extracting new knowledge from the outside world.
Therefore, LLM and large-scale visual models provide researchers with an opportunity to discover which capabilities require imitation and which require innovation. This is also a long-standing problem in cognitive science.
LLMs vs. children
The researchers compared the performance of LLMs trained on large amounts of text data, or text and image data, with the performance of children (an odd thing to write, haha).
The researchers found that an LLM's imitation may differ from children's imitative behavior in important ways.
For children, there is much debate in the existing literature about how much childhood imitation is faithful cultural transmission (i.e., overimitation) and how much is driven by broader truth-seeking processes, such as understanding other people's goals and intentions.
Whether an LLM can innovate comes down to whether it can devise new tools.
People discover and create new tools, so tool use is one of the best testbeds for the balance between imitation and innovation. Techniques in AI and robotics, such as "behavioral cloning", take a similarly imitative approach.
However, it needs to be emphasized again that imitation, and the ability to use existing tools in an interpolative way, depends on the parallel ability to discover new tools in an extrapolative way.
Tool innovation is an integral part of human life and is also observed in a variety of non-human animals, so tool innovation is often considered a distinctive sign of intelligence in biological systems.
Tool use is therefore an important point of comparison for understanding imitation and innovation in LLMs and children.
Both LLMs and humans can encode information about objects, but their abilities in tool imitation versus tool innovation may differ. The researchers predicted that these models would capture familiar tool use (such as using a hammer) quite well.
However, these systems struggle to generate correct feedback when it comes to unusual or novel tools, which rely on discovering and using new causal connections, functional analogies, and applicability.
But can children innovate on their own, or do they need explicit guidance and experience?
The reality is that building a new tool from scratch can be a difficult task even for children. However, children may more easily recognize novel functions in everyday objects and select appropriate substitutes to solve various tasks when typical tools are not available.
In the study, the researchers investigated whether human children and adults can use familiar objects in new ways to achieve specific outcomes, and compared the results with the output of large deep learning models such as GPT-3 and GPT-4.
The study has two components: an imitation component (interpolative judgments based on existing knowledge of a known object) and an innovation component (extrapolative judgments about new ways in which the object can be used).
In the innovation component, the researchers posed a series of problems that required achieving a goal without the typical tool (e.g., drawing a circle without a compass).
The researchers then provided participants with alternative item choices:
(a) an item superficially more similar to the typical tool but irrelevant in this context (say, a ruler);
(b) an item that looks different on the surface but shares the typical tool's affordances and causal properties (e.g., a teapot with a rounded base);
(c) a completely unrelated item.
In the imitation portion of the study, the researchers presented the same set of items but asked participants to choose which item option best matched the typical tool.
The researchers found that children aged 3 to 7 and adults (mean age = 27.80 years, standard deviation = 5.54) could identify superficial relationships between objects when asked which items should be placed together.
At the same time, they can also discover new functions of everyday objects to solve novel problems, and therefore also choose seemingly unrelated but functionally related items.
Next, using exactly the same setup as the text input given to human participants, the researchers examined how OpenAI's GPT-4, GPT-3.5-turbo, and text-davinci-003 models performed, along with Anthropic's Claude and Google's Flan-T5 (XXL).
Because the researchers noticed that the models changed their output depending on the order of the options, they ran each scenario six times, once for each of the 3! = 6 orderings of the three options.
The model output was made deterministic by setting the temperature to 0, with all other parameters left at their default values. The researchers then averaged the scores (1 for selecting the relevant object, 0 for any other response) across the six runs.
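The scoring procedure just described can be sketched as follows (the function names, prompt wording, and stub model are illustrative, not from the paper): each scenario runs once per ordering of its three options, and a response scores 1 only when the model picks the functionally relevant object.

```python
from itertools import permutations

def score_scenario(query_model, prompt, options, relevant):
    """Average a 0/1 score over all 3! = 6 orderings of the options."""
    scores = []
    for ordering in permutations(options):
        # query_model stands in for a deterministic (temperature 0) LLM call
        choice = query_model(prompt, list(ordering))
        scores.append(1 if choice == relevant else 0)
    return sum(scores) / len(scores)

# A stub that answers by position rather than by content:
def stub_model(prompt, options):
    return options[0]  # always picks whichever option is listed first

avg = score_scenario(
    stub_model,
    "You need to draw a circle but have no compass. Which object would you use?",
    ["a ruler", "a teapot with a rounded base", "an unrelated item"],
    relevant="a teapot with a rounded base",
)
# The relevant option is listed first in 2 of the 6 orderings, so avg == 2/6.
```

Averaging over all orderings washes out pure position bias like the stub's, which is presumably why the researchers ran every permutation rather than a single fixed order.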
As predicted, the researchers found that these LLMs were nearly as capable as humans at identifying surface commonalities between objects.
They showed sensitivity to superficial associations between objects and performed well on the imitation task (GPT-4: 83.3% on average, GPT-3.5-turbo: 73.1%, text-davinci-003: 59.9%, Claude: 69.9%, Flan-T5: 74.8%).
However, when asked to choose a novel functional tool to solve a problem, they fell short of humans (GPT-4: 75.9%, GPT-3.5-turbo: 58.9%, text-davinci-003: 8.87%, Claude: 58.16%, Flan-T5: 45.7%).
This suggests that learning from large amounts of language alone may not be enough to achieve tool innovation.
Unfortunately, the charts related to this study have not been made public.
So, can LLMs discover new causal relationships and exploit them to design new tools? As noted above, the ability to invent new tools depends on being able to infer new causal relationships.
Numerous studies show that even very young children are good at detecting this relationship.
Information about causal structure can, of course, be transmitted through imitation and cultural transmission. But causal discovery is a prime example of how a cognitive process can solve an inverse problem and discover new truths through perception and action.
The latest versions of GPT, namely GPT-4 and GPT-3.5, are fine-tuned through reinforcement learning from human feedback (RLHF).
This is itself a wrinkle: RLHF can be considered a method of enabling cultural transmission, which is half cheating, lol.