05. Nov. 2024
Prof. Richard Sutton of the University of Alberta observed in his essay "The Bitter Lesson" (2019) that general methods which scale with computation consistently trump human-invented sophistication and one-off cleverness. Extrapolation leads to the conclusion that humans are not competitive and will soon be obsolete, as the proponents of the singularity stipulate. Humans are hastening their own bitter end by advancing AI.
As a techno-optimist, Sutton extrapolates lessons drawn from a handful of specialized cognitive functions to the replication of the complete mind. Mimicking mind architectures would be futile, he argues; instead he proposes betting on sheer computational brute force.
There is a catch, though. The examples Rich presents all share a constraint. Within a constrained task, computational efficiency does not matter: an inefficient universal algorithm can quickly grow up to the task. Fast learning (he probably meant generalization) and search will outperform slow, hand-crafted decision making. Learning chess within its rules and board constraints is a combinatorial task; playing Go is a similar one. The space of human language has broader delimiters, but it is still a combinatorial task that can be mastered by pattern generalization. Vision attributes shapes to the same verbal generalizations and is therefore approximately constrained.
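To make the point concrete, here is a toy sketch, not from Sutton's essay: a generic, knowledge-free minimax search that fully solves tic-tac-toe simply because the rules bound the space. The game choice, board encoding, and function names are hypothetical choices made for this illustration.

```python
# Toy illustration: a universal, "dumb" exhaustive search fully solves a
# constrained combinatorial game (tic-tac-toe). No domain knowledge, no
# hand-crafted heuristics -- just brute force within fixed rules.
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board: str) -> str:
    """Return 'X' or 'O' if someone has three in a row, else ''."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return ""

@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Value of the position for X: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    values = []
    for i, cell in enumerate(board):
        if cell == ".":
            child = board[:i] + player + board[i + 1:]
            values.append(minimax(child, "O" if player == "X" else "X"))
    return max(values) if player == "X" else min(values)

# The whole game tree fits in memory; perfect play is a draw.
print(minimax("." * 9, "X"))  # -> 0
```

Chess and Go blow up this brute-force enumeration, but the same recipe, search plus learned generalization plus more compute, still masters them because the rules keep the space closed.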
Open-ended tasks, like exploration and getting hold of systems that, even if formally constrained, are far larger than any attainable compute capacity can represent, remain out of reach for brute force. A good example is the data bottleneck in LLM training. Pablo Villalobos and colleagues published the paper "Position: Will we run out of data? Limits of LLM scaling based on human-generated data" (2024). They show that text-corpus constraints are real. Among the principled responses are:
synthetic data generation,
reduction of the tokens/parameter ratio.
It is clear that synthetic data will not carry us further without extra effort. Synthetically interpolated data brings neither novelty nor precision; the first experiments show that models trained on their own synthetic output collapse. Synthetically extrapolated data requires validation. Its extrapolative power is proportional to the patterns already generalized and has the same grain size as interpolation, amounting to less than a doubling of the system. That is a linear improvement, while brute force requires a power function to defend its point.
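A rough numerical sketch of the linear-versus-power-function point, assuming the Chinchilla-style loss fit L(N, D) = E + A/N^alpha + B/D^beta and the constants published by Hoffmann et al. (2022); that fit is brought in here as an assumption, it is not cited in the post or the Villalobos paper.

```python
# Assumed Chinchilla-style scaling fit (Hoffmann et al., 2022):
# L(N, D) = E + A / N**alpha + B / D**beta
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training loss for n_params parameters and n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

N = 70e9    # hypothetical 70B-parameter model
D = 1.4e12  # hypothetical 1.4T training tokens (~20 tokens/parameter)

print(loss(N, D))      # baseline loss
print(loss(N, 2 * D))  # doubling the data buys only a small, shrinking gain
# To halve the data-dependent term B / D**beta, the token count must grow
# by a factor of 2**(1/beta) ~= 11.9 -- a power-law demand, not a linear one.
print(2 ** (1 / beta))
```

Under this assumed fit, a linear (say, twofold) increase in synthetic data barely moves the loss, while each further constant improvement demands roughly an order of magnitude more tokens.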
Better data utilization and enabling that power function require reducing the tokens/parameter ratio. Currently it sits between 100x and 20x. Pablo stipulates it could be brought down to 5x for ASI. Humans can draw conclusions from as few as 2x data samples. This is where computational efficiency plays a huge role. Navigating treacherous, potentially lethal environments requires efficiency. If substantial material loss counters any computation growth, cleverness comes to the forefront again. The small stake (a human body) will win over the big stakes (humongous data centers and energy facilities). AI will need human efficiency to push frontiers, to resolve unconstrained tasks.
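Back-of-the-envelope arithmetic on those ratios; the 2-trillion-parameter model size is a hypothetical choice made for the illustration, not a figure from the post or the paper.

```python
# Token budgets implied by the tokens/parameter ratios discussed above.
PARAMS = 2e12  # hypothetical 2-trillion-parameter model

RATIOS = [
    ("current upper end", 100),
    ("current lower end", 20),
    ("ASI target per the post", 5),
    ("human-like sample efficiency", 2),
]

for label, ratio in RATIOS:
    tokens = PARAMS * ratio
    print(f"{label:>28}: {ratio:>3}x -> {tokens / 1e12:.0f}T tokens")
# -> 200T, 40T, 10T and 4T tokens respectively: every step down the ratio
#    ladder shrinks the demand on the finite human-generated text corpus.
```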
Cooperation is humanity's future, if we embrace our competitive advantage in resolving open-ended, unconstrained tasks; if we embrace inventiveness and the capability to fathom concepts from minimal data sets. Do you get it, or do you need the refrain repeated by five other articles?