I find the OP very difficult to comprehend, to the point that I question whether it has content at all. One difficulty is in understanding their use of the word "embedding", defined (so to speak) as "internal representations (embeddings)", and their free use of the word to relate, and even equate, LLM internal structure to brain internal structure. They are simply assuming that there is a brain "embedding" that can be directly compared to the matrix of numerical weights that comprise an LLM's training. That seems a highly dubious assumption, to the point of being hand-waving.
They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do. On that basis alone, it can't be valid to just assume that a human "embedding" is equivalent to an LLM "embedding", for input or output.
> They are simply assuming that there is a brain "embedding" that can be directly compared to the matrix of numerical weights that comprise an LLM's training.
If there were no such structure, then their methods based on aligning neural embeddings with brain "embeddings" (really just vectors of electrode values or voxel activations) would not work.
> They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do. On that basis alone, it can't be valid to just assume that a human "embedding" is equivalent to an LLM "embedding", for input or output.
This feels like "it doesn't work the way I thought it would, so it must be wrong."
I think actually their point here is mistaken for another reason: there's good reason to think that LLMs do end up implicitly representing abstract parts of speech and syntactic rules in their embedding spaces.
>They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do.
Honestly, do they? To me, they clearly don't. Grammar is not how language works; it's a useful fiction. Language even in humans seems to be a very statistical process.
Yes! As somebody who speaks 2 languages, and sort of reads/understands 2 more, I cannot agree more. Human spoken languages do not follow any grammars. Grammars are just simplified representations of a reality that is probabilistic in nature.
This is something that Chomsky got very wrong, and the statistical/ML crowd got very right.
But still, grammars are a very useful model.
Languages definitely follow grammars. They don't follow the grammars that were written by observing them, but you can discover unwritten grammatical structures that are nevertheless followed by everyone who speaks a language, and who if asked wouldn't even be able to articulate the rules that they are following. It's the following that defines the grammar, not the articulation of the rules.
Statistics are just another way to record a grammar, all the way down to the detail of how one talks about bicycles, or the Dirty War in Argentina.
If a grammar is defined as a book that enumerates the rules of a language, then of course language doesn't require following a grammar. If a grammar is defined as a set of rules for communicating reasonably well with another person who knows those same rules, then language follows grammars.
> Languages definitely follow grammars
But it's the other way around! Grammars follow languages. Or, more precisely, grammars are (very lossy) language models.
They describe typical expectations of an average language speaker. Grammars try to provide a generalized system describing an average case.
I prefer to think of languages as a set of typical idioms used by most language users. A given grammar is an attempt to catch similarities between idioms within the set and turn 'em into a formal description.
A grammar might help with studying a language, and speed up the process of internalizing idioms, but the final learning stage is a set of things students use in certain situations aka idioms. And that's it.
> Statistics are just another way to record a grammar
I almost agree.
But it should be "record a language". These are two approaches to the problem of modeling human languages.
Grammars are an OK model. Statistical models are less useful to us humans, but given the right amount of compute they perform much better (see LLMs).
This is a terminological difference. Linguists use "grammar" as a technical term for a speaker's implicit knowledge of how their language works. That knowledge could be statistical or rule-based in nature, although most linguistic theories say that it's rule-based. You're using grammars to mean human-produced descriptions of that knowledge.
That's correct.
Grammars the way I understand them are a family of human language models, typically discrete in nature. The approach was born out of Chomsky's research culminating in the Universal Grammar idea.
This is just wrong. Languages follow certain inviolable rules, most notably hierarchical structure dependence. There are experiments (Moro, the subject "Chris") that show that humans don't process synthetic languages that violate these rules the same way as synthetic languages that follow them (specifically, it takes them longer to process and they use non-language parts of the brain to do so).
This does not mean that language in humans isn't probabilistic in nature. You seem to think that because there is structure then it must be rule based but that doesn't follow at all.
When a group of birds fly, each bird discovers/knows that flying just a little behind another will reduce the number of flaps it needs to fly. When nearly every bird does this, the flock forms an interesting shape.
'Birds fly in a V shape' is essentially what grammar is here - a useful fiction of the underlying reality. There is structure. There is meaning but there is no rule the birds are following to get there. No invisible V shape in the sky constraining bird flight.
First, there is no evidence of any probabilistic processing at the level of syntax in humans (it's irrelevant what computers can do).
Second, I didn't say that, in language, structure implies deterministic rules, I said that there is a deterministic rule that involves the structure of a sentence. Specifically, sentences are interpreted according to their parse tree, not the linear order of words.
As for the birds analogy, the "rules" the birds follow actually do explain the V-shape that the flock forms. You make an observation ("V-shaped flock"), ask the question "why a V-shape and not some other shape?", and try to find an explanation (the relative bird positions make it easier to fly [because of XYZ]). In the case of language you observe that there is structure dependence, you ask why it's that way and not another (like linear order) and try to come up with an explanation. You are trying to suggest that the observation that language has structure dependence is like seeing an image of an object in a cloud formation: an imagined mental projection that doesn't have any meaningful underlying explanation. You could make the same argument for pretty much anything (e.g. the double-slit experiment is just projecting some mental patterns onto random behavior) and I don't think it's a serious argument in this case either.
>First, there is no evidence of any probabilistic processing at the level of syntax in humans (it's irrelevant what computers can do).
There is plenty of evidence to suggest this:
https://pubmed.ncbi.nlm.nih.gov/27135040/
https://pubmed.ncbi.nlm.nih.gov/25644408/
https://www.degruyter.com/document/doi/10.1515/9783110346916...
And research on syntactic surprisal—where more predictable syntactic structures are processed faster—shows a strong correlation between the probability of a syntactic continuation and reading times.
>In the case of language you observe that there is structure dependence, you ask why it's that way and not another (like linear order) and try to come up with an explanation. You are trying to suggest that the observation that language has structure dependence is like seeing an image of an object in a cloud formation: an imagined mental projection that doesn't have any meaningful underlying explanation.
No I'm suggesting that all you're doing here is cooking up some very nice fiction like Newton did when he proposed his model of gravity. Grammar does not even fit into rule based hierarchies all that well. That's why there are a million strange exceptions to almost every 'rule'. Exceptions that have no sensible explanations beyond, 'well this is just how it's used' because of course that's what happens when you try to break down an inherently probabilistic process into rigid rules.
> And research on syntactic surprisal—where more predictable syntactic structures are processed faster—shows a strong correlation between the probability of a syntactic continuation and reading times.
I'm not sure what this is supposed to show? If I can predict what you are going to say, so what? I can predict you are going to pick something up too if you are looking at it and start moving your arm. So what?
The third paper looks like a similar argument. As far as I can tell, neither paper 1 nor paper 2 proposes a probabilistic model for language. Paper 1 talks about how certain language features are acquired faster with more exposure (that isn't inconsistent with a deterministic grammar). I believe paper 2 is the same.
> No I'm suggesting that all you're doing here is cooking up some very nice fiction like Newton did when he proposed his model of gravity.
Absolutely bonkers to describe Newton's model of gravity as "fiction". In that sense every scientific breakthrough is fiction: Bohr's model of the atom is fiction (because it didn't use quantum effects), Einstein's gravity will be fiction too when physics is unified with quantum gravity. No sane person uses the word "fiction" to describe any of this, it's just scientific refinement: we go from good models to better ones, patching up holes in our understanding, which is an unceasing process. It would be great if we could have a Newton-level "fictitious" breakthrough in language.
> Grammar does not even fit into rule based hierarchies all that well. That's why there are a million strange exceptions to almost every 'rule'. Exceptions that have no sensible explanations beyond, 'well this is just how it's used' because of course that's what happens when you try to break down an inherently probabilistic process into rigid rules.
No one is saying grammar has been solved, people are trying to figure out all the things that we don't understand.
>I'm not sure what this is supposed to show? If I can predict what you are going to say, so what?
If the speed of your understanding varies with how frequent and predictable syntactic structures are then your understanding of syntax is a probabilistic process. A strictly non-probabilistic process would have a fixed, deterministic way of processing syntax, independent of how often a structure appears or how predictable it is.
>I can predict you are going to pick something up too if you are looking at it and start moving your arm. So what?
OK? This is very interesting. Do you seriously think this prediction right now isn't probabilistic? You estimate, not from rigid rules but from past experience, that it's likely I will pick it up. What if I push it off the table? You think that isn't possible? What if I grab the knife in my bag while you're distracted and stab you instead? Probability is the reason you picked that option instead of the myriad of other options.
>Absolutely bonkers to describe Newton's model of gravity as "fiction". In that sense every scientific breakthrough is fiction: Bohr's model of the atom is fiction (because it didn't use quantum effects), Einstein's gravity will be fiction too when physics is unified with quantum gravity. No sane person uses the word "fiction" to describe any of this, it's just scientific refinement: we go from good models to better ones, patching up holes in our understanding, which is an unceasing process. It would be great if we could have a Newton-level "fictitious" breakthrough in language.
"All models are wrong. Some are useful" - George Box.
There's nothing insane about calling a spade a spade. It is fiction and many academics do view it in such a light. It's useful fiction, but fiction nonetheless. And yes, Einstein's theory is more useful fiction.
Grammar is a model of language. It is not language.
> If the speed of your understanding varies with how frequent and predictable syntactic structures are then your understanding of syntax is a probabilistic process.
In what sense? I don't see how it tells you anything if you have the sentence "The cat ___ " and then you expect a verb like "went" but you could get a relative clause like "that caught the mouse". The sentence is interpreted deterministically, not by what might follow a fragment but by what it does contain. If you are more "surprised" by the latter it doesn't tell you that the process is not deterministic.
> OK? This is very interesting. Do you seriously think this prediction right now isn't probabilistic? You estimate, not from rigid rules but from past experience, that it's likely I will pick it up. What if I push it off the table? You think that isn't possible? What if I grab the knife in my bag while you're distracted and stab you instead?
I think you are confusing multiple things. I can predict actions and words, that doesn't mean sentence parsing/production is probabilistic (I'm not even sure exactly what a person might mean by that, especially with respect to production) nor does it mean arm movement is.
> "All models are wrong. Some are useful" - George Box. There's nothing insane with calling a spade a spade. It is fiction and many academics do view it in such a light. It's useful fiction, but fiction none the less. And yes, Einstein's theory is more useful fiction. Grammar is a model of language. It is not language.
I have no idea what you are saying: calling grammar a "fiction" was supposed to be a way to undermine it but now you are saying that it was some completely trivial statement that applies to the best science?
>In what sense? I don't see how it tells you anything if you have the sentence "The cat ___ " and then you expect a verb like "went" but you could get a relative clause like "that caught the mouse". The sentence is interpreted deterministically, not by what might follow a fragment but by what it does contain. If you are more "surprised" by the latter it doesn't tell you that the process is not deterministic.
The claim isn't about whether the ultimate interpretation is deterministic; it's about the process of parsing and expectation-building as the sentence unfolds.
The idea is that language processing (at least in humans and many computational models) involves predictions about what structures are likely to come next. If the brain (or a model) processes common structures more quickly and experiences more difficulty and higher processing times with less frequent ones, then the process of parsing sentences is very clearly probabilistic.
Being "surprised" isn't just a subjective experience here - it manifests as measurable processing costs that scale with the degree of unexpectedness. This graded response to probability is not explainable with purely deterministic models that would parse every sentence with the same algorithm and fixed steps.
>I have no idea what you are saying: calling grammar a "fiction" was supposed to be a way to undermine it but now you are saying that it was some completely trivial statement that applies to the best science?
None of my comments undermine grammar beyond saying it is not how language works. I preface 'fiction' with the word useful multiple times and make comparisons to Newton.
> If the brain (or a model) processes common structures more quickly ... then the process of parsing sentences is very clearly probabilistic.
This isn't true. For one, more common sentences are probably structurally simpler and structurally simpler sentences are faster to process. You also get in bizarre territory when you can predict what someone is going to say before they say it: Obviously no "parsing" has occurred there so the fact that you predicted it cannot be evidence that parsing is probabilistic. If that is the case then a similar argument is true if you have only a sentence fragment. The probabilistic prediction is some ancillary process, just as being able to predict that a cup is going to fall doesn't make my vision a probabilistic process in any meaningful sense. If for some reason I couldn't predict, I could still see and I could still parse sentences.
Furthermore, you can obviously parse sentences and word sequences you have never seen before (and sentences can be arbitrarily complex/nested, at least up to your limits on memory). You can also parse sentences with invented terms.
Most importantly it's not clear how sentences are produced in the mind in this model. Is the claim that you somehow start with a word and produce some random most-likely next word? Do you not believe in syntax parse trees?
Finally, (as Chomsky points out in the video I linked) this model doesn't account for structure dependence. For example why is the question form of the sentence "The man who is tall is happy" "Is the man who is tall happy?" and not "is the man who tall is happy?". Why not move the first "is" that you come across?
> In a strictly deterministic model, both continuations ("went" or "that caught the mouse") would be processed through the same fixed algorithm with the same computational steps, regardless of frequency. The parsing mechanism wouldn't be influenced by prior expectations
Correct. You seem to imply that is somehow unreasonable. Computer parsers work this way.
> Being "surprised" isn't just a subjective experience here - it manifests as measurable processing costs that scale with the degree of unexpectedness. This graded response to probability is not explainable with purely deterministic models.
Again, there are two orthogonal concepts: Do I know what you are going to say next or how you are going to finish your sentence (and possibly something like strain or slowed processing when faced with an unusual concept) and what process do I use to interpret the thing you actually said.
> None of my comments undermine grammar beyond saying it is not how language works. I preface 'fiction' with the word useful multiple times and make comparisons to Newton.
Again, I have no idea what the point of describing universal grammar as fiction is if you say the term applies to all other great scientific theories.
>This isn't true. For one, more common sentences are probably structurally simpler and structurally simpler sentences are faster to process.
Common sentences are not necessarily structurally simpler and those still get processed faster so yes it's pretty true.
>You also get in bizarre territory when you can predict what someone is going to say before they say it: Obviously no "parsing" has occurred there so the fact that you predicted it cannot be evidence that parsing is probabilistic.
Of course parsing has occurred. Your history with this person (and people in general) and what you know he likes to say, his mood and body language. Still probabilistic.
>Furthermore, you can obviously parse sentences and word sequences you have never seen before (and sentences can be arbitrarily complex/nested, at least up to your limits on memory). You can also parse sentences with invented terms.
So? LLMs can do this. I'm not even sure why you would think probabilistic predictors couldn't.
>Most importantly it's not clear how sentences are produced in the mind in this model. Is the claim that you somehow start with a word and produce some random most-likely next word? Do you not believe in syntax parse trees?
That's one way to do it, yeah. Why would I 'believe in it'? Computers that rely on it don't work anywhere near as well as those that don't. What evidence is there that it is anything more than a nice simplification?
>Finally, (as Chomsky points out in the video I linked) this model doesn't account for structure dependence. For example why is the question form of the sentence "The man who is tall is happy" "Is the man who is tall happy?" and not "is the man who tall is happy?". Why not move the first "is" that you come across?
Why does an LLM that encounters a novel form of that sentence generate the question form correctly?
You are giving examples that probabilistic approaches clearly handle as if they are examples that probabilistic approaches cannot explain. It's bizarre.
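For what it's worth, this particular example is easy to probe directly. A minimal sketch, under the same assumptions as the earlier snippet (Hugging Face transformers and torch, with "gpt2" as a stand-in model), that scores the two candidate question forms:

```python
# Minimal sketch: compare the total log-probability a small causal LM assigns
# to the structure-dependent question form vs. the linear-order form.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def logprob(sentence: str) -> float:
    """Total log-probability the model assigns to `sentence`."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    lp = torch.log_softmax(logits[0, :-1], dim=-1)
    return float(lp[torch.arange(ids.shape[1] - 1), ids[0, 1:]].sum())

print(logprob("Is the man who is tall happy?"))   # structure-dependent form
print(logprob("Is the man who tall is happy?"))   # move-the-first-"is" form
# A model that has picked up the structural rule should assign the first form
# a substantially higher log-probability.
```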
>Correct. You seem to imply that is somehow unreasonable. Computer parsers work this way.
I'm not implying it's unreasonable. I'm telling you the brain clearly does not process language this way because even structurally simple but uncommon syntax is processed slower.
>Again, I have no idea what the point of describing universal grammar as fiction is if you say the term applies to all other great scientific theories
What's the point of describing Newton's model as fiction if I still teach it in high schools and Universities? Because erroneous models can still be useful.
>Again, there are two orthogonal concepts: Do I know what you are going to say next or how you are going to finish your sentence (and possibly something like strain or slowed processing when faced with an unusual concept) and what process do I use to interpret the thing you actually said.
The brain does not comprehend a sentence without trying to predict its meaning. They aren't orthogonal. They're intrinsically linked
> "Of course parsing has occurred. Your history with this person (and people in general) and what you know he likes to say, his mood and body language. Still probabilistic."
This is just redefining terms to be so vague as to make rational inquiry or discussion impossible. I don't know what re-definition of parsing you could be using that would still be in any way useful, or what "probabilistic" in that case is supposed to apply to.
If you are saying that the brain is constantly predicting various things so that it automatically imbues some process that doesn't involve prediction as probabilistic then that is just useless.
> Common sentences are not necessarily structurally simpler and those still get processed faster so yes it's pretty true.
Well, I'll have to take your word for it as you haven't cited the paper but I would point to the reasonable explanation of different processing times that has nothing to do with parsing I gave further below. But I will repeat the vision analogy: If I had an experiment that showed that I took longer to react to an unusual visual sequence we would not immediately conclude that the visual system was probabilistic. The more parsimonious explanation is that the visual system is deterministic and some other part of cognition takes longer (or is recomputed) because of the "surprise".
> So? LLMs can do this. I'm not even sure why you would think probabilistic predictors couldn't.
It's not about capturing it in a statistics or having an LLM produce it, it's about explaining why that rule occurs and not some other. That's the difference between explanation and description.
> That's one way to do it, yeah. Why would I 'believe in it'? Computers that rely on it don't work anywhere near as well as those that don't. What evidence is there that it is anything more than a nice simplification?
Because producing one token at a time cannot produce arbitrarily recursive structures the way sentences can? Because no language uses linear order? Because when we express a thought it usually can't be reduced to a single start word and statistically most-likely next-word continuations? It's also irrelevant what computers do, we are talking about what humans do.
> Why does an LLM that encounters a novel form of that sentence generate the question form correctly?
That isn't the question. The question is why it's that way and not another. It's as if I ask why do the planets move in a certain pattern and you respond with "well why does my deep-neural-net predict it so well?". It's just nonsense.
> You are giving examples that probabilistic approaches clearly handle as if they are examples that probabilistic approaches cannot explain. It's bizarre.
No probabilistic model has explained anything. You are confusing predicting with explaining.
> I'm not implying it's unreasonable. I'm telling you the brain clearly does not process language this way because even structurally simple but uncommon syntax is processed slower.
I explained why you would expect that to be the case even with deterministic processing.
> What's the point of describing Newton's model as fiction if I still teach it in high schools and Universities? Because erroneous models can still be useful.
Well as I said this is also true of Einstein's theory of gravity and you presumably brought up the point to contrast universal grammar with that theory rather than point out the similarities.
> The brain does not comprehend a sentence without trying to predict its meaning. They aren't orthogonal. They're intrinsically linked
The brain is doing lots of things, we are talking about the language system. Again, if instead we were talking about the visual system no one would dispute that the visual system is doing the "seeing" and other parts of the brain are doing predicting.
In fact they must be orthogonal because once you get to the end of the sentence, where there are no next words to predict, you can still parse it even if all your predictions were wrong. So the main deterministic processing bit (universal grammar) still needs to be explained and the ancillary next-word-prediction "probabilistic" part is not relevant to its explanation.
What exactly is wrong? The fact that grammars are very limited models of human languages? My key thesis is that human languages operate in a way that non-probabilistic models (i.e. grammars) can only describe in a very lossy way.
Sure, LLMs are also lossy but also much more scalable.
I've spent quite a lot of time with 90s/2000s papers on the topic, and I don't remember any model that was better at generating human language than the "stochastic parrots" are.
As I said there are universal rules that human language processing follows (like hierarchical structure dependence); you can't have arbitrary syntax/grammars. It's true that science hasn't solved the main puzzles about how to characterize these rules.
The fact that statistical models are better predictors than the-"true"-characterization-that-we-haven't-figured-out-yet is completely irrelevant, just as it would be irrelevant if your deep-learning net was a better predictor of the weather: it wouldn't imply that the weather doesn't follow rules in physics, regardless of whether we knew what those rules were.
> As I said there are universal rules that human language processing follows (like hierarchical structure dependence); you can't have arbitrary syntax/grammars.
GP didn't say anything about grammars being arbitrary. In fact, his claim that grammars are models of languages would mean the complete opposite.
I don't think they have a consistent understanding of the word "grammar": they seem to use it in the grade-school sense (grammar for English, grammar for French) but then refer to Chomsky's universal grammar which is different (grammar rules that are common to all languages).
The main point of contention is their statement that "grammar follows language" which, in the Chomsky sense, is false: (universal) grammar/syntax describes the human language faculty (the internal language system) from which external languages (English, French, sign language) are derived, so (external) languages follow grammar.
Yes, I was a bit vague. If we are to be serious then we would have to come up with definitions of grammar-based approaches vs stochastic approaches.
All I am saying is that grammars (as per Chomsky) or even high-school rule-based stuff are imperfect and narrow models of human languages. They might work locally, for a given sentence, but fall apart when applied to the problem at scale. They also (by definition) fail to capture both more subtle and more general complexities of languages.
And the universal grammar hypothesis is just that - a hypothesis. It might be convenient at times to think about languages in this way in certain contexts but that's about it.
Also, remember, this is Hacker News, and I am just a programmer who loves his programming/natural languages so I look at everything from a computational point of view.
All this comes down to is that language is not a solved problem. By the same logic why not just stop doing any research in physics and just put everything through a neural net which is going to give better predictions than the current best theories?
The fact that a deep-neural-net can predict the weather better than a physics-based model does not mean that the weather is not physics-based. Furthermore deep-neural-nets predict but don't explain while a physics-based model tries to explain (and consequently predict).
How do you explain syntactic islands, binding rules or any number of arcane linguistic rules that humans universally follow? Children can generalise outside of their training set in a way that LLMs simply cannot (e.g. Nicaraguan sign language or creolization)
Linguists however know that grammar is, indeed, important for linguistic comprehension. For example, the German "Ich sehe die Frau mit dem Fernglas" (I see the woman with the binoculars) is _unambiguous_ because "die Frau" and "mit dem Fernglas" match in both gender and case. If this weren't the case, it could be either "I see (the woman with the binoculars)" or "I see (the woman) with [using] the binoculars". Even in German you might encounter this e.g. if you instead had to say "Ich sehe das Mädchen mit dem Fernglas", as das Mädchen (the girl) is neuter rather than feminine in gender.
My point is that Grammar is to language what Newton was to gravity, i.e. useful fiction that works well enough for most scenarios, not that language has no structure.
> For example, the German "Ich sehe die Frau mit dem Fernglas" (I see the woman with the binoculars) is _unambiguous_ because "die Frau" and "mit dem Fernglas" match in both gender and case. If this weren't the case, it could be either "I see (the woman with the binoculars)" or "I see (the woman) with [using] the binoculars".
My German is pretty rusty, why exactly is it unambiguous?
I don't see how changing the noun would make a difference. "Ich sehe" followed by any of these: "den Mann mit dem Fernglas", "die Frau mit dem Fernglas", "das Mädchen mit dem Fernglas" sounds equally ambiguous to me.
Die Frau and dem Fernglas don’t bind tightly though.
In my view, this phrase is only unambiguous to those who feel the prepositional tradition, and all the heavy lifting is done here by “mit” (and “durch” in the opposite case, if one wants to make it clear). Articles are irrelevant and are dictated by the verb and the preposition, whose requirements are sort of arbitrary (sehen Akk., mit Dat.) and fixed. There’s no article-controlled variation that could change meaning; to my knowledge it would be simply incorrect.
I’m also quite rusty on Deutsch, aber habe es nicht völlig vergessen (but haven't completely forgotten it), it seems.
I don’t disagree with any of your particular points, but I think you’re missing the forest here: their argument is primarily based in empirical results, not a theoretical framework/logical deduction. In other words, they’re trying to explain why LLMs work so well for decoding human neural content, not arguing that they do!
I think any reasonable scientist would a-priori react the same way to these claims as claims that neural networks alone can possibly crack human intuition: “that sounds like sci-fi speculation at best”. But that’s the crazy world we live in…
This is interesting. The blog post links several papers, and I recommend reading them.
Responses here, however, seem not commensurate with the evidence presented. Two of the papers[0][1] that provide the sources for the illustration in the blog post are about research conducted on a very small group of subjects. They measure neural activity when listening to a 30-minute podcast (5,000 words). Participants tried to guess next words. All the talk about "brain embedding" is derived from interpreting neuronal activity and sensor data geometrically. It is all very contrived.
Very interesting stuff from a neuroscience, linguistics and machine learning perspective. But I will quote from the conclusion of one of the papers[1]: "Unlike humans, DLMs (deep language models) cannot think, understand or generate new meaningful ideas by integrating prior knowledge. They simply echo the statistics of their input"
>"Unlike humans, DLMs (deep language models) cannot think, understand or generate new meaningful ideas by integrating prior knowledge. They simply echo the statistics of their input"
[Citation needed]. Actually, the paper does give a citation (G.F. Marcus, The Algebraic Mind), which is listed as 2019 in their citation list (i.e. before GPT-3), but actually seems to be from the early 2000s.
I view this as compelling evidence that current models are more than "stochastic parrots," because as the OP shows, they are learning to model the world in ways that are similar (up to a linear transformation) to those exhibited by the human brain. The OP's findings, in short:
* A linear transformation of a speech encoder's embeddings closely aligns them with patterns of neural activity in the brain's speech areas in response to the same speech sample.
* A linear transformation of a language decoder's embeddings closely aligns them with patterns of neural activity in the brain's language areas in response to the same language sample.
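For a feel of what this kind of alignment analysis involves, here is a rough sketch of the usual recipe: a ridge regression from embeddings to per-unit brain responses, scored by correlation on held-out data. Everything below is synthetic and the sizes are invented for illustration; real studies use electrode or voxel recordings aligned to word onsets.

```python
# Minimal sketch of a linear "encoding model": fit a linear map from LLM word
# embeddings to per-word brain responses, then score held-out predictions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_words, emb_dim, n_electrodes = 5000, 768, 128

X = rng.standard_normal((n_words, emb_dim))            # LLM embeddings per word
true_map = rng.standard_normal((emb_dim, n_electrodes)) * 0.05
Y = X @ true_map + rng.standard_normal((n_words, n_electrodes))  # "brain" data

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
model = Ridge(alpha=100.0).fit(X_tr, Y_tr)
Y_hat = model.predict(X_te)

# one correlation per electrode between predicted and observed activity
# (synthetic data, so the exact numbers mean nothing)
r_per_electrode = [pearsonr(Y_hat[:, i], Y_te[:, i])[0] for i in range(n_electrodes)]
print(f"mean encoding correlation: {np.mean(r_per_electrode):.2f}")
```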
Yeah, I have always firmly maintained that there is less fundamental difference between LLMs and human brains than most people seem to assume.
Going a bit further, I'll speculate that the actions made by a human brain are simply a function of the "input" from our ~5 senses combined with our memory (obviously there are complications such as spinal reflexes, but I don't think those affect my main point). Neural nets are universal function approximators, so can't a sufficiently large neural net approximate a full human brain? In that case, is there any merit to saying that a human "understands" something in a way that a neural net doesn't? There's obviously a huge gap between the two right now, but I don't see any fundamental difference besides "consciousness" which is not well defined to begin with.
Computers can compute anything, with the only limitations being time, space, energy, programming, etc. So far, it seems like all of those things can be increased without bound. I see no reason to believe that there will be some fundamental limitation that prevents us from improving computers to the point that they can compute the equivalent of a human brain. Yes I agree we don't understand the human brain well enough yet, but empirically we can observe that AI agents and human agents behave in similar ways. Sure, it may not end up being a complete copy of a human brain, but if the inputs and outputs are the same, does it make a difference?
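As a small aside on what "universal function approximation" buys in practice, a minimal sketch: a single hidden layer fitting an arbitrary smooth 1-D target (scikit-learn's MLPRegressor; the target function is invented). The theorem guarantees such a fit exists with enough hidden units, but says nothing about whether a brain-scale function is learnable with realistic data and compute.

```python
# Minimal sketch of universal approximation in practice: one hidden layer
# fitting an arbitrary smooth 1-D target, chosen purely for illustration.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(2 * x[:, 0]) + 0.3 * x[:, 0] ** 2       # the "behavior" to approximate

mlp = MLPRegressor(hidden_layer_sizes=(256,), solver="lbfgs",
                   max_iter=5000, random_state=0)
mlp.fit(x, y)
print("train R^2:", mlp.score(x, y))               # close to 1 for a smooth target
```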
The UAT (universal approximation theorem) is a pretty weak result in practice. A lot of systems have the same property, and most of them are pretty poor approximators in practice. It may very well be that no reasonable amount of computing power allows approximating the "function of consciousness". Plus, if you're a certain kind of dualist, the entire idea of a compact, smooth "consciousness" function may be something you reject philosophically.
I agree there are issues with the UAT, but I feel like my conclusion is still valid: a neural net, given the memories and senses that a human has, is capable of approximating a human's response accurately enough to be indistinguishable from another human, at least to another human.
I philosophically reject the notion that consciousness is an important factor here. The question of whether or not you have a consciousness doesn't affect what I take away from this conversation, and similarly the question of whether an AI has a consciousness doesn't affect what I take away from my actions with it. If the (non-)existence of others' consciousnesses doesn't materially affect my life—and we assume that it's a fundamentally unanswerable question—why should I care other than curiosity?
>a neural net, given the memories and senses that a human has, is capable of approximating a human's response accurately enough to be indistinguishable from another human, at least to another human.
That doesn't remotely follow from the UAT and is also almost certainly false.
I am not saying that follows from the formal UAT. I'm saying that, in general, if we assume that the human brain is simply producing outputs in response to inputs, I have no reason to believe that a neural net couldn't be constructed to approximate the same outputs given the same inputs. And specifically, the threshold of "good enough approximation" I chose is effectively just a stronger form of the Turing test. AI has arguably already passed a weaker form of the Turing test (most people in the world cannot tell the difference between a human-written and an AI-written article, for example).
> current models are more than "stochastic parrots"
I believe the same, and I'm also willing to accept that the human brain can intentionally operate in a stochastic-parrot mode.
Some people have the ability to fluently speak non-stop, completely impromptu. I wonder if it's similar to an LLM pipeline, where there's a constant stream of thoughts being generated based on very recent context, which are then passed through various output filters.
I do not understand what you find convincing about this that changes your mind.
We have a closed system that we designed to operate in a way that is similar to our limited understanding of how a portion of the brain works, based on how we would model that part of the brain if it had to traverse an nth-dimensional array. We have loosely observed it working in a way that could roughly be defined as similar to our limited understanding of how a portion of the brain works given that limitation that we know is not true of the human brain, with a fairly low confidence level.
Even if you put an extreme level of faith into those very subpar conclusions and take them to be rigid... That does not make it actually similar to the human brain, or any kind of brain at all.
The correlations are 0.25-0.5, which is quite poor (scatter plots of Gaussian data with those correlations look like noise). That's before analyzing the methodology and assumptions.
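For intuition about what correlations in that range look like, a tiny sketch with synthetic bivariate Gaussian data (0.3 is just a value from the quoted range):

```python
# Bivariate Gaussian samples with a true correlation of 0.3: the estimated r
# comes out reliably near 0.3, yet a scatter plot of (x, y) looks close to an
# unstructured cloud, since r**2 is only about 9% of the variance explained.
import numpy as np

rng = np.random.default_rng(1)
r = 0.3
cov = [[1.0, r], [r, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=2000).T
print(np.corrcoef(x, y)[0, 1])   # ~0.3
print(r ** 2)                    # variance explained: ~0.09
```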
Correlation of 0.25-0.5 being poor is very problem dependent.
For example, in difficult perceptual tasks ("can you taste which of these three biscuits is different" [one biscuit is made with slightly less sugar]), a correlation of 0.3 is commonplace and considered an appropriate amount of annotator agreement to make decisions.
Yes for certain things like statistical trading (assuming some kind of "nice" Gaussian-like distribution) where you have lots of trades and just need to be more right than wrong it's probably useful.
Not here though, where you are trying to prove a (near) equivalence.
> I view this as compelling evidence that current models are more than "stochastic parrots,"
More evidence against "stochastic parrots"
- zero shot translation, where LLMs can translate between unseen pairs of languages
- repeated sampling of responses from the same prompt - which shows diversity of expression with convergence of semantics
- reasoning models - solving problems
But my main critique is that they are better seen as pianos, not parrots. Pianos don't make music, but we do. And we play the LLMs on the keyboard like regular pianos.
That's 100% false, dogs and pigeons can obviously think, and it is childish to suppose that their thoughts are a sequence of woofs or coos. Trying to make an AI that thinks like a human without being able to think like a chimpanzee gives you reasoning LLMs that can spit out proofs in algebraic topology, yet still struggle with out-of-distribution counting problems which frogs and fish can solve.
How human brains process thoughts is non-uniform across population. There’s imagery, written language, sound, speech, tactile, etc. Not everything that you think about is readily expressible in your language. There are definitely people with and without “internal screen”, and probably few more types with/without X, where X is a set of things we’ve never talked about, either assuming everyone has it or not realizing that it’s a non-mandatory part of how you think.
That's not really what I'm saying. What I'm saying is: how does the brain look when you do both? Is there a clear difference? There's no 'thinking mode' followed by a 'language processing mode'.
Language processing is thinking as far the brain is concerned and there's no evidence that these are 2 cleanly separated processes whether you 'think' in words or not.
That's not true, the Moro experiments show they use different capacities as do similar experiments on people who have certain severe cognitive deficiencies that don't impact language processing (e.g. the subject "Chris")
My argument is that “thinking” and “language processing” are not two sequential or clearly separated modes in the brain but deeply intertwined.
Language is a lot more than parsing syntax; whatever your thoughts are on the matter, even LLMs are clearly doing more than that. Are there any experiments where subjects had severe cognitive deficiencies and language in its full breadth (or maybe I should say communication?) came out unscathed?
The Chris experiments don't seem to go into much detail on that front.
I just gave one: "Chris". Here's Chomsky describing the "Chris"-experiments ([1]) as part of a broader answer about how language is distinct from general cognition which I paraphrased above.
> That doesn't contradict the argument that “thinking” and “language processing” are not two sequential or clearly separated modes in the brain but deeply intertwined.
It's not an argument, it's an assertion, that is, in fact, contradicted by the experimental evidence I described (Moro and "Chris"). Of course they are "deeply intertwined" but because of the evidence it's probably an interface between two distinctive systems rather than one general system doing two tasks.
Like I said, these experiments stop at a vague 'Chris can still learn languages'. No comment on actual proficiency or testing. For all I know I can't have a meaningful conversation with this guy beyond syntactically correct speech. Or maybe the best proficiency he's ever managed is still pretty poor compared to the average human. I have no idea.
There's no contradiction because I never argued/asserted the brain didn't have parts tuned for language, which is really all this experiment demonstrates.
It's irrelevant to the experiment: he could learn synthetic languages with human-like grammar and could not learn synthetic languages with non-human-like grammar. Regular people could solve the non-human-like languages with difficulty. Because his language ability is much higher than his general problem solving ability, it gives strong evidence that 1. human language capacity is a special function, not a general purpose cognitive function, and 2. it obeys a certain structure.
> There's no contradiction because I never argued/asserted the brain didn't have parts tuned for language, which is really all this experiment demonstrates.
>Because his language ability is much higher than his general problem solving ability
I don't see how you can say his language ability is much higher than his general problem solving ability if you don't know what proficiency of language he is capable of reaching.
When you are learning say English as a second language, there are proficiency tiers you get assigned when you get tested - A1, A2 etc
If he's learning all these languages but maxing out at A2 then his language ability is only slightly better than his general problem solving ability.
This is the point I'm trying to drive home. Maybe it's because I've been learning a second language for a couple of years and so I see it more clearly, but saying 'he learned x language' says absolutely nothing. People say that to mean anything from 'well he can ask for the toilet' to 'could be mistaken for a native'.
>I don't know what you are trying to say then.
The brain has over millions of years been tuned to speak languages with certain structures. Deviating from these structures is more taxing for the brain. True statement. But how on earth does that imply the brain isn't 'thinking' for the structures it is used to? Do you say you did not think for question 1 just because question 2 was more difficult?
As I said it's not relevant, but if you wanted to know you could put the bare minimum of effort into doing your own research. From Smith and Tsimpli's "The Mind of a Savant": "On the [Gapadol Reading Comprehension Test] Christopher scored at the maximum level, indicating a reading comprehension of 16 years and 10 months". They describe the results of a bunch of other language tests, where he scores average to above average, including his translations of passages from a dozen different languages.
> But how on earth does that imply the brain isn't 'thinking' for the structures it is used to? Do you say you did not think for question 1 just because question 2 was more difficult?
The point isn't to define the word "thinking" it is to show that the language capacity is a distinct faculty from other cognitive capacities.
Is there some theorem stating something like random few-hot vectors can always be combined linearly to match any signal with a low p-value?
I thought I encountered it sometimes in my experiments, and that this might be happening in this LLM x neuroscience trend of matching LLM internals to brain signals.
Reply to myself to add more context as I'm at work: what I mean is that I suspect that the average "shape" of vectors in the LLM is dependent on the depth inside the LLM. In effect: vectors are initially Gaussian-random, but as the model gets trained they end up closer and closer to few-hot vectors, which are in effect easier to correlate to any signal, including brain signals.
In effect: any kind of LLM activation could be correlated to brain signals even though it's just a sophisticated mapping and would not correspond to anything useful.
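A rough sketch of that worry, with invented sizes: given enough random sparse ("few-hot") regressors, an ordinary least-squares fit to any target signal looks excellent in-sample, and only held-out evaluation reveals it is spurious. This is why cross-validated scores matter for this kind of analysis, though it doesn't by itself settle the question raised here.

```python
# Random few-hot regressors fit an arbitrary "brain signal" perfectly in-sample
# once there are more regressors than training samples; held-out R^2 collapses.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_samples, n_features, k_hot = 200, 150, 5

# random few-hot design matrix: each column has k_hot nonzero entries
X = np.zeros((n_samples, n_features))
for j in range(n_features):
    X[rng.choice(n_samples, size=k_hot, replace=False), j] = 1.0

signal = rng.standard_normal(n_samples)        # an arbitrary target signal
half = n_samples // 2
fit = LinearRegression().fit(X[:half], signal[:half])
print("in-sample R^2:", fit.score(X[:half], signal[:half]))   # deceptively high
print("held-out R^2: ", fit.score(X[half:], signal[half:]))   # ~0 or negative
```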
IIRC the team of Jean-Remi King at Meta AI showed that even randomly initialized LLMs could be fitted.
It is somewhat ironic that they had to use an OpenAI model for this research. At the same time, this gives nice continuity from earlier works that demonstrated similar, smaller scale, results using GPT-2.
You're mostly driven by bodily conditions and hormones. A computer recording of you isn't going to behave the same because it has no particular motivation to behave any specific way in the first place.
To be noted, if you accept a brain upload made gradually you should also accept a brain upload made discontinuously. If the same brain state comes out, the process can't actually matter.
OK, that's pretty cool research from Google; hope this leads to even more discoveries around the brain. Hopefully it's time we get a better understanding of our brains and how to hack them.
Due to my current condition, I feel that I could do more both for myself and for the world, but unfortunately motivation plays a big role; otherwise I have to trick myself into feeling stressed, or feeling observed, in order to do things like work that might be boring.
So many reasons: absorb information faster; improve spatial visualization; motivation and intrinsic-motivation hacking; simulations... etc.
Give me the code to my brain and let me edit it, with version control please :D
Meditation, if you want to try a drug-free approach.
Make it simple. Stare at a clock with a big second hand. Take one breath every 15 seconds. Then, after a minute or so, push it out to 20 seconds, then one every 30 seconds.
For the 30, my pattern tends to stabilize on inhale for 5-7, hold for 5-7, and then a slow exhale. I find that after the first exhale, if I give a little push I can get more air out of my lungs.
Do this once a day, in a 7-10 minute session, for a week, and see if things aren't a little different.
Brains are already hackable in multiple senses. Namely through exogenous chemicals, electrical stimulation, hypnosis, and combinations of these. They aren’t necessarily reverse engineerable, which is what computational models like LLM-tied ones would enable, but they are hackable. We have the Cold War to thank for that.
So it's neuronal activity from intracranial electrodes, during an active conversation. And they found there are causal-chain-type patterns in the neuronal activity to produce the speech (and presumed thought) in the conversation which compare "favourably" with the LLM.
Ok. I buy it. The sequencing necessary to translate thought to words necessarily imposes a serialisation, which in consequence marshals activity into a sequence, which in turn matches the observed statistically derived LLM sequences.
I tend to say the same things. I often say "this AGI is bullshit" and the occurrence of "bullshit" after the acronym AGI is high. I would be totally unsurprised if the linear sequence of neuronal signalling to both think, and emote as speech or even "charades" physical movements, to say "AGI is bullshit" in some way mimics that of an LLM, or vice versa.
My mildly grumpy opinion: this is not the first paper to show correlation between brain activity and the layers of a transformer. I know that Wang et al. (2024) did it last year[1], but I doubt they're the only ones; I just have them in my head because I was reading their paper last week. Bonus fact: Wang et al.'s paper also shows that test scores are a relevant factor in said correlation.
The point that always comes to mind is: correlation does not imply causation. I guess the main contribution would be a better mapping of the areas of the brain associated with speech production, but jumping from "these two things correlate" to "these two things are essentially the same" seems to me a bit of a stretch.
I find the OP very difficult to comprehend, to the point that I question whether it has content at all. One difficulty is in understanding their use of the word "embedding", defined (so to speak) as "internal representations (embeddings)", and their free use of the word to relate, and even equate, LLM internal structure to brain internal structure. They are simply assuming that there is a brain "embedding" that can be directly compared to the matrix of numerical weights that comprise an LLM's training. That seems a highly dubious assumption, to the point of being hand-waving.
They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do. On that basis alone, it can't be valid to just assume that a human "embedding" is equivalent to an LLM "embedding", for input or output.
> They are simply assuming that there is a brain "embedding" that can be directly compared to the matrix of numerical weights that comprise an LLM's training.
If there were no such structure, then their methods based on aligning neural embeddings with brain "embeddings" (really just vectors of electrode values or voxel activations) would not work.
> They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules". Human language models very obviously and evidently do. On that basis alone, it can't be valid to just assume that a human "embedding" is equivalent to an LLM "embedding", for input or output.
This feels like "it doesn't work the way I thought it would, so it must be wrong."
I think actually their point here is mistaken for another reason: there's good reason to think that LLMs do end up implicitly representing abstract parts of speech and syntactic rules in their embedding spaces.
>They mention a profound difference in the opening paragraph, "Large language models do not depend on symbolic parts of speech or syntactic rules. "Human language models very obviously and evidently do.
Honestly do they ? To me, they clearly don't. Grammar is not how language works. It's useful fiction. Language even in humans seems to be a very statistical process.
Yes! As somebody who speaks 2 languages, and sort of reads/understands 2 more, I cannot agree more. Human spoken languages do not follow any grammars. Grammars are just simplified representations of reality that is probabilistic in nature.
This is something that Chomsky got very wrong, and the statistical/ML crowd got very right.
But still, grammars are a very useful model.
Languages definitely follow grammars. They don't follow the grammars that were written by observing them, but you can discover unwritten grammatical structures that are nevertheless followed by everyone who speaks a language, and who if asked wouldn't even be able to articulate the rules that they are following. It's the following that defines the grammar, not the articulation of the rules.
Statistics are just another way to record a grammar, all the way down to the detail of how one talks about bicycles, or the Dirty War in Argentina.
If a grammar is defined as a book that enumerates the rules of a language, then of course language doesn't require following a grammar. If a grammar is defined as a set of rules for communicating reasonably well with another person who knows those same rules, then language follows grammars.
> Languages definitely follow grammars
But it's the other way around! Grammars follow languages. Or, more precisely, grammars are (very lossy) language models.
They describe typical expectations of an average language speaker. Grammars try to provide a generalized system describing an average case.
I prefer to think of languages as a set of typical idioms used by most language users. A given grammar is an attempt to catch similarities between idioms within the set and turn 'em into a formal description.
A grammar might help with studying a language, and speed up the process of internalizing idioms, but the final learning stage is a set of things students use in certain situations aka idioms. And that's it.
> Statistics are just another way to record a grammar
I almost agree.
But it should be "record a language". These are two approaches to the problem of modeling human languages.
Grammars are an OK model. Statistical models are less useful to us humans but given the right amount of compute they do show much better (see LLMs).
This is a terminological difference. Linguists use "grammar" as a technical term for a speaker's implicit knowledge of how their language works. That knowledge could be statistical or rule-based in nature, although most linguistic theories say that it's rule-based. You're using grammars to mean human-produced descriptions of that knowledge.
That's correct.
Grammars the way I understand them are are a family of human language models. Typically discrete in nature. The approach was born out of Chomsky's research culminating in the Universal Grammar idea.
This is just wrong. Languages follow certain inviolable rules, most notably, hierarchical structure dependence. There are experiments (Moro, the subject "Chris") that show that humans don't process synthetic languages that violate these rules the same as synthetic languages that do (specifically it takes them longer to process and they use non-language parts of the brain to do so).
This does not mean that language in humans isn't probabilistic in nature. You seem to think that because there is structure then it must be rule based but that doesn't follow at all.
When a group of birds fly, each bird discovers/knows that flying just a little behind another will reduce the amount of flaps it needs to fly. When you have nearly every bird doing this, the flock form an interesting shape.
'Birds fly in a V shape' is essentially what grammar is here - a useful fiction of the underlying reality. There is structure. There is meaning but there is no rule the birds are following to get there. No invisible V shape in the sky constraining bird flight.
First, there is no evidence of any probabilistic processing at the level of syntax in humans (it's irrelevant what computers can do).
Second, I didn't say that, in language, structure implies deterministic rules, I said that there is a deterministic rule that involves the structure of a sentence. Specifically, sentences are interpreted according to their parse tree, not the linear order of words.
As for the birds analogy, the "rules" the birds follow actually does explain the V-shape that the flock forms. You make an observation "V-shaped flock" ask the question "why a V-shape and not some other shape" and try to find a explanation (the relative bird positions make it easier to fly [because of XYZ]). In the case of language you observe that there is structure dependence, you ask why it's that way and not another (like linear order) and try to come up with an explanation. You are trying to suggest that the observation that language has structure dependence is like seeing an image of an object in a cloud formation: an imagined mental projection that doesn't have any meaningful underlying explanation. You could make the same argument for pretty much anything (e.g. the double-slit experiment is just projecting some mental patterns onto random behavior) and I don't think it's a serious argument in this case either.
>First, there is no evidence of any probabilistic processing at the level of syntax in humans (it's irrelevant what computers can do).
There is plenty of evidence to suggest this:
https://pubmed.ncbi.nlm.nih.gov/27135040/
https://pubmed.ncbi.nlm.nih.gov/25644408/
https://www.degruyter.com/document/doi/10.1515/9783110346916...
And research on syntactic surprisal—where more predictable syntactic structures are processed faster—shows a strong correlation between the probability of a syntactic continuation and reading times.
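For a sense of what "surprisal" means operationally, here's a rough sketch of how per-word surprisal is typically computed before being regressed against reading times (this assumes GPT-2 via the HuggingFace transformers library purely for illustration; the papers above use their own models and pipelines):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    sentence = "The cat that caught the mouse ran away."
    ids = tokenizer(sentence, return_tensors="pt").input_ids

    with torch.no_grad():
        logits = model(ids).logits  # shape: [1, seq_len, vocab_size]

    # Surprisal of token t is -log2 P(token_t | preceding tokens).
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    surprisal_bits = -log_probs[torch.arange(len(targets)), targets] / torch.log(torch.tensor(2.0))

    for tok, s in zip(tokenizer.convert_ids_to_tokens(targets.tolist()), surprisal_bits):
        print(f"{tok:>12}  {s.item():5.2f} bits")

The reading-time studies then ask whether those per-word surprisal values predict how long readers spend on each word, over and above things like word length and frequency.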
>In the case of language you observe that there is structure dependence, you ask why it's that way and not another (like linear order) and try to come up with an explanation. You are trying to suggest that the observation that language has structure dependence is like seeing an image of an object in a cloud formation: an imagined mental projection that doesn't have any meaningful underlying explanation.
No I'm suggesting that all you're doing here is cooking up some very nice fiction like Newton did when he proposed his model of gravity. Grammar does not even fit into rule based hierarchies all that well. That's why there are a million strange exceptions to almost every 'rule'. Exceptions that have no sensible explanations beyond, 'well this is just how it's used' because of course that's what happens when you try to break down an inherently probabilistic process into rigid rules.
> And research on syntactic surprisal—where more predictable syntactic structures are processed faster—shows a strong correlation between the probability of a syntactic continuation and reading times.
I'm not sure what this is supposed to show? If I can predict what you are going to say so what. I can predict you are going to pick something up too if you are looking at it and start moving your arm. So what?
The third paper looks like a similar argument. As far as I can tell, neither paper 1 nor paper 2 proposes a probabilistic model for language. Paper 1 talks about how certain language features are acquired faster with more exposure (that isn't inconsistent with a deterministic grammar). I believe paper 2 is the same.
> No I'm suggesting that all you're doing here is cooking up some very nice fiction like Newton did when he proposed his model of gravity.
Absolutely bonkers to describe Newton's model of gravity as "fiction". In that sense every scientific breakthrough is fiction: Bohr's model of the atom is fiction (because it didn't use quantum effects), Einstein's gravity will be fiction too when physics is unified with quantum gravity. No sane person uses the word "fiction" to describe any of this, it's just scientific refinement: we go from good models to better ones, patching up holes in our understanding, which is an unceasing process. It would be great if we could have a Newton-level "fictitious" breakthrough in language.
> Grammar does not even fit into rule based hierarchies all that well. That's why there are a million strange exceptions to almost every 'rule'. Exceptions that have no sensible explanations beyond, 'well this is just how it's used' because of course that's what happens when you try to break down an inherently probabilistic process into rigid rules.
No one is saying grammar has been solved, people are trying to figure out all the things that we don't understand.
>I'm not sure what this is supposed to show? If I can predict what you are going to say so what.
If the speed of your understanding varies with how frequent and predictable syntactic structures are then your understanding of syntax is a probabilistic process. A strictly non-probabilistic process would have a fixed, deterministic way of processing syntax, independent of how often a structure appears or how predictable it is.
>I can predict you are going to pick something up too if you are looking at it and start moving your arm. So what?
OK? This is very interesting. Do you seriously think this prediction right now isn't probabilistic? You estimate not from rigid rules but from past experience that it's likely I will pick it up. What if I push it off the table? You think that isn't possible? What if I grab the knife in my bag while you're distracted and stab you instead? Probability is the reason you picked that option instead of the myriad of other options.
>Absolutely bonkers to describe Newton's model of gravity as "fiction". In that sense every scientific breakthrough is fiction: Bohr's model of the atom is fiction (because it didn't use quantum effects), Einstein's gravity will be fiction too when physics is unified with quantum gravity. No sane person uses the word "fiction" to describe any of this, it's just scientific refinement: we go from good models to better ones, patching up holes in our understanding, which is an unceasing process. It would be great if we could have a Newton-level "fictitious" breakthrough in language.
"All models are wrong. Some are useful" - George Box. There's nothing insane with calling a spade a spade. It is fiction and many academics do view it in such a light. It's useful fiction, but fiction none the less. And yes, Einstein's theory is more useful fiction. Grammar is a model of language. It is not language.
> If the speed of your understanding varies with how frequent and predictable syntactic structures are then your understanding of syntax is a probabilistic process.
In what sense? I don't see how it tells you anything if you have the sentence "The cat ___ " and then you expect a verb like "went" but you could get a relative clause like "that caught the mouse". The sentence is interpreted deterministically, not by what a fragment's continuation might contain but by what it does contain. If you are more "surprised" by the latter, it doesn't tell you that the process is not deterministic.
> OK? This is very interesting. Do you seriously think this prediction right now isn't probabilistic? You estimate not from rigid rules but from past experience that it's likely I will pick it up. What if I push it off the table? You think that isn't possible? What if I grab the knife in my bag while you're distracted and stab you instead?
I think you are confusing multiple things. I can predict actions and words, that doesn't mean sentence parsing/production is probabilistic (I'm not even sure exactly what a person might mean by that, especially with respect to production) nor does it mean arm movement is.
> "All models are wrong. Some are useful" - George Box. There's nothing insane with calling a spade a spade. It is fiction and many academics do view it in such a light. It's useful fiction, but fiction none the less. And yes, Einstein's theory is more useful fiction. Grammar is a model of language. It is not language.
I have no idea what you are saying: calling grammar a "fiction" was supposed to be a way to undermine it but now you are saying that it was some completely trivial statement that applies to the best science?
>In what sense? I don't see how it tells you anything if you have the sentence "The cat ___ " and then you expect a verb like "went" but you could get a relative clause like "that caught the mouse". The sentence is interpreted deterministically, not by what a fragment's continuation might contain but by what it does contain. If you are more "surprised" by the latter, it doesn't tell you that the process is not deterministic.
The claim isn't about whether the ultimate interpretation is deterministic; it's about the process of parsing and expectation-building as the sentence unfolds.
The idea is that language processing (at least in humans and many computational models) involves predictions about what structures are likely to come next. If the brain (or a model) processes common structures more quickly and experiences more difficulty and higher processing times with less frequent ones, then the process of parsing sentences is very clearly probabilistic.
Being "surprised" isn't just a subjective experience here - it manifests as measurable processing costs that scale with the degree of unexpectedness. This graded response to probability is not explainable with purely deterministic models that would parse every sentence with the same algorithm and fixed steps.
>I have no idea what you are saying: calling grammar a "fiction" was supposed to be a way to undermine it but now you are saying that it was some completely trivial statement that applies to the best science?
None of my comments undermine grammar beyond saying it is not how language works. I preface 'fiction' with the word useful multiple times and make comparisons to Newton.
> If the brain (or a model) processes common structures more quickly ... then the process of parsing sentences is very clearly probabilistic.
This isn't true. For one, more common sentences are probably structurally simpler, and structurally simpler sentences are faster to process. You also get in bizarre territory when you can predict what someone is going to say before they say it: Obviously no "parsing" has occurred there so the fact that you predicted it cannot be evidence that parsing is probabilistic. If that is the case then a similar argument is true if you have only a sentence fragment. The probabilistic prediction is some ancillary process, just as being able to predict that a cup is going to fall doesn't make my vision a probabilistic process in any meaningful sense. If for some reason I couldn't predict, I could still see and I could still parse sentences.
Furthermore, you can obviously parse sentences and word sequences you have never seen before (and sentences can be arbitrarily complex/nested, at least up to your limits on memory). You can also parse sentences with invented terms.
Most importantly it's not clear how sentences are produced in the mind in this model. Is the claim that you somehow start with a word and produce some random most-likely next word? Do you not believe in syntax parse trees?
Finally, (as Chomsky points out in the video I linked) this model doesn't account for structure dependence. For example why is the question form of the sentence "The man who is tall is happy" "Is the man who is tall happy?" and not "is the man who tall is happy?". Why not move the first "is" that you come across?
> In a strictly deterministic model, both continuations ("went" or "that caught the mouse") would be processed through the same fixed algorithm with the same computational steps, regardless of frequency. The parsing mechanism wouldn't be influenced by prior expectations
Correct. You seem to imply that is somehow unreasonable. Computer parsers work this way.
> Being "surprised" isn't just a subjective experience here - it manifests as measurable processing costs that scale with the degree of unexpectedness. This graded response to probability is not explainable with purely deterministic models.
Again, there are two orthogonal concepts: Do I know what you are going to say next or how you are going to finish your sentence (and possibly something like strain or slowed processing when faced with an unusual concept) and what process do I use to interpret the thing you actually said.
> None of my comments undermine grammar beyond saying it is not how language works. I preface 'fiction' with the word useful multiple times and make comparisons to Newton.
Again, I have no idea what the point of describing universal grammar as fiction is if you say the term applies to all other great scientific theories.
>This isn't true. For one, more common sentences are probably structurally simpler, and structurally simpler sentences are faster to process.
Common sentences are not necessarily structurally simpler and those still get processed faster so yes it's pretty true.
>You also get in bizarre territory when you can predict what someone is going to say before they say it: Obviously no "parsing" has occurred there so the fact that you predicted it cannot be evidence that parsing is probabilistic.
Of course parsing has occurred. Your history with this person (and people in general) and what you know he likes to say, his mood and body language. Still probabilistic.
>Furthermore, you can obviously parse sentences and word sequences you have never seen before (and sentences can be arbitrarily complex/nested, at least up to your limits on memory). You can also parse sentences with invented terms.
So? LLMs can do this. I'm not even sure why you would think probabilistic predictors couldn't.
>Most importantly it's not clear how sentences are produced in the mind in this model. Is the claim that you somehow start with a word and produce some random most-likely next word? Do you not believe in syntax parse trees?
That's one way to do it, yeah. Why would I 'believe in it'? Computers that rely on it don't work anywhere near as well as those that don't. What evidence is there of it being anything more than a nice simplification?
>Finally, (as Chomsky points out in the video I linked) this model doesn't account for structure dependence. For example why is the question form of the sentence "The man who is tall is happy" "Is the man who is tall happy?" and not "is the man who tall is happy?". Why not move the first "is" that you come across?
Why does an LLM that encounters a novel form of that sentence generate the question form correctly?
You are giving examples that probabilistic approaches are clearly handling as if they are examples that probabilistic approaches cannot explain. It's bizarre.
>Correct. You seem to imply that is somehow unreasonable. Computer parsers work this way.
I'm not implying it's unreasonable. I'm telling you the brain clearly does not process language this way because even structurally simple but uncommon syntax is processed slower.
>Again, I have no idea what the point of describing universal grammar as fiction is if you say the term applies to all other great scientific theories
What's the point of describing Newton's model as fiction if I still teach it in high schools and Universities? Because erroneous models can still be useful.
>Again, there are two orthogonal concepts: Do I know what you are going to say next or how you are going to finish your sentence (and possibly something like strain or slowed processing when faced with an unusual concept) and what process do I use to interpret the thing you actually said.
The brain does not comprehend a sentence without trying to predict its meaning. They aren't orthogonal. They're intrinsically linked
> "Of course parsing has occurred. Your history with this person (and people in general) and what you know he likes to say, his mood and body language. Still probabilistic."
This is just redefining terms to be so vague as to make rational inquiry or discussion impossible. I don't know what re-definition of parsing you could be using that would still be in any way useful, or what "probabilistic" in that case is supposed to apply to.
If you are saying that the brain is constantly predicting various things, so that any process that doesn't itself involve prediction automatically counts as probabilistic, then that is just useless.
> Common sentences are not necessarily structurally simpler and those still get processed faster so yes it's pretty true.
Well, I'll have to take your word for it as you haven't cited the paper, but I would point to the reasonable explanation of the different processing times, having nothing to do with parsing, that I gave further below. But I will repeat the vision analogy: if I had an experiment that showed that I took longer to react to an unusual visual sequence, we would not immediately conclude that the visual system was probabilistic. The more parsimonious explanation is that the visual system is deterministic and some other part of cognition takes longer (or is recomputed) because of the "surprise".
> So? LLMs can do this. I'm not even sure why you would think probabilistic predictors couldn't.
It's not about capturing it in statistics or having an LLM produce it, it's about explaining why that rule occurs and not some other. That's the difference between explanation and description.
> That's one way to do it, yeah. Why would I 'believe in it'? Computers that rely on it don't work anywhere near as well as those that don't. What evidence is there of it being anything more than a nice simplification?
Because producing one token at a time cannot produce the arbitrary recursive structures that sentences can have? Because no language uses linear order? Because when we express a thought it usually can't be reduced to a single start word and statistically most-likely next-word continuations? It's also irrelevant what computers do; we are talking about what humans do.
> Why does an LLM that encounters a novel form of that sentence generate the question form correctly?
That isn't the question. The question is why it's that way and not another. It's as if I ask why do the planets move in a certain pattern and you respond with "well why does my deep-neural-net predict it so well?". It's just nonsense.
> You are giving examples that probabilistic approaches are clearly handling as if they are examples that probabilistic approaches cannot explain. It's bizarre.
No probabilistic model has explained anything. You are confusing predicting with explaining.
> I'm not implying it's unreasonable. I'm telling you the brain clearly does not process language this way because even structurally simple but uncommon syntax is processed slower.
I explained why you would expect that to be the case even with deterministic processing.
> What's the point of describing Newton's model as fiction if I still teach it in high schools and Universities? Because erroneous models can still be useful.
Well as I said this is also true of Einstein's theory of gravity and you presumably brought up the point to contrast universal grammar with that theory rather than point out the similarities.
> The brain does not comprehend a sentence without trying to predict its meaning. They aren't orthogonal. They're intrinsically linked
The brain is doing lots of things, we are talking about the language system. Again, if instead we were talking about the visual system no one would dispute that the visual system is doing the "seeing" and other parts of the brain are doing predicting.
In fact they must be orthogonal because once you get to the end of the sentence, where there are no next words to predict, you can still parse it even if all your predictions were wrong. So the main deterministic processing bit (universal grammar) still needs to be explained and the ancillary next-word-prediction "probabilistic" part is not relevant to its explanation.
What exactly is wrong? The fact that grammars are very limited models of human languages? My key thesis is that human languages operate in a way that non-probabilistic models (i.e. grammars) can only describe in a very lossy way.
Sure, LLMs are also lossy, but much more scalable.
I've spent quite a lot of time with 90s/2000s papers on the topic, and I don't remember any model that generates human language better than "stochastic parrots" do.
As I said there are universal rules that human language processing follows (like hierarchical structure dependence); you can't have arbitrary syntax/grammars. It's true that science hasn't solved the main puzzles about how to characterize these rules.
The fact that statistical models are better predictors than the-"true"-characterization-that-we-haven't-figured-out-yet is completely irrelevant, just as it would be irrelevant if your deep-learning net was a better predictor of the weather: it wouldn't imply that the weather doesn't follow rules in physics, regardless of whether we knew what those rules were.
> As I said there are universal rules that human language processing follows (like hierarchical structure dependence); you can't have arbitrary syntax/grammars.
GP didn't say anything about grammars being arbitrary. In fact, his claim that grammars are models of languages would mean the complete opposite.
I don't think they have a consistent understanding of the word "grammar": they seem to use it in the grade-school sense (grammar for English, grammar for French) but then refer to Chomsky's universal grammar which is different (grammar rules that are common to all languages).
The main point of contention is their statement that "grammar follows language" which, in the Chomsky sense, is false: (universal) grammar/syntax describes the human language faculty (the internal language system) from which external languages (English, French, sign language) are derived, so (external) languages follow grammar.
Yes, I was a bit vague. If we are to be serious, then we would have to come up with definitions of grammar-based approaches vs stochastic approaches.
All I am saying is that grammars (as per Chomsky) or even high-school rule-based stuff are imperfect and narrow models of human languages. They might work locally, for a given sentence, but fall apart when applied to the problem at scale. They also (by definition) fail to capture both more subtle and more general complexities of languages.
And the universal grammar hypothesis is just that - a hypothesis. It might be convenient at times to think about languages in this way in certain contexts but that's about it.
Also, remember, this is Hacker News, and I am just a programmer who loves his programming/natural languages so I look at everything from a computational point of view.
All this comes down to is that language is not a solved problem. By the same logic why not just stop doing any research in physics and just put everything through a neural net which is going to give better predictions than the current best theories?
The fact that a deep-neural-net can predict the weather better than a physics-based model does not mean that the weather is not physics-based. Furthermore deep-neural-nets predict but don't explain while a physics-based model tries to explain (and consequently predict).
Moro is apparently a reference to Andrea Moro, but I can't find any writing of his titled 'The Subject "Chris"'.
https://www.cambridge.org/core/books/abs/signs-of-a-savant/i...
It's a separate study done by someone else:
https://www.youtube.com/watch?v=Rgd8BnZ2-iw&t=6735s
How do you explain syntactic islands, binding rules or any number of arcane linguistic rules that humans universally follow? Children can generalise outside of their training set in a way that LLMs simply cannot (e.g. Nicaraguan sign language or creolization)
Linguists however know that grammar is, indeed, important for linguistic comprehension. For example, the German "Ich sehe die Frau mit dem Fernglas" (I see the woman with the binoculars) is _unambiguous_ because "die Frau" and "mit dem Fernglas" match in both gender and case. If this weren't the case, it could be either "I see (the woman with the binoculars)" or "I see (the woman) with [using] the binoculars". Even in German you might encounter this e.g. if you instead had to say "Ich sehe das Mädchen mit dem Fernglas", as das Mädchen (the girl) is neuter rather than feminine in gender.
Both example sentences are equally ambiguous. The gender of the sentence's object is irrelevant. It does not affect the prepositional phrase.
Am German, can confirm. If there's a rule here, it exists only in the heads of linguists.
My point is that grammar is to language what Newton was to gravity, i.e. useful fiction that works well enough for most scenarios, not that language has no structure.
The first 5 minutes of this video do a good job of explaining what I'm getting at - https://www.youtube.com/watch?v=YNJDH0eogAw
Wow a 1 hour video by some crank, guess all of linguistics and cognitive science has been a waste of time.
I said you need only watch the first 5 minutes to see what I was getting at.
You would also think emphasizing grammar's usefulness would make it plain that I do not think it is a waste of time.
> For example, the German "Ich sehe die Frau mit dem Fernglas" (I see the woman with the binoculars) is _unambiguous_ because "die Frau" and "mit dem Fernglas" match in both gender and case. If this weren't the case, it could be either "I see (the woman with the binoculars)" or "I see (the woman) with [using] the binoculars".
My German is pretty rusty, why exactly is it unambiguous?
I don't see how changing the noun would make a difference. "Ich sehe" followed by any of these: "den Mann mit dem Fernglas", "die Frau mit dem Fernglas", "das Mädchen mit dem Fernglas" sounds equally ambiguous to me.
It is indeed ambiguous. I don't understand which alternative the parent is implying.
Die Frau and dem Fernglas don't bind tightly though.
In my view, this phrase is only unambiguous to those who feel the preposition tradition, and all the heavy lifting is done here by “mit” (and “durch” in the opposite case, if one wants to make it clear). Articles are irrelevant and are dictated by the verb and the preposition, whose requirements are sort of arbitrary (sehen Akk., mit Dat.) and fixed. There’s no article-controlled variation that could change meaning; to my knowledge it would simply be incorrect.
I’m also quite rusty on Deutsch, but haven't completely forgotten it, it seems.
I don’t disagree with any of your particular points, but I think you’re missing the forest here: their argument is primarily based in empirical results, not a theoretical framework/logical deduction. In other words, they’re trying to explain why LLMs work so well for decoding human neural content, not arguing that they do!
I think any reasonable scientist would a-priori react the same way to these claims as claims that neural networks alone can possibly crack human intuition: “that sounds like sci-fi speculation at best”. But that’s the crazy world we live in…
This is interesting. The blog post links several papers, and I recommend reading them.
Responses here, however, seem not commensurate with the evidence presented. Two of the papers[0][1] that provide the sources for the illustration in the blog post describe research conducted on a very small group of subjects. They measure neural activity while subjects listen to a 30-minute podcast (5,000 words); participants tried to guess the next words. All the talk about "brain embeddings" is derived from interpreting neuronal activity and sensor data geometrically. It is all very contrived.
Very interesting stuff from a neuroscience, linguistics and machine learning perspective. But I will quote from the conclusion of one of the papers[1]: "Unlike humans, DLMs (deep language models) cannot think, understand or generate new meaningful ideas by integrating prior knowledge. They simply echo the statistics of their input"
[0] Alignment of brain embeddings and artificial contextual embeddings in natural language points to common geometric patterns (https://www.nature.com/articles/s41467-024-46631-y)
[1] Shared computational principles for language processing in humans and deep language models (https://www.nature.com/articles/s41593-022-01026-4)
>"Unlike humans, DLMs (deep language models) cannot think, understand or generate new meaningful ideas by integrating prior knowledge. They simply echo the statistics of their input"
[Citation needed]. Actually, the paper does give a citation (G.F. Marcus, The Algebraic Mind), which is listed as 2019 in their reference list (i.e. before GPT-3) but actually seems to be from the early 2000s.
I view this as compelling evidence that current models are more than "stochastic parrots," because as the OP shows, they are learning to model the world in ways that are similar (up to a linear transformation) to those exhibited by the human brain. The OP's findings, in short:
* A linear transformation of a speech encoder's embeddings closely aligns them with patterns of neural activity in the brain's speech areas in response to the same speech sample.
* A linear transformation of a language decoder's embeddings closely aligns them with patterns of neural activity in the brain's language areas in response to the same language sample.
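Roughly, the alignment analyses referenced here follow a linear encoding-model recipe: fit a ridge regression from the model's embeddings to the recorded neural signal and score it by held-out correlation. A minimal sketch with made-up data (NumPy/scikit-learn; the shapes, names, and hyperparameters are illustrative, not the papers' actual pipeline):

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    n_words, emb_dim, n_electrodes = 2000, 768, 64
    X = rng.standard_normal((n_words, emb_dim))       # stand-in for per-word LLM embeddings
    Y = rng.standard_normal((n_words, n_electrodes))  # stand-in for neural activity per word

    def encoding_correlations(X, Y, n_splits=5):
        """Fit a linear map X -> Y, return mean held-out Pearson r per electrode."""
        rs = np.zeros((n_splits, Y.shape[1]))
        for i, (tr, te) in enumerate(KFold(n_splits, shuffle=True, random_state=0).split(X)):
            fit = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X[tr], Y[tr])
            pred = fit.predict(X[te])
            for e in range(Y.shape[1]):
                rs[i, e] = np.corrcoef(pred[:, e], Y[te, e])[0, 1]
        return rs.mean(axis=0)

    print(encoding_correlations(X, Y).round(2))  # ~0 for random data

The "up to a linear transformation" claim amounts to restricting the map between embeddings and neural activity to be linear, as above; the made-up data here give r near 0, while the findings being discussed report substantially more than chance.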
Yeah, I have always firmly maintained that there is less fundamental difference between LLMs and human brains than most people seem to assume.
Going a bit further, I'll speculate that the actions made by a human brain are simply a function of the "input" from our ~5 senses combined with our memory (obviously there are complications such as spinal reflexes, but I don't think those affect my main point). Neural nets are universal function approximators, so can't a sufficiently large neural net approximate a full human brain? In that case, is there any merit to saying that a human "understands" something in a way that a neural net doesn't? There's obviously a huge gap between the two right now, but I don't see any fundamental difference besides "consciousness" which is not well defined to begin with.
> I have always firmly maintained that there is less fundamental difference between LLMs and human brains than most people seem to assume.
What is your basis for this? Do you have any evidence or expertise in neuroscience to be able to make this claim?
> Neural nets are universal function approximators, so can't a sufficiently large neural net approximate a full human brain?
We do not understand the brain well enough to make this claim.
> but I don't see any fundamental difference besides "consciousness" which is not well defined to begin with.
Yeah besides the gaping hole in our current understanding of neuroscience, you have some good points I guess.
Computers can compute anything, with the only limitations being time, space, energy, programming, etc. So far, it seems like all of those things can be increased without bound. I see no reason to believe that there will be some fundamental limitation that prevents us from improving computers to the point that they can compute the equivalent of a human brain. Yes I agree we don't understand the human brain well enough yet, but empirically we can observe that AI agents and human agents behave in similar ways. Sure, it may not end up being a complete copy of a human brain, but if the inputs and outputs are the same, does it make a difference?
Ah, so there's two of us now :)
The UAT is a pretty weak result in practice. A lot of systems have the same property, and most of them are pretty poor approximators in practice. It may very well be that no reasonable amount of computing power allows approximating the "function of consciousness". Plus, if you're a certain kind of dualist the entire idea of a compact, smooth "consciousness" function may be something you reject philosophically.
I agree there are issues with the UAT, but I feel like my conclusion is still valid: a neural net, given the memories and senses that a human has, is capable of approximating a human's responses accurately enough to be indistinguishable from another human, at least to another human.
I philosophically reject the notion that consciousness is an important factor here. The question of whether or not you have a consciousness doesn't affect what I take away from this conversation, and similarly the question of whether an AI has a consciousness doesn't affect what I take away from my actions with it. If the (non-)existence of others' consciousnesses doesn't materially affect my life—and we assume that it's a fundamentally unanswerable question—why should I care other than curiosity?
>a neural net, given the memories and senses that a human has, is capable of approximating a human's responses accurately enough to be indistinguishable from another human, at least to another human.
That doesn't remotely follow from the UAT and is also almost certainly false.
I am not saying that follows from the formal UAT. I'm saying that, in general, if we assume that the human brain is simply producing outputs in response to inputs, I have no reason to believe that a neural net couldn't be constructed to approximate the same outputs given the same inputs. And specifically, the threshold of "good enough approximation" I chose is effectively just a stronger form of the Turing test. AI has arguably already passed a weaker form of the Turing test (most people in the world cannot tell the difference between a human-written and an AI-written article, for example).
> current models are more than "stochastic parrots"
I believe the same, and I'm also willing to accept that the human brain can intentionally operate in a stochastic-parrot mode.
Some people have the ability to speak fluently non-stop, completely impromptu. I wonder if it's similar to an LLM pipeline, where there's a constant stream of thoughts being generated based on very recent context, which are then passed through various output filters.
That certainly would explain how Trump can rattle on for hours and hours, despite a notable lack of high quality training data.
I do not understand what you find convincing about this that changes your mind.
We have a closed system that we designed to operate in a way that is similar to our limited understanding of how a portion of the brain works, based on how we would model that part of the brain if it had to traverse an nth-dimensional array. We have loosely observed it working in a way that could roughly be defined as similar to our limited understanding of how a portion of the brain works given that limitation that we know is not true of the human brain, with a fairly low confidence level.
Even if you put an extreme level of faith into those very subpar conclusions and take them to be rigid... That does not make it actually similar to the human brain, or any kind of brain at all.
I view this as further evidence that we are less different from "stochastic parrots" than we would like to believe.
<user corysama squawks indignantly and flies away...>
The correlations are 0.25-0.5, which is quite poor (Gaussian distribution plots with those correlations look like noise). That's before analyzing the methodology and assumptions.
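For a sense of scale (a toy simulation in NumPy, not the paper's actual data):

    import numpy as np

    # Simulated bivariate Gaussians at the reported correlation levels.
    rng = np.random.default_rng(0)
    for r in (0.25, 0.5):
        x, y = rng.multivariate_normal([0, 0], [[1, r], [r, 1]], size=5000).T
        print(f"target r = {r}: sample r = {np.corrcoef(x, y)[0, 1]:.2f}, variance explained = {r * r:.0%}")
    # r = 0.25 explains about 6% of the variance; r = 0.5 explains 25%.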
Whether a correlation of 0.25-0.5 is poor is very problem dependent.
For example, in difficult perceptual tasks ("can you taste which of these three biscuits is different" [one biscuit is made with slightly less sugar]), a correlation of 0.3 is commonplace and considered an appropriate amount of annotator agreement to make decisions.
Yes, for certain things like statistical trading (assuming some kind of "nice" Gaussian-like distribution), where you have lots of trades and just need to be more right than wrong, it's probably useful.
Not here though, where you are trying to prove a (near) equivalence.
Isn't correlation the wrong way of measuring statistical dependence anyway? I thought entropy-based methods like relative entropy did this better.
Yeah, Taleb has some pretty convincing arguments that correlation is generally bad/misleading and entropy is better.
> I view this as compelling evidence that current models are more than "stochastic parrots,"
More evidence against "stochastic parrots"
- zero shot translation, where LLMs can translate between unseen pairs of languages
- repeated sampling of responses from the same prompt - which shows diversity of expression with convergence of semantics
- reasoning models - solving problems
But my main critique is that they are better seen as pianos, not parrots. Pianos don't make music, but we do. And we play the LLMs on the keyboard like regular pianos.
Galaxies, hurricanes, whirlpools. Black Spleenwort, Barnsley Fern. Power laws every bloody where.
(My utterly uninformed knee-jerk reaction here, but even if I was a true believer I don't think I'd reach for "compelling".)
A natural consequence when results arise from combining many random processes.
If you add random rolls, you get a Gaussian, thanks to the central limit theorem.
If you multiply them, you get a lognormal distribution, which approximates a power law up to a cutoff.
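A toy check of that distinction (NumPy/SciPy, illustrative only):

    import numpy as np
    from scipy.stats import skew

    # Sums of independent positive draws look Gaussian (CLT);
    # products look lognormal, i.e. their *logs* look Gaussian.
    rng = np.random.default_rng(0)
    draws = rng.uniform(0.5, 1.5, size=(100_000, 10))

    sums = draws.sum(axis=1)
    prods = draws.prod(axis=1)

    print(f"skew(sums)          = {skew(sums):.2f}")           # ~0: symmetric, Gaussian-like
    print(f"skew(products)      = {skew(prods):.2f}")          # >> 0: heavy right tail
    print(f"skew(log(products)) = {skew(np.log(prods)):.2f}")  # ~0: lognormal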
I view this as showing that the language processing is similar, but we're not talking about thinking, just processing language.
I see this as: maybe it's not a statistical parrot, but it's still only some kind of parrot. Maybe a sleep-deprived one.
There isn't really any clear delineation between 'thinking' and 'language processing' in the brain.
That's 100% false: dogs and pigeons can obviously think, and it is childish to suppose that their thoughts are a sequence of woofs or coos. Trying to make an AI that thinks like a human without being able to think like a chimpanzee gives you reasoning LLMs that can spit out proofs in algebraic topology, yet still struggle with out-of-distribution counting problems that frogs and fish can solve.
How human brains process thoughts is non-uniform across population. There’s imagery, written language, sound, speech, tactile, etc. Not everything that you think about is readily expressible in your language. There are definitely people with and without “internal screen”, and probably few more types with/without X, where X is a set of things we’ve never talked about, either assuming everyone has it or not realizing that it’s a non-mandatory part of how you think.
That's not really what I'm saying. What I'm saying is: how does the brain look when you do both? Is there a clear difference? There's no 'thinking mode' followed by a 'language processing mode'.
Language processing is thinking as far as the brain is concerned, and there's no evidence that these are 2 cleanly separated processes, whether you 'think' in words or not.
That's not true, the Moro experiments show they use different capacities as do similar experiments on people who have certain severe cognitive deficiencies that don't impact language processing (e.g. the subject "Chris")
My argument is that “thinking” and “language processing” are not two sequential or clearly separated modes in the brain but deeply intertwined.
Language is a lot more than parsing syntax; whatever your thoughts are on the matter, even LLMs are clearly doing more than that. Are there any experiments where subjects had severe cognitive deficiencies and language in its full breadth (or maybe I should say communication?) came out unscathed?
The "Chris" experiments don't seem to go into much detail on that front.
I just gave one: "Chris". Here's Chomsky describing the "Chris"-experiments ([1]) as part of a broader answer about how language is distinct from general cognition which I paraphrased above.
> That doesn't contradict the argument that “thinking” and “language processing” are not two sequential or clearly separated modes in the brain but deeply intertwined.
It's not an argument, it's an assertion, that is, in fact, contradicted by the experimental evidence I described (Moro and "Chris"). Of course they are "deeply intertwined" but because of the evidence it's probably an interface between two distinctive systems rather than one general system doing two tasks.
https://youtu.be/Rgd8BnZ2-iw?si=_UdBfG1InmpjbDPB&t=6735
Like I said, these experiments stop at a vague 'Chris can still learn languages'. No comment on actual proficiency or testing. For all I know I can't have a meaningful conversation with this guy beyond syntactically correct speech. Or maybe the best proficiency he's ever managed is still pretty poor compared to the average human. I have no idea.
There's no contradiction because I never argued/asserted the brain didn't have parts tuned for language, which is really all this experiment demonstrates.
It's irrelevant to the experiment: he could learn synthetic languages with human-like grammar and could not learn synthetic languages with non-human-like grammar. Regular people could solve the non-human-like languages with difficulty. Because his language ability is much higher than his general problem solving ability, it gives strong evidence that 1. human language capacity is a special function, not a general-purpose cognitive function and 2. it obeys a certain structure.
> There's no contradiction because I never argued/asserted the brain didn't have parts tuned for language, which is really all this experiment demonstrates.
I don't know what you are trying to say then.
>Because his language ability is much higher than his general problem solving ability
I don't see how you can say his language ability is much higher than his general problem solving ability if you don't know what proficiency of language he is capable of reaching.
When you are learning, say, English as a second language, there are proficiency tiers you get assigned when you get tested - A1, A2, etc.
If he's learning all these languages but maxing out at A2, then his language ability is only slightly better than his general problem solving ability.
This is the point I'm trying to drive home. Maybe it's because I've been learning a second language for a couple of years and so I see it more clearly, but saying 'he learned x language' says absolutely nothing. People say that to mean anything from 'well, he can ask for the toilet' to 'could be mistaken for a native'.
>I don't know what you are trying to say then.
The brain has over millions of years been tuned to speak languages with certain structures. Deviating from these structures is more taxing for the brain. True statement. But how on earth does that imply the brain isn't 'thinking' for the structures it is used to? Do you say you did not think for question 1 just because question 2 was more difficult?
As I said it's not relevant but if you wanted to know you could put in the bare minimum of effort into doing your own research. From Smith and Tsimpli's "The Mind of a Savant": "On the [Gapadol Reading Comprehension Test] Christopher scored at the maximum level, indicating a reading comprehension of 16 years and 10 months". They describe the results of a bunch of other language tests, where he scores average to above average, including his translations of passages from a dozen different languages.
> But how on earth does that imply the brain isn't 'thinking' for the structures it is used to ? Do you say you did not think for question 1 just because question 2 was more difficult ?
The point isn't to define the word "thinking" it is to show that the language capacity is a distinct faculty from other cognitive capacities.
I need you guys' help.
Is there some theorem stating something like random few-hot vectors can always be combined linearly to match any signal with a low p-value?
I think I've encountered it sometimes in my experiments, and that this might be happening in this LLM x neuroscience trend of matching LLM internals to brain signals.
Reply to myself to add more context as I'm at work: what I mean is that I suspect that the average "shape" of vectors in the LLM depends on the depth inside the LLM. In effect: vectors are initially Gaussian-random, but as the model gets trained they end up closer and closer to few-hot vectors, which are in effect easier to correlate to any signal, including brain signals.
In effect: any kind of LLM activation could be correlated to brain signals even though it's just a sophisticated p mapping and would not correspond to anything useful.
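For instance, the failure mode I have in mind is easy to reproduce with purely random regressors (a toy NumPy sketch, unrelated to any specific paper; Gaussian features here, but sparse few-hot ones overfit the same way): with enough of them you can "explain" almost any signal in-sample, and only held-out correlation tells you whether the fit means anything.

    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, n_features = 200, 200, 150
    X = rng.standard_normal((n_train + n_test, n_features))  # random "embeddings"
    y = rng.standard_normal(n_train + n_test)                 # random "brain signal"

    # Ordinary least squares fit on the training half only.
    w, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    r_in = np.corrcoef(X[:n_train] @ w, y[:n_train])[0, 1]
    r_out = np.corrcoef(X[n_train:] @ w, y[n_train:])[0, 1]
    print(f"in-sample r = {r_in:.2f}, held-out r = {r_out:.2f}")  # high vs ~0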
IIRC, Jean-Rémi King's team at Meta AI showed that even randomly initialized LLMs could be fitted.
Did they try to predict a person's thoughts? That would be more compelling to me than a 500 ms delay between the model prediction and the spoken word.
It is somewhat ironic that they had to use an OpenAI model for this research. At the same time, this gives nice continuity from earlier works that demonstrated similar, smaller scale, results using GPT-2.
Could this lead us to being able to upload our brains onto computers? To kill death. Very cool.
Would your brain uploaded onto computer still be you?
A copy of you, not the same instance.
Do you create a new instance every time you awaken from sleep? If not, why not?
Because the same you has been active the whole time. Your brain didn't cease functioning during sleep. Kind of like sleep mode vs. rebooting a computer.
You're mostly driven by bodily conditions and hormones. A computer recording of you isn't going to behave the same because it has no particular motivation to behave any specific way in the first place.
If you can simulate a brain, you can simulate hormones.
How do you decide what to set them to? You don't have intrinsic motivations anymore!
Eh? You can have as many intrinsic motivations as you like. You just copy the procedure from the original brain.
The original ones are wrong though (well, not adaptive). Hunger is pointless when you can't eat.
Hans Moravec has an incrementalist approach to this which I think actually does accomplish eliminating the copy issue.
To be noted, if you accept a brain upload made gradually you should also accept a brain upload made discontinuously. If the same brain state comes out, the process can't actually matter.
No.
OK, that's pretty cool research from Google. Hope this leads to even more discoveries around the brain; hopefully it's time we get a better understanding of our brains and how to hack them.
Why do you want hackable brains?
Due to my current condition. I feel that I could do more, both for myself and the world, but unfortunately motivation plays a big role; otherwise I have to trick myself into feeling stressed, or feeling observed, in order to do things like work that might be boring.
So many reasons: absorb information faster; improve spatial visualization; motivation and intrinsic-motivation hacking; simulations; etc.
Give me the code to my brain and let me edit it, with version control please :D
meditation, if you want to try a drug free approach.
Make it simple. Stare at a clock with a big second hand. Take one breath every 15 seconds. Then, after a minute or so, push it out to 20 seconds, then one every 30 seconds.
For the 30, my pattern tends to stabilize on inhale for 5-7, hold for 5-7, and then a slow exhale. I find that after the first exhale, if I give a little push I can get more air out of my lungs.
Do this once a day, a 7-10 minute session, for a week, and see if things aren't a little different.
Have you ever considered or tried nootropics before?
Nootropics may help with stimulation, as well as memory and cognition in general. There is a whole community dedicated to it with effective stacks.
The problem is that it would also let others edit your brain. Even indirectly.
You can do that already, it is called behavior modification. :P And medications!
Brains are already hackable in multiple senses. Namely through exogenous chemicals, electrical stimulation, hypnosis, and combinations of these. They aren’t necessarily reverse engineerable, which is what computational models like LLM-tied ones would enable, but they are hackable. We have the Cold War to thank for that.
Turn off chronic pain
To make cleaning the house as fun as doing drugs.
upgradez 8D
So it's neuronal activity from intracranial electrodes, during an active conversation. And they found there are causal-chain-type patterns in the neuronal activity producing the speech (and presumed thought) in the conversation, which compare "favourably" with the LLM.
Ok. I buy it. The sequencing necessary to translate thought to words necessarily imposes a serialisation, which in consequence marshals activity into a sequence, which in turn matches the observed statistically derived LLM sequences.
I tend to say the same things. I often say "this AGI is bullshit", and the occurrence of "bullshit" after the acronym AGI is high. I would be totally unsurprised if the linear sequence of neuronal signalling to both think and emote "AGI is bullshit" as speech (or even as "charades"-style physical movements) in some way mimics that of an LLM, or vice versa.
My mildly grumpy opinion: this is not the first paper to show a correlation between brain activity and the layers of a transformer. I know that Wang et al. (2024) did it last year[1], but I doubt they're the only ones - I just have them in my head because I was reading their paper last week. Bonus fact: Wang et al.'s paper also shows that test scores are a relevant factor in said correlation.
The point that always comes to mind is: correlation does not imply causation. I guess the main contribution would be a better mapping of the areas of the brain associated with speech production, but jumping from "these two things correlate" to "these two things are essentially the same" seems to me a bit of a stretch.
[1] https://arxiv.org/pdf/2407.10376