Nabokov's second-rate brand of English
In Lolita, Nabokov uses his afterword’s parting paragraph to remind us he’d pulled the trick off with one hand behind his back (written Nov. 12, 1956):
After Olympia Press, in Paris, published the book, an American critic suggested that Lolita was the record of my love affair with the romantic novel. The substitution “English language” for “romantic novel” would make this elegant formula more correct. But here I feel my voice rising to a much too strident pitch. None of my American friends have read my Russian books and thus every appraisal on the strength of my English ones is bound to be out of focus. My private tragedy, which cannot, and indeed should not, be anybody’s concern, is that I had to abandon my natural idiom, my untrammeled, rich, and infinitely docile Russian tongue for a second-rate brand of English, devoid of any of those apparatuses – the baffling mirror, the black velvet backdrop, the implied associations and traditions – which the native illusionist, frac-tails flying, can magically use to transcend the heritage in his own way.
…
Jeff Dean and Noam Shazeer interviewed by Dwarkesh Patel
Solid Dwarkesh interview with the legends Jeff Dean (Google’s Chief Scientist) and Noam Shazeer (co-author of the original Transformer paper). Here are some takeaways I found interesting:
- We have models that can deal with millions of tokens of context, which is quite a lot. It’s hundreds of pages of PDF, or 50 research papers, or hours of video, or tens of hours of audio, or some combination of those things, which is pretty cool. But it would be really nice if the model could attend to trillions of tokens. Could it attend to the entire internet and find the right stuff for you? Could it attend to all your personal information for you? (Dean)
- We have a little bit of that in Mixture of Experts models, but it’s still very structured. I feel like this kind of more organic growth of expertise, and when you want more expertise of that, you add some more capacity to the model there and let it learn a bit more on that kind of thing. (Dean)
- [Current MoE1] is still a very regular structure… I want something more organic, where if we need more capacity for math, we add math capacity. If we need more for Southeast Asian languages, we add it. Let each piece be developed somewhat independently, then stitched together. (Dean)2 (a toy routing sketch follows this list)
- this notion of adapting the connectivity of the model to the connectivity of the hardware is a good one. I think you want incredibly dense connections between artificial neurons in the same chip and the same HBM because that doesn’t cost you that much. But then you want a smaller number of connections to nearby neurons. So, like a chip away, you should have some amount of connections and then, like many, many chips away, you should have a smaller number of connections where you send over a very limited kind of bottlenecky thing. (Dean)
- We’re already doing multi-data-center training fully synchronously. If each step is a couple seconds, the 50ms latency between data centers doesn’t kill you, as long as there’s enough bandwidth. (Dean; quick arithmetic below)
- A related thing is I feel like we need interesting learning techniques during pre-training. I’m not sure we’re extracting the maximal value from every token we look at with the current training objective. Maybe we should think a lot harder about some tokens. (Dean)
- I think the good news is that analyzing text seems to be easier than generating text. So I believe that the ability of language models to actually analyze language model output and figure out what is problematic or dangerous will actually be the solution to a lot of these control issues. (Shazeer)3
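For context on the MoE bullets above, here’s a minimal, plain-numpy sketch of the top-k expert routing idea behind sparsely-gated Mixture-of-Experts. The layer sizes, expert count, and top_k are illustrative choices of mine, not anything from the interview or from Google’s models:

```python
# A toy sketch of top-k Mixture-of-Experts routing (the idea behind footnote 1).
# Sizes, expert count, and top_k are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, top_k = 64, 256, 8, 2

# Each "expert" is a small two-layer MLP; the router scores experts per token.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.02,
     rng.standard_normal((d_hidden, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02


def moe_forward(x):
    """Send each token to its top-k experts and mix their outputs by gate weight."""
    logits = x @ router                             # (n_tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]   # indices of the chosen experts
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        chosen = top[t]
        gates = np.exp(logits[t, chosen] - logits[t, chosen].max())
        gates /= gates.sum()                        # softmax over the chosen k
        for gate, e in zip(gates, chosen):
            w_in, w_out = experts[e]
            out[t] += gate * (np.maximum(token @ w_in, 0.0) @ w_out)
    return out


tokens = rng.standard_normal((4, d_model))
print(moe_forward(tokens).shape)  # (4, 64)
```

In this toy picture, Dean’s “organic growth” remark would amount to appending new experts to `experts` (math-heavy ones, Southeast Asian language ones) and letting the router learn to send the relevant tokens there, rather than fixing a uniform expert count up front.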
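And a quick back-of-the-envelope check on the multi-data-center point, assuming one cross-data-center gradient exchange per step and that bandwidth isn’t the bottleneck (the numbers are just the ones quoted above):

```python
# Latency cost of fully synchronous multi-data-center training, per step,
# assuming one cross-DC exchange per step (numbers from the quote above).
step_time_s = 2.0           # "a couple seconds" per training step
cross_dc_latency_s = 0.050  # 50 ms between data centers

print(f"latency overhead per step: {cross_dc_latency_s / step_time_s:.1%}")  # 2.5%
```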
1. Noam Shazeer is first author on the sparsely-gated mixture-of-experts paper (“Outrageously Large Neural Networks,” 2017).
…
Resources Against the Day
Like most of Pynchon’s “histories”, Against the Day [AtD] is better with friends. Here are some of the resources that helped me find my way.
There are quite a few reading guides:
- The Pynchon Wiki: I particularly appreciate the page-by-page, spoiler-free annotations
- Mapping the Zone: A podcast which began covering AtD in 2024. They should wrap some time in 2026.
- Otolithium’s Reading Guide: A PDF with one-pagers for each chapter aiming to answer “where am I?”
- The Chumps of Choice: “A Congenial Spot for the Discussion of Against the Day, by Thomas Ruggles Pynchon, Cornell ‘59, and Any Other Damned Thing That Comes Into Our Heads”
AtD has also received its share of Pynchonite analysis. Here’s a small selection on its mathematical symbolism:
…
A takeaway from How to Know a Person
I reflexively story-share when someone else opens up: to show I’ve had a similar experience, that I can empathize, and to offer a little vulnerability of my own. In How to Know a Person, David Brooks specifically calls out this conversational gambit, noting that it can shift focus away from the speaker. Instead, he suggests giving the speaker space and staying in the role of listener.
Lo and behold, I was on the phone with my Mom – she was recounting coming out of Goodwill with a bagful of books, much to my Dad’s dismay. My conversational reflex was to jump in with my own book-related story, but this time I tried to leave the spotlight on her with some “books seem to run in the family” attempt at a volley. And, Mom told me, it does run in the family. Her father grew up on a farm in Illinois, and his father in turn loved books. All details new to me.
…
Bullets on Gray Matters: A Biography of Brain Surgery
Cancer got its biography with The Emperor of All Maladies, and now brain surgery gets a similarly subtitled treatment with Dr. Theodore H. Schwartz’s Gray Matters: A Biography of Brain Surgery. I was a neuroscience undergrad until 2010, but didn’t keep up – this book is full of new findings and old anecdotes:
- Craniectomy bone flaps are often stored in the patient’s abdomen to preserve them until needed.
- Lobotomies made psychosurgery infamous, but it’s seeing a revival with more modern techniques and better knowledge of how to apply them. Modern psychosurgery still often involves ablating parts of the brain.
- Dr. José Delgado, a neurophysiologist, stood in the ring with a charging, stimoceiver-implanted Córdoban bull and stopped it in its tracks by triggering the implant remotely.
- The Utah Array is a well-characterized, long-lasting, 96-electrode BCI (up to six arrays can be implanted at once) that was used starting in 2008 to enable paralyzed patients to control robotic arms.
- Neuralink has 1,024 electrodes across 64 threads, must be surgically implanted by a robot, and has been implanted in two quadriplegic patients in Canada. The first operation had issues with threads retracting, but little else has been made public.
Tyler Cowen will be doing a Conversation with Dr. Schwartz. I’d ask:
…