OpenRouter brings the hub & spoke model to LLMs
Stymied by not being able to run Llama 4 on my desktop’s 24GB of VRAM, I signed up for OpenRouter. OpenRouter is a marketplace where providers sell LLM-generated tokens to folk like us. Every popular model, closed or open, is available, and the prices are low. Instead of having an account with each provider, you can get away with just the one.
Gptel supports OpenRouter out of the box. I set it up with the API key pulled from 1Password:
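Something along these lines, as a sketch following gptel’s documented OpenRouter backend; the helper name, the 1Password item path, and the model IDs below are placeholders rather than the exact setup:

```elisp
;; Sketch of a gptel + OpenRouter setup.
;; `my/openrouter-api-key' and the op:// item path are placeholders.
(defun my/openrouter-api-key ()
  "Return the OpenRouter API key stored in 1Password, via the `op' CLI."
  (string-trim
   (shell-command-to-string "op read 'op://Personal/OpenRouter/credential'")))

;; OpenRouter exposes an OpenAI-compatible chat API, so gptel's OpenAI
;; backend works once pointed at openrouter.ai.
(setq gptel-backend
      (gptel-make-openai "OpenRouter"
        :host "openrouter.ai"
        :endpoint "/api/v1/chat/completions"
        :stream t
        :key #'my/openrouter-api-key
        ;; Example model IDs; OpenRouter lists many more.
        :models '(meta-llama/llama-4-maverick
                  deepseek/deepseek-r1))
      gptel-model 'meta-llama/llama-4-maverick)
```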
…Zafón's Intertextuality
The first sentence of Carlos Ruiz Zafón’s La sombra del viento, a book I’d started knowing little about, reads: Todavía recuerdo el día que mi padre me llevó por primera vez al cementerio de los libros olvidados (“I still remember the day my father first took me to the Cemetery of Forgotten Books”). Familiar and resonant,¹ this Gothic remix of Cien años de soledad’s opener² sends the reader down an intertextual rabbit-hole. Just a few pages further in, we find ourselves in an impossible library filled with forgotten books and mirrored galleries. The references and Borgesian symbols aren’t subtle: Zafón is taking the postmodern approach by making La sombra del viento that which it describes: a library of other works, a nesting doll of stories.
…Seeing Invisible Cities
I read Invisible Cities while traveling to Toronto. I’d seen Toronto once before, in the summer, and was struck both times by the amount of construction, the sheer number of cranes hoisting the skyline. This was my first time seeing the ivy-bricked university white with snow. It felt like Calvino was deconstructing my perspective of the city.
Apparently, there is a tradition of illustrating the invisible cities. Some links:
- Pooja Sanghani-Patel Part 1, Part 2, Part 3
- Ethan Mollick Illustrating Cities, Writing Cities
- Karina Puente on ArchDaily (paywalled)
- Dave McKean Folio edition (physical book)
Nabokov's second-rate brand of English
In Lolita, Nabokov uses the parting paragraph of his afterword (written Nov. 12, 1956) to remind us he’d pulled the trick off with one hand behind his back:
After Olympia Press, in Paris, published the book, an American critic suggested that Lolita was the record of my love affair with the romantic novel. The substitution “English language” for “romantic novel” would make this elegant formula more correct. But here I feel my voice rising to a much too strident pitch. None of my American friends have read my Russian books and thus every appraisal on the strength of my English ones is bound to be out of focus. My private tragedy, which cannot, and indeed should not, be anybody’s concern, is that I had to abandon my natural idiom, my untrammeled, rich, and infinitely docile Russian tongue for a second-rate brand of English, devoid of any of those apparatuses - the baffling mirror, the black velvet backdrop, the implied associations and traditions - which the native illusionist, frac-tails flying, can magically use to transcend the heritage in his own way.
…
Jeff Dean and Noam Shazeer interviewed by Dwarkesh Patel
Solid Dwarkesh interview with the legends Jeff Dean (Google’s Chief Scientist) and Noam Shazeer (co-inventor of the Transformer). Here are some takeaways I found interesting:
- We have models that can deal with millions of tokens of context, which is quite a lot. It’s hundreds of pages of PDF, or 50 research papers, or hours of video, or tens of hours of audio, or some combination of those things, which is pretty cool. But it would be really nice if the model could attend to trillions of tokens. Could it attend to the entire internet and find the right stuff for you? Could it attend to all your personal information for you? (Dean)
- We have a little bit of that in Mixture of Experts models, but it’s still very structured. I feel like you want a more organic growth of expertise: when you want more expertise of some kind, you add some more capacity to the model there and let it learn a bit more on that kind of thing. (Dean)
- [Current MoE¹] is still a very regular structure… I want something more organic, where if we need more capacity for math, we add math capacity. If we need more for Southeast Asian languages, we add it. Let each piece be developed somewhat independently, then stitched together. (Dean)²
- This notion of adapting the connectivity of the model to the connectivity of the hardware is a good one. You want incredibly dense connections between artificial neurons on the same chip and the same HBM, because that doesn’t cost you much. Then you want a smaller number of connections to nearby neurons: a chip away, some amount of connections, and many, many chips away, a smaller number of connections where you only send a very limited, bottlenecky kind of thing. (Dean)
- We’re already doing multi-data-center training fully synchronously. If each step is a couple of seconds, the 50ms latency between data centers doesn’t kill you, as long as there’s enough bandwidth. (Dean) (A quick sanity check of the arithmetic follows the list.)
- A related thing is I feel like we need interesting learning techniques during pre-training. I’m not sure we’re extracting the maximal value from every token we look at with the current training objective. Maybe we should think a lot harder about some tokens. (Dean)
- I think the good news is that analyzing text seems to be easier than generating text. So I believe that the ability of language models to actually analyze language model output and figure out what is problematic or dangerous will actually be the solution to a lot of these control issues. (Shazeer)³
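Quick arithmetic on the latency point, using only the numbers from that quote and assuming the cross-data-center exchange lands on the critical path once per step:

$$
\frac{50\,\mathrm{ms\ of\ latency}}{2\,\mathrm{s\ per\ step}} = 0.025 \approx 2.5\%\ \text{of each step}
$$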
¹ Noam Shazeer is first author on the original mixture-of-experts paper.
…