1.
I didn’t give Fast, scalable, clean, and cheap enough the attention it deserved when it came out. To me, it was making an obvious point: solar and batteries are the best energy source for off-grid datacenters because they’re much faster to build. Energy costs aren’t a big factor for datacenters (~5% of costs) so it’s okay that solar+batteries are a little more expensive than natural gas.
To be specific, a 90% renewable data center would have an energy cost of $109/MWh, while a gas plant would cost $86/MWh[1]. But they admit they didn’t include many of the optimizations available to a renewable-powered data center. These include using Erthos or PEG solar systems to lower capital costs, optimizing the datacenter to run on DC power, and reducing backup generator overcapacity. They estimate that even without redesigning for DC, costs would fall to $97/MWh for a 90% renewable data center.
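A back-of-envelope sketch of why the premium is tolerable, using only the figures quoted above. It assumes the ~5% energy share applies at the gas price and that nothing else about the datacenter changes; the function name is mine, not the report's.

```python
def total_cost_increase(base_energy_cost, new_energy_cost, energy_share):
    """Fractional rise in total cost when only the energy price changes.

    energy_share is energy's fraction of total cost at the base price
    (an assumption: the ~5% figure is treated as the gas-baseline share).
    """
    energy_premium = new_energy_cost / base_energy_cost - 1
    return energy_share * energy_premium


GAS = 86.0             # $/MWh, gas plant
RENEWABLE_90 = 109.0   # $/MWh, 90% renewable, unoptimized
RENEWABLE_OPT = 97.0   # $/MWh, 90% renewable with the listed optimizations
ENERGY_SHARE = 0.05    # energy as a share of datacenter costs

for label, cost in [("unoptimized", RENEWABLE_90), ("optimized", RENEWABLE_OPT)]:
    premium = cost / GAS - 1
    bump = total_cost_increase(GAS, cost, ENERGY_SHARE)
    print(f"{label}: energy premium {premium:.1%}, total cost up {bump:.2%}")
```

Under these assumptions the unoptimized premium adds a little over 1% to total costs, which is small next to the value of building faster.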
They use conservative estimates for solar module and battery costs, which I think could fall by 50% or more by 2030. If nothing else changed, this would only lower costs by 5-10% because the other costs are significant. But we can probably innovate on the other cost components as well.
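The arithmetic behind that 5-10% figure can be sketched as follows. If a component's price falls by some fraction and everything else stays fixed, the total falls by the component's share times that fraction. Applying the result to the $109/MWh figure is my assumption, and the function names are illustrative.

```python
def implied_share(total_saving, price_drop):
    """Share of total cost a component must have for a given price drop
    in that component to produce the given overall saving."""
    return total_saving / price_drop


def cost_after_price_drop(total_cost, component_share, price_drop):
    """Total cost after one component gets cheaper, all else held fixed."""
    return total_cost * (1 - component_share * price_drop)


PRICE_DROP = 0.50   # modules and batteries fall by 50%
BASE_COST = 109.0   # $/MWh; assumed starting point for illustration

for saving in (0.05, 0.10):
    share = implied_share(saving, PRICE_DROP)
    new_cost = cost_after_price_drop(BASE_COST, share, PRICE_DROP)
    print(f"{saving:.0%} saving implies a {share:.0%} cost share, "
          f"giving ${new_cost:.2f}/MWh")
```

In other words, a 5-10% saving from a 50% price drop implies modules and batteries make up only about 10-20% of the energy cost, which is why the other components matter so much.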
There are many ways to get further cost reductions for renewable data centers. Options for supplementing intermittent solar energy, or for trimming energy demand, include:
Instead of colocating a backup generator, sip energy from the grid in the winter.
Long-term energy storage using iron-air batteries, thermal storage, or a host of other technologies.
Import solar fuels and park generators on site for part of the year.
Add windmills, reducing the number of low-power days.
Battery trucks that drive from other solar farms.
Ground-source cooling.
Absorption refrigerators to convert waste heat into refrigeration.
Run the chips hotter to lower cooling costs. Alternatively, cool the chips to cryogenic temperatures to increase energy efficiency.
The broader point is that if 90% renewable energy can work for data centers, it can work for just about everything else. The grid can switch to 90% solar and batteries, and industrial processes can orient around renewables for their energy needs.
Once our emissions are coming from a few natural gas peaker plants, reaching net-zero becomes a lot easier. Just capture the flue gas and offset the emissions from a few challenging sectors like air travel and aluminum production.
2.
There’s been a concerning decline in marriage over the last few decades. This is a sign that people aren’t finding lifelong companions. It’s also a key reason for fertility decline; 75% of the fall in fertility since 2007 comes from a decline in marriage.
Why aren’t people getting married? A new paper points out that women with college degrees have had similar marriage rates for decades, but women without college degrees have seen declining marriage rates. They argue that “… worsening male outcomes primarily undermine the marriage prospects of non-college women.”
Much of this decline comes from non-college women marrying fewer non-college men. Why might this be? For one, outcomes for non-college men have declined for decades. This is partially a composition effect: more people have been going to college in recent decades, so the people who don’t are more negatively selected.
Tax disincentives in the welfare system are another factor. There are many situations where raising household income sharply reduces welfare benefits. Tying the knot might not make financial sense if the man isn’t making a large enough contribution. So people could still be finding love, just not going to the courthouse[2].
This paper takes a different angle, asking why fewer young adults are having sex. Declines in drinking frequency are a big driver for both men and women. For men, video games and living with parents also have a big impact.
The story rhymes with one of my favorite pieces by Andrew Kortina, Kinky Labor Supply and the Attention Tax. It blames cheap, high-quality digital media for reducing men’s incentives to participate in the labor market, and thus undermining their participation in the marriage market.
All of my clever ideas for dating markets are useless if men can’t be suitable partners, so how do we fix this? Pushing more men into college isn’t a good solution.
I’m also wary of adding new regulations to achieve policy goals. Instead, there are a few reforms and projects that might be worthwhile. Streamlining and expanding the welfare system to avoid work disincentives is a good start. The housing theory of everything applies here too. State-backed construction projects could increase employment for non-college men. What megaprojects need building?
But really staring into the abyss requires recognizing that better and cheaper entertainment media is interfering with our social fabric. I’ll never suggest regulating internet use, but maybe giving students a Defense Against the Dark Web class would help. I’ll have to think more about this.
3.
The steam engine was in use a full century before we developed the thermodynamics to understand it. Machine learning is alchemy. When will it mature into chemistry?
That era is closer than you think. There are at least half a dozen strands of work exploring the edges of a unified theory of neural networks.
The scaling laws that I covered recently are empirical regularities that motivate further study. The gap in our understanding is palpable. We should be able to say something like “If my model starts at a loss of 4 bits/token and the entropy is 2 bits/token, I need X tokens of data to fully specify a model with Y bits stored in the parameters.”
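For a sense of what today's empirical laws can and can't say, here is a minimal sketch. It assumes a data-only power law of the form L(D) = E + b / D^beta, where E is the irreducible entropy. The constants `B` and `BETA` are hypothetical placeholders, not fitted values from any paper.

```python
def tokens_for_loss(target_loss, entropy, b, beta):
    """Invert the assumed law L(D) = entropy + b / D**beta for D (tokens)."""
    excess = target_loss - entropy
    if excess <= 0:
        raise ValueError("target loss must exceed the irreducible entropy")
    return (b / excess) ** (1 / beta)


ENTROPY = 2.0   # bits/token, as in the example above
B = 400.0       # hypothetical fit constant
BETA = 0.3      # hypothetical fit exponent

for target in (4.0, 3.0, 2.5, 2.1):
    tokens = tokens_for_loss(target, ENTROPY, B, BETA)
    print(f"loss {target} bits/token needs about {tokens:.3g} tokens")
```

A fit like this tells you how many tokens reach a given loss, but nothing about how many bits end up stored in the parameters. That is the gap a real theory would need to close.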
Singular learning theory, which models neural networks in a simplified Bayesian setting, has made some progress here. In this setting, you can prove that the effective dimension (i.e. the effective number of parameters) can differ from the actual number of parameters in your model. It tells us how much information a neural network can store and how well that network generalizes.
But while SLT can explain a lot of the behaviors of neural networks, our grand theory should provide detailed advice on how to conduct a training run. The Tensor Programs approach looks at the behavior of neural networks as the size (often the width) approaches infinity. Apparently, wide-enough neural networks share the same optimal hyperparameters, allowing you to tune the hyperparameters of a smaller network and transfer them to your real training run[3].
Predating the machine learning revolution are two theories of learning that will probably get swept up in the new paradigm. Computational mechanics is the closest to a true “thermodynamics of learning.” It focuses on a highly simplified model: a linear stream of bits, with Markov chains as hypotheses. It gives us various terms to talk about apparent entropy, memory, and information. Then there’s all the active inference work, which set out to understand how humans think and act using a sort of Bayesian model. Much of the terminology is shared with singular learning theory, but I’m not sure how similar they really are.
This is what the birth of a field looks like. People grasping at their part of the elephant. Like computational learning theory before it, this new theory will spawn better machine learning methods. But I hope it takes us much farther, to a deep understanding of intelligence, learning, and agency. It may be the last grand theory that humanity discovers.
EDIT: see also Foundations of algorithmic thermodynamics
Everything else
Solar is happening in Africa, from this presentation by Nat Bullard:
The Haverly Plan: Nuclear Explosions for Large Scale Carbon Sequestration
Ultrahigh Specific Strength by Bayesian Optimization of Carbon Nanolattices. Starting with a 3D printing technique to make nanostructures out of solid carbon, they optimize the lattice structure to make a material with high specific strength. Specific strength is a critical factor in the economics of space tethers. The characteristic velocity of their material is a disappointing 1.4 km/s but similar techniques could be applied to carbon fiber, glass, or Kevlar materials.
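For reference, a small sketch of how characteristic velocity relates to specific strength. It assumes the common convention v = sqrt(strength / density); some sources include a factor of 2 under the root, which would change the numbers. The function names are mine.

```python
import math


def characteristic_velocity(specific_strength):
    """Characteristic velocity in m/s from specific strength in J/kg
    (equivalently Pa·m³/kg), assuming v = sqrt(strength / density)."""
    return math.sqrt(specific_strength)


def specific_strength_from_velocity(velocity):
    """Inverse of the above: specific strength in J/kg from velocity in m/s."""
    return velocity ** 2


V_QUOTED = 1.4e3  # m/s, the 1.4 km/s figure quoted above

strength = specific_strength_from_velocity(V_QUOTED)
print(f"implied specific strength: {strength / 1e6:.2f} MPa·m³/kg")
```

Because velocity scales with the square root of specific strength, doubling the characteristic velocity requires a material four times stronger per unit mass.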
FPGAs are (not) Good at Deep Learning. I love all the weird things you can do with FPGAs, but this lecture was a good dose of realism for me. FPGAs can’t compete with ASICs for deep learning. Instead, they can play a support role in all the non-MatMul parts of training. There’s also some cool stuff about co-design of algorithms and hardware on ASICs.
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch. DeepMind offers some improvements to distributed LLM training, lowering the bandwidth requirements 100x. Bandwidth is a big bottleneck for training runs, and optical interconnects contribute significantly to costs. Easing this bottleneck is the goal of companies like Lightmatter. Some combination of clever algorithms and hardware will probably solve it.
Chad Jones has a new paper: How much should we spend to reduce A.I.’s existential risk? The estimates come out to ~1% of GDP, which is far, far higher than what we’re doing now.
Risk & Progress has a nice piece on efficient tax systems such as VAT and DBCFT. VAT seems sensible and pretty efficient, another reminder that we have tax policy basically figured out. But what if I want a more progressive system than the VAT? Lund mentions that you can offer a “prebate” or vouchers for common necessities like groceries. This is nice because it’s better to make the payments system more progressive than force the tax system to be progressive. It also makes a VAT more politically feasible.
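A quick sketch of how a flat VAT plus a prebate ends up progressive. The 20% rate and $3,000 annual prebate are hypothetical numbers chosen for illustration, not figures from the piece.

```python
def effective_vat_rate(consumption, vat_rate, prebate):
    """Net VAT paid after the prebate, as a share of pre-tax consumption."""
    return (vat_rate * consumption - prebate) / consumption


VAT_RATE = 0.20    # hypothetical flat VAT rate
PREBATE = 3000.0   # hypothetical lump-sum prebate per household per year

for consumption in (10_000, 15_000, 40_000, 100_000, 400_000):
    rate = effective_vat_rate(consumption, VAT_RATE, PREBATE)
    print(f"${consumption:>7,} of consumption: effective rate {rate:.1%}")
```

The effective rate is negative for the lowest spenders, zero where the prebate exactly covers the tax, and climbs toward the statutory rate as consumption grows. The tax itself stays flat, and the progressivity comes entirely from the payment.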
The Obvious-Once-You-Think-About-It Reason Why Education Cuts Fertility. People delay having kids until after they finish their education. It reminds me of an old idea I haven’t bothered to write up: what if there were colleges designed around parents?
This video of how a kitchen works in a Michelin-starred restaurant is neat. Communities of practice are a powerful thing. This collaboration, shared language, and culture is a metaphor for every human endeavor, not just snooty food. See point 1 here for something similar about the Tetris community.
[1] Interestingly, combining this with flue gas capture would get you to $101/MWh. Better than renewables for now, but I think more expensive long term.
[2] There’s also a small cultural effect. Our society has become more secular and emphasizes women’s careers more (and that’s good). This is why you see a small decline in marriage across all groups.
[3] Though see this paper for some empirical contradiction of the Tensor Programs predictions.
Who knows which sets of technology will evolve in a world that taxes net CO2 emissions!
An easier way to get a progressive consumption tax than a VAT would be to tax consumption (income minus savings and "good" consumption) at progressive rates. Let the VAT replace the wage taxes that (all too partially) fund the social insurance system.