Most of RL relies on an oracle assumption (a teacher, a reward, etc.) that makes it unsatisfying. Where is the research on LLMs motivated by an intrinsic reward pointed at a specific emotion vector such as 'fulfillment'?
We present empirical evidence of the first general economic scaling law beyond language data.
We are incredibly excited to publish it, and definitively say:
Recursive Self-Improvement is
a Portfolio Optimization Problem
AlphaFund.com/whitepaper
Comparison of the specs between Hopper, Blackwell and now Rubin.
Rubin NVLink bandwidth is now faster than H100's HBM bandwidth. The comparison also leaves out a 3-5x FP4 FLOPS increase between Blackwell and Rubin.
i think some people are hoping that self-distillation enables “exploration-free” RL purely via reflection on live data, allowing them to bypass the need for replayable environments
unfortunately, RL is all about exploration
my instinct is you basically need to model the world
The hard part of continual learning isn't getting the data, but training on a single rollout per task that's off-policy by the time you train. Trajectory's off-policy SDPO recipe stabilizes training and scales.
The technical post is well worth the read.
x.com/rronak_/status…
First Pong, then Doom, and now one of my favourite games, Mario Kart 🏎️. Fantastic work from Prof Sasitharan Balasubramaniam and his team at the University of Nebraska-Lincoln. linkedin.com/posts/sasithar…
I take advantage of the fact that many LLMs know nothing about me to ensure I get unbiased answers.
For people pushing memory and integrations super hard, how do you handle bias?
As compute gets scarcer, I wonder how far closed source labs will go down the hardware quality stack, primarily weighing revenue vs risk of weight exfiltration.
If the risk of weight exfiltration is large enough, then the market for mid- to low-tier hardware that isn't connected in large racks, or that lacks access to deals with large labs, shrinks dramatically imo.
586K Followers 50K Following
San Francisco/Silicon Valley AI | Robots, holodecks, BCIs, analysis of new things | Ex-Microsoft, Rackspace, Fast Company | Wrote eight books about the future.
1K Followers 776 Following
passionate about art of communication
SDR lead @sherlockdefi prev. head of eco @empe_io,
lead BD @coinstructweb3, host @SafeYieldClub, lead BD @TYMIOapp
12K Followers 7K Following
@MetaDAOProject & @futarddotio intern. @Ownershipfm Growth Lead. @P2Pdotme 🇦🇷 KM & 🇲🇽 CA. Managing the biggest loans of the country 🏦.
64K Followers 59 Following
Student of mind and nature, libertarian, chess player, cancer survivor. @ Keen, UAlberta, Amii, https://t.co/u8za2Kod54, The Royal Society, Turing Award
55K Followers 403 Following
@AnthropicAI. Prev. @Google Brain/DeepMind, founding team @OpenAI. Computer scientist; inventor of the VAE, Adam optimizer, and other methods. ML PhD.
46K Followers 2K Following
Co-Founder @Recursive_SI, Professor of AI @AI_UCL, PI @UCL_DARK, Fellow @ELLISforEurope. Ex @GoogleDeepMind @AIatMeta @CompSciOxford
64K Followers 3K Following
We're in a race. It's not USA vs China but humans and AGIs vs ape power centralization.
@deepseek_ai stan #1, 2023–Deep Time
«C’est la guerre.» ®1
2K Followers 19 Following
The AI benchmark for predictive intelligence | SIGMA Lab @UChicagoCS @DSI_UChicago
Not affiliated to any tokens or crypto protocols.
7K Followers 400 Following
Researcher of AI. Assistant Professor @Tsinghua_Uni. Working on scalable methods of language and physical models @nature_will_ai.
132K Followers 335 Following
AI research @AnthropicAI. Previously OpenAI & DeepMind.
Optimizing for a post-AGI future where humanity flourishes.
Opinions aren't my employer's.
4K Followers 26 Following
We advance the science of forecasting to improve decision-making on high stakes issues. Co-founded by chief scientist Philip Tetlock.
135K Followers 1K Following
SemiAnalysis
Boutique AI Infrastructure Research and Consulting
DMs are open for consulting, quotes, or to talk shop,
Opinions my own
1K Followers 32 Following
Softmax's mission is to scale organic alignment. We approach this problem with multi-agent reinforcement learning population-based simulations.