Q*, Exponentially Faster Language Modeling, ZipLoRA, Orca 2, Aligning Diffusion Models and more...
Unpacking the hype cycle
This week, I decided to look into all this Q* commotion (briefly) and have a treasure trove of curated resources just for you. A new deep dive will be out next week.
I need your help...
…to make Gradient Ascent more valuable. Could you spare five minutes to share your input through a brief survey?
Your insights are crucial and will help me refine this newsletter to align more closely with your interests.
Exclusive Opportunity: As a token of my gratitude, a select few respondents will be randomly selected for a complimentary 30-minute one-on-one strategy session with me. Tailored to your career and interests in AI, this session is a chance to delve deeper into the topics that matter most to you.
Your perspective is invaluable, and I eagerly anticipate your input!
With that out of the way, let's get started.
The Q* Conundrum - Skynet or Smarter Spreadsheets?
OpenAI's rumored Q* model signals where AI is heading. It's a step, maybe a big one, in AI's journey towards AGI. But let's not get carried away – it's solving grade-school math, not unraveling the mysteries of the universe.
Rumors abounded last week that the OpenAI drama was all because of a project codenamed Q*. It's touted as a breakthrough, nailing basic math problems that most AI models, including GPT-4, trip over. Yet, some pioneers in the field, like Yann LeCun, aren't buying the hype. They're saying it's more of an upgrade than a revolution.
What It Can Do (Apparently):
Q*'s primary capability (grade-school math wizard) suggests potential applications in fields like scientific research, personalized tutoring, and complex mathematical problem-solving. How does it do this? Instead of autoregressively predicting the next token, Q* can apparently use planning (think AlphaGo) to produce better outputs. Unlike traditional LLMs, which predict one step at a time, a planning model can anticipate and strategize several steps ahead. This mirrors human cognitive processes more closely, enabling AI to tackle complex problems with a level of foresight previously unattainable. So, it's not about the math itself but what it represents: the ability to reason, to connect the dots. If Q* can do this, it's a step towards AGI, i.e., making AI more like us – thinking, reasoning, and maybe even understanding.
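Since nothing concrete is known about Q*, here's a hypothetical toy sketch of the general idea: greedy next-token decoding commits to the locally best token, while a lookahead search scores whole multi-step paths and can find a better overall sequence. The transition scores below are entirely made up for illustration.

```python
# Toy illustration of planning vs. greedy decoding. The "model" is just a
# table of made-up per-step scores (think unnormalized log-probabilities);
# this is NOT Q*'s actual method, which is unknown outside OpenAI.
SCORES = {
    "start": {"a": 0.6, "b": 0.4},   # first-token scores
    "a":     {"x": 0.1, "y": 0.05},  # second-token scores given "a"
    "b":     {"x": 0.9, "y": 0.8},   # second-token scores given "b"
}

def greedy():
    """Standard autoregression: pick the locally best token at each step."""
    t1 = max(SCORES["start"], key=SCORES["start"].get)
    t2 = max(SCORES[t1], key=SCORES[t1].get)
    return [t1, t2], SCORES["start"][t1] + SCORES[t1][t2]

def lookahead():
    """Planning: score every full two-step path, then pick the best one."""
    paths = [(t1, t2) for t1 in SCORES["start"] for t2 in SCORES[t1]]
    best = max(paths, key=lambda p: SCORES["start"][p[0]] + SCORES[p[0]][p[1]])
    return list(best), SCORES["start"][best[0]] + SCORES[best[0]][best[1]]
```

Greedy decoding grabs "a" (0.6) and is then stuck with weak continuations, while lookahead sees that starting with "b" (0.4) unlocks a 0.9 continuation and wins overall – the same reason AlphaGo searches ahead instead of always taking the highest-value single move.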
Q* should be seen as a step forward in AI, not a revolutionary leap. It shines a light on both the potential of search-based methods and their limits. The reaction to Q* also underscores the need for a balanced perspective on AI advancements, avoiding hype cycles that can distort public understanding and expectations of AI's capabilities.
Read more here.
Disclaimer: All of this is highly speculative information. No one outside of OpenAI knows anything about Q* or even if it exists. Further, all the opinions and views above are my own and don't represent those of my employer.
Resources To Consider:
Grokking Training Loss Patterns
This repository hosts an online book with a wealth of information for practitioners. I found this post on understanding training loss patterns really useful. I hope you do, too!
ZipLoRA
This excellent video explains ZipLoRA and how it effectively merges independently trained style and subject LoRAs to generate any user-defined subject in any user-defined style. The video's description also links to a Colab notebook in case you'd like to follow along.
Aligning Diffusion Models with Direct Preference Optimization
This paper introduces Diffusion-DPO. This method helps text-to-image diffusion models better understand and match what people prefer to see in generated images. While these models can produce high-quality results, they don't always know what users prefer. Diffusion-DPO changes this by using user feedback to make these models create more appealing images that match the text better. This research is vital because it helps make generative models more aligned with what people really want to see.
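For a rough sense of the mechanism, here's a minimal sketch of the DPO-style preference loss that Diffusion-DPO builds on. The form follows the original DPO objective; in the diffusion variant, the log-likelihood terms are replaced by per-sample diffusion losses. The numeric inputs here are placeholders, not real model outputs.

```python
import math

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """-log(sigmoid(margin)): the loss is small when the model ranks the
    human-preferred sample above the dispreferred one by a wider margin
    than a frozen reference model does."""
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

Minimizing this over many (preferred, dispreferred) image pairs nudges the model toward outputs people rate higher, without ever training an explicit reward model.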
Making Pixels Dance
PixelDance is a novel approach for creating videos with rich motion and sophisticated visual effects. This method tackles the challenge faced by existing video generation techniques, which primarily focus on text-to-video conversion but often produce clips with limited motion despite high image quality. PixelDance advances the field by significantly improving the ability to synthesize videos with intricate motions and complex scenes.
Cross Image Attention for Zero-Shot Appearance Transfer
Project Page: https://garibida.github.io/cross-image-attention/
Given two images depicting a source structure and a target appearance, this method generates an image merging the structure of one image with the appearance of the other. What's really cool is that this happens in a zero-shot manner, with no optimization or model training required while supporting appearance transfer across images that may differ in size and shape. The paper and code can be found in the link above.
🧨 Exponentially Faster Language Modeling
This paper presents FastBERT, a highly efficient variant of BERT that uses only 0.3% of its neurons during each inference (engaging just 12 out of 4,095 neurons per layer). This impressive reduction is achieved by replacing traditional feedforward networks with fast feedforward networks (FFFs). Despite its lean architecture, FastBERT performs comparably to standard BERT models. The authors provide high-level CPU code that achieves a 78x speedup over conventional feedforward implementations, along with a PyTorch implementation with a 40x speedup. Additionally, the training code, benchmarking setup, and model weights are publicly available. Overall, this is fantastic work by the authors.
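To make the "tiny fraction of neurons" idea concrete, here's a toy sketch of a fast feedforward layer in plain Python: a binary tree of gating nodes routes each input to exactly one leaf neuron, so a layer with 2^d leaves evaluates only d gates plus a single neuron per input. All weights below are random placeholders for illustration, not FastBERT's actual parameters.

```python
# Toy fast feedforward (FFF) layer: conditional execution over a binary
# tree. With DEPTH = 3 we have 2**3 = 8 leaf "neurons", but each forward
# pass touches only 3 gates and 1 leaf instead of all 8.
import random

random.seed(0)
DIM, DEPTH = 4, 3

def rand_vec():
    return [random.uniform(-1, 1) for _ in range(DIM)]

gates  = [rand_vec() for _ in range(2**DEPTH - 1)]  # internal tree nodes
leaves = [rand_vec() for _ in range(2**DEPTH)]      # leaf "neurons"

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fff_forward(x):
    """Route x down the tree: at each gate, go right iff gate(x) > 0."""
    node = 0
    for _ in range(DEPTH):
        node = 2 * node + (2 if dot(gates[node], x) > 0 else 1)
    leaf = node - (2**DEPTH - 1)        # index among the leaves
    return leaf, dot(leaves[leaf], x)   # only ONE leaf neuron is evaluated
```

The cost per input grows with the tree depth (log of the layer width) rather than the width itself, which is where the paper's dramatic neuron savings come from.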
Orca 2: Teaching Smaller Language Models to Reason
Paper: https://arxiv.org/abs/2311.11045
Orca 2 significantly advances the training of smaller language models (LMs) for enhanced reasoning abilities. Moving away from the traditional imitation learning approach, where smaller models mimic the output of larger, more capable models, Orca 2 introduces a novel training strategy. This approach teaches small LMs various reasoning techniques, such as step-by-step processing, recall-then-generate, recall-reason-generate, and direct answers. The key innovation lies in enabling these smaller models to employ different solution strategies for different tasks, which may diverge from those used by larger models. Plus, the model weights are publicly available.
A Unified Library for Parameter-Efficient and Modular Transfer Learning
Adapters is an open-source library that unifies parameter-efficient and modular transfer learning in large language models. It integrates ten diverse adapter methods into a single interface, offering ease of use and flexible configuration. Further, it leverages adapter modularity through composition blocks, enabling the design of complex adapter setups. The library is a powerful tool for addressing the limitations of conventional fine-tuning and promoting more efficient, modular transfer learning.
Gradient Ascent is a reader-supported publication. Consider becoming a free or paid subscriber.