Text-to-3D, AudioPaLM, Faster Training of Diffusion Models and more...
Translated speech never sounded this good...
Before we get going, I wanted to briefly discuss something I shared last week.
In the previous edition, I shared a paper where the authors claimed that GPT-4 got a perfect score on MIT's EECS and Math curricula.
It turns out that's an incomplete story.
When you look through the test set released on GitHub, a ton of the questions can't be solved as posed because they reference missing information.
On top of that, the authors used GPT-4 to evaluate its own answers and kept re-prompting until it produced the right one. That's definitely a letdown.
If you're curious about what other discoveries were made about this paper, read this thread:
I'm a bit disappointed to find this out, and situations like this make me skeptical of the groundbreaking milestones these language models claim to achieve.
Starting next week, we'll begin a series of deep dives on Language Models and NLP.
With this out of the way, let's look at this week's content. From PaLM for audio to a faster training technique for diffusion models, I've collected some resources that I found interesting to learn from. I hope you find value in them too!
This Week on Gradient Ascent:
[Definitely Check out] AudioPaLM 🔊
[Consider reading] Text to 3D Content Creation 💎
[Consider reading] Faster Training of Diffusion Models ⏩
[Check out] Vision Transformer Paper Collection! 📚
[Consider reading] AI and the future of busy work ✍️
Resources To Consider:
AudioPaLM
Google has released AudioPaLM, an LLM for speech understanding and generation. It fuses PaLM-2 and AudioLM into a single multimodal architecture that can perform speech recognition and speech-to-speech translation. The model significantly outperforms existing speech translation systems and is worth checking out. Watch the video below with the volume up.
DreamTime: Improved Optimization for Text-to-3D Content Creation
In this paper, the authors study the challenges of text-to-3D content creation and introduce a new optimization strategy, DreamTime, to overcome them. Their approach significantly improves the quality and diversity of the generated 3D content.
Faster Training of Diffusion Models
In this paper, the authors propose a faster way to train diffusion models using masked transformers. Masking out a large fraction of patches during training reduces the training cost significantly: their method enables a state-of-the-art Diffusion Transformer to be fully trained in just 31% of the original training time. Check out the paper and code.
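The core idea behind masked training is simple: if the transformer only processes a random subset of the patch tokens on each step, per-step compute drops roughly in proportion to the mask ratio. Here is a minimal sketch of that patch-masking step, not the authors' implementation; the `mask_patches` helper and the 50% ratio are illustrative assumptions.

```python
import numpy as np

def mask_patches(patch_tokens, mask_ratio=0.5, rng=None):
    """Keep a random subset of patch tokens (illustrative sketch).

    patch_tokens: array of shape (num_patches, dim).
    Feeding only the kept tokens to the transformer is what cuts
    per-step training cost in masked training.
    """
    rng = np.random.default_rng(rng)
    n = patch_tokens.shape[0]
    num_keep = max(1, int(round(n * (1 - mask_ratio))))
    kept_idx = np.sort(rng.permutation(n)[:num_keep])  # preserve patch order
    return patch_tokens[kept_idx], kept_idx

# e.g., a 16x16 grid of patches with embedding dim 768
tokens = np.random.randn(256, 768)
kept, kept_idx = mask_patches(tokens, mask_ratio=0.5)
print(kept.shape)  # (128, 768)
```

Only half the tokens go through the expensive transformer blocks each step; the indices of the kept patches are returned so positional information can still be attached to them.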
Vision Transformer Paper Collection
As the name suggests, the link above has a fantastic collection of ViT papers and code repositories. Consider checking it out.
AI and the Future of Work
I loved reading this essay on how AI is going to change the future of work by automating the writing process. I think you'll enjoy it too.