Text-to-3D, AudioPaLM, Faster Training of Diffusion Models and more...
Translated speech never sounded this good...
Before we get going, I wanted to briefly discuss something I shared last week.
In the previous edition, I shared a paper whose authors claimed that GPT-4 achieved a perfect score on MIT's EECS and Math curricula.
It turns out that's an incomplete story.
When you look through the test set, which was released on GitHub, a ton of the questions can't be solved because they reference information that isn't included.
Worse, the authors used GPT-4 to grade its own answers and kept re-prompting it until it produced the right one. That's definitely a letdown.
If you're curious about what other discoveries were made about this paper, read this thread:
https://twitter.com/sauhaarda/status/1670053720233750530
I'm disappointed to learn this, and situations like this make me skeptical of the groundbreaking milestones these language models are claimed to achieve.
Starting next week, we'll begin deep dives on Language Models and NLP.
With this out of the way, let's look at this week's content. From PaLM for audio to a faster training technique for diffusion models, I've collected some resources that I found interesting to learn from. I hope you find value in them too!
This Week on Gradient Ascent:
[Definitely Check out] AudioPaLM 🔊
[Consider reading] Text to 3D Content Creation 💎
[Consider reading] Faster Training of Diffusion Models ⏩
[Check out] Vision Transformer Paper Collection! 📚
[Consider reading] AI and the future of busy work ✍️
Resources To Consider:
AudioPaLM
Link: https://google-research.github.io/seanet/audiopalm/examples/
Google has released AudioPaLM, an LLM for speech understanding and generation. It fuses PaLM-2 and AudioLM into a multimodal architecture that can perform speech recognition and speech-to-speech translation. The model significantly outperforms existing speech translation systems and is worth checking out. Watch the video below with the volume up.
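If you're wondering how a text LLM and an audio model can live in one architecture, here's a minimal sketch of the idea as I understand it: speech is represented as discrete audio tokens, and a single decoder operates over a joint text-plus-audio vocabulary. Everything below (vocabulary sizes, dimensions, the tiny transformer) is an illustrative placeholder, not the released model.

```python
# Minimal sketch (not the released model): a single decoder over a joint
# text + audio token vocabulary. Vocabulary sizes and dimensions are
# illustrative placeholders; causal masking is omitted for brevity.
import torch
import torch.nn as nn

TEXT_VOCAB = 32_000    # assumed text vocabulary size
AUDIO_VOCAB = 1_024    # assumed number of discrete audio codes (AudioLM-style)

class JointSpeechTextDecoder(nn.Module):
    def __init__(self, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        # One embedding table spans both modalities: text ids come first,
        # audio-code ids are offset by TEXT_VOCAB.
        self.embed = nn.Embedding(TEXT_VOCAB + AUDIO_VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, TEXT_VOCAB + AUDIO_VOCAB)

    def forward(self, token_ids):
        # token_ids can interleave text and (offset) audio tokens, so the same
        # model can map audio in -> text out (ASR) or audio in -> audio out (S2ST).
        h = self.backbone(self.embed(token_ids))
        return self.lm_head(h)

audio_tokens = torch.randint(0, AUDIO_VOCAB, (1, 50)) + TEXT_VOCAB  # offset into joint vocab
logits = JointSpeechTextDecoder()(audio_tokens)
print(logits.shape)  # (1, 50, TEXT_VOCAB + AUDIO_VOCAB)
```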
DreamTime: Improved Optimization for Text-to-3D Content Creation
Paper: https://arxiv.org/abs/2306.12422
In this paper, the authors study why text-to-3D content creation is difficult to optimize and introduce DreamTime, a new optimization strategy that addresses those challenges. Their approach produces 3D content with noticeably higher quality and diversity.
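From my reading, the key change concerns the timestep schedule used during score-distillation optimization: instead of sampling the diffusion timestep uniformly at random at every iteration, it decreases steadily from high noise to low noise as the 3D scene is refined. The sketch below contrasts the two; the linear schedule and constants are simplified placeholders, not the paper's exact formulation.

```python
# Rough sketch of the idea (as I understand it): replace uniform random
# timestep sampling with a monotonically decreasing schedule over iterations.
# The linear schedule and constants below are simplified placeholders.
import numpy as np

T_MAX, T_MIN = 980, 20   # usable diffusion timestep range (assumed)
N_ITERS = 10_000         # optimization iterations for the 3D scene (assumed)

def uniform_timestep(rng):
    # Standard score-distillation sampling: any timestep, at any iteration.
    return int(rng.integers(T_MIN, T_MAX + 1))

def dreamtime_style_timestep(iteration):
    # Monotonically decreasing: early iterations use high-noise timesteps
    # (coarse structure), later iterations use low-noise ones (fine detail).
    frac = iteration / (N_ITERS - 1)
    return int(round(T_MAX - frac * (T_MAX - T_MIN)))

rng = np.random.default_rng(0)
for it in (0, 5_000, 9_999):
    print(it, uniform_timestep(rng), dreamtime_style_timestep(it))
```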
Faster Training of Diffusion Models
Paper: https://arxiv.org/abs/2306.09305
Code: https://github.com/Anima-Lab/MaskDiT
In this paper, the authors propose a faster way to train diffusion models using masked transformers. Masked training reduces the training cost significantly: their method trains a state-of-the-art Diffusion Transformer in just 31% of its original training time. Check out the paper and code.
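The gist, as I understand it, is that a large fraction of the patch tokens from the noised image are randomly dropped before the expensive transformer backbone sees them, so each training step costs far less compute. Here's a minimal sketch of that masking step; the masking ratio, shapes, and tiny backbone are placeholders rather than the paper's configuration.

```python
# Minimal sketch of masked training: drop a large fraction of patch tokens
# before the transformer backbone. Ratio, shapes, and backbone are placeholders.
import torch
import torch.nn as nn

def random_mask(tokens, mask_ratio=0.5):
    """Keep a random subset of patch tokens; return kept tokens and their indices."""
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    scores = torch.rand(B, N, device=tokens.device)
    keep_idx = scores.argsort(dim=1)[:, :n_keep]                               # (B, n_keep)
    kept = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))   # (B, n_keep, D)
    return kept, keep_idx

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2
)

patches = torch.randn(8, 256, 64)           # (batch, patch tokens, dim) from a noised image
visible, idx = random_mask(patches, 0.5)    # only half the tokens reach the backbone
features = backbone(visible)
print(patches.shape, "->", features.shape)  # (8, 256, 64) -> (8, 128, 64)
```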
Vision Transformer Paper Collection
Link: https://github.com/cmhungsteve/Awesome-Transformer-Attention
As the name suggests, the link above has a fantastic collection of ViT papers and code repositories. Consider checking it out.
AI and the Future of Work
I loved reading this essay by Dr. on how AI is going to change the future of work by automating the writing process. I think you’ll enjoy it too.