Text-to-3D, AudioPaLM, Faster Training of Diffusion Models and more...
Translated speech never sounded this good...
Before we get going, I wanted to briefly discuss something I shared last week.
In the previous edition, I shared a paper whose authors claimed that GPT-4 achieved a perfect score on MIT's EECS and Math curricula.
It turns out that's an incomplete story.
When you look through the test set, which was released on GitHub, a ton of the questions can't be solved because they reference information that isn't included.
Worse, the authors used GPT-4 to grade its own answers and kept re-prompting it until it produced the right one. That's definitely a letdown.
If you're curious about what other discoveries were made about this paper, read this thread:
https://twitter.com/sauhaarda/status/1670053720233750530
I'm disappointed to learn this, and situations like this make me skeptical of the groundbreaking milestones these language models are claimed to achieve.
Starting next week, we'll begin deep dives on Language Models and NLP.
With this out of the way, let's look at this week's content. From PaLM for audio to a faster training technique for diffusion models, I've collected some resources that I found interesting to learn from. I hope you find value in them too!
This Week on Gradient Ascent:
[Definitely Check out] AudioPaLM 🔊
[Consider reading] Text to 3D Content Creation 💎
[Consider reading] Faster Training of Diffusion Models ⏩
[Check out] Vision Transformer Paper Collection! 📚
[Consider reading] AI and the future of busy work ✍️
Resources To Consider:
AudioPaLM
Link: https://google-research.github.io/seanet/audiopalm/examples/
Google has released AudioPaLM, an LLM for speech understanding and generation. It fuses PaLM-2 and AudioLM into a multimodal architecture that can perform speech recognition and speech-to-speech translation. The model significantly outperforms existing speech translation systems and is worth checking out. Watch the video below with the volume up.
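If you're wondering how a text LLM and an audio model can live in one architecture, here's a minimal sketch of the idea as I understand it: speech is represented as discrete audio tokens, and a single decoder operates over a joint text-plus-audio vocabulary. Everything below (vocabulary sizes, dimensions, the tiny transformer) is an illustrative placeholder, not the released model.

```python
# Minimal sketch (not the released model): a single decoder over a joint
# text + audio token vocabulary. Vocabulary sizes and dimensions are
# illustrative placeholders; causal masking is omitted for brevity.
import torch
import torch.nn as nn

TEXT_VOCAB = 32_000    # assumed text vocabulary size
AUDIO_VOCAB = 1_024    # assumed number of discrete audio codes (AudioLM-style)

class JointSpeechTextDecoder(nn.Module):
    def __init__(self, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        # One embedding table spans both modalities: text ids come first,
        # audio-code ids are offset by TEXT_VOCAB.
        self.embed = nn.Embedding(TEXT_VOCAB + AUDIO_VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, TEXT_VOCAB + AUDIO_VOCAB)

    def forward(self, token_ids):
        # token_ids can interleave text and (offset) audio tokens, so the same
        # model can map audio in -> text out (ASR) or audio in -> audio out (S2ST).
        h = self.backbone(self.embed(token_ids))
        return self.lm_head(h)

audio_tokens = torch.randint(0, AUDIO_VOCAB, (1, 50)) + TEXT_VOCAB  # offset into joint vocab
logits = JointSpeechTextDecoder()(audio_tokens)
print(logits.shape)  # (1, 50, TEXT_VOCAB + AUDIO_VOCAB)
```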
DreamTime: Improved Optimization for Text-to-3D Content Creation
Paper: https://arxiv.org/abs/2306.12422
In this paper, the authors study why text-to-3D content creation is difficult to optimize and introduce DreamTime, a new optimization strategy that addresses those challenges. Their approach produces 3D content with noticeably higher quality and diversity.
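From my reading, the key change concerns the timestep schedule used during score-distillation optimization: instead of sampling the diffusion timestep uniformly at random at every iteration, it decreases steadily from high noise to low noise as the 3D scene is refined. The sketch below contrasts the two; the linear schedule and constants are simplified placeholders, not the paper's exact formulation.

```python
# Rough sketch of the idea (as I understand it): replace uniform random
# timestep sampling with a monotonically decreasing schedule over iterations.
# The linear schedule and constants below are simplified placeholders.
import numpy as np

T_MAX, T_MIN = 980, 20   # usable diffusion timestep range (assumed)
N_ITERS = 10_000         # optimization iterations for the 3D scene (assumed)

def uniform_timestep(rng):
    # Standard score-distillation sampling: any timestep, at any iteration.
    return int(rng.integers(T_MIN, T_MAX + 1))

def dreamtime_style_timestep(iteration):
    # Monotonically decreasing: early iterations use high-noise timesteps
    # (coarse structure), later iterations use low-noise ones (fine detail).
    frac = iteration / (N_ITERS - 1)
    return int(round(T_MAX - frac * (T_MAX - T_MIN)))

rng = np.random.default_rng(0)
for it in (0, 5_000, 9_999):
    print(it, uniform_timestep(rng), dreamtime_style_timestep(it))
```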
Faster Training of Diffusion Models
Paper: https://arxiv.org/abs/2306.09305
Code: https://github.com/Anima-Lab/MaskDiT
In this paper, the authors propose a faster way to train diffusion models using masked transformers. Masked training reduces the training cost significantly: their method trains a state-of-the-art Diffusion Transformer in just 31% of its original training time. Check out the paper and code.
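The gist, as I understand it, is that a large fraction of the patch tokens from the noised image are randomly dropped before the expensive transformer backbone sees them, so each training step costs far less compute. Here's a minimal sketch of that masking step; the masking ratio, shapes, and tiny backbone are placeholders rather than the paper's configuration.

```python
# Minimal sketch of masked training: drop a large fraction of patch tokens
# before the transformer backbone. Ratio, shapes, and backbone are placeholders.
import torch
import torch.nn as nn

def random_mask(tokens, mask_ratio=0.5):
    """Keep a random subset of patch tokens; return kept tokens and their indices."""
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    scores = torch.rand(B, N, device=tokens.device)
    keep_idx = scores.argsort(dim=1)[:, :n_keep]                               # (B, n_keep)
    kept = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))   # (B, n_keep, D)
    return kept, keep_idx

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2
)

patches = torch.randn(8, 256, 64)           # (batch, patch tokens, dim) from a noised image
visible, idx = random_mask(patches, 0.5)    # only half the tokens reach the backbone
features = backbone(visible)
print(patches.shape, "->", features.shape)  # (8, 256, 64) -> (8, 128, 64)
```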
Vision Transformer Paper Collection
Link: https://github.com/cmhungsteve/Awesome-Transformer-Attention
As the name suggests, the link above has a fantastic collection of ViT papers and code repositories. Consider checking it out.
AI and the Future of Work
I loved reading this essay by Dr. on how AI is going to change the future of work by automating the writing process. I think you’ll enjoy it too.