Gradient Ascent


Text-to-3D, AudioPaLM, Faster Training of Diffusion Models and more...

Translated speech never sounded this good...

Sairam Sundaresan
Jun 23, 2023

Before we get going, I wanted to briefly discuss something I shared last week. 

In the previous edition, I shared a paper where the authors claimed that GPT-4 got a perfect score on MIT's EECS and Math curricula. 

That turns out to be an incomplete story.

When you look through the test set, which was released on GitHub, a ton of the questions can't be solved because they reference missing information.

It also turns out that the authors used GPT-4 to evaluate its own answers and kept re-prompting it until it got the right answer. That's definitely a letdown.

If you're curious about what other discoveries were made about this paper, read this thread:

https://twitter.com/sauhaarda/status/1670053720233750530

I'm a bit disappointed to find this out, and situations like this make me skeptical of the groundbreaking milestones that these language models are claimed to achieve.

Starting next week, we'll begin deep dives into language models and NLP.

With this out of the way, let's look at this week's content. From PaLM for audio to a faster training technique for diffusion models, I've collected some resources that I found interesting to learn from. I hope you find value in them too!

This Week on Gradient Ascent:

  • [Definitely check out] AudioPaLM 🔊

  • [Consider reading] Text to 3D Content Creation 💎

  • [Consider reading] Faster Training of Diffusion Models ⏩

  • [Check out] Vision Transformer Paper Collection! 📚

  • [Consider reading] AI and the future of busy work ✍️

Resources To Consider:

AudioPaLM

Link: https://google-research.github.io/seanet/audiopalm/examples/

Google has introduced AudioPaLM, a large language model for speech understanding and generation. It fuses PaLM-2 and AudioLM into a single multimodal architecture that can perform speech recognition and speech-to-speech translation. The authors report that it significantly outperforms existing speech translation systems, so it's worth checking out. Listen to the examples at the link above with the volume up.


DreamTime: Improved Optimization for Text-to-3D Content Creation

Paper: https://arxiv.org/abs/2306.12422

In this paper, the authors study the challenges of text-to-3D content creation and introduce a new optimization strategy, DreamTime, to overcome them. They report that it significantly improves text-to-3D content creation, producing results with higher quality and diversity.

Faster Training of Diffusion Models

Paper: https://arxiv.org/abs/2306.09305

Code: https://github.com/Anima-Lab/MaskDiT

In this paper, the authors propose a faster way to train diffusion models using masked transformers. During training, a large share of the image patches is masked out, so the model processes far fewer patches per step, which cuts the training cost significantly. The authors report that their method matches a state-of-the-art Diffusion Transformer while using just 31% of its original training time. Check out the paper and code.
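To make the cost argument concrete, here is a minimal sketch of random patch masking in Python. It is not the MaskDiT implementation: the helper names (`split_masked_tokens`, `attention_cost_ratio`), the 256-token sequence, and the 50% mask ratio are all assumptions picked for illustration. The sketch only shows why dropping patches helps: the transformer sees fewer tokens, and self-attention cost grows roughly with the square of the sequence length.

```python
import random


def split_masked_tokens(num_tokens, mask_ratio, seed=None):
    """Randomly split token indices into (kept, masked) lists.

    Hypothetical helper for illustration; not taken from the MaskDiT code.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_tokens))
    rng.shuffle(indices)
    num_masked = int(num_tokens * mask_ratio)
    masked = sorted(indices[:num_masked])
    kept = sorted(indices[num_masked:])
    return kept, masked


def attention_cost_ratio(num_kept, num_tokens):
    """Approximate self-attention cost relative to the unmasked sequence.

    Assumes cost scales with the square of the sequence length and
    ignores every other part of the network.
    """
    return (num_kept / num_tokens) ** 2


if __name__ == "__main__":
    kept, masked = split_masked_tokens(num_tokens=256, mask_ratio=0.5, seed=0)
    print(len(kept), len(masked))
    print(attention_cost_ratio(len(kept), 256))
```

With half of the patches masked, only 128 of the 256 tokens are processed, and the attention term drops to about a quarter of its unmasked cost. Real savings depend on the rest of the network and on any extra work done for the masked patches, so treat this as a rough intuition rather than a measurement.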

Vision Transformer Paper Collection

Link: https://github.com/cmhungsteve/Awesome-Transformer-Attention

As the name suggests, the link above has a fantastic collection of ViT papers and code repositories. Consider checking it out. 

AI and the Future of Work

I loved reading this essay by Dr. Ethan Mollick on how AI is going to change the future of work by automating the writing process. I think you'll enjoy it too.

One Useful Thing
Setting time on fire and the temptation of The Button
I saw a bit more of the future of AI at work this week, and it shows every sign of vastly boosting productivity, while also causing a crisis of meaning in many organizations. For such a dramatic statement, the actual bit of AI technology I got to experience this week is incredibly minor. It doesn’t do anything that AI couldn’t do before. In fact, other A…