A generational year for AI
2022 in review
This Week on Gradient Ascent:
A year in review - Generative AI takes the cake ⌛️
The best machine learning courses in 2022 🧑🏫
The best machine learning papers in 2022 📜
2022 - A generational year for AI:
A buzzing sense of unease and tension lingers in the air. No one dares to move a muscle. Well over two hours have elapsed, yet the outcome is still unclear. This will be settled in the space of 12 yards. Bearing the weight of his nation on his bruised shoulders, Gonzalo Montiel walks up to the penalty spot hoping to write history for Argentina and, in the process, rectify his error in the 117th minute of a breathless game that led to this moment. The football gods have given him a second chance to redeem himself. Calmly, he slots the ball to the right of opponent Hugo Lloris, who guesses wrong and agonizingly watches it nestle in the corner of the net with a satisfying thump. The lighter of the two blues erupts in celebration. In the background, a 24-year-old Kylian Mbappe shakes his head in grief and disbelief. His hat trick wasn't enough for France on this day. His club teammate, the legendary Lionel Messi, shakes his head too, this time in joyful disbelief. He has finally completed his medal collection. His legacy is assured.
The 2022 football World Cup was a spectacle that captured the imagination of millions around the world, from the casual fan to the devout analyst. A score of 3-3 after extra time, and only the third penalty shootout in the history of World Cup finals to decide the winner of an epic battle. What a game.
No, you're not reading a sports article. The machine learning community witnessed this kind of excitement and non-stop action over a span of 12 months, not just for 120 minutes. Machine learning research has made strong inroads in protein folding, solved a 53-year-old math question, and generated incredible images, conversations, and music with the coolness of Emi Martinez, Argentina's heroic goalkeeper in the shootouts. There has been no paucity of action.
If the 2022 world cup belongs to the generational talent of Messi, the 2022 equivalent in machine learning belongs to generative AI.
Imagen DALL-E back to life
It's hard to find a good starting point to reflect on this whirlwind of a year, but I'll venture a guess that most people would start with image-generating models. So let's start there.
Until recently, Generative Adversarial Networks (GANs) were the kings of the hill when it came to image generation. It took an idea from a completely unrelated field, non-equilibrium statistical physics, for researchers to knock these networks off the top spot. The result was a new class of models called diffusion models. Although an exciting breakthrough, they remained in the shadow of GANs for several years.
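The core idea borrowed from statistical physics is simple to sketch: gradually corrupt data with Gaussian noise over many steps, then train a model to reverse that corruption. Here's a toy illustration of the forward (noising) half in plain NumPy; the noise schedule values are illustrative, not those of any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Diffusion, forward direction: destroy an image with noise over T steps.
# A trained model learns to run this in reverse, turning pure noise back
# into an image one denoising step at a time.
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # per-step noise schedule
alphas_cumprod = np.cumprod(1.0 - betas)    # cumulative signal retained by step t

def add_noise(x0, t):
    """Jump straight to noise level t using the closed-form forward process."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_cumprod[t]) * x0 + np.sqrt(1 - alphas_cumprod[t]) * noise

x0 = rng.standard_normal((8, 8))  # stand-in for an image
x_mid = add_noise(x0, 100)        # partially noised: structure still visible
x_end = add_noise(x0, T - 1)      # nearly pure noise: almost no signal left
```

By the last step, `alphas_cumprod` is essentially zero, so the "image" is indistinguishable from noise; generation is just the learned reverse walk from there back to step 0.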
But in April 2022, OpenAI introduced DALL-E 2, leveraging the latent power of diffusion, and Pandora's box was opened. Millions of users beta-tested the model, and soon, the internet was flooded with incredible imagery that was almost indistinguishable from "real" images. Whether it was an avocado on the moon or recreations of the Mona Lisa, nothing was impossible anymore.
Not to be left behind, Google released not one but two models, Imagen and Parti. Both models produced excellent results but had fundamental differences in their underlying architectures. Imagen used the diffusion approach while Parti was an auto-regressive model.
While this research allowed for exciting creative possibilities and the rise of products like Lensa AI and AvatarAI, it wasn't without controversy. Using Midjourney (a tool that lets users create images from text), a Colorado man won an art competition at his state fair, leading to huge outcries from the artistic community. Traditional artists were outraged when they learnt that their work had been used to train these models without their permission. What made things worse was that generated works were flooding the very websites where these artists sold their art, and they weren't being compensated in any way. Lensa AI also drew criticism for producing derogatory images when women tried the app.
In 2023, researchers will be looking to further the nascent efforts in text-to-video and text-to-3D. The practical implications of these models will be an interesting area to follow, especially when it comes to the metaverse, deepfakes, and copyright issues. Creatives might worry that their jobs are at risk. However, I feel that this will unlock the latent creativity and imagination of many who finally have the tools to create from a simple prompt. In fact, I feel experienced creators have the most to gain from this. Concept art will be infinitely easier to create, and "creator's block" will no longer be an issue.
For researchers, there are still many interesting problems to solve: how to make generation faster, how to deploy these models on edge devices (your phone), how to improve the quality of generation, and how to make prompts closer to conversational English.
Finally, NeRFs (Neural Radiance Fields) made great headway in a number of areas. NeRFs can synthesize new views of a scene from a limited number of example views. Say I show you a few pictures of a scene, then hide them. If I ask you to close your eyes and imagine how that scene would look from a different angle, you'd probably be able to do it, right? That's kind of what NeRFs do. Previously, NeRFs were limited in what they could "imagine". Google's Mip-NeRF 360 found a way to generate unbounded views of a scene in all directions, something which was unheard of. Just look at the video below to see what I mean:
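Under the hood, a NeRF is a small neural network that maps a 3D point (and view direction) to a color and a density, and new views are made by volume rendering along each camera ray. Here's a toy sketch of that rendering step; the `field` function below is a hypothetical hand-written stand-in for the trained network, not a real NeRF:

```python
import numpy as np

def field(points):
    """Hypothetical scene in place of a trained MLP: a soft blob at the origin."""
    dist = np.linalg.norm(points, axis=-1)
    density = np.exp(-4.0 * dist**2)  # opaque near the center, transparent far away
    color = np.stack([dist.clip(0, 1), 1 - dist.clip(0, 1),
                      np.full_like(dist, 0.5)], axis=-1)
    return color, density

def render_ray(origin, direction, n_samples=64, near=0.0, far=4.0):
    """Volume-render one pixel: sample the field along a camera ray and blend."""
    ts = np.linspace(near, far, n_samples)
    points = origin + ts[:, None] * direction
    color, density = field(points)
    delta = (far - near) / n_samples
    alpha = 1 - np.exp(-density * delta)                           # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1 - alpha[:-1]]))    # light surviving so far
    weights = trans * alpha
    return (weights[:, None] * color).sum(axis=0)                  # blended pixel color

# One ray from a camera at z = -3 looking toward the blob.
pixel = render_ray(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]))
```

Training a real NeRF just means adjusting the network behind `field` until pixels rendered this way match the example photographs; rendering every ray of a new camera pose then gives you the "imagined" view.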
Another limitation of NeRFs is the sheer amount of time they take to generate these new views. Plenoxels shortened this time significantly, from days to minutes (traditional NeRFs were slooooooow).
My amazing collaborators (and yours truly) used NeRFs to generate new views of our Sun from satellite measurements.
Later in the year, researchers were able to generate "fly-throughs" using NeRFs that led to some really amazing moving scenery.
Can you imagine Google maps powered by this technology? How easy it would be to find a mom-and-pop store on a random street corner if you could "fly through" the route on your phone. That possibility might be closer than you think.
All right, all right, you're asking why I haven't mentioned a peep about ChatGPT. Let's switch gears to language models.
Conversational AI - waddle aside, rubber ducky
A lot of programmers I know use the rubber ducky method. When you run into a bug while coding (and you will, because that's life), you simply talk through your thought process aloud with a rubber ducky in front of you. Magically, you find the error and the solution.
In 2022, the rubber ducky was replaced with two smarter duckies - GitHub Copilot and ChatGPT. Copilot is an autocomplete system for code that is trained on tons of open-source code. GitHub opened access to it this year for all users. ChatGPT upped the bar significantly, learning from feedback and becoming much more conversational than models past (if you're curious about how it works, I wrote about it just last week here).
There were other notable releases I'd be remiss not to mention. AlphaCode from DeepMind, launched early this year, was able to solve challenging coding problems. After all, it was trained on code submitted to competitive programming contests in a dozen programming languages. In fact, DeepMind found that it placed within the top 54% of participants in programming competitions!
If you're like me, you'll know how challenging Leetcode interview problems are to solve under severe time constraints. Now, imagine solving problems that make these look like a piece of cake and finishing within the top half of coding experts. That's a pretty nifty achievement if you ask me. While these models won't replace human programmers anytime soon, they're wonderful tools to help us write and debug code more easily.
Meta released Atlas and, later, Galactica. Atlas was a question-answering model that retrieved information from a database of documents. Galactica, on the other hand, survived only three days online before being taken down. It was a language model trained on scientific and technical texts, but it was prone to generating fake information and citing sources that didn't exist.
Google search won't be replaced anytime soon, folks.
Reducing these models' propensity to generate misinformation and hallucination will be a key challenge for researchers going into next year.
LOTR meets large models
In addition to the amazing progress made above, researchers have also tried to scale up these models to perform more than one function - hundreds of tasks in fact.
Simply put, imagine if you had one model to rule them all. You'd train such a model once and then finetune it as needed for various tasks. That would be beneficial not only from a generalizability perspective but also for reducing carbon emissions: training each of these large models from scratch consumes an enormous amount of electricity.
In this context, two notable models come to mind from this year. First, Google released PaLM, which showed state-of-the-art performance on several language understanding and generation tasks, even outperforming humans in some. The other was Gato from DeepMind, which learned over 600 different tasks, from playing Atari games to generating captions for images.
This is still nascent work, but there's a clear indication that these large models have tremendous potential to generalize and be true multipurpose solutions.
I can't wait to see where this thread of research goes in 2023. Think of a good Thanos with all the infinity stones. If you had that power, what would you achieve with a snap of your fingers?
There were a lot more interesting breakthroughs this year, but these are the ones that caught my eye. Which piece of research made you do a double-take?
Resources To Consider:
Best Machine learning courses 2022
As practitioners, we were spoilt for choice when it came to learning resources this past year. Below are four of the best, which I recommend checking out if you're keen to level up. I've also organized them in the order in which you should take them.
Fundamental Machine Learning: Andrew Ng's Coursera Specialization
This course, first launched in 2012, was the introduction that a lot of practitioners (myself included) took to get started in ML. In 2022, it was freshly revamped, and it remains a great starting point for anyone looking to kickstart their ML journey.
Deep learning: FastAI's 2022 course part I
This is the course I recommend for anyone who knows basic coding. The fastai approach is to build and use things and deconstruct them in a top-down fashion. Don't bother learning deep learning from any other place.
Advanced Deep learning: CMU's Deep learning systems (Optional)
Once you have the basics down, consider checking this course out. In it, you build your own deep learning library from scratch in Python and C++. I recommend this purely because of how comprehensive and rigorous it is, but it is not for beginners. If you have a good grounding in deep learning theory and programming, it's an incredible course to learn how these algorithms work under the hood.
MLOps: Full Stack Deep Learning 2022
Learning machine learning algorithms is just one part of the story. It's equally important to know how to deploy and maintain these models in a real-world system. This is the course that teaches you the best practices to do just that. Take this to complete your machine learning education.
Best Machine learning papers 2022
Calling anything "the best" is a bit subjective, but this curated list from Louis Bouchard contains a lot of papers I would recommend as well. Two notable omissions are ConvNeXt and MaxViT. Check out the full list here:
It’s been an incredible year, and I thank you for your time and attention. I’ve been working behind the scenes on a few exciting additions to the newsletter, and I can’t wait to share them with you! Wish you happy holidays, and an awesome, happy, and prosperous new year!
Thanks for reading Gradient Ascent! Subscribe for free to receive new posts and support my work.