AI Distillation #2
ResNet sees a shovel in the moon, an LLM writes 10,000+ words, an AI agent publishes research papers, a diffusion tutorial, and more...
These are some of the most interesting resources I found over the past week covering a breadth of AI.
Why Does ResNet See a Shovel in the Moonman?
In this paper, the authors introduce a novel approach, CRAFT, for explainability in neural networks by identifying both "what" and "where" concepts drive a model's decisions. They leverage Non-Negative Matrix Factorization (NMF) to automatically extract and recursively decompose concepts from deep learning models. CRAFT also introduces a method to estimate the importance of these concepts via attribution maps, visualizing their influence. For more details, check out the paper.
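The core idea — factorizing a non-negative activation matrix into per-patch concept coefficients and a shared concept bank — can be sketched with a toy multiplicative-update NMF. This is an illustration of the factorization step only, not CRAFT's actual implementation; the data and dimensions are made up:

```python
import numpy as np

def nmf(A, k, n_iter=500, eps=1e-9):
    """Factor a non-negative matrix A (n_patches x n_features) into
    U (n_patches x k, concept coefficients) and W (k x n_features,
    the concept bank) via Lee & Seung multiplicative updates."""
    rng = np.random.default_rng(0)
    n, d = A.shape
    U = rng.random((n, k))
    W = rng.random((k, d))
    for _ in range(n_iter):
        # Multiplicative updates keep U and W element-wise non-negative.
        U *= (A @ W.T) / (U @ W @ W.T + eps)
        W *= (U.T @ A) / (U.T @ U @ W + eps)
    return U, W

# Toy "activations": an exact non-negative mixture of two latent concepts.
rng = np.random.default_rng(1)
true_W = rng.random((2, 8))
true_U = rng.random((50, 2))
A = true_U @ true_W

U, W = nmf(A, k=2)
err = np.linalg.norm(A - U @ W) / np.linalg.norm(A)
print(err)  # small relative error: the two concepts are recovered
```

In CRAFT the rows of A come from intermediate-layer activations of image crops, so each row of W can be visualized as a human-interpretable concept (e.g. the "shovel" texture the model hallucinates on the lunar surface).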
Step-by-Step Diffusion: An Elementary Tutorial
This tutorial offers an accessible introduction to diffusion models and flow matching for machine learning. Aimed at a technical audience with no prior experience in diffusion models, it simplifies the mathematical details while retaining the precision necessary to derive correct algorithms. The tutorial covers the foundational concepts and guides readers through the essential steps in understanding and implementing diffusion processes in machine learning.
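The tutorial's starting point, the closed-form forward (noising) process, fits in a few lines. Below is a minimal sketch using the standard DDPM-style linear beta schedule; the schedule values are illustrative, not taken from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule over T steps (values illustrative).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative signal retention

def q_sample(x0, t, rng):
    """Forward diffusion: sample x_t ~ q(x_t | x_0) in closed form,
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.standard_normal(10_000)  # unit-variance "data"
x_T = q_sample(x0, T - 1, rng)
print(x_T.std())  # ≈ 1.0: by the final step the sample is pure noise
```

Training a denoiser then amounts to predicting the added noise from x_t and t, and sampling runs this process in reverse — exactly the derivation the tutorial walks through.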
The AI Scientist
The AI Scientist presents a framework that aims to fully automate scientific discovery using large language models (LLMs). It autonomously generates research ideas, writes code, conducts experiments, visualizes results, writes full scientific papers, and performs simulated peer reviews. All for under $15. It will be interesting to see how this evolves and, more importantly, whether it can pass Reviewer #2's scrutiny.
3D Gaussian Editing with A Single Image
The authors introduce a method for editing 3D scenes from a single image using 3D Gaussian Splatting. This technique enables intuitive 3D scene manipulation directly on a 2D image plane without the need for accurately reconstructed meshes. It supports long-range object deformation and non-rigid modeling, offering greater flexibility and quality in 3D content editing compared to prior work.
Matryoshka Diffusion Models
Matryoshka Diffusion Models (MDM) generate high-resolution images and videos through a nested UNet architecture. This method optimizes training efficiency and model quality through a multi-resolution loss function and progressive training schedule. MDM achieves state-of-the-art results in various tasks, including text-to-image and text-to-video generation, by avoiding the need for cascaded or latent diffusion processes. For more details, check out the paper.
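The multi-resolution loss is the key piece: the same target is supervised at several nested resolutions at once. A minimal sketch of that idea (uniform weights and average-pool downsampling are my simplifying assumptions, not MDM's exact objective):

```python
import numpy as np

def downsample(x, factor):
    """Average-pool a square image by an integer factor."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def multires_loss(pred, target, factors=(1, 2, 4)):
    """Sum of MSE losses across nested resolutions: the model is
    supervised on the full image and on its coarser versions."""
    return sum(
        np.mean((downsample(pred, f) - downsample(target, f)) ** 2)
        for f in factors
    )

rng = np.random.default_rng(0)
target = rng.standard_normal((16, 16))
noisy = target + 0.1 * rng.standard_normal((16, 16))

print(multires_loss(target, target))  # perfect prediction -> zero loss
print(multires_loss(noisy, target))   # positive loss for an imperfect one
```

In MDM, each resolution is handled by a nested level of the UNet, which is what lets training progress from coarse to fine without a separate cascade of models.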
Deep Dive on LLMs for Practitioners
Parlance Labs has a nifty little educational section on its website focused on NLP. Its content covers a range of topics, including fine-tuning, RAG, evaluation, and more. This is worth bookmarking to stay updated on the latest research and developments in the field.
LongWriter: Unleashing 10,000+ Word Generation
LongWriter addresses the challenge of generating ultra-long outputs (10,000+ words) with LLMs. The "AgentWrite" pipeline breaks down extensive generation tasks into manageable subtasks, enabling existing LLMs to produce coherent, extended outputs. The authors introduce the LongWriter-6k dataset to fine-tune models and create LongBench-Write for evaluation. Their 9B-parameter model achieves SOTA results, demonstrating that current LLMs can generate much longer outputs with appropriate training data.
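The plan-then-write decomposition at the heart of AgentWrite can be sketched in a few lines. The `fake_llm` stub and the prompts below are purely illustrative stand-ins, not the paper's actual prompts or pipeline:

```python
def agent_write(task, llm, n_sections=4):
    """Plan-then-write: ask the model for an outline, then expand each
    outline item into its own section and concatenate the results."""
    outline = llm(f"Write a {n_sections}-point outline for: {task}")
    sections = [
        llm(f"Expand this outline point into a section: {item}")
        for item in outline
    ]
    return "\n\n".join(sections)

# Stub LLM for illustration: returns an outline list, or fixed section text.
def fake_llm(prompt):
    if prompt.startswith("Write a"):
        return [f"point {i}" for i in range(1, 5)]
    return f"Section for: {prompt.split(': ', 1)[1]}"

draft = agent_write("a survey of long-context LLMs", fake_llm)
print(len(draft.split("\n\n")))  # -> 4 independently generated sections
```

Because each section is generated within the model's comfortable output length, the concatenated draft can far exceed what a single generation call would produce — the same intuition LongWriter then bakes into the model via fine-tuning on LongWriter-6k.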
Yann LeCun's Deep Learning Course
Yann LeCun's deep learning course from NYU provides a comprehensive introduction to deep learning. It covers neural networks, optimization, convolutional networks, sequence modeling, and generative models. The course includes a wide variety of lecture videos, slides, and assignments, so it's worth checking out if you'd like a deeper understanding of the fundamentals or are looking for a refresher.
Juicy!! Thanks Sairam :)