AI Distillation #5
Prompt engineering deep dive, LLMs from scratch, Mario world generation, and more...
These are some excellent resources I found covering various CV, NLP, and optimization topics over the past week. I also heard something about a strawberry that takes a lot of time to answer questions, but that's for another day.
AI Prompt Engineering: A Deep Dive
This is a must-watch video on prompt engineering. Engineers from Anthropic cover various topics, including how prompt engineering has evolved, what makes a good prompt, tips to improve prompting, and the future of prompt engineering. Please do not miss this!
Late Chunking: Balancing Precision and Cost in Long Context Retrieval
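Weaviate discusses late chunking, a strategy for embedding long documents in vector search systems. With conventional (early) chunking, a document is split into chunks before embedding, so each chunk is embedded without any knowledge of the rest of the document. Late chunking reverses the order. A long-context embedding model first processes the whole document to produce token-level embeddings, and those token embeddings are then split into chunks and pooled into one vector per chunk. Each chunk embedding therefore retains context from the surrounding text, which improves retrieval quality for long documents and complex queries. It also stores far fewer vectors than late-interaction approaches such as ColBERT, which keep every token embedding. For further details, check out the full post.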
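The pooling step that gives late chunking its name can be sketched in a few lines. This is a minimal illustration, not Weaviate's implementation: it assumes the per-token embeddings have already been produced by one pass of a long-context embedding model over the whole document (here they are hard-coded toy values), and that mean pooling is used to collapse each chunk's tokens into a single vector.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def late_chunk(token_embeddings, spans):
    """Pool document-level token embeddings into one vector per chunk.

    token_embeddings: per-token vectors from a SINGLE encoder pass over the
        whole document (assumed; toy values are used in the example below).
    spans: (start, end) token index pairs, end exclusive, marking chunk boundaries.
    """
    chunks = []
    for start, end in spans:
        if end <= start:
            raise ValueError(f"empty chunk span: {(start, end)}")
        # Each chunk reuses token vectors that were computed with the full document in view.
        chunks.append(mean_pool(token_embeddings[start:end]))
    return chunks


# Toy example: 4 tokens with 2-dimensional embeddings, split into two chunks.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
print(late_chunk(tokens, [(0, 2), (2, 4)]))  # [[2.0, 0.0], [0.0, 3.0]]
```

Early chunking would instead run the encoder separately on each chunk's tokens before pooling, so no chunk's vectors could reflect text outside that chunk.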
LLM Agents for Software Engineering: A Survey
This is a comprehensive review of how LLM-based agents are applied to software engineering. It categorizes 106 papers from both the software engineering and the AI agent perspectives, examining how effective LLM agents are at tasks such as code generation, bug fixing, and documentation. The survey also covers multi-agent collaboration and human-agent interaction, and discusses open challenges and future directions in this domain. You'll need a cup or two of coffee to get through it thoroughly.
DynOMo: Online Point Tracking via Gaussian Reconstruction
DynOMo performs online 2D and 3D point tracking from monocular video. It uses 3D Gaussian splatting to reconstruct the dynamic scene incrementally, frame by frame, without requiring multi-view setups, known camera poses, or offline processing. Robust image-feature reconstruction makes the tracking more reliable, and the method sets a strong baseline for online point tracking with unposed cameras. For more details, check out the paper.
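To see how 2D and 3D tracking relate, note that once a point's 3D position is known in each frame, a 2D track can be read off by projecting it into the image. The sketch below uses an ideal pinhole camera and is only an illustration: the intrinsics `fx`, `fy`, `cx`, `cy` are assumed values, the 3D points are assumed to already be in each frame's camera coordinates (i.e. the estimated camera pose has been applied), and whether DynOMo derives its 2D tracks exactly this way is an assumption.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point (x, y, z) in camera coordinates to pixel coordinates (u, v)
    with an ideal pinhole camera (no lens distortion)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def project_track(track_3d, fx, fy, cx, cy):
    """Turn a 3D track (one camera-space point per frame) into a 2D pixel track."""
    return [project_point(p, fx, fy, cx, cy) for p in track_3d]


# Hypothetical intrinsics and a point moving 0.2 units to the right at depth 2.
track_2d = project_track([(0.0, 0.0, 2.0), (0.2, 0.0, 2.0)], fx=500, fy=500, cx=320, cy=240)
print(track_2d)  # [(320.0, 240.0), (370.0, 240.0)]
```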
xLAM: A Family of Large Action Models for Agentic Systems
xLAM is a family of large action models designed to power AI agents. The models range from 1B to 8x22B parameters and include both dense and mixture-of-experts architectures. Trained on diverse datasets to improve generalization, xLAM models achieve top performance on multiple agent benchmarks, including the Berkeley Function-Calling Leaderboard. The models are publicly released, so check them out for your projects.
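Function calling, the skill the Berkeley Function-Calling Leaderboard measures, comes down to the model emitting a structured call that the host program parses and executes. Below is a minimal sketch of that host side. It assumes the model returns a JSON list of `{"name": ..., "arguments": {...}}` objects; this format, the `get_weather` tool, and the `dispatch` helper are all hypothetical and are not xLAM's documented output format.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> str:
    # Hypothetical tool; a real one would query a weather service.
    return f"22 degrees {unit} in {city}"


# Registry of tools the model is allowed to call.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> list:
    """Parse a JSON list of tool calls and execute each registered tool.

    Assumes each call looks like {"name": ..., "arguments": {...}} (hypothetical format).
    """
    calls = json.loads(model_output)
    results = []
    for call in calls:
        fn = TOOLS.get(call["name"])
        if fn is None:
            raise ValueError(f"unknown tool: {call['name']}")
        results.append(fn(**call["arguments"]))
    return results


# Hypothetical model output for "What's the weather in Paris?"
output = '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'
print(dispatch(output))  # ['22 degrees celsius in Paris']
```

Keeping an explicit tool registry means the model can only trigger functions the host has chosen to expose.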
Video Game Generation: A Practical Study Using Mario
This project studies video game generation with MarioVGG, a text-to-video model trained on Super Mario Bros. gameplay. Given an initial frame and text action prompts such as running or jumping, the model generates gameplay video in which Mario moves in response to the actions while the scene stays coherent, so the video model effectively acts as a learned game engine. It offers a new way to experiment with and expand on classic video game mechanics.
alphaXiv: Open Research Discussion 🤝 arXiv
This project from students at Stanford creates an open discussion forum for arXiv papers. You can post questions and comments directly on top of any arXiv paper by changing "arxiv" to "alphaxiv" in its URL. This could spark a lot of meaningful discussion and help surface "hidden" gems that don't get the spotlight on social media. Check it out!
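The URL trick is simple enough to automate. This is a minimal sketch that swaps the host while keeping the path, assuming alphaXiv mirrors arXiv's path scheme (e.g. `/abs/<id>`) as the post describes; the `to_alphaxiv` helper and the paper ID in the example are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def to_alphaxiv(url: str) -> str:
    """Rewrite an arXiv URL to the matching alphaXiv URL by swapping the host only."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv URL: {url}")
    # Path, query, and fragment are kept unchanged.
    return urlunsplit(parts._replace(netloc=host.replace("arxiv.org", "alphaxiv.org")))


print(to_alphaxiv("https://arxiv.org/abs/2409.00001"))  # https://alphaxiv.org/abs/2409.00001
```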
Building LLMs from the Ground Up
This tutorial from Sebastian Raschka dives deep into the building blocks of LLMs, how they work, and how to code them in PyTorch. Sebastian recently released a book, Build a Large Language Model (From Scratch), and this tutorial pairs nicely with it.
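One of the core building blocks in any GPT-style LLM is causal self-attention. The sketch below is not taken from the tutorial and uses plain Python rather than PyTorch so it runs without dependencies. It shows a single head of scaled dot-product attention with a causal mask; the query, key, and value vectors are passed in directly, whereas in a real model they come from learned linear projections of the token embeddings.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: lists of T vectors (lists of floats); q and k share dimension d.
    Returns T output vectors, where token t attends only to tokens 0..t.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scores against the current and earlier positions only (the causal mask).
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d) for s in range(t + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights)) for j in range(len(v[0]))])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(x, x, x)
print(out[0])  # [1.0, 0.0]  (the first token can only attend to itself)
```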