Discussion about this post

Neural Foundry

Great roundup! DeepSeek-OCR 2's causal attention approach is particularly clever. The shift from rigid raster scanning to semantic token reordering basically makes the model read the way a human reader would. I've built document parsers before, and the column-by-column table processing alone would solve so many edge cases we struggled with. The 45% token efficiency gain over typical grid-based encoders is kinda wild for a 3B model.
