posted 15 days ago by XBX_X on scored.co (+0 / -0 / +6Score on mirror )
TacosForTrump on scored.co
15 days ago 0 points (+0 / -0 ) 1 child
That's wild, give it 5 years though
XBX_X on scored.co
15 days ago 1 point (+0 / -0 / +1Score on mirror ) 1 child
That's a totally wrong assumption. The 11-second limitation exists so that people can't make anything too wild or meaningful; as I mentioned, they're saving the full potential of this tech for high-dollar clients like ad agencies and film studios. Imagine if you discovered fire. Now imagine if you could limit who gets to use fire, and how much of it. You want people to know fire exists, and its benefits, but you also want to keep it scarce so you can demand big bucks for it.

That's what's happening with AI.
part on scored.co
15 days ago -2 points (+0 / -0 / -2Score on mirror )
You are not correct, as of two weeks ago.
For coherence, all prior data is used for each frame... worse, it's not linear.

Making long, good-looking videos with video diffusion, especially using next-frame prediction models, is tricky due to two core challenges: forgetting and drifting.

Forgetting occurs when the model fails to maintain long-range temporal consistency, losing details from earlier in the video.

Drifting, also known as exposure bias, is the gradual degradation of visual quality as initial errors in one frame propagate and accumulate across subsequent frames.
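A toy sketch of why drifting compounds (this is an illustration, not FramePack or any real model code; the error rate and frame counts are made-up numbers):

```python
# Toy illustration of drifting / exposure bias in next-frame prediction.
# Each "frame" is a single number; a small per-frame multiplicative error
# stands in for the model's prediction error at each step.
def generate(n_frames, per_frame_error=0.01, start=1.0):
    frames = [start]
    for _ in range(n_frames - 1):
        # each new frame is conditioned on the previous generated frame,
        # so its error is layered on top of all the earlier errors
        frames.append(frames[-1] * (1.0 + per_frame_error))
    return frames

short = generate(30)   # ~1 second at 30 fps
long_ = generate(900)  # ~30 seconds at 30 fps
print(short[-1])  # small drift after 30 frames
print(long_[-1])  # drift has blown up after 900 frames
```

Because each generated frame feeds the next prediction, even a 1% per-frame error grows geometrically, which is why short clips look fine while long ones fall apart.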

Generating 11 seconds already uses a shitload of RAM.

one new hack is "FramePack"

May 2025 FramePack:
======
   
FramePack is a Neural Network structure that introduces a novel anti-forgetting memory structure alongside sophisticated anti-drifting sampling methods to address the persistent challenges of forgetting and drifting in video synthesis. This combination provides a more robust and computationally tractable path towards high-quality, long-form video generation.

The central idea of FramePack’s approach to the forgetting problem is progressive compression of input frames based on their relative importance. The architecture ensures that the total transformer context length converges to a fixed upper bound, irrespective of the video’s duration. This pivotal feature allows the model to encode substantially more historical context without an escalating computational bottleneck, facilitating anti-forgetting directly.
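The bounded-context idea can be sketched with a geometric compression schedule (the exact schedule here is illustrative; FramePack's actual kernel sizes are in the paper and repo): each frame further back in the past is compressed harder, so its token count shrinks geometrically, and the geometric series converges no matter how many frames you feed in.

```python
# Sketch of FramePack's bounded-context idea (schedule values are
# illustrative, not the exact ones from the paper): the frame i steps
# back contributes full_frame_tokens / 2**i tokens of context.
def total_context(num_past_frames, full_frame_tokens=1536):
    total = 0
    for i in range(num_past_frames):
        # older frames are compressed harder; very old frames round to 0
        total += full_frame_tokens // (2 ** i)
    return total

print(total_context(4))    # short history
print(total_context(100))  # long history: barely larger
# 1 + 1/2 + 1/4 + ... converges to 2, so total context stays below
# 2 * full_frame_tokens regardless of video length
```

That convergence is what lets the model keep "seeing" history without the transformer context (and the compute) growing with video duration.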

The FramePack system is built around Diffusion Transformers (DiTs) that generate a section of S unknown video frames, conditioned on T preceding input frames. It does not support camera movement yet.

https://github.com/lllyasviel/FramePack

https://lllyasviel.github.io/frame_pack_gitpage/

TL;DR: XBX_X, YOU ARE WRONG