Share this out people. Also - apologies for typos. Fast typing today!
Process
May 10, 2024 - From a post by director (and SORA alpha user) Paul Trillo: hundreds of script-page-long prompts were written to create the Sora music video "The Hardest Part" by Washed Out. Get a glimpse of one of the prompts in this BTS interview with fxguide.
For this project, Trillo rendered and assembled a number of clips to pull together the finished product. There was no pressing the magic button to spit this out en masse.
For The Hardest Part video, which is almost four minutes long, Paul estimates that he generated around 700 clips. Most of these were not the full one minute; most were closer to twenty seconds, so roughly: “about 230 minutes of video was generated in total,” he estimates, “and I used about 55 of those clips.” These were all rendered at 720p resolution and then, as with Air Head, upscaled to 2K using Topaz.
Computing the potential SORA cost from fxguide:
The Cost of SORA?
Paul Trillo is on a private alpha, and we have no visibility into that relationship. But doing research away from this specific video, there is an accepted working estimate of about 5 minutes of video per hour per NVIDIA H100. The site Factorial Funds has an excellent breakdown of what that might mean for a possible SORA cost model.
There is a vast difference between training and inference configurations. A SORA user only needs to use inference, which is naturally much faster and cheaper than the enormous amount of computing power required to train a GenAI like SORA. We should also stress these numbers are in no way from OpenAI or in any way official (or off-the-record unofficial) estimates. That said, away from OpenAI-specific cloud rendering, any ML professional would typically budget in the range of $13-$15 an hour for 8x L4 GPU (we have seen quotes such as $14/hr). Based on that estimate alone, the pure compute cost of inferring 230 minutes of SORA would be: 230 minutes of SORA = 2,760 minutes of H100 = 46 hours @ $14 per hour = US$644.00. Plus, there would be upload and download costs as well as storage costs. This also ignores that OpenAI may add a margin or alternatively subsidise any actual SORA release pricing. That being established… as one post house TD pointed out to fxguide, “That sort of pricing would be cheap for a professional, but expensive for a non-professional.”
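For anyone who wants to rerun that back-of-the-envelope math with their own numbers, here is a rough Python sketch. The inputs are the unofficial fxguide working estimates quoted above (about 5 minutes of video per GPU-hour, $13-$15 per hour). The function names and parameters are made up for illustration and have nothing to do with any OpenAI tool or pricing.

```python
# Rough sketch of the compute-cost estimate quoted above.
# All rates are unofficial working estimates, not OpenAI figures.

def gpu_hours(video_minutes: float, video_minutes_per_gpu_hour: float = 5.0) -> float:
    """GPU-hours needed to generate the given minutes of video."""
    return video_minutes / video_minutes_per_gpu_hour


def compute_cost(video_minutes: float,
                 rate_per_hour: float = 14.0,
                 video_minutes_per_gpu_hour: float = 5.0) -> float:
    """Pure compute cost in US dollars (no storage, transfer, or margin)."""
    return gpu_hours(video_minutes, video_minutes_per_gpu_hour) * rate_per_hour


# Sanity check on the footage estimate: ~700 clips at ~20 seconds each.
approx_minutes = 700 * 20 / 60

hours = gpu_hours(230)            # hours of GPU time for 230 minutes of video
cost = compute_cost(230)          # cost at the $14/hr quote
low = compute_cost(230, 13.0)     # bottom of the $13-$15 range
high = compute_cost(230, 15.0)    # top of the $13-$15 range

print(f"Approx. footage generated: {approx_minutes:.0f} minutes")
print(f"GPU time: {hours:.0f} hours ({hours * 60:.0f} minutes)")
print(f"Compute cost: ${cost:.2f} (range ${low:.2f} to ${high:.2f})")
```

Swap in a different hourly rate or throughput and the floor moves accordingly; the estimate is only as good as those two assumptions.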
This will prove quite the culture shock for the throng that is used to paying $10-$20 a month for software (which has already proven a shock to the throng that's used to using free software).
So we have our cost floor. The ceiling could be $400-$750 an hour for real-time effects, rendering, and fancy lunches at a post house. I will not include as an option your kick-ass former co-worker who will do After Effects compositing and motion design, with some Nuke on the side, for $95-$150 an hour.
All of that said, the projected base render cost of $644, while it may seem a large number, is actually quite affordable (cheap) in a commercial production context, and that's assuming the 2K up-res render.
This assumes cost to the user. If you are the customer working with a production company, we’re looking at prompt engineering prepro, production, final fine-tuning/QA, and color grading. You still might add some small-footprint shoot days to facilitate transitions or to make some “talkies.”
Bottom line
I was looking at tiered approaches in this post a while back.
Sounds like OpenAI is planning on staying in the rendering business. Distributed models seem to work best. They might want to spin that off, not unlike global CDNs (Content Delivery Networks). Also, larger, more robust enterprise licenses would make sense for movie studios like Disney that want to own their tech stack (Lucasfilm and Pixar come to mind, if they still exist under The Mouse). Plus there are the proprietary IP issues with shared training data.
Finally - a “LITE” version for the proletariat could work; maybe $50/month for a base of x rendered minutes PLUS purchasing tokens for additional render time.
That Prompt!
Read the interview in fxguide for context. Great article.
“Here is an example of one of the actual prompts used in The Hardest Part:”
continuous shot moving forward zooming through time, with a view of 1980s highschool hall corridor with checkered tiled floor, buzzing with students walking around. the scene is captured from a low angle front perspective, showing a door at the end of the corridor getting bigger and closer. the scene is blurred, indicating a high speed movement. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film.
• One point perspective FPV, continuous shot moving forward zooming through a time and through the doorway, with a view of a open classroom of students dressed in 80s attire. we zoom through students looking to the front of the class room rushing in front of the lens. the classroom has a distinct 80s feel. the scene is captured from a front perspective, showing the students getting bigger and bigger we see two students, a male student with dark hair and jean jacket making eye contact with a female student also in a jean jacket. the female student is chewing bubblegum and make a bubble from pink bubble gum. the scene is blurred, indicating a high speed movement. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film.
• One point perspective FPV, continuous shot moving forward zooming through the classroom, with a 18 year old boy with dark hair and jean jacket making eye contact with a female student also in a jean jacket. the female makes a bubble with pink bubblegum in front of the lens. we zoom through the bubble it pops and we zoom through the bubblegum and enter an open football field. the scene is moving rapidly, showing a front perspective, showing the students getting bigger and faster. the scene is blurred, indicating a high speed movement. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film.
• One point perspective FPV, continuous shot moving forward zooming through an open football field overcast, from the 1980s, with the bleachers in the background distance. in the center of the shot is the same guy and girl in jean jackets with their back to camera walking in the field. we see they are holding hands the camera narrows in zooming in toward their hands clutching. the scene is moving rapidly, showing a front perspective of their hands getting bigger and closer. we zoom toward the bleachers in the background, the scene is blurred, indicating a high speed movement. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film.
• One point perspective FPV, continuous shot moving forward zooming through the couple’s hands holding, we zoom through the bleachers in background of the football field and through a moody forest of trees at night with the neon glow of the city in the background is out of focus with bokeh. the city is out of focus behind the trees at night. the scene is captured by the camera in a fast and smooth movement. the scene is blurred, indicating a high speed movement. the trees have an opening a tunnel at the center that we enter. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film.
• One point perspective FPV, continuous shot moving forward zooming through the opening between the dark moody forest trees and we enter to a look out point at the top of a hill with a view of the out of focus city lights shimmering in the background. we zoom in toward an 80s car parked a the top of the hill with it’s red taillights illuminated the grassy hill, the the lookout point and car scene is quaint and peaceful. the scene is moving rapidly, showing a front perspective of the town getting smaller and further at night. the scene is blurred, indicating a high speed movement. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film.
• One point perspective FPVcontinuous shot moving forward zooming through the nightime lookout point zooming through the back window of an 80s car and into the interior of the 80s car where the young couple are seating in the front seat and are leaning in toward each other, with a view of a out of focus city in the background through the car windshield, the scene is moving rapidly, showing a top view of the city. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film.
• One point perspective FPV, continuous shot moving forward zooming through the interior of the 80s backsetat car where the couple are seating in the front seat and lean in to each other, with a view of a out of focus city in the background through the car windshield. the scene is moving rapidly, showing a straight view of the out of focus city outside the car windshield. we zoom between the faces of the young couple as they lean in toward each other. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film.
• One point perspective FPV,continuous shot moving forward zooming through the front seat of the car toward the young couple leaning in toward each other and we zoom out the windshield into the city at night repeating new york library with large aisles, with a counter, shelves, and products. the library is large and crowded, is in a new york city we zoom into a woman reading a book looking over their shoulder she is holiding a book up, the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film.
• One point perspective FPV, continuous shot moving forward zooming through infinitely through the windshield into the out of focus city at night, we zoom in and drop down to the city at night zooming through the street, through the street lamps, we zoom into the young couple walking down the middle of the street at night, the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film.
• One point perspective FPV, continuous shot moving forward zooming through an infinitely down the street at night and we see the couple again laughing and running under the lights at night in a suburban street, looking over their shoulder we land in a close up shot of the book. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is moody and cinematic, with a slight vignette and a warm, vintage tone. the shot is captured on 35mm film, fuji film stock from the 90s with an anamorphic 24mm lens. motion blur as we zoom continuous shot, analog film. One point perspective FPV
Have a great weekend people! And please share with your friends!