A.I. Language Models should spook brands, agencies, you, and me.

Edition 15 - Some bad, bad stuff is out there.

Apr 15, 2024

1. Models All The Way Down

(Such a great way to spend a Friday night. Let’s get at it).

“Models All The Way Down” is a website presentation both insightful and sobering. I could do a huge write-up on this, but your time would be better spent reviewing it for yourselves. It presents the information cogently and smartly, and I could not do it justice. I am not a news organization, so it is presented as is. Below is some context quoted from the site.

If you want to make a really big AI model — the kind that can generate images or do your homework, or build this website, or fake a moon landing — you start by finding a really big training set.
Images and words, harvested by the billions from the internet, material to build the world that your AI model will reflect back to you.
Yet few people in the world have spent the time to look at what these sets that feed their models contain.
In December, researchers from Stanford's Internet Observatory identified more than 1,000 images categorized as Child Sexual Abuse Material (CSAM) in one of the most influential AI training sets of the moment: LAION-5B.

For more, check out “Models All The Way Down.”

LAION-5B - remember this.

2. AI image training dataset found to include child sexual abuse imagery

Stanford researchers discovered LAION-5B, used by Stable Diffusion, included thousands of links to CSAM.

Article - The Verge Dec 20, 2023, 9:57 AM CST

A popular training dataset for AI image generation contained links to child abuse imagery, Stanford’s Internet Observatory found, potentially allowing AI models to create harmful content.
LAION-5B, a dataset used by Stable Diffusion creator Stability AI, included at least 1,679 illegal images scraped from social media posts and popular adult websites.

This data set has been used to train over 1,300 different academic AI projects.

The stated goal of the project to create LAION-5B was to conduct basic research into dataset curation. Specifically, its authors wanted to create an image training set with purely automated methods - with no humans in the mix.

On their homepage, its creators explicitly warn against its use in real-world contexts:

“Providing our dataset openly, we however do not recommend using it for creating ready-to-go industrial products ...”

This significant warning about the potential biases in image-generation models like Midjourney and Stable Diffusion has been mostly disregarded. These two prominent models are known to have been trained, at least partially, on the LAION-5B dataset. It is highly probable that numerous other commercial models, potentially numbering in the hundreds, have also been trained using the same dataset. Runway, which helped develop Stable Diffusion, has also used version(s) of it.

My comment: I’ve read conflicting reports that LAION-5B has been taken down (GitHub and others) and that they are attempting to address these issues. I’ve also read that it’s still up. Everything I have read suggests that addressing this is nearly impossible because the dataset is gigantic. So if you have any concerns about your datasets, please talk to your rep if you’re on an enterprise version. Check the vendor sites for applicable notices as well.

These models power widely used applications such as chatbots and image generators, serving hundreds of thousands of users. Despite the cautionary advisory, the potential risks associated with the biases inherent in the training data have not been adequately addressed.

LAION-5B is a dataset of 5.85 billion CLIP-filtered image-text pairs, of which 2.32 billion contain English-language text. The scale of the dataset makes human curation impossible.
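To put that scale in perspective, here is a back-of-the-envelope sketch. Only the 5.85 billion figure comes from the dataset description; the review speed (one second per pair) and the work schedule (8-hour days, 250 working days a year) are my own assumptions, picked to be generous.

```python
# Back-of-the-envelope: how long would one person need to eyeball LAION-5B?
# ASSUMPTIONS (mine, not LAION's): 1 second per pair, 8-hour days,
# 250 working days a year. Only TOTAL_PAIRS comes from the dataset description.
TOTAL_PAIRS = 5_850_000_000  # 5.85 billion image-text pairs


def review_years(pairs, seconds_per_pair=1.0, hours_per_day=8, days_per_year=250):
    """Working years one reviewer would need to look at every pair once."""
    hours = pairs * seconds_per_pair / 3600
    return hours / hours_per_day / days_per_year


years = review_years(TOTAL_PAIRS)
print(f"{years:,.1f} working years for one reviewer")
```

Under those assumptions it comes out to roughly 812 working years for a single reviewer, and that is for one quick glance per image.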

For more on this, check out Models All The Way Down.

3. Mor-a Sora

Diffusion transformers are the key behind OpenAI’s Sora — and they’re set to upend GenAI. TechCrunch 2/28/24

YouTube CEO: Using Platform’s Videos to Train OpenAI’s Sora Violates Terms of Service. PYMNTS 4/4/24

4. Runway? No F Way.

Last week some jackass shot one across the bow about being a big deal. Let’s have a look at what this knucklehead said …

Let’s unpack this.

I was contacted by Runway with the offer of seeing what’s new and what they are up to.

The outcome? So I saw nothing they were working on. We did chat.

RUNWAY: “Would you be interested in an enterprise license?”
- ME: “For 1?”

ME: “Can I see what you’re working on? Like have a login to play around with a Beta?”
- RUNWAY: “That’s reserved for people at the Enterprise level.”

ME: “Can I see any recent video renderings?”
- RUNWAY: “Go to this guy’s Twitter page. He does some really cool shit.”

I lost the link and don’t really care. He went on to tell me they have other people that do interviews. He said he’d connect me. I’m sure he’s been too busy to check his email for the past week.

So yeah. Runway. No new news. Still great four-to-eight second conceptual clips in slo-mo, and animating birds. I’m sure I’ll hear from them if anything new pops up. He said by year’s end things will be wicked. (yawn).

5. Random Rando

AI making some really weird shizzle.

Rapid GenAI Progress Exposes Ethical Concerns. And it’s making for some weird sh*t. Datanami 3/4/24

Working for Microsoft in London sounds more fun than Redmond.

Microsoft AI has opened a London hub to access the city’s plentiful talent. WTF? With the exception of our Congressional Dysfunction, I’m not noticing a lack of qualified and available able bodies here in the U.S. of A. All of the tech sector layoffs aside, is Microsoft training AI voice features with British accents? Sounds more top shelf, I must say. So here’s the history of the people who decline to pronounce their r’s.

Wearable Terrible

The AI Pin. The Washington Post. The left hand.

Grab Bag

Google has announced a chatbot subscription plan called Gemini Advanced, which will cost $20 per month and will be its most powerful chatbot.
OpenAI has made GPT-4 Turbo with Vision API generally available.
A regulator is concerned about the power of big tech firms in the AI market.
Elon Musk has predicted that superhuman AI will be smarter than him next year. And everyone else I imagine.
Think in Italian.

Weird A.I. “Stuff”

Someone asked AI to make a Joe Biden Dairy Queen commercial. I love it when AI can’t hit the side of a barn. I love this.

Feel free to share comments below. If you enjoyed this read, please smash the heart icon at the top or bottom of the page so others can find it. Re-stacking is great too.

Have a great week! H&R Block didn’t start my taxes until today. They’ve had them 2.5 weeks. I’m so f——— (forlorn)
-pb
