Why DALL-E Isn't The Future Of Blog Illustration

2022-08-26 17:07:53 PDT (last update 2022-08-26 17:08:38 PDT)

If you've been following this blog (why? how?), you'll know I've been playing with the DALL-E AI art generation tool for blog illustration purposes.

I feel like I'm getting to the end of that experiment. Here are a few thoughts on why.

On Business Models and Creativity

In the world of modern computing, an awful lot is premised on literally "pay to play." Whether it's games themselves or creative tools, organizations expect to cover their costs, and make a healthy profit, by charging. This seems quite reasonable on the face of it.

The question of whether a creativity service charge is a workable business model is thus premised on whether the chargee values the service enough to pay adequately. I'm grateful that DALL-E gave me a chance to get an answer for free. The bad news is that I am concluding that I do not value DALL-E that much.

Pay-per-image, first of all, is deadly for me. I've been really careful with my 50 free images, as a gamer-inclined person will be with almost any resource. I have skipped asking for illustrations I didn't really "need," and I've never asked for a re-render of an illustration I was unhappy with. When I get to a spot where a picture is mandatory, I pick the best of the four results and call it good, regardless of whether I'm really happy with what went down.

I'd much prefer an unlimited-use (or huge-cap use, say 5000 images annually) periodic-fee model. But I'm really skeptical they could charge little enough to make that work for me. I guess I'd happily pay $10 per year for effectively unlimited access to DALL-E. I doubt that would come anywhere close to covering my share of the infrastructure costs.

On The Limitations Of 2022 AI

The premise of DALL-E and friends is just insanely aggressive. As an AI professor, I am more than aware of this. The fact that this bear dances at all is nothing short of astonishing.

That said, once the astonishment wears off, the holes become pretty apparent. The biggest problem with Magic Neural Nets is that small-scale engineering can do little to correct even obvious systemic limitations.

For example, since its inception (no pun intended) DALL-E has liked to put "symbols that look more or less like text" in its images. It does this the most when it is confused about what is happening. It's "clever," but it isn't something that's OK in an image that is meant to stand on its own.

No one really knows how to fix this. I'm not exaggerating much here. OpenAI could retrain the net heavily to get DALL-E to stop doing that. Mostly. But a retrain that big would likely induce other follow-on problems. You can't just go comment out the "generate texty stuff" code; there isn't any. What you'd really like is to get DALL-E to put real, sensible text in the places where it wants to put text. That would be a whole 'nother grand-challenge-grade AI project.

I look at this, at the failure to faithfully use the art styles I've asked for, and at the confusion about the subject, all of which are completely understandable given the tech involved. Then I ask myself, "Will I be OK with very slow incremental progress in these areas?" I feel like the answer is no.

On GitAtom's Clunkiness And "Ease-Of-Use"

The friction right now for getting a DALL-E image set up for my blog is surprisingly high. Much of this is GitAtom's fault. GitAtom doesn't have any real intrinsic support for images beyond what is provided by the cmarkgfm Markdown engine I'm using. It's not great. In particular, ensuring that an image is rendered at a reasonable size pretty much rules out Markdown-style image links: the width attribute of the <img> tag needs to be specified, and Markdown image syntax has no way to express it. It's possible that with some CSS magic I could make this smoother, but it would be a big project.

Further, I can't just drag-and-drop my generated DALL-E image, for several reasons. I want to preserve the query text I used to get the image as the image title, which again means I'm stuck copy-pasting it into an <img> element somehow. I need alt text for visually impaired readers, and it has to differ from the title because DALL-E is just not that good: the query doesn't reliably describe what actually ended up in the picture. Finally, I want the image to link to my DALL-E credits page when clicked, which means nesting the <img> inside a link element.
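To make those requirements concrete, here is a minimal Python sketch of a helper that could emit the nested HTML in one go: an <img> with an explicit width, the DALL-E query as its title, separate alt text, all wrapped in a link to the credits page. The function name dalle_figure, the default width of 400, and the CREDITS_URL path are hypothetical; they are not part of GitAtom. The sketch also assumes the Markdown engine passes raw inline HTML through to the rendered page.

```python
from html import escape

# Hypothetical path to the DALL-E credits page; the real GitAtom location isn't given.
CREDITS_URL = "/dalle-credits.html"


def dalle_figure(src: str, query: str, alt: str, width: int = 400,
                 credits_url: str = CREDITS_URL) -> str:
    """Build a linked <img> snippet for a DALL-E illustration (hypothetical helper).

    - title: the DALL-E query text, kept verbatim
    - alt: a separate, human-written description for screen readers
    - width: set explicitly, which plain Markdown image links cannot express
    - the whole image is wrapped in a link to the credits page
    """
    # escape() with its default quote=True turns '"' into '&quot;', so queries
    # containing quotation marks don't break the attribute values.
    img = (
        f'<img src="{escape(src)}" width="{width}" '
        f'title="{escape(query)}" alt="{escape(alt)}">'
    )
    return f'<a href="{escape(credits_url)}">{img}</a>'


if __name__ == "__main__":
    print(dalle_figure(
        "images/bear.png",
        'a bear dancing, "oil painting"',
        "A brown bear standing on its hind legs.",
    ))
```

Pasting the printed line into a post would still be a manual step, but it folds the copy-paste and nesting work into a single call.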

I would guess it takes me about 20 minutes to go from "I want a DALL-E image here" to "OK ready to go." That's too long.

On The Competition

But if not DALL-E images, where would my blog pictures come from? Hopefully, this is not too much of a mystery. I could do without pictures pretty much altogether, as I basically am doing anyhow. I could use good ol' photos, both freely available stock photos and photos I have taken myself. I can't really afford to pay artists, but I can create my own art in my own limited, primitive way.

I ran a blog for years and years without DALL-E. I can easily survive without it going forward. "Free is a very good price." —Tom Peterson

It's been a grand experiment. I'm not saying I will never use another DALL-E image. But I'm branching out. Thanks to OpenAI for DALL-E anyhow. It's been an amazing trial.