Comic: Roman Republic, 6th century BC

[Image: romans_killing_dogs_comic-1024x341.png]

Back in the spring of 2024 I listened to an episode of the great podcast The Rest Is History, where they talked about the emergence of the Roman empire and had this great bit about how the Romans were so ruthless in imposing their power that, if they were resisted, they killed everyone, including the dogs. As a dog lover, my ears perked up, and I thought that if I had this delivered by a professor whose pug was listening in, there might be a slightly funny comic there.

I figured this was a good test case for ChatGPT 4/Dall-E 3 (later tried with 4o/Dall-E 3)…

Could I get it to produce this simple strip? I got tantalizingly good results right away. But nothing ever worked. Here are the problems:

Dall-E, like other image generators, cannot do text: it gets a few words right and then descends into gibberish. The image generator obviously should not be trying to generate text. To me it seems like the best thing would be for ChatGPT to intercept the request for text, get Dall-E to leave blank space where the text should go, and add the text in post-processing, but that runs into two problems:
Dall-E seems to want to fill the space. It’s hard to get it to put in empty speech bubbles, at least not ones of the right size.
Putting the text for a speech bubble in the prompt gets that text into Dall-E’s head, and bits of it dribble into the comic.
ChatGPT + Dall-E cannot for the life of them do a multi-panel strip that respects the three-panel layout I requested. They always do their own thing, creating however many panels they feel like. Some of the designs they came up with were quite lovely, but pacing is important in comics and you can’t just move things around willy-nilly.
It’s hard to iterate. When Dall-E does the wrong thing, you can ask it for modifications, but typically it just makes the same mistake again while saying it fixed it. Once a bad idea gets into its memory, it never leaves and pollutes subsequent requests, so you pretty much always need to start a new conversation to get a clean slate.
Dall-E is only comfortable creating 1024×1024 (square), 1792×1024 (landscape), and 1024×1792 (portrait) images.
If you want a character to be consistent across your comic, it’s _way_ better if you can get the whole strip created in one go… but since Dall-E doesn’t really understand panels, that won’t work.
Weirdly, Dall-E would get basic things wrong. If I ask for a dog lying down, gnawing on a bone, I get a dog sitting up with a bone in front of it. Or the professor is holding a bone. Things like that.
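
The post-processing half of the blank-bubble idea is the easy part. Here is a minimal Python sketch of fitting a line of dialogue into an empty bubble. The function name `fit_text_to_bubble` is hypothetical, and the 0.6 and 1.2 ratios are rough assumptions about glyph width and line height, not anything ChatGPT or Dall-E provides; a real version would measure the text with an actual font.

```python
import textwrap


def fit_text_to_bubble(text, box_w, box_h, max_font_px=48, min_font_px=12):
    """Return (font_px, lines) for the largest font size that fits, or None.

    Assumptions: a glyph is about 0.6 * font_px wide and a line is about
    1.2 * font_px tall. Real fonts vary, so treat this as a first guess.
    """
    if not text.strip():
        return max_font_px, []
    for font_px in range(max_font_px, min_font_px - 1, -1):
        chars_per_line = int(box_w // (0.6 * font_px))
        if chars_per_line < 1:
            continue
        # Keep long words whole so an overflowing word is detected below.
        lines = textwrap.wrap(text, width=chars_per_line, break_long_words=False)
        widest = max(len(line) for line in lines)
        fits_width = widest <= chars_per_line
        fits_height = len(lines) * 1.2 * font_px <= box_h
        if fits_width and fits_height:
            return font_px, lines
    return None  # does not fit even at the smallest font size


result = fit_text_to_bubble("... even the dogs.", box_w=300, box_h=100)
print(result)
```

The hard part remains what the list above describes: getting the image generator to leave a bubble of known size and position in the first place.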

Here’s an example of the tantalizing garbage I would get. I mean, I kind of love some of the graphics here. But it obviously does not understand what I’m going for. It’s some fun impressionism I guess:

Given all that, how did I get the mediocre strip you see at the start of this article? It was done with ChatGPT-4o + Dall-E 3, one panel at a time, with considerable effort put into convincing Dall-E to leave empty spaces. When I got a decent first panel, I asked ChatGPT to remember it so I could reuse its seed and get a similar pug in the third panel… that didn’t really work.

Anyway, I eventually got three OK panels, stitched them together in GIMP, and added the text. I don’t love the result, so my dream of using ChatGPT as my artist assistant for comics ideas is currently dead.
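
The stitching step is mostly arithmetic, so it could in principle be scripted. Below is a rough Python sketch of the layout calculation only. The name `strip_layout` and the 16-pixel gutter and margin are assumptions for illustration, and the actual pasting would still be done by an image library or editor.

```python
def strip_layout(panel_sizes, gutter=16, margin=16):
    """Compute the canvas size and panel boxes for a left-to-right strip.

    panel_sizes: non-empty list of (width, height) tuples in pixels.
    Panels are scaled to share the height of the shortest panel.
    Returns ((canvas_w, canvas_h), [(x, y, w, h), ...]).
    """
    target_h = min(h for _, h in panel_sizes)
    x = margin
    boxes = []
    for w, h in panel_sizes:
        scaled_w = round(w * target_h / h)  # preserve aspect ratio
        boxes.append((x, margin, scaled_w, target_h))
        x += scaled_w + gutter
    canvas_w = x - gutter + margin  # drop the trailing gutter, add right margin
    canvas_h = target_h + 2 * margin
    return (canvas_w, canvas_h), boxes


canvas, boxes = strip_layout([(1024, 1024)] * 3)
print(canvas, boxes)
```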

What about alternatives? I looked into Midjourney, but I really didn’t like the interface and it doesn’t seem to solve the text issue. I tried some other recommended tools, but they weren’t effective at getting the drawing style I wanted. Dall-E can definitely produce the style of images I want! But it can’t put the whole thing together, at least as of October 2024.

This is the kind of prompt I want to give, and expect to get something I can iterate on:

Characters:
Professor: a balding, energetic, middle-aged professor with wild hair, wearing a tweed coat
Pug: a pug

Panel 1: 
  description: a head-on view of Professor standing behind a lectern. Pug gnaws on a bone on the floor to his right.
  Professor(says): "In the sixth century BC the Romans eschewed monarchical rule and became more dangerous to their neighbours..."

Panel 2: 
  description: we see a handful of ancient Roman infantry carrying short swords marching toward an ancient primitive city on a barren plain.
  Professor(says, from off-panel): "... in this period the Romans began to expand their territory more aggressively ... cities that resisted Roman might were treated without mercy. The Romans would kill every living thing within the walls..."

Panel 3: 
  description: a closeup of Pug from the first panel, with its head turned and ears pricked up quizzically. 
  Professor(says, from off-panel): "... even the dogs."
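
A script like this maps naturally onto a small data structure. The Python sketch below shows one way a tool might hold it and build a per-panel image prompt that leaves the dialogue out, so the words cannot dribble into the artwork. Everything here is hypothetical: the `Panel` and `Script` classes, the `image_prompt` helper, the default style string, and the empty-bubble instruction are illustrations, and there is no claim that any image model honors that instruction.

```python
from dataclasses import dataclass, field


@dataclass
class Panel:
    description: str
    speaker: str = ""
    dialogue: str = ""  # lettered in post-processing, never sent to the image model


@dataclass
class Script:
    characters: dict[str, str]  # name -> visual description
    panels: list[Panel] = field(default_factory=list)


def image_prompt(script: Script, panel: Panel, style: str = "hand-drawn comic panel") -> str:
    """Build a text-free prompt for one panel.

    Descriptions of the characters named in the panel are repeated in the
    prompt as a crude push toward consistency. Matching is a plain
    case-insensitive substring test, which is an assumption that only
    holds for distinctive names.
    """
    cast = [
        f"{name}: {desc}"
        for name, desc in script.characters.items()
        if name.lower() in panel.description.lower()
    ]
    parts = [f"Style: {style}", f"Scene: {panel.description.strip()}"]
    if cast:
        parts.append("Characters: " + "; ".join(cast))
    if panel.dialogue:
        parts.append("Leave one empty speech bubble with no lettering.")
    return "\n".join(parts)


script = Script(
    characters={
        "Professor": "a balding, energetic, middle-aged professor wearing a tweed coat",
        "Pug": "a pug",
    },
    panels=[
        Panel(
            description="a closeup of Pug, with its head turned and ears pricked up quizzically.",
            speaker="Professor",
            dialogue="... even the dogs.",
        )
    ],
)
print(image_prompt(script, script.panels[0]))
```

The dialogue stays attached to the panel, so a later lettering step can pick it up without it ever having been part of the image request.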

I don’t want to be stitching things together in an image editor and adding text. That’s the type of toil AI tooling should be able to do. And I’m sure it’s doable. But I haven’t seen much progress in the 6 months since I started trying to do it.