Fun with Stable Diffusion

After Ars Technica ran this article on a Stable Diffusion mobile app, it seemed like a good time to give Stable Diffusion another shot; I had previously given up on figuring out how to set up the desktop version. The app is polished! It includes example prompts to demonstrate what sorts of incantations make up a good one, and with that guidance and a moderate wait for it to generate, I had this:

8k resolution, beautiful, cozy, inviting, bloomcore, decopunk, opulent, hobbit-house, luxurious, enchanted library in giverny flower garden, lily pond, detailed painting, romanticism, warm colors, digital illustration, polished, psychadelic, matte painting trending on artstation

That was enough to hook me, so when I noticed that the article also linked to stable-diffusion-webui, it seemed like a great time to see what the same underlying image generation can do when it can draw ~300W continuously on my desktop instead of being limited to a phone’s resources. I was quickly (and somewhat inadvertently) able to generate a cat fractal:

cute ((cat)) with a bow, studio photo, soft lighting, 4k
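
Everything here went through the webui’s interface, but for a sense of what it wraps, a minimal text-to-image sketch with Hugging Face’s diffusers library might look like the following. The checkpoint and parameters are my guesses rather than the webui’s actual defaults, and the ((word)) emphasis syntax in the prompt above is a webui extension that plain diffusers doesn’t interpret.

```python
# Minimal local text-to-image with diffusers; checkpoint and sampler settings
# here are illustrative, not what the webui actually used.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # the part that happily draws hundreds of watts

image = pipe(
    "cute cat with a bow, studio photo, soft lighting, 4k",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("cat.png")
```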

The cat fractal was my introduction to the sorts of artifacts I could expect to do battle with. Then I had an idea for how to use its ability to modify existing photos. After some finagling, I had a prompt ready, set it to replace the view outside the window, and left it running. When set to a very high level of detail and output resolution, it generated 92 images over about 6 hours. Of those, 22 seemed pretty good. Here is a comparison of my two favorites:

And here is a slightly different prompt, for which I had selected a part of the window other than the glass:

a colorful photo of the circular wooden door to a hobbit hole in the middle of a forest with trees and (((bushes))), by Ismail Inceoglu, ((((shadows)))), ((((high contrast)))), dynamic shading, ((hdr)), detailed vegetation, digital painting, digital drawing, detailed painting, a detailed digital painting, gothic art, featured on deviantart
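
For the window replacement itself, the webui’s inpainting tab takes the original photo plus a hand-drawn mask and regenerates only the masked region. Roughly the same step with the diffusers library might look like the sketch below; the checkpoint, file names, and parameters are assumptions, and in practice the photo and mask need matching dimensions.

```python
# Rough programmatic equivalent of the inpainting step: white areas of the
# mask (the window panes) are regenerated from the prompt, the rest is kept.
# Checkpoint, file names, and parameters are illustrative assumptions.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

photo = Image.open("window_photo.png").convert("RGB")
mask = Image.open("window_mask.png").convert("RGB")  # white = regenerate

prompt = (
    "a colorful photo of the circular wooden door to a hobbit hole in the "
    "middle of a forest with trees and bushes, detailed vegetation, "
    "digital painting"
)

for i in range(92):  # the overnight run described above produced 92 candidates
    result = pipe(
        prompt=prompt,
        image=photo,
        mask_image=mask,
        num_inference_steps=50,
        guidance_scale=7.5,
    ).images[0]
    result.save(f"window_{i:03d}.png")
```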

Three sorts of artifacts or otherwise undesirable outputs came up frequently:

  1. It focused too much on the “circular” part of the hobbit-hole door description.
  2. It added unsettling Hobbit-cryptids that brought to mind Loab.
  3. The way I used the built-in inpainting to replace the outside meant both that the image was awkwardly separated into three largely independent areas and that the generated scenery often tried to merge with the area around the window. This makes total sense for its actual use case of editing an existing image, but I’d tried, without success, to configure it to ignore the existing image. In retrospect, I could have used the mask I made to manually composite the images behind the window with conventional editing software; a sketch of that follows this list.
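
That compositing is nearly a one-liner with Pillow: keep the original photo everywhere the mask is black, and take the generated scenery where it is white. The file names here are made up, and all three images are assumed to share the same dimensions.

```python
# Paste the generated scenery "behind" the window: where the mask is white,
# take the generated image; everywhere else keep the original photo.
from PIL import Image

photo = Image.open("room_photo.png").convert("RGB")
scenery = Image.open("generated_scenery.png").convert("RGB")
mask = Image.open("window_mask.png").convert("L")  # white = window glass

Image.composite(scenery, photo, mask).save("composited.png")
```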

It’s wild to be able to generate an image like this without even being able to imagine painting it myself. It’s definitely not the system doing all the work – you still have to come up with the right kind of prompt, adjust generation parameters, and curate the results. But it’s a lot cheaper and easier than art school and practice, which I feel uncomfortable about, because this model was trained, in part and without permission, on art from people who did go to art school.