this post was submitted on 22 Jul 2024
121 points (100.0% liked)

Technology

[–] barsoap@lemm.ee 4 points 4 months ago (1 children)

It's quite easy to trick people with untrained eyes... for one, they have no idea what "consistent illumination" and the like even mean. And something being off doesn't mean an AI made that mistake, because humans make mistakes too. Photographs don't, but the general problem isn't just about telling realistic images apart, it's also about illustrations. You're looking specifically for mistakes that AI is likely to make but humans practically never will. And yes, humans get hands wrong all the time.

Here's a good video about what to look for and what not to.

[–] jarfil@beehaw.org 2 points 4 months ago (1 children)

Yes, my comment applied more to photorealistic AI images.

Illustrations are a different beast, where people have much more creative freedom... and that video is reasonably good at explaining that, but I find it falls short at some points:

  1. AI image generators don't "consult" source images to generate an output. At training time, they extract patterns from the training set. The training set itself is never used again for generation; only the extracted patterns are.
  2. Modern AI generators are increasingly good at generating text. They still struggle a bit, but compared to a year ago, they can now generate headlines and large text correctly, while the mess gets shoved into smaller and less important text. This isn't all that different from human artists adding "filler gibberish" text.
  3. Layers. While a naive (and cheaper) approach to AI generation doesn't use layers, there are generators which do use layers, and can keep object consistency across obscured or cut-off sections.
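To make point 1 concrete, here is a deliberately tiny toy sketch. It is an analogy, not how any real image generator is implemented: the "patterns" here are just a mean and a standard deviation, and the names `train` and `generate` are made up for the illustration. The point is only that generation works from the extracted parameters after the training data is gone.

```python
import random
import statistics


def train(samples):
    """Extract 'patterns' from the training set (here: just mean and stdev)."""
    return {"mean": statistics.fmean(samples), "stdev": statistics.stdev(samples)}


def generate(model, n, seed=None):
    """Produce new values from the extracted parameters only.

    No training data is passed in; the function could not consult it if it tried.
    """
    rng = random.Random(seed)
    return [rng.gauss(model["mean"], model["stdev"]) for _ in range(n)]


training_set = [4.8, 5.1, 5.0, 4.9, 5.2]
model = train(training_set)
del training_set  # the training data is discarded; only the model remains

outputs = generate(model, 3, seed=0)
```

The outputs resemble the training data statistically, but none of them is looked up from it.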

As AI generators advance, all these differences are likely to disappear... by using these same criticisms as a list of things to fix.

[–] barsoap@lemm.ee 1 points 4 months ago (1 children)

> AI image generators don’t “consult” source images to generate an output.

Well, you have an artist breaking things down for an audience understanding neither the technical nor artistic aspect...

> Modern AI generators are increasingly good at generating text. They still struggle a bit

I mean... SDXL still struggles a lot. The only thing you can get it to spell reliably is probably "Hooters". There's one LoRA or another that makes it not suck completely, but it's still nowhere near actually good at generating text; the training just isn't there. And even with that in place, things like signatures are probably going to be gibberish.

> While a naive (and cheaper) approach to AI generation doesn’t use layers, there are generators which do use layers,

Unless you start off training by feeding the model 3d data (say, voxels) alongside 2d projections I don't think it's ever going to develop a proper understanding of these kinds of things. Or, differently put: Learning object permanence (of sorts, related) is a meta-cognitive abstraction step that just won't happen with the type of topologies we know how to engineer. It's probably like 90% on the way towards AGI, so to get a simple topology to understand it we have to spoon-feed it permanence information alongside the (apparent) non-permanence.

[–] jarfil@beehaw.org 1 points 4 months ago

> breaking things down for an audience understanding neither the technical nor artistic aspect...

Not a reason to misrepresent things. Reminds me of the animistic fallacy, if they even understand what's really going on themselves.

As for text, I've seen the MS generator spit out decent text, at least in titles and logos, and some AI art with full legible sentences.

> Unless you start off training by feeding the model 3d data (say, voxels) alongside 2d projections

Some time ago already, there was an SD fork with bounding box support, and a ChatGPT preprocessor prompt template to do the layout. Object permanence in this case is as simple as continuing with the lower layer once the upper one is finished, maintaining object continuity in the lower layer. It's reasonable to expect this to go from bounding boxes to freehand layers for each object. Since an LLM has been shown to be a good preprocessor for setting the layout, some more integration between the two, with object feedback from SD to shrink each layer's bounding box, would do wonders. Adding an opacity mask could be a bit harder, but sounds doable.
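A toy sketch of the layering idea, under loud assumptions: this is not the code of that SD fork, the character grid stands in for pixels, and `make_layer` and `composite` are hypothetical names. Each object is rendered whole into its own layer inside its bounding box, and the layers are then painted bottom-up, so the lower object stays continuous behind whatever covers it.

```python
def make_layer(width, height, box, fill):
    """Render one object into its own full-size layer.

    box is (left, top, right, bottom) with right/bottom exclusive.
    Cells outside the box are None, meaning transparent.
    """
    left, top, right, bottom = box
    return [
        [fill if left <= x < right and top <= y < bottom else None
         for x in range(width)]
        for y in range(height)
    ]


def composite(layers, width, height, background="."):
    """Paint layers onto a canvas, bottom layer first, upper layers on top."""
    canvas = [[background] * width for _ in range(height)]
    for layer in layers:
        for y in range(height):
            for x in range(width):
                if layer[y][x] is not None:
                    canvas[y][x] = layer[y][x]
    return ["".join(row) for row in canvas]


W, H = 8, 3
fence = make_layer(W, H, (0, 1, 8, 2), "=")  # lower layer: full-width bar
crate = make_layer(W, H, (3, 0, 5, 3), "#")  # upper layer: covers the middle

image = composite([fence, crate], W, H)
# image == ["...##...", "===##===", "...##..."]
```

The bar comes out on both sides of the crate at the same height because the lower layer was generated complete, including the part that ends up hidden.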

I don't see the need for much higher abstraction to address this issue. Rendering videos of translucent objects might need it, though.