Technology

221

Mozilla Firefox new alt-text generator powered by "fully private on-device AI model" (hacks.mozilla.org)

submitted 5 months ago* (last edited 5 months ago) by frogman@beehaw.org to c/technology@beehaw.org

71 comments

New accessibility feature coming to Firefox, an "AI powered" alt-text generator.

"Starting in Firefox 130, we will automatically generate an alt text and let the user validate it. So every time an image is added, we get an array of pixels we pass to the ML engine and a few seconds after, we get a string corresponding to a description of this image (see the code).

...

Our alt text generator is far from perfect, but we want to take an iterative approach and improve it in the open.

...

We are currently working on improving the image-to-text datasets and model with what we’ve described in this blog post..."

[–] jherazob@beehaw.org 10 points 5 months ago (2 children)

Now i want this standalone in a commandline binary, take an image and give me a single phrase description (gut feeling says this already exists but depending on Teh Cloudz and OpenAI, not fully local on-device for non-GPU-powered computers)

[–] umami_wasbi@lemmy.ml 4 points 5 months ago (2 children)

Ollama + llava-llama3

You now just need a CLI wrapper to interact with the Ollama API.
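For anyone who wants to try it, here is a rough sketch of what such a wrapper could look like. It assumes an Ollama server is already running locally on its default port (11434) and that the `llava-llama3` model has been pulled. The script name `describe-image`, the prompt wording, and the helper names `build_payload` and `describe_image` are made up for illustration. This is not how Firefox does it, and it has not been tuned for speed on CPU-only machines.

```python
#!/usr/bin/env python3
"""describe-image: hypothetical CLI wrapper around a local Ollama server."""
import argparse
import base64
import json
import urllib.request

# Assumptions: Ollama's default local endpoint and a model that was pulled beforehand.
OLLAMA_URL = "http://localhost:11434/api/generate"
DEFAULT_MODEL = "llava-llama3"
DEFAULT_PROMPT = "Describe this image in a single short phrase."


def build_payload(image_bytes, model=DEFAULT_MODEL, prompt=DEFAULT_PROMPT):
    """Build the JSON body for a non-streaming generate request with one image."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON object instead of a stream of chunks.
        "stream": False,
    }


def describe_image(path, model=DEFAULT_MODEL, url=OLLAMA_URL):
    """Send the image at `path` to the local server and return its description."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), model=model)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()


def main():
    parser = argparse.ArgumentParser(
        description="Print a one-phrase description of an image."
    )
    parser.add_argument("image", help="path to the image file")
    parser.add_argument("--model", default=DEFAULT_MODEL, help="Ollama model name")
    args = parser.parse_args()
    print(describe_image(args.image, model=args.model))


if __name__ == "__main__":
    main()
```

Saved as `describe-image` and made executable, it would be called as `describe-image photo.jpg`. Everything stays on the local machine, though how long it takes without a GPU depends on the model and the hardware.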

[–] jherazob@beehaw.org 4 points 5 months ago (2 children)

So, it's possible to build but no one has made it yet? Because i have negative interest in messing with that kinda tech, and would rather just "apt-get install whatever-image-describing-gizmo" so i wouldn't be the one who does it

[–] Swedneck@discuss.tchncs.de 4 points 5 months ago

this is how i feel about basically all technology nowadays, it's all so artificially limited by capitalism.

nothing fucking progresses unless someone figures out a way to monetize it or an autistic furry decides to revolutionize things in a weekend because they were bored and inventing god was almost stimulating enough

[–] drwho@beehaw.org 2 points 5 months ago (1 children)

Folks have made it - I think ollama was name-checked specifically because it's on GitHub and in Homebrew and in some distros' package repositories (it's definitely in Arch's). I think some folks (at least) aren't talking about it because of the general hate-on folks have for LLMs these days.

[–] jherazob@beehaw.org 2 points 5 months ago (2 children)

I don't want an LLM to chat with or whatever folks do with those things, i want a command i can just install, i call the binary on a terminal window with an image of some sort as a parameter, it returns a single phrase describing the image, on a typical office machine with no significant GPU and zero internet access.

Right now i cannot do this as far as i know. Pointing me at some LLM and "Go build yourself something with that" is the direct opposite of what i stated that i desire. So, it doesn't currently seem to exist, that's why i stated that i wished somebody ripped it off the Firefox source and made it a standalone command.

[–] umami_wasbi@lemmy.ml 1 points 5 months ago* (last edited 5 months ago)

And you expect someone to just do it for you? You already get the inference engine and the model for free, mate.

[–] drwho@beehaw.org 1 points 5 months ago

I thought that feature was built into it, but okay.

[–] Zworf@beehaw.org 1 points 5 months ago

Yes, I was just writing that. I would love to see more integrations that can talk to ollama.

[–] Marsupial@quokk.au 1 points 5 months ago (1 children)

Any multimodal LLM could do this in a heartbeat locally.

And OpenAI has made their shit freely available to run locally, it’s like the worst company to use as an example.

[–] photonic_sorcerer@lemmy.dbzer0.com 3 points 5 months ago (1 children)

Where is this freely available multimodal GPT-4 you speak of?