this post was submitted on 30 Jan 2025
2 points (100.0% liked)

Memes


[image]

top 19 comments
[–] timewarp@lemmy.world 0 points 5 days ago (1 children)
[–] electricyarn@lemmy.world 0 points 5 days ago (1 children)
[–] timewarp@lemmy.world 0 points 5 days ago

Not sure if this is paywalled, but if it is there are plenty of other news stories or you can access the court documents on Court Listener:

https://www.nytimes.com/2025/01/08/technology/sam-altman-sister-lawsuit.html

[–] wuphysics87@lemmy.ml 0 points 5 days ago
[–] qualia@lemmy.world 0 points 5 days ago
[–] brucethemoose@lemmy.world 0 points 6 days ago* (last edited 6 days ago) (1 children)

My friend, the Chinese have been releasing amazing models all last year; they just didn't make headlines.

Tencent's Hunyuan Video is incredible. Alibaba's Qwen is still a go-to local model. I've used InternLM pretty regularly… Heck, Yi 34B was awesome in 2023, as the first decent long-context local model.

…The Janus models are actually kind of meh unless you're captioning images, and FLUX/Hunyuan Video is still king in the diffusion world.

[–] lambda@programming.dev 0 points 5 days ago (1 children)

Any of them useful for programming? Preferably local hosting only.

[–] brucethemoose@lemmy.world 0 points 5 days ago* (last edited 5 days ago) (1 children)

I mean, if you have a huge GPU, sure. Or at least 12GB of free VRAM, or a big Mac.

Local LLMs for coding are kind of a niche because most people don't have a 3090 or 7900 lying around, and you really need 12GB+ of free VRAM for the models to start being "smart" and even worth using over free LLM APIs, much less cheap paid ones.

But if you do have the hardware and the time to set a server up, the DeepSeek R1 models or the FuseAI merges are great for "slow" answers where the model thinks things out before replying. Qwen 2.5 Coder 32B is great for quick answers on 24GB of VRAM. Arcee 14B is great for 12GB of VRAM.

Sometimes running a small model on a "fast" but less VRAM-efficient backend is better for stuff like Cursor code completion.
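
If it helps, here's a minimal sketch of the kind of local setup I mean, using llama-cpp-python with a quantized GGUF checkpoint. The file name, context size, and GPU-offload settings are placeholders you'd tune to whatever card you actually have:

```python
# Minimal sketch of a local coding assistant, assuming llama-cpp-python is
# installed with GPU support and you've already downloaded a quantized GGUF
# file (e.g. a Qwen 2.5 Coder 32B quant for ~24GB VRAM, or a 14B quant for ~12GB).
from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-coder-32b-instruct-q4_k_m.gguf",  # placeholder local file
    n_gpu_layers=-1,  # offload every layer to the GPU; lower this if VRAM runs out
    n_ctx=8192,       # context window; bigger contexts cost more VRAM
)

resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    max_tokens=512,
)
print(resp["choices"][0]["message"]["content"])
```

llama-cpp-python also ships an OpenAI-compatible server (python -m llama_cpp.server), which is one way to point an editor at the same backend instead of calling it from a script.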

[–] Cort@lemmy.world 0 points 5 days ago

Would a 12GB 3060 work?

[–] S3verin@slrpnk.net 0 points 6 days ago (4 children)

Which one comes second after DeepSeek R1?

[–] davel@lemmy.ml 0 points 6 days ago (2 children)

Viral AI company DeepSeek releases new image model family

DeepSeek, the viral AI company, has released a new set of multimodal AI models that it claims can outperform OpenAI’s DALL-E 3.

The models, which are available for download from the AI dev platform Hugging Face, are part of a new model family that DeepSeek is calling Janus-Pro. They range in size from 1 billion to 7 billion parameters. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.

Janus-Pro is under an MIT license, meaning it can be used commercially without restriction.
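
If you want to poke at it, grabbing the weights is a one-liner with huggingface_hub; the repo id below is my guess at the 7B checkpoint, so double-check the actual listing on Hugging Face:

```python
# Sketch: fetch the Janus-Pro weights locally with huggingface_hub.
# The repo id is an assumption; confirm it on Hugging Face before running.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/Janus-Pro-7B",
    local_dir="janus-pro-7b",  # where the checkpoint files end up
)
print(f"Model downloaded to {local_dir}")
```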

[–] AFC1886VCC@reddthat.com 0 points 5 days ago

Hugh Janus Pro!!!

[–] Viking_Hippie@lemmy.dbzer0.com 0 points 6 days ago (1 children)

from the AI dev platform Hugging Face

[–] twei@discuss.tchncs.de 0 points 6 days ago (1 children)

Isn't that description pretty accurate?

[–] Viking_Hippie@lemmy.dbzer0.com 0 points 6 days ago* (last edited 6 days ago) (1 children)

I don't know, probably?

I've just seen enough Alien movies and other pop culture references to be wary of anything combining faces and hugging 😉

[–] Bldck@beehaw.org 0 points 6 days ago
[–] Ascend910@lemmy.ml 0 points 6 days ago

The Alibaba one is kinda bad. Kimi k1.5 is the one rivaling DeepSeek R1.

[–] Jimmycakes@lemmy.world 0 points 6 days ago

Alibaba has one