That's right, I'm a huge open source shill.
If the economic situation in Europe continues to deteriorate, then people with the means to do so will leave. They might not all go to the US, but that won't be much consolation to Europe. The only other real alternative is China, because nobody else is funding this kind of research and technology development right now.
You'd think this would be an easy one, but here we are.
I didn't say that using LoRA makes it more open; I was pointing out that you don't need the original data to extend the model.
Basically, what you're talking about is being able to replicate the original model from scratch given the code and the data. And since the data component is missing, you can't replicate the original model. I personally don't find this to be that much of a problem, because people could create a comparable model from scratch using an open data set if they really wanted to.
The actual innovation with DeepSeek lies in its use of a mixture-of-experts approach to get far better performance. While it has 671 billion parameters overall, it only uses 37 billion at a time, making it very efficient. For comparison, Meta's Llama 3.1 uses all 405 billion of its parameters at once. That's the really interesting part of the whole thing, and that's the part where openness really matters.
And I fully expect that OpenAI will incorporate this idea into their models. The disaster for OpenAI is that their whole business model of selling subscriptions is now dead in the water. When models were really expensive to run, only a handful of megacorps could do it. Now it turns out that you can get the same results at a fraction of the cost.
What's revolutionary here is the use of a mixture-of-experts approach to get far better performance. While it has 671 billion parameters overall, it only uses 37 billion at a time, making it very efficient. For comparison, Meta's Llama 3.1 uses all 405 billion of its parameters at once. It does as well as GPT-4o in the benchmarks, and excels in advanced mathematics and code generation. Its 128K-token context window means it can process and understand very long documents, and it processes text at 60 tokens per second, twice as fast as GPT-4o.
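Roughly, the trick looks like this. Here's a toy sketch of top-k mixture-of-experts routing in PyTorch; it is not DeepSeek's actual code, and the dimensions, expert count, and top-k value are made up for illustration. A small gating network scores the experts for each token and only the top-scoring few actually run, so most of the parameters sit idle on any given forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy top-k mixture-of-experts layer: each token is routed to only a
    couple of expert networks, so most parameters stay idle per step."""

    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # scores the experts per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):  # x: (tokens, dim)
        scores = self.gate(x)                              # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx
                if mask.any():
                    # Only the selected experts actually run for these tokens.
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```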
I never disagreed that you can run Meta's model with the same level of privacy, so I don't know why you keep bringing that up as some sort of gotcha. The point about DeepSeek is its efficiency. The OSI definition of open source is good, and it does look like you're right that the full data set is not available. However, the real question is why you'd be so hung up on that.
Given that the code for training a new model is released and can be applied to open data sets, it's perfectly possible to make a version that's trained on open data, which would check off the final requirement you keep bringing up. Also, adapting the model does not require the original training set, since adaptation works by tuning the weights in the network itself. Go read up on how LoRA works, for example.
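To make that concrete, here's a rough sketch of what a LoRA fine-tune looks like, assuming the Hugging Face `transformers`, `peft`, and `datasets` libraries; the checkpoint name, target modules, and the dummy training examples are placeholders, not a recommended recipe. The key point is that the released weights stay frozen and only small low-rank adapter matrices get trained, so the original data set never enters the picture.

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # any open-weights checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA freezes the released weights and learns small low-rank update matrices
# on top of them, which is why the original data set is never needed.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a tiny fraction of the weights train

# Fine-tune on whatever data you do have -- here, two dummy examples.
data = Dataset.from_dict({"text": [
    "Question: What is a mixture of experts? Answer: ...",
    "Question: What does LoRA stand for? Answer: Low-Rank Adaptation.",
]})
data = data.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256),
                batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")  # saves only the small adapter weights
```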
It bears repeating: anybody can run DeepSeek themselves on premises. You have absolutely no clue what you're talking about. Keep on coping there though, it's pretty adorable.
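For anyone who actually wants to try it, here's a minimal sketch of local inference with the Hugging Face `transformers` library. The full 671B model obviously needs serious hardware, so this uses one of the small distilled checkpoints; the model name and prompt are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Everything below runs entirely on your own machine.
inputs = tokenizer("Explain mixture-of-experts routing in one sentence.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```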
Anybody can adjust the weights any way they want.
The only hilarity here is you exposing yourself as utterly clueless about the subject you're attempting to debate. A model is a deep neural network that's generated by code through training (including reinforcement learning) on the data. Evidently you don't understand this, which is leading you to make absurd statements. I asked you for information because I knew you were a troll, and now you've confirmed it.
Who do you think has more leverage in this relationship, and what stops the US from simply poaching the talent from Europe as the economy there collapses?
lol imagine thinking that a country where pretty much everything is owned by a handful of families is a democracy
He's a billionaire based on the valuation of OpenAI; if the company fizzles, so does his wealth.