German Consultancy’s Latest LLM Aims to Reduce Costs, Preserve Reasoning Skills

As artificial intelligence companies race to build larger and more complex models, a German firm is taking an alternative approach: remixing existing models to try to deliver a faster, more efficient offering.
Munich-based TNG Technology Consulting on Thursday debuted DeepSeek-TNG R1T2 Chimera, a new large language model designed to blend the strengths of three previous DeepSeek releases into a single package.
Founded in 2001, TNG Technology Consulting says it works closely with organizations in the telecommunications, insurance, e-commerce, automotive, logistics and financial services sectors on multiple IT fronts, including adopting AI.
The consultancy has already released – as open-source software – multiple LLMs derived from DeepSeek, which is a project from a Chinese startup affiliated with quantitative hedge fund High-Flyer Capital Management. These have included TNG Chimera models R1 and R1T.
The latest offering, R1T2, offers enterprise users and developers a high-reasoning model designed to not overwhelm infrastructure budgets or response times. “R1T2 operates at a sweet spot in intelligence versus inference cost,” TNG said. “We perceive it as generally well-behaved and a nice persona to talk to.”
The company said the model is well-suited for enterprise use cases that prioritize reasoning, concise answers and predictable infrastructure use, although it doesn't recommend the model for tasks involving function calling or sophisticated tool use.
The new LLM has been built by combining parts of the original DeepSeek R1, released in January, V3-0324, released in March, and R1-0528, released in May.
The latter model has already drawn high levels of interest from developers for delivering strong reasoning benchmark scores while being trainable at a lower cost. All DeepSeek models are open source and available under an Apache 2.0 license, which allows others to build and distribute derivative models (see: DeepSeek’s New AI Model Shakes American Tech Industry).
To build R1T2, TNG used a process it calls assembly of experts, which merges pre-trained models by combining selected weight tensors – referring to mathematical objects used in LLMs – rather than retraining from scratch, which is a time- and labor-intensive process. TNG describes this approach as using routed expert tensors to influence specialized reasoning while also using shared layers from faster models to reduce output length and latency.
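Conceptually, this kind of merge can be sketched as interpolating selected weight tensors from a donor model into a base model, while shared layers pass through unchanged. The sketch below is illustrative only: the function, tensor names and 50/50 blend weight are hypothetical placeholders, not TNG's actual merging recipe.

```python
# Illustrative "assembly of experts"-style merge: selected tensors are
# blended between two pre-trained models with no retraining and no
# runtime routing. Tensors are modeled as plain Python lists of floats;
# all names and the alpha value are hypothetical.

def merge_models(base: dict, donor: dict, merge_keys: set, alpha: float = 0.5) -> dict:
    """Return a new state dict blending donor tensors into the base."""
    merged = {}
    for name, tensor in base.items():
        if name in merge_keys and name in donor:
            # Linear interpolation of the two pre-trained tensors.
            merged[name] = [(1 - alpha) * b + alpha * d
                            for b, d in zip(tensor, donor[name])]
        else:
            # Shared layers are copied unchanged from the base model.
            merged[name] = list(tensor)
    return merged

# Toy example: only the "expert" tensor is merged.
base = {"expert.0.w": [1.0, 2.0], "shared.w": [3.0, 3.0]}
donor = {"expert.0.w": [3.0, 4.0], "shared.w": [9.0, 9.0]}
out = merge_models(base, donor, merge_keys={"expert.0.w"})
print(out["expert.0.w"])  # [2.0, 3.0]
print(out["shared.w"])    # [3.0, 3.0]
```

Because the output is a single ordinary model, it can be served with standard inference infrastructure, with no merging logic needed at runtime.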
In a technical paper, TNG described how assembly of experts differs from mixture of experts – an approach used in models such as Mixtral and DeepSeek V3. Mixture of experts selectively activates parts of a network at runtime to reduce compute costs. By contrast, assembly of experts produces a single, merged model without using any dynamic routing.
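The runtime routing that assembly of experts avoids can be shown with a toy mixture-of-experts layer: a gate picks which expert runs for each input, so only part of the network is active per call. The gate and experts below are deliberately simplistic placeholders; real MoE models such as Mixtral learn the gate and route per token.

```python
# Toy mixture-of-experts layer with a hardcoded top-1 gate, to contrast
# dynamic routing with a static merged model. Purely illustrative.

def gate(x: float) -> int:
    """Toy router: expert 0 for negative inputs, expert 1 otherwise."""
    return 0 if x < 0 else 1

experts = [lambda x: x * 2.0,    # expert 0
           lambda x: x + 10.0]   # expert 1

def moe_layer(x: float) -> float:
    # Only the selected expert executes, so per-call compute stays low
    # even though the full model holds many experts.
    return experts[gate(x)](x)

print(moe_layer(-3.0))  # -6.0  (routed to expert 0)
print(moe_layer(3.0))   # 13.0  (routed to expert 1)
```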
The three different DeepSeek models used to build R1T2 each contributed particular traits: R1’s structured approach to language, V3-0324’s shorter, instruction-oriented responses and R1-0528’s performance on reasoning tasks.
Benchmark data published by TNG shows that R1T2 achieves between 90% and 92% of R1-0528's performance, based on AIME-24, AIME-25 and GPQA-Diamond tests, which evaluate a model's ability to solve advanced mathematical and graduate-level reasoning problems. The merged model also uses only about 40% as many output tokens as R1-0528, a reduction that should lower inference time and server load in production environments.
In testing its new model, TNG evaluated speed by the number of words the model produces per answer, arguing that shorter replies are often a more practical measure than raw processing speed, since output length is a real bottleneck for deployment cost and latency in many applications. By generating shorter responses, TNG said, R1T2 reduces inference time by about 60% compared to R1-0528. In practice, the firm said, the model is about 20% faster than the original DeepSeek-R1 and more than twice as fast as R1-0528.
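Because most hosted LLMs bill per output token, the reported token reduction maps directly onto serving cost. The back-of-the-envelope calculation below uses TNG's roughly 40% figure; the per-token price and token counts are hypothetical placeholders, not published pricing.

```python
# Rough cost comparison based on TNG's reported figure that R1T2 emits
# about 40% as many output tokens as R1-0528 per answer.
# Price and token counts below are hypothetical.

PRICE_PER_1K_OUTPUT_TOKENS = 0.002  # hypothetical USD price

def answer_cost(output_tokens: int) -> float:
    """Output-token cost of a single answer at the hypothetical rate."""
    return output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

r1_0528_tokens = 5000                    # hypothetical long reasoning trace
r1t2_tokens = int(r1_0528_tokens * 0.4)  # ~40% as many output tokens

print(f"R1-0528: ${answer_cost(r1_0528_tokens):.4f}")  # R1-0528: $0.0100
print(f"R1T2:    ${answer_cost(r1t2_tokens):.4f}")     # R1T2:    $0.0040
```

At scale, the same 60% saving applies to GPU time spent decoding, which is where the latency claim comes from: fewer tokens to generate means less wall-clock time per answer.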
Early reviews appear to be positive, including at Hugging Face, which is hosting a copy of the model.
“We have been dogfooding it for a while now,” a user said in a post to Reddit. “My personal experience is that the claims are true; it’s better than the original R1 and R1T and much faster (i.e. fewer output tokens) than R1-0528 but not quite as good.”
As is typical with LLMs, “there still are tradeoffs, like the fact that we were not able to preserve tool calling, which is likely due to the original R1 being in the mix,” the user said, referring to R1’s shortcomings when it comes to the AI model interacting with external tools, APIs or systems.
TNG has released its new LLM under an MIT License, which permits free use, modification and commercial deployment.
The firm cautioned organizations serving EU users to review their compliance obligations under the EU AI Act, saying the model might not be suitable for such deployments.
Chimera Project
TNG’s earlier Chimera models remain available through various platforms, including OpenRouter and Chutes. The firm said these previous versions already process billions of tokens daily and that R1T2 extends this strategy by offering a variant with shorter outputs and faster response times.
The company said its Chimera project remains focused on improving efficiency rather than trying to compete with the largest proprietary models. The assembly-of-experts approach is intended to preserve reasoning performance while reducing costs, which may appeal to enterprises prioritizing predictable infrastructure usage.
Benchmark charts accompanying R1T2’s release show it sits near the upper end of the tradeoff curve between reasoning accuracy and output length. This positioning reflects a deliberate balance rather than a focus on maximizing benchmark scores alone, TNG said.
To what extent R1T2 may display further shortcomings identified in DeepSeek remains unclear. Multiple versions, including ones used to create R1T2, have been criticized for producing factual inaccuracies and having opaque reasoning. Researchers have reported DeepSeek models hallucinating financial data, as well as exhibiting “significant safety deficiencies.”
Some developers have also found that DeepSeek model responses don’t consistently include labeled chain-of-thought reasoning, which can make it harder to assess how answers are generated.
Political bias and benchmark reliability also remain open concerns. A recently published paper found evidence of pro-government filtering in DeepSeek, suggesting that certain prompts trigger censorship or partial responses. Additional evaluations have shown uneven performance across different domains. For example, an independent analysis reported that DeepSeek-R1 underperformed a GPT-2 baseline on chess tasks. Another study argued that the model’s high benchmark scores can vary significantly with minor evaluation adjustments, raising questions about the consistency of reported test results.
The long-term adoption of LLMs such as R1T2 is uncertain, especially as more companies experiment with mixtures of smaller models and specialized systems. Regardless, the release highlights the ongoing quest to deliver higher-performing LLMs without increasing model size.