Technology company VNGRS has announced 'Kumru LLM', Türkiye's first large language model trained from scratch on entirely Turkish data and shared publicly. This development demonstrates that Türkiye has become a country that not only uses artificial intelligence technology but also develops and produces it. Thanks to its own national tokenizer, Kumru is up to 90% more efficient than multilingual models. Because it can be deployed in-house at low cost, it is pioneering a new approach that keeps data within national borders in critical sectors such as finance and healthcare.
'Kumru', a Turkish-language application developed by Turkish engineers, has been made publicly available!
Considering Türkiye's needs for security, compliance, and excellence in the Turkish language, VNGRS engineers set out with the vision of developing a completely domestic and national core language model. Kumru, which has 7.4 billion parameters, was brought to life after an intensive 45-day training process on a massive 500 GB Turkish dataset and 300 billion tokens.
Kumru is not just a machine that understands words; it's an artificial intelligence rooted in this land, understanding the structure, idioms, cultural codes, and natural flow of the Turkish language. This is its most fundamental and powerful feature that sets it apart from its global competitors.
The Secret of Efficiency: A 90% More Efficient National Tokenizer Leaves Global Competitors Behind
The biggest technical secret behind Kumru's success is the tokenizer developed from scratch for Turkish. This component, which determines how a language model splits and reads text, generally works inefficiently for Turkish in multilingual models: it breaks even a simple sentence into too many pieces, which increases cost and processing time.
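To make the effect of vocabulary concrete, here is a toy sketch in Python. It uses a greedy longest-match segmenter and two tiny, made-up vocabularies. Neither is Kumru's real tokenizer or vocabulary, and the name `greedy_tokenize` is an illustrative assumption. It only shows that a vocabulary containing Turkish suffixes can cover the same word with fewer pieces.

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-match segmentation.

    Characters that match no vocabulary entry fall back to
    single-character tokens, so the whole text is always covered.
    """
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabularies, invented for this illustration only.
turkish_aware_vocab = {"ev", "ler", "imiz", "de"}  # root + common suffixes
generic_vocab = {"ev", "le", "de"}                 # few Turkish pieces

word = "evlerimizde"  # "in our houses": ev + ler + imiz + de

aware = greedy_tokenize(word, turkish_aware_vocab)
generic = greedy_tokenize(word, generic_vocab)

print(len(aware), aware)
print(len(generic), generic)
```

With these invented vocabularies the same word needs twice as many pieces under the generic vocabulary. Real tokenizers are trained on data rather than written by hand, so actual ratios depend on the text and the models being compared.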
Kumru's national tokenizer solves this problem at its root. In tests, other multilingual models were found to use between 38% and 98% more tokens than Kumru to process the same Turkish text. What does that mean?
- Faster and Cheaper: Kumru can process the same text much faster and at lower cost, using significantly less processing power.
- More Information: With a context window length of 8,192 tokens, it can fit almost twice as much Turkish text as its competitors. This means it can understand a document of approximately 20 A4 pages in a single reading.
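The arithmetic behind these two points can be sketched as follows. This is a minimal illustration that assumes a competing model has the same 8,192-token window, which the text does not state. The function name `equivalent_tokens` is hypothetical.

```python
def equivalent_tokens(window_tokens: int, extra_token_ratio: float) -> float:
    """Amount of text, measured in Kumru tokens, that fits in a window
    when another tokenizer needs (1 + extra_token_ratio) times as many
    tokens for the same text."""
    return window_tokens / (1.0 + extra_token_ratio)


WINDOW = 8192  # context length quoted in the text

# 38% and 98% are the extra-token figures quoted in the text.
for ratio in (0.38, 0.98):
    other = equivalent_tokens(WINDOW, ratio)
    print(
        f"{ratio:.0%} more tokens: an equal-sized window holds about "
        f"{other:.0f} Kumru-equivalent tokens, so Kumru fits "
        f"{WINDOW / other:.2f}x as much text"
    )

# Rough tokens per page implied by "approximately 20 A4 pages".
tokens_per_page = WINDOW / 20
print(f"about {tokens_per_page:.0f} tokens per A4 page")
```

At the upper end of the quoted range (98% more tokens), the ratio comes out to 1.98, which is consistent with the "almost twice as much" claim. At the lower end it is 1.38.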
A Success Beyond Its Size: How Did the 7-Billion-Parameter Kumru Outperform the 70-Billion-Parameter Giants?
Kumru's capabilities were also proven on the Cetvel benchmark, the academic standard for measuring artificial intelligence performance in Türkiye. In this test, which includes 26 different Turkish natural language processing tasks, Kumru achieved remarkable results.
According to the results, Kumru-7B outperformed global giants up to 10 times its size, such as LLaMA-3-70B, Qwen-2-72B and Gemma-3-27B, especially in areas requiring expertise in language nuances, such as Turkish grammar correction and text summarization. This has become the clearest proof in the world of artificial intelligence that "it's not the biggest that wins, but the one who does their job best."
Our Data is Safe in Our Country: Digital Sovereignty with On-Premise Implementation
One of Kumru's most strategic advantages is that companies can install it on-premises. This is revolutionary, especially for sectors where data privacy is vital, such as banking, finance, healthcare, and the public sector. Now, companies can securely develop AI solutions on their own systems without having to send their sensitive data to cloud servers abroad.
Moreover, accessing this technology is now much more economical. Kumru-7B can run even on a consumer-grade graphics card such as the RTX 3090, with a requirement of 16 GB of VRAM. According to VNGRS, the hardware cost of installing Kumru in-house is approximately $2,000, whereas a single H100 GPU, required for the closest competing model with similar Turkish language capabilities, costs $30,000. This represents a tremendous cost advantage brought about by national technology.
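A back-of-the-envelope check of these figures might look like the sketch below. It assumes the weights are stored in 16 bits (2 bytes) per parameter, which the text does not specify. It counts only the weights and ignores activations and the attention cache. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


KUMRU_7B_PARAMS = 7.4e9  # parameter count quoted in the text
VRAM_GB = 16             # VRAM figure quoted in the text

# Assumption: 16-bit weights, i.e. 2 bytes per parameter.
fp16_gb = weight_memory_gb(KUMRU_7B_PARAMS, 2)
print(f"16-bit weights: {fp16_gb:.1f} GB (fits in {VRAM_GB} GB: {fp16_gb < VRAM_GB})")

# Hardware cost figures quoted in the text, in US dollars.
kumru_hardware_cost = 2_000
h100_cost = 30_000
cost_ratio = h100_cost / kumru_hardware_cost
print(f"The quoted H100 price is {cost_ratio:.0f}x the quoted Kumru hardware cost")
```

Under the 16-bit assumption, the weights take about 14.8 GB, which leaves little headroom within 16 GB. Real memory use also depends on sequence length and on the inference software.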
Democratic AI: The Open Source Kumru-2B Version, Available for Everyone, Has Arrived.
With the mission of offering this technology not only to large institutions but to all of Türkiye, VNGRS also released a smaller, completely open-source version named Kumru-2B. This model, with 2 billion parameters and a low memory requirement of 4.8 GB, can run even on mobile devices. Thanks to this, students, researchers, start-ups, and anyone curious can freely use Türkiye's first national language model in their own projects.
Kumru is opening a new era in localized artificial intelligence experience in Türkiye by being integrated into countless scenarios, from RAG-based chatbots and document summarization to call center analysis and social media content creation.
Try Kumru
VNGRS has also created a webpage so that everyone can try out Kumru, the national language model it developed. Users who want to see the model's capabilities, such as text comprehension, summarization, and generation, can meet Kumru at kumru.ai.
Key words: Kumru LLM, native language model, Turkish artificial intelligence, VNGRS, open source LLM, large language model, natural language processing, on-premise AI, national tokenizer, Cetvel benchmark








