"I am pleased to announce that Maral is here: a 7-billion-parameter bilingual model that can respond to Persian and English prompts and produce GPT-3.5-level answers based on the dataset we fed it!"

"I personally never thought a 7-billion-parameter model could understand multiple languages this well, so I put more information in the next section of the article on why Mistral became my number one choice as the base model!"

"However, the biggest problem was still there, and it was the dataset. Finding a good enough dataset is always a bottleneck. But we've been lucky enough that an Iranian developer has translated the Alpaca dataset to our beloved Persian language."

They trained it with two techniques: Quantized Low-Rank Adaptation (QLoRA) and Parameter-Efficient Fine-Tuning (PEFT). The blog post does not elaborate on how these techniques work.
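Since the post doesn't explain the techniques, here is a minimal back-of-the-envelope sketch of the core idea behind LoRA-style PEFT: instead of updating a full d_out × d_in weight matrix, you train two small low-rank matrices B (d_out × r) and A (r × d_in), which cuts the trainable parameter count dramatically (QLoRA adds 4-bit quantization of the frozen base weights on top of this). The hidden size and rank below are assumptions for illustration, not Maral's actual training configuration.

```python
# Sketch of the LoRA parameter savings that make PEFT cheap.
# A rank-r adapter trains r*(d_in + d_out) parameters instead of
# the d_in * d_out parameters of a full weight-matrix update.

def full_update_params(d_in: int, d_out: int) -> int:
    """Parameters touched by full fine-tuning of one weight matrix."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of a rank-r LoRA adapter on the same matrix."""
    return r * (d_in + d_out)

# Mistral-7B-like hidden size (assumed) with a typical small rank:
d, r = 4096, 8
print(full_update_params(d, d))  # 16777216 params for one square matrix
print(lora_params(d, d, r))      # 65536 params, ~0.4% of the full update
```

Multiplied across all attention and MLP matrices in a 7B model, this is why a single consumer GPU can fine-tune it: only the tiny adapters receive gradients, while the quantized base weights stay frozen.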

The tokenizer is not optimized for Persian letters, so in the future they expect to be able to build a better language model for Persian.
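A quick sketch of why an English-centric tokenizer hurts Persian (assumption: a byte-fallback BPE tokenizer like Mistral's, where text in an under-represented script degrades toward one token per UTF-8 byte rather than one token per word):

```python
# Persian uses the Arabic script, whose characters each take 2 bytes
# in UTF-8, while ASCII English takes 1 byte per character. A tokenizer
# whose vocabulary lacks Persian subwords falls back toward byte-level
# pieces, so the same word costs far more tokens.

english = "hello"
persian = "سلام"  # "hello" in Persian, 4 characters

print(len(english), len(english.encode("utf-8")))  # 5 chars, 5 bytes
print(len(persian), len(persian.encode("utf-8")))  # 4 chars, 8 bytes
```

Shorter effective context and slower generation per Persian word follow directly from this, which is presumably why the authors see a Persian-optimized tokenizer as the next step.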

Maral is here: a 7-billion-parameter bilingual model with support for Persian!

#solidstatelife #ai #genai #llms #persian
