
Apple’s New MM1 Large Language Model Blurs the Line Between Text and Images

Apple’s research team has reportedly developed a new multi-modal large language model called MM1. The work is described in a recent paper titled “MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training,” which presents a model with strong abilities in both natural language reasoning and image recognition.

The model is available in three sizes

Apple’s new MM1 reportedly comes in three sizes: 3 billion, 7 billion, and 30 billion parameters. The researchers used these models to run experiments that identify the main factors affecting performance. Image resolution and the number of image tokens turned out to have a bigger influence than the design of the vision-language connector. The mix of pre-training datasets also affects how effective the model is.
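To make the link between image resolution and the amount of visual input concrete, here is a minimal sketch of how many patch tokens a Vision Transformer style image encoder would produce at different resolutions. The patch size of 14 and the resolutions shown are illustrative assumptions, not figures taken from this article, and the function name is hypothetical. The count is what the encoder produces before any connector reduces it.

```python
def patch_token_count(height: int, width: int, patch_size: int) -> int:
    """Number of non-overlapping square patches (one token each) in an image."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of the patch size")
    return (height // patch_size) * (width // patch_size)


if __name__ == "__main__":
    # Assumed patch size of 14 pixels; the resolutions are examples only.
    for side in (224, 336, 448):
        print(f"{side}x{side}: {patch_token_count(side, side, 14)} tokens")
```

Doubling the side length quadruples the token count, which is why resolution and token count have such a direct effect on both the detail available to the model and the cost of processing an image.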

According to the paper, the MM1 family includes variants built on a “Mixture of Experts” architecture with a “Top-2 Gating” method, in which each token is routed to the two highest-scoring experts. With this approach, the researchers achieved strong results on pre-training metrics, and similar results on established multi-modal benchmarks. Notably, the MM1 models remained competitive even after being fine-tuned for different tasks.
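The idea behind top-2 gating can be sketched in a few lines of plain Python. This is a generic illustration of the technique, not Apple’s implementation: the function names, the toy scaling experts, and the router scores are all hypothetical. A router assigns each token a score per expert, only the two best-scoring experts are run, and their outputs are blended using renormalized weights.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top2_gate(router_scores):
    """Pick the two highest-probability experts and renormalize their weights."""
    probs = softmax(router_scores)
    best_two = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in best_two)
    return [(i, probs[i] / norm) for i in best_two]


def moe_layer(token, router_scores, experts):
    """Run only the two selected experts and blend their outputs."""
    output = [0.0] * len(token)
    for index, weight in top2_gate(router_scores):
        expert_output = experts[index](token)
        output = [o + weight * e for o, e in zip(output, expert_output)]
    return output


def make_scaling_expert(factor):
    """Toy expert (an assumption for illustration): scales its input."""
    return lambda vector: [factor * x for x in vector]


if __name__ == "__main__":
    experts = [make_scaling_expert(f) for f in (1.0, 2.0, 3.0, 4.0)]
    scores = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token
    print(top2_gate(scores))
    print(moe_layer([1.0, 2.0], scores, experts))
```

Because only two experts run per token, a model can hold many more parameters in total than it uses for any single token, which is the main appeal of this design.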

Testing showed that the MM1-3B-Chat and MM1-7B-Chat models outperformed existing models of comparable size. In particular, they did well on TextVQA (answering questions about text that appears in an image), ScienceQA (answering science questions), and VQAv2 (answering natural-language questions about an image).

It is worth noting that MM1’s overall performance does not yet surpass that of Google’s Gemini or OpenAI’s GPT-4V. Even if MM1 is not a definitive contender just yet, the release marks significant progress for Apple in artificial intelligence. Apple also recently acquired DarwinAI, another sign that the company is steadily pushing forward in this area.