Google Release Gemini 1.5 Pro to Public with Advanced Features

Google has announced the global launch of their next-generation Gemini 1.5 Pro model, now available in over 180 countries through the Gemini API in public preview. This release comes less than two months after the model was initially made available to developers in Google AI Studio, allowing them to explore its groundbreaking 1 million context window.

One of the most significant advancements in Gemini 1.5 Pro is its native audio (speech) understanding capability, which is available both in the Gemini API and Google AI Studio. This new feature enables developers to unlock new use cases by expanding the input modalities beyond text. Furthermore, Gemini 1.5 Pro can now reason across both image (frames) and audio (speech) for videos uploaded in Google AI Studio, with plans to add API support for this functionality in the near future.

In response to developer feedback, Google has introduced several improvements to the Gemini API. System instructions allow developers to guide the model’s responses by defining roles, formats, goals, and rules, tailoring the model’s behaviour to specific use cases. The new JSON mode enables structured data extraction from text or images by instructing the model to output only JSON objects. Additionally, improvements to function calling now allow developers to select modes that limit the model’s outputs, enhancing reliability.

Alongside the Gemini 1.5 Pro launch, Google has released their next-generation text embedding model, text-embedding-004 (text-embedding-preview-0409 in Vertex AI), accessible via the Gemini API. This new model achieves stronger retrieval performance and outperforms existing models with comparable dimensions on the MTEB benchmarks.

To facilitate the adoption of Gemini 1.5 Pro, Google has introduced a new Gemini API Cookbook, which provides code examples and quickstarts for developers. The company emphasizes its commitment to making Google AI Studio and the Gemini API the easiest way to build with Gemini, with plans for further improvements in the coming weeks.

As the AI landscape continues to evolve at a rapid pace, Google’s release of Gemini 1.5 Pro demonstrates its dedication to pushing the boundaries of natural language processing and providing developers with cutting-edge tools to create innovative applications. With its expanded global availability, advanced features, and improved performance, Gemini 1.5 Pro is poised to empower developers worldwide to unlock new possibilities in AI-driven solutions.

Leave a Comment जवाब रद्द करें

Microsoft’s VASA-1 Generates Lifelike Talking Faces in Real-Time from Audio