It is a novel end-to-end trained large multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding. It achieves impressive chat capabilities in the spirit of the multimodal GPT-4 and sets a new state-of-the-art accuracy on Science QA. | Create stunning content with 50+ free AI models including Flux, GPT-4o, Veo3, Suno. Generate professional images, videos, and music from text instantly. No subscriptions required - start creating now! |
Open-source; Multimodal GPT-4 level capabilities; Impressive chat abilities | 50+ Free AI Models, Text to Image Generation, Text to Video Generation, AI Music Creation, GPT-4o Integration, Flux Models, Veo3 Video Model, Suno Music Model, No Subscription Required, Instant Generation |
Statistics | |
GitHub Stars 23.9K | GitHub Stars - |
GitHub Forks 2.7K | GitHub Forks - |
Stacks 1 | Stacks 0 |
Followers 1 | Followers 1 |
Votes 0 | Votes 1 |
Integrations | |
| No integrations available | |

It is the base model weights and network architecture of Grok-1, the large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.

Try Grok 4 on GPT Proto. Access xAI’s most advanced 1.7T LLM with 130K context, multimodal support, and real-time data integration for dynamic analysis.

Creating safe artificial general intelligence that benefits all of humanity. Our work to create safe and beneficial AI requires a deep understanding of the potential risks and benefits, as well as careful consideration of the impact.

It is a next-generation AI assistant, accessible through a chat interface and an API. It is capable of a wide variety of conversational and text-processing tasks while maintaining a high degree of reliability and predictability.

It is Google’s largest and most capable AI model. Built to be multimodal, it can generalize, understand, operate across, and combine different types of information, such as text, images, audio, video, and code.

It is a state-of-the-art foundational large language model designed to help researchers advance their work in this subfield of AI.

It is a large multimodal model (accepting text inputs and emitting text outputs today, with image inputs coming in the future) that can solve difficult problems with greater accuracy than any of our previous models, thanks to its broader general knowledge and advanced reasoning capabilities.

It offers an API to add cutting-edge language processing to any system. Through training, users can create massive models customized to their use case and trained on their data.

It is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

It is a small yet powerful model adaptable to many use cases. It is better than Llama 2 13B on all benchmarks, has natural coding abilities, and supports an 8k sequence length. We made it easy to deploy on any cloud.