GLIGEN, or Grounded-Language-to-Image Generation, is a novel technique that builds on and extends the capability of current pre-trained diffusion models.
Given a caption and bounding boxes as conditioning inputs, the GLIGEN model performs open-world grounded text-to-image generation.
GLIGEN can generate a variety of objects in specific places and styles by leveraging knowledge from a pretrained text2img model.
GLIGEN can also be grounded on human keypoints when generating images from text.
Large-scale text-to-image diffusion models have come a long way. However, the current practice is to rely solely on text input, which can limit controllability. GLIGEN, or Grounded-Language-to-Image Generation, is a novel technique that builds on and extends the capability of current pre-trained text-to-image diffusion models by allowing them to be conditioned on grounding inputs.
To preserve the pre-trained model’s extensive concept knowledge, GLIGEN freezes all of its weights and injects the grounding information into new trainable layers through a gated mechanism. Given a caption and bounding boxes as conditioning inputs, the GLIGEN model generates open-world grounded text-to-image results, and the grounding ability generalizes well to novel spatial configurations and concepts.
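The idea of keeping the original weights frozen while letting new layers contribute gradually can be illustrated with a small sketch. It assumes the new layer's output is added to the frozen output through a `tanh` gate on a learnable scalar that starts at zero, which is how the GLIGEN paper describes its gating. The function name `gated_update` and all numbers are hypothetical and chosen only for illustration.

```python
import math

def gated_update(frozen_output, grounding_update, gamma):
    """Add the new layer's output to the frozen output, scaled by tanh(gamma)."""
    gate = math.tanh(gamma)  # gamma is the only value in this sketch that would be trained
    return [f + gate * g for f, g in zip(frozen_output, grounding_update)]

frozen = [0.5, -1.0, 2.0]   # hypothetical activations from a frozen pre-trained layer
update = [0.3, 0.3, -0.4]   # hypothetical output of a new trainable grounding layer

start = gated_update(frozen, update, gamma=0.0)  # gate closed: output equals the frozen model's
later = gated_update(frozen, update, gamma=1.0)  # gate partly open: grounding signal mixes in
```

With the gate at zero, the combined model behaves exactly like the pre-trained one, so training can begin from the original model's behavior and introduce grounding gradually.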
- GLIGEN is based on existing pre-trained diffusion models, the original weights of which have been frozen to retain massive amounts of pre-trained knowledge.
- At each transformer block, a new trainable Gated Self-Attention layer is added to absorb the additional grounding input.
- Each grounding token carries two types of information: semantic information about the grounded entity (encoded text or image) and spatial position information (an encoded bounding box or keypoints).
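The points above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. It assumes a Fourier (sin/cos) encoding of box coordinates, and a single linear projection stands in for the learned network that fuses semantic and spatial features. Attention runs over the visual and grounding tokens together, and only the visual positions are kept. All function names, shapes, and weights are hypothetical.

```python
import numpy as np

def fourier_embed(box, num_freqs=4):
    """Encode a normalized box (x0, y0, x1, y1) with sin/cos at several frequencies."""
    box = np.asarray(box, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_freqs)               # 1, 2, 4, 8
    angles = box[:, None] * freqs[None, :] * np.pi    # shape (4, num_freqs)
    return np.concatenate([np.sin(angles).ravel(), np.cos(angles).ravel()])

def grounding_token(semantic_vec, box, w_proj):
    """Join semantic and spatial features, then project to the model width."""
    features = np.concatenate([semantic_vec, fourier_embed(box)])
    return features @ w_proj

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def gated_self_attention(visual, grounding, w_q, w_k, w_v, gamma):
    """Attend over visual + grounding tokens, keep the visual rows, gate the result."""
    tokens = np.concatenate([visual, grounding], axis=0)   # (Nv + Ng, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    update = (attn @ v)[: visual.shape[0]]                 # drop the grounding rows
    return visual + np.tanh(gamma) * update

rng = np.random.default_rng(0)
d = 8                                        # hypothetical model width
visual = rng.normal(size=(16, d))            # stand-in for frozen visual features
semantic = rng.normal(size=6)                # stand-in for an encoded phrase
w_proj = rng.normal(size=(6 + 32, d))        # 32 = 4 coordinates * 4 frequencies * 2
token = grounding_token(semantic, (0.1, 0.2, 0.5, 0.8), w_proj)
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

closed = gated_self_attention(visual, token[None, :], w_q, w_k, w_v, gamma=0.0)
opened = gated_self_attention(visual, token[None, :], w_q, w_k, w_v, gamma=1.0)
```

In this sketch, `closed` matches the input visual features because the gate is zero, while `opened` includes information from the grounding token.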