AI Weekly #006

🆕 What's New?

Product Updates:

We have added more guidance in the Model Tab and Tutorial Tab to help users get started with our product more quickly.

To enhance the experience of installing plugins, we have added a Pip Install entry point.
We've also fixed a number of bugs and improved the installation success rate.

New tutorials:

The website has also been updated with the following articles:

🤩 This week's AI highlights

📄 Noteworthy papers and techniques

MGIE:

This is an image editing model open-sourced by Apple that lets users edit any image through text prompts, such as "turn the sky pink" or "add a dinosaur to this photo." Following the instruction, the model can change colors, add or remove objects, or adjust specific elements in the image.

InstructIR:

It performs high-quality image restoration from human instructions, covering tasks such as denoising, deraining, defogging, and low-light enhancement. You only need a text description to repair and improve a photo. For example, if a photo looks blurry because of raindrops, you can tell it, "Please remove the raindrops from the photo but keep the content unchanged," and it will do so automatically.

Google Bard:

Gemini Pro will be available in more than 40 languages, including Chinese, across over 230 countries and regions. The product also adds image generation and a multilingual double-check feature. Enter a description, such as "create an image of a dog riding a surfboard," and it generates the corresponding picture.

Media2Face:

Media2Face generates expressive 3D facial animations synchronized with speech. It also lets users fine-tune the generated animation, for example by adjusting the emotion to "happy" or "sad." In addition, it accepts several types of input (audio, text, image) and uses them as guidance when generating facial animations.

Boximator:

Boximator is a method from ByteDance for controlling the motion trajectory of subjects in videos. You first mark the subject in the picture, then mark the destination, and it generates a video of the subject moving from the starting point to the endpoint. It supports setting the movement path and selecting multiple subjects. It is compatible with SD video models and can be used as a plugin. There is no demo website yet; for now, you can email the project team to have videos generated.

🛠️ Products you should try

ML Blocks:

This is a no-code platform for AI image generation and analysis workflows, aimed mainly at the bulk image processing common in e-commerce. Its drag-and-drop interface lets users easily build complex image processing workflows.

ElevenLabs GPTs:

Upload or paste any document content, and it provides an online link that reads the text aloud. It currently supports five voice options.

Galileo:

Galileo supports text-to-UI and image-to-UI generation, and the results are quite impressive.

Clipdrop:

Clipdrop takes the pain out of image cutouts, cleanly extracting even hair and other fine details. Upload a picture and a single swipe cleans it up; it can also turn pixelated images into high definition instantly.
