AI Weekly 011

🆕 What's New?

Product Update:

Added the ControlBoard feature, which lets you adjust each node's parameters from the sidebar, so you no longer have to hunt for where to change a setting.
Added a prominent Cancel button, making it more convenient to stop running processes.

Added a settings menu; you can now manually adjust the language settings.
Optimized the display of certain notifications.
Fixed some known bugs:
- Fixed some plugin compatibility issues.
- Fixed an issue where some Windows users' systems could not find PowerShell.

Download link: Comflowyspace

🤩 This week's AI highlights

📄 Noteworthy papers and techniques

Magi

The Magi model converts comic pages into text scripts by identifying the key elements on each page: panels, text, and characters. Beyond detection, it uses character clustering and text association to keep the narrative logically consistent and the reading order correct, setting a new benchmark for the understanding and automated processing of comics.

H2O

The Human to Humanoid (H2O) system uses reinforcement learning (RL) to let users teleoperate a full-size humanoid robot in real time, with full-body control, using just an RGB camera. At its core, H2O translates dynamic human motions, such as walking, jumping backward, kicking, turning, waving, pushing, and boxing, into motions that the humanoid robot can actually execute.

DragAnything

DragAnything achieves precise motion control of any object in a video through entity representation. Users manipulate objects by drawing simple trajectories, with no need for complex auxiliary signals. It also supports controlling multiple objects at the same time, which improves both editing efficiency and video quality.

NaturalSpeech 3

NaturalSpeech 3 uses a factorized diffusion model that breaks speech down into separate subspaces for content, prosody, timbre, and acoustic details, and generates each attribute individually to model complex speech more effectively. It surpasses existing systems in speech quality, similarity, prosody, and intelligibility. On the LibriSpeech test set in particular, its synthesized speech is comparable in quality to real speech.

🛠️ Products you should try

Optimizer AI

Optimizer AI is a sound generation tool that produces sounds and sound effects for a variety of scenarios from text prompts, such as gunfire in games, rain in animations, or a subway train arriving at a station. It supports stereo, high-quality audio at 44.1 kHz, which adds realism and immersion. It can also generate sound effects directly from video, which is a significant convenience for creators.

Screenshot to Code

Screenshot to Code is an open-source project that automatically converts screenshots into HTML and CSS, or into code for front-end frameworks such as React and Vue. It uses OpenAI's GPT-4 Vision for image recognition and code generation, along with DALL-E 3 for image generation, to streamline front-end development. Users simply upload a screenshot, and the system identifies the interface elements and outputs the corresponding code.

Dora

Dora AI is a no-code website building platform that uses generative AI to quickly create editable, interactive websites from text prompts alone.

PixVerse AI

PixVerse AI accepts a variety of inputs, including images, text, and audio, and generates coherent, lifelike video from them. The platform turns user-provided materials into videos in a short amount of time, which greatly speeds up video production. Users can also customize and adjust the generated videos.

Subscribe for free to receive new posts and support my work. Or join our Discord.