Welcome, Seeweb users! Are you ready to dive into the exciting world of AI-driven image generation with unparalleled control? ComfyUI is a powerful, node-based graphical user interface for Stable Diffusion that transforms how you create images. Forget rigid menus; ComfyUI offers a visual, modular environment where you design and execute advanced image generation pipelines.
With ComfyUI, you’re not just generating images; you’re orchestrating the entire process. Its key advantages are:
- Transparency and Control: You see every step of the generation process. This allows for precise control and a deeper understanding of how your final image is created.
- Modularity and Flexibility: Each operation is a self-contained node. You can connect them in any way you imagine, creating simple or incredibly complex workflows that are impossible with other interfaces.
- Efficiency: ComfyUI only re-executes the parts of the workflow that have changed, saving you time and computational resources on complex generations.
- Shareability: The entire workflow can be saved as a single JSON file or embedded within a generated PNG image, making it incredibly easy to share your exact process with others.
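Because a workflow is plain JSON, you can inspect or even build one programmatically. Below is a minimal sketch following ComfyUI's API-style workflow format (node ids map to a class type and its inputs); the specific ids and input values here are illustrative, not a complete workflow:

```python
import json

# Hypothetical fragment of a workflow in ComfyUI's API-style JSON:
# each node id maps to its class_type and its inputs; a link is a
# [source_node_id, output_index] pair.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "v1-5-pruned.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a majestic lion", "clip": ["1", 1]}},
}

# Serializing it like this is exactly what makes a workflow shareable.
as_json = json.dumps(workflow, indent=2)
print(sorted(node["class_type"] for node in workflow.values()))
```

Saving this JSON (or the PNG that embeds it) is all it takes for someone else to reload your exact pipeline.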
Linux Installation Guide
Run the installation commands: The following series of commands will install the necessary tools, create a virtual environment, and start the application.
# 1. Install virtualenv and create a virtual environment
apt-get update && apt-get install -y virtualenv
virtualenv venv
# 2. Activate the virtual environment
source venv/bin/activate
# 3. Install the ComfyUI CLI and ComfyUI itself
pip install comfy-cli
comfy install
cd ComfyUI
# 4. Start ComfyUI (from within the ComfyUI directory)
python main.py --port 8188
You can now access the interface in your browser, usually at http://127.0.0.1:8188.
How to Update ComfyUI
To update your manual installation:
1. Pull the latest code:
cd /path/to/your/ComfyUI
git pull
2. Install/Update dependencies: Make sure the venv virtual environment you created during installation is active.
pip install -r requirements.txt
The Graph/Nodes Interface
When you first open the link, you’ll see the ComfyUI interface. It’s a graph where you connect different blocks, called nodes, together.
- Nodes: These are the rectangular blocks that perform specific tasks (e.g., loading a model, writing a prompt). They have inputs on the left and outputs on the right.
- Connections (Edges): These are the wires that connect the nodes, allowing data to flow from one node’s output to another’s input.
- Workflow (Graph): The entire arrangement of nodes and connections is your workflow. It represents a complete process for generating an image.
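To make the graph idea concrete, here is a small toy in Python (not ComfyUI's internal code): each node lists the nodes whose outputs it consumes, and a simple traversal yields an order in which every node runs after its inputs, which is how a workflow executes:

```python
# Toy graph: each node maps to the list of nodes it takes input from.
edges = {
    "LoadCheckpoint": [],
    "CLIPTextEncode": ["LoadCheckpoint"],
    "EmptyLatentImage": [],
    "KSampler": ["LoadCheckpoint", "CLIPTextEncode", "EmptyLatentImage"],
    "VAEDecode": ["KSampler", "LoadCheckpoint"],
    "SaveImage": ["VAEDecode"],
}

def execution_order(edges):
    """Return the nodes in an order where every input runs before its consumer."""
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for dep in edges[node]:
            visit(dep)
        order.append(node)
    for node in edges:
        visit(node)
    return order

print(execution_order(edges))
```

This also hints at why ComfyUI is efficient: if only the prompt changes, everything upstream of CLIPTextEncode is untouched and can be reused from cache.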
Your First Image Generation: Text-to-Image
The easiest way to get started is by using a pre-built template, which provides a solid foundation for understanding the fundamental flow of data.
Starting with a Template
ComfyUI comes with several workflow templates that are perfect for getting started. You can browse them by clicking the folder icon 📁 on the right-hand menu. For your first generation, however, the easiest way is to simply click the Load Default button to load a basic, fully functional text-to-image pipeline.

Now, let’s break down this default workflow node by node to understand what each part does in greater detail.
The Workflow, Step-by-Step
1. Load Checkpoint (The Model)
This is the starting point. This node loads the main Stable Diffusion model (the checkpoint), which is a large file containing all the “knowledge” the AI has learned from training on billions of images. It’s the core engine of your image generation.
- Action: Click the ckpt_name field and select a model like v1-5-pruned.safetensors. Different models have different styles and specialties.
- Outputs: This node provides the MODEL (the diffusion network), the CLIP (the text encoder), and the VAE (the image encoder/decoder).

2. CLIP Text Encode (The Prompts)
The model needs instructions. These nodes take your text and, using the CLIP model, convert it into “conditioning,” a numerical representation (embeddings) that guides the AI.
- Positive Prompt: (Green node) This is your creative direction. A majestic lion with a fiery mane, photorealistic, 4k, cinematic lighting.
- Negative Prompt: (Red node) This tells the AI what to avoid. cartoon, drawing, deformed, bad anatomy, blurry, text, watermark.

3. Empty Latent Image (The Canvas)
This node creates a blank canvas of random noise in the “latent space,” which will be the starting point for the generation.
- Action: Set the width and height. For SD 1.5 models, 512×512 is the native resolution; for SDXL models, start at 1024×1024. batch_size allows you to create multiple images in a single run.
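The "latent space" is smaller than pixel space: Stable Diffusion's VAE compresses each 8×8 block of pixels into a single 4-channel latent value. A quick sketch of the resulting tensor shape:

```python
def latent_shape(width, height, batch_size=1, channels=4, downscale=8):
    """Shape of the latent tensor the sampler works on: the VAE
    compresses each 8x8 pixel block into one 4-channel latent value."""
    return (batch_size, channels, height // downscale, width // downscale)

print(latent_shape(1024, 1024))              # -> (1, 4, 128, 128)
print(latent_shape(512, 512, batch_size=4))  # -> (4, 4, 64, 64)
```

This compression is why generation in latent space is so much cheaper than working on raw pixels.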

4. KSampler (The Generator)
This is the heart of the operation. The KSampler iteratively “denoises” the latent image, refining the chaos into a coherent picture that matches your vision.
- seed: This number initializes the random noise. Keeping the seed identical (along with all other settings) allows you to reproduce an image exactly. To get a new variation of your prompt, simply change the seed.
- steps: The number of denoising steps. A good range is 20-30.
- cfg: How strictly the AI adheres to your prompt. A value of 7-8 is a good balance.
- sampler_name: The algorithm used for denoising. euler is fast, while dpmpp_2m_sde is popular for quality.
- scheduler: Controls the rate of denoising at each step. karras is a popular choice.
- denoise: A value of 1.0 is standard for text-to-image.
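The role of the seed is easiest to see with a toy stand-in for the initial noise, using Python's own random generator rather than the real latent sampler:

```python
import random

def sample_noise(seed, n=4):
    """Toy stand-in for the initial latent noise: a fixed seed always
    produces the same values, which is why the same seed (with identical
    settings) reproduces an image exactly."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

a = sample_noise(42)
b = sample_noise(42)   # identical: same starting noise, same image
c = sample_noise(43)   # different seed: a new variation of the prompt
```

The KSampler then spends its steps turning that seeded noise into the image your conditioning describes.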

5. VAE Decode (From Latent to Pixels)
This node decodes the latent representation back into a standard pixel-based image.

6. Save Image (The Final Output)
This node saves the image and displays a preview in the interface.

Queueing the Prompt
Click Queue Prompt on the menu. The nodes will execute, and your image will appear in the Save Image node.
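Under the hood, Queue Prompt sends your workflow JSON to ComfyUI's HTTP API (a POST to the /prompt endpoint of the running server). The sketch below only builds such a payload; actually sending it would require a running instance at 127.0.0.1:8188, and the node id and inputs shown are illustrative:

```python
import json
import uuid

# Illustrative workflow fragment; a real payload contains the full graph.
workflow = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 42, "steps": 20, "cfg": 7.0}},
}

# The /prompt endpoint expects the workflow under "prompt", plus a
# client_id so the server can route progress events back to you.
payload = {"prompt": workflow, "client_id": str(uuid.uuid4())}
body = json.dumps(payload)
```

This is also how people script batch generation against a headless ComfyUI server instead of clicking the button by hand.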

Upscaling Your Image (2-Pass Workflow)
To get higher-resolution images with more detail, you can generate an image at its native resolution and then upscale it in a second pass.

Second Pass: Upscaling and Refinement
1. Load an Upscale Model: Add a Load Upscale Model node.

2. Upscale the Image: Add an ImageUpscaleWithModel node. Connect the IMAGE from your first pass here.

3. Refine with a Second KSampler: Encode the upscaled image back to latent and pass it to a new KSampler. Crucially, set the denoise value between 0.3 and 0.5 to add detail without changing the composition.
4. Decode and Save: Decode the result from the second sampler and save the final, high-resolution image.
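The arithmetic of the two passes is simple enough to sketch. Assuming a typical 4x upscale model, each side is multiplied by the model's scale factor, and the refinement pass stays in the 0.3-0.5 denoise band so the composition survives:

```python
def two_pass_plan(width, height, model_scale=4, refine_denoise=0.4):
    """Sketch of the 2-pass math: the upscale model multiplies each side
    by its scale factor, and the second KSampler refines at a low denoise
    so it adds detail without repainting the composition."""
    assert 0.3 <= refine_denoise <= 0.5, "keep the composition intact"
    return (width * model_scale, height * model_scale)

print(two_pass_plan(512, 512))  # -> (2048, 2048)
```

A 512×512 first pass therefore comes out at 2048×2048, with the second sampler sharpening details rather than inventing a new image.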
From Text to Video with Wan 2.1
ComfyUI isn’t just for static images. You can also generate short, animated video clips starting from a single image. The Wan 2.1 workflow is excellent for this.
The easiest way to get started is by using a text-to-video template, which includes everything you need.
Here’s a breakdown of the key components for a text-to-video setup:

- Load Checkpoint (The Video Models)
This node will load the necessary models for video generation.
- Action:
- For the Diffusion Model, load wan2.1_t2v_1.3B_fp16.safetensors.
- For the CLIP Model, load umt5_xxl_fp8_e4m3fn_scaled.safetensors.
- For the VAE Model, load wan_2.1_vae.safetensors.
- CLIP Text Encode (The Prompts)
Similar to image generation, these nodes convert your text prompts into numerical representations to guide the video generation. You can modify these if needed.
- Positive Prompt: (Green node) Describe what you want to see in your video.
- Negative Prompt: (Red node) Describe what you want to avoid in your video.
- Empty Latent Video (The Video Canvas)
This node creates a blank, noisy video canvas.
- Action: (Optional) Adjust the width and height to your desired video dimensions.
- Wan2.1 Img2Vid (The Generator)
This is the core node that orchestrates the video generation.
- Connect the inputs: Wire up the MODEL from the Load Checkpoint node, the positive and negative conditioning from the CLIP Text Encode nodes, and the latent from the Empty Latent Video node.
- Key Settings:
- width/height: Set the resolution of your output video.
- frames: The total number of frames to generate. 24 frames at 8 fps will give you a 3-second video.
- frame_rate: Frames per second (fps) for the final video.
- steps/cfg/sampler_name: These function just like in the KSampler, controlling the quality of each generated frame.
- Create the Video File: Video Combine
This node stitches the generated frames into a playable video file.
- Connect the IMAGE output from the Wan2.1 Img2Vid node to the images input of this node.
- Settings:
- frame_rate: Should match the frame rate set in the previous node.
- format: Choose your output format, such as video/mp4 or image/gif.
- filename_prefix: Set the name for your saved video file.
Once everything is connected and configured, click Queue Prompt on the menu or use the shortcut Ctrl (Cmd) + Enter to execute the video generation. ComfyUI will generate and combine the frames into your final video.
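The relationship between the frames and frame_rate settings is worth pinning down, since it determines clip length:

```python
def video_duration(frames, frame_rate):
    """Clip length in seconds: total generated frames divided by the
    frames shown per second."""
    return frames / frame_rate

print(video_duration(24, 8))    # the 3-second example from above
print(video_duration(48, 16))   # more frames at a higher rate
```

Note that doubling both frames and frame_rate keeps the same duration but roughly doubles generation time, since every frame must be denoised.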
From Image to Video with Wan 2.1
Just as you can create images from text, you can also bring static images to life by turning them into short video clips. This process, often called “Image-to-Video,” uses your initial image as a starting point and animates it based on your text prompts. We’ll use the powerful Wan 2.1 workflow for this.

How to Load the Workflow
The easiest way is to use a pre-built template. Click the folder icon 📁 and look for a template named something like “Wan2.1 Image to Video”. This will set up the necessary nodes.

The Workflow, Step-by-Step
1. Load Your Starting Image
Instead of an Empty Latent Image, this workflow begins with a Load Image node. Use it to upload the image you want to animate.
2. Load the Necessary Models
This workflow is more complex and requires specific models:
- Load Checkpoint: Loads the base Stable Diffusion model.
- Load LoRA: This is crucial. Load the wan2.1_lora_1.1_fp16.safetensors LoRA. LoRAs are small models that modify the output of the main checkpoint, and this one is essential for the video effect.
3. CLIP Text Encode (Prompts for Motion)
Your prompts here will guide the animation. Describe the motion you want to see.
- Positive Prompt: cinematic, a gentle breeze rustles the lion’s mane, slow motion.
- Negative Prompt: static, frozen, still image, shaking.
4. Wan2.1 Img2Vid (The Animator)
This is the core node for this process. It takes your source image, the models, and your motion prompts and generates the animated frames.
5. Video Combine and Save
Just like in the text-to-video workflow, the Video Combine node assembles the frames into a video file, which is then saved and previewed.
6. Generate Your Video
Click Queue Prompt to start the process. The result will be a short video where your initial image is animated according to your instructions.
Advanced Techniques
Image-to-Image (img2img)
This technique, often called img2img, is perfect for transforming a sketch into a detailed image or applying a new style to an existing photo. Instead of starting from pure noise, you use an existing image as a guide.
- Add Load Image and VAE Encode nodes.
- Connect the encoded LATENT to your KSampler, replacing the Empty Latent Image node.
- Adjust Denoise: In the KSampler, lower the denoise value to 0.6-0.8. The lower the value, the more of your original image survives in the result.
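A useful rule of thumb (an approximation, not the sampler's exact schedule): with denoise below 1.0 the sampler skips the earliest, noisiest part of the schedule, so roughly steps × denoise denoising steps actually run on your source image:

```python
def img2img_steps(steps, denoise):
    """Approximate number of denoising steps actually applied in img2img:
    the sampler skips the earliest part of the schedule, so about
    steps * denoise steps run on top of the source image."""
    return round(steps * denoise)

print(img2img_steps(25, 0.7))   # moderate transformation of the source
print(img2img_steps(25, 1.0))   # full generation, source mostly ignored
```

This is why a denoise of 0.6-0.8 transforms the image while a value near 1.0 effectively starts from scratch.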

Inpainting
Regenerate a specific part of an image.
- Right-click a Load Image node, select Open in MaskEditor, and draw your mask.
- Use a VAE Encode (for Inpainting) node.
- Connect to KSampler. In your positive prompt, describe what you want in the masked area.
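Conceptually, inpainting is a masked composite: outside the mask the original pixels are kept, inside it the newly generated ones are used. A toy pixel-level version (plain lists standing in for image data):

```python
def composite(original, generated, mask):
    """Toy version of what inpainting does per pixel: keep the original
    where the mask is 0, take the newly generated pixel where it is 1."""
    return [g if m else o for o, g, m in zip(original, generated, mask)]

original  = [10, 20, 30, 40]
generated = [99, 98, 97, 96]
mask      = [0, 1, 1, 0]      # only the middle region is regenerated
print(composite(original, generated, mask))  # [10, 98, 97, 40]
```

The VAE Encode (for Inpainting) node does the real-world equivalent in latent space, which is why the unmasked areas come back untouched.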

Exploring Other Features
- ControlNet: For precise control over poses and composition.
- Loading Workflows from Images: Drag and drop a PNG from ComfyUI onto the canvas to instantly load its workflow.
Conclusion
ComfyUI, available to you via Seeweb, is a powerful framework for exploring AI art. Its node-based interface empowers you to take full control, moving beyond simple prompts to craft truly unique visuals. This tutorial is just the beginning. We encourage you to experiment, explore, and discover new workflows. Happy creating!


