Autoencoders and Variational Autoencoders: Powerful Tools for Unsupervised Learning

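What Is an Autoencoder?

An autoencoder is an unsupervised learning model that learns a compressed representation of the data and uses it to reconstruct the input. It consists of two parts:

Encoder: compresses the high-dimensional input into a low-dimensional latent representation
Decoder: reconstructs the original input from the latent representation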

Basic Autoencoder

Structure

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, latent_dim):
        super().__init__()
        
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim)
        )
        
        # Decoder (Sigmoid keeps outputs in the 0-1 range, matching normalized inputs)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        z = self.encoder(x)
        x_reconstructed = self.decoder(z)
        return x_reconstructed, z
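
As a quick sanity check of this layout, the sketch below builds a compact encoder/decoder pair with the same shape and passes a random batch through it. The sizes (784 inputs, as for flattened 28×28 images, 256 hidden units, a 32-dimensional latent code) are assumptions chosen for illustration, not values fixed by the text.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions): flattened 28x28 images, 32-dim latent code
input_dim, hidden_dim, latent_dim = 784, 256, 32

encoder = nn.Sequential(
    nn.Linear(input_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, latent_dim),
)
decoder = nn.Sequential(
    nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
    nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
)

x = torch.rand(8, input_dim)      # batch of 8 fake inputs in [0, 1)
z = encoder(x)                    # compressed representation
x_reconstructed = decoder(z)      # reconstruction, squashed by the final Sigmoid

print(z.shape)                # torch.Size([8, 32])
print(x_reconstructed.shape)  # torch.Size([8, 784])
```

The bottleneck (32 values instead of 784) is what forces the network to learn a compressed representation rather than copy the input.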

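Loss Function

The autoencoder is trained to minimize the difference between the input and its reconstruction, measured here with mean squared error: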

def autoencoder_loss(x, x_reconstructed):
    return nn.MSELoss()(x_reconstructed, x)

Training Process

def train_autoencoder(model, dataloader, epochs=100, lr=0.001):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    
    for epoch in range(epochs):
        total_loss = 0
        for batch in dataloader:
            x = batch[0]
            
            # Forward pass
            x_reconstructed, z = model(x)
            
            # Compute the loss
            loss = loss_fn(x_reconstructed, x)
            
            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            total_loss += loss.item()
        
        if (epoch + 1) % 10 == 0:
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {total_loss/len(dataloader):.4f}')
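
The training loop reads the input as `batch[0]`, which matches what a `DataLoader` over a `TensorDataset` yields. The following self-contained sketch shows that data setup together with a shortened version of the same loop. The synthetic data (20 features generated from 2 hidden factors), the tiny network, and the learning rate are all assumptions made so the example runs in a second or two.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data (assumption): 256 samples, 20 features driven by 2 hidden factors
codes = torch.randn(256, 2)
data = torch.sigmoid(codes @ torch.randn(2, 20))
dataloader = DataLoader(TensorDataset(data), batch_size=32, shuffle=True)

# Tiny autoencoder with a 4-dim bottleneck
model = nn.Sequential(
    nn.Linear(20, 8), nn.ReLU(),
    nn.Linear(8, 4),                  # latent code
    nn.Linear(4, 8), nn.ReLU(),
    nn.Linear(8, 20), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

epoch_losses = []
for epoch in range(30):
    total_loss = 0.0
    for batch in dataloader:
        x = batch[0]                  # TensorDataset yields 1-tuples
        loss = loss_fn(model(x), x)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    epoch_losses.append(total_loss / len(dataloader))

print(f'first epoch: {epoch_losses[0]:.4f}, last epoch: {epoch_losses[-1]:.4f}')
```

Because the data really does lie near a low-dimensional surface, the average reconstruction loss should fall noticeably over the epochs. On pure noise there is little structure to compress and the loss would barely move.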

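Variational Autoencoder (VAE)

A VAE builds on the autoencoder by adding a probabilistic formulation, which makes the latent space continuous and possible to sample from.

Core Idea

A conventional autoencoder maps each input to a single fixed latent vector, whereas a VAE maps each input to a probability distribution over the latent space (usually assumed to be Gaussian).

Mathematical Principle

The VAE loss function has two parts:

Reconstruction loss: ensures that the decoder can reconstruct the input from the latent representation
KL divergence: keeps the latent distribution close to a standard normal distribution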

Loss = Reconstruction Loss + KL Divergence
     = -E[log p(x|z)] + KL(q(z|x) || p(z))
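
When the encoder outputs a diagonal Gaussian q(z|x) = N(μ, diag(σ²)) and the prior is p(z) = N(0, I), the KL term has a standard closed form, so it does not need to be estimated by sampling. Here d denotes the latent dimension and μ_j, σ_j² are the per-dimension mean and variance produced by the encoder (this notation is introduced for illustration):

```latex
\mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big)
  = -\frac{1}{2} \sum_{j=1}^{d} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)
```

With the encoder predicting log σ² directly (the `logvar` output), each term can be computed without taking a logarithm of a possibly tiny variance.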

Implementation

class VAE(nn.Module):
    def __init__(self, input_dim, hidden_dim, latent_dim):
        super().__init__()
        
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU()
        )
        
        # Mean and log-variance of the latent distribution
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim),
            nn.Sigmoid()
        )
    
    def encode(self, x):
        h = self.encoder(x)
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        return mu, logvar
    
    def reparameterize(self, mu, logvar):
        # Reparameterization trick: sample z = mu + eps * std so gradients reach mu and logvar
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std
    
    def decode(self, z):
        return self.decoder(z)
    
    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_reconstructed = self.decode(z)
        return x_reconstructed, mu, logvar

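def vae_loss(x, x_reconstructed, mu, logvar):
    # Reconstruction loss, summed over all elements so it is on the same
    # scale as the summed KL term (the default 'mean' reduction would make
    # the KL term dominate)
    recon_loss = nn.MSELoss(reduction='sum')(x_reconstructed, x)
    
    # KL divergence between N(mu, sigma^2) and N(0, I), summed over batch and latent dimensions
    kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    
    return recon_loss + kl_loss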
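
Two details of this implementation can be checked numerically. The sketch below compares the hand-written KL expression with the value from `torch.distributions.kl_divergence`, and confirms that gradients flow back through the reparameterized sample. The tensors stand in for encoder outputs and are random values invented for the check.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)

# Hypothetical encoder outputs: batch of 4 samples, 3-dim latent space
mu = torch.randn(4, 3, requires_grad=True)
logvar = torch.randn(4, 3, requires_grad=True)

# Closed-form KL, summed over batch and latent dimensions
kl_closed_form = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# Reference value computed by torch.distributions
std = torch.exp(0.5 * logvar)
q = Normal(mu, std)
p = Normal(torch.zeros_like(mu), torch.ones_like(std))
kl_reference = kl_divergence(q, p).sum()

print(torch.allclose(kl_closed_form, kl_reference, atol=1e-4))

# Reparameterization: z = mu + eps * std is differentiable w.r.t. mu and logvar,
# because the randomness lives entirely in eps
eps = torch.randn_like(std)
z = mu + eps * std
z.sum().backward()
print(mu.grad is not None, logvar.grad is not None)
```

Sampling `z` directly from `Normal(mu, std).sample()` would cut the gradient path, which is why the noise is drawn separately and then scaled and shifted.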

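Understanding the Latent Space

Properties of the Latent Space

Continuity: nearby latent vectors should decode to similar samples
Completeness: every point in the latent space should decode to a valid sample
Interpretability: individual latent dimensions may correspond to meaningful features

Latent Space Visualization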

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize_latent_space(model, dataloader, num_samples=1000):
    model.eval()
    latent_vectors = []
    labels = []
    
    with torch.no_grad():
        for batch in dataloader:
            x, y = batch
            _, mu, _ = model(x)
            latent_vectors.append(mu.numpy())
            labels.append(y.numpy())
    
    latent_vectors = np.concatenate(latent_vectors)[:num_samples]
    labels = np.concatenate(labels)[:num_samples]
    
    # Reduce the latent vectors to 2D with t-SNE
    tsne = TSNE(n_components=2, random_state=42)
    latent_2d = tsne.fit_transform(latent_vectors)
    
    plt.figure(figsize=(10, 8))
    scatter = plt.scatter(latent_2d[:, 0], latent_2d[:, 1], c=labels, cmap='tab10')
    plt.colorbar(scatter)
    plt.title('Latent Space Visualization')
    plt.show()

Applications of VAEs

1. Image Generation

Sample from the latent space to generate new images:

def generate_images(model, num_images=16, latent_dim=32):
    model.eval()
    
    with torch.no_grad():
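        # Sample latent vectors from the standard normal prior
        z = torch.randn(num_images, latent_dim)
        
        # Decode them into images
        generated = model.decode(z)
    
    # Visualize (assumes 16 flattened 28x28 images)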
    fig, axes = plt.subplots(4, 4, figsize=(8, 8))
    for i, ax in enumerate(axes.flat):
        ax.imshow(generated[i].reshape(28, 28), cmap='gray')
        ax.axis('off')
    plt.tight_layout()
    plt.show()

2. Image Interpolation

Interpolate in the latent space to get a smooth transition between two images:

def interpolate_images(model, img1, img2, num_steps=10):
    model.eval()
    
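    with torch.no_grad():
        # Encode both images and use the posterior mean as the latent code
        # (model(x) returns (reconstruction, mu, logvar), so its third output is logvar, not z)
        z1, _ = model.encode(img1)
        z2, _ = model.encode(img2)
        
        # Interpolate linearly in latent space
        interpolations = []
        for alpha in torch.linspace(0, 1, num_steps):
            z = (1 - alpha) * z1 + alpha * z2
            img = model.decode(z)
            interpolations.append(img)
    
    # Visualize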
    fig, axes = plt.subplots(1, num_steps, figsize=(15, 2))
    for i, ax in enumerate(axes):
        ax.imshow(interpolations[i].reshape(28, 28), cmap='gray')
        ax.axis('off')
    plt.tight_layout()
    plt.show()

3. Anomaly Detection

Use the reconstruction error to detect anomalous samples:

def detect_anomalies(model, normal_data, test_data, threshold=None):
    model.eval()
    
    with torch.no_grad():
        # If no threshold is given, derive one from the reconstruction error
        # on normal data (mean + 3 standard deviations)
        if threshold is None:
            normal_recon, *_ = model(normal_data)  # works for Autoencoder and VAE outputs
            normal_errors = torch.mean((normal_data - normal_recon) ** 2, dim=1)
            threshold = torch.mean(normal_errors) + 3 * torch.std(normal_errors)
        
        # Flag test samples whose reconstruction error exceeds the threshold
        test_recon, *_ = model(test_data)
        test_errors = torch.mean((test_data - test_recon) ** 2, dim=1)
        anomalies = test_errors > threshold
    
    return anomalies, test_errors
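
To make the "mean plus three standard deviations" rule concrete, here is a small worked example that applies it to reconstruction errors directly, without a model. The error values are made up for illustration: normal samples reconstruct with an error around 0.01, and one test sample reconstructs much worse.

```python
import torch

torch.manual_seed(0)

# Made-up reconstruction errors on 1000 normal samples: about 0.01 with small spread
normal_errors = 0.01 + 0.002 * torch.randn(1000)

# Threshold: mean + 3 standard deviations of the normal errors (roughly 0.016 here)
threshold = normal_errors.mean() + 3 * normal_errors.std()

# Two test samples that reconstruct about as well as normal data, and one that does not
test_errors = torch.tensor([0.009, 0.011, 0.05])
anomalies = test_errors > threshold

print(anomalies)  # tensor([False, False,  True])
```

The factor of 3 is a convention, not a requirement. A smaller factor flags more samples and a larger one flags fewer, so in practice it is worth tuning on held-out data.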

4. Dimensionality Reduction

Compress high-dimensional data into the low-dimensional latent space:

def encode_data(model, data):
    model.eval()
    
    with torch.no_grad():
        mu, logvar = model.encode(data)
    
    return mu  # use the posterior mean as the reduced representation

VAE Variants

1. β-VAE

Introduces a hyperparameter β that weights the KL divergence term, encouraging more disentangled latent representations.
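
In code, the change relative to a plain VAE loss is a single multiplication. The sketch below is a minimal version under stated assumptions: the name `beta_vae_loss` and the default `beta=4.0` are illustrative choices, and the reconstruction error is summed so that both terms are on the same scale.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_reconstructed, mu, logvar, beta=4.0):
    """Reconstruction error plus a beta-weighted KL term (beta=1 gives the plain VAE loss)."""
    recon_loss = F.mse_loss(x_reconstructed, x, reduction='sum')
    kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl_loss

# Toy tensors standing in for a batch and the corresponding model outputs
torch.manual_seed(0)
x = torch.rand(8, 784)
x_reconstructed = torch.rand(8, 784)
mu, logvar = torch.randn(8, 32), torch.randn(8, 32)

loss_vae = beta_vae_loss(x, x_reconstructed, mu, logvar, beta=1.0)
loss_beta = beta_vae_loss(x, x_reconstructed, mu, logvar, beta=4.0)
print(loss_vae.item(), loss_beta.item())
```

Values of β above 1 push the latent distribution harder toward the prior, usually at some cost in reconstruction quality.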

2. Conditional VAE (CVAE)

Adds conditioning information to both the encoder and the decoder, enabling controllable generation.
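
One common way to add the condition is to concatenate a one-hot class label to the encoder input and to the latent code before decoding. The following is a minimal sketch of that approach, not a reference implementation. The class name `ConditionalVAE`, the layer sizes, and the choice of class labels as the condition are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalVAE(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim, latent_dim):
        super().__init__()
        self.num_classes = num_classes
        # Encoder sees the input concatenated with the one-hot condition
        self.encoder = nn.Sequential(
            nn.Linear(input_dim + num_classes, hidden_dim), nn.ReLU()
        )
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder sees the latent code concatenated with the same condition
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + num_classes, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid(),
        )

    def decode(self, z, y):
        c = F.one_hot(y, self.num_classes).float()
        return self.decoder(torch.cat([z, c], dim=1))

    def forward(self, x, y):
        c = F.one_hot(y, self.num_classes).float()
        h = self.encoder(torch.cat([x, c], dim=1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decode(z, y), mu, logvar

model = ConditionalVAE(input_dim=784, num_classes=10, hidden_dim=256, latent_dim=32)
x = torch.rand(4, 784)
y = torch.tensor([0, 3, 3, 7])
x_reconstructed, mu, logvar = model(x, y)

# Controlled generation: sample z from the prior and choose which class to generate
with torch.no_grad():
    samples = model.decode(torch.randn(5, 32), torch.full((5,), 3, dtype=torch.long))
print(x_reconstructed.shape, samples.shape)
```

At generation time the label is supplied by the caller, which is what makes the output controllable: the same `z` decoded with a different `y` should produce a sample of a different class once the model is trained.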

3. VQ-VAE

Uses a discrete latent representation, combining an autoencoder with vector quantization.
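
The quantization step itself can be sketched in a few lines: each encoder output is replaced by its nearest entry in a learned codebook, and a straight-through estimator passes gradients around the non-differentiable lookup. The codebook size and dimensions below are hypothetical, and the codebook and commitment losses used in full VQ-VAE training are omitted.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes: a codebook of 16 entries, each an 8-dim embedding
codebook = torch.randn(16, 8, requires_grad=True)
z_e = torch.randn(5, 8, requires_grad=True)    # continuous encoder outputs

# Nearest-neighbour lookup: index of the closest codebook entry for each vector
distances = torch.cdist(z_e, codebook)         # pairwise distances, shape (5, 16)
indices = distances.argmin(dim=1)              # discrete latent codes, shape (5,)
z_q = codebook[indices]                        # quantized vectors, shape (5, 8)

# Straight-through estimator: the forward value equals z_q,
# but the backward pass copies gradients straight to z_e
z_q_st = z_e + (z_q - z_e).detach()

z_q_st.sum().backward()
print(indices)
print(z_e.grad.shape)
```

In this fragment the codebook receives no gradient at all, because it only appears inside the detached term. That is why VQ-VAE training needs a separate loss (or a moving-average update) to move the codebook entries.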

4. Hierarchical VAE

Uses multiple layers of latent variables to model more complex distributions.
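
As a rough illustration of what "multiple layers of latent variables" means, the sketch below shows only the generative (top-down) path of a two-level model: a small top-level code z2 is sampled from the prior, it parameterizes the distribution of a larger code z1, and z1 is decoded into data. The layer sizes and module names are assumptions, the networks are untrained, and the inference network needed for training is left out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
z2_dim, z1_dim, x_dim = 4, 16, 784   # illustrative sizes

# p(z1 | z2): a network that outputs the mean and log-variance of z1 given z2
prior_z1 = nn.Linear(z2_dim, 2 * z1_dim)
# p(x | z1): decoder from the lower-level code to data space
decoder = nn.Sequential(nn.Linear(z1_dim, x_dim), nn.Sigmoid())

with torch.no_grad():
    z2 = torch.randn(6, z2_dim)                                   # top level: z2 ~ N(0, I)
    mu1, logvar1 = prior_z1(z2).chunk(2, dim=1)
    z1 = mu1 + torch.randn_like(mu1) * torch.exp(0.5 * logvar1)   # z1 ~ p(z1 | z2)
    x = decoder(z1)                                               # decoded sample

print(z2.shape, z1.shape, x.shape)
```

Because the prior over z1 is itself learned and depends on z2, the overall prior over latents is no longer a single fixed Gaussian, which is where the extra modelling capacity comes from.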

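Summary

Autoencoders and variational autoencoders are important tools for unsupervised learning. By adding a probabilistic formulation, the VAE makes the latent space continuous and possible to sample from, which gives a solid framework for tasks such as image generation and anomaly detection. Understanding how VAEs work is also useful background for studying other generative models, such as GANs and diffusion models.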