diff --git a/source/_posts/Docker-Compose.md b/source/_posts/Docker-Compose.md
index 3384741..9f727d5 100644
--- a/source/_posts/Docker-Compose.md
+++ b/source/_posts/Docker-Compose.md
@@ -58,6 +58,28 @@ sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
 docker-compose version
 ```
 
+# 命令
+
+### 运行
+
+```shell
+# 默认以所在目录名，为Name -d 为后台运行
+docker compose up -d
+# 指定Name运行
+docker compose --project-name dify-docker up -d
+```
+
+## 停止并清理容器
+
+默认情况下不会清理挂载卷，除非额外指定 -v
+
+```shell
+# 进入到之前启动容器的所在目录
+docker compose down
+# 指定Name
+docker compose --project-name dify-docker down
+```
+
 
 
 # 集群搭建
diff --git a/source/_posts/Docker.md b/source/_posts/Docker.md
index 91feb9a..2522fef 100644
--- a/source/_posts/Docker.md
+++ b/source/_posts/Docker.md
@@ -23,6 +23,8 @@ CentOS占CPU               Docker CPU引擎占用低
 
 # Docker 安装
 
+## CentOS 7
+
 ```shell
 curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
 # 另一种方式 
@@ -33,8 +35,11 @@ yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/d
 yum list docker-ce --showduplicates | sort -r
 # 安装
 yum install docker-ce-18.03.1.ce
+```
 
-# almalinux centos8
+## AlmaLinux CentOS 8
+
+```shell
 dnf clean all
 dnf update
 # 添加必要的Docker存储库
@@ -47,6 +52,86 @@ dnf install docker-ce-3:24.0.7-1.el9 -y
 vim /etc/docker/daemon.json 
 ```
 
+## Ubuntu
+
+```shell
+# 安装前先卸载操作系统默认安装的docker，
+sudo apt-get remove docker docker-engine docker.io containerd runc
+
+# 安装必要支持
+sudo apt install apt-transport-https ca-certificates curl software-properties-common gnupg lsb-release
+
+# 阿里源（推荐使用阿里的gpg KEY）
+curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
+
+# 添加 apt 源:
+# 阿里apt源
+echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+
+# 更新源
+sudo apt update
+sudo apt-get update
+
+# 安装最新版本的Docker
+sudo apt install docker-ce docker-ce-cli containerd.io
+
+# 等待安装完成
+
+# 查看Docker版本
+sudo docker version
+
+# 查看Docker运行状态
+sudo systemctl status docker
+
+# 可选安装Docker 命令补全工具(bash shell)
+sudo apt-get install bash-completion
+
+sudo curl -L https://raw.githubusercontent.com/docker/docker-ce/master/components/cli/contrib/completion/bash/docker -o /etc/bash_completion.d/docker.sh
+
+source /etc/bash_completion.d/docker.sh
+
+```
+
+安装docker后,执行docker ps命令时提示
+permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/json": dial unix /var/run/docker.sock: connect: permission denied
+
+首先查看当前存在的用户组中是否存在 `docker`用户组
+
+```bash
+ cat /etc/group | grep docker
+ 
+docker:x:988:
+```
+
+若不存在，则需要使用以下命令添加`docker`用户组
+
+```shell
+sudo groupadd docker
+```
+
+然后执行以下命令将当前用户加入到`docker`用户组中
+
+```shell
+sudo gpasswd -a $USER docker
+```
+
+更新用户组
+
+```shell
+newgrp docker
+```
+
+也可以直接编辑当前Shell环境变量
+
+```shell
+vim ~/.zshrc
+
+# 末尾添加 groupadd -f docker
+groupadd -f docker
+```
+
+
+
 1、镜像：image。一个镜像代表一个软件。如：redis镜像，mysql镜像，tomcat镜像。。
 	特点：只读
 2、容器：container。一个镜像只要一启动，称之为启动了一个容器。
diff --git a/source/_posts/Git.md b/source/_posts/Git.md
index 143e1eb..3bb199f 100644
--- a/source/_posts/Git.md
+++ b/source/_posts/Git.md
@@ -90,6 +90,14 @@ git config --global credential.helper wincred
 
 ## 储藏 (Stash)
 
+```shell
+git stash
+
+git stash list
+```
+
+
+
 ## SSH提交
 
 1. SSH 秘钥默认储存在账户的主目录下的 ~/.ssh 目录
@@ -144,6 +152,13 @@ git config --global credential.helper wincred
 ssh-keygen -c -C "new_comment" -f ssh_key_path
 ```
 
+## 还原变更
+
+```shell
+# 指定文件或目录
+git checkout -- <filename or directory>
+```
+
 
 
 # github
diff --git a/source/_posts/Python.md b/source/_posts/Python.md
index e84ca54..fd7e759 100644
--- a/source/_posts/Python.md
+++ b/source/_posts/Python.md
@@ -4,5 +4,36 @@ date: 2025-03-10 14:26:30
 tags:
 ---
 
-# Python
+# pip
 
+## 查看版本
+
+```shell
+pip --version
+```
+
+## 使用Pip安装Github上的软件包
+
+接下来，使用以下命令来安装Github上的软件包：
+
+```python
+pip install git+https://github.com/username/repository.git
+```
+
+## 升级和卸载软件包
+
+要升级软件包，可以使用以下命令：
+
+```python
+pip install --upgrade package_name
+```
+
+其中，`package_name`是你要升级的软件包的名称。Pip会自动检查版本并安装最新的软件包。
+
+如果你想卸载已安装的软件包，可以使用以下命令：
+
+```python
+pip uninstall package_name
+```
+
+Pip会询问你是否确定卸载软件包，并删除相关的文件。
diff --git a/source/_posts/lucky.md b/source/_posts/lucky.md
new file mode 100644
index 0000000..2d774ae
--- /dev/null
+++ b/source/_posts/lucky.md
@@ -0,0 +1,5 @@
+---
+title: lucky
+date: 2025-04-07 16:04:58
+tags:
+---
diff --git a/source/_posts/ubuntu.md b/source/_posts/ubuntu.md
new file mode 100644
index 0000000..1328533
--- /dev/null
+++ b/source/_posts/ubuntu.md
@@ -0,0 +1,178 @@
+---
+title: ubuntu
+date: 2025-05-09 09:44:01
+tags:
+---
+
+# Server
+
+## 安装
+
+默认选中「Try or Install Ubuntu Server」安装选项，回车（或等待 30 秒后），等待系统镜像自检并进行安装初始化。
+
+![image-20250509094634889](http://minio.wenyongdalucky.club:9000/hexo/image-20250509094634889.png)
+
+### 选择语言：English
+
+![image-20250509100201646](http://minio.wenyongdalucky.club:9000/hexo/image-20250509100201646.png)
+
+### 键盘默认：English
+
+![image-20250509100212670](http://minio.wenyongdalucky.club:9000/hexo/image-20250509100212670.png)
+
+### 安装类型：Ubuntu Server
+
+选择默认第一个（会自带一些组件，方便使用）
+
+![image-20250509100247973](http://minio.wenyongdalucky.club:9000/hexo/image-20250509100247973.png)
+
+### 网络配置
+
+使用 DHCP 或者 静态IP (建议这里设置好 静态IP，如果选择 DHCP，则在此界面直接选择Done 后即可)
+
+![image-20250509100604701](http://minio.wenyongdalucky.club:9000/hexo/image-20250509100604701.png)
+
+静态IP 选择 Edit IPv4
+
+![image-20250509100656239](http://minio.wenyongdalucky.club:9000/hexo/image-20250509100656239.png)
+
+然后选择 Manual
+
+![image-20250509100804038](http://minio.wenyongdalucky.club:9000/hexo/image-20250509100804038.png)
+
+![image-20250509102609874](http://minio.wenyongdalucky.club:9000/hexo/image-20250509102609874.png)
+
+### 代理配置
+
+**Configure proxy配置页面的Proxy address无需配置**
+
+![image-20250509102734539](http://minio.wenyongdalucky.club:9000/hexo/image-20250509102734539.png)
+
+### 镜像源配置
+
+默认清华源
+
+![image-20250509102858753](http://minio.wenyongdalucky.club:9000/hexo/image-20250509102858753.png)
+
+### 安装磁盘配置
+
+**选择安装磁盘，直接回车默认自动分配，需要手动分区的话选择 [custom storage layout]**
+
+![image-20250509111350269](http://minio.wenyongdalucky.club:9000/hexo/image-20250509111350269.png)
+
+选择 **custom storage layout**
+
+![image-20250509112338500](http://minio.wenyongdalucky.club:9000/hexo/image-20250509112338500.png)
+
+![image-20250509112354306](http://minio.wenyongdalucky.club:9000/hexo/image-20250509112354306.png)
+
+首先分配swap分区：一般基于物理内存的 2-4倍
+
+![image-20250509112453286](http://minio.wenyongdalucky.club:9000/hexo/image-20250509112453286.png)
+
+/boot 分区，一般2G足以
+
+/ 根分区，分配剩余空间
+
+![image-20250509112822681](http://minio.wenyongdalucky.club:9000/hexo/image-20250509112822681.png)
+
+### 设置计算机名及用户名
+
+![image-20250509113002925](http://minio.wenyongdalucky.club:9000/hexo/image-20250509113002925.png)
+
+### 是否升级 Ubuntu Pro
+
+直接默认跳过即可
+![image-20250509121748189](http://minio.wenyongdalucky.club:9000/hexo/image-20250509121748189.png)
+
+### 安装 OpenSSH 服务
+
+![image-20250509121806128](http://minio.wenyongdalucky.club:9000/hexo/image-20250509121806128.png)
+
+### 选择预置环境
+
+按需选取，不需要则直接选择 Done 回车继续
+
+![image-20250509121923077](http://minio.wenyongdalucky.club:9000/hexo/image-20250509121923077.png)
+
+安装系统中
+
+![image-20250509122057921](http://minio.wenyongdalucky.club:9000/hexo/image-20250509122057921.png)
+
+安装完成后重启即可
+
+![image-20250509122413007](http://minio.wenyongdalucky.club:9000/hexo/image-20250509122413007.png)
+
+重启完成，进入系统
+
+![image-20250509123500684](http://minio.wenyongdalucky.club:9000/hexo/image-20250509123500684.png)
+
+## 配置网络
+
+![image-20250509124052044](http://minio.wenyongdalucky.club:9000/hexo/image-20250509124052044.png)
+
+```shell
+cd /etc/netplan
+ls
+# 编辑当前目录下以yaml扩展名的网卡配置文件
+sudo vim 50-cloud-init.yaml
+```
+
+文件内容
+
+```shell
+network:
+  version: 2
+  ethernets:
+    enp0s3:
+      dhcp4: true
+```
+
+在VirtualBox中工具->网络中 增加仅主机(Host-Only)网络
+
+![image-20250509124733922](http://minio.wenyongdalucky.club:9000/hexo/image-20250509124733922.png)
+
+网卡如果要是DHCP就选自动配置网卡，否则手动分配就选手动配置网卡
+
+如果选DHCP，还需要启动服务器
+
+![image-20250509124838460](http://minio.wenyongdalucky.club:9000/hexo/image-20250509124838460.png)
+
+配置好后，在对应虚拟机中，添加好网卡，连接方式选择仅主机(Host-Only)网络，名称选择刚刚在工具中配置的
+
+![image-20250509125003457](http://minio.wenyongdalucky.club:9000/hexo/image-20250509125003457.png)
+
+以上修改需要先重启虚拟机
+
+查看是否生效，需要执行`ip a`命令
+
+看是否有网卡名称为`enp0s8`
+
+紧接着回到刚刚在`/etc/netplan`目录，下编辑的网卡配置文件`50-cloud-init.yaml`
+
+增加`enp0s8`，若是自动分配网络，则直接`dhcp4: true`即可，否则按一下分配`addressed`，手动分配一个根据子网的ipv4地址，并将`dhcp4`设置为`false`
+
+```shell
+network:
+  version: 2
+  ethernets:
+    enp0s3:
+      dhcp4: true
+    enp0s8:
+      addresses: [192.168.56.35/24]
+      dhcp4: no
+
+```
+
+`:wq`保存后，执行一下命令
+
+```shell
+sudo netplan generate
+sudo netplan apply
+```
+
+若不报错，则修改成功，再执行`ip a`查看网卡信息
+
+![image-20250509125542444](http://minio.wenyongdalucky.club:9000/hexo/image-20250509125542444.png)
+
+ip地址已经生效，可以在主机里 ping 一下
diff --git a/source/_posts/大模型.md b/source/_posts/大模型.md
index e28495f..c67cf30 100644
--- a/source/_posts/大模型.md
+++ b/source/_posts/大模型.md
@@ -240,6 +240,48 @@ modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
 modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B README.md --local_dir ./dir
 ```
 
+#### 指定下载单个文件(以'tokenizer.json'文件为例)
+
+```
+    modelscope download --model 'Qwen/Qwen2-7b' tokenizer.json
+```
+
+#### 指定下载多个个文件
+
+```
+    modelscope download --model 'Qwen/Qwen2-7b' tokenizer.json config.json
+```
+
+#### 指定下载某些文件
+
+```
+    modelscope download --model 'Qwen/Qwen2-7b' --include '*.safetensors'
+```
+
+#### 过滤指定文件
+
+```
+    modelscope download --model 'Qwen/Qwen2-7b' --exclude '*.safetensors'
+```
+
+#### 指定下载cache_dir
+
+```
+    modelscope download --model 'Qwen/Qwen2-7b' --include '*.json' --cache_dir './cache_dir'
+```
+
+模型文件将被下载到`'cache_dir/Qwen/Qwen2-7b'`。
+
+#### 指定下载local_dir
+
+```
+    modelscope download --model 'Qwen/Qwen2-7b' --include '*.json' --local_dir './local_dir'
+```
+
+模型文件将被下载到`'./local_dir'`。
+
+如果`cache_dir`和`local_dir`参数同时被指定，`local_dir`优先级高，`cache_dir`将被忽略。
+
 # Anaconda
 
 > 
@@ -391,6 +433,14 @@ jupyter notebook --allow-root
 conda create -n vLLM python=3.12
 conda activate vLLM
 pip install vLLM
+
+# FlashInfer is optional but required for specific functionalities such as sliding window attention with Gemma 2.
+# For CUDA 12.4 & torch 2.4 to support sliding window attention for gemma 2 and llama 3.1 style rope
+pip install flashinfer -i https://flashinfer.ai/whl/cu124/torch2.4
+# For other CUDA & torch versions, please check https://docs.flashinfer.ai/installation.html
+# 也可下载到本地
+wget https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.5/flashinfer_python-0.2.5+cu124torch2.6-cp38-abi3-linux_x86_64.whl#sha256=43d767b912c0c43a04be99595e0123eab9385fc72530a2874b5fb08e3145c0be
+pip install flashinfer_python-0.2.5+cu124torch2.6-cp38-abi3-linux_x86_64.whl --no-deps
 ```
 
 ## 部署
@@ -408,6 +458,28 @@ CUDA_VISIBLE_DEVICES=0 vllm serve /mnt/e/modelscope/deepseek-ai/DeepSeek-R1-Dist
 - **执行启动命令:** 在终端或命令提示符中执行上述 `vllm serve` 命令。
 - **注意 GPU 显存:** 启动 vLLM 服务会占用 GPU 显存。请确保您的 GPU 显存足够运行模型。如果显存不足，可能会导致启动失败或运行缓慢。您可以尝试减小 `--max-model-len` 参数或使用更小规模的模型。
 
+### Qwen/Qwen3-4B
+
+```shell
+vllm serve ~/modelscope/Qwen/Qwen3-4B --api-key token-abc123 --enable-reasoning --reasoning-parser deepseek_r1 --max_model_len=2048 --gpu_memory_utilization=0.85 
+```
+
+#### supervisor
+
+```conf
+sudo vim vLLM-Qwen-Qwen3-4B.conf
+[program:vLLM-Qwen-Qwen3-4B.conf]
+command=zsh -c "source /home/user/miniconda3/bin/activate && source activate vLLM && vllm serve ~/modelscope/Qwen/Qwen3-4B --api-key token-abc123 --enable-reasoning --reasoning-parser deepseek_r1 --max_model_len=2048 --gpu_memory_utilization=0.85"
+user=user
+autostart=true
+autorestart=true
+stderr_logfile=/var/log/supervisor/vLLM-Qwen-Qwen3-4B/err.log
+stdout_logfile=/var/log/supervisor/vLLM-Qwen-Qwen3-4B/out.log
+stopasgroup=true
+```
+
+
+
 # LLama.cpp
 
 > `llama.cpp`是一个基于纯`C/C++`实现的高性能大语言模型推理引擎，专为优化本地及云端部署而设计。其核心目标在于通过底层硬件加速和量化技术，实现在多样化硬件平台上的高效推理，同时保持低资源占用与易用性。
@@ -526,6 +598,12 @@ python convert_hf_to_gguf.py DeepSeek-R1-Distill-Qwen-7B/
 
 然后打开浏览器，输入地址`http://127.0.0.1:8088`就可以在网页上与模型进行交互了，非常方便！
 
+# Ktransformers
+
+安装
+
+
+
 # LLaMA-Factory
 
 > 可参考文章：[DeepSeek-R1-7B-Distill模型微调全过程记录，LLaMA_Factory训练自己的数据集，合并lora微调模型并量化为gguf，接入微信实现自动对话回复_微信_qq_53091149-DeepSeek技术社区](https://deepseek.csdn.net/67b84a893c9cd21f4cb9aab6.html#devmenu2)
@@ -556,3 +634,180 @@ python src/webui.py
 合并
 
 ![image-20250320152645802](https://markdownhexo.oss-cn-hangzhou.aliyuncs.com/img/image-20250320152645802.png)
+
+# AutoAWQ量化
+
+## 安装
+
+```shell
+conda create -n AutoAWQ python=3.12
+conda activate AutoAWQ
+pip install torch
+pip install autoawq
+```
+
+## 脚本
+
+默认AutoAWQ会从Huggingface上下载数据集mit-han-lab/pile-val-backup，会因为网络问题失败
+
+需事先手动下载，通过modelscope
+
+```shell
+modelscope download --dataset mit-han-lab/pile-val-backup  --local_dir ~/modelscope/mit-han-lab/pile-val-backup
+```
+
+以`Qwen/Qwen3-4B`模型为例
+
+```shell
+from awq import AutoAWQForCausalLM
+from transformers import AutoTokenizer
+from datasets import load_dataset
+
+# 加载本地数据集
+def load_calib_data():
+    data=load_dataset('/home/user/modelscope/mit-han-lab/pile-val-backup', split="validation")
+    return [text for text in data["text"] if text.strip() != '' and len(text.split(' ')) > 20]
+
+model_path = '/home/user/modelscope/Qwen/Qwen3-4B'
+quant_path = '/home/user/modelscope/Qwen/Qwen3-4B-awq'
+quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
+
+
+# Load model
+# 加载模型
+
+
+model = AutoAWQForCausalLM.from_pretrained(
+    model_path, **{"low_cpu_mem_usage": True, "use_cache": False}
+)
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+
+
+# Quantize
+# 量化
+
+calib_data=load_calib_data()
+model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
+
+
+# Save quantized model
+# 保存量化模型
+
+
+model.save_quantized(quant_path)
+tokenizer.save_pretrained(quant_path)
+
+
+print(f'Model is quantized and saved at "{quant_path}"')
+```
+
+
+
+# 大模型Dense、MoE 与 Hybrid-MoE 架构的比较
+
+在大模型架构设计中，Dense（全连接）、MoE（混合专家）和Hybrid-MoE（混合式MoE）是三种主流的参数组织方式，它们在模型容量、计算效率和应用场景上存在显著差异。以下从核心原理、技术特点、优缺点及适用场景进行系统对比：
+
+------
+
+## **1. 核心原理对比**
+
+| **架构类型**   | **核心思想**                                                 | **典型模型**               |
+| -------------- | ------------------------------------------------------------ | -------------------------- |
+| **Dense**      | 所有参数对所有输入生效，每层神经元全连接，统一处理所有输入特征。 | GPT-3、BERT、LLAMA         |
+| **MoE**        | 将模型划分为多个“专家”（子网络），每个输入仅激活部分专家，通过路由机制动态分配任务。 | Switch Transformer、GShard |
+| **Hybrid-MoE** | 混合Dense和MoE层：部分层全连接，部分层采用MoE结构，平衡计算效率和模型容量。 | DeepSeek-MoE、Google GLaM  |
+
+------
+
+## **2. 技术特点与性能对比**
+
+| **维度**       | **Dense**                                           | **MoE**                                                      | **Hybrid-MoE**                            |
+| -------------- | --------------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------- |
+| **参数规模**   | 总参数量=激活参数量，随层数线性增长。               | 总参数量高（专家数×专家规模），但激活参数量低（仅激活部分专家）。 | 介于两者之间，MoE层数可控。               |
+| **计算效率**   | 计算成本高（FLOPs与参数量正相关），适合小规模模型。 | 相同参数量下，FLOPs显著降低（仅激活部分专家）。              | 通过调整MoE层比例，灵活平衡计算开销。     |
+| **训练稳定性** | 收敛稳定，梯度传播路径简单。                        | 路由机制易导致专家负载不均衡，需复杂正则化。                 | 稳定性优于纯MoE，但仍需路由优化。         |
+| **扩展性**     | 参数规模受硬件限制，千亿级后成本陡增。              | 可扩展至万亿参数（如GShard-1.6T），适合超大规模模型。        | 通过局部MoE化实现高效扩展，适配中等规模。 |
+| **显存占用**   | 高（需存储全部参数梯度）。                          | 显存需求更高（专家参数独立存储）。                           | 显存介于两者之间，取决于MoE层占比。       |
+| **应用场景**   | 通用任务、资源受限场景。                            | 超大规模预训练、多任务学习。                                 | 需平衡性能与成本的工业级应用。            |
+
+------
+
+## **3. 优缺点对比**
+
+### **Dense架构**
+
+- **优点**：
+  - 结构简单，训练稳定性高。
+  - 参数利用率最大化，适合小规模高精度任务。
+- **缺点**：
+  - 计算成本随参数量指数级增长，难以扩展至超大规模。
+  - 显存占用高，限制单卡可训练模型规模。
+
+### **MoE架构**
+
+- **优点**：
+  - 计算效率高，相同FLOPs下模型容量更大。
+  - 支持万亿级参数扩展，适合分布式训练。
+- **缺点**：
+  - 路由机制复杂，易出现专家“坍缩”（部分专家未被激活）。
+  - 显存和通信开销大，需定制化负载均衡策略。
+
+### **Hybrid-MoE架构**
+
+- **优点**：
+  - 灵活性高，可通过调整MoE层位置平衡性能与成本。
+  - 保留关键层的全连接特性，提升任务特定性能。
+- **缺点**：
+  - 需精心设计MoE层分布，调参成本较高。
+  - 仍面临部分MoE的稳定性挑战。
+
+------
+
+## **4. 典型应用场景**
+
+| **架构**       | **适用场景**                                                 |
+| -------------- | ------------------------------------------------------------ |
+| **Dense**      | - 中小规模模型（<100B参数） - 对训练稳定性要求高的任务（如对话生成） - 边缘设备推理 |
+| **MoE**        | - 超大规模预训练（>500B参数） - 多任务/多模态学习 - 云端高性能计算集群 |
+| **Hybrid-MoE** | - 中等规模模型（100B-500B参数） - 需兼顾通用性与效率的工业场景 - 长文本处理任务 |
+
+------
+
+## **5. 技术选型建议**
+
+- **选择Dense的条件**：
+  - 资源有限（单卡训练/推理）。
+  - 任务单一，无需极高模型容量。
+  - 追求极简架构和稳定收敛。
+- **选择MoE的条件**：
+  - 追求极致模型性能（如AGI探索）。
+  - 拥有大规模计算集群（千卡级）。
+  - 多任务/多模态需求显著。
+- **选择Hybrid-MoE的条件**：
+  - 需平衡模型容量与计算成本。
+  - 部分任务依赖全连接层的强表征能力（如逻辑推理）。
+  - 希望渐进式扩展模型规模。
+
+------
+
+## **6. 未来发展方向**
+
+1. **Dense架构优化**：
+   - 参数高效微调（LoRA、Adapter）。
+   - 动态稀疏激活（如微软的DeepSpeed-MoE）。
+2. **MoE架构改进**：
+   - 更智能的路由机制（如基于强化学习）。
+   - 专家共享与分层MoE设计。
+3. **Hybrid-MoE创新**：
+   - 自动化MoE层分布搜索（NAS技术）。
+   - 异构专家设计（不同专家结构适配不同任务）。
+
+------
+
+## **总结**
+
+- **Dense**：简单可靠，适合资源受限场景，但扩展性差。
+- **MoE**：计算高效，扩展性强，但工程复杂度高。
+- **Hybrid-MoE**：折中方案，平衡性能与成本，需精细调优。
+
+实际选型需结合**任务需求**、**硬件资源**和**工程能力**综合评估。对于大多数企业级应用，Hybrid-MoE可能是当前的最优解，而科研前沿更倾向于探索纯MoE的极限能力。
diff --git a/source/_posts/大模型.txt b/source/_posts/大模型.txt
new file mode 100644
index 0000000..6204741
--- /dev/null
+++ b/source/_posts/大模型.txt
@@ -0,0 +1,811 @@
+----
+
+h2. title: 大模型
+date: 2025-02-18 10:06:57
+tags:
+
+h1. Ollama
+
+h2. 1. 安装
+
+首先需要下载并安装Ollama，这是运行模型的基础环境。
+
+!https://markdownhexo.oss-cn-hangzhou.aliyuncs.com/img/image-20250218102658870.png!
+
+h2. 2. 下载模型
+
+打开命令行终端，根据需要运行以下命令之一来下载对应版本的模型：
+
+!https://markdownhexo.oss-cn-hangzhou.aliyuncs.com/img/image-20250218104847668.png!
+
+以DeepSeek为例：
+
+7B 版本（推荐显存 8G）:
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+ollama pull deepseek-coder:7b
+{code}
+
+8B 版本（推荐显存 8G）:
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+ollama run huihui_ai/deepseek-r1-abliterated:8b
+{code}
+
+14B 版本（推荐显存 12G）:
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+ollama run huihui_ai/deepseek-r1-abliterated:14b
+{code}
+
+32B 版本（推荐显存 32G）:
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+ollama run huihui_ai/deepseek-r1-abliterated:32b
+{code}
+
+70B 版本（需要高端显卡支持）:
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+ollama run huihui_ai/deepseek-r1-abliterated:70b
+{code}
+
+h2. 3. Ollama 常用命令
+
+在使用 Ollama 时，以下是一些常用的命令操作：
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+# 启动 Ollama 服务
+ollama serve
+
+# 从 Modelfile 创建模型
+ollama create <模型名称>
+
+# 显示模型信息
+ollama show <模型名称>
+
+# 运行模型
+ollama run <模型名称>
+
+# 停止运行中的模型
+ollama stop <模型名称>
+
+# 从仓库拉取模型
+ollama pull <模型名称>
+
+# 推送模型到仓库
+ollama push <模型名称>
+
+# 列出所有已安装的模型
+ollama list
+
+# 列出正在运行的模型
+ollama ps
+
+# 复制模型
+ollama cp <源模型> <目标模型>
+
+# 删除模型
+ollama rm <模型名称>
+
+# 显示模型文件
+ollama show --modelfile <模型名称>
+{code}
+
+h2. 4. Ollama模型存储目录
+
+* macOS: {{&#126;&#47;&#46;ollama&#47;models}}
+* Linux: {{&#47;usr&#47;share&#47;ollama&#47;&#46;ollama&#47;models}}
+* Windows: {{C&#58;&#92;Users&#92;&#37;username&#37;&#92;&#46;ollama&#92;models}}
+
+h3. 如何将它们设置为不同的位置？
+
+如果需要使用不同的目录，可以将环境变量 {{OLLAMA&#95;MODELS}} 设置为你选择的目录。
+
+{quote}
+注意：在 Linux 上使用标准安装程序时，{{ollama}} 用户需要对指定目录有读写权限。要将目录分配给 {{ollama}} 用户，请运行 {{sudo chown &#45;R ollama&#58;ollama &lt;directory&gt;}}.
+{quote}
+
+请参考[上面的部分|https://ollama.readthedocs.io/faq/#how-do-i-configure-ollama-server]了解如何在你的平台上设置环境变量。
+
+!https://markdownhexo.oss-cn-hangzhou.aliyuncs.com/img/image-20250218132430850.png!
+
+h2. 5. WSL中Ollama使用Windows中的
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+# 编辑环境变量
+vim /etc/profile
+
+# 文件末尾添加
+export PATH="$PATH:/mnt/c/Program Files/Ollama"
+alias ollama='ollama.exe'
+{code}
+
+h1. nvidia
+
+h2. cuda-toolkit
+
+h3. WSL安装
+
+# 下载 *CUDA Toolkit Installer*
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda-repo-rhel9-12-8-local-12.8.1_570.124.06-1.x86_64.rpm
+sudo rpm -i cuda-repo-rhel9-12-8-local-12.8.1_570.124.06-1.x86_64.rpm
+sudo dnf clean all
+sudo dnf -y install cuda-toolkit-12-8
+{code}
+# 配置环境变量
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+vim ~/.bashrc
+
+## 文件末尾添加
+## cuda 10.2
+export CUDA_HOME=/usr/local/cuda
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
+export PATH=$PATH:$CUDA_HOME/bin
+{code}
+
+h2. nvidia-smi
+
+{quote}
+nvidia-smi是nvidia 的系统管理界面 ，其中smi是System management interface的缩写，它可以收集各种级别的信息，查看显存使用情况。此外, 可以启用和禁用 GPU 配置选项 (如 ECC 内存功能)。
+{quote}
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+nvidia-smi
+
++-----------------------------------------------------------------------------------------+
+| NVIDIA-SMI 570.86.09              Driver Version: 571.96         CUDA Version: 12.8     |
+|-----------------------------------------+------------------------+----------------------+
+| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
+| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
+|                                         |                        |               MIG M. |
+|=========================================+========================+======================|
+|   0  NVIDIA RTX 4000 Ada Gene...    On  |   00000000:01:00.0 Off |                  Off |
+| N/A   50C    P8              7W /   85W |    4970MiB /  12282MiB |      0%      Default |
+|                                         |                        |                  N/A |
++-----------------------------------------+------------------------+----------------------+
+
++-----------------------------------------------------------------------------------------+
+| Processes:                                                                              |
+|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
+|        ID   ID                                                               Usage      |
+|=========================================================================================|
+|    0   N/A  N/A           16221      C   /python3.12                           N/A      |
++-----------------------------------------------------------------------------------------+
+
+{code}
+
+解释相关参数含义：
+
+GPU：本机中的GPU编号
+
+Name：GPU 类型
+
+Persistence-M：
+
+Fan：风扇转速
+
+Temp：温度，单位摄氏度
+
+Perf：表征性能状态，从P0到P12，P0表示最大性能，P12表示状态最小性能
+
+Pwr:Usage/Cap：能耗表示
+
+Bus-Id：涉及GPU总线的相关信息；
+
+Disp.A：Display Active，表示GPU的显示是否初始化
+
+Memory-Usage：显存使用率
+
+Volatile GPU-Util：浮动的GPU利用率
+
+Uncorr. ECC：关于ECC的东西
+
+Compute M.：计算模式
+
+Processes 显示每块GPU上每个进程所使用的显存情况。
+
+h3. 持续监控
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+# 使用 watch 命令，它可以定时执行指定的命令并刷新输出。例如，每隔 1 秒刷新一次 GPU 状态，可以使用以下命令
+watch -n 1 nvidia-smi
+{code}
+
+h2. nvidia-smi -L
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+#  列出所有可用的 NVIDIA 设备
+nvidia-smi -L
+GPU 0: NVIDIA RTX 4000 Ada Generation Laptop GPU (UUID: GPU-9856f99a-c32c-fe63-b2ad-7bdee2b12291)
+{code}
+
+h1. ModelScope
+
+h2. 模型下载
+
+h3. 安装
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+pip install modelscope
+{code}
+
+h3. 命令行下载
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+# 下载完整模型库
+modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+# 下载单个文件到指定本地文件夹（以下载README.md到当前路径下“dir”目录为例）
+modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B README.md --local_dir ./dir
+{code}
+
+h4. 指定下载单个文件(以&#39;tokenizer.json&#39;文件为例)
+
+{code:title=none|language=none|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+    modelscope download --model 'Qwen/Qwen2-7b' tokenizer.json
+{code}
+
+h4. 指定下载多个个文件
+
+{code:title=none|language=none|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+    modelscope download --model 'Qwen/Qwen2-7b' tokenizer.json config.json
+{code}
+
+h4. 指定下载某些文件
+
+{code:title=none|language=none|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+    modelscope download --model 'Qwen/Qwen2-7b' --include '*.safetensors'
+{code}
+
+h4. 过滤指定文件
+
+{code:title=none|language=none|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+    modelscope download --model 'Qwen/Qwen2-7b' --exclude '*.safetensors'
+{code}
+
+h4. 指定下载cache_dir
+
+{code:title=none|language=none|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+    modelscope download --model 'Qwen/Qwen2-7b' --include '*.json' --cache_dir './cache_dir'
+{code}
+
+模型文件将被下载到{{&#39;cache&#95;dir&#47;Qwen&#47;Qwen2&#45;7b&#39;}}。
+
+h4. 指定下载local_dir
+
+{code:title=none|language=none|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+    modelscope download --model 'Qwen/Qwen2-7b' --include '*.json' --local_dir './local_dir'
+{code}
+
+模型文件将被下载到{{&#39;&#46;&#47;local&#95;dir&#39;}}。
+
+如果{{cache&#95;dir}}和{{local&#95;dir}}参数同时被指定，{{local&#95;dir}}优先级高，{{cache&#95;dir}}将被忽略。
+
+h1. Anaconda
+
+{quote}
+
+{quote}
+
+h2. 常用命令
+
+h3. 管理环境
+
+# 列出所有的环境
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+conda env list
+{code}
+# 查看conda下的包
+
+{code:title=none|language=|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+conda list
+{code}
+# 创建环境
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+conda create -n env-name [list of package]
+{code}
+
+-n env-name 是设置新建环境的名字，list of package 是可选项，选择要为该环境安装的包
+如果我们没有指定安装python的版本，conda会安装我们最初安装conda所装的那个版本的python
+若创建特定python版本的包环境，需键入
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+conda create -n env-name python=3.6
+{code}
+# 激活环境
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+conda activate env-name
+{code}
+
+切换到base环境
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+conda activate base
+{code}
+# 删除环境
+执行以下命令可以将该指定虚拟环境及其中所安装的包都删除。
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+conda remove --name env_name --all
+{code}
+
+如果只删除虚拟环境中的某个或者某些包则是：
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+conda remove --name env_name  package_name
+{code}
+
+h2. 问题
+
+# conda激活[虚拟环境|https://so.csdn.net/so/search?q=虚拟环境&spm=1001.2101.3001.7020]，只显示环境名称，不再显示用户名和当前文件夹
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+##在个人环境下修改
+conda activate gatkenv
+
+conda env config vars set PS1='($CONDA_DEFAULT_ENV)[\u@\h \W]$'
+##重启环境就ok了
+
+conda deactivate
+
+conda activate gatkenv
+
+##在所有的虚拟环境下修改，这个命令的意思是在~/.condarc下添加一行
+
+conda config --set env_prompt "({default_env})[\u@\h \W]$"
+
+##取消设置
+
+conda config --remove-key env_prompt
+{code}
+# conda 安装后没有将conda置为默认
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+Do you wish to update your shell profile to automatically initialize conda?
+This will activate conda on startup and change the command prompt when activated.
+If you'd prefer that conda's base environment not be activated on startup,
+   run the following command when conda is activated:
+
+conda config --set auto_activate_base false
+
+You can undo this by running `conda init --reverse $SHELL`? [yes|no]
+[no] >>>
+
+You have chosen to not have conda modify your shell scripts at all.
+To activate conda's base environment in your current shell session:
+
+eval "$(/home/user/miniconda3/bin/conda shell.YOUR_SHELL_NAME hook)"
+
+To install conda's shell functions for easier access, first activate, then:
+
+conda init
+
+Thank you for installing Miniconda3!
+
+{code}
+
+则执行以下shell
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+## eval "$(/home/user/miniconda3/bin/conda shell.SHELL_NAME hook)"
+## bash 下
+eval "$(/home/user/miniconda3/bin/conda shell.bash hook)"
+conda init
+## zsh 下
+eval "$(/home/user/miniconda3/bin/conda shell.zsh hook)"
+conda init
+{code}
+#
+
+h1. Jupyter Notebook
+
+{quote}
+
+{quote}
+
+h2. 安装
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+pip install jupyter
+{code}
+
+h2. 运行
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+jupyter notebook
+# 若是root用户执行，会出现警告 Running as root is not recommended. Use --allow-root to bypass. 需在后面加上 --allow-root
+jupyter notebook --allow-root
+{code}
+
+h1. UnSloth
+
+h1. vLLM
+
+h2. 安装
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+conda create -n vLLM python=3.12
+conda activate vLLM
+pip install vLLM
+
+# FlashInfer is optional but required for specific functionalities such as sliding window attention with Gemma 2.
+# For CUDA 12.4 & torch 2.4 to support sliding window attention for gemma 2 and llama 3.1 style rope
+pip install flashinfer -i https://flashinfer.ai/whl/cu124/torch2.4
+# For other CUDA & torch versions, please check https://docs.flashinfer.ai/installation.html
+# 也可下载到本地
+wget https://github.com/flashinfer-ai/flashinfer/releases/download/v0.2.5/flashinfer_python-0.2.5+cu124torch2.6-cp38-abi3-linux_x86_64.whl#sha256=43d767b912c0c43a04be99595e0123eab9385fc72530a2874b5fb08e3145c0be
+pip install flashinfer_python-0.2.5+cu124torch2.6-cp38-abi3-linux_x86_64.whl --no-deps
+{code}
+
+h2. 部署
+
+h3. deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+CUDA_VISIBLE_DEVICES=0 vllm serve /mnt/e/modelscope/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --port 8102 --max-model-len 2048 --api-key token-abc123
+{code}
+
+* CUDA_VISIBLE_DEVICES=0: 指定使用的 GPU 设备 ID。 0 表示使用第一块 GPU。如果您有多块 GPU，可以根据需要修改为其他 ID (例如 CUDA_VISIBLE_DEVICES=1,2 使用 GPU 1 和 GPU 2)。如果您只有一块 GPU，通常使用 0 即可。
+* {{&#47;mnt&#47;e&#47;modelscope&#47;deepseek&#45;ai&#47;DeepSeek&#45;R1&#45;Distill&#45;Qwen&#45;1&#46;5B}}: *模型路径。* 请替换为您在步骤 2 中模型实际保存的路径。
+* {{&#45;&#45;port 8102}}: *服务端口号。* {{8102}} 是服务启动后监听的端口。您可以根据需要修改端口号，例如 {{&#45;&#45;port 8000}}。在后续代码调用中，需要使用相同的端口号。
+* --max-model-len 16384: 模型最大上下文长度。 16384 表示模型处理的最大输入序列长度。您可以根据您的 GPU 显存大小和需求调整此参数。对于 /mnt/e/modelscope/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B 模型，16384 是一个较大的上下文长度。您可以尝试减小此值以减少显存占用，例如 --max-model-len 2048 或更小。
+* *执行启动命令:* 在终端或命令提示符中执行上述 {{vllm serve}} 命令。
+* *注意 GPU 显存:* 启动 vLLM 服务会占用 GPU 显存。请确保您的 GPU 显存足够运行模型。如果显存不足，可能会导致启动失败或运行缓慢。您可以尝试减小 {{&#45;&#45;max&#45;model&#45;len}} 参数或使用更小规模的模型。
+
+h3. Qwen/Qwen3-4B
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+vllm serve ~/modelscope/Qwen/Qwen3-4B --api-key token-abc123 --enable-reasoning --reasoning-parser deepseek_r1 --max_model_len=2048 --gpu_memory_utilization=0.85 
+{code}
+
+h4. supervisor
+
+{code:title=none|language=|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+sudo vim vLLM-Qwen-Qwen3-4B.conf
+[program:vLLM-Qwen-Qwen3-4B.conf]
+command=zsh -c "source /home/user/miniconda3/bin/activate && source activate vLLM && vllm serve ~/modelscope/Qwen/Qwen3-4B --api-key token-abc123 --enable-reasoning --reasoning-parser deepseek_r1 --max_model_len=2048 --gpu_memory_utilization=0.85"
+user=user
+autostart=true
+autorestart=true
+stderr_logfile=/var/log/supervisor/vLLM-Qwen-Qwen3-4B/err.log
+stdout_logfile=/var/log/supervisor/vLLM-Qwen-Qwen3-4B/out.log
+stopasgroup=true
+{code}
+
+h1. LLama.cpp
+
+{quote}
+{{llama&#46;cpp}}是一个基于纯{{C&#47;C&#43;&#43;}}实现的高性能大语言模型推理引擎，专为优化本地及云端部署而设计。其核心目标在于通过底层硬件加速和量化技术，实现在多样化硬件平台上的高效推理，同时保持低资源占用与易用性。
+{quote}
+
+h2. 编译llama.cpp
+
+首先从{{Github}}上下载{{llama&#46;cpp}}的源码:
+
+{code:title=none|language=|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+git clone https://github.com/ggml-org/llama.cpp
+cd llama.cpp
+{code}
+
+{{llama&#46;cpp}}支持多种硬件平台，可根据实际的硬件配置情况选择合适的编译参数进行编译，具体可以参考文档{{docs&#47;build&#46;md}}。
+
+*安装CMAKE*
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+wget https://github.com/Kitware/CMake/releases/download/v4.0.0-rc4/cmake-4.0.0-rc4-linux-x86_64.sh
+chmod -R 777 cmake-4.0.0-rc4-linux-x86_64.sh
+./cmake-4.0.0-rc4-linux-x86_64.sh
+mv cmake-4.0.0-rc4-linux-x86_64/ /usr/local/cmake
+echo 'export PATH="/usr/local/cmake/bin:$PATH"' >> ~/.bashrc
+source ~/.bashrc
+
+sudo dnf install gcc-toolset-13-gcc gcc-toolset-13-gcc-c++
+source /opt/rh/gcc-toolset-13/enable
+{code}
+
+*编译CPU版本*
+
+{code:title=none|language=|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+cmake -B build
+cmake --build build --config Release -j 8
+{code}
+
+*编译GPU版本*
+
+编译英伟达{{GPU}}版本需要先装好驱动和{{CUDA}}，然后执行下面的命令进行编译
+
+{code:title=none|language=|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_ENABLE_UNIFIED_MEMORY=1
+cmake --build build --config Release -j 8
+{code}
+
+{quote}
+编译完成后，可执行文件和库文件被存放在{{build&#47;bin}}目录下。
+{quote}
+
+h2. 模型下载与转换
+
+首先从魔搭社区下载模型：
+
+{code:title=none|language=|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+pip install modelscope
+modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --local_dir DeepSeek-R1-Distill-Qwen-7B
+{code}
+
+下载好的模型是以{{HuggingFace}}的{{safetensors}}格式存放的，而{{llama&#46;cpp}}使用的是{{GGUF}}格式，因此需要先要把模型转换为{{GGUF}}格式：
+
+{code:title=none|language=|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+# 安装python依赖库
+pip install -r requirements.txt
+# 转换模型
+python convert_hf_to_gguf.py DeepSeek-R1-Distill-Qwen-7B/
+{code}
+
+转换成功后，在该目录下会生成一个{{FP16}}精度、{{GGUF}}格式的模型文件{{DeepSeek&#45;R1&#45;Distill&#45;Qwen&#45;7B&#45;F16&#46;gguf}}。
+
+h2. 模型量化
+
+{{FP16}}精度的模型跑起来可能会有点慢，我们可以对模型进行量化以提升推理速度。
+
+{{llama&#46;cpp}}主要采用了分块量化（{{Block&#45;wise Quantization}}）和{{K&#45;Quantization}}算法来实现模型压缩与加速，其核心策略包括以下关键技术：
+
+# *分块量化（Block-wise Quantization）*
+该方法将权重矩阵划分为固定大小的子块（如{{32}}或{{64}}元素为一组），每个子块独立进行量化。通过为每个子块分配独立的缩放因子（{{Scale}}）和零点（{{Zero Point}}），有效减少量化误差。例如，{{Q4&#95;K&#95;M}}表示每个权重用{{4}}比特存储，且子块内采用动态范围调整。
+# *K-Quantization（混合精度量化）*
+在子块内部进一步划分更小的单元（称为“超块”），根据数值分布动态选择量化参数。例如，{{Q4&#95;K&#95;M}}将超块拆分为多个子单元，每个子单元使用不同位数的缩放因子（如{{6bit}}的缩放因子和{{4bit}}的量化值），通过混合精度平衡精度与压缩率。
+# *重要性矩阵（Imatrix）优化*
+通过分析模型推理过程中各层激活值的重要性，动态调整量化策略。高重要性区域保留更高精度（如{{FP16}}），低重要性区域采用激进量化（如{{Q2&#95;K}}），从而在整体模型性能损失可控的前提下实现高效压缩。
+# *量化类型分级策略*
+提供{{Q2&#95;K}}至{{Q8&#95;K}}等多种量化级别，其中字母后缀（如{{&#95;M}}、{{&#95;S}}）表示优化级别：
+# *Q4_K_M*：中等优化级别，平衡推理速度与精度（常用推荐）。
+# *Q5_K_S*：轻量化级别，侧重减少内存占用
+
+典型场景下，{{Q4&#95;K&#95;M}}相比{{FP16}}模型可减少{{70&#37;}}内存占用，推理速度提升{{2&#45;3}}倍，同时保持{{95&#37;}}以上的原始模型精度。实际部署时需根据硬件资源（如{{GPU}}显存容量）和任务需求（如生成文本长度）选择量化策略。
+
+执行下面的命令可将{{FP16}}精度的模型采用{{Q4&#95;K&#95;M}}的量化策略进行量化：
+
+{code:title=none|language=|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+./build/bin/llama-quantize DeepSeek-R1-Distill-Qwen-7B/DeepSeek-R1-Distill-Qwen-7B-F16.gguf DeepSeek-R1-Distill-Qwen-7B/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf Q4_K_M
+{code}
+
+量化完成后，模型文件由{{15&#46;2G}}减少到{{4&#46;7G}}。
+
+h2. 运行模型
+
+模型量化完后，我们就可以运行模型来试试效果了。{{llama&#46;cpp}}提供了多种运行模型的方式：
+
+h3. 命令行方式
+
+执行下面的命令就可以在命令行与模型进行对话了：
+
+{code:title=none|language=|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+./build/bin/llama-cli -m DeepSeek-R1-Distill-Qwen-7B/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf -cnv
+{code}
+
+h3. HTTP Server方式
+
+由于模型是以{{Markdown}}格式输出内容，因此用命令行的方式看着不太方便。{{llama&#46;cpp}}还提供{{HTTP Server}}的方式运行，交互性要好很多。
+
+首先在终端执行命令
+
+{code:title=none|language=|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+./build/bin/llama-server -m DeepSeek-R1-Distill-Qwen-7B/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf --port 8088
+{code}
+
+然后打开浏览器，输入地址{{http&#58;&#47;&#47;127&#46;0&#46;0&#46;1&#58;8088}}就可以在网页上与模型进行交互了，非常方便！
+
+h1. Ktransformers
+
+安装
+
+h1. LLaMA-Factory
+
+{quote}
+可参考文章：[DeepSeek-R1-7B-Distill模型微调全过程记录，LLaMA_Factory训练自己的数据集，合并lora微调模型并量化为gguf，接入微信实现自动对话回复_微信_qq_53091149-DeepSeek技术社区|https://deepseek.csdn.net/67b84a893c9cd21f4cb9aab6.html#devmenu2]
+{quote}
+
+h2. 安装
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+# 首先 conda创建环境
+conda create -n LLaMA-Factory python=3.12
+# 激活环境
+conda activate LLaMA-Factory
+# 从GitHub上拉去项目代码到当前目录下
+git clone https://github.com/hiyouga/LLaMA-Factory.git
+# 进入目录
+cd LLaMA-Factory
+# 安装所需依赖
+pip install -e ".[torch,metrics]"
+# 启动webui
+python src/webui.py 
+{code}
+
+h2. 微调
+
+!https://markdownhexo.oss-cn-hangzhou.aliyuncs.com/img/image-20250320152454509.png!
+
+!https://markdownhexo.oss-cn-hangzhou.aliyuncs.com/img/image-20250320152533756.png!
+
+合并
+
+!https://markdownhexo.oss-cn-hangzhou.aliyuncs.com/img/image-20250320152645802.png!
+
+h1. AutoAWQ量化
+
+h2. 安装
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+conda create -n AutoAWQ python=3.12
+conda activate AutoAWQ
+pip install torch
+pip install autoawq
+{code}
+
+h2. 脚本
+
+默认AutoAWQ会从Huggingface上下载数据集mit-han-lab/pile-val-backup，会因为网络问题失败
+
+需事先手动下载，通过modelscope
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+modelscope download --dataset mit-han-lab/pile-val-backup  --local_dir ~/modelscope/mit-han-lab/pile-val-backup
+{code}
+
+以{{Qwen&#47;Qwen3&#45;4B}}模型为例
+
+{code:title=none|language=bash|borderStyle=solid|theme=RDark|linenumbers=true|collapse=true}
+from awq import AutoAWQForCausalLM
+from transformers import AutoTokenizer
+from datasets import load_dataset
+
+# 加载本地数据集
+def load_calib_data():
+    data=load_dataset('/home/user/modelscope/mit-han-lab/pile-val-backup', split="validation")
+    return [text for text in data["text"] if text.strip() != '' and len(text.split(' ')) > 20]
+
+model_path = '/home/user/modelscope/Qwen/Qwen3-4B'
+quant_path = '/home/user/modelscope/Qwen/Qwen3-4B-awq'
+quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
+
+
+# Load model
+# 加载模型
+
+
+model = AutoAWQForCausalLM.from_pretrained(
+    model_path, **{"low_cpu_mem_usage": True, "use_cache": False}
+)
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+
+
+# Quantize
+# 量化
+
+calib_data=load_calib_data()
+model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
+
+
+# Save quantized model
+# 保存量化模型
+
+
+model.save_quantized(quant_path)
+tokenizer.save_pretrained(quant_path)
+
+
+print(f'Model is quantized and saved at "{quant_path}"')
+{code}
+
+h1. 大模型Dense、MoE 与 Hybrid-MoE 架构的比较
+
+在大模型架构设计中，Dense（全连接）、MoE（混合专家）和Hybrid-MoE（混合式MoE）是三种主流的参数组织方式，它们在模型容量、计算效率和应用场景上存在显著差异。以下从核心原理、技术特点、优缺点及适用场景进行系统对比：
+
+----
+
+h2. *1. 核心原理对比*
+
+||*架构类型*||*核心思想*||*典型模型*||
+|*Dense*|所有参数对所有输入生效，每层神经元全连接，统一处理所有输入特征。|GPT-3、BERT、LLAMA|
+|*MoE*|将模型划分为多个“专家”（子网络），每个输入仅激活部分专家，通过路由机制动态分配任务。|Switch Transformer、GShard|
+|*Hybrid-MoE*|混合Dense和MoE层：部分层全连接，部分层采用MoE结构，平衡计算效率和模型容量。|DeepSeek-MoE、Google GLaM|
+
+----
+
+h2. *2. 技术特点与性能对比*
+
+||*维度*||*Dense*||*MoE*||*Hybrid-MoE*||
+|*参数规模*|总参数量=激活参数量，随层数线性增长。|总参数量高（专家数×专家规模），但激活参数量低（仅激活部分专家）。|介于两者之间，MoE层数可控。|
+|*计算效率*|计算成本高（FLOPs与参数量正相关），适合小规模模型。|相同参数量下，FLOPs显著降低（仅激活部分专家）。|通过调整MoE层比例，灵活平衡计算开销。|
+|*训练稳定性*|收敛稳定，梯度传播路径简单。|路由机制易导致专家负载不均衡，需复杂正则化。|稳定性优于纯MoE，但仍需路由优化。|
+|*扩展性*|参数规模受硬件限制，千亿级后成本陡增。|可扩展至万亿参数（如GShard-1.6T），适合超大规模模型。|通过局部MoE化实现高效扩展，适配中等规模。|
+|*显存占用*|高（需存储全部参数梯度）。|显存需求更高（专家参数独立存储）。|显存介于两者之间，取决于MoE层占比。|
+|*应用场景*|通用任务、资源受限场景。|超大规模预训练、多任务学习。|需平衡性能与成本的工业级应用。|
+
+----
+
+h2. *3. 优缺点对比*
+
+h3. *Dense架构*
+
+* *优点*：
+** 结构简单，训练稳定性高。
+** 参数利用率最大化，适合小规模高精度任务。
+* *缺点*：
+** 计算成本随参数量指数级增长，难以扩展至超大规模。
+** 显存占用高，限制单卡可训练模型规模。
+
+h3. *MoE架构*
+
+* *优点*：
+** 计算效率高，相同FLOPs下模型容量更大。
+** 支持万亿级参数扩展，适合分布式训练。
+* *缺点*：
+** 路由机制复杂，易出现专家“坍缩”（部分专家未被激活）。
+** 显存和通信开销大，需定制化负载均衡策略。
+
+h3. *Hybrid-MoE架构*
+
+* *优点*：
+** 灵活性高，可通过调整MoE层位置平衡性能与成本。
+** 保留关键层的全连接特性，提升任务特定性能。
+* *缺点*：
+** 需精心设计MoE层分布，调参成本较高。
+** 仍面临部分MoE的稳定性挑战。
+
+----
+
+h2. *4. 典型应用场景*
+
+||*架构*||*适用场景*||
+|*Dense*|- 中小规模模型（&lt;100B参数） - 对训练稳定性要求高的任务（如对话生成） - 边缘设备推理|
+|*MoE*|- 超大规模预训练（&gt;500B参数） - 多任务/多模态学习 - 云端高性能计算集群|
+|*Hybrid-MoE*|- 中等规模模型（100B-500B参数） - 需兼顾通用性与效率的工业场景 - 长文本处理任务|
+
+----
+
+h2. *5. 技术选型建议*
+
+* *选择Dense的条件*：
+** 资源有限（单卡训练/推理）。
+** 任务单一，无需极高模型容量。
+** 追求极简架构和稳定收敛。
+* *选择MoE的条件*：
+** 追求极致模型性能（如AGI探索）。
+** 拥有大规模计算集群（千卡级）。
+** 多任务/多模态需求显著。
+* *选择Hybrid-MoE的条件*：
+** 需平衡模型容量与计算成本。
+** 部分任务依赖全连接层的强表征能力（如逻辑推理）。
+** 希望渐进式扩展模型规模。
+
+----
+
+h2. *6. 未来发展方向*
+
+# *Dense架构优化*：
+#* 参数高效微调（LoRA、Adapter）。
+#* 动态稀疏激活（如微软的DeepSpeed-MoE）。
+# *MoE架构改进*：
+#* 更智能的路由机制（如基于强化学习）。
+#* 专家共享与分层MoE设计。
+# *Hybrid-MoE创新*：
+#* 自动化MoE层分布搜索（NAS技术）。
+#* 异构专家设计（不同专家结构适配不同任务）。
+
+----
+
+h2. *总结*
+
+* *Dense*：简单可靠，适合资源受限场景，但扩展性差。
+* *MoE*：计算高效，扩展性强，但工程复杂度高。
+* *Hybrid-MoE*：折中方案，平衡性能与成本，需精细调优。
+
+实际选型需结合*任务需求*、*硬件资源*和*工程能力*综合评估。对于大多数企业级应用，Hybrid-MoE可能是当前的最优解，而科研前沿更倾向于探索纯MoE的极限能力。
+