Local Inference, Single-Machine Operation: Deploying a "Local" ChatGPT on an Apple M1 Mac with the C++ Port of the LLaMA Large Language Model
Source: cnblogs | Author: 刘悦的技术博客 | Date: 2023/3/24 9:04:59

OpenAI's GPT-based ChatGPT has been enjoying unrivaled fame, rising fast and feasting in the spotlight, and Facebook finally could not sit still: it released LLaMA, its own LLM-based large language model, offered in four parameter scales of 7 billion, 13 billion, 33 billion, and 65 billion. Parameters are the adjustable variables in a neural network, such as weights and biases, that are tuned during training to optimize the network's performance; "7 billion" means the network contains 7 billion such variables, and so on for the larger models.

In some large neural networks, each parameter is stored as a 32-bit or 64-bit floating-point number, occupying 4 or 8 bytes. A network with 7 billion parameters therefore needs roughly 28 GB or 56 GB just to hold its weights.

A network's footprint also depends on more than the parameter count: the number of neurons, the number of layers, and other structural details all matter, so a 7-billion-parameter network may occupy even more space in practice, depending on its architecture and implementation.
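As a quick sanity check on these figures, a few lines of Python reproduce the arithmetic (the parameter counts are the four sizes Facebook quotes; the byte widths are the standard floating-point sizes):

  # Rough storage needed for raw model weights at different precisions.
  PARAM_COUNTS = {"7B": 7e9, "13B": 13e9, "33B": 33e9, "65B": 65e9}
  BYTES_PER_PARAM = {"float64": 8, "float32": 4, "float16": 2}

  for name, n_params in PARAM_COUNTS.items():
      row = ", ".join(
          f"{dtype}: {n_params * width / 1e9:.0f} GB"
          for dtype, width in BYTES_PER_PARAM.items()
      )
      print(f"{name}  {row}")
  # 7B  float64: 56 GB, float32: 28 GB, float16: 14 GB

The 14 GB float16 figure is also why the 7B download below comes in at about 13.5 GB.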

In short, a model of this size is a real workout for a single machine, so this test uses the smallest variant, LLaMA 7B.

LLaMA Project Setup and Model Configuration

Much like the Stable-Diffusion project, Facebook's open-source LLaMA project hard-codes CUDA by default, which means an NVIDIA GPU is required to train or run it. Fortunately, Georgi Gerganov rewrote LLaMA in C++ as llama.cpp, a port that runs on the CPU.

llama.cpp was adapted first for Apple's M-series chips, which is excellent news for Mac owners. Start by cloning the C++ port of LLaMA:

  git clone https://github.com/ggerganov/llama.cpp

Then change into the project directory:

  cd llama.cpp

Inside the project, create a separate folder for model files, named models:

  mkdir models

Next, download the LLaMA 7B model files from Hugging Face: https://huggingface.co/nyanko7/LLaMA-7B/tree/main

Yes, the main model file alone is a hefty 13.5 GB; if local disk space is running low, download with caution.
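If you prefer to script the download, the huggingface_hub package can pull the whole repository. A minimal sketch, assuming the package is installed (pip install huggingface_hub) and that the nyanko7/LLaMA-7B repo from the link above is still available:

  from huggingface_hub import snapshot_download

  # Downloads every file in the repo into the local Hugging Face cache
  # and returns the snapshot path; copy the files into models/ afterwards.
  path = snapshot_download(repo_id="nyanko7/LLaMA-7B")
  print(path)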

Then create a 7B model subdirectory inside models:

  mkdir 7B

Place tokenizer.model and tokenizer_checklist.chk in the models directory itself, alongside the 7B folder:

  ➜  models git:(master) ✗ ls
  7B tokenizer.model tokenizer_checklist.chk

Then place checklist.chk, consolidated.00.pth, and params.json inside the 7B directory:

  ➜  7B git:(master) ✗ ls
  checklist.chk consolidated.00.pth params.json

With that, the model files are all in place.
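Since a misplaced file only surfaces later as a failed conversion, it can be worth verifying the layout now. A small helper of my own (not part of llama.cpp); the file list comes straight from the steps above:

  from pathlib import Path

  MODELS = Path("models")
  # Expected layout: tokenizer files beside 7B, weight files inside it.
  EXPECTED = [
      MODELS / "tokenizer.model",
      MODELS / "tokenizer_checklist.chk",
      MODELS / "7B" / "checklist.chk",
      MODELS / "7B" / "consolidated.00.pth",
      MODELS / "7B" / "params.json",
  ]

  missing = [p for p in EXPECTED if not p.exists()]
  if missing:
      print("Missing files:")
      for p in missing:
          print(" ", p)
  else:
      print("Model layout looks complete.")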

LLaMA Model Conversion

Because we are not using Facebook's original project, its model still has to be converted, that is, turned into a format the current C++ port of LLaMA can run.

The conversion is performed with a Python script:

  python3 convert-pth-to-ggml.py models/7B/ 1

The first argument is the directory containing the model; the second is the floating-point type to use for the conversion. A value of 0 selects float32, which roughly doubles the size of the output file; a value of 1 selects float16, the default, which is what we use here. At float16, 7 billion parameters at 2 bytes each come to roughly 13 GB, which matches the size of the file produced below.

The script prints:

  ➜  llama.cpp git:(master) ✗ python convert-pth-to-ggml.py models/7B/ 1
  {'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': -1}
  n_parts = 1
  Processing part 0
  Processing variable: tok_embeddings.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
  Processing variable: norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
  Processing variable: output.weight with shape: torch.Size([32000, 4096]) and type: torch.float16
  Processing variable: layers.0.attention.wq.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
  Processing variable: layers.0.attention.wk.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
  Processing variable: layers.0.attention.wv.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
  Processing variable: layers.0.attention.wo.weight with shape: torch.Size([4096, 4096]) and type: torch.float16
  Processing variable: layers.0.feed_forward.w1.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
  Processing variable: layers.0.feed_forward.w2.weight with shape: torch.Size([4096, 11008]) and type: torch.float16
  Processing variable: layers.0.feed_forward.w3.weight with shape: torch.Size([11008, 4096]) and type: torch.float16
  Processing variable: layers.0.attention_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
  Processing variable: layers.0.ffn_norm.weight with shape: torch.Size([4096]) and type: torch.float16
  Converting to float32
  [... the same eleven lines repeat for layers.1 through layers.31 ...]
  Done. Output file: models/7B//ggml-model-f16.bin, (part 0)

As the output shows, a successful conversion produces ggml-model-f16.bin in the models/7B/ directory, a model file the C++ code can load.
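To double-check the converted file before compiling anything, you can peek at its header. The sketch below assumes the layout this generation of convert-pth-to-ggml.py writes, a 0x67676d6c ("ggml") magic followed by seven 32-bit hyperparameters; treat the exact field order as an assumption and consult the script if your version differs:

  import struct

  # Assumed header: int32 magic, then n_vocab, n_embd, n_mult,
  # n_head, n_layer, n_rot, f16 -- all little-endian int32.
  with open("models/7B/ggml-model-f16.bin", "rb") as f:
      (magic,) = struct.unpack("<i", f.read(4))
      assert magic == 0x67676D6C, "not a ggml file?"
      fields = struct.unpack("<7i", f.read(28))

  names = ["n_vocab", "n_embd", "n_mult", "n_head", "n_layer", "n_rot", "f16"]
  for name, value in zip(names, fields):
      print(f"{name} = {value}")
  # Expected for 7B: n_vocab=32000, n_embd=4096, n_layer=32, f16=1

The values should match the llama_model_load lines printed when the model is loaded below.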

Running the LLaMA Model

Now the converted model can be put to work. First, compile the C++ project:

  make

The build prints:

  ➜  llama.cpp git:(master) ✗ make
  I llama.cpp build info:
  I UNAME_S: Darwin
  I UNAME_P: arm
  I UNAME_M: arm64
  I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
  I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread
  I LDFLAGS: -framework Accelerate
  I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
  I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202)
  cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
  c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread -c utils.cpp -o utils.o
  c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread main.cpp ggml.o utils.o -o main -framework Accelerate
  ./main -h
  usage: ./main [options]
  options:
    -h, --help            show this help message and exit
    -i, --interactive     run in interactive mode
    -ins, --instruct      run in instruction mode (use with Alpaca models)
    -r PROMPT, --reverse-prompt PROMPT
                          in interactive mode, poll user input upon seeing PROMPT (can be
                          specified more than once for multiple prompts).
    --color               colorise output to distinguish prompt and user input from generations
    -s SEED, --seed SEED  RNG seed (default: -1)
    -t N, --threads N     number of threads to use during computation (default: 4)
    -p PROMPT, --prompt PROMPT
                          prompt to start generation with (default: empty)
    --random-prompt       start with a randomized prompt.
    -f FNAME, --file FNAME
                          prompt file to start generation.
    -n N, --n_predict N   number of tokens to predict (default: 128)
    --top_k N             top-k sampling (default: 40)
    --top_p N             top-p sampling (default: 0.9)
    --repeat_last_n N     last n tokens to consider for penalize (default: 64)
    --repeat_penalty N    penalize repeat sequence of tokens (default: 1.3)
    -c N, --ctx_size N    size of the prompt context (default: 512)
    --ignore-eos          ignore end of stream token and continue generating
    --memory_f16          use f16 instead of f32 for memory key+value
    --temp N              temperature (default: 0.8)
    -b N, --batch_size N  batch size for prompt processing (default: 8)
    -m FNAME, --model FNAME
                          model path (default: models/llama-7B/ggml-model.bin)
  c++ -I. -I./examples -O3 -DNDEBUG -std=c++17 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize -framework Accelerate

After a successful build, a main executable appears in the project root.

Then invoke the model directly, following the usage text printed at the end of the build:

  ./main -m ./models/7B/ggml-model-f16.bin -p 'hi i am'

Program output:

  ➜  llama.cpp git:(master) ✗ ./main -m ./models/7B/ggml-model-f16.bin -p 'hi i am'
  main: seed = 1679400707
  llama_model_load: loading model from './models/7B/ggml-model-f16.bin' - please wait ...
  llama_model_load: n_vocab = 32000
  llama_model_load: n_ctx = 512
  llama_model_load: n_embd = 4096
  llama_model_load: n_mult = 256
  llama_model_load: n_head = 32
  llama_model_load: n_layer = 32
  llama_model_load: n_rot = 128
  llama_model_load: f16 = 1
  llama_model_load: n_ff = 11008
  llama_model_load: n_parts = 1
  llama_model_load: ggml ctx size = 13365.09 MB
  llama_model_load: memory_size = 512.00 MB, n_mem = 16384
  llama_model_load: loading model part 1/1 from './models/7B/ggml-model-f16.bin'
  llama_model_load: .................................... done
  llama_model_load: model size = 12853.02 MB / num tensors = 291
  system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
  main: prompt: ' hi i am'
  main: number of tokens in prompt = 6
  1 -> ''
  13450 -> ' hi'
  423 -> 'i'
  25523 -> ' am'
  sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
  hi i am a pythoner, but sunk to become a ruby

Frankly, the inference speed leaves a lot to be desired, though that may simply be my underpowered machine.
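Still, the usage text from the build is enough to wrap ./main in a crude chat loop. A minimal sketch using only flags shown in the help output (-m, -p, -n, -t); the loop, paths, and helper function are my own scaffolding, not part of llama.cpp:

  import subprocess

  MODEL = "./models/7B/ggml-model-f16.bin"  # converted file from above

  def complete(prompt: str, n_predict: int = 128, threads: int = 4) -> str:
      """Run a single completion through the compiled ./main binary."""
      result = subprocess.run(
          ["./main", "-m", MODEL, "-p", prompt,
           "-n", str(n_predict), "-t", str(threads)],
          capture_output=True, text=True,
      )
      # Depending on the build, loading banners may share stdout with
      # the generated text; this sketch does not separate them.
      return result.stdout

  while True:
      user = input("you> ").strip()
      if not user:
          break
      # LLaMA 7B is a plain completion model: it continues the prompt
      # rather than answering it the way a chat-tuned model would.
      print(complete(user))

For a proper back-and-forth session, the -i / --interactive flag listed in the help output is the more direct route.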

Conclusion

On the whole, the LLaMA 7B model wants prompts written in plain English; its understanding of Chinese is still lacking. Its strength is that it genuinely runs on a single machine, and since everything happens locally, the network round-trip disappears and responses carry that much less overhead. For the average AI enthusiast, that is plenty.

Original article: https://www.cnblogs.com/v3ucn/p/17250139.html
