r/LocalLLaMA • 81일 전

레몬네이드, 실험적 vLLM ROCm 백엔드 추가

IMP

6/10

핵심 요약

오픈소스 LLM 서버인 레몬네이드(Lemonade)에 AMD GPU 환경을 위한 vLLM ROCm 백엔드가 실험적으로 추가되었습니다. 이번 업데이트로 인해 모델을 GGUF 포맷으로 변환할 필요 없이 .safetensors 형식의 LLM을 직접 구동할 수 있게 되어, AMD 그래픽카드 사용자들의 모델 활용성이 크게 향상되었습니다. 개발진은 핵심 기능은 구현되었으나 일부 불안정한 부분이 존재하며, 향후 개발 방향을 잡기 위해 커뮤니티의 피드백을 적극적으로 요청하고 있습니다.

번역된 본문

vLLM은 .safetensors 형식의 대규모 언어 모델(LLM)을 GGUF로 변환하기 전에 실행할 수 있는 기능을 제공하며, 이는 탐구해 볼 만한 새로운 엔진입니다. 개인적으로는 u/krishna2910-amd/, u/mikkoph, u/sa1sr1 님 덕분에 Lemonade에서 llama.cpp를 실행하는 것만큼이나 쉽게 사용해 볼 수 있을 때까지 한 번도 시도해 본 적이 없었습니다:

lemonade backends install vllm:rocm
lemonade run Qwen3.5-0.8B-vLLM

저희에게 이것은 실험적인 백엔드(backend)입니다. 필수적인 기능들은 구현되었지만 알려진 미흡한 점 rough edges들이 아직 존재하기 때문입니다. 저희는 이 기능을 어느 방향으로, 그리고 어느 정도까지 발전시켜야 할지 파악하기 위해 커뮤니티의 피드백을 원합니다. 흥미로우시다면 여러분의 생각을 저희에게 알려주세요!

빠른 시작 가이드: https://lemonade-server.ai/news/vllm-rocm.html GitHub: https://github.com/lemonade-sdk/lemonade Discord: https://discord.gg/5xXzkMu8Zk

원문 보기

원문 보기 (영어)

vLLM has the ability to run .safetensors LLMs before they are converted to GGUF and represents a new engine to explore. I personally had never tried it out until u/krishna2910-amd/ u/mikkoph and u/sa1sr1 made it as easy as running llama.cpp in Lemonade: ``` lemonade backends install vllm:rocm lemonade run Qwen3.5-0.8B-vLLM ``` This is an experimental backend for us in the sense that the essentials are implemented, but there are known rough edges. We want the community's feedback to see where and how far we should take this. If you find it interesting, please let us know your thoughts! Quick start guide: https://lemonade-server.ai/news/vllm-rocm.html GitHub: https://github.com/lemonade-sdk/lemonade Discord: https://discord.gg/5xXzkMu8Zk

vLLM ROCm(AMD) Lemonade 오픈소스 LLM 서버