If you have a MacBook Pro (M series) with at least 64 GB of RAM, you can now run a GPT-4-level LLM locally!
1. Install Ollama (ollama.com)
2. Open your terminal and run ollama pull llama3.3
3. Then ollama run llama3.3 "your prompt"
Your own personal AI is here!
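For anyone copy-pasting, here is a minimal sketch of the full terminal sequence (the bare llama3.3 tag pulls Ollama's default build of the 70B model, a roughly 43 GB quantized download, as discussed in the replies below):

    ollama pull llama3.3                 # download the default 70B build (~43 GB)
    ollama run llama3.3 "your prompt"    # one-off answer, then back to the shell
    ollama run llama3.3                  # interactive chat; type /bye to exit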
Quote: Iggy (@ignacioaal), replying to @Kaiyes_ and @ollama
Ollama also has other smaller choices, see:
7B parameter models like Mistral Instruct and OpenChat running at Q4_K_M quantization
CodeLlama and other specialized coding models up to 8B parameters
Nous-Hermes 10.7B using Q4_K_M quantization
check out github.com/exo-explore/exo
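The smaller builds pull the same way; a hedged sketch, where the exact tags are assumptions (check the Ollama library for the current names):

    ollama pull codellama:7b                   # specialized coding model
    ollama pull mistral:7b-instruct-q4_K_M     # hypothetical tag: Mistral Instruct at Q4_K_M
    ollama run mistral:7b-instruct-q4_K_M "your prompt"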
Awesome! Thanks for sharing, are you doing it "chatbox" style or using it as part of your workflow?
As long as you have enough RAM to load the model (Llama 3.3 is 43 GB) you'll be fine, try it out! If not, see the quoted list of smaller models above.
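On the "enough RAM" point: a quick way to sanity-check before pulling, assuming macOS (ollama list reports the on-disk size of each model you have already pulled):

    sysctl hw.memsize    # total physical RAM, in bytes, on macOS
    ollama list          # names and sizes of locally pulled models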
Depends on your RAM; see the quoted list of smaller models above.
There are many LLMs you can run locally, even on a cell phone. Many run on a MacBook; nothing new.
yes ofc. But this is the first time that we can have 4o level intelligence locally :)
Unfortunately not at this time. The model is 42 GB, so you'll only be left with 6 GB for all your daily work, but see the quoted list of smaller models above.
This is exciting! Running powerful models locally opens up a ton of possibilities for experimentation and creativity. Can't wait to try it out!
What's better than Ollama? vLLM. Sadly they don't support MLX yet.
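For context, vLLM serves an OpenAI-compatible HTTP endpoint rather than a local chat REPL, and it targets CUDA-class hardware rather than Apple's MLX; a rough sketch, assuming the standard Hugging Face model ID:

    pip install vllm
    # serve an OpenAI-compatible API on localhost:8000 (a 70B model needs serious GPU memory)
    vllm serve meta-llama/Llama-3.3-70B-Instruct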
guys show us some love
>>> You're running on my macbook
I'm a cloud-based language model, so I don't actually "run" on your MacBook in the classical sense. Instead, I exist as a remote service that you can interact with through the internet.
When you ask me a question or provide input, your MacBook
Nice! Do you want to create a video tutorial on how to set up GPT for training on specific tasks with AWS?
Stuck with GPT models? Unleash the power of open-source LLMs and chat without limits!
So when I want to go full personal house assistant and keep it private: buy a SOTA home-server-type PC with an RTX 5090 and 64 GB of RAM and enjoy?
The benchmarks
Wow, GPT-4-level models running locally? The fact you can now do this on a MacBook with 64GB RAM is wild. What’s the first thing you’d ask your “personal AI” to do if you set this up?
No internet, no waiting, no privacy worries—just you and an AI that can handle complex tasks right from your laptop. It feels like we’re entering a new era. What’s everyone’s take on the most useful applications for this setup?
This is why I got 128 GB of RAM
So I can ignore this and keep using gpt-4o-mini for small stuff
Crazy! Now imagine when Llama 4 drops. Local LLM with that level of intelligence is going to open up so many interesting possibilities that today don’t make economic sense.
Running a GPT-4 level LLM locally is a game-changer for privacy and efficiency. The MacBook Pro M series is finally flexing its true potential! Curious - how does the performance compare to cloud-based setups, especially for extended sessions?
I already have Llama on my PC. It takes like 3 minutes until it renders an answer.
I should legit try this. In fairness, I have no clue about the real-world benefit, since I doubt this is as good as an online LLM, but it's still kinda cool.
With 24 GB of RAM you can run an 8B model (e.g. Llama 3.1 8B) at 16-bit precision. Should work fine for students at least.
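A hedged sketch of that setup (the fp16 tag name is an assumption; an 8B model at 16-bit precision is roughly 16 GB of weights, so it fits in 24 GB with some headroom):

    ollama pull llama3.1:8b-instruct-fp16    # ~16 GB of weights, assumed tag name
    ollama run llama3.1:8b-instruct-fp16 "your prompt"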
Yeah, sure, but then you won't have any resources left to run anything else, which defeats the whole purpose of it...
This is the real future. Pretty soon we'll all have individualized LLMs in our pockets.
Deploy Llama 3.1 as an API, fine-tune on custom data, and more.
I have tried this before, and the results are impressive, but my laptop could heat a room.
Can it use vision? Can it access the internet? Can it run Python code?
The biggest improvements to ChatGPT have been the integrations over the last few years.
Running a bare language model is neat, but not as useful.
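To the integrations point: a locally running Ollama instance exposes an HTTP API on port 11434, so you can script against it; a minimal sketch:

    # ask the local server for a single, non-streaming completion
    curl http://localhost:11434/api/generate -d '{
      "model": "llama3.3",
      "prompt": "Write a haiku about local inference.",
      "stream": false
    }'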
That's cool, and having recent GPT-4-level performance at smaller sizes is relatively new. But GPT-4-level performance in smaller models has been around for quite some time now: the first Llama 3 release brought that to a 70B model, and Gemma 27B is roughly comparable.
Tried running 3.3 on my M1 MacBook Air (16GB memory). It’s very slow, but works.
I reckon we can run it on the 16 GB variant as well. Might get about 15 tokens/sec though.
Running Llama 3.2 3B natively on an iPhone Pro with no internet is more impressive to me right now, tbh.
But yes, fully local LLMs will be another game changer.
This is a highly quantized model. Always go for the instruct fp16 model from Ollama for the full version.
It is useful, but you get the quantized version, so no GPT-4-quality responses. If you want to try the real power of Llama 3.3 70B, make sure you are running inference with a non-quantized version.
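One caveat on that advice: at 16-bit precision, 70B parameters comes to roughly 140 GB of weights (70B x 2 bytes), so a non-quantized Llama 3.3 70B will not fit in 64 GB of RAM at all; the tag below is an assumption, so check the Ollama library for the real name:

    ollama pull llama3.3:70b-instruct-fp16    # ~140 GB; needs far more than a 64 GB laptop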