It needs far more features apparently. Tons more. That's why Notepad++ is popular. Which also had a severe security vulnerability recently, one that was actively exploited by a state actor, reportedly China.
Strictly, no. But it was a vulnerability in the design of Notepad++, key elements here being the featureset that requires frequent updates and the lack of integrity checks during the upgrade process.
This has prompted me to move on from Notepad++ - it's sad, because I've used it for many years, but this is too much.
Recently, I was pleasantly surprised to discover that the Microsoft Store has a built-in CLI with that exact functionality. You just run `store updates` to check for updates to store-managed apps, and you can target specific items with `store update <update-id>`. Of course, there's also winget for non-store applications (`winget upgrade`). I find them pretty handy as I have become quite used to managing my Linux installations with pacman over the past year or so. I discovered the store CLI completely by accident. It's not widely advertised.
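If you ever want to script that check rather than run it by hand, a trivial Python wrapper around winget also works. This is just a sketch; it assumes winget is on your PATH and sticks to the plain `winget upgrade` commands:

```python
import subprocess

def list_available_upgrades() -> str:
    """Equivalent to running `winget upgrade`: lists packages with pending updates."""
    result = subprocess.run(
        ["winget", "upgrade"], capture_output=True, text=True, check=True
    )
    return result.stdout

def upgrade_everything() -> None:
    """Equivalent to `winget upgrade --all`: upgrades every package winget manages."""
    subprocess.run(["winget", "upgrade", "--all"], check=True)

if __name__ == "__main__":
    print(list_available_upgrades())
```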
You can if you use the Windows Store. It's just that you usually install things outside of it, unlike on Linux, where you generally use a package manager that handles updates for you.
The OS-provided option can be bare-bones, stable, secure, and just utilitarian. This encourages people to choose their own tools for the features they want, and to not expect much other than reliability from the OS version. They didn't need to mess with a good thing.
Yeah, though there is the same issue the other way round: great prompt understanding doesn't matter much when the result has an awfully ugly AI fake look to it.
That's definitely true, and the medium also makes a big difference (photorealism, digital painting, watercolor, etc.).
Though in some cases, it is a bit easier to fix visual artifacts (using second-pass refiners, Img2Img, ultimate upscale, stylistic LoRAs, etc.) than to fix a fundamental coherency problem.
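For what it's worth, here is roughly what I mean by a second-pass Img2Img fix, sketched with diffusers and the SDXL refiner; the model choice, prompt, and strength value are only illustrative:

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

# Load an img2img pipeline; the SDXL refiner is one common choice for a second pass.
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Start from the base generation that has surface-level artifacts.
init_image = load_image("base_generation.png")

# A low strength keeps the composition intact and only reworks textures/lighting,
# which is why this helps with artifacts but not with coherency problems.
refined = pipe(
    prompt="photorealistic street cafe, natural lighting, film grain",
    image=init_image,
    strength=0.3,
).images[0]
refined.save("refined.png")
```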
I was disappointed when Imagen 4 (and therefore also Nano Banana Pro, which clearly uses Imagen 4 internally to some degree) had a significantly stronger tendency to drift from photorealism to AI fake aesthetics than Imagen 3. This suggests there is a tradeoff between prompt following and avoiding slop style. Perhaps this is also part of the reason why Midjourney isn't good at prompt following.
While I don't doubt this was one influence, there was also an infamous problem with Dall-E 2, which was perfectly able to generate an astronaut riding a horse but completely unable to generate a horse riding an astronaut.
This problem is infamous because it persisted (unlike other early problems, like creating the wrong number of fingers) in much more capable models, and the Qwen Image people are certainly very aware of this difficult test. Even Imagen 4 Ultra, which might be the most advanced pure diffusion model without an editing loop, fails at it.
And obviously an astronaut is similar to a man, which connects this benchmark to the Chinese meme.
The examples I saw of z-image look much more realistic than Nano Banana Pro, which likely uses Imagen 4 (plus editing) internally and isn't very realistic. But Nano Banana Pro obviously has much better prompt alignment than something like z-image.
In your example, z-image and Nano Banana Pro look basically equally photorealistic to me. Perhaps the NBP image looks a bit more real because it resembles an unstaged wide-angle smartphone shot. Anyway, the difference is very small. I agree the lighting in Flux.2 Pro looks a bit off.
But anyway, realistic environments like a street cafe are not well suited to testing for photorealism. You have to use somewhat more fantastical environments.
I don't have access to z-image, but here are two examples with Nano Banana Pro:
These are terribly unrealistic. Far more so than the Flux.2 Pro image above.
> Also Imagen 4 and Nano Banana Pro are very different models.
No, Imagen 4 is a pure diffusion model. Nano Banana Pro is a Gemini scaffold which uses Imagen to generate an initial image, then Gemini 3 Pro writes prompts to edit the image for much better prompt alignment. The prompts above are very simple, so there is little for Gemini to alter, and the results look basically identical to plain Imagen 4. Both pictures (especially the first) have the signature AI look of Imagen 4, which is different from other models like Imagen 3.
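To be clear, that's my reading of the system, not documented architecture. Purely as an illustration, the kind of generate-then-edit loop I mean would look something like this (every function here is a hypothetical stub, not a real API):

```python
# Hypothetical sketch of a generate-then-edit scaffold; none of these functions
# correspond to a real or documented API.

def text_to_image(prompt: str) -> bytes:
    """Stub for a pure text-to-image model (an Imagen-class generator)."""
    raise NotImplementedError

def write_edit_instruction(prompt: str, image: bytes) -> str | None:
    """Stub for an LLM that compares the image against the prompt and either
    returns an edit instruction or None if the image already matches."""
    raise NotImplementedError

def edit_image(image: bytes, instruction: str) -> bytes:
    """Stub for an image-editing model that applies the instruction."""
    raise NotImplementedError

def generate_with_editing_loop(prompt: str, max_rounds: int = 3) -> bytes:
    """Simple prompts pass the first check, so the output stays close to the
    raw text-to-image result; complex prompts get iteratively edited."""
    image = text_to_image(prompt)
    for _ in range(max_rounds):
        instruction = write_edit_instruction(prompt, image)
        if instruction is None:
            break
        image = edit_image(image, instruction)
    return image
```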
By the way, here is GPT Image 1.5 with the same prompts:
The first is very fake and the second is a strong improvement, though still far from the excellent cafe shots above (fake studio lighting, unrealistic colors, etc.).
>Nano Banana Pro is a Gemini scaffold which uses Imagen to generate an initial image, then Gemini 3 Pro writes prompts to edit the image for much better prompt alignment.
First of all, how would you know the architecture details of gemini-3-pro-image? And second, how could the model modify the image if Gemini itself is just rewriting the prompt (like the old ChatGPT + DALL-E setup)? Imagen 4 is just a text-to-image model, not an editing one, so it doesn't make sense. Nano Banana Pro can edit images (including ones you provide).
> I disagree, nano banana pro result is on a completely different league.
I strongly disagree. But even if you are right, the difference between the cafe shots and the Atlantis shots is clearly much, much larger than the difference between the different cafe shots. The Atlantis shots are super unrealistic. They look far worse than the cafe shots of Flux.2 Pro.
> Why? It's the perfect settings in my opinion
Because it's too easy obviously. We don't need an AI to make fake realistic photos of realistic environments when we can easily photograph those ourselves. Unrealistic environments are more discriminative because they are much more likely to produce garbage that doesn't look photorealistic.
I'm definitely using Nano Banana Pro, and your picture has the same strong AI look to it that is typical of NBP / Imagen 4.
> First of all, how would you know the architecture details of gemini-3-pro-image? And second, how could the model modify the image if Gemini itself is just rewriting the prompt (like the old ChatGPT + DALL-E setup)? Imagen 4 is just a text-to-image model, not an editing one, so it doesn't make sense. Nano Banana Pro can edit images (including ones you provide).
There were discussions about it previously on HN. Clearly NBP is using Gemini reasoning, and clearly the style of NBP strongly resembles Imagen 4 specifically. There is probably also a special editing model involved, just like in Qwen-Image-2.0.
Still, the vast majority of models fail at delivering an image that looks real. I want realism for realistic settings; if a model can't do that, then what's the point? Of course, you can always pay for people and equipment to make the perfect photo for you, haha.
If the z-image turbo image looks as good as the Nano Banana Pro one to you, you are probably so used to slop that any model which avoids obvious artifacts like super-shiny skin immediately seems indistinguishable from a real image to you (like the Nano Banana Pro one, which to me looks as real as an actual photo). And yes, I'm ignoring the fact that in the z-image-turbo image the cup is too large and the bag is inside the chair. Z-image is good (particularly given its size), but not as good.
It seems you are ignoring the fact that the NBP Atlantis pictures look much, much worse than the z-image picture of the cafe. They look far more like AI slop. (Perhaps the Atlantis prompt would look even worse with z-image, I don't know.)
I generated my own using your prompt and posted it in the previous comment. You haven't posted a z-image one of Atlantis. I'm not at home to try, but I have trained LoRAs for z-image (it's a relatively lightweight model), so I know the model; it's not as good as Nano Banana Pro. Use what you prefer.
> I generated my own using your prompt and posted it in the previous comment.
Yes, and it has a very unrealistic AI look to it. That was my point.
> You haven't posted a z-image one of Atlantis.
Yes, and I don't doubt that it might well be just as unrealistic or even worse. I also just tried the Atlantis prompts in Grok (no idea what image model they use internally) and the results look somewhat more realistic, though still not on the level of the cafe shots.
The complex prompt-following ability and editing are seriously impressive here. They don't seem to be much behind OpenAI and Google, which is backed up by the AI Arena ranking.
Don't get me wrong, Gemini 3 is very impressive! It just seems to always need to give you an answer, even if it has to make it up.
This was also largely how ChatGPT behaved before 5, but OpenAI has gotten much much better at having the model admit it doesn’t know or tell you that the thing you’re looking for doesn’t exist instead of hallucinating something plausible sounding.
Recent example, I was trying to fetch some specific data using an API, and after reading the API docs, I couldn’t figure out how to get it. I asked Gemini 3 since my company pays for that. Gemini gave me a plausible sounding API call to make… which did not work and was completely made up.
Okay, I haven't really tested hallucinations like this, that may well be true. There is another weakness of GPT-5 (including 5.1 and 5.2) I discovered: I have a neat philosophical paradox about information value. This is not in the pre-training data, because I came up with the paradox myself, and I haven't posted it online. So asking a model to solve the paradox is a nice little intelligence test about informal/philosophical reasoning ability.
If I ask ChatGPT to solve it, the non-thinking GPT-5 model usually starts out confidently with a completely wrong answer and then smoothly transitions into the correct answer. Though without flagging that half the answer was wrong. Overall not too bad.
But if I choose the reasoning GPT-5 model, it thinks hardly at all (6 seconds when I just tried) and then gives a completely wrong answer, e.g. about why a premiss technically doesn't hold under contrived conditions, ignoring the fact that the paradox persists even with those circumstances excluded. Basically, it both over- and underthinks the problem. When you tell it that it can ignore those edge cases because they don't affect the paradox, it overthinks things even more and comes up with other wrong solutions that get increasingly technical and confused.
So in this case the GPT-5 reasoning model is actually worse than the version without reasoning. Which is kind of impressive. Gemini 3 Pro generally just gives the correct answer here (it always uses reasoning).
Though I admit this is just a single example and hardly significant. I guess it reveals that the reasoning training focuses hard on more verifiable things like math and coding but is very brittle at philosophical thinking that isn't just repeating knowledge gained during pre-training.
Maybe another interesting data point: If you ask either of ChatGPT/Gemini why there are so many dark mode websites (black background with white text) but basically no dark mode books, both models come up with contrived explanations involving printing costs. Which would be highly irrelevant for modern printers. There is a far better explanation than that, but both LLMs a) can't think of it (which isn't too bad, the explanation isn't trivial) and b) are unable to say "Sorry, I don't really know", which is much worse.
Basically, if you ask either LLM for an explanation for something, they seem to always try to answer (with complete confidence) with some explanation, even if it is a terrible explanation. That seems related to the hallucination you mentioned, because in both cases the model can't express its uncertainty.
Which still comes out behind in everything other than multi-core, while using substantially more power.
Those Panther Lake comparisons pit the top-end PTL against the base M series. If they were compared to their comparable SKUs, they'd be even further behind.
The article said the M5 has significantly higher single-core CPU performance, while Panther Lake has significantly higher GPU performance. The Panther Lake devices had OLED screens, which consume significantly more power than LCDs, so they were at a disadvantage.
They consume more power at the chip level. You can see this in Intel's spec sheets. The base recommended power envelope of PTL is the maximum power envelope of the M5. They're completely different tiers. You're comparing a 25-85 W tier chip to a 5-25 W chip.
They also only win when it comes to multi-core, whether that's CPU or GPU. If they were fairly compared to the correct SoC (an M4 Pro), they'd come out behind on both multi-core CPU and GPU.
This was all mentioned in my comment addressing the article. This is the trick Apple's competitors are using: comparing across SKU ranges to grab headlines. PTL is a strong chip, no doubt, but it's still behind Apple across all the metrics in a like-for-like comparison.
Apparently the practical power difference is smaller than the spec sheets suggest; otherwise you wouldn't get this rather long battery life, especially considering that it also has an OLED display.
Are the Intel systems plugged in when running those tests? Usually when Apple machines run these tests, the difference between battery and plugged-in performance is small, if any.