The 'p' in '240p' doesn't stand for pixels; it stands for progressive scan. Also, 240p/480i/480p are the standard accepted terms for these low-resolution video signals[0]. Nitpicking technical details as a 'gotcha' when people use standard terminology isn't helpful.
Phosphors may not be pixels, but 240p doesn't say anything about pixels. The number tells us how many lines, and the p tells us that each screenful of lines covers the whole picture (the p is for progressive, vs. i for interlaced). The whole phrase '240p CRT TV' tells us it's a normalish NTSC TV, not a hi-res TV with fancier electronics for handling digital TV, which would likely add more processing delay.
Interestingly the '240p' signal sent out by video game consoles of that era is really a hack, as 240p wasn't a standard signal supported by TVs of the time.
It's actually a 480i signal with the timing fiddled with so that the alternate lines still strike the same part of the screen (this is why games from that era had such noticeable scanlines - the CRT beam is only lighting up alternate horizontal lines).
This also means that a lot of more modern TVs (and even some upscalers marketed for retro gaming) do an extra terrible job of upscaling 240p signals because they run the same logic that they would if it was normal 480i, resulting in unnecessary flickering or dropped frames.
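To put rough numbers on the timing difference, here is a small sketch. It assumes the console keeps the standard NTSC color line rate (4.5 MHz / 286, about 15.734 kHz) and simply emits 262 whole lines per field instead of 262.5; real consoles vary a bit, so treat the 240p figure as illustrative rather than exact for any particular machine.

```python
from fractions import Fraction

# NTSC color horizontal line rate: 4.5 MHz / 286 (about 15.734 kHz).
LINE_RATE_HZ = Fraction(4_500_000, 286)


def field_rate_hz(lines_per_field):
    """Fields per second for a given number of scanlines per field."""
    return LINE_RATE_HZ / Fraction(lines_per_field)


# Standard interlaced NTSC: 525 lines per frame, so 262.5 lines per field.
interlaced = field_rate_hz(Fraction(525, 2))

# Assumed "240p" timing: a whole number of lines (262) in every field.
progressive = field_rate_hz(262)

print(f"480i: {float(interlaced):.3f} fields/s")
print(f"240p: {float(progressive):.3f} fields/s")
```

Dropping the half line is what keeps every field landing on the same set of scanlines, and it also nudges the field rate slightly above the broadcast 59.94 Hz.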
An analog TV doesn't have a framebuffer, and neither do most consoles until you get into the 3D era.
My understanding is that the timing of the vblank signalling that comes between fields determines whether the next field is an even field or an odd field. If the vblank signalling comes in the middle of the last scanline, the next field is an even field; if the vblank comes aligned with the end of the last scanline, the next field is an odd field.
If you always start vblank signalling in the middle of a scanline, you get all even fields; if you always start vblank signalling at the end of a scanline, you get all odd fields.
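A toy model of that rule, taking the description above at face value (it is not checked against the NTSC spec, and the even/odd labels follow the comment's convention). The function name `field_sequence` is made up for illustration. It tracks where in a scanline each vblank begins and derives the parity of the field that follows.

```python
from fractions import Fraction


def field_sequence(lines_per_field, n_fields):
    """Parity of the field that follows each vblank, per the rule above."""
    position = Fraction(0)  # elapsed time, measured in scanlines
    parities = []
    for _ in range(n_fields):
        position += Fraction(lines_per_field)
        # Fractional part: how far into a scanline the vblank signalling begins.
        phase = position % 1
        # Mid-line vblank -> even field; line-aligned vblank -> odd field.
        parities.append("even" if phase == Fraction(1, 2) else "odd")
    return parities


# 262.5 lines per field: vblank alternates between mid-line and line-aligned.
print(field_sequence(Fraction(525, 2), 4))

# 262 lines per field: vblank is always line-aligned, so parity never changes.
print(field_sequence(262, 4))
```

With a half line per field the parity alternates (normal interlacing); with a whole number of lines it stays fixed, which is the "240p" case.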
There are also two halves to this. Sure, the TV itself might not be made up of clean square pixels like an LCD, but the source image that's being sent to it absolutely does have a discrete horizontal and vertical resolution in square/rectangular pixels.
For more on that: https://www.youtube.com/watch?v=Ea6tw-gulnQ