Well, but you missed that data cannot be executed.
Unless you store your MP3s and JPEGs in .text, the memory pages holding all that stuff are marked non-executable, and jumping into them will only cause a crash, regardless of whether the bytes happen to form useful instructions.
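To make the page-permission point concrete, here is a minimal sketch that filters a Linux-style `/proc/<pid>/maps` listing down to the regions carrying the `x` permission bit. The function name `executable_regions` and the sample listing (addresses, inode numbers, paths) are made up for illustration and are not taken from any real process.

```python
def executable_regions(maps_text: str):
    """Return (address_range, perms, name) for every mapping whose
    permission field contains 'x'. Expects /proc/<pid>/maps-style lines:
    address perms offset dev inode [pathname]."""
    regions = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        addr_range, perms = fields[0], fields[1]
        if "x" in perms:
            name = fields[5] if len(fields) > 5 else "(anonymous)"
            regions.append((addr_range, perms, name))
    return regions


# Hypothetical listing: one code mapping, one data mapping,
# one memory-mapped media file, and the stack.
SAMPLE = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /usr/bin/player
0060a000-0060b000 rw-p 0000a000 08:01 1234 /usr/bin/player
7f0000000000-7f0000800000 r--p 00000000 08:01 5678 /home/user/song.mp3
7ffc00000000-7ffc00021000 rw-p 00000000 00:00 0 [stack]
"""

exec_only = executable_regions(SAMPLE)
for region in exec_only:
    print(region)
```

In this made-up listing only the program's code mapping survives the filter; the mapped MP3 and the stack do not. On a Linux machine the same function can be fed `open("/proc/self/maps").read()` to inspect a real process.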
I didn't downvote it, but I feel like your comments have been technically correct but practically misleading.
It's possible to have executable data, but if you do, you generally have bigger problems: the exploit can simply write a complete first-stage into the data and execute it directly, and not bother going through return-oriented contortions.
The reality is that gadget harvesting is about analyzing program text --- actual binary machine instructions --- not about looking for ways to interpret JPEGs or MP3s or (I wrote DOCX and then PDF and then thought "huh bad examples") RTF files as instruction streams.
It's also true that you can exploit insane x86 encoding to synthesize unintended instructions, but that's (I think) less important than the simpler idea of taking whole assembled programs, harvesting very small subsequences, wiring them together with a forged series of stack frames, and achieving general computation.
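As a rough illustration of "harvesting very small subsequences", the sketch below scans a byte string for the x86 `ret` opcode (0xC3) and collects the short byte windows that end there. This is only the candidate-finding step under simplifying assumptions: a real tool would disassemble each window to check that it decodes to useful instructions. The name `find_ret_windows` and the toy byte string are hypothetical.

```python
RET = 0xC3  # opcode byte of the x86 near `ret` instruction


def find_ret_windows(code: bytes, max_len: int = 4):
    """Return (start_offset, window_bytes) for every window of
    1..max_len bytes that ends in a 0xC3 byte."""
    windows = []
    for end, byte in enumerate(code):
        if byte != RET:
            continue
        for length in range(1, max_len + 1):
            start = end - length + 1
            if start < 0:
                break  # window would begin before the buffer
            windows.append((start, code[start:end + 1]))
    return windows


# Toy "program text": mov eax, 0x9090c358 ; ret
text = bytes([0xB8, 0x58, 0xC3, 0x90, 0x90, 0xC3])

candidates = find_ret_windows(text, max_len=2)
for start, window in candidates:
    print(start, window.hex())
```

Each surviving window is a potential gadget; chaining them means placing their addresses one after another on the forged stack so that every `ret` pops the next one.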
Right. But in practice ROP targets the executable portions - any and all of them. If someone leaves something executable that they shouldn't, it'll use that. If only code is left executable, it's still often able to use that.
Remember, x86 can be parsed differently depending on offset. If you jump into the middle of a multibyte instruction, you get an entirely different instruction stream, and x86 doesn't have any real protection against that.
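A toy decoder makes the offset point visible. It is a sketch that understands only four 32-bit-mode opcodes (`mov r32, imm32`, `pop r32`, `nop`, `ret`), which is nowhere near a real disassembler, and the name `decode` and the six-byte blob are invented for the example.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code: bytes, offset: int):
    """Decode from `offset` until a ret or an unhandled byte.
    Handles only: B8+r imm32 (mov), 58+r (pop), 90 (nop), C3 (ret)."""
    out = []
    i = offset
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF and i + 5 <= len(code):
            # mov r32, imm32: opcode byte plus 4-byte little-endian immediate
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"mov {REGS[op - 0xB8]}, {imm:#x}")
            i += 5
        elif 0x58 <= op <= 0x5F:
            out.append(f"pop {REGS[op - 0x58]}")
            i += 1
        elif op == 0x90:
            out.append("nop")
            i += 1
        elif op == 0xC3:
            out.append("ret")
            break
        else:
            out.append(f"(unhandled byte {op:#04x})")
            break
    return out


blob = bytes([0xB8, 0x58, 0xC3, 0x90, 0x90, 0xC3])

print(decode(blob, 0))  # the stream the assembler intended
print(decode(blob, 1))  # the stream seen when entering one byte later
```

Starting at offset 0 the bytes read as a `mov` followed by `ret`; starting at offset 1, the immediate operand of that `mov` is itself decoded as `pop eax; ret`, an instruction pair the original program never contained.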