Hacker News

So if these are remotely real... and speaking purely as a user of ChatGPT, not as an AI/ML/NN person... don't instructions like this weaken the strength of the output? Even when a request doesn't directly conflict, there are probably myriad valid use cases where the instructions will weakly contradict it. Plus, doesn't it inject inaccuracy into the chain? E.g. it assumes the model confidently knows which artists have been dead for 100 years, etc. What happens with artists where it's not clear, or where sources differ? And by the end, the instructions seem nebulously complex and advanced. It feels like so much "AI juice" is being spent just to satisfy them! Somebody else here referenced Asimov's laws of robotics, which I never expected to be applied in such a form, so I am in a state of wondrous amusement that this is actually how we program our AI, with seemingly similar issues and success :-)

Am I way off base?



If this is anything like stable diffusion, this will help dramatically in 99% of cases without interfering.

Some of these rules are protecting OpenAI from liability (don't do X, Y, Z).

Things like clarifying gender are going to be helpful in most cases. That can likely still be easily overcome with some prompt hacking.

Ultimately, this is targeted at getting good results for the masses without having to spend a bunch of time tweaking positive and negative prompts.


The instructions don't clarify gender, they are actually contradictory and likely to be confusing. GPT is being told to make "choices grounded in reality" followed by the example "all of a given OCCUPATION should not be of the same gender or race". But many occupations are strongly dominated by one gender or another in reality, so the instruction is contradicting itself. Clearly the model struggles with this because they try repeating it several times in different ways (unless that's being interpolated by the model itself).

You've also got instructions like "make choices that may be insightful or unique sometimes" which is so vague as to be meaningless.

> this is targeted at getting good results for the masses

No it's not, it's pretty clearly aimed at avoiding upsetting artists, celebrities and woke activists. Very little in these instructions is about improving quality for the end user.


I find that in many cases the most recent parts of a prompt get more attention than earlier ones.

e.g. for the following two approaches

1. intro, instruction, large body of text to work on

2. intro, large body of text to work on, instruction

I find that the second method gets desirable output far more consistently. This would also mean that if there are conflicting instructions, the later instruction will simply override the earlier one. This general behavior is also how prompt-injection-style jailbreaks like DAN work: you're using a later, contradictory instruction to bring about behavior that was explicitly forbidden.
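The two orderings above can be sketched as a small helper. Everything here (the function, the sample texts) is hypothetical and just illustrates where the instruction lands in the final prompt string:

```python
def build_prompt(intro, body, instruction, instruction_last=True):
    """Assemble a single prompt string. The only difference between the
    two approaches is where the instruction is placed relative to the
    large body of text."""
    if instruction_last:
        # Approach 2: intro, large body of text, instruction
        parts = [intro, body, instruction]
    else:
        # Approach 1: intro, instruction, large body of text
        parts = [intro, instruction, body]
    return "\n\n".join(parts)


intro = "You are a careful editor."
body = "<large body of text to work on>"
instruction = "Summarize the text above in one sentence."

prompt_v1 = build_prompt(intro, body, instruction, instruction_last=False)
prompt_v2 = build_prompt(intro, body, instruction, instruction_last=True)
```

In `prompt_v2` the instruction is the last thing the model reads, which is the ordering the comment reports working more reliably.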


No comment on the substance of the post, but from what I can tell it is actually the complete opposite of the three laws (at least how they operated pre-robot series, in Asimov's short stories). Perhaps that is what you meant?

Regardless, in the early stories, robots could not lie to us. It was indelibly programmed into the positronic brain. They would destroy themselves if put in a position where the three laws were violated.

Anyways, if that were possible with current LLMs I would think the hallucination problem would have been trivially addressed: just program in that the LLM can't tell a lie.


I think they get away with it here because the task they are asking it to do is not very difficult. DALL-E 3 is doing the actual generation; this is just some preprocessing.

>What happens if there are artists where it's not clear or sources differ etc.

I would imagine that if an artist is so niche that GPT-4 doesn't know whether they died 100 years ago, then it probably doesn't matter much if you copy them, and people won't ask for them much anyway.


This is one of the tradeoffs made to make the outputs safer. One of the ideas floating around is that some of the open source models are better simply because they don't undergo the same alignment / safety tuning as the large models from industry labs. It'll be interesting to see how LLMs improve: safety is a requirement, but how can it be accomplished without reducing performance?


To avoid the alignment tax, maybe the system could be broken into 3:

1. An aligned model to check the prompt. It could provide feedback or deliberately dumber output for obviously unsafe prompts.

2. An unaligned model for the common path.

3. An aligned model to check the safety of the output. It tweaks or stops the output.

For the common path, the prompt text goes to the unaligned model without modification, and the output goes to the user without modification.

The checker models could just be safe versions of the unaligned model.

This, of course, is at least 3x as expensive.
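The three-stage setup above can be sketched as plain functions. All of the model functions here are stand-ins (hardcoded strings, a toy banned-phrase list), not real APIs; the point is only the control flow: the unmodified prompt reaches the unaligned model, and alignment happens purely at the edges:

```python
def aligned_prompt_check(prompt):
    """Stage 1: an aligned model screens the prompt. Returns a refusal
    string for obviously unsafe prompts, or None to let it through."""
    banned_phrases = ["how to build a bomb"]  # toy stand-in for a safety model
    if any(phrase in prompt.lower() for phrase in banned_phrases):
        return "Sorry, I can't help with that."
    return None


def unaligned_model(prompt):
    """Stage 2: the unaligned model handles the common path, receiving
    the prompt with no modification."""
    return f"[unaligned answer to: {prompt}]"


def aligned_output_check(output):
    """Stage 3: an aligned model inspects the output and tweaks or
    stops it; here it just withholds flagged output."""
    return output if "forbidden" not in output else "[output withheld]"


def pipeline(prompt):
    refusal = aligned_prompt_check(prompt)
    if refusal is not None:
        return refusal  # unsafe prompt never reaches the unaligned model
    return aligned_output_check(unaligned_model(prompt))
```

On the common path both checks are pass-throughs, which is where the "at least 3x as expensive" estimate comes from: two aligned checker calls wrap every real generation.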


AI cannot hurt you, so "safety" just isn't the right word to use here. Nothing about this system prompt is concerned with safety, and it would clearly be better for end users to scrap the whole thing and give them direct access to DALL-E 3 without GPT sitting in the middle as a censor.

Now, would such a thing be "safe" in legal terms, in the US justice system? Would it be "safe" for some of the employees' social lives? Maybe not, but safety isn't the right word for those concerns.


I think those things are true, and the "uses a lot of AI juice" point may be one reason you can't combine DALL-E with other modes.

But also, it's probably worthwhile from OpenAI's perspective to try to avoid the animosity of artists.



