
This is a really nice paper which asks some critical questions for the future of NAS research.

However, it's important to note that this paper doesn't show that NAS algorithms as originally designed, with completely independent training of each proposed architecture, are equivalent to random search. Rather, it shows that weight sharing, a technique introduced by ENAS [1] that reduces the required compute by training multiple candidate models simultaneously with shared weights, doesn't outperform random baselines. Intuitively, this makes sense: weight sharing dramatically reduces the number of independent evaluations, and therefore provides far less signal to the controller that proposes architectures.
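
In case it helps to make the distinction concrete, here's a rough toy sketch (hypothetical PyTorch code, not from the paper or the ENAS implementation; the layer choices, dimensions, and training loop are all made up) of why weight sharing is cheap: every sampled architecture reads and updates one shared set of weights, so "evaluating" a candidate no longer requires training it from scratch.

    # Toy sketch of ENAS-style weight sharing (hypothetical code, not from the
    # paper): candidate ops live in one supernet, and every sampled architecture
    # trains against, and is scored with, the same shared parameters.
    import random
    import torch
    import torch.nn as nn

    class SharedLayer(nn.Module):
        """One supernet layer holding two candidate ops with shared weights."""
        def __init__(self, dim):
            super().__init__()
            self.ops = nn.ModuleList([
                nn.Linear(dim, dim),                            # candidate op 0
                nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),  # candidate op 1
            ])

        def forward(self, x, choice):
            return self.ops[choice](x)

    class SuperNet(nn.Module):
        def __init__(self, dim=16, depth=3):
            super().__init__()
            self.layers = nn.ModuleList([SharedLayer(dim) for _ in range(depth)])
            self.head = nn.Linear(dim, 1)

        def forward(self, x, arch):
            # `arch` picks one op per layer; different archs reuse the same weights.
            for layer, choice in zip(self.layers, arch):
                x = layer(x, choice)
            return self.head(x)

    supernet = SuperNet()
    opt = torch.optim.SGD(supernet.parameters(), lr=0.01)
    x, y = torch.randn(64, 16), torch.randn(64, 1)  # toy data

    # One training run covers many architectures (sampled uniformly at random
    # here; ENAS uses a learned controller instead).
    for step in range(200):
        arch = [random.randrange(2) for _ in range(3)]
        loss = nn.functional.mse_loss(supernet(x, arch), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # "Evaluating" a candidate is now a cheap forward pass with the shared
    # weights, rather than an independent training run from scratch.
    with torch.no_grad():
        for arch in ([0, 0, 0], [1, 1, 1]):
            print(arch, nn.functional.mse_loss(supernet(x, arch), y).item())

The flip side, as noted above, is that a single supernet run gives far fewer independent evaluations, and therefore much weaker signal about how any one architecture would perform if trained on its own.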

The paper itself makes this fairly clear, but I think it's easy to misinterpret this distinction from the abstract.

[1] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. Efficient neural architecture search via parameter sharing. ICML, 2018.



Thank you for pointing that out! I did misinterpret that part of the abstract!

On a perhaps related note, this seems a bit surprising to me, because when I first started with neural networks about a year ago, I tried to shortcut hyperparameter search by reusing weights and noticed that independently trained models with the same hyperparameters would produce models with different performance. I naively assumed that such correlation was something I didn't want, and that it was something everyone knew about, so I just moved on.
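
For what it's worth, that run-to-run variance is easy to reproduce with a toy example (made-up model and data, nothing from the paper): same architecture, same data, same hyperparameters, only the random initialization changes.

    # Hypothetical toy example: identical setup, different seeds for
    # initialization; the final training losses differ from run to run.
    import torch
    import torch.nn as nn

    x, y = torch.randn(256, 8), torch.randn(256, 1)  # one fixed toy dataset

    for seed in range(3):
        torch.manual_seed(seed)  # only the initialization changes between runs
        model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
        opt = torch.optim.SGD(model.parameters(), lr=0.05)
        for _ in range(200):
            loss = nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
        print(f"seed {seed}: final training loss {loss.item():.4f}")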

Edit: typo (pointed --> pointing)


I've noticed this too. I've got a paper coming up on the arxiv soon that discusses this phenomenon, and structured random architecture search, in the context of semantic segmentation networks.



