
I'm a software engineer working with scientist-turned-programmers, and what I've experienced is also exactly the opposite of what the author describes. The code written by the physicists, geoscientists and data scientists I work with often suffers from the following issues:

* "Big ball of mud" design [0]: No thought given to how the software should be architected or what the entities that comprise the design space of the problem are and how they fit together. The symptoms of this lack of thinking are obvious: multi-thousand-line swiss-army-knife functions, blocks of code repeated in dozens of places with minor variations, and a total lack of composability of any components. This kind of software design (or lack of design, really) ends up causing a serious hit to productivity because it's often useless outside of the narrow problem it was written to solve and because it's exceedingly hard to maintain or add new features to.

* Lack of tests: some of this is that the scientist-turned-programmer doesn't want to "waste time" writing tests, but more often it's that they don't know _how_ to write good tests. Or they have designed the code in such a way (see above) that it's really hard to test. In any case--unsurprisingly--their code tends to be buggy.

* Lack of familiarity with common data structures and algorithms: this often results in overly-complicated brute-force solutions being used where they aren't needed, and in sub-par performance.
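
To make the last bullet concrete, here is a minimal sketch of the pattern, not taken from any real codebase. The task (finding IDs that appear in two lists) and the function names are hypothetical. The first version is the nested-scan approach that tends to show up; the second gets the same result with hash-based lookups, assuming the IDs are hashable.

```python
def common_ids_bruteforce(a, b):
    """Nested scan: roughly len(a) * len(b) comparisons, plus a list scan per match."""
    result = []
    for x in a:
        for y in b:
            if x == y and x not in result:
                result.append(x)
    return result


def common_ids_set(a, b):
    """Same result, but membership checks are O(1) on average via sets."""
    in_b = set(b)
    emitted = set()
    result = []
    for x in a:
        # Keep order of first appearance in `a`, skipping repeats.
        if x in in_b and x not in emitted:
            emitted.add(x)
            result.append(x)
    return result
```

Both return the shared IDs in the order they first appear in `a`. The difference only matters once the lists get large, which is exactly when the brute-force version starts attracting "approximations" to make it tolerable.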

This quote from the author stood out to me:

> I claim to have repented, mostly. I try rather hard to keep things boringly simple.

...because it's really odd to me. Writing code that is as simple as it can be is precisely what good programmers do! But in order to get to the simplest possible solution to a non-trivial problem you need to think hard about the design of the code and ensure that the abstractions you implement are the right ones for the problem space. Following the "unix philosophy" of building small, simple components that each do one thing well but are highly composable is undoubtedly the more "boringly simple" approach in terms of the final result, but it's harder to do (in the sense that it may take more thought and more experience) than diving into the problem without thinking and cranking out a big ball of mud. Similarly, reaching for the correct data structure or algorithm often results in a massively simpler solution to your problem, but you have to know about it or be willing to research the problem a bit to find it.
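
As a rough illustration of what "small, composable components" can look like in practice, here is a sketch with made-up names (`parse_readings`, `drop_outliers`, `summarize`) and a made-up input format of one number per line with `#` comments. It is an assumption-laden toy, not a recommendation for any particular project, but each piece does one thing and can be tested or reused on its own.

```python
from statistics import mean


def parse_readings(lines):
    """Convert raw text lines to floats, skipping blank lines and '#' comments."""
    stripped = (line.strip() for line in lines)
    return [float(s) for s in stripped if s and not s.startswith("#")]


def drop_outliers(values, low, high):
    """Keep only values inside the closed interval [low, high]."""
    return [v for v in values if low <= v <= high]


def summarize(values):
    """Return the count and mean; the mean is None for empty input."""
    if not values:
        return {"n": 0, "mean": None}
    return {"n": len(values), "mean": mean(values)}


if __name__ == "__main__":
    raw = ["# temperature", "1.0", "", " 2.0 ", "999"]
    # Compose the steps like a small pipeline.
    print(summarize(drop_outliers(parse_readings(raw), 0, 100)))
```

The swiss-army-knife alternative would do all three steps in one function with flags for each variation, which is the part that makes it hard to test and hard to reuse.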

The author did at least try to support his thesis with examples of "bad things software engineers do", but a lot of them seem like things that--in almost every organization I've worked at in the last ten years--would definitely be looked down on/would not pass code review. Or are things ("A forest of near-identical names along the lines of DriverController, ControllerManager, DriverManager, ManagerController, controlDriver") that are narrowly tailored to a specific language at a specific window in time.

> they care too much about the quality of their work and not enough about getting shit done.

I think the appearance of "I'm just getting shit done" is often a superficial one, because it doesn't factor in the real costs: other scientists and engineers can't use their solutions because they're not designed in a way that makes them work in any other setting than the narrow one they were solving for. Or other scientists and engineers have trouble using the person's solutions because they are hard to understand and badly-documented. Or other scientists and engineers spend time going back and fixing the person's solutions later because they are buggy or slow. The mindset of "let's just get shit done and crank this out as fast as we can" might be fine in a research setting where, once you've solved the problem, you can abandon it and move on to the next thing. But in a commercial setting (i.e. at a company that builds and maintains software critical for the organization to function) this mindset often starts to impose greater and greater maintenance costs over time.

[0] https://en.wikipedia.org/wiki/Anti-pattern#Big_ball_of_mud



> Lack of familiarity with common data structures and algorithms

This part I 100% agree with. Adapting scientific code is my day-to-day work, and most of the issues in it come down to making things 100x slower than they need to be, and then implementing insane approximations to "fix" the speed issue instead of actually fixing it.

>"Big ball of mud" design

Funnily enough, this is explicitly how my PI at my current job wants to implement software. In his opinion the biggest roadblock in scientific software is actually convincing scientists to use the software. And what scientists want is a big ball of mud that they can iterate on easily and that requires basically no installation. As he sees it, a giant Python file with a requirements.txt file and a Python version is all you need. I find the attitude interesting. For the record he is a software engineer turned scientist, not the other way around, but our mutual hatred for Conda makes me wonder if he is onto something ...

>I think the appearance of "I'm just getting shit done" is often a superficial one, because it doesn't factor in the real costs: other scientists and engineers can't use their solutions because they're not designed in a way that makes them work in any other setting than the narrow one they were solving for.

For the record, my experience is the exact opposite. The crazy trash software produced by scientists, probably written in Python, is often what other scientists can most easily iterate on and use. The software that scientists and researchers can't use is the over-engineered stuff written in a language they don't know (e.g. Scala or Rust) that requires them to install a hundred things before they are able to use it.


> The mindset … might be fine in a research setting

A vast amount of software is written for research papers that would be useful to people other than the paper’s authors. A lot of software that is in common use by commercial teams started off in academia.

One of the major issues I see is the lack of maintenance of this software, especially given all the problems described in your post and the one above. If the software is a big ball of mud, good luck to anyone trying to come in and make a modification for their similar research paper, or commercial application.

I don’t know the answer to this, but I think additional funding for biology labs to hire something like a software developer who is devoted to making sure their lab’s software follows software development best practices reasonably closely would be a great start. If it’s a full-time position where they’d likely stick around for many years, some of the maintenance issues would resolve themselves, too. This software-minded person at a lab would still be there even after the biology researchers have moved on elsewhere, and could answer questions from other people interested in code written years ago.


This is the goal of the RSE field, but it's often still quite rare :(

https://us-rse.org/


That's fantastic, I hadn't heard of this group before! I wish there were a lot more effort spent here.

This seems like a much better way to spend one's software development time and experience than, say, ad-tech... at least in my humble opinion :)



