christopheraden's comments | Hacker News

The Ashley Madison Breach comes to mind. If the core demographic cares about not wanting their data on the platform to get out, they will vote with their feet. That said, I think this example is not the norm, and most people probably won't care for most applications. https://en.wikipedia.org/wiki/Ashley_Madison_data_breach


I mean, wasn't it also revealed in the same breach that an overwhelming majority of people on the site were male? I'd guess that would be a bigger motivating factor than the privacy concerns.


> Have fun explaining to your spouse why your household's TV is showing more dating site ads than that of their friends.

The targeting is sometimes only _so_ good. While it works well in aggregate, sometimes the targeting is laughably off, and it also depends on which audiences the advertiser wants to reach.

I get some pretty strange targeted ads through Facebook that don't seem at all relevant, and the "Why Am I Seeing This Ad" dropdown has very nebulous explanations ("Targeting Men between 25-35 in San Francisco"). I'd imagine this problem would be similar with targeted ads on live TV, unless they keep the ads pretty generic or have vastly better targeting than FB or Google.


For some reason I've recently been seeing a lot of dating ads on YouTube, not that I've been searching for anything like that. First it was Christian dating services (I'm not Christian), but that clearly didn't spark my interest, so now they've started showing me ads for Muslim dating services (I'm also not Muslim). I'm excited to see what's next :-)


Are you at least religious? Seems like a very specific subset of dating services.


Hopefully not Shaker dating services.


Remember, targeting doesn't apply to the viewer; it applies to the advertiser.

In other words, they are targeting advertiser dollars not viewer interests.


> I feel like this argument should be a class of fallacy. Lots of things have "existed" for a long time, but that doesn't mean previous iterations were effective.

This is pretty close to Survivorship Bias (https://en.wikipedia.org/wiki/Survivorship_bias). Outcomes of midwifery of yore are no different from present midwifery if you ignore the fatalities and focus only on successful deliveries.


While it is a policy issue at its core, changing law and policy moves at a glacial pace, and it's not even a certainty that it'll get changed at all (the "nothing to hide" defense gets brought up a lot on these matters, and it's a pretty persuasive argument to those that can't recognize the fallacy). Technical solutions have the benefit of being much quicker to enact, albeit in a flawed way that is, if highly successful, a band-aid on a bullet wound.


Sure, using hypothesis tests could pick out some of the structured examples in the Datasaurus, but in practice things are often more subtle. Goodness-of-fit tests for normality, in particular, are a bit thorny: they lack power at small sample sizes and reject normality for slight departures at larger sample sizes. My experience with assumption checking has been that by the time a hypothesis test has sufficient evidence to reject an assumption, you'd usually be able to see the problem visually.
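To make the power point concrete, here's a hedged Python sketch (using scipy, on made-up simulated data, not the Datasaurus itself): a small sample from a genuinely non-normal distribution often survives a normality test, while a huge sample from a distribution only slightly heavier-tailed than normal is rejected almost surely.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Small sample (n=15) from a t-distribution with 5 df: clearly non-normal,
# but Shapiro-Wilk usually lacks the power to reject at this sample size.
small = rng.standard_t(df=5, size=15)
_, p_small = stats.shapiro(small)

# Huge sample (n=100,000) from a t-distribution with 30 df: only a slight
# departure from normality, yet the D'Agostino omnibus test rejects it.
big = rng.standard_t(df=30, size=100_000)
_, p_big = stats.normaltest(big)
```

The specific distributions and sample sizes are arbitrary stand-ins; the point is only the asymmetry between the two regimes.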

Until you get into high dimensions, it probably doesn't hurt too much to visualize the data. Additionally, visualization can help you understand what signal has been left in the residuals (e.g., you fit a linear model but failed to include a quadratic term), which is something hypothesis tests aren't as good at telling you.
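As a sketch of that residual point (hypothetical simulated data, plain numpy): fit a straight line to data with a quadratic component, and the leftover curvature shows up plainly in the residuals.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 200)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.5, size=x.size)

# Fit a linear model that wrongly omits the quadratic term.
slope, intercept = np.polyfit(x, y, deg=1)
resid = y - (intercept + slope * x)

# The residuals still carry the quadratic signal: plotting resid vs. x
# shows a U-shape, and resid correlates strongly with x**2.
r = np.corrcoef(resid, x**2)[0, 1]
```

A residual-vs-fitted plot would show the same U-shape at a glance, which is exactly the "signal left in the residuals" that a single omnibus test won't localize for you.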


Yes, "lacking power in small sample sizes".


A couple of questions.

> OLS works fine in classification problems. And it has advantages.

Do you have more explanation of these advantages? I read through the link you sent, and a bit more about linear probability models. Such things were never discussed in my statistics curriculum (BS, MS, PhD), except to motivate why logistic regression was necessary. I'm not sure I understand the economists' arguments in favor of the LPM. Both the interpretation and the distribution of the test statistics will be totally different under OLS versus logistic regression, and the overall probability of a defunct project is pretty small ( \hat{P}(y=0) = .07 )--small enough that there would be pretty big differences. To be clear, my reservation is with the p-values in the OLS model, not the predictions it generates. While the models agree on the direction of the covariates, the magnitudes are quite different, even when you convert the logit/probit estimates to the LPM's scale.

> Multicollinearity refers to perfect multicollinearity.

Perfect multicollinearity will definitely break the estimation, but even if Score and Comments are not perfectly collinear, it's difficult to talk about each one's individual effect on the probability, which is exactly how coefficients in a (logistic) regression are interpreted. What do the VIFs look like for Score and Comments, in particular?
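For reference, a VIF is just 1/(1-R²) from regressing one predictor on the rest. A quick hedged sketch in Python with made-up, deliberately collinear stand-ins for Score and Comments (not the post's actual data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
score = rng.lognormal(size=n)
# Hypothetical: comments track score closely, plus some noise.
comments = 0.9 * score + rng.lognormal(sigma=0.3, size=n)

def vif(target, others):
    """Variance inflation factor: regress `target` on the other predictors."""
    X = np.column_stack([np.ones(len(target))] + list(others))
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1 - resid.var() / target.var()
    return 1 / (1 - r2)

vif_score = vif(score, [comments])  # well above the usual rule-of-thumb cutoffs
```

A common rule of thumb flags VIFs above 5 or 10 as cause for concern about interpreting the coefficients individually.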

> I mention R² as one measure of predictive power.

But the outcome is binary, so you'll run into an issue similar to Minimaxir's first point about OLS. If you want to talk about predictive accuracy, what about a confusion matrix, misclassification rate, or specificity/sensitivity/F1? Granted, you won't want to predict on the same tagged examples you trained the model on, but maybe you could split them 80-20? Or tag another 20-50? There are also R²-like measures for binary dependent variables (a whole class of pseudo-R² measures).
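None of those measures needs special machinery; a sketch of the bookkeeping in plain Python/numpy (the 0.5 threshold is a hypothetical default, not anything from the post):

```python
import numpy as np

def classification_summary(y_true, y_prob, threshold=0.5):
    """Confusion-matrix summary for a binary classifier's predicted probabilities."""
    y_true = np.asarray(y_true)
    y_hat = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(((y_hat == 1) & (y_true == 1)).sum())
    tn = int(((y_hat == 0) & (y_true == 0)).sum())
    fp = int(((y_hat == 1) & (y_true == 0)).sum())
    fn = int(((y_hat == 0) & (y_true == 1)).sum())
    sens = tp / (tp + fn) if tp + fn else float("nan")  # a.k.a. recall
    spec = tn / (tn + fp) if tn + fp else float("nan")
    prec = tp / (tp + fp) if tp + fp else float("nan")
    f1 = 2 * prec * sens / (prec + sens) if (prec + sens) > 0 else float("nan")
    return {
        "sensitivity": sens,
        "specificity": spec,
        "precision": prec,
        "f1": f1,
        "misclassification": (fp + fn) / y_true.size,
    }
```

All of these should, of course, be computed on the held-out split, not the training examples.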

I would be curious to see the relationship between these predictors and the response. In my experience, linearity is a strong assumption to make; for something like comments or score, I'd expect that past a certain threshold, additional comments/score add no extra value. Are log-score and log-comments linear over their entire support?


What about something à la Neovim? I've only ever looked at TeX from the perspective of a user (I don't program my own macros too often), so I don't know how hard it'd be, but why not a language overhaul?


The issue is that so many man-centuries (man-millennia?) are invested into TeX & LaTeX that sinking time into something else is very, very expensive (much like trying to build a better CPU than amd64 or arm64).

TeX is really, really amazingly powerful. It can do almost anything a typesetter could want to do, fairly easily, and it can do just about everything, one way or another. And its output is heart-achingly beautiful. Sadly, the code necessary to achieve that output ranges from…heart-achingly beautiful to heart-breakingly ugly.

There are other projects out there, of course. I do think that TeX & LaTeX are close to a local maximum, if not all the way there.

XML, in comparison, is a booger joke.


> I do think that TeX & LaTeX are close to a local maximum, if not all the way there.

Funny you should say that, since TeX's version numbers (in part) approach pi, and IIRC the version will become exactly pi upon Knuth's death.


XML isn't even really in the same field as TeX. TeX is for typesetting, that's all it does. XML is for encoding general data. It's like saying PDFs are way better than protocol buffers.

TeX is incredibly powerful, but it's also incredibly idiosyncratic. I also personally think Computer Modern is an ugly font.


TeX long ago gained the ability to use any OpenType/TrueType font on your system, so dislike of CM is really not a reason to avoid using it. (I happen to like CM, but I know it's disliked by multitudes.)


> XML is for encoding general data.

No, it's really not. XML is a markup language, not a data-encoding language. JSON, S-expressions, ASN.1 &c. are all data encodings; TeX, LaTeX, HTML and XML are markup languages.

> TeX is incredibly powerful, but it's also incredibly idiosyncratic.

Agreed.

> I also personally think Computer Modern is an ugly font.

Eh, it's not great on-screen, but it looks pretty good on paper. But TeX & LaTeX have supported multiple fonts since the beginning.


> XML, in comparison, is a booger joke.

A dried up booger on the floor.


The trouble is the ecosystem of packages that people would have to redo.

If you replaced TeX, LaTeX itself would need replacing too, as would every single LaTeX package and class you'd ever want to use; things like microtype would have to be rewritten.

To be honest, the multi-pass deal isn't that bad, but the macro expansion system is crazy complicated. Every once in a while after working in LaTeX I'll get the feeling I understand it, but that feeling inevitably dissipates after ten minutes or so.


Very much this. As far as I can gather, TeX itself is actually rather simple, considering that it basically does what PostScript does. Now, while coding raw PostScript might be fun as an exercise, most would prefer not to. LaTeX lands somewhere between PostScript and something higher up. To meaningfully replace TeX/LaTeX/Metafont, plus even just a selection of "the best" LaTeX packages, would be a herculean task.

Making a "new" TeX probably wouldn't be that hard - but it's also something that wouldn't be that useful. I would very much like something that's both simpler and also keeps some of the lessons learned/implemented (word spacing/splitting, page layout, page breaks etc).

As for other "tools in the same space", I do like pandoc a lot. I want to like Python's ReST (reStructuredText), but that's a package I feel is in need of a rewrite/redesign. There are many good ideas there, but figuring out how to take a simple document and produce simple, modern (preferably somewhat semantic) HTML, or a decent-looking PDF without needing all of LaTeX/TeX Live on hand, isn't easy.

Rewriting ReST tools would be a lot of work, but I think if one didn't try for 100% backwards (output, plugin) compatibility it might be worthwhile.

The astute reader will notice that ReST/Pandoc deals with structured documents, and not really layout for paper/screen (both use TeX/LaTeX as an output target/pipeline). I don't know of anything that comes close to TeX/LaTeX for "rasterized" output.

On the other hand, I also don't know of any package/combination that'll make TeX/LaTeX produce anything but messy, 90s-style html -- that generally looks awful. Even if you were to try and force a modern set of CSS down over the resulting mess. If anyone knows of a modern hypertext package for TeX/LaTeX or some similar tool, I'd be happy to be proven wrong.


> I don't know of anything that comes close to TeX/LaTeX for "rasterized" output.

XSL-FO at one point seemed to have aspirations in that direction...


XeTeX and LuaTeX are projects you might be interested in.


I use SAS professionally at my job, and R in all my academic/hobby work. R has a couple of packages that give functionality similar to PROC SQL (about 95% of my SAS workflow, since it's far nicer than data steps for a lot of things). There's an ODBC package (RODBC), as well as sqldf, which lets you use SQL queries to manipulate data frames in R.

While there is (almost?) always a way to do a SQL query using idiomatic R, I have to admit that sometimes my brain thinks up a solution in SQL faster (a product of upbringing).


>What kind of negative effect might result from a bunch of unqualified high school teachers teaching CS poorly? Is some exposure better than none regardless of teaching quality?

Is this problem similar enough to math that we can draw on data about mathematics education in schools? You can make a lot more money in industry with a math degree than you can teaching math to grade schoolers.

Certainly, we hear tons of stories about unqualified math and science teachers and all the harm they do to the desire to learn math, but I'm sure some students who wouldn't normally have much desire to immerse themselves in math still get valuable exposure to it.


I agree it's a barrier to have such crazy prices, but there are free resources available, especially on a topic as popular as the Grammar of Graphics. Hadley Wickham (in my mind, synonymous with the concept, since he implemented Wilkinson's ideas in R's ggplot2 package), for instance, has numerous materials on it, including a short primer (http://vita.had.co.nz/papers/layered-grammar.pdf). It might not be as exhaustive as the Wilkinson text, but surely there's enough material out there to implement GG in JS, especially considering there are successful implementations in other languages?

