Why average retention rates can lead to 50% error in CLV (custora.com)
49 points by pospischil on Aug 22, 2011 | 31 comments


And if you think that sucks, wait until you either a) use CLV to justify marketing spend or b) use CLV to justify your valuation with an investor. Oof, self-inflicted damage.

One particular solution is to stop reporting CLV as a single number. I mean, if you happen to know that there are two disjoint sets of Good Customers and Bad Customers, then that is very useful information to anyone who needs to know the CLV number to do their job. "What can we afford to spend to acquire a new customer? I have a new channel I want to try." "It really depends on whether you're getting Good Customers or Bad Customers. We can only pay $50 for BCs, but for GCs we can go $200+."

You then get into issues like "How do I tell the difference between a Bad Customer and a Good Customer at an equivalent vintage where neither has churned yet?" It may be the case that there are behaviors which you can use as a proxy for which group someone is likely to fall in. Dharmesh Shah talks often about a Customer Happiness Index that Hubspot uses, which is essentially a regression that predicts churn rate based on measurable customer behavior. It's all sorts of win to find that something like that works for your business. (Hypothetical example: if Dropbox found that customers who used photo sharing are the best possible Dropbox customers, it would make sense to test things like a) biasing marketing to target photo sharers or b) biasing product design to push photo sharing as a feature.)


Absolutely - the next step is to calculate the CLV of different acquisition channels. Maybe Organic search results in a favorable mix of good vs mediocre vs bad customers, whereas affiliate marketing results in a poor mix.

You can use this information to help inform a new channel decision (a new paid search channel will likely be more similar to another paid search channel than it will be to an affiliate program).

Behavioral triggers (which Custora uses) get more complicated -- but maybe we'll touch on that in a future post.


Cohort analysis and other types of user segmentation can be useful in evaluating the quality of customers from different marketing channels, but you will still run into issues with averages, as discussed in the original article.


A proposed solution, for the critique of any mathematicians here:

Always split each channel into ten 10-percentile bands, instead of just taking an overall average. Then model total expected revenue per channel based on these bands.

Perhaps as few as three 33-percentile or five 20-percentile bands would be enough if the number of samples is low when first testing a channel.
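A minimal sketch of the banding idea, assuming you already have one revenue figure per customer for the channel. The function names and the sample numbers are made up for illustration. Note that with equal-sized bands the modeled total works out to the same thing as the plain average times the customer count; what the bands add is a view of the spread behind that average.

```python
from statistics import mean

def band_means(revenues, n_bands=10):
    """Sort per-customer revenues and return the mean of each equal-sized
    percentile band, lowest band first."""
    ordered = sorted(revenues)
    n = len(ordered)
    if n < n_bands:
        raise ValueError("need at least one customer per band")
    bands = []
    for i in range(n_bands):
        lo = i * n // n_bands
        hi = (i + 1) * n // n_bands
        bands.append(mean(ordered[lo:hi]))
    return bands

def expected_channel_revenue(revenues, new_customers, n_bands=10):
    """Expected revenue from new_customers, assuming new customers spread
    evenly across the observed bands."""
    means = band_means(revenues, n_bands)
    return new_customers * sum(means) / n_bands

# Hypothetical channel with one very valuable customer.
sample = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]
quintiles = band_means(sample, n_bands=5)
projected = expected_channel_revenue(sample, new_customers=100, n_bands=5)
```

In this made-up sample the top band is worth many times the others, which is exactly what a single channel average would hide.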


Perhaps, if you are still fresh in integration and measure theory, you can go for a continuous solution that doesn't require a choice of bands.


You don't have access to the "real" continuous distribution, though, only a finite sample of points from it. Most non-parametric ways of modeling that are going to require some choice of smoothing parameter, either something like a histogram bin width, or a kernel-regression bandwidth (in either case, you can use the data to choose one, using cross-validation).


How about working with the distribution function instead of densities? That way you still have the sampling, but you don't have to decide on a bin size.
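For what it's worth, here is a rough sketch of an empirical distribution function over customer lifetimes. It needs no bin width or bandwidth. It assumes every lifetime in the sample is fully observed, i.e. it ignores customers who have not churned yet; the name make_ecdf and the sample lifetimes are made up.

```python
from bisect import bisect_right

def make_ecdf(lifetimes):
    """Return a function F where F(t) is the fraction of observed
    lifetimes that are less than or equal to t."""
    ordered = sorted(lifetimes)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one lifetime")

    def ecdf(t):
        # bisect_right counts how many sorted values are <= t.
        return bisect_right(ordered, t) / n

    return ecdf

# Hypothetical lifetimes in months.
ecdf = make_ecdf([1, 1, 2, 3, 6, 12, 12, 24])
share_gone_by_month_2 = ecdf(2)
```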


Also, for most SaaS apps, your first-month churn is typically WAY higher than your second, third and fourth. You need to simulate in order to find the true LTV. Also be sure to account for the fact that ARPU typically goes up over time as people upgrade their accounts.
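A rough sketch of that kind of simulation, under stated assumptions: churn is given as a per-month schedule (the last value repeats forever), the customer pays at the start of each month, and ARPU grows by a fixed percentage each month they stay. The function name and all the numbers are hypothetical.

```python
import random

def simulate_ltv(monthly_churn, base_arpu, arpu_growth,
                 n_customers=10000, max_months=120, seed=0):
    """Monte Carlo estimate of average revenue per customer.

    monthly_churn: churn probability for month 0, 1, 2, ...; the last
                   entry is reused for all later months.
    arpu_growth:   fractional ARPU increase per surviving month.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_customers):
        arpu = base_arpu
        for month in range(max_months):
            total += arpu  # customer pays at the start of the month
            churn = monthly_churn[min(month, len(monthly_churn) - 1)]
            if rng.random() < churn:
                break  # customer cancels after this month
            arpu *= 1 + arpu_growth
    return total / n_customers

# Hypothetical schedule: heavy churn in month one, then it settles down.
estimate = simulate_ltv([0.30, 0.10, 0.08, 0.05], base_arpu=30.0, arpu_growth=0.01)
```

Swapping the front-loaded schedule for a flat average churn rate and comparing the two estimates is a quick way to see how much the shape of the curve matters for your own numbers.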


Indeed! That is why it is so important to include heterogeneity of the customer base in calculations. You tend to lose the flighty customers early so retention rates increase the longer a customer has been using the service.
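A small sketch of that effect, assuming a cohort made of segments that each have their own constant monthly retention rate. The segment shares and rates are invented for illustration. Nobody's behavior changes over time here, yet the cohort-level retention rate climbs as the flighty segment drains out.

```python
def aggregate_retention(segments, months):
    """segments: list of (share_of_cohort, monthly_retention_rate).
    Returns the cohort-wide month-over-month retention rate for each month."""
    alive = [share for share, _rate in segments]
    rates = []
    for _month in range(months):
        survivors = [a * rate for a, (_share, rate) in zip(alive, segments)]
        rates.append(sum(survivors) / sum(alive))
        alive = survivors
    return rates

# Hypothetical cohort: half flighty (50% retention), half loyal (90%).
rates = aggregate_retention([(0.5, 0.5), (0.5, 0.9)], months=6)
```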


For anyone interested in some further reading: http://marketing.wharton.upenn.edu/documents/research/Schwei...

I included this technique in some work I did about a year ago, and spoke a few times with the authors. Granted, this is geared towards operations that offer multiple services/varying levels of service as part of a "portfolio". Think telecom and finance. It addresses CLV, but includes a regular reassessment of value based on a customer's pattern of behavior.

tldr; Instead of being service/no service it looks at a customer's propensity to change service (upgrade or downgrade), this includes downgrading to the point of termination.


I've always struggled with CLV calculations–both figuring it out and deciding if it's even valuable to know.

We have customers in the thousands and, while a handful leave every month, many have been paying monthly for 2, 3 or more years. So, doesn't the CLV for a business that has been around for 5+ years change constantly? What good is that?

Also, we have a setup fee for some products. So, once we keep a customer for 30 days (i.e. they're no longer eligible for a refund), they are worth at least the amount of the setup fee, right?

BUT, if my CLV is $1,500 for a product with a setup fee, does that mean it's okay for me to spend $1,200 to acquire 1 new customer? I have a hard time believing that ...


Sounds like you have some fantastic customers/a great product!

It's possible (and likely) that your CLV is changing over time as the mix of customers you are getting is changing - the key is to be able to calculate CLV as early as possible while still getting an accurate number.

If you are going to earn $1500 in profit from a customer, why wouldn't you be willing to spend $1200 to acquire more? The issue you may run into is that, if your $1500 customers found you organically, you can't necessarily expect customers you acquire via different means to be worth the same.

There are some other variables at play as well: are you cash constrained? How soon do you need your acquisition expense to be paid back?
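To put a number on the payback question, here is a back-of-the-napkin sketch. It assumes a flat monthly margin per customer and an optional up-front setup fee, and it ignores churn entirely, so treat the result as a best case. The $1,200 comes from the question above; the $50 margin and $200 setup fee are made up.

```python
import math

def payback_months(cac, monthly_margin, setup_fee=0.0):
    """Months of margin needed to recover acquisition cost,
    after crediting any up-front setup fee."""
    remaining = cac - setup_fee
    if remaining <= 0:
        return 0  # setup fee alone covers the acquisition cost
    if monthly_margin <= 0:
        raise ValueError("monthly_margin must be positive")
    return math.ceil(remaining / monthly_margin)

# Hypothetical: $1,200 to acquire, $200 setup fee, $50/month margin.
months_to_break_even = payback_months(1200, 50, setup_fee=200)
```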


This article stops short of giving the one piece that we need to take action on this: The formula.

So, given a database full of customer records, all of which have a signup date, some of which have a cancel date, what is the formula one would use to determine the average expected "lifetime" of a customer?

I suspect that the author doesn't want to give us this formula, as it's part of the secret sauce that the company behind the blog post sells. Still, maybe if we ask nicely enough we can guilt it out of him?


select count(1) as cnt, age from ( select FN(canceled_at, now) as age from accounts where accounts.signup_date between x and y ) as t group by age;

where FN is a function that takes two dates and gives you an integer unit (usually months.) this will give you the distribution of average account lengths for people that started between x and y. note that x-y should be 1 unit in FN(canceled_at, now), or 1 month (where a month is 30 days.)

Then what you can do is either simulate it, or (look at the data), see where your lumps lie so you can do some better back-of-the-napkin LTV calculations...

of course, this does not take into account upgrades or different account levels, etc..


I'd bet it's different for every company, so learning his formula would not really help. Learning the variables that he generates from his database and uses as inputs to his modeling would, on the other hand, be helpful.


No, it's just statistics.

Sure, the reasons why people are leaving might be different, and their distribution of lifetimes might be the way they are because of business reasons. But the math to determine that distribution will be the same.

I actually went so far as to ask this question over at the Statistics stackexchange site. Being mathematicians, they debated it briefly, then concluded that it was possible to determine, which, to a mathematician, is the same thing as a solution.

I'm still hoping somebody who's done the calculation will share the magic SQL query.
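Not SQL, and not necessarily what the article's author does, but one standard statistical answer to "signup date for everyone, cancel date for some" is a Kaplan-Meier survival estimate, which treats still-active customers as censored rather than pretending they canceled today. A minimal sketch, assuming you have already converted each record to (months observed, canceled yes/no); the function names and sample data are made up.

```python
def kaplan_meier(durations, churned):
    """durations: months observed per customer.
    churned: True if the customer canceled at that point, False if still active.
    Returns [(month, survival_probability)] at each month with a cancellation."""
    events = {}
    for d, c in zip(durations, churned):
        deaths, total = events.get(d, (0, 0))
        events[d] = (deaths + (1 if c else 0), total + 1)
    at_risk = len(durations)
    surv = 1.0
    curve = []
    for t in sorted(events):
        deaths, total = events[t]
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= total  # canceled and censored customers both leave the risk set
    return curve

def restricted_mean_lifetime(curve, horizon):
    """Area under the survival step function from 0 to horizon (in months)."""
    area, prev_t, prev_s = 0.0, 0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)

# Hypothetical data: five customers, two still active.
curve = kaplan_meier([1, 2, 2, 3, 5], [True, True, False, True, False])
mean_months = restricted_mean_lifetime(curve, horizon=5)
```

The mean is "restricted" to the horizon because you cannot know how long the still-active customers will last beyond what you have observed.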


Maybe we're talking past each other. I'm saying that the specific model for one site won't work for another site. But the procedure will work just fine, because the procedure is just regression or any of its cousins.


As mentioned below, it is possible to determine the customer distribution. We employ a few methods to accomplish this. Stay tuned for more content soon.


I have never looked into CLV, but I would:

1) Set different account lifetime intervals (0-3 months, 3-6 months, 6-9 months, 9-12 months, etc...).

2) Determine the percentage of customers we have in each interval.

3) Calculate the Interval Lifetime Value.

4) Now instead of having one CLV, I can say "For every 20 new sign ups we know 3 will stay for 3 months and we will break even on those, 50 will stay for one year and we will earn $1000 total from those, and the rest will stay for x months and we will earn $1500 from those...."
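The four steps above could be sketched roughly like this, assuming you know each past customer's lifetime in months, a flat monthly margin, and a per-customer acquisition cost. The function name, interval edges and numbers are all hypothetical, and like the steps themselves it only covers customers whose lifetime is already known.

```python
from bisect import bisect_left

def interval_breakdown(lifetimes, monthly_margin, cac, edges=(3, 6, 9, 12)):
    """Bucket customers by lifetime and return, per interval,
    (label, share_of_customers, total_net_value)."""
    labels = []
    lo = 0
    for e in edges:
        labels.append(f"{lo}-{e} months")
        lo = e
    labels.append(f"{lo}+ months")

    counts = [0] * len(labels)
    net = [0.0] * len(labels)
    for months in lifetimes:
        # A lifetime equal to an edge lands in the lower interval (3 -> "0-3").
        i = bisect_left(edges, months)
        counts[i] += 1
        net[i] += months * monthly_margin - cac

    n = len(lifetimes)
    return [(label, c / n, v) for label, c, v in zip(labels, counts, net)]

# Hypothetical: $10/month margin, $30 to acquire each customer.
rows = interval_breakdown([3, 3, 12, 12, 24], monthly_margin=10, cac=30)
```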


I understand why CLV is an interesting number. Maybe it is only really useful to know that you can spend money to attract profitable customers. I used to think my business is awesome because I largely get repeat and referral business. When I started thinking about CLV I started to understand that I had no real formula for acquiring new customers. It's important to show you can take active steps to bring in new customers.


It looks to me that those calculations are only correct if you ignore future customers. Only your current customers are used for the calculation.

CLV assumes that everything continues as it is today, including gain and loss rates of customers. If you stop gaining new customers, then yes, it's going to be WAY out.


Customer Lifetime Value is a measure of the value per customer. So if acquisition rates are changing the value of the /customer base/ will change, but the CLV will not.


Future customers obviously affect revenue, but how do they affect CLV?


Future customers would be acquired via different channels than current customers, and therefore they'd have a different CLV (depending on the channel mix).


I stand corrected. Thanks guys.


Building a simple Monte Carlo or Markov chain Monte Carlo simulation of attrition and upgrades seems like one way to predict LTV. Their analysis of why the average is not a great way to calculate it is fine, but I'd be interested in reading about better alternatives.


I was baffled by the site for a moment giving me a blank page. Apparently css sets the opacity of the main area to 0 and javascript has to come in to override it.


Me too.

Text only Google cache so noscript users can read it:

http://webcache.googleusercontent.com/search?q=cache:http://...


The post title should really expand the acronym "CLV" since the title makes no sense unless you already know what CLV is.


If you're using CLV to determine the cap on the cost of acquisition then the average is important unless you know what kind of customer you're buying in advance.


Well, yes: average CLV is important for acquisition, but using an average retention rate to calculate CLV will lead to a grossly inaccurate CLV calculation.
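A toy illustration of the gap, under simple assumptions: each segment has a constant monthly retention rate r, so its expected lifetime is 1 / (1 - r) months counting the first month. The segment mix is invented, not taken from the article.

```python
def expected_lifetime(retention):
    """Expected lifetime in months for a constant monthly retention rate."""
    return 1.0 / (1.0 - retention)

def lifetime_from_average_rate(segments):
    """Average the retention rates first, then convert to a lifetime."""
    avg_rate = sum(share * r for share, r in segments)
    return expected_lifetime(avg_rate)

def lifetime_from_segments(segments):
    """Convert each segment to a lifetime first, then average the lifetimes."""
    return sum(share * expected_lifetime(r) for share, r in segments)

# Hypothetical mix: half the customers retain at 50%, half at 90%.
mix = [(0.5, 0.5), (0.5, 0.9)]
naive = lifetime_from_average_rate(mix)    # based on a 70% average rate
by_segment = lifetime_from_segments(mix)   # weighted mean of 2 and 10 months
shortfall = 1 - naive / by_segment
```

With this made-up mix the average-rate shortcut gives about 3.3 months against 6 months from the segments, an underestimate of roughly 44%. Multiply either lifetime by a monthly margin to get the corresponding CLV.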



