返回列表 发帖

Excel - fitting distribution

I was asked to "fit a data series" into distribution in excel. I seriously have no idea what this actually mean. I googled it and found some shareware but none explain what exactly does it mean. Can someone help?

Thank you in advance.

What are you trying to accomplish with this info? Do they just want you to build a histogram?

TOP

Usually it involves trying to fit a histogram of a distribution to some kind of theoretical distribution. So, for example, if it were a normal distribution, you'd be looking for the mean and standard deviation of a normal distribution that makes for a best fit between the theoretical and the actual data. If it's another distribution, there are different parameters, and you want to try to find the ones that make for the best fit.

You may have to transform the data before you do that, by the way. If you have levels data for stocks, you might need to turn that into returns to do the fitting. Or logged returns.

How you measure the fit is another challenge. The statistically purest way would be to run up a chi squared test and check for (lack of) significance. A more quick-and-dirty method might be to compute the difference between expected frequencies and observed frequencies for each bin of the histogram and try to minimize the sum of squares of the differences (or sum of absolute values of differences). You could then have solver try to solve for the parameters of your distribution while trying to minimize the sum-of-squares.

It's all a bit challenging if you don't know what distribution you are trying to fit.

TOP

Bchad's approach will work nicely if this is a one time thing, if you're going to be doing this regularly, you can buy excel add-ins that will do this with the click of a button. @Risk is a great program that has this capability in addition to hundreds of others. It's wildly expensive, but very powerful. You might be able to download a demo.

TOP

Remember to delete those pieces of data preventing a smooth distribution fit.

TOP

Inner Evil Voice Wrote:
-------------------------------------------------------
> Remember to delete those pieces of data preventing
> a smooth distribution fit.

I like to call this removing outliars, as in "I removed the outliars because they cannot be trusted. What's that, you disagree? If you're in cahoots with the outliars you must be a liar yourself, shut up."

TOP

Inner Evil Voice Wrote:
-------------------------------------------------------
> Remember to delete those pieces of data preventing
> a smooth distribution fit.


I now understand your username much better. ;-)

TOP

Hahaha, thanks guys for the prompt and insightful feedback.

My task is that I have a return series and I'm not sure what distribution it is so without jumping into conclusion that it's gaussian, I like to find another way to test it.

jmh530, you're correct that we don't want to pay anything. I work in a non-for-profit organisation so any cost saving is good saving. I used R briefly to do some cluster analysis but never looked into it more. I'll try that.

I'll try Bchad's approach shortly and yes...I'll "remove the outliers"...hahah

Much appreciated.

TOP

Then a first step would be to ahead and look at the skew and kurtosis figures. If it's normally distributed, it should have 0 skew and a kurtosis of about 3. Depending how different these figures are from that, it may or may not be necessary to go through the trouble of figuring out another distribution. For zero skew and high kurtosis, some people use a t-distribution with a smaller number of degrees of freedom.

Part of the problem is that it's often difficult to figure out what distribution you ought to be using. Usually it's the underlying model of probability that determines what distribution you expect... just throwing in a bunch of distributions and seeing which one fits best is effectively data-mining and often just as problematic as assuming something is normally distributed. Which distribution to use generally comes out of your theoretical understanding of what's going on - the parameters are what you determine from empirical sources.

Remember that if you are doing stock returns, you'll want to be looking at ln(total return) when you are doing the fit.

TOP

Thanks BChad. Questions... elementary questions.

1. Why ln(total return)?
2. my first set of data is inflation and inflation bond return. Then it'll move on to something else. Both series does not have large kurtosis nor skewness so I think you're right that normal is just as good as any others?

TOP

返回列表