I think I’m obsessed with numbers. They give me a feeling of control. Page views, trends, visitor counts and more. Not measuring things makes me sad. If we have historical data, we can check whether our changes worked. Are certain topics more popular? Which stories are more popular on Twitter compared to Facebook? There is an infinite number of questions.
I’d like to build a better analytics engine for us. I’ll explain my constraints and how I’d approach it. My primary plan is that someone will say – “just use X”. If that fails, I can still build it myself.
Problem definition
We have multiple WordPress installations with overlapping authors. They write blog posts that are shared to Facebook and Twitter. Each post can include multiple embeds that the authors produce – SoundCloud, YouTube and similar. For each blog, we can also query the Google Analytics API and get stats on sessions, page views and time on site.
There are two primary limitations of these external data sources. Firstly, we’re rate limited, so we can only query them about once a day per post URL. Secondly, we mostly get aggregated data.
Facebook, for example, only gives us the total number of likes so far, so we need to subtract the previous day’s value. This way we get the number of likes for that day.
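The subtraction step can be sketched in a few lines of Python. This is only an illustration, assuming we store one (day, total) pair per post URL per daily query; the function name and the sample numbers are made up.

```python
from datetime import date


def daily_deltas(snapshots):
    """Turn running totals into per-day increases.

    `snapshots` is a list of (day, total) pairs for a single post URL,
    one pair per daily query (hypothetical storage format).
    """
    ordered = sorted(snapshots)  # oldest first
    deltas = {}
    for (_, prev_total), (day, total) in zip(ordered, ordered[1:]):
        # A total can also go down (for example a removed like),
        # so the difference is kept as-is, even when negative.
        deltas[day] = total - prev_total
    return deltas


# Made-up sample data: total likes seen on three consecutive days.
likes = [
    (date(2015, 3, 1), 120),
    (date(2015, 3, 2), 150),
    (date(2015, 3, 3), 155),
]
print(daily_deltas(likes))
```

The first day has nothing to subtract from, so it produces no value. If a daily query is missed, the next difference covers more than one day, which is something the real thing would have to handle.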
Potential
Having all the information in one place would allow us a couple of things:
- Weekly reports for authors – encouraging them by showing how their stories did
- Information for content editors on what got the most attention that week
- Identify old content that suddenly got interest
- Get information on the success of embedded content (SoundCloud, YouTube)
- Develop customised indicators – authors with the most viewed YouTube videos
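As an example, the last indicator in the list could be computed roughly like this. It is a sketch under assumptions: the record fields (`author`, `provider`, `views`) and the sample values are hypothetical and don't come from any real API.

```python
from collections import Counter


def top_authors_by_video_views(embeds, limit=3):
    """Rank authors by the summed view counts of their YouTube embeds."""
    views = Counter()
    for embed in embeds:
        # Only YouTube embeds count towards this indicator.
        if embed["provider"] == "youtube":
            views[embed["author"]] += embed["views"]
    return views.most_common(limit)


# Made-up records, one per embed found in a blog post.
embeds = [
    {"author": "ana", "provider": "youtube", "views": 400},
    {"author": "bor", "provider": "soundcloud", "views": 900},
    {"author": "bor", "provider": "youtube", "views": 250},
    {"author": "ana", "provider": "youtube", "views": 100},
]
print(top_authors_by_video_views(embeds))
```

The same pattern would work for the other reports: filter the records, sum them per author or per post, and sort.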
Potential Solution?
When researching this topic, I found a software stack that almost fits. It’s the ELK stack: Elasticsearch, Logstash and Kibana. Logstash collects and transforms incoming data, Elasticsearch stores and indexes it, and Kibana supports displaying the data in many different ways.
The other approach would be to just code it in any web framework. But that seems like a huge duplication of work.
Technical Questions
Would the ELK stack work? Can Logstash provide an input filter that will automatically normalise the data for me? Is it the right technology stack at all?
Is there anything that solves this in a much better way?
Content Questions
Is it worth building this at all? Is it a good idea to attach numbers to journalists’ work? Did I miss any questions that would be worth exploring?
Would you use such a service?