Analytics Data on SQL Database – Best database and table design for billions of rows of data

This is not an article that I am writing but I’m mostly quoting a great gem on a stack overflow answer I came across when I was researching a DIY way to store and create analytics reports for a small to medium size project. The project’s type doesn’t matter because this is a generic problem and great solution.

Why not use analytics tools/services?

I am in constant search of the better alternatives or simpler versions of the solutions we use at my team. We certainly use many services and tools from open source to licensed software. But I still choose to understand, know and be able to apply these solutions by myself on a custom solution where I have full control over the data, output and user experience.

So I casually read and research how others approach the issues or queries wander in my mind.

Then I stumbled upon this stack overflow thread with a brilliant answer that contains steps to try out from scratch that I suggest any engineer to just try and play on their own time.

PostgreSQL and BRIN indexes

To create a sample table with 1.7 billion rows of a sample sensor data (temperature read from the sensor with timestamps in the logs):

So it took 22min to create the table. Largely, because the table is a modest 97GB. Next, we create the indexes,

It took a good long while to create the indexes too. Though because they’re BRIN they’re only 2-3 MB and they store easily in ram. Reading 96 GB isn’t instantaneous, but it’s not a real problem for my laptop at your workload.

Now we query it.

Update with timestamps

Here we generate a table with different timestamps in order to satisfy the request to index and search on a timestamp column, creation takes a bit longer because  to_timestamp(int) is substantially more slow than  now() (which is cached for the transaction)

Now we can run a query on a timestamp value instead,,

Result:

So in 83.321 ms we can aggregate 86,401 records in a table with 1.7 Billion rows. That should be reasonable.

Hour ending

Calculating the hour ending is pretty easy too, truncate the timestamps down and then simply add an hour.

It’s important to note, that it’s not using an index on the aggregation, though it could. If that’s your typical query you probably want a BRIN on  date_trunc('hour', tsin) therein lies a small problem in that  date_trunc is not immutable so you’d have to first wrap it to make it so.

Partitioning

Another important point of information on PostgreSQL is that PG 10 bring partitioning DDL. So you can, for instance, easily create partitions for every year. Breaking down your modest database into minor ones that are tiny. In doing so, you should be able to use and maintain btree indexes rather than BRIN which would be even faster.

This is a great answer to the topic around working with analytics data on SQL databases. Finally thinking about table partitioning is always a good plan-ahead strategy for any data gets over millions and have distributed data around timestamps.

Reference: Best database and table design for billions of rows of data

Integration and verification of iOS In-App Purchases

 Economy models in iOS apps use In-App purchases become very popular. Lots of developers pick iOS environment because of the flawless payments through iTunes.

If you’re planning to have a monetization model in your app, it has to go through Apple system and you have to use in-app purchases. There is no other way to accept payments from your iOS apps. There are pros and cons of using Apple in-app purchases. I’ll try to explain some of them.

The biggest con is Apple takes 30% of your sale. And another con is, there are difficulties and grayed areas in the integration of in-app purchases to your app and back-end. But the pros make all even. Because delegating whole payments to Apple is gonna affect your sales because Apple makes it so seamless that it reduces all money related steps to only one confirmation box. So it changes the purchase experience and makes it what it supposed to be. Most of the checkout or payment experiences on web, faces lots of drops because of unnecessary and boring steps like putting your credit card info, trying to give the trust to user that you’re a legitimate company and have legitimate payment system that you will not sell their info out or you won’t let hackers to pick your customer info up. All those buying experience changed in iTunes payments. So this is why you should want to integrate in-app purchases. Continue reading “Integration and verification of iOS In-App Purchases”

Profiling and optimization on Facebook PHP SDK

If you’re developing PHP based Facebook application, you might want to use (or already using) Facebook integration little more than just authentication and identification of your user. Even if you have the simplest Facebook app and using PHP SDK, you probably have regularly done API calls.

You write your app and you start to see performance issues. You start to optimize your database interactions, PHP code optimization, after you done with your application optimization if you still have performance problems it’s possibly from Facebook calls. Since you use SDK, you might not know what’s happening with Facebook communication. So you want to do profiling between your app and Facebook API servers.

You can add a basic timing profiling to your API calls to see how many calls you do, what kind of calls they are and how long they take to run.

Let’s dive in SDK, modify it a bit and start getting profiling information. Here is the actual method you need to modify in base_facebook.php file:

and we’re modifying it like this:

It simply appends a global array named “facebookApiCalls” and adds the API call details as “args” and timing as “duration”. So at the end of your page logic code, you can print this information after sorting it by duration and you can also filter to show only slow ones (for instance, the ones took over 200 milliseconds).

After this profiling you can start to identify the slow calls, also if you do same calls multiple times because of recursing, recalls etc…, you can see and optimize, combine them.

This optimization is not only a performance tweak for the user, also it will decrease the number of calls made between your server and Facebook servers.

Database Integration in PHPStorm, PyCharm or RubyMine

JetNrains released better database integration to their IDEs which applies to PhpStorm, PyCharm and RubyMine. They had database integration with some level of functionality but recently they released a video with their latest updates which is included in this post. I didn’t use database integration before in PHPStorm but i tried after seeing this video and i found it very productive and helpful.

After installing Java connectors (which you can do it with no extra effort) you’ll be good to go for connecting your database with many different database engines in your project. With database integration, you can review the structure of your database, access and manage your data and develop your SQLs with lots of cool features. These stuff are pretty standard so far, almost like a replacement of PHPMyAdmin. Nothing very new.

I found a particular connection between my code and my database very helpful, and that is, you can develop your SQL while you write your code, or you can test and run your SQL from your code without touching, copy/pasting. Also it does code IntelliSense while you write yourSQL which saves some time from mistypes and makes sql development more fun.

Check this video out for feature tour:

Source: http://blog.jetbrains.com/webide/2012/11/sql-support-and-database-tools/

PHPStorm: Most advanced PHP IDE so far

I’ve been using PHPStorm from day one of their beta release, and very happy with it.
They enhanced Java based NetBeans in the beginning, but it’s completely boosted with a lot of features.

Biggest problem developing web projects using PHP is the lack of tools and big effort requirements for creating a stable integrated development environment. There are very good simple and clean editors but none of them is not farther than a code intellisense enabled editors. What I mean is there are debuggers, advanced editors, database management tools, but all of them has their own ways, not communicating and not integrated. And it varies on different operating systems.

When I first tested phpstorm in the beta times, they had this minimal advanced editor with some half working hard to configure add-ons like svn support, debugger integration etc but wasn’t easy to get it up running. But they improved the initial configuration steps much easier, they touched lots of add-ons to get them more integrated with less configuration and they started to support modern languages for different web technologies (html5, less, sass, haml). Here is a couple of features that I like and probably you’ll find them very usefull as well.

Code Intellisense is not just for PHP, also most of the languages that you use for general PHP based web project (HTML, Javascript, CSS, XML). Also, code intelligence supports most of the PHP, Javascript frameworks and helps you to get faster coding.

Debugging PHP runtime with xdebug, you can catch, stop and debug your PHP app while it’s running. Also, makes the error handling way easier.

Version control system integration allows you to integrate your svn, git projects, access versions and manage working copy.

Database connectors support all SQL engines that Java not just allows to browse, alter your database structure also there is a database console that you can use code IntelliSense when you develop your SQL. This is a common feature for most IDEs so far but PHPStorm also uses database connections for every project when you write/browse or debug your PHP code if it’s running a SQL. You can run or use code IntelliSense when you’re writing your SQL in your code.

Also, PHPStorm has other ton of features like automatic deployment, automatic upload over FTP/SFTP protocols, zen coding, code snippets etc…

They released 6 major versions in 3 years that was basically touched version of NetBeans in the beginning but now it gives totally enhanced and different coding experience. Unfortunately, you need to pay $100, the first time and it gives free updates in minor releases. But if you want to update in major releases you need to upgrade your license in 1 year periods for $50. But it’s nothing compared to what you get.

JetBrains also develops most of the features in PHPStorm for their common product base which you can have similar or same features in their other IDEs for Ruby and Python developers. If you develop Python or Ruby, you should check PyCharm and RubyMine out.

PHPStorm’s homepage: http://www.jetbrains.com/phpstorm/