Analytics Data on SQL Database – Best database and table design for billions of rows of data

This is not an article that I am writing but I’m mostly quoting a great gem on a stack overflow answer I came across when I was researching a DIY way to store and create analytics reports for a small to medium size project. The project’s type doesn’t matter because this is a generic problem and great solution.

Why not use analytics tools/services?

I am in constant search of the better alternatives or simpler versions of the solutions we use at my team. We certainly use many services and tools from open source to licensed software. But I still choose to understand, know and be able to apply these solutions by myself on a custom solution where I have full control over the data, output and user experience.

So I casually read and research how others approach the issues or queries wander in my mind.

Then I stumbled upon this stack overflow thread with a brilliant answer that contains steps to try out from scratch that I suggest any engineer to just try and play on their own time.

PostgreSQL and BRIN indexes

To create a sample table with 1.7 billion rows of a sample sensor data (temperature read from the sensor with timestamps in the logs):

So it took 22min to create the table. Largely, because the table is a modest 97GB. Next, we create the indexes,

It took a good long while to create the indexes too. Though because they’re BRIN they’re only 2-3 MB and they store easily in ram. Reading 96 GB isn’t instantaneous, but it’s not a real problem for my laptop at your workload.

Now we query it.

Update with timestamps

Here we generate a table with different timestamps in order to satisfy the request to index and search on a timestamp column, creation takes a bit longer because  to_timestamp(int) is substantially more slow than  now() (which is cached for the transaction)

Now we can run a query on a timestamp value instead,,

Result:

So in 83.321 ms we can aggregate 86,401 records in a table with 1.7 Billion rows. That should be reasonable.

Hour ending

Calculating the hour ending is pretty easy too, truncate the timestamps down and then simply add an hour.

It’s important to note, that it’s not using an index on the aggregation, though it could. If that’s your typical query you probably want a BRIN on  date_trunc('hour', tsin) therein lies a small problem in that  date_trunc is not immutable so you’d have to first wrap it to make it so.

Partitioning

Another important point of information on PostgreSQL is that PG 10 bring partitioning DDL. So you can, for instance, easily create partitions for every year. Breaking down your modest database into minor ones that are tiny. In doing so, you should be able to use and maintain btree indexes rather than BRIN which would be even faster.

This is a great answer to the topic around working with analytics data on SQL databases. Finally thinking about table partitioning is always a good plan-ahead strategy for any data gets over millions and have distributed data around timestamps.

Reference: Best database and table design for billions of rows of data

Make your car smarter with Automatic

After I got my last car about 8 months ago, I’ve invested some money to make my car smarter like getting Apple car play installed. One of the things I did was to install a mini device to my car that makes my car a little bit smarter. The device name is Automatic.

This tiny gadget connects to the car’s diagnostic port which it receives it’s power and read and monitor car’s data like engine lights and stuff. The device contains GPS tracker and a sim card to continuously have low-speed connectivity like GPRS. This is more than enough to store it’s monitoring data in the cloud.

Here is the cool stuff you can do with having automatic on your car:

  • Parking tracking – probable the best feature. I hate the idea of trying to remember where I parked in the street. This removes the need for that. Open the app and check where the car is 🙂
  • Track when you started and finish driving. It’s like anti-theft alarm. You get a notification if someone starts the engine – sweet
  • Track the whole tour you drove. It draws the driving path on the map and you can see your driving history like uber receipts.
  • These driving records also contain avg speed, gas consumption…
  • Driving style – how often you stop. how aggressive you accelerate…
  • IFTTT integration – this opens the possibilities 100 times. You can set triggers like driving away from am geofence, or arriving at the geofence. This allows setting smart behaviors like Turn on garage lights when I arrive home.

It’s a little bit of a novelty features but if you are data nerd like me, it’s automatic tracking of your driving data which is enough value to me.

https://automatic.com

Using Heroku for a quick development environment

Heroku is an industry-changing service that is established in 2007. It transformed how developers create and deploy apps today. With its add-ons marketplace, Heroku became the development hub that you can easily enable 3rd party cloud services. These services can be in many different categories that a web application may require. From database services, caching, image processing to mail delivery and so on…

Heroku supports many modern development languages that are actively used with big communities like PHP, nodejs, ruby, python, go, java… The beauty of the Heroku applications is that, managed by Heroku and very very easy to understand. They are also very easy to scale, deploy apps in Heroku infrastructure… All Heroku apps are deployed to given app name’s subdomain under herokuapp.com or can be easily set to have a custom domain for free.

Essentially, Heroku runs on a command line interface and an internal git repository to manage versions of your code. When you set up a new project folder, Heroku CLI tool registers your app and assigns a git repository. Heroku CLI doesn’t initiate git repository on your folder, so if it’s a non-git folder, you need to git init on your project folder first.

I highly suggest all developers adapt Heroku in their workflow, at least for the sandbox & playground purposes.

I have created some boilerplate repositories in the past:

Time to take a break from your computer

I firmly believe that having regular breaks from long work sessions is a necessity. We work sitting long hours without much movement. It’s often we forget to take long or short breaks when we work on a task that occupies our brain. Also sometimes it’s hard to break a focus session.

Without a tool reminding us of the breaks, it’s very difficult to track when we started working and when to take a break. GTD methods like Pomodoro helps to track these moments but even with planned natural work blocks we miss taking a break from the screen.

I started to use a tool to remind break time with a schedule. First I used a paid app called Timeout.app. Then one of my colleagues found the open source alternative of the same tool. https://github.com/hovancik/stretchly

I’ve forked and modified the look and feel and some of the element’s positions for my own taste https://github.com/mfyz/stretchly

I highly suggest a timeout app like this one to be embedded in the workflow. Now I take much more regular breaks with the help of a tool like this. Sometimes I feel it becomes annoying because I feel I’m very focused. I don’t want to break the focus session but I also acknowledge the importance and the value of giving opportunity and space to our minds with these breaks.

The app is pretty straight forward to use. You set micro and regular breaks with an interval. I take 10 seconds micro break in every 10 minutes. Every after 2 micro breaks, I take a regular break of 5 minutes (every 30 minutes)

https://github.com/mfyz/stretchly

Work in a distraction free mode on your computer

I always tune my work style and seek for ways to increase my focus and productivity. From my previous posts (screen-less saturdays), you can see I am also sensitive to the screen time the distractions comes with the screens. There are endless ways to waste time as well as get distracted on screens as things pop-up. Namely notifications. God, I started to hate notifications. So much noise!

I’ve been using an app called focus.app for last few years here and ther. I’ve have been incorporating it to me pomodoro-like sessions. Focus is a paid app but cheap that helps tremendously to keep my focus together while I set a session for myself to be distraction-free and get stuff done.

The app is very minimal, sits on your menubar and you simply toggle in focus and unfocus modes. When in focus mode, app blocks predefined and extendable list of websites and apps. If these pages are open, they show an inspirational quote. If a blocked app is open when getting into focus mode, focus.app closes the app.

I set up my pomodoro length sessions with blocking all communication apps. Also turning my mac’s do not disturb mode on and with a custom script. With this scripts, I set my slack status to do not disturb so my team mates can see that I’m in focus mode and will not get response from me right away.

The last two things I explained is unfortunately done with a bash script. This script runs from focus app when getting in and out of focus modes. I also use some additional scripts that re-opens all apps and restore my “connected” work session after a focus session is completed.

I highly suggest focus to anyone can get easily distracted with an email they received, or a thing they wanted to check in twitter.

https://heyfocus.com/


Quick and dirty set up Graylog in 5 minutes with docker

Docker made things super easy if you are curious about a new open source tool to try and even use it with isolated installations on your machine. In this article, I’ll show quick steps to install and give graylog a try with a simple nodejs application to send logical errors to graylog instance.

1) Copy the docker-compose.yml file content below to a file then run:

2) Login to graylog with opening http://127.0.0.1:9000/ in the browser
Username: admin
Password: admin

3) Configure inputs: Go to System > Inputs
Add new “GELF UDP” configuration as global input using port 12201

4) Run the simple nodejs application below to send logs to graylog. First init npm and install graylog2 package from npm with:

docker-compose.yml

app.js

Best GTD method for geeks – Todo.txt

For most of the GTD (Get(ting) Things Done) mastery student, there is a constant research of better “todo” app or tool. I’ve been in this for a very long time and used many apps. Desktop, mobile, command line, cloud, API… You name it, I’ve probably used it for some time.

At the end, I found myself staying very plainly managing my todos without needing a lot of features. In fact, I needed not to worry about the features of the apps I used. This especially become an issue as I’m OCD (obsessed about “order” and tidy) and a little bit ADD (regularly distracted). When you have a todo app that looks ugly and you need to use their features in order to clean things up, it eats up your time aside of actually focusing on your todos.

Methods and apps

There are also a million apps (I wrote 4 of them for myself) does a combination of “todo” management and a specific type of method. Like pomodoro, or kanban or whatever is out there. This gets more dangerous because the method is actually completely independent from what the todo is, where it lives and how it lives. It can be written on a paper list. For instance, if you do pomodoro, the best way to do it actually use a kitchen timer. Literally, use that old school timer to perform your pomodoros. Otherwise, I almost always find it time consuming to think that things will be more connected and automated when you have a todo app that does the pomodoro. So you can click to a todo and a button to do pomodoro of that item. It sounds good but it’s opposite in practice.

Plain formats works best

For long time, I used cloud based (to sync between my devices) tools. I used evernote, then quip, then trello at some point, then few more. But I found it, the simplest when I can simply copy paste stuff to move around. Because you’ll be consistently re-prioritizing your todo list, editing, adding, removing, marking things done. It’s just how the process of GTD works. That’s why you need a method that is the most convenient and requires less adaptation and portability between platforms and environment. There are few fancy stuff you may want to have like:

  • A programmable interface (API/CLI) – for instance to have your top 3 todos for the day appears on a screen somewhere. Or query the last completed tasks.
  • Color coding or highlighting at least to distinct what’s done and what’s not done. Ideally, when you’re done with stuff, it should disappear from your screen but in some cases, you want to see them at least until the end of your day to be able to review.

Todo.txt

After many years trying different things, I came across with the todo.txt format. It’s a very low level and with few simple rules to give you the freedom to use whatever tool you want, wherever you want with having additional capabilities with community implementations on CLI, cloud, mobile etc…

https://github.com/todotxt/todo.txt

todo.txt format is so simple that is explained in one annotation below:

To be honest, I don’t use almost any of these things except the “done” marker. So for me, it’s as simple as todos are either not done or done. That’s it. What I want to do manually is always re-ordered them and have separators (which I simply use 3-5 dash characters “——” as an extra line).

I love this format is because I use it in a few different tools on my platforms. Wasn’t super happy with the desktop solutions, so I forked and enhanced a simple code editor written on electron/nodejs. Added a few capabilities and adjusted the color schema to my liking and open sourced published it (You can find, download and contribute to it here: https://github.com/mfyz/todox).

On mobile, it’s not that easy to have a custom code editor without getting my hands dirty with a lot of native coding which I felt lazy. Also, another point that I had to figure out was the sync between my devices. I live on apple ecosystem so I simply used iCloud drive of the text editor app I use on my iOS devices (Textastic).

Texastic supports textmate and sublime bundles (including custom syntax support and themes). I installed a sublime text implementation of todo.txt format and had color coding which all I needed on my mobile devices. Most of the time my activity on my mobile devices are simply adding new stuff to the list or mark them done.

Taking long (scrolling content) screenshots on Mac

Sometimes we need to take a screenshot of a long content mostly from scrolling applications. Most common example of this is full-length web page screenshot. There are chrome extensions we can use for taking full-length website screenshot. But there is not an easy way to take screenshot from other apps like native desktop apps or email content from mail clients.

XNip Screen Capture Tool

We can use Xnip Screen Capture tool that has all of the common screen capture software features and a feature We can use for taking long content screenshots called “Scrolling screenshot”

It’s is a freeware with upgrade ($2/yr subscription) but works perfectly for this purpose without the upgrade (it leaves watermark that can be cropped easily if needed).

http://xnipapp.com/

http://xnipapp.com/scrolling-capture/

https://itunes.apple.com/us/app/xnip/id1221250572?mt=12

I’ve been doing Screen-less (no screen) Saturdays more than a year now

I firmly believe screen-less or no screen time is a necessary concept for digital dwellers. Screen-less / No screen is very much like sabbatical practice where you don’t interact with technology as much, or to be more specific don’t interact with screens.

Let’s deconstruct this little bit.

Working with screens

I’m an internet worker which is my job to produce stuff for people to consume on their technological devices. It’s my job to think, create and bend the rules with computing and make technology smarter. Most of my day goes into research, write, plan, manage and do all of these on computer. Usually spend like 12-15 hours on screens.

There are so many occupations not centered around creating technology but using technology to create. By nature many jobs are very similar to mine, spending almost all the time behind screens while working.

Couch potato with screen

But there are a good chunk of humanity just “watching”. Probably majority of the people spending time on screens is just consuming. It’s a natural transition from non-interactive television era to screen time. Kevin Kelly has a great definition for this and he calls it “screening” not watching, not reading… We just scroll endlessly.

Too much screen time, too many screens to look at

Not just having A screen, we have screens everywhere. Phones, tablets, computers, tv screens for constant information for some jobs (like stock tickers, performance dashboards etc..) and finally wearables. We carry multiple screens with us all the fucking time…

Not a surprise everybody is ADD now

I’ve been doing reads on many different topics related with technology’s impact. Recently how screens affect little children since I have two little babies now, more curious on this topic these days.

What I see as a consistent pattern, mostly conclusions on these reads is screens are bad for certain period of childhood causing loss on attention span, making our brains more distracted. I understand the reasoning that our brain is designed to “see” stuff in graceful progression. Especially movements we expect from nature is not suddenly disappearing or suddenly appearing stuff. Sudden movements is “danger” signal for brain. We produce minimal stress hormones when stuff happens fast because naturally fast movements trigger us to “escape” from danger by it’s biological nature. The fast moving images and flashing visuals that makes brain to jump between things is essentially damaging. This sounds so familiar – think a random tv program 🙂

It’s growing number of kids, people have ADD (attention deficit disorder).

Keep it cool, keep it under control

Best way to cope with this is to have regular “stops” from screens. It’s just plain straightforward plan. Don’t look at screens, don’t use your gadgets for a while. To me once a week for a day is just perfect. Maybe once in every 6 months or once a year it can be weeklong stop. It’s not new concept guys! We call this just weekend and holidays 🙂

Cool it down a little…

Getting lazy with technology

Not interacting with screens or more broad with technology completely for a day is also about breaking the pattern of technology supported laziness where we lost the connection and need for retention of basic information like directions or shopping list.

When I first did no screen, without realizing, I got into a situation that I booked a yoga class the day before for Saturday and when the class time came, I realized I never knew the address of the studio I went twice. It was a little scary that I didn’t know this information, I managed to go out and try few different streets until I remember the correct street. Following week, I started to note the directions of the places I wanted to spend my day.

Getting creative with no screen time

A lot of good came out of no screen days:

  • read much more,
  • workout in many different ways, running, yoga, biking, hiking,
  • farmers market, vintage shop, random exploration of my neighborhood,
  • having less serious plans and schedule for Saturdays,
  • catching up with friends more,
  • meditating,
  • writing, life planning

It’s been over a year now and it became part of my weekly routine.

Exceptions

I have few exceptions that I don’t practice no screen Saturdays, when I travel, when I have a deadline that shit really happened and it’s very important (I try to keep this 2-3 times a year max, otherwise there is always important stuff). I only interact screens very very briefly (not longer than a minute or two) to open music, I also keep “kindle” out of “screen” category because you can’t do much other than reading.

Web, UI and browser automation with headless browsers

Wanted to give you a short information about browser automation. You visualize a desktop app when a “browser” comes to minds right? All browsers use an engine to render web pages on our screen. And these engines can actually work without rendering pages in the UI. All they need is to render the elements in memory. From there, it can allow us to interact with rendered pages programmatically without displaying the rendered page on our screens. There are browsers only works in this mode and they are called “headless” browsers. Means they have no UI. This browsers are meaningless for general consumers but it comes very handy to developer and testing community. Many companies build their testing and QA process utilizing these headless browser, do execute their UI flows with browser automation scripts. For instance, headless browser can be programmed to run and simulate the following user experience flow:

  • Load http://example.com web page,
  • Wait until page is completely rendered including javascript and css,
  • Fill “Fatih” to the field called “Name”,
  • Click to the button called “Send”,
  • Wait 5 seconds,
  • Take screenshot and save as JPEG

This can be very useful when doing regression tests.

Event utilizing screenshots with headless browsers will be very useful. There are many companies doing screenshot comparison for high level understanding changes done visually based on any given iteration on the code. This process simply takes and keeps versions of each page and in every release, it takes a new one with latest version and compares the pixels (and colors) to the previous version to determine a percentage for the change it detects. Then you can set some report and process to make sure you track of big changes to detect if a tiny css change blew a page you usually don’t test manually. It becomes very meaningful when you think about a web page with 100 different pages.

Are there any headless browsers I can use?

The well known headless browsers; the one named “Phantom” (and phantomjs) that is big on nodejs community. There is also headless chrome which is based on chromium.

There is an extensive list of all headless browsers out there here: https://github.com/dhamaniasad/HeadlessBrowsers

Happy browser automations 🙂