
Reading Large Binary Files: Child's Play

Yes, that is a funny title, but once you have been through the experience you will agree with me. So, here is the story. I have been working on software that reads recordings from a hardware device into a database. The device logs a record every second, each with 45 columns, for 30 days, so that is about 30x24x60x60 ≈ 2.6 million record entries per device. We had to build a desktop application, so we chose .NET for it. The first version of the software was released by my company 3 years ago, and as for the reading, well, we were too inexperienced at the time to manage that much data, and what we got was about 2 hours to read all that data into the database. Oh, I forgot to mention that those 30 days of entries came from one hardware device and we have 3-4 devices :), so it took about 2 hours to read, say, all 4 devices. That is not acceptable. So we decided to rewrite the complete software to make use of some parallelism, as that was the only way my team thought it was going to work.

I started the rewrite with the only hope of reducing the 2-hour job to 30-45 minutes. This time we made one change from last time: instead of using plain ASCII text files or an SQLite database, we opted to use SQL Server to store our data. The reason: the first pre-release version of the software used text files, and we never got that part working for more than 15 days of data; it always ran out of memory for one reason or another. Then we started using SQLite, which is about 5 times lighter on the hardware and sped up the reading and information access; however, one part still used text files. So, in order to avoid two data sources, we opted for the database only, and since the client already had SQL Server on a separate machine, we thought it was good to have a separate machine storing the database, for the long term and for the obvious LAN benefits. Since the client already had SQL Server and we were using .NET, I decided to go with SQL Server only.

So, we started reading the binary file and issued an INSERT query for each record [just to establish the lowest speed], and it went through in 12-13 minutes. Wow, we had already reduced a 30-40 minute job to 12 minutes just by using a full-time database. The next challenge was to speed it up with the known bulk-import methods; a couple that I tried are listed after the short baseline sketch below.
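
For reference, the row-by-row baseline was essentially a parameterized INSERT per record, along these lines (a minimal sketch; Readings and its columns are hypothetical names, not our real schema):

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Naive baseline: one parameterized INSERT round trip per record read from the file.
static void InsertRowByRow(string connectionString, IEnumerable<double[]> records)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "INSERT INTO Readings (Channel1, Channel2) VALUES (@c1, @c2)", conn))
    {
        conn.Open();
        cmd.Parameters.Add("@c1", SqlDbType.Float);
        cmd.Parameters.Add("@c2", SqlDbType.Float);

        foreach (var rec in records)
        {
            cmd.Parameters["@c1"].Value = rec[0];
            cmd.Parameters["@c2"].Value = rec[1];
            cmd.ExecuteNonQuery();   // executed once per record: the slow part
        }
    }
}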

1. Using a DataSet and the DataAdapter's Update feature,

2. Using a long INSERT query to send 100 or 1,000 records in one “Execute”,

3. Using the SqlBulkCopy feature in .NET. This is the obvious choice for speed, but in a few cases it failed for me, so I had to keep the first two options available as well (a minimal SqlBulkCopy sketch follows this list).
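
Here is roughly what the SqlBulkCopy path looks like, assuming the data has already been read into a DataTable whose columns match the staging table (Readings_Staging is a hypothetical name, not our real schema):

using System.Data;
using System.Data.SqlClient;

// Stream an in-memory DataTable into a staging table in one bulk operation.
static void BulkCopyToStaging(string connectionString, DataTable table)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "Readings_Staging"; // hypothetical staging table
            bulk.BatchSize = 10000;    // rows per batch sent to the server
            bulk.BulkCopyTimeout = 0;  // no timeout for large loads
            bulk.WriteToServer(table); // replaces millions of individual INSERTs
        }
    }
}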

So, in the end we settled on SqlBulkCopy as our tool, but it does not simply import the data into the database on its own; we had to prepare the ground for it. In our pipeline SqlBulkCopy is fed from CSV data, so we create a CSV file from the binary read, import that file into a staging SQL table, and then transfer the data from the staging table to the main table. All of this is done in 2-3 minutes flat. Yes, a 40-minute job is done in 3 minutes. Period.
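
The last hop, staging table to main table, is a single set-based statement fired from .NET, something along these lines (table and column names are again hypothetical):

using System.Data.SqlClient;

// Move everything from the staging table into the main table in one set-based
// statement, then clear the staging table for the next file.
static void TransferStagingToMain(string connectionString)
{
    const string sql =
        "INSERT INTO Readings (RecordedAt, Channel1, Channel2) " +
        "SELECT RecordedAt, Channel1, Channel2 FROM Readings_Staging; " +
        "TRUNCATE TABLE Readings_Staging;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        cmd.CommandTimeout = 0;  // large transfers can exceed the default 30 seconds
        cmd.ExecuteNonQuery();
    }
}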

The trick is that we reduced the number of operations that write to disk. We still create the CSV, but previously we were also creating an XML file and had multiple file-write procedures running; we removed all of that. In fact, to reach that speed we even stopped writing a log file to trace errors and used SQL Server to record those errors for us. Replacing the text log file with a SQL error log by itself brought things down from 5 minutes to 3 minutes. The logs are not that long, but disk writes are very slow compared to database inserts.
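
Error logging then becomes just another insert, roughly like this (a sketch; ErrorLog is a hypothetical table name):

using System;
using System.Data;
using System.Data.SqlClient;

// Record an error as a database row instead of appending to a text log file.
static void LogError(string connectionString, string source, Exception ex)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "INSERT INTO ErrorLog (LoggedAt, Source, Message) VALUES (@at, @src, @msg)", conn))
    {
        conn.Open();
        cmd.Parameters.Add("@at", SqlDbType.DateTime).Value = DateTime.Now;
        cmd.Parameters.Add("@src", SqlDbType.NVarChar, 200).Value = source;
        cmd.Parameters.Add("@msg", SqlDbType.NVarChar, -1).Value = ex.ToString();
        cmd.ExecuteNonQuery();
    }
}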

All said, I will hardly ever use plain text files for long data reads in the future; it will be some sort of database for sure.


Database Optimization

Well, most programs use a database for their web application; in the end, every web application has a database. But how many of those web applications really work well? Quite few. I have been in web programming for more than 7 years now, and during these years I have seen a lot of developers building websites in PHP, ASP.NET, ASP or Perl on top of SQL Server, MySQL, Oracle or other databases, and I always had a sense that I was building better applications than they were. But today, when I look back, I realise that I was bad at writing code and building applications in my initial years. Not that I did not want to write good code, but my approach was not right. Some of the important things I have learned over the years, based on my experience, are listed below. Before that, though, I want to say that not just me but all of us have read them in books; most of us just may not have used them, thinking we don't need them because things seem to work otherwise as well. So the major points are:

1. Fast Database

2. Object-oriented code

3. A serialized flow of the application.

In this post I will try to cover only the fast database. By a fast database I do not mean picking the fastest database on the planet through benchmark testing, nor do I mean using heavy hardware to make the database run faster. Those two things can make your database appear to be faster, but not really be fast. I have often asked in forums whether the DB engine can handle the few million records I need to parse, and I always get the same answer: if a database cannot handle a few million records, why would we need it at all?

That is a very true statement, but are you really facing the problem where you have only half a million records and your site starts taking 20-30 seconds to access the database? If so, then welcome to my past world. Don't worry: you don't need new hardware, you need a new approach to programming your software and, most importantly, a better database structure.

Let us start with the most commonly used and, I guess, most commonly forgotten thing: indexes. Yes, this is the most common way to make your site fast. Okay, so you have already defined a primary key on your table and it automatically created an index, but that is still not good enough. You need to index the columns you use in your searches and/or joins. Indexing a single column does help, but an index on a combination of two columns does much better for queries that search on those two fields. For example, if you are searching the Employee table by last name and city, separate indexes on LastName and City produce results more slowly than one composite index on (LastName, City).
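
As a sketch, assuming a hypothetical Employee table, the composite index is created once, for example from .NET:

using System.Data.SqlClient;

// Create one composite index covering the two columns that are searched together.
// Employee, LastName and City are hypothetical names for illustration.
static void CreateSearchIndex(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "CREATE INDEX IX_Employee_LastName_City ON Employee (LastName, City)", conn))
    {
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}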

Now, moving to the next part. I had a SQL query where the client code and the user code were in separate columns, and a text box that let the user enter "ClientCode-UserCode", which I then had to search for in the database. I got a little lazy, took the textbox value as it was, and did something like

WHERE ClientCode + '-' + UserCode = @MyClientCode

Now, this looks good and worked well enough for me, but it is not a good way to write it. In a WHERE clause, never combine columns like that: wrapping the columns in an expression prevents the indexes on them from being used. Search each column separately, i.e.

WHERE ClientCode = @ClientCode AND UserCode = @UserCode

The difference can make the search run 10-15 times faster if you have a lot of records.
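
In .NET this just means splitting the textbox value before it reaches the query, roughly like this (a sketch; the Users table and its columns are hypothetical names for illustration):

using System.Data;
using System.Data.SqlClient;

// Split the "ClientCode-UserCode" textbox value and search each column separately,
// so the indexes on ClientCode and UserCode can actually be used.
static DataTable FindUser(string connectionString, string textboxValue)
{
    string[] parts = textboxValue.Split(new[] { '-' }, 2);
    var result = new DataTable();

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT * FROM Users WHERE ClientCode = @ClientCode AND UserCode = @UserCode", conn))
    using (var adapter = new SqlDataAdapter(cmd))
    {
        cmd.Parameters.AddWithValue("@ClientCode", parts[0].Trim());
        cmd.Parameters.AddWithValue("@UserCode", parts.Length > 1 ? parts[1].Trim() : "");
        adapter.Fill(result);  // Fill opens and closes the connection itself
    }
    return result;
}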

The third thing that comes to mind is normalization of the database, and then creating views, with indexes on the views, to join tables where needed. I often see databases that are not normalized, where lots of text fields are used to store combinations of values. For example, if you want a product to belong to several categories, make a Product_Category table that stores ProductID and CategoryID, instead of storing comma-separated category IDs in the product table. This seems awkward to change at first, but the advantages are real: you can search products by category much faster, because comparing integer values beats running a pattern match against comma-separated values, and you can index the values in the relation table but not inside a text field. This mistake is very common among developers in the subcontinent; if I am not wrong, 70% of the sites I see from other developers have it.
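
The category lookup then becomes a plain indexed join against the relation table (again a sketch; Product and Product_Category are hypothetical names):

using System.Data;
using System.Data.SqlClient;

// Find the products in a category through the Product_Category relation table
// instead of a pattern match against a comma-separated CategoryIDs column.
static DataTable ProductsInCategory(string connectionString, int categoryId)
{
    const string sql =
        "SELECT p.ProductID, p.Name " +
        "FROM Product p " +
        "INNER JOIN Product_Category pc ON pc.ProductID = p.ProductID " +
        "WHERE pc.CategoryID = @CategoryID";

    var result = new DataTable();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    using (var adapter = new SqlDataAdapter(cmd))
    {
        cmd.Parameters.Add("@CategoryID", SqlDbType.Int).Value = categoryId;
        adapter.Fill(result);  // integer comparison against an indexed column
    }
    return result;
}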

The above are some database design errors I have seen, and some of them I made myself at the start of my career, but I changed my approach and today I handle databases with 10-20 million entries easily. In fact, my site takes less than 1 second for a search with a three-table join; I remember an old version of that join that took 15 seconds. That was a great optimization.

Anyway, making software is your job; making it great is your choice.


SQLite Bulk Insert on ADO.NET

Today I was trying to find an alternative way to read a binary data file and use its information in my software. For this I chose SQLite, the embedded database. It took quite a few steps before I achieved an acceptable speed with the database, and here is the list of things that need to be taken care of for a bulk INSERT into an SQLite database.

1. Simplify your for or while loop as much as you can. For example, do not construct the query inside the loop; define the query string once before the loop starts.

2. The loop should be enclosed in a transaction. Somebody technically explained to me that with a transaction, fsync, the function responsible for ensuring the data is safely written to disk, is called only once, at commit. If you don't use a transaction, fsync is called on every INSERT query and proves to be the performance bottleneck.

3. Index your table if you have to UPDATE rows within the loop. In my case I first find one value in the binary file and write it to the database; then, under one of the conditions while reading the file, I may get another value that requires an UPDATE of the existing record. So within my loop I have both INSERT and UPDATE commands, and the index makes it much easier to find the record once more than 50-60K records have been inserted. It gave a speed-up of about 1500% in my case with just 0.1M records, and I have to read 1.5M records. A sketch covering all three points follows this list.
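
Put together, the loop looks roughly like this with System.Data.SQLite (a sketch that assumes a hypothetical Readings table with a Tag column used by the in-loop lookups):

using System.Collections.Generic;
using System.Data.SQLite;

// Bulk insert into SQLite: command prepared once, the whole loop wrapped in a
// single transaction, and an index on the column used by in-loop UPDATE lookups.
static void BulkInsert(string dbPath, IEnumerable<KeyValuePair<string, double>> records)
{
    using (var conn = new SQLiteConnection("Data Source=" + dbPath))
    {
        conn.Open();

        // Index the lookup column so in-loop UPDATEs stay fast as the table grows.
        using (var ddl = new SQLiteCommand(
            "CREATE INDEX IF NOT EXISTS IX_Readings_Tag ON Readings (Tag)", conn))
        {
            ddl.ExecuteNonQuery();
        }

        using (var tx = conn.BeginTransaction())           // fsync happens once, at commit
        using (var insert = new SQLiteCommand(
            "INSERT INTO Readings (Tag, Value) VALUES (@tag, @value)", conn, tx))
        {
            // Query string and parameters are defined once, outside the loop.
            insert.Parameters.Add(new SQLiteParameter("@tag"));
            insert.Parameters.Add(new SQLiteParameter("@value"));

            foreach (var rec in records)
            {
                insert.Parameters["@tag"].Value = rec.Key;
                insert.Parameters["@value"].Value = rec.Value;
                insert.ExecuteNonQuery();
            }
            tx.Commit();                                    // single disk sync here
        }
    }
}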

SQLite is really fun. It works like a charm when the proper techniques are used, and it stores more than 1M records for me to work with.