Posted on 10-10-2007
Filed Under (Useable Hosting, Data) by dotWdot

A lot of data can place a lot of load on a server. As I discussed in a post a few weeks ago Data can translate nicely into cash, so if you have a nice good source of data then take it while you can. Free data is great and can be good to feed to pages, but as its free other people are likely to be using it - upping the likelihood of competition, google thinking your replicating and thus lower traffic. Free data not in the public eye is even better. You have users on your site? a lot of them? Then think how to use that - from users logging in and out, registering, chatting - whatever - use everything you can to produce good sources of information.

Anyway in this post I will discuss ways to manipulate, play with and organise data - in large chunks specifically. Lets say if you have a source with 40k lines of detail, with 10-30 fields for each line, and you want to mass update a few fields.

First the format: My skill base covers SQL so I will discuss with this as the basis, MSSQL is great, if you are using that then updates are easy as its presumably all local, good connection speeds and processor/memory control will allow for easy updating and manipulating. In a web situation Mysql is the obvious choice, and always my recommendation. But there are problems that may potentially crop up with using mysql to feed 40,000+ records, edit/update them and generally manipulate that much data usually when hosted by another company, that is running mysql locally is almost better than mssql for data manipulation.

Shared hosting and even quite a lot of virtual servers are heavily restricted in memory, script running times and processor load. There are as ever ways around this.

  1. Transfer the database to a local machine and rape the memory/processor/bandwidth like its yours (because it is) then transfer it back up.
  2. Get better hosting - a good dedicated server is not cheap - and depending on your office location / IT knowhow it can often be as cost effective to run your own server (e.g london offices.)

or…Trickle updating - When you have thousands of lines to update and you really want to get messy with each line e.g. create long manipulations of all the fields and then update them.

Here’s how I do it:

  1. Work out what you really want to do to the data - by planning how you would like it in an ideal way, titles here, dates in a good format here etc.
  2. Break the process into as many chunks as possible - if you can - do manipulations in small chunks - more likely to get through shared server load problems!
  3. When you understand what you want to do, add a single field to the data table structure - e.g. ‘process_status’ - From here then start a loop updating records doing all you need to - perhaps in steps and then update this field as you go. so for example: 5 steps to do to 200,000 lines, start a loop splitting the 200k into as bigger lumps as doable on the server. e.g. grab lumps of 200 lines from the server and do one step and line at the time and update the field as you go.
  4. To get round your shared hosting having limits on how long a script can run for - (often crashes on 1and1 when you run a script that updates loads of lines and takes more than a minute), because you have already split the load into 200’s or a relevant figure (depending on how long it takes to update a single line) simply let it do this 200 and then spit out a meta refresh line. Then you can leave it running over night and once its done each 200 records it will refresh, do the next 200 records and loop until done.


In essence:

  • Create a php script that splits the mass of data into smaller chunks e.g. 100, 200 - whatever
  • Let the script then spit out a meta refresh line which refreshes the page after a few seconds
  • It will then do its chunk, refresh and do the next - until its done.

Php and mysql can get round most shared hosting hold backs - with a little elbow grease!!!!

If anyone wants help with mass manipulating data, sql I can offer freelance data warehousing, manipulation and general processing - from back end to crystal reports.


If you like this post on mass data then you may be interested in the following:

  • Data = Dollar ? - An analysis on whether data sources can be money sources - Microsites Specifically
    Read More   
Post a Comment
Name:
Email:
Website:
Comments:
    • e-business, SEO, Internet Theory and a little Coding...


  • Blogroll