Friday, May 25, 2007

Using action+client caching to speed up your Rails application

Too many visitors are hitting your website, and loads of dynamic data are being delivered to your clients? Of those visitors, more people are reading your site's content than modifying it, meaning you get far more GET requests than POST, PUT or DELETE?

If the above questions are all answered with a YES, then, my friend, you are desperately in need of caching. Caching will help you lessen the load on your servers by doing two main things:
  1. It eliminates lengthy trips to the (slow by nature) database to fetch the dynamic data
  2. It frees precious CPU cycles needed in processing this data and preparing it for presentation.
I faced the same situation with a project we are planning: we are bound to get many more GETs than any other HTTP verb, and since we are building a RESTful application we will have a one-to-one mapping between our web resources (URLs) and our application models. Our caching mechanism needs the following:
  1. It needs to be fast
  2. It needs to be shared across multiple servers
  3. Authentication is required for some actions
  4. Page presentation changes (slightly) based on logged in user
  5. Most pages are shared and only a few are private for each user
We now have two questions to answer: which caching technique and which cache store will we use?

The cache store part is easy: memcached seems like the most sensible choice, as it achieves points 1 & 2 and is orthogonal to the other three requirements. So it is memcached for now.
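
Just for illustration, pointing Rails at memcached is essentially a one-liner in the environment configuration. A sketch along these lines should do it, assuming the memcache-client gem is installed; the exact setting name has moved around between Rails versions, and the hosts below are placeholders, not our real servers:

# e.g. at the bottom of config/environment.rb (setting name varies by Rails version)
ActionController::Base.fragment_cache_store = :mem_cache_store, 'cache-1:11211', 'cache-2:11211'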

Now, which caching technique? Rails has several caching methods, the most famous of which are page, action and fragment caching. Greg Pollack has a great writeup on these here and here. Model caching is also an option, but it can get a bit too complicated, so I'm leaving it out for now; it can be implemented later though (layering your caches is usually a good idea).

Page caching is the fastest, but with it we would lose the ability to authenticate (unless we do so via HTTP authentication, which I would love to, but sadly that is not the case). This leaves us with action and fragment caching. Since the page presentation differs slightly based on the logged-in user (a hello message and maybe a localized datetime string), fragment caching would sound like the better choice, no? Well, I would still love to use action caching: that way I can serve whole pages without invoking the renderer at all and really avoid lots of string processing in Ruby.


There is a solution, if you'd just wake up and smell the coffee: we are in the Web 2.0 age and we should think of Web 2.0 solutions for Web 2.0 problems. What if we add a little JavaScript to the page that dynamically displays the desired content based on the user's role? And if that content is really small, why not store it in a session cookie? Max Dunn implements a similar solution for his wiki here: the page is served the same to everyone, with DOM manipulation kicking in to do the simple modifications for the specific user. Rendering those is done on the client, so no load on the server; and since the modifications are really small, the client is not hurt either, and it gets the page much faster. It's a win-win situation. Life can't be better!

Oh yes, it can! In a content-driven website, many people check a hot topic frequently, and many reread the same data they read before. In those cases the server is sending them a cached page, yes, but it is resending the same bits the browser already has in its cache. This is a waste of bandwidth, and your Mongrel will be waiting for the page transfer to finish before it can consume another request.

A better solution is to utilize client caching: tell the browser to use the version in its cache as long as it has not been invalidated. Just send the fresh data in a cookie and let the page dynamically modify itself to adapt to the logged-in user. Relying on session cookies for the dynamic parts prevents the browser from displaying stale data between two different sessions, yet the page itself will not be fetched over the wire more than once, even for different users on the same computer.
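
To make this concrete, here is a rough sketch of the Rails side (the helper and attribute names, such as logged_in?, current_user and time_zone, are invented for illustration, not our actual code): a before_filter drops the per-user bits into a session cookie, so the cached HTML stays identical for everyone and a small piece of JavaScript on the page can read the cookie and fill in the personal parts.

class ApplicationController < ActionController::Base
  before_filter :expose_user_info_to_client

  private

  # Put everything user-specific in a session cookie (no :expires, so it
  # dies with the browser session) instead of baking it into the cached page.
  def expose_user_info_to_client
    return unless logged_in?  # assuming the usual authentication helpers
    cookies[:user_info] = { :name => current_user.login,
                            :tz   => current_user.time_zone }.to_json  # to_json comes from ActiveSupport
  end
end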

I am using the Action Cache plugin by Tom Fakes to add client-caching capabilities to my action caches. Basically, things go in the following manner (a hand-rolled sketch of steps 2 and 5 follows the list):
  1. A GET request is encountered and intercepted
  2. The request's caching headers are checked; if the browser's copy is still fresh, send 304 NOT MODIFIED, otherwise proceed
  3. The action cache is checked; if the page is there, send the cached page (200 OK), otherwise proceed
  4. The action is processed and the page content is rendered
  5. The page is added to the cache, along with last-modified header information
  6. The response is sent back to the browser (200 OK + all headers)
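
For the curious, here is what the conditional GET part (steps 2 and 5) looks like if you hand-roll it for a single action. This is only a sketch with made-up controller and model names, not the plugin's code, which does this automatically for every action-cached page:

require 'time'  # for Time.httpdate

class PostsController < ApplicationController
  def show
    @post = Post.find(params[:id])

    # Step 2: the browser already has a fresh copy, so answer 304 and stop.
    stamp = request.env['HTTP_IF_MODIFIED_SINCE']
    if stamp && @post.updated_at <= Time.httpdate(stamp)
      render :nothing => true, :status => 304
      return
    end

    # Step 5: stamp the response so the next request can be answered with a 304.
    response.headers['Last-Modified'] = @post.updated_at.httpdate
  end
end
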
So how do we determine the impact of applying all this to the application?
  1. We need to know the percentage of GET requests (which can be cached) as opposed to POST, PUT and DELETE ones
  2. Of those GET requests, how many are repeated?
  3. Of those repeated GET requests, how many originate from the same client?
Those numbers can tell us whether our caching model works well or not; that will be the topic of the next installment of this article.

Happy caching

Thursday, May 24, 2007

Agile Process

My first reading about Agile was the Agile Manifesto. I think this was enough to get the whole picture of the agile philosophy.
I really don't recommend getting a book to read about agility; it would be a contradiction to read a whole book when the Agile methods emphasize working software as the primary measure of progress. It is better to spend the time understanding your working environment and maturity, so you can apply the process that best fits them.
I lived a similar experience at eSpace when they were aiming to get a CMM certificate, CMMI, or whatever they will call it in a couple of years.
I think the people here finally made the right decision by putting that plan aside and setting their own standards, the standards that best fit their teams and their environment. I doubt there was any need for some white-collar guy to supervise our documents and our working style and decide whether we fit Mr. XYZ's standards or not. I also got the chance to meet some guys from another company that holds this certificate. What I found is that they modified their hierarchical structure just to accommodate the standard, although their team wasn't capable of supporting that hierarchy at all.

Anyway, let's move to another point...
When we talk about the agile process, we should identify any possible limitations very well before adopting it. It is very clear to me that it needs mature people, and this can be considered one of the limitations.
Another factor that may limit agile adoption is distributed software development. Lately all our projects are distributed: customers are from Europe, the USA and the Gulf, and developers are working remotely from different areas.
What are the challenges behind such distributed agile work?

The answer to this question boils down to the fact that agile methods rely mainly on informal processes to facilitate coordination, while distributed software development typically relies on formal mechanisms.

Challenges in Agile Distributed Development
  1. Communication need vs. communication impedance: How can we achieve a balance in formality of communication in agile distributed environments?
  2. Fixed vs. evolving quality requirements: Agile relies on ongoing negotiation between the developer and the customer, while distributed development often relies on fixed, upfront commitments on quality requirements.
  3. People- vs. process-oriented control: We appreciate the people-orientation the most.
  4. Lack of cohesion: in distributed development, participants are less likely to perceive themselves as part of the same team than co-located participants are, and agile adds some more excitement to that :)

Practices
  1. Process refactoring: continuously adjust the process instead of strictly following the agile practices. It is also recommended to document the requirements at different levels of formality.
  2. Knowledge spreading: the team should share their knowledge across different domains (business, code, test cases, etc.). A lot of tools have been developed to reduce the overhead of knowledge-sharing activities. A code/process repository is a must, and a wiki is also a very important way to share how-tos between members. I can't neglect Bugzilla either, with its role in creating a database that helps teams report issues and assign priorities.
  3. Short iterations: no iteration should exceed 2 weeks. Short iterations help detect any misunderstanding of the project's business and prevent wasted time.
  4. Start with well-understood functionality: to create the best atmosphere for developers, there should be solid ground to start from and to get familiar with the processes, tools, and the application. This is somewhat different from agile, which advocates developing first the features the customer prioritizes as critical.
  5. Improve communication: synchronized work hours are very important for the team. Also try to route the informal communication through formal channels; for example, let it go through email so that it can be archived. In distributed projects it is recommended that the project leader/manager be more involved in the communication and synchronization process than usual Agile practice suggests. Finally, some daily mechanism, like a morning online meeting, should maintain a minimum level of communication.
  6. Building Trust: trust involves both the team and the customers.

References:
1. Ebert, C. and Neve, P.D. Surviving global software development. IEEE Software 18, 2 (Mar./Apr. 2001), 62–69.
2. Highsmith, J. and Cockburn, A. Agile software development: The business of innovation. IEEE Computer 34, 9 (Sept. 2001), 120–122.
3. Matloff, N. Offshoring: What can go wrong? IT Professional (July/Aug. 2005), 39–45.

Wednesday, May 23, 2007

Virtual functions in C++


"In order to implement the concept of Polymorphism which is a corner-stone of OOP the C++ compiler has to find a way to make it possible."


Let's see how the story begins.

Derived classes inherit the member functions of the base class, but when some member functions are not quite appropriate for the derived class, it should provide its own versions of those functions, overriding the immediate base class's functions and keeping its own objects happy. So if any of these functions is called for a derived object, the compiler calls that object's class's version of the function.

This works quite fine when the types of objects are known at compile time, so the compiler knows which function to call for each particular object. The compiler knows where to find each class's copy of the function, and so the addresses used for these function calls are settled at compile time (static binding).

Suppose that we have a lot of derived objects at different levels of the inheritance hierarchy that have a common base class and that they need to be instantiated at run time. Here the compiler does not know in advance what derived class objects to expect. These objects would be dynamically allocated, and the code handling them should be able to deal with all of them.

It is perfectly legitimate to use base class pointers to point to these objects, but then the compiler handles them exactly the same way it would handle base class objects. It would call the base class versions of member functions, and none of the member functions specific to the derived class would be accessible.

To solve this problem, virtual functions are used to allow dynamic binding.

"...It seems that our friend, the compiler of course, is very resourceful."


To support polymorphism at runtime, the compiler builds virtual function tables (vtables) at compile time. Each class with one or more virtual functions has a vtable that contains pointers to the appropriate virtual functions to be called for objects of that class. Each object of a class with virtual functions contains a pointer to the vtable for its class, usually placed at the beginning of the object.

The compiler then generates code that will:

1. dereference the base class pointer to access the derived class object.
2. dereference its vtable pointer to access its class vtable.
3. add the appropriate offset to the vtable pointer to reach the desired function pointer.
4. dereference the function pointer to execute the appropriate function.

This allows dynamic binding as the call to a virtual function will be
routed at run time to the virtual function version appropriate for the class.

Impressive isn't it?

Well that made me try just for fun to write code that would do these steps instead of the compiler.

But as I did this another question evolved.

How do member functions get their "this" pointer (the pointer to the object the function is called on)?

I know that the compiler should implicitly pass 'this' as an argument to the member function so that it can use it to access data of the object it is called for.

In my example I used a single virtual function that takes no arguments and returns void.
So at first I tried calling the target virtual function with no arguments. The function was indeed called, but the results showed it had used some bogus value for 'this' that pointed somewhere other than the object, and it gave wrong results.

So I tried calling the function and passing it the pointer to the object, and it worked just fine.

Here's the code I tried...


#include <iostream>

using std::cout;
using std::endl;

class Parent {
public:
    Parent( int = 0, int = 0 ); // default constructor
    void setxy( int, int );
    int getx() const { return x; }
    int gety() const { return y; }
    virtual void print();
private:
    int x;
    int y;
};

Parent::Parent( int a, int b )
{
    setxy( a, b );
}

void Parent::setxy( int a, int b )
{
    x = ( a >= 0 ? a : 0 );
    y = ( b >= 0 ? b : 0 );
}

void Parent::print()
{
    cout << " [ x: " << x << ", y: " << y << "] ";
}

class Child : public Parent {
public:
    Child( int a = 0, int b = 0, int c = 0, int d = 0 );
    void setzt( int c, int d );
    int getz() const { return z; }
    int gett() const { return t; }
    virtual void print();
private:
    int z;
    int t;
};

Child::Child( int a, int b, int c, int d )
    : Parent( a, b )
{
    setzt( c, d );
}

void Child::setzt( int c, int d )
{
    z = ( c >= 0 ? c : 0 );
    t = ( d >= 0 ? d : 0 );
}

void Child::print()
{
    Parent::print();
    cout << " [ z: " << z << ", t: " << t << "] ";
}

class GrandChild : public Child {
public:
    GrandChild( int = 0, int = 0, int = 0, int = 0, int = 0 );
    void sete( int );
    int gete() const { return e; }
    virtual void print();
private:
    int e;
};

GrandChild::GrandChild( int a, int b, int c, int d, int e )
    : Child( a, b, c, d )
{
    sete( e );
}

void GrandChild::sete( int num )
{
    e = ( num >= 0 ? num : 0 );
}

void GrandChild::print()
{
    Child::print();
    cout << " [ e: " << e << " ]";
}

int main()
{
    Parent parentObj( 7, 8 );
    Child childObj( 56, 23, 6, 12 );
    GrandChild grandchildObj( 4, 64, 34, 98, 39 );

    // declare an array of pointers to Parent
    Parent *parentPtr[ 3 ];

    cout << "size of Parent = " << sizeof( Parent ) << " bytes\n";
    cout << "size of Child = " << sizeof( Child ) << " bytes\n";
    cout << "size of GrandChild = "
         << sizeof( GrandChild ) << " bytes\n";

    parentPtr[ 0 ] = &parentObj;     // direct assignment
    parentPtr[ 1 ] = &childObj;      // implicit casting
    parentPtr[ 2 ] = &grandchildObj; // implicit casting

    cout << "\nThe Derived objects accessed by"
            " an array of pointers to Parent:\n\n";

    for ( int i = 0; i < 3; i++ ) {
        cout << "Object " << i + 1 << " : ";
        cout << "\tvtable ptr (" << *( ( void ** ) parentPtr[ i ] ) << ")\n";
        // vtable ptr at the beginning of the object

        // initialize pointer to function
        void (* funptr ) ( Parent * ) = NULL;

        // assign to it the pointer to the function in the vtable
        funptr = *( *( ( void (*** ) ( Parent * ) ) parentPtr[ i ] ) );

        cout << "\t\tpointer 1 in vtable is (" << ( void * ) funptr
             << ")\n\t\t( pointer to virtual function 1 'print()' )";

        cout << "\n\n\t\tdata: ";

        funptr( parentPtr[ i ] ); // call the 1st function in the vtable,
                                  // passing ( this ) to it,
                                  // without using parentPtr[ i ]->print();
        cout << "\n" << endl;
    }

    return 0;
}


The output should look like this:


size of Parent = 12 bytes
size of Child = 20 bytes
size of GrandChild = 24 bytes

The Derived objects accessed by an array of pointers to Parent:

Object 1 : vtable ptr (0043FD90)
pointer 1 in vtable is (00401480)
( pointer to virtual function 1 'print()' )

data: [ x: 7, y: 8]

Object 2 : vtable ptr (0043FD80)
pointer 1 in vtable is (004015B8)
( pointer to virtual function 1 'print()' )

data: [ x: 56, y: 23] [ z: 6, t: 12]

Object 3 : vtable ptr (0043FD70)
pointer 1 in vtable is (004016E6)
( pointer to virtual function 1 'print()' )

data: [ x: 4, y: 64] [ z: 34, t: 98] [ e: 39 ]



In order to reach the pointer to the desired function ( print() ), the parentPtr of the object, which normally points to its beginning, had to be cast to type pointer-to-pointer-to-pointer-to-function before being dereferenced to give the vtable pointer, and then dereferenced again to give the first function pointer in the vtable.

Polymorphism uses virtual functions in another interesting way. Virtual functions enable us to create special classes for which we never intend to instantiate any objects. These classes are called abstract classes, and they are only used to provide an appropriate base class that passes a common interface and/or implementation down to their derived classes.

Abstract classes are not specific enough to define objects. Concrete classes, on the other hand, have the specifics needed to create real objects. To make a base class abstract, it must have one or more pure virtual functions, which are those with = 0 added at the end of their function prototypes.

virtual void draw() const = 0;

These pure virtual functions must all be overridden in the derived classes for those to be concrete classes; otherwise they would be abstract classes too.

Suppose we have a base class Hardware. We can never draw it, or print its production date or price, unless we know the exact type of hardware we're talking about. So it looks like class Hardware would make a good example of an abstract base class.

Another example could be class Furniture and it might look something like this:


class Furniture {
public:
    ...
    virtual double getVolume() const = 0; // a pure virtual function
    virtual void draw() const = 0;        // another one here
    ...
};

Here the class Furniture definition contains only the interface and implementation to be inherited. It does not even contain any data members.

That's it.
Hope you liked this article.

I will be happy to receive your comments.

Bulletproof code using RegExp

Regular expressions are a great way to find those hard-to-find strings.
Suppose you have a bunch of old code that you want to bulletproof, or perhaps you want to audit it, and for starters you want to find uninitialized variables using Eclipse.

The pattern of an uninitialized variable declaration has the following format:

(type, e.g. int)(one or more spaces)(alphanumeric word)(zero or more spaces)(semicolon)


To search for (one or more spaces) we use the expression
(\s+)
\s matches a whitespace character
+ means one or more

To search for (alphanumeric word) we use the expression
(\w+)
\w matches a word character (letter, digit or underscore)
+ means one or more

To search for (zero or more spaces) we use the expression
(\s*)
\s matches a whitespace character
* means zero or more

So the regular expression would be int(\s+\w+\s*); and it will match all uninitialized ints.
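
If you would rather run the hunt outside Eclipse, the same expression drops into a few lines of Ruby (the glob pattern and the int-only regexp below are just for illustration):

# Scan C/C++ sources for declarations like "int foo ;" with no initializer.
pattern = /int\s+\w+\s*;/

Dir.glob('**/*.{c,cpp,h}').each do |path|
  File.readlines(path).each_with_index do |line, index|
    puts "#{path}:#{index + 1}: #{line.strip}" if line =~ pattern
  end
end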

It would be great if we collected such regular expressions and kept them in a library, to be our arsenal against bad code.

Do you have more regular expressions to share?

Monday, May 21, 2007

Test Mail Server

You are on a delivery deadline, you need to test your application, and you need to make sure it sends an email notification when the form is filled in. You look around wondering where the system admin went, and after a while, when you catch him sitting in the buffet and ask him for a mail server to test your logic, he stares at you for a while before replying, "We have no mail server for testing." At this point your frustration reaches its peak, and you start wondering why in the world we hire those sysadmins.

Well, you won't need them anymore; just use Gmail. Yes, you can use your Gmail account (surely you have one). Here is the configuration for ActionMailer (depending on your Rails version the setting is called smtp_settings or server_settings):

ActionMailer::Base.smtp_settings = {
  :address        => "smtp.gmail.com",
  :port           => 587,
  :domain         => "yourdomain.com",
  :authentication => :plain,
  :user_name      => "yourgmailaccount",
  :password       => "yourgmailpassword"
}
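
For completeness, a minimal mailer to smoke-test these settings could look like the sketch below. The class, method and addresses are made up; the recipients/from/subject/body style is the ActionMailer API of that era, and the body hash assumes a matching app/views/notifier/test_mail template exists:

class Notifier < ActionMailer::Base
  def test_mail(address)
    recipients address
    from       "yourgmailaccount@gmail.com"
    subject    "Test mail from the app"
    body       :note => "If you can read this, the Gmail SMTP settings work."
  end
end

# e.g. from script/console:
Notifier.deliver_test_mail("someone@example.com")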

Friday, May 18, 2007

MogileFS revisited

So I got this reply to my recent post:

"Please recall that MogileFS has no POSIX file API. All file transfers
are done via HTTP. So, it really isn't a drop-in replacement for NFS
or any other network file system. You need to add logic to your
application to deal with MogileFS.

Also, you can't do updates to a file; you must overwrite the entire
file if you make any changes.

MogileFS is primarily intended for a write-once/read-many setup."

So how would this fit into our system? For starters, I think it won't have much impact, since we are storing system images. Updating files won't be an issue: images tend to be very large, and once stored, an image is either replaced by a newer one or used to restore a system. Also, we are going to use Ruby on Rails to interface with the system imager, our open source imaging system, and Ruby has a plugin for MogileFS, so it won't be a problem to integrate, and everything seems OK.
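
For the integration part, the Ruby side is pleasantly small. Here is a rough sketch using the mogilefs-client gem; the tracker hosts, domain and storage class below are invented, and method names may differ slightly between gem versions:

require 'mogilefs'

mg = MogileFS::MogileFS.new(:domain => 'imager',
                            :hosts  => ['tracker1:7001', 'tracker2:7001'])

# Write once: push a finished system image into the store...
mg.store_file('server-42.img', 'images', '/tmp/server-42.img')

# ...and read many: stream it back out whenever a restore is requested.
File.open('/tmp/restore.img', 'wb') do |f|
  f.write(mg.get_file_data('server-42.img'))
end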

What about other systems? How could MogileFS be useful elsewhere? Would these issues be a problem for an application in need of smart storage? Let's take a mail system, for example: we have multiple servers serving a domain, and users' mailboxes are spread among these servers. The files in this case are the emails, and since we never need to update an email, the write-once/read-many condition is fulfilled. However, if the mail service is not tailor-made or customized, it will be hard to integrate MogileFS; meaning, if you are using a ready-made mail server like Sendmail or qmail, you will find it difficult to make MogileFS your storage engine.

In conclusion, MogileFS is best used with applications developed with MogileFS in mind as their storage engine. You can use it with out-of-the-box systems, but it won't be a smooth ride, and for sure there are some systems that will not benefit from MogileFS at all, like file sharing or workflow systems.
Still, I can't wait to try it out, and I will keep you updated.

Thursday, May 17, 2007

Hi, I'm Ruby on Rails - Part 1

What do you get when you cross the Mac vs PC commercials and Rails Envy? Ruby on Rails ads to get everyone hyped for Railsconf, that's what!




Update a newly added column in a migration

One of the very interesting features I like about Rails is migrations. Migrations are a version control system that keeps track of all database changes; you can easily move your database to any previous version, with its schema and data.

During my last project, I tried to create a migration that adds a column to a table and then updates that column:
def self.up
  add_column :file_types, :mime_type, :string
  q = FileType.find_by_name('quicktime')
  q.update_attributes :mime_type => 'video/quicktime'
end
If you run that migration, the new column is added successfully but no data gets updated. Why is that?!

The problem is that you are trying to update the column, mime_type, immediately after adding it and before allowing the model, FileType, to detect the new changes (strange, I know, but true).

The solution, as documented, is simple. You just need to call
reset_column_information to ensure that the model has the latest column data before the update process.

Here is the code modified:
def self.up
  add_column :file_types, :mime_type, :string
  q = FileType.find_by_name('quicktime')
  FileType.reset_column_information
  q.update_attributes :mime_type => 'video/quicktime'
end
And here is the code of reset_column_information
def reset_column_information
  read_methods.each { |name| undef_method(name) }
  @column_names = @columns = @columns_hash = @content_columns = @dynamic_methods_hash = @read_methods = @inheritance_column = nil
end
It simply resets all the cached information about columns, which will cause them to be reloaded on the next request.

Although this problem has a solution, an even worse problem should be mentioned here. In the first case, when you don't call reset_column_information, you don't get any error! The column simply doesn't get updated. Additionally, if you migrate back to the previous version and then re-run the migration, surprise: you get no problems and the column updates successfully!

I don't know if this is a reported bug, but it is a strange behavior. However, this won't prevent me from developing more and more Rails applications.

Wednesday, May 16, 2007

MogileFS Storage engine!

I came across this today, and it seemed interesting. MogileFS is intended for storage-hungry applications; it's all about spreading your files across cheap devices on different hosts, something like RAID + NFS + data replication.

The idea is very nice and simple: you have multiple servers, every server has multiple devices, and you sum up all these storage units into one big store. You have a tracker application that you consult when reading from or writing to this huge store, and the tracker takes responsibility for saving your data and making sure it stays available even if multiple hosts go offline.

This application came just in time: we just had an idea for a project that takes images of your server and stores them on network storage, so if something bad happens to your server you can simply take the image and restore it, or even restore the image on a different server to clone it, or something like that. The challenge was where to store all of these images. Doing a simple calculation, if you have 100 users and every user has a 10 GB image, then you are bound to maintain a terabyte of storage, and scalability will be an issue.

With MogileFS you gain three advantages here. One, you have cheap disks on cheap servers with your storage distributed across them. Two, you benefit from this distribution by installing the application on all of these servers, thereby gaining high availability. Three, scaling is as simple as adding a server to the farm. So for about half the price of a SAN and its expensive disks, you get high availability for your storage and your application. Of course, we will have to manage this distributed environment. One way to tackle it is a no-slave architecture: all servers are masters, and every server can detect which server stores a user's image by consulting the tracker. So when a user logs in, he first lands on any server chosen round-robin, and from there he is redirected to the server storing his image, where he can be served while eliminating the network communication overhead.


This architecture can be implemented with any storage-intensive application, or any application that used to rely on NFS, as NFS has proven unreliable in heavy production environments.

I like this tool very much and I can't wait to test it with our application, so I will keep you posted with any updates.

Saturday, May 12, 2007

Web Antivirus

Web services are increasingly becoming an essential part of your everyday life. How much time do you spend surfing web pages?
To be more specific, how much do you feel that Google is now deeply involved in your daily routine? Can you imagine your life without Google? Your search, your calendar, your email, your blog, etc.

Well, it seems you will also look to Google to be your web antivirus. Before you access a page, type the URL into Google search and pray that you won't get "this site may harm your computer".
You just have to obey; otherwise your PC will be infected.

The story begins with researchers from the firm surveying billions of sites and subjecting 4.5 million pages to "in-depth analysis". They actually found 450,000 pages guilty.

A single visit from you is enough for an attacker to detect and exploit a browser vulnerability. Therefore, the attacker's goal becomes identifying web applications with vulnerabilities that let him insert small pieces of HTML into web pages.
An example of this is an iframe, which can silently install a malware binary: a "drive-by download".
Are the webmasters or the site creators responsible for this?
The answer is: not always.

User Contribution

Many web sites feature web applications that allow visitors to contribute their own content, often in the form of blogs, profiles, comments, or reviews. They usually support only a limited subset of HTML, but in some cases poor sanitization or checking allows users to post or insert arbitrary HTML into web pages.

Advertising
Although webmasters have no direct control over the ads themselves, they trust advertisers to show non-malicious content. Sometimes advertisers rent out part of their advertising space; in that case the webmaster needs to trust ads provided by a company that is merely trusted by the first advertiser. And so on: you may find nested relations, which are a pitfall in the trust relationship because they make it transitive.

Third-Party Widgets
A third-party widget is an embedded link to an external JavaScript or iframe that a webmaster uses to provide additional functionality to users. An example of this is Google Analytics :)

Webserver Security
The contents of a web site are only as secure as the set of applications used to deliver the content, including the actual HTTP server, scripting applications (e.g. PHP, ASP etc.) and database backends. If an attacker gains control of a server, he can modify its content to his benefit. For example, he can simply insert the exploit code into the web server’s templating system. As a result, all web pages on that server may start exhibiting malicious behavior. Although the team has observed a variety of web server compromises, the most common infection vector is via vulnerable scripting applications. They observed vulnerabilities in phpBB2 or InvisionBoard that enabled an adversary to gain direct access to the underlying operating system. That access can often be escalated to super-user privileges which in turn can be used to compromise any web server running on the compromised host. This type of exploitation is particularly damaging to large virtual hosting farms, turning them into malware distribution centers.

Exploitation Mechanisms
A popular exploit they encountered takes advantage of a vulnerability in Microsoft’s Data Access Components that allows arbitrary code execution on a user’s computer.
Typical steps taken to leverage the vulnerability into remote code execution:
  • The exploit is delivered to a user’s browser via an iframe on a compromised web page.
  • The iframe contains Javascript to instantiate an ActiveX object that is not normally safe for scripting.
  • The Javascript makes an XMLHTTP request to retrieve an executable.
  • Adodb.stream is used to write the executable to disk.
  • A Shell.Application is used to launch the newly written executable.
Another popular exploit is due to a vulnerability in Microsoft’s WebViewFolderIcon. The exploit Javascript uses a technique called "heap spraying" which creates a large number of Javascript string objects on the heap. Each Javascript string contains x86 machine code (shellcode) necessary to download and execute a binary on the exploited system. By spraying the heap, an adversary attempts to create a copy of the shellcode at a known location in memory and then redirects program execution to it.

Detecting Dangerous Pages
Simply by monitoring the CPU and the processes executed when accessing the page: when unknown processes are added to the list, that is a strong sign that a drive-by download has happened.

Google will be more and more involved in our lives; it will report malicious sites to you for free.
Anyway, it is not a big deal; you can do it yourself up to a point, but there are more sophisticated cases where you need multi-level reverse engineering.

Reference: Google Research Paper

Update:
Google online security blog, the latest news and insights from Google on security and safety on the internet.

Microsoft takes action in response to the vulnerabilities claim.

Wednesday, May 9, 2007

Upgrade your Experience with Google Analytics

A few days ago Google Analytics released a new version. The new UI makes it easier to use the reports and metrics within the data sets.

New Google Analytics Visitor Overview

NEW:
  • Email reports and improved clarity of graphs allow users to explore and discover new insights
  • Customizable dashboards ensure the right data gets to the right people at the right time
  • Plain language descriptions of the data allow users to take action to improve their web site
This is awesome :)

Sunday, May 6, 2007

Where do you want to go today?

I have been conducting systems administration interviews for a while now, and I used to ask one question in every interview: which is better, Linux or Windows? I used to settle for a simple answer like "it depends on the environment"; that answer could get the guy onto our payroll on the spot. That was in the old days, when I was still young and foolish. In those days I used to forgive my Windows when it hung seemingly forever, trying to do something I knew nothing about, or when my server bailed out on me for no good reason and with no trace. But, you know, people do grow up.

For a long time now I have been playing with operating systems, including Linux and Windows. After being a loyal follower of Microsoft technologies, I had a paradigm shift: I saw the beauty of Linux, and I got in touch with what an operating system really means. Day after day I started to understand how Linux outperforms Windows. Let's take an example: why does Windows sometimes stop responding to your requests and start playing busy? Your hard disk LEDs start blinking, and no matter how much you click, your computer never gives you any attention. In this article we will try to explain why this happens on Windows and rarely on Linux.

Let's imagine your process getting into the OS and praying that it reaches the CPU before it starves. According to this article, in Windows the kernel scheduler has two queues: a foreground queue using a round-robin algorithm and a background queue using a first-in-first-out algorithm, and the scheduler uses several priority rules, along with other algorithms, to decide which queue your poor process goes into. The problem is that the Windows scheduler works as a plain multilevel queue, meaning that once you are in a queue you are stuck there until your turn comes to get the CPU, or you starve to death. It's simple, but crude. So what happens when many background processes get into the background queue? These processes are not time-sliced like the ones in the foreground queue, so once they get in, they will not get out until they finish. And to make things even better, the scheduler chooses between the two queues with a probability of 80% for the background queue versus 20% for the foreground queue. So if the odds are against you, which always seems to be the case with me, your process has to wait a long time until it gets served, and all you get is a frozen screen and a busy hard disk light.

So what does Linux do? The Linux scheduler is a bit smarter: it uses a technique called a multilevel feedback queue. Hmmm, "feedback" gives the impression that the process can discuss its state with the scheduler instead of, like on Windows, accepting its fate of being doomed to the never-never land. Indeed, Linux has more than two queues with different algorithms and priorities, and processes can move from one queue to another according to their state. So if a process has been stuck in a queue for a long time without getting served, its priority increases and it gets moved to a more VIP queue where processes get served right away. (A toy sketch of the idea follows.)
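
Here is a toy Ruby sketch of that idea, nothing like the real kernel code, just to show the promotion trick: a job that waits too long in a lower queue gets bumped up instead of staying stuck where it was first placed.

Job = Struct.new(:name, :wait)

queues = {
  :vip        => [],
  :normal     => (1..4).map { |n| Job.new("browser-#{n}", 0) },
  :background => [Job.new('backup', 0)]
}
order = [:vip, :normal, :background]

8.times do
  # Always serve the highest-priority non-empty queue first.
  level = order.find { |l| !queues[l].empty? } or break
  job = queues[level].shift
  puts "serving #{job.name} (from #{level})"

  # Everyone still queued ages; whoever waited too long is promoted one level up.
  order.each_with_index do |l, i|
    queues[l].each { |j| j.wait += 1 }
    next if i.zero?
    promoted = queues[l].select { |j| j.wait > 2 }
    queues[l] -= promoted
    promoted.each { |j| j.wait = 0; queues[order[i - 1]] << j }
  end
end

Run it and you will see the backup job start life in the background queue but end up being served from a higher queue after waiting a few rounds, which is exactly the courtesy the plain multilevel queue never extends.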

Some say all this complexity in the Linux scheduler creates overhead and slows things down when you have a large number of processes. But Linux uses an O(1) scheduling algorithm (as does Windows, to give it credit), which means its cost does not grow with the number of processes. It's smart, it really respects your requests, and it doesn't give you the sense that the computer is doing something much more important than your pathetic request.

So where do you want to go today? I know where I am going.

Saturday, May 5, 2007

Continues... The code that changes the world!

H here was talking about the code that will change the world. I think it's not the code itself that will change the world, nor the way you write it, nor the coding style. What will really change the world of computing is the tendency to write a certain type of code. For example, once the concept of garbage collection was invented, more and more people tended to leave their native frameworks and start investing time in learning new technologies like Java, C#, and Ruby. Now if you ask a first-year CSE student about pointers, he probably doesn't know about them at all. That has also made system programmers rare.

Most native-language users are Linux gurus, graphics developers, or system programmers, and those are very few in Egypt and the Arab world in general. The problem with that is that we tend to follow the technology path that others draw for us.

As for me, I am not a fan of managed code, but I have to use it to do most of my job. The true me is with native code, where there are no abstractions except the fact that your native code is finally translated into some system calls.

Finally, if we intend to do something new that lets us drive part of the wheel of technology, we can practice by contributing to the managed world. But we also need to invent our own native tools.

Thursday, May 3, 2007

svn:ignore is disabled in Eclipse!

Currently I use Eclipse, with some plugins, for my Ruby on Rails development. A cool feature of Subversion is that you can have an ignore list to prevent files/folders from being checked when synchronizing with the repository, and Eclipse supports that feature. The problem I had was that whenever I right-clicked on a file, I found the "Add to svn:ignore..." link disabled!

After some Googling, I found that you cannot add files to svn:ignore once they have been checked in. So if you create a new file/folder in your local project that doesn't exist in the repository, you will find the link enabled for that file.

Wednesday, May 2, 2007

MySQL Network Monitor Adviser

We had this installed on one of our database servers earlier today. I had the feeling it would be JAMT (Just Another Monitoring Tool), telling you about CPU utilization and memory usage, maybe with some extra readings on cache hit ratio and running queries. What else could it be? Nothing more than some queries that could be done with a few Bash/Perl scripts and shown in a nice AJAX web interface, nothing I couldn't do myself... or so I thought.

The installation went more smoothly than I expected. The idea that it installs Apache, MySQL, Tomcat and Java alongside the instances already running on the server worried me a bit; I feared they might conflict with the running production instances, but again I was wrong. The installer detected the running services and installed itself somewhere else, on different ports. Hmm, smart.

The installation was done by just running a bin file for the server and another for the agent, and the setup walks you through an interactive installation. Up to this point I was not that impressed; it was just a clean installation script, something expected from MySQL.

The server ran smoothly and so did the agent, and of course they communicated without any problems or interference from my side. And then the show began.

At the beginning it was everything I expected, some monitoring scripts shown on a flashy web interface, until I saw this icon telling me I have a problem:



Table scans and query cache... interesting.
Once I clicked on the query cache item, I got this message:
"Advice
Evaluate whether the query cache is suitable for your application. If you have a high rate of INSERT / UPDATE / DELETE statements compared to SELECT statements, then there may be little benefit to enabling the query cache. Also check whether there is a high value of Qcache_lowmem_prunes, and if so consider increasing the query_cache_size."
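
If you want to check those counters by hand rather than through the tool, a few lines of Ruby against the server will do. A rough sketch with the plain mysql gem; the connection details are placeholders:

require 'mysql'

dbh = Mysql.real_connect('localhost', 'monitor', 'secret')

# The counters the adviser is talking about...
dbh.query("SHOW STATUS LIKE 'Qcache%'").each { |name, value| puts "#{name}: #{value}" }

# ...and the current cache size to weigh them against.
dbh.query("SHOW VARIABLES LIKE 'query_cache_size'").each { |name, value| puts "#{name}: #{value}" }

dbh.close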

It's not JAMT, it's an adviser: it detects poor performance and configuration, and walks you through steps to analyze and fix your problems.

MySQL is impressing me day after day. I don't know if Oracle has such a tool, but I know SQL Server doesn't!

What I see is that MySQL keeps the client in mind: the poor DB admin who sits day after day trying to resolve performance issues and strange application activity, and MySQL uses technology to serve him well and make his life easier. On the other hand, other players in this market sector target the business owners and try to impress them with eye-catching slogans like "high performance", "high availability" and "load balancing". MySQL knows better: it can't do this without its real clients, the DB admins and the sysadmins, so it keeps things simple and works with them to make MySQL a better place for data.

I am really interested to see how MySQL is going to astonish me next time. I am waiting for it, and I think it will be soon.