Automated Production Testing and Monitoring

With almost two years of running an app in production on Microsoft’s Azure under my belt, I’ve learned some hard, hard lessons. One of the biggest is that you need active testing and monitoring of your production environment from both inside and outside the environment. It sounds like a no-brainer, but it’s so far down the priority list for most developers and startups that it takes a long time to get to, if they ever get to it at all.

Some history is in order. Resgrid is a system designed to provide logistics and management capabilities for first responder organizations like volunteer and career fire departments, EMS, public safety, search and rescue, HAZMAT and more.

Resgrid was founded by me and a partner, staxmanade. It started as a simple website and a couple of mobile apps with one page and some big buttons. A year later it’s now a complete end-to-end management and logistics system that runs on Windows Azure.

Because our market is first responders, we need to ensure uptime and that information is relayed quickly. Having something down for even a short period of time can impact our customers, and in our market there is no good time to go down.

Create Test Accounts/Data

One of the first hard lessons was not creating seed data that we could use for testing and verification on the production system. We use Entity Framework as our backing repository mechanism, and it’s very easy to add data in the Seed method of the migrations Configuration class:

protected override void Seed(Contexts.DataContext context)
{
    // This method will be called after migrating to the latest version.

    // Use the DbSet<T>.AddOrUpdate() helper extension method (from the
    // System.Data.Entity.Migrations namespace) to avoid creating duplicate
    // seed data. The first argument names the property used to match
    // existing rows, e.g.
    context.People.AddOrUpdate(
        p => p.FullName,
        new Person { FullName = "Andrew Peters" },
        new Person { FullName = "Brice Lambson" },
        new Person { FullName = "Rowan Miller" }
    );
}

Use this method and seed your database! What should you seed, you may ask? Anything you may need in order to log into the system and perform actions: customer records, login records, etc. Even if you never plan to do production testing, create the data and you’ll have it just in case.

Do not try to start using the Seed method after you already have data in the system; it does not turn out well. We had an instance where we tried creating a test department after we already had customers in the system, and it created over 300 test departments.
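The AddOrUpdate call in the snippet above avoids duplicates by matching on a key (p => p.FullName). To make that behavior concrete, here is a minimal in-memory sketch of the same idea. It does not use Entity Framework, and the Department class and its properties are hypothetical stand-ins, not Resgrid's actual model. The post does not say what caused the 300 duplicate departments, so this only illustrates why seed code needs a stable key to be safe to run more than once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity for illustration only.
public class Department
{
    public string Name { get; set; } = "";
    public bool IsTest { get; set; }
}

public static class SeedSketch
{
    // In-memory analogue of a keyed add-or-update: look for an existing row
    // with the same Name, update it if found, otherwise insert the seed row.
    public static void AddOrUpdateByName(List<Department> table, Department seed)
    {
        var existing = table.FirstOrDefault(d => d.Name == seed.Name);
        if (existing == null)
        {
            table.Add(seed);
        }
        else
        {
            existing.IsTest = seed.IsTest;
        }
    }
}
```

Calling AddOrUpdateByName repeatedly with the same Name leaves a single row in the list, which is the property you want from anything that runs on every migration.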

Monitor/Test Externally

Finding out your system is down from your customers is not good practice. You should know that there is an issue well before your customers let you know. A critical part of this is testing your site from a place on the Internet that isn’t on the same network or backbone as your system. Azure recently introduced Endpoint Status Monitoring that helps with this, but it’s just a high-level check.

If you’re using TeamCity as your CI server, you can set up build configurations that run on a schedule and test your system with complex code, scripts, calls and much more. Host your CI server with a different provider than your application; for example, if you’re using Azure, run your CI server on Amazon. I also recommend using a service like Pingdom to provide backup monitoring and uptime analytics.
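As a rough idea of what a scheduled external check might run, here is a minimal sketch of an HTTP probe. The class and method names are made up for this example, and it is not the code Resgrid uses; it simply treats anything other than a 2xx response within the timeout as "down".

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ExternalCheck
{
    // Returns true only when the endpoint answers with a 2xx status before
    // the timeout elapses. Network failures and timeouts count as "down".
    public static async Task<bool> IsUpAsync(HttpClient client, Uri url, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                using (var response = await client.GetAsync(url, cts.Token))
                {
                    return response.IsSuccessStatusCode;
                }
            }
            catch (HttpRequestException)
            {
                return false; // e.g. DNS failure or refused connection
            }
            catch (OperationCanceledException)
            {
                return false; // the request did not finish within the timeout
            }
        }
    }
}

// Usage (the URL is a placeholder, not a real Resgrid endpoint):
//   bool up = await ExternalCheck.IsUpAsync(
//       new HttpClient(), new Uri("https://example.com/"), TimeSpan.FromSeconds(10));
```

A small console program wrapping this could return a non-zero exit code when the check fails, so that the scheduled build shows up as failed on the CI server.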

Fully Test Critical Processes

In Resgrid, we send out dispatches (calls) via email, text messages and push notifications. This is a mission critical process for us as our customers need these systems to work. Although we cannot guarantee delivery of the message, we can say that if it never gets sent, they will never receive it. So we have jobs that perform the user actions that would generate those messages, and we monitor to ensure we receive them. We test the full flow in production at least once an hour.
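The shape of such a job can be sketched as "trigger, then poll until seen or give up". The code below is an illustration under assumptions, not Resgrid's implementation: the send and received delegates are hypothetical hooks you would supply (for example, one that creates a test dispatch through a seeded test account, and one that checks a test inbox for it).

```csharp
using System;
using System.Threading.Tasks;

public static class RoundTripCheck
{
    // send:     performs the user action and returns a unique marker for this run.
    // received: returns true once a message carrying that marker has arrived.
    // Returns true if the message is seen before the deadline, false otherwise.
    public static async Task<bool> RunAsync(
        Func<Task<string>> send,
        Func<string, Task<bool>> received,
        TimeSpan deadline,
        TimeSpan pollInterval)
    {
        string marker = await send();
        DateTime giveUpAt = DateTime.UtcNow + deadline;

        while (true)
        {
            if (await received(marker))
            {
                return true;
            }
            if (DateTime.UtcNow >= giveUpAt)
            {
                return false; // never arrived in time: raise an alert
            }
            await Task.Delay(pollInterval);
        }
    }
}
```

Using a unique marker per run keeps one hour's test from being satisfied by a message left over from the previous hour.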

Test Multiple Paths

If you have a website and an API site living on different systems, test them both. Don’t rely on testing only, say, the website path to ensure the system is working end to end.

You Don’t Always Have to Automate

You don’t have to automate everything. If it takes you 5 minutes to check something and 8 hours to develop an automated solution, you would have to run that check 96 times to break even. As a small/micro business you need to use your time where it’s most effective: on what will bring in new customers and keep existing ones happy. If a manual test works and is not high friction, that might be your best bet.
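The break-even figure is just the one-time automation cost divided by the cost of a single manual check, with both expressed in minutes:

```latex
\[
  n_{\text{break-even}}
    = \frac{8\ \text{h} \times 60\ \text{min/h}}{5\ \text{min}}
    = \frac{480}{5}
    = 96
\]
```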

About: Shawn Jackson

I’ve spent the last 18 years in the world of Information Technology on both the IT and Development sides of the aisle. I’m currently a Software Engineer for Paylocity. In addition to working at Paylocity, I’m also the Founder of Resgrid, a cloud services company dedicated to providing logistics and management solutions to first responder organizations, volunteer and career fire departments, EMS, ambulance services, search and rescue, public safety, HAZMAT and others.