Monday, June 25, 2012

VENUE (INDIE DESK - LOS ANGELES DOWNTOWN)

UC Riverside (System Admin)

PAGER DUTY (ENTERPRISE LEVEL gateway for alerting phones/email/SMS)
http://www.pagerduty.com/
$18  per user

Nagios is feeding PagerDuty...
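
A rough sketch of what "Nagios feeding PagerDuty" could look like at the payload level. The field names (routing_key, event_action, dedup_key, payload.summary/source/severity) follow PagerDuty's Events API v2 as I understand it, which may differ from whatever integration was in use at the time of the talk. The helper build_pagerduty_event, the state mapping and the routing key are hypothetical, and nothing is sent over the network here.

```python
import json

# Assumed mapping from Nagios service states to PagerDuty severities.
NAGIOS_TO_SEVERITY = {"CRITICAL": "critical", "WARNING": "warning", "UNKNOWN": "error"}


def build_pagerduty_event(routing_key, host, service, state, output):
    """Turn one Nagios service notification into an event dictionary."""
    # One dedup key per host/service so a recovery resolves the matching incident.
    dedup_key = "{}/{}".format(host, service)
    if state == "OK":
        return {
            "routing_key": routing_key,
            "event_action": "resolve",
            "dedup_key": dedup_key,
        }
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,
        "payload": {
            "summary": "{} on {} is {}: {}".format(service, host, state, output),
            "source": host,
            "severity": NAGIOS_TO_SEVERITY.get(state, "error"),
        },
    }


# Placeholder routing key; a real one comes from the PagerDuty service settings.
event = build_pagerduty_event(
    "EXAMPLE_ROUTING_KEY", "web01", "HTTP", "CRITICAL", "connection refused"
)
body = json.dumps(event)  # JSON body that a sender would POST to the events endpoint
```

Keying the event on host/service means repeated Nagios notifications for the same problem collapse into one incident instead of paging again.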

Graphite and Collectd 
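
A minimal sketch of how a metric reaches Graphite, assuming its plaintext protocol (one "path value timestamp" line per data point, conventionally sent to Carbon on TCP port 2003). The metric name and the helper format_graphite_line are made up for illustration and were not part of the talk.

```python
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: '<path> <value> <timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "{} {} {}\n".format(path, value, int(timestamp))


# Hypothetical metric path; a real sender would write this line to Carbon's socket.
line = format_graphite_line("servers.web01.load.shortterm", 0.42, 1340600000)
```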

Jenkins (Business workflow automation and reporting)
Model business processes that get forgotten if they run as cron jobs
- Like if the batch job finished (Central dashboard where you can view)



CDN at Edgecast (Andrew Lientz VP managed service) - Santa Monica
http://www.centreon.com/ (Front end to the MySQL db created by Nagios)
Cacti and Centreon work well (and still do) for the basic hardware features
- Cacti works great for HDD failure reporting
Pinterest (DDoS tool in web browser) 120 connections from the browser

TUNE TO THE APPLICATION
intro notion of "GRID" [http grid] 
SNMP MIBS into the applications
initial color coding to deal with various features
add versioning and customer information to the status of the server
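
A small sketch of what one cell in such a grid might carry, assuming a simple status-to-color mapping. The class GridCell, its fields and the color choices are hypothetical; the notes do not record how the actual tool models this.

```python
from dataclasses import dataclass

# Assumed color coding; unknown statuses fall back to gray.
STATUS_COLORS = {"ok": "green", "warning": "yellow", "critical": "red"}


@dataclass
class GridCell:
    host: str
    grid: str       # which grid the server belongs to, e.g. "http"
    status: str
    version: str    # software version running on the server
    customer: str   # customer information attached to the server status

    def color(self):
        return STATUS_COLORS.get(self.status, "gray")

    def label(self):
        return "{} [{}] v{} ({}): {}".format(
            self.host, self.grid, self.version, self.customer, self.color()
        )


cell = GridCell("edge-01", "http", "warning", "1.4.2", "example-customer")
```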


THE GRIDS
- HTTP
 - flash
 - windows
 - local load balancers
 - log processors

- Used by the 24/7 NOC to watch for application and server issues

Juniper 960 is half a rack in size (telco and CDN buy it)

THE NETWORK
- multiple data centers (do not have a backbone)
   ip transit, peering and dark fiber connectivity

- 24/7 Monitoring by clients using Keynote, Gomez and Catchpoint

- Routers and switches with constant packet drops for one reason or another

ROUTING VIEW
- 1st generation monitoring tool developed in-house (They leverage Cacti for that routing view)
 -- has color code system
 -- realtime graphing system

EXTERNAL VIEW
 - monitoring and pinging outside servers and the routes to the servers - if a route goes down??


TOO MUCH DATA
- better way to dashboard
EVEN WITH
 - 24/7 Mon
 - Centreon for SNMP traps and hardware failures
 - routing views
 - third party monitoring


REDUCE NOISE
 - Focus on warnings and alerts
 - Take what we have learned for the grid and put in alarms for each
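
One way to picture the noise reduction: keep only events at or above a chosen severity. This is a sketch under my own assumptions; the severity names, their ranking and the helper reduce_noise are hypothetical and not taken from the talk.

```python
# Assumed severity ordering, lowest to highest.
SEVERITY_RANK = {"ok": 0, "info": 1, "warning": 2, "critical": 3}


def reduce_noise(events, minimum="warning"):
    """Return only the events whose severity is at least `minimum`."""
    threshold = SEVERITY_RANK[minimum]
    return [e for e in events if SEVERITY_RANK.get(e["severity"], 0) >= threshold]


# Made-up sample events, one per grid.
events = [
    {"grid": "http", "severity": "ok"},
    {"grid": "flash", "severity": "warning"},
    {"grid": "log processors", "severity": "info"},
    {"grid": "local load balancers", "severity": "critical"},
]
alarms = reduce_noise(events)
```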
 

NEW 2nd GEN SERVERS VIEW


THE CUSTOMER
- Refine the tools for the customer
 -- NOC
  - Content owners
 -- Engineering
 -- DevOps (Software rollout)
 -- Capacity Planning (Massive capacity issue, where to build the next DC)



DASHBOARD? is it built from scratch or some opensource project????


THE END USER
- watching our network isn't enough
 - We need to develop QoS tools
 - Look at all the networks not just the ones we directly connect to
 - Leverage beacons (google analytics - end user measurement) and content provider relationships to give proper end to end measurements


bad time to first byte is a DNS issue
bad time to last byte is a route issue
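
The rule of thumb above can be sketched as a tiny classifier. The millisecond limits and the function name classify_slow_fetch are assumptions for illustration only; the talk gave no numbers.

```python
def classify_slow_fetch(first_byte_ms, last_byte_ms,
                        first_byte_limit_ms=500, last_byte_limit_ms=3000):
    """Apply the heuristic: slow first byte suggests DNS, slow last byte suggests routing."""
    suspects = []
    if first_byte_ms > first_byte_limit_ms:
        suspects.append("dns")
    if last_byte_ms > last_byte_limit_ms:
        suspects.append("route")
    return suspects
```

For example, classify_slow_fetch(900, 1200) points at DNS only, while a fetch that starts quickly but takes 5000 ms to finish points at the route.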


Lance Lakey  lancelakey@gmail.com
Hack night  in Hollywood

@lancelakey on GitHub and twitter

Matthew King (Software Engineering, TX)

Redis

is an open source


Readis
monitor
inspect
