VENUE (INDIE DESK - LOS ANGELES DOWNTOWN)
UC Riverside (System Admin)
PAGERDUTY (ENTERPRISE-LEVEL gateway for alerting phones/email/SMS)
http://www.pagerduty.com/ $18 per user
Nagios is feeding PagerDuty...
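A rough sketch of what "Nagios feeding PagerDuty" could look like, assuming the PagerDuty Events API v2 is the integration point (the talk did not say how the two were wired together). The routing key, host and service names are hypothetical, and the UNKNOWN-to-error mapping is a choice made for this example. The body would be POSTed as JSON to https://events.pagerduty.com/v2/enqueue.

```python
import json

# Nagios service states mapped to PagerDuty Events API v2 severities.
# Treating UNKNOWN as "error" is an assumption made for this sketch.
SEVERITY = {"WARNING": "warning", "CRITICAL": "critical", "UNKNOWN": "error"}


def build_event(routing_key, host, service, state, output):
    """Build an Events API v2 request body for one Nagios state change."""
    # Same dedup_key for trigger and resolve, so a later OK closes the incident.
    dedup_key = f"{host}/{service}"
    if state == "OK":
        return {
            "routing_key": routing_key,
            "event_action": "resolve",
            "dedup_key": dedup_key,
        }
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,
        "payload": {
            "summary": f"{service} on {host} is {state}: {output}",
            "source": host,
            "severity": SEVERITY[state],
        },
    }


if __name__ == "__main__":
    # Hypothetical key and host, printed rather than sent.
    event = build_event("HYPOTHETICAL_KEY", "web01", "HTTP", "CRITICAL", "connection refused")
    print(json.dumps(event, indent=2))
```

Building the body separately from sending it keeps the mapping easy to check without touching the network.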
Graphite and Collectd
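For reference, a minimal sketch of pushing a metric into Graphite over its plaintext protocol (one "path value timestamp" line per metric, carbon's default plaintext port 2003). Collectd would normally do this for you; the metric path and host below are made up for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line: 'path value timestamp'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metrics(lines, host="127.0.0.1", port=2003):
    """Send already formatted lines to a carbon plaintext listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric path; printed instead of sent so no carbon daemon is needed.
    print(format_metric("servers.web01.load.shortterm", 0.42), end="")
```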
Jenkins (Business workflow automation and reporting)
Model business processes that would get forgotten if they were just cron jobs
- e.g. whether a batch job finished (central dashboard where you can view the status)
CDN at Edgecast (Andrew Lientz VP managed service) - Santa Monica
Cacti and Centreon work well (and still do) for the basic hardware features
- Cacti works great for HDD failure reporting
Pinterest (DDoS tool in web browser) 120 connections from browser
TUNE TO THE APPLICATION
intro notion of "GRID" [http grid]
SNMP MIBS into the applications
initial color coding to deal with various features
add versioning and customer information to the status of the server
THE GRIDS
- HTTP
- flash
- windows
- local load balancers
- log processors
- Used by the 24/7 NOC to watch for application and server issues
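A minimal sketch of the grid idea as described: one row per server, color coded, with version and customer information attached to the status. The field names, the error-rate metric and the thresholds are all assumptions for illustration, not details from the talk.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    grid: str          # e.g. "http", "flash", "windows"
    version: str       # software version running on the server
    customer: str      # customer served from this server
    error_rate: float  # fraction of failed requests (hypothetical metric)


def color(server, warn=0.01, crit=0.05):
    """Map a server's error rate to a status color (thresholds are made up)."""
    if server.error_rate >= crit:
        return "red"
    if server.error_rate >= warn:
        return "yellow"
    return "green"


def render_grid(servers, grid):
    """Render one text row per server belonging to the given grid."""
    rows = [
        f"{s.name:<10} {color(s):<7} v{s.version:<8} {s.customer}"
        for s in servers
        if s.grid == grid
    ]
    return "\n".join(rows)


if __name__ == "__main__":
    fleet = [
        Server("http-01", "http", "2.3.1", "acme", 0.002),
        Server("http-02", "http", "2.3.0", "acme", 0.08),
        Server("flash-01", "flash", "1.9.4", "globex", 0.02),
    ]
    print(render_grid(fleet, "http"))
```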
Juniper 960 is half a rack in size (telco and CDN buy it)
THE NETWORK
- multiple data centers (do not have a backbone)
- IP transit, peering and dark fiber connectivity
- 24/7 monitoring by clients using Keynote, Gomez and Catchpoint
- Routers and switches with constant packet drops for one reason or another
ROUTING VIEW
- 1st generation monitoring tool developed in-house (they leverage Cacti for that routing view)
-- has color code system
-- realtime graphing system
EXTERNAL VIEW
- monitoring and pinging outside servers and the routes to the servers - if a route goes down??
TOO MUCH DATA
- better way to dashboard
EVEN WITH
- 24/7 monitoring
- Centreon for SNMP traps and hardware failures
- routing views
- third-party monitoring
REDUCE NOISE
- Focus on warnings and alerts
- Take what we have learned for the grid and put in alarms for each
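One way the noise reduction above could be sketched: drop everything below warning level and collapse repeated events into a single alarm per grid and check. The event fields and level names are assumptions for illustration.

```python
# Severity ordering; level names are hypothetical.
LEVELS = {"info": 0, "warning": 1, "alert": 2}


def reduce_noise(events, min_level="warning"):
    """Keep events at or above min_level, one alarm per (grid, check)."""
    alarms = {}
    for event in events:
        if LEVELS[event["level"]] < LEVELS[min_level]:
            continue  # informational noise is dropped
        key = (event["grid"], event["check"])
        if key not in alarms:
            alarms[key] = {**event, "count": 1}
            continue
        alarm = alarms[key]
        alarm["count"] += 1
        # A repeated alarm keeps the worst level seen so far.
        if LEVELS[event["level"]] > LEVELS[alarm["level"]]:
            alarm["level"] = event["level"]
    return list(alarms.values())


if __name__ == "__main__":
    stream = [
        {"grid": "http", "check": "5xx_rate", "level": "info"},
        {"grid": "http", "check": "5xx_rate", "level": "warning"},
        {"grid": "http", "check": "5xx_rate", "level": "alert"},
        {"grid": "flash", "check": "disk", "level": "warning"},
    ]
    for alarm in reduce_noise(stream):
        print(alarm)
```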
NEW 2nd GEN SERVERS VIEW
THE CUSTOMER
- Refine the tools for the customer
-- NOC
- Content owners
-- Engineering
-- DevOps (Software rollout)
-- Capacity Planning (massive capacity issue, where to build the next DC)
DASHBOARD? Is it built from scratch or based on some open source project?
THE END USER
- watching our network isn't enough
- We need to develop QoS tools
- Look at all the networks not just the ones we directly connect to
- Leverage beacons (Google Analytics - end user measurement) and content provider relationships to give proper end-to-end measurements
a bad time to first byte points to a DNS issue
a bad time to last byte points to a route issue
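The first-byte / last-byte rule of thumb above can be sketched as a small check. The thresholds are hypothetical, and the measurement helper uses only the standard library, so it times a single fetch from one vantage point rather than real end-user beacons.

```python
import time
from urllib.request import urlopen


def measure(url):
    """Return (seconds to first body byte, seconds to last byte) for one fetch."""
    start = time.perf_counter()
    with urlopen(url, timeout=10) as resp:
        resp.read(1)  # first byte of the body
        first_byte = time.perf_counter() - start
        resp.read()   # rest of the body
        last_byte = time.perf_counter() - start
    return first_byte, last_byte


def diagnose(first_byte_s, last_byte_s, first_byte_limit=0.5, last_byte_limit=3.0):
    """Apply the rule of thumb from the notes; limits are made-up defaults."""
    findings = []
    if first_byte_s > first_byte_limit:
        findings.append("slow first byte: check DNS")
    if last_byte_s > last_byte_limit:
        findings.append("slow last byte: check routing")
    return findings or ["ok"]


if __name__ == "__main__":
    # Example timings in seconds, not real measurements.
    print(diagnose(0.9, 1.2))
```

Note that the time to first byte measured this way also includes TCP connect and server think time, so a slow value is a hint to look at DNS first rather than proof.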
Hack night in Hollywood
@lancelakey on GitHub and Twitter
Matthew King (Software Engineering, TX)
Redis is an open source in-memory key-value data store
Readis
monitor
inspect