Eric Wagner
669 Grand View Avenue, Apt. 1, San Francisco, CA 94114
415-203-7176
eric@devopsman.com
INNOVATION AND SCALABILITY
- Able to quickly build out diverse geographically distributed teams of high performers using an extensive network of Silicon Valley and tech company contacts.
- Scaled worldwide MongoDB Infrastructure to handle 500 million writes per hour and reads of over 700 million an hour.
- More than 20 years of experience directing and managing multiple engineering and operational teams.
- Director of organizations of up to 60 people.
- Architected, deployed, and managed online video platform scaled to millions of concurrent viewers.
- Managed back-end infrastructure of high-visibility live video events for Lady Gaga, Adele, Kat, 50 Cent, Justin Bieber, the Jonas Brothers, John Legend, and many others.
- Central to the success of a 1.5 million core distributed AI/genetic algorithm training system.
- Developed Apple Dynamic HTTP Streaming server used to stream live video of top Premiere Radio personalities to tens of thousands of iPad and iPhone viewers daily.
- Built transcoding system that handles 20,000 to 50,000 hours of transcoding a day.
- Live stock data feed operations, including NASDAQ, NYSE, CME, FX data feeds, and trading in multiple venues, including Currenex, Hotspot FX, Morgan Stanley FIX, and Interactive Brokers API trading.
- Responsible for operational risk reduction for entered equity positions of tens of millions of dollars per day.
- Continuously monitored and improved the network and application services for the ESPN 2011 Cricket World Cup online video site streamed to hundreds of millions of viewers over the two-month event.
- Scaled up and improved services, including video on demand and transcoding, for OWN TV and America’s Funniest Videos user-generated content ingestion.
- Back-end infrastructure design used by HomeAway to host VOD videos in a call-to-action Super Bowl ad.
- Designed MySQL/MariaDB/TokuDB DB cluster that can perform tens of thousands of transactions a second.
- Enjoys working in a high-pressure, 24/7/365 environment.
EXPERIENCE
Netskope, Santa Clara, California 8/16 to Present
Provider of highly scalable cloud-based security infrastructure.
Senior Director of Ops, QA, and Service Delivery
Platform Engineering Operations – Played a critical role in multiple aspects of platform engineering, service delivery, technical operations, and project management while moving through several roles at Netskope.
- Goal-oriented Senior Director with more than 20 years of experience and the skills required to direct and manage large teams that engineer, plan, and run multiple highly scaled sets of technical infrastructure.
- Transitioned QE testing and Netskope product deployment from quarterly to monthly deployments.
- Directed all operations while Netskope went through a quick growth phase from 2016 to 2018.
- Harmonized data service development and operations with the goals of Platform Engineering and Netskope.
- Acted as manager and coach of the data operations team, successfully transitioning the team into separate groups and adding a management layer.
Massively Scalable Data Infrastructure – Central to the success of Netskope’s massively scaled data systems:
- Scaled Mongo clusters from single-digit sets of shards to clusters with up to 200 shards, with a total of over 800 shards spread across multiple clusters. All the clusters combined can handle over 500 million writes and over 700 million reads an hour.
- Managed deployment and operations of over 300 Redis clusters (Sentinel and Cluster).
- Guided development and implementation of a worldwide Kubernetes-based Ceph file and a key-value store.
- Designed ScyllaDB (Cassandra clone) multi-regional replication strategy.
- Directed multiple teams who managed a worldwide data streaming network using Kafka and Pulsar.
- Managed migration of over 200TB of Mongo data to ClickHouse (OLAP database management system).
- Spearheaded project moving async security events from Mongo to Yugabyte (Postgres).
- Designed and managed deployment of full HA and DR MariaDB/Galera clusters across 20+ data centers.
- Instrumental in deploying and managing Looker-based analytics using GCP BigQuery as the data store.
SRE Dev/Ops – Created ticketing and development operation deployment process to quickly and continuously deploy fixes and features for both the Netskope product and supporting systems:
- Agile lead for operations and data systems.
- Implemented versatile and low-error change management policy and automation.
- Directed Ansible to Salt automation migration.
Sentient Investments Management, San Francisco, California 4/11 to 8/14, 4/15 to 8/16
Delivers massively distributed AI and genetic algorithms to solve complex problems in various fields, including equities and FX trading.
Director of Trading and AI Training Operations
1.5 Million Core Distributed AI Training System – Central to the success of Sentient Technology’s 1.5 million core distributed AI/genetic algorithm training system:
- Designed, implemented, and managed data collection and control services for Sentient Technology’s distributed genetic algorithm training system.
- Implemented and managed Mesos/Chronos Docker management system.
- Private CDN design and management used for data package deployment (enabled by Apache, HAProxy, NGINX, Varnish, and a private cloud)
- MySQL schema design, query performance analysis, partitioning, and server tuning to support 30k queries per second on ‘big data’ sized databases.
- Core reporting and alerting service design, coding, implementation, and monitoring using Nagios and custom services
- Proxies: SOCKS/Dante, HAProxy (load balancing, TCP proxy), forward and reverse proxies (Apache)
- Tomcat server management and tuning, Tomcat NIO connector implementation, Java tuning
- Ubuntu server administration (12.04 and 14.04)
- Python, PHP, Perl, shell scripting
Development Operations – Created development operation deploy process to quickly and continuously deploy fixes and features to AI training and trading systems:
- Deployed and managed development operation systems, including JIRA, Crucible/Fisheye, Bamboo, and Stash
- Integrated build systems with svn and git source control
- Developed custom deployment and development operations services
Trading Operations – Responsible for operational risk reduction for entered positions of tens of millions of dollars per day. Directed, directly managed, and operated all Sentient Technology trading systems, including equities and foreign exchange trading systems, and was responsible for their low-latency order fills:
- Live stock data feed operations including NASDAQ, NYSE, CME, TMX/TSX, FX data feeds, and trading in multiple venues including Currenex, Hotspot FX, Deutsche Bank, Morgan Stanley FIX (as an execution broker), and Interactive Brokers API trading.
- Pre-trading stock and sector analysis to create equities block list based on liquidity limits, corporate actions, and other indicators and news. Responsible for watching various news sources for certain exceptions requiring immediate position exits.
- Responsible for quickly fixing trading issues and applying trading exceptions such as emergency shutdowns during certain market conditions, and dealing with issues such as halted stocks, boxed trades, and other serious trading issues.
- Quickly reacted to and resolved global stop loss and trading limit issues.
- Continuous monitoring of all trading systems to ensure that all systems stayed within expected exposure, order, and AUM deployment thresholds.
- Quickly fixed capital deployment issues such as problems with hedging, incorrect and problematic order fills, and unexpected algo trading behavior.
- Trading systems: MS Flow Manager and Passport, Eze Castle, Interactive Brokers TWS, Activ data feed.
- Developed and managed daily backtest, analysis, and reporting services.
Twelvefold Media, San Francisco, California 8/14 to 4/15
AI advertising placement and tracking.
Director of Operations
Advertising bid management services – Designed, implemented, and managed servers and services to support Twelvefold Media’s advertising bid management system.
- Managed Hadoop / HDFS / MapReduce system used as AD URL index.
- Responsible for ParAccel (Actian Matrix) data warehouse.
- PSQL database administration.
- Budgeted, specified, assembled, and configured Solr cluster.
Cloud service management – Managed all of Twelvefold’s cloud services, including the development of a cloud services backup system specifically designed to back up Twelvefold’s HDFS AD URL index.
Office Move – Managed IT and infrastructure for office move.
Kyte (purchased by KIT Digital), San Francisco, California 1/07 to 4/11
Provider of online high capacity multiscreen broadcast quality video (live and on-demand).
Director of Scalability and System Architecture
Scalability and System Architecture – Central to the success of the Kyte service by applying extensive experience in scalability and network and software infrastructure, including:
- Designed and managed the entire Kyte system infrastructure stack: CDN Cache > Cisco firewalls > F5 load balancers > Apache HTTPD > Apache Proxy/Cache > AJP Bridge > Tomcat JVM (with both NIO HTTP and MINA socket connectors) > Oracle Coherence > MySQL DB cluster all running on multiple CentOS/Redhat Enterprise Linux servers with dynamic upscaling of capacity using Rackspace and AWS cloud services.
- Video delivery scalability up to millions of concurrent viewers and 600TB+ a month of media delivery.
- 1.6 billion video views over the life of the product.
- Designed back-end infrastructure for TV quality live streams (soon to be HD quality) streaming from multiple wireless connection backpacks (Live Pro Unwired).
- Full support for video consumption and production from and to multiple mobile devices, including iPhones, Blackberries, and Symbian-based devices.
Deployment Engineering – Developed Kyte’s deployment and installation system using a combination of Bash, Perl, and PHP. The system includes such features as automatic rollback, levels of approval, multiple tier deployments, highly structured release and patch control, control of cloud and CDN services, and full integration with Kyte’s continuous integration environment.
Continuous Integration – Deployed and maintained Kyte’s continuous integration system. Components include Perforce, JIRA, Confluence, Crucible, Fisheye, CruiseControl, Hudson, and EC2/AWS-based performance testing.
Transcoding Cluster – Developed and managed Kyte’s high-capacity, multi-featured transcoding cluster:
- Supports all common and 99% of all uncommon video and audio codecs and containers, including h.264, VP8, AAC, MP3, FLV, SWF, MP4, AVI, MOV, MPEG-TS, AMR, 3GP, and M3U8
- Can transcode up to 50,000 hours of video a day (and can scale much higher by adding transcoding nodes)
- Uses multiple versions of ffmpeg, mencoder, and MP4Box and a custom probing and media fixup system to determine the best transcoding attributes, bitrates and codecs to use as well as attempt fixes of various types of video and media corruption in incoming videos
- Maintained, built, and modified all transcoding tools (written in C/C++, Perl and Bash).
- After extensive testing and development, the cluster now creates best-in-industry multi-bitrate iPhone and iPad transcodes
Software and Application Service Tiers – Designed, implemented, and monitored all of Kyte’s software and application service tiers and tier components, including:
- Coherence caching and queuing cluster running on Tomcat Java servlet servers
- Transcoding cluster
- FMS and RTMP(E) services
- Apple Dynamic Streaming cluster
- Multiple origin and multiple feature live and on demand streaming system that currently streams 500 to 600TB a month
- Multi-tier cached system (including CDN and various layers of proxy caching)
- High capacity multiple VIP load balancing
- Apache HTTP, proxy cache, advanced rewrite rules, AJP, virtual hosts, etc.
- Tomcat NIO (non-blocking IO) implemented to allow massive improvements in concurrency and memory usage
- Asynchronous API with MINA
- Monitoring (Nagios and custom statistics graphing service) and the main Kyte reporting server (used for reporting stats to customers using the Kyte control and management console)
- MySQL cluster
Internet and Network Infrastructure – Responsible for Kyte’s Internet and network environment including both the Kyte product infrastructure and corporate environment.
- High capacity multiple VIP load balancing
- CDNs: Limelight, Akamai
- Monitoring: Nagios and custom tools
- Multiple Linux distributions
- Netfilter/iptables firewalls (IPv4 and IPv6)
- Named/bind, DHCP, radvd, NFS, NetApp, F5 load balancers, Cisco firewalls.
- Audit and logging systems
- System security and compliance (for multiple jurisdictions)
SenSage, San Francisco, California 3/05 to 1/07
Real-time, scalable, and enterprise-level event data and log file analysis software.
IT Director
Operations and Infrastructure – After only two weeks on the job, was instrumental, from an IT perspective, in the success of SenSage’s move to a new office. The move, which included multiple Linux servers, firewalls, routers, desktops, and other systems, was completed within a very short period of time, allowing SenSage to continue operations with no downtime for critical systems and a small amount of downtime for all other systems.
Technology Management – Introduced and applied a full set of professional operational and IT services to SenSage.
- Reorganized and maintained the entire network (including co-location/hosted services, corporate and QA), the SenSage website, all servers (Linux and Windows), Linux-based routers and firewalls, email, the PBX and voicemail, etc.
- Webmaster for www.sensage.com including site updates and redesigns.
- Responsible for and experienced with: Linux, Solaris, all versions of the Windows OS and NOS, SMS, Cisco routers/switches, 802.11b, Oracle, MySQL, Sendmail, Spamassassin, ClamAV, TCP/IP, DNS, DDNS, DHCP, NIS, NFS, VPNs (PPTP, IPSEC, etc.), various firewalls, bandwidth control, monitoring systems, Apache, IIS, Squid, etc.
Engineering Success – Added significantly to the success of SenSage’s engineering efforts by stabilizing and enhancing SenSage’s engineering, development, and QA infrastructure.
Vignette, San Francisco, California 12/03 to 3/05
Provider of enterprise-level content management, portal, business efficiency, and collaboration products.
Engineering Services Manager
Dynamic and Effective Management – Quickly took control and re-energized an Engineering Services team that had undergone a succession of managers and staffing changes. After only a month, the team was functioning at its highest level and had become one of the most respected teams at Vignette.
Development Services – Designed and implemented the core infrastructure of the Vignette development lab, a system that included Bind, Exchange 2000/2003, DHCP, and SMTP servers; Cisco switches and routers; Linux, Windows, HPUX, Solaris, and AIX servers; Oracle and DB2; the monitoring systems Netsaint and MRTG; as well as Perl-based automation scripts.
- Managed 300 test and development servers, including Linux (RedHat and SuSE), Solaris, HPUX, and AIX, Exchange servers (5.5, 2000, 2003), Windows servers (NT, 2000, 2003), and various Active Directory implementations.
- Developed a system image service to quickly create images for all Vignette-supported server platforms to create on-the-fly configurations for testing and development.
Creative and Unique Solutions – Took control of the resources in Vignette’s development lab by creating an application based on Vignette’s Portal and Builder technology. This allowed the management of all lab resources. In addition to tracking individual systems, the new lab portal can track configurations that use multiple machines, links directly into Vignette’s bug tracking system, tracks when systems are checked in and out, and even integrates with Nagios and MRTG using a Perl script to fully track uptime and performance levels.
Intraspect, Brisbane, California 2002 to 12/03
Developer of enterprise-level collaboration software. Company acquired by Vignette (see above).
IT Manager
High-Performance Email: Sendmail and Anti-SPAM – Used years of email and Sendmail experience to implement a high-uptime and high-performance email system that delivered email quickly and reliably.
- Designed and implemented a RedHat Linux-based anti-spam system based on Sendmail and Spamassassin that successfully blocked almost all spam.
- Responsible for a seamless Exchange 2000 to 2003 upgrade.
- When initially hired, quickly resolved numerous Exchange-related issues, including lost email, Exchange database crashes, and poor Outlook performance. Received a commendation from the CEO of Intraspect for this work.
Active Directory/Windows 2000/2003 Implementation – Successfully implemented and managed a multi-domain, redundant, scalable, and reliable Windows 2000/Active Directory installation in a mixed Windows, Macintosh, Linux, and Solaris environment.
XDegrees, Inc., Mountain View, California 2001 to 2002
Creator of an enterprise-level, cost-effective alternative to VPNs and extranets for highly secure, cross-firewall information exchange, which, after XDegrees’ acquisition by Microsoft, is now integrated into several Microsoft products.
IT Manager
Web/Internet Operations – Responsible for five-nines uptime using Cisco routers, switches, and load balancers, fully redundant Linux, Windows 2000, and Solaris-based servers, Apache web servers, MS SQL, Oracle, ISC DNS, Sendmail, as well as the XDegrees XIN (eXtended Information Network).
Team Building – Built a strong and responsive IT team from the ground up, a team that added significantly to the success of XDegrees.
Software Development and QA – Designed and managed the software development system, including CVS, bug-tracking, the QA lab, and code and build testing (using Rational Purify, Bugzilla, Bonsai, and Mozilla Tinderbox).
Zaplet, Inc., Redwood Shores, California 1999 to 2001
Developer of the Zaplet Appmail System, an email-based collaborative business process management software for mission-critical business processes such as supply chain management and customer relationship management. Zaplet recently merged with MetricStream.
IT Manager
Team Management – Directly managed the 12 IT team members.
Information Strategy – Planned, from a technical, strategic, and budgetary perspective, Zaplet’s entire application, network, server, and telecom infrastructure, including:
- Web systems (BEA WebLogic and iPlanet web servers and caching servers)
- Application servers (WebLogic, JRun, and the iPlanet application server)
- Network operating system (Windows 2000 Server and Active Directory)
- Email (Exchange and Sendmail)
- Development and QA systems (over 50 Sun Solaris systems including Ultra 5s, Netra T1s, 220Rs, and E450s)
IT Support – Created a complete work order system with appropriate service levels, notification, and escalation that gave the company a strong and reliable IT support infrastructure. This infrastructure has been integral to the success of the company’s product and to multiple company initiatives.
Telecommunications – Provided a complete, innovative, and cost-effective system for all corporate telecommunication needs. Directed the transition from a lower-end PBX to a higher-capacity PBX enabling over 500 handsets, VoIP, least-cost routing, and full call-center services such as ACD, CTI, and IVR.
BUT WAIT, THERE’S MORE!
Photography: https://500px.com/p/ericwagnerphoto/galleries/landscapes
Web design/admin: mariva.com, basetree.com
EDUCATION
MCSE
CNE
University of Vermont
B.A. Anthropology
Lambda Alpha (Anthropological Honor Society)