Sending WAN/Internet Bandwidth Usage Data to Splunk from Tomato Routers using Splunk HEC
A while back I wrote about sending data from SmartThings and other home devices data to Splunk so I can monitor what goes on in my home via Splunk Dashboards. In addition to SmartThings devices, I also pulled data from other data sources such as network routers, Windows event logs, weather data retrieval scripts, etc.
To monitor our Internet bandwidth usage I wrote a Node.js program to scrape the data from the admin web UI for my Verizon Actiontec MI424WR router. Here's the code for that.
Last week I upgraded my Internet to Verizon Fios Gigabit and with that upgrade, the Actiontec router was replaced with another router: a Netgear R7000 running Advanced Tomato (open source Linux-based firmware for Broadcom-based Wi-Fi routers). Advanced Tomato has a pretty slick interface for monitoring bandwidth, but I still want the data in my Splunk instance.
Luckily, Advanced Tomato runs a variant of Linux, so all I needed was a shell script to calculate bandwidth usage data and send it to Splunk via the Splunk HTTP Event Collector (HEC).
I found a script by WaLLy3K that already had the bandwidth calculation logic and all I had to add was a little more code to send the data to Splunk.
Step-by-step Instructions
Enable JFFS Partition on Your Router
Enable the JFFS partition on your router so that you have permanent storage for your script. Otherwise, if you save your script in /tmp, it'll be gone after the next reboot. Log into your router's admin UI, choose Administration/JFFS, select Enabled and Save.
Create Your Script
SSH into your router and create a shell script at /jffs/bandwidth.sh with the content from here. Update the splunkUrl variable with your Splunk HEC URL. If you are not able to SSH, make sure you have SSH Daemon enabled under Administration/Admin Access.
For more info on setting up the Splunk HTTP Event Collector, see my previous post.
# This is just an excerpt of the code. For the full script see
# https://github.com/chinhdo/shell-scripts/blob/master/sh/bandwidth.sh
...
wan_iface=`nvram get wan_iface`

# Calculate floating point arithmetic using awk instead of bc
calc(){ awk "BEGIN { print $*}"; }

checkWAN () {
  # Default to a 1-second sample window if no argument is given
  [ -z $1 ] && sec="1" || sec="$1"

  # First reading: cumulative RX/TX byte counters for the WAN interface
  netdev=`grep "$wan_iface" /proc/net/dev`
  pRX=$(echo $netdev | cut -d' ' -f2)
  pTX=$(echo $netdev | cut -d' ' -f10)

  sleep $sec

  # Second reading after the sample window
  netdev=`grep "$wan_iface" /proc/net/dev`
  cRX=$(echo $netdev | cut -d' ' -f2)
  cTX=$(echo $netdev | cut -d' ' -f10)

  # If a counter wrapped past 0xFFFFFFFF, adjust the difference accordingly
  [ $cRX \< $pRX ] && getRX=`calc "$cRX + (0xFFFFFFFF - $pRX)"` || getRX=`calc "($cRX - $pRX)"`
  [ $cTX \< $pTX ] && getTX=`calc "$cTX + (0xFFFFFFFF - $pTX)"` || getTX=`calc "($cTX - $pTX)"`

  # Bytes per second, plus a simple idle/busy threshold
  dlBytes=$(($getRX/$sec)); ulBytes=$(($getTX/$sec))
  [ $dlBytes -le "12000" -a $ulBytes -le "4000" ] && wanStatus="idle" || wanStatus="busy"

  # Convert bytes/s to Kbit/s and Mbit/s
  getDLKbit=$(printf "%.0f\n" `calc $dlBytes*0.008`); getULKbit=$(printf "%.0f\n" `calc $ulBytes*0.008`)
  getDLMbit=$(printf "%.2f\n" `calc $dlBytes*0.000008`); getULMbit=$(printf "%.2f\n" `calc $ulBytes*0.000008`)
}
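The part of the script that actually posts the computed values to Splunk is omitted from the excerpt above. A minimal sketch of that step might look like the following; the sourcetype, field names, and sample values here are my assumptions for illustration, and the curl call assumes a reachable HEC endpoint in splunkUrl:

```shell
# Hypothetical values standing in for what checkWAN computes
getDLMbit="87.42"; getULMbit="12.05"; wanStatus="busy"

# Build the HEC event payload ("router_bandwidth" sourcetype is an assumption)
payload="{\"sourcetype\": \"router_bandwidth\", \"event\": {\"dl_mbps\": $getDLMbit, \"ul_mbps\": $getULMbit, \"status\": \"$wanStatus\"}}"
echo "$payload"

# The actual send (requires $splunkUrl and $SPLUNK_AUTH to be set):
# curl -k "$splunkUrl/services/collector/event" \
#   -H "Authorization: Splunk $SPLUNK_AUTH" \
#   -d "$payload"
```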
Create another shell script /jffs/bandwidth-env.sh with the following content:
export SPLUNK_AUTH="YOUR_SPLUNK_AUTH_KEY"
/jffs/bandwidth.sh
To test your script run it manually and confirm the data is showing in Splunk:
/jffs/bandwidth-env.sh
Schedule Your Script
To schedule your script, you can use the Scheduler (Administration/Schedule) in the router’s web admin UI. I have an automatic reboot scheduled at 4 AM, so I scheduled a custom script at 4:15 AM to run the bandwidth-env.sh script:
To start the script right away, spawn a process for it:
/jffs/bandwidth-env.sh &
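If you prefer to manage the schedule from the shell rather than the web UI, Tomato also ships a small cron helper called cru. A sketch, assuming the 4:15 AM schedule above (the job id "splunkbw" is arbitrary):

```shell
# Add a cron entry that runs the wrapper at 4:15 AM daily
cru a splunkbw "15 4 * * * /jffs/bandwidth-env.sh"

# List current entries / delete the entry later
cru l
cru d splunkbw
```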
Additional Info
Here’s a little bit of info on how the script works. The raw bandwidth data is read from /proc/net/dev.
Per redhat.com, /proc/net/dev:

"Lists the various network devices configured on the system, complete with transmit and receive statistics. This file displays the number of bytes each interface has sent and received, the number of packets inbound and outbound, the number of errors seen, the number of packets dropped, and more."
For our purpose, we are interested in the second field (the first is the interface name), which contains the cumulative number of bytes received by the interface, and the tenth field, which contains the cumulative number of bytes sent.
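This field extraction can be demonstrated with cut on a sample line (the byte counts below are made up for illustration):

```shell
# A sample /proc/net/dev line; field 1 is the interface name,
# field 2 is cumulative RX bytes, field 10 is cumulative TX bytes
line="eth0: 1500200 9001 0 0 0 0 0 0 310400 4500 0 0 0 0 0 0"

rx_bytes=$(echo $line | cut -d' ' -f2)
tx_bytes=$(echo $line | cut -d' ' -f10)
echo "RX=$rx_bytes TX=$tx_bytes"   # → RX=1500200 TX=310400
```

Note that $line is left unquoted in the echo so the shell collapses runs of whitespace, which keeps the single-space delimiter for cut reliable.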
The script takes a reading, sleeps for a number of seconds, then takes a second reading. The download/upload Mbit/s figures are calculated by dividing the difference by the elapsed time. There's also some logic to handle the case where a counter wraps around its maximum value back toward zero.
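The wraparound handling can be seen in isolation with made-up counter values; this mirrors the script's calc logic, with the 0xFFFFFFFF constant written in decimal for portability across awk implementations:

```shell
# Simulated 32-bit counter wrap: the current reading is smaller
# than the previous one because the counter passed 0xFFFFFFFF
pRX=4294967000   # previous reading, near the 32-bit max (4294967295)
cRX=705          # counter has wrapped back past zero

if [ "$cRX" -lt "$pRX" ]; then
  # Bytes remaining before the wrap, plus bytes counted after it
  delta=$(awk "BEGIN { print $cRX + (4294967295 - $pRX) }")
else
  delta=$(awk "BEGIN { print $cRX - $pRX }")
fi
echo "$delta"   # → 1000
```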
Here’s how the data shows up in my Splunk Home dashboard: