NetInfo: Free Enrichment for IP Addresses
If you need to enrich IP addresses with their network, autonomous system information, subnet details and geolocation data, there are plenty of options, but many of the refined APIs require payment and the free libraries lack automation around data updates. Building on existing tooling, I wrapped the PyASN and MaxMind GeoIP libraries in a small Flask application that provides a simple API for enriching an IP address with network data.
Get the tool on GitHub: https://github.com/9b/netinfo
As a security analyst, you will come across IP addresses more often than almost any other data type. By themselves, they offer little value, but with the proper enrichment, they can reveal interesting trends or patterns that could influence larger decision making. It feels like every few months, I find myself needing an API to get the current network and autonomous system for an IP address. There are plenty of great services to enrich IP addresses, but many of them charge a fee after several hundred queries, and for my most recent project, spending thousands of dollars was not in the budget. I wanted to build a service that I could run locally without needing to provide constant care-and-feeding.
PyASN: Standing on the Shoulders of Giants
Looking at the existing libraries, I noticed that PyASN did nearly everything I wanted, but had two key issues: it lacked code to automate updates to the database and it didn’t provide an easy-to-call web service. Fortunately, the library comes with a number of helpful utilities to download differential updates, transform data formats and gather additional metadata. Leveraging the existing functions, I was able to pull together a wrapper service within a day, which I have named NetInfo.
What does PyASN actually do?
Before explaining how NetInfo works, it’s helpful to understand what PyASN does behind the scenes. For nearly 20 years, the University of Oregon has collected and published routing data observed across the globe in a project named Route Views. Route Views publishes bzip compressed updates every 2 hours in the format of an MRT data file. PyASN uses the published archive files as a foundation and transforms them into a structure that can then be queried much like a database.
In order to use the file, a user must first download the archive, convert it into a processed format and then load it using PyASN. Once loaded, a user can run queries for IP addresses or different AS prefixes. Unfortunately, there’s not a great way to reload the database when a new MRT file is published; all the utilities provided assume a more manual approach to processing the data. This is where a wrapper like NetInfo was needed in order to reduce the burden of updating the databases.
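To make the idea of querying routing data concrete, here is a minimal sketch of a longest-prefix lookup using only the Python standard library. It is an illustration of the concept, not PyASN's implementation: the routing table is a made-up set of documentation prefixes and private-use AS numbers, and the `build_table` and `lookup` helpers are hypothetical names. The `(asn, prefix)` return shape is chosen to resemble what a PyASN lookup hands back.

```python
import ipaddress

# Toy routing table: prefix -> origin AS number. These are documentation
# ranges and private-use ASNs chosen for illustration, not real routing data.
ROUTES = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "198.51.100.128/25": 64502,
}


def build_table(routes):
    """Parse the prefix strings once so lookups only compare network objects."""
    return [(ipaddress.ip_network(prefix), asn) for prefix, asn in routes.items()]


def lookup(table, ip):
    """Return (asn, prefix) for the most specific prefix containing ip.

    Returns (None, None) when no prefix in the table covers the address.
    """
    addr = ipaddress.ip_address(ip)
    best = None
    for network, asn in table:
        # Skip IPv4/IPv6 mismatches, then keep the longest matching prefix.
        if addr.version == network.version and addr in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, asn)
    if best is None:
        return (None, None)
    return (best[1], str(best[0]))
```

A linear scan like this is fine for three prefixes but far too slow for a full routing table, which is one reason to lean on a purpose-built library that converts the archive into an indexed structure first.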
MaxMind is one of the best-known GeoIP services, so wrapping their existing implementation just made sense. Compared with PyASN, the database needs less management, though it does get updated several times a month. Knowing this, I added a processing script to pull down the latest free version of the GeoIP Lite database and keep it loaded in memory.
NetInfo is a free service that wraps PyASN as a web service and automates the updating process of the routing database.
Web Service Setup
The web service wrapper is a basic Flask application that exposes queryable endpoints returning JSON, along with a simple status page. Placing the IP enrichment within a web service makes using the data in production a lot easier. There’s no need to load any special libraries or environments; I can just call a URL and be assured I will get back the data I need.
Most of the code for NetInfo is centered around data management. Using Celery, I schedule a single job to reach out to the Route Views website and check whether a new archive has been published. If one is found, the code will download it, convert it to a proper format, update the published database and make note of the changes within a configuration file. What’s nice about this approach is that the automation relieves the user of having to even know there are files being routinely published.
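The check-and-update step can be sketched roughly as follows. This is not NetInfo's actual code: it assumes archives are named like `rib.YYYYMMDD.HHMM.bz2`, it takes the directory listing as a plain list of names rather than fetching it, and the state file layout, `check_for_update` and the `process` callback (standing in for download and conversion) are all hypothetical. Scheduling through Celery is left out entirely.

```python
import json
import re
from pathlib import Path

# Assumed archive naming scheme: rib.YYYYMMDD.HHMM.bz2
ARCHIVE_RE = re.compile(r"^rib\.\d{8}\.\d{4}\.bz2$")


def latest_archive(names):
    """Return the newest archive name in a listing, or None if there is none."""
    valid = [name for name in names if ARCHIVE_RE.match(name)]
    # Zero-padded timestamps sort correctly as plain strings.
    return max(valid, default=None)


def load_state(path):
    """Read the small JSON state file, or start fresh if it does not exist."""
    state_file = Path(path)
    if not state_file.exists():
        return {"last_archive": None}
    return json.loads(state_file.read_text())


def check_for_update(names, state_path, process):
    """Process the newest archive only if it is newer than the recorded one.

    Returns the archive name that was processed, or None if nothing changed.
    """
    state = load_state(state_path)
    newest = latest_archive(names)
    last = state.get("last_archive")
    if newest is None or (last is not None and newest <= last):
        return None
    process(newest)  # hypothetical hook: download, convert, swap the database
    state["last_archive"] = newest
    Path(state_path).write_text(json.dumps(state))
    return newest
```

Recording the last processed archive only after `process` returns means a failed download or conversion is retried on the next run instead of being silently skipped.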
To keep the database current, each API function is wrapped in a decorator that checks how long it’s been since the database was last loaded. Once half an hour has passed, the database is reloaded so that queries return the most current information. Again, there’s nothing for the user to do in order to make this happen.
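A staleness-checking decorator of this kind might look like the sketch below. The `Database` class, the `ensure_fresh` decorator and the injectable `clock` are illustrative assumptions rather than NetInfo's real names; only the half-hour interval comes from the description above.

```python
import functools
import time

REFRESH_SECONDS = 30 * 60  # half an hour, as described in the text


class Database:
    """Hypothetical holder for the loaded data and the time it was loaded."""

    def __init__(self, loader, clock=time.monotonic):
        self._loader = loader
        self._clock = clock
        self.data = loader()
        self.loaded_at = clock()

    def refresh_if_stale(self, max_age=REFRESH_SECONDS):
        """Reload the data if max_age seconds have passed; report whether we did."""
        if self._clock() - self.loaded_at >= max_age:
            self.data = self._loader()
            self.loaded_at = self._clock()
            return True
        return False


def ensure_fresh(db):
    """Decorator factory: refresh db if needed before the wrapped function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            db.refresh_if_stale()
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

Checking on each request avoids a separate timer thread inside the web process, at the cost of one request every half hour paying for the reload.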
All of my servers run a version of Ubuntu Server LTS. To further streamline the management of NetInfo, I provided two systemd service wrappers that can be installed to make both the web service and the Celery jobs controllable through systemctl. The primary benefit of these service wrappers is simpler management. I don’t need to remember the commands to start uwsgi or Celery; I can just use systemctl to start, stop or restart each process. Additionally, logs from each service flow into standard syslog, which gives me a common troubleshooting log file to work against. Finally, if my servers ever get knocked offline or rebooted, I know the services will come back up automatically without my intervention.
What’s the Best Solution?
At the start of this article, I pointed out that many paid services exist to enrich IP addresses. My goal in creating this wrapper was to get the information I needed for free because the number of queries I was making simply exceeded the budget for my project. If I weren’t limited by funding, I’d happily use one of the paid services. If you aren’t interested in running your own service or want to get even more enrichment data within a single call, I would recommend checking out IPInfo.io. They provide free access for a restricted number of queries per day, their pricing is reasonable and, unlike other enrichment services, they include the network that the IP address belongs to, not just the AS network.