transit_service_analyst documentation

stefancoe edited this page May 5, 2022 · 58 revisions

Overview

transit_service_analyst is a Python library that loads GTFS files for a specific service date as Pandas DataFrames and GeoPandas GeoDataFrames. It also provides several functions that help bootstrap a wide range of service-related analyses, including geospatial analysis.

Representing Route Level Service

The GTFS specification has no explicit way to identify unique service patterns within a route. The transit_service_analyst package handles this by finding every unique stop sequence (route permutation) within each route and labeling it with the first trip_id encountered for that sequence. This representative trip_id is called rep_trip_id (the idea is borrowed from the INRO Emme travel modeling software). As a result, transit_service_analyst often represents service at a disaggregate level based on rep_trip_id. If every trip on a route follows the same stop sequence, the route has just one rep_trip_id; a circulator route is an example. If the trips on a route differ only by inbound versus outbound direction, the route has two. Some routes have more than two, for example routes with both local and express service or with skip-stop patterns. Because route_id, direction_id and route_type are always maintained in functions that return DataFrames keyed by rep_trip_id, subsequent aggregation is always possible. The tool is designed to represent route-level service at its most disaggregate level so that all route permutations are represented.
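The labeling idea described above can be sketched with plain Pandas. This is a minimal illustration, not the library's implementation: the sample rows, the `stop_pattern` column and the variable names are all hypothetical, and only the standard GTFS columns `trip_id`, `route_id`, `stop_id` and `stop_sequence` are assumed.

```python
import pandas as pd

# Hypothetical data: route "A" has three trips (two share a stop sequence),
# route "B" has one trip.
stop_times = pd.DataFrame(
    {
        "trip_id": ["t1", "t1", "t1", "t2", "t2", "t2", "t3", "t3", "t4", "t4"],
        "stop_id": ["s1", "s2", "s3", "s1", "s2", "s3", "s3", "s1", "s1", "s2"],
        "stop_sequence": [1, 2, 3, 1, 2, 3, 1, 2, 1, 2],
    }
)
trips = pd.DataFrame(
    {"trip_id": ["t1", "t2", "t3", "t4"], "route_id": ["A", "A", "A", "B"]}
)

# One stop-sequence string per trip, built in stop_sequence order.
patterns = (
    stop_times.sort_values(["trip_id", "stop_sequence"])
    .groupby("trip_id", sort=False)["stop_id"]
    .apply(lambda stops: ">".join(stops))
    .rename("stop_pattern")
    .reset_index()
    .merge(trips, on="trip_id")
)

# Within each route, label every trip with the first trip_id that has the
# same stop pattern (the assumed meaning of rep_trip_id).
patterns["rep_trip_id"] = patterns.groupby(["route_id", "stop_pattern"])[
    "trip_id"
].transform("first")

# One row per representative trip.
rep_trips = patterns.drop_duplicates("rep_trip_id")[["rep_trip_id", "route_id"]]
print(rep_trips)
```

In this sketch, trips t1 and t2 share a pattern and so share one representative trip, while t3 (the reverse direction) gets its own.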

Please check out this link for some example notebooks.

Installation

Enter the following in a command prompt:
pip install transit-service-analyst
This will install transit_service_analyst in your current Python environment. You can visit the PyPI page here:
https://pypi.org/project/transit-service-analyst/

Example:

import transit_service_analyst as tsa
service_tool = tsa.load_gtfs('c:/gtfs_folder', 20210914)
service_tool.get_total_trips_by_line().head(2)

rep_trip_id  route_id  direction_id  total_trips
1            001e912   0             9
15           009f09a   1             121
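Because route_id and direction_id travel with each rep_trip_id, results like the output shown above can be rolled up to coarser levels. The sketch below assumes a DataFrame with the same four columns; the values, including the extra rep_trip_id 7 row, are made up for illustration and do not come from a real feed.

```python
import pandas as pd

# Illustrative values only, shaped like the get_total_trips_by_line() output.
trips_by_line = pd.DataFrame(
    {
        "rep_trip_id": [1, 7, 15],
        "route_id": ["001e912", "001e912", "009f09a"],
        "direction_id": [0, 1, 1],
        "total_trips": [9, 11, 121],
    }
)

# Roll representative trips up to route and direction.
by_route_direction = trips_by_line.groupby(
    ["route_id", "direction_id"], as_index=False
)["total_trips"].sum()

# Roll up further to one row per route.
by_route = trips_by_line.groupby("route_id", as_index=False)["total_trips"].sum()
print(by_route)
```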

Usage:

Import:
import transit_service_analyst as tsa

Tool Access:
The entry point to the transit_service_analyst library is the load_gtfs function:
tsa.load_gtfs(<gtfs_dir>, <service_date>)

  • <gtfs_dir> String. The location of the GTFS files.

  • <service_date> Integer. The service date of interest, in YYYYMMDD format. Pick a date that is typical of the service you wish to analyze. For example, we use a non-holiday Tuesday in May to represent weekday spring service.

  • <start_minute> Integer. The first minute of the first hour for which service will be represented.

  • <end_minute> Integer. The last minute for which service will be represented. Note: many transit agencies schedule trips that depart after midnight as part of the preceding service day, with times past 24:00:00. For example, a trip leaving at 1:00 AM will have a departure time of 25:00:00.
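To see how times past midnight interact with a minute-based window, here is a small sketch. It assumes that start_minute and end_minute are counted in minutes after midnight of the service day and that both ends are inclusive; those are assumptions, so check the library's behavior before relying on them. The helper names gtfs_time_to_minutes and in_window are hypothetical and not part of transit_service_analyst.

```python
def gtfs_time_to_minutes(time_str: str) -> int:
    """Convert a GTFS HH:MM:SS string to whole minutes after midnight.

    Hours may exceed 23, as GTFS allows for trips that run past midnight.
    """
    hours, minutes, _seconds = (int(part) for part in time_str.strip().split(":"))
    return hours * 60 + minutes


def in_window(time_str: str, start_minute: int, end_minute: int) -> bool:
    """Return True if the time falls inside [start_minute, end_minute] (assumed inclusive)."""
    return start_minute <= gtfs_time_to_minutes(time_str) <= end_minute


print(gtfs_time_to_minutes("25:00:00"))  # 1:00 AM on the following calendar day
print(in_window("25:00:00", 0, 1440))    # window ends at midnight
print(in_window("25:00:00", 0, 1560))    # window extends to 2:00 AM
```

Under these assumptions, an end_minute of 1440 would exclude the 25:00:00 trip, so a larger value is needed to capture late-night service.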

Tool Properties & Methods

  • calendar - GTFS calendar.txt as a Pandas DataFrame.
  • routes - GTFS routes.txt as a Pandas DataFrame. Only records specific to service date are included.
  • shapes - GTFS shapes.txt as a Pandas DataFrame. Only records specific to service date are included.
  • stop_times - GTFS stop_times.txt as a Pandas DataFrame. Only records specific to service date are included.
  • stops - GTFS stops.txt as a Pandas DataFrame. Only records specific to service date are included.
  • trips - GTFS trips.txt as a Pandas DataFrame. Only records specific to service date are included.
  • service_ids - A list containing each service_id specific to service date, start_time & end_time.
  • schedule_pattern_df - A Pandas DataFrame containing a record for each unique rep_trip_id. Other columns are route_id, orig_trip_id, & shape_id.
  • get_lines_gdf() - Returns a GeoPandas GeoDataFrame containing a record for each unique rep_trip_id. The geometry is from the shape_id used for this trip.
  • get_line_stops_gdf() - Returns a GeoPandas GeoDataFrame containing a record for each stop for each rep_trip_id. The geometry is from the stop_lat and stop_lon columns in the stops file.
  • get_tph_by_line() - Returns a Pandas DataFrame containing the number of trips by hour for each unique rep_trip_id.
  • get_tph_at_stops() - Returns a Pandas DataFrame containing the number of trips by hour for each unique stop_id.
  • get_service_hours_by_line() - Returns a Pandas DataFrame containing the number of service hours by rep_trip_id.
  • get_total_trips_by_line() - Returns a Pandas DataFrame containing the total number of trips by rep_trip_id.
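Several of the properties above are filtered to the service date. As a rough sketch of what selecting service_ids for a date can look like, the example below filters a calendar.txt-style DataFrame by weekday and date range. It is not the library's implementation: the function active_service_ids and the sample rows are hypothetical, dates are assumed to have been read as integers, and exceptions in calendar_dates.txt are ignored.

```python
from datetime import datetime

import pandas as pd

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def active_service_ids(calendar: pd.DataFrame, service_date: int) -> list:
    """Return service_ids whose date range and weekday flag cover service_date."""
    day_name = WEEKDAYS[datetime.strptime(str(service_date), "%Y%m%d").weekday()]
    in_range = (calendar["start_date"] <= service_date) & (
        calendar["end_date"] >= service_date
    )
    runs_on_day = calendar[day_name] == 1
    return calendar.loc[in_range & runs_on_day, "service_id"].tolist()


# Hypothetical calendar.txt contents: a weekday service, a Saturday service,
# and an expired weekday service.
calendar = pd.DataFrame(
    {
        "service_id": ["WKD", "SAT", "OLD"],
        "monday": [1, 0, 1],
        "tuesday": [1, 0, 1],
        "wednesday": [1, 0, 1],
        "thursday": [1, 0, 1],
        "friday": [1, 0, 1],
        "saturday": [0, 1, 0],
        "sunday": [0, 0, 0],
        "start_date": [20210901, 20210901, 20210101],
        "end_date": [20211231, 20211231, 20210831],
    }
)

# 20210914 is a Tuesday, so only the current weekday service qualifies.
print(active_service_ids(calendar, 20210914))
```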