This repository was archived by the owner on Aug 8, 2024. It is now read-only.

Unable to pickle parsed output #27

@evan-burke

I'm trying to do some multiprocess/distributed processing of Apache logs, which relies on pickle serialization/deserialization to move data between scheduler and worker processes.

However, deserialization fails on the parsed outputs, in my case specifically time_received_tz_datetimeobj and time_received_utc_datetimeobj, for input strings like:

import pickle

import apache_log_parser

mylist = ['157.55.39.31 - - [21/Mar/2019:07:56:41 +0000] "GET / HTTP/1.1" 200 6878 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
          '40.77.167.37 - - [21/Mar/2019:07:59:11 +0000] "GET / HTTP/1.1" 301 469 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"'
         ]

logparser = apache_log_parser.make_parser('%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"')

parsed_logline = logparser(mylist[0])
pickled = pickle.dumps(parsed_logline)  # serializing succeeds
# This call raises the error:
pickle.loads(pickled)

(This is with Python 3.6.6 and apache_log_parser 1.7.0, by the way.)

I can fix this in my implementation by converting the '0000' timezone to UTC:

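import datetime

def to_utc(datetimeobj):
    # Only the parser's '0000' tzinfo is swapped for the standard UTC timezone;
    # anything else is returned unchanged.
    if str(datetimeobj.tzinfo) == "'0000'":
        return datetimeobj.astimezone(datetime.timezone.utc)
    else:
        return datetimeobj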

parsed_logline['time_received_tz_datetimeobj'] = to_utc(parsed_logline['time_received_tz_datetimeobj'])
parsed_logline['time_received_utc_datetimeobj'] = to_utc(parsed_logline['time_received_utc_datetimeobj'])

But this seems like something more appropriate to do in the parser. That said, I'm not sure if this would break backwards compatibility with other Python versions.
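
For log lines with a non-zero offset, the same idea can probably be generalized. Below is a rough sketch, not tested against every parser version, that swaps whatever tzinfo the parser attached for an equivalent standard-library datetime.timezone with the same offset. The names normalize_tz, make_picklable and dumps_parsed are made up for this example, and it assumes the parser's tzinfo objects report their offset correctly through utcoffset().

```python
import datetime
import pickle


def normalize_tz(dt):
    """Return dt with its tzinfo replaced by an equivalent datetime.timezone."""
    offset = dt.utcoffset()
    if offset is None:
        # Naive datetime: there is no tzinfo to replace.
        return dt
    # Same wall-clock fields and same offset, so the instant is unchanged.
    return dt.replace(tzinfo=datetime.timezone(offset))


def make_picklable(parsed):
    """Copy a parsed-line dict, normalizing every datetime value in it."""
    return {
        key: normalize_tz(value) if isinstance(value, datetime.datetime) else value
        for key, value in parsed.items()
    }


def dumps_parsed(parsed):
    """Pickle a parsed-line dict after normalizing its datetimes."""
    return pickle.dumps(make_picklable(parsed))
```

With this, pickle.loads(dumps_parsed(parsed_logline)) should round-trip, since the only tzinfo objects left in the dict are datetime.timezone instances, which pickle handles natively. The trade-off is that the original tzinfo class (and its '0000'-style name) is lost.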
