Blocking exception in XML filter for XPATH parsing #3284

Closed
@clementdevos

Hello All,

I'm using Logstash to parse XML files downloaded with wget, using XPath expressions to transform them into JSON data.

Sometimes the XPath parsing fails, resulting in a blocking exception:

Exception in filterworker {"exception"=>#<Nokogiri::XML::XPath::SyntaxError: /v2:Report/v2:Indicator[@id='1']/@value>, "backtrace"=>["nokogiri/XmlXpathContext.java:123:in `evaluate'", "/home/ubuntu/bin/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/nokogiri-1.6.1-java/lib/nokogiri/xml/node.rb:159:in `xpath'", "org/jruby/RubyArray.java:2409:in `map'", "/home/ubuntu/bin/logstash-1.4.2/vendor/bundle/jruby/1.9/gems/nokogiri-1.6.1-java/lib/nokogiri/xml/node.rb:150:in `xpath'", "/home/ubuntu/bin/logstash-1.4.2/lib/logstash/filters/xml.rb:103:in `filter'", "org/jruby/RubyHash.java:1339:in `each'", "/home/ubuntu/bin/logstash-1.4.2/lib/logstash/filters/xml.rb:102:in `filter'", "(eval):117:in `initialize'", "org/jruby/RubyProc.java:271:in `call'", "/home/ubuntu/bin/logstash-1.4.2/lib/logstash/pipeline.rb:262:in `filter'", "/home/ubuntu/bin/logstash-1.4.2/lib/logstash/pipeline.rb:203:in `filterworker'", "/home/ubuntu/bin/logstash-1.4.2/lib/logstash/pipeline.rb:143:in `start_filters'"], :level=>:error} 

When this happens, Logstash stops processing the files. I cannot stop Logstash with Ctrl+C, so I have to kill it by its PID and then flush the sincedb file. I also remove the downloaded XML files to prevent them from being inserted twice into Elasticsearch when I start Logstash again.

When Logstash starts again, the files get parsed without any problem...

The files are downloaded with a ".xml.dwl" extension to prevent the file input from picking them up.
The downloaded files are then transformed and moved into the folder that the file input watches, under a ".xml" extension.
Is it possible that Logstash tries to read the XML files before they are fully moved to the destination, even though they are 40 KB at most?
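
For reference, a move is only atomic when the source and destination are on the same filesystem; across filesystems it becomes a copy followed by a delete, and a reader can then see a partially written file. Below is a minimal sketch in plain Ruby of staging the file inside the watched folder and renaming it into place. The helper name `publish_atomically` and its parameters are hypothetical and are not part of Logstash or of my actual download script.

```ruby
# Hypothetical helper (illustration only): stage the file in the watched
# directory under the ".xml.dwl" name, then rename it to ".xml".
def publish_atomically(watched_dir, basename, content)
  staging = File.join(watched_dir, "#{basename}.xml.dwl")
  final   = File.join(watched_dir, "#{basename}.xml")

  File.open(staging, 'w') do |f|
    f.write(content)
    f.fsync # flush the data to disk before the final name appears
  end

  # rename is atomic on POSIX systems when both paths are on the same
  # filesystem, so a watcher matching "*.xml" never sees a partial file.
  File.rename(staging, final)
  final
end
```

Because the staging file lives in the same directory as the final file, the rename never crosses a filesystem boundary, and the ".xml.dwl" name keeps it out of a "*.xml" watch pattern until it is complete.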

Is it possible to make Logstash fail safely instead of interrupting the parsing?

I'm currently running Logstash 1.4.2.
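
To illustrate the kind of fail-safe behaviour I have in mind, here is a minimal sketch in plain Ruby. It is not the Logstash implementation: the helper `filter_safely`, the `_xpathfailure` tag, and the `error` field are hypothetical names, and a plain Hash stands in for a Logstash event. It assumes the XPath error descends from `StandardError`.

```ruby
# Hypothetical helper (not a Logstash API): run the block and, if it raises,
# record the failure on the event instead of letting the exception escape
# and stop the worker.
def filter_safely(event, failure_tag = '_xpathfailure')
  yield event
  true
rescue StandardError => e
  # Tag the event so failed documents can be found and reprocessed later.
  (event['tags'] ||= []) << failure_tag
  event['error'] = "#{e.class}: #{e.message}"
  false
end

# Usage: the block stands in for the XPath evaluation that sometimes fails.
event = { 'message' => '<Report/>' }
ok = filter_safely(event) { |_ev| raise ArgumentError, 'bad xpath' }
```

With this approach a bad document would be tagged and passed along, and the remaining files would continue to be processed.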

Cheers.
