
cti-users message



Subject: Re: [cti-users] Parsing corrupt STIXPackages in python-stix


From my reading of the code, the schema is mostly not used in the Python parse process. The exception is thrown from a silly error-checking function for integer-valued fields, but I haven't yet seen a clear way to short-circuit the unwanted check. Meanwhile, I have a whole lot of busted data I want to process that's being lost due to this issue. I might try to do some surgery on it with XPath.
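
As a rough illustration of that kind of surgery, here is a minimal sketch of a pre-parse cleanup pass. It is not part of python-stix or python-cybox: the helper names (sanitize_integer_fields, local_name, is_integer), the sample document, and the placeholder namespace URI are all made up for the example. It also walks the tree with the standard library's ElementTree and matches on local element names rather than using a real XPath engine, and it assumes Python 3.

```python
import re
import xml.etree.ElementTree as ET

# Optional sign followed by leading decimal digits; anything after is treated as junk.
_LEADING_INT = re.compile(r"^\s*([+-]?\d+)")


def local_name(tag):
    """Return the element name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def is_integer(text):
    """True if text parses as an integer, either with base auto-detection or as decimal."""
    for base in (0, 10):
        try:
            int(text, base)
            return True
        except ValueError:
            pass
    return False


def sanitize_integer_fields(root, field_names=("Size_In_Bytes",)):
    """Trim or drop integer-valued elements whose text is not a clean integer.

    Returns a list of (field, original_text, action) tuples so that nothing
    is changed silently.
    """
    report = []
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for elem in list(root.iter()):
        if not isinstance(elem.tag, str) or local_name(elem.tag) not in field_names:
            continue
        text = (elem.text or "").strip()
        if not text or is_integer(text):
            continue  # empty or already valid: leave untouched
        match = _LEADING_INT.match(text)
        if match:
            # e.g. "380058 bytes" -> "380058"
            elem.text = match.group(1)
            report.append((local_name(elem.tag), text, "trimmed"))
        elif elem in parents:
            # Nothing salvageable: drop only this element, keep the rest of the document.
            parents[elem].remove(elem)
            report.append((local_name(elem.tag), text, "removed"))
    return report


# Hypothetical sample; the namespace URI is a placeholder, not the real CybOX one.
SAMPLE = """<root xmlns:FileObj="urn:example:FileObj">
  <FileObj:Size_In_Bytes condition="Equals">380058 bytes</FileObj:Size_In_Bytes>
  <FileObj:Size_In_Bytes condition="Equals">unknown</FileObj:Size_In_Bytes>
  <FileObj:Size_In_Bytes>1024</FileObj:Size_In_Bytes>
</root>"""

root = ET.fromstring(SAMPLE)
for field, original, action in sanitize_integer_fields(root):
    print(field, repr(original), action)
```

One caveat with this sketch: ElementTree rewrites namespace prefixes when it serializes, which can break prefix-dependent attribute values such as xsi:type, so for real STIX documents a library that preserves prefixes (lxml, for instance) is probably the safer choice for writing the cleaned file back out.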

Matthew. 

> On May 3, 2016, at 5:25 PM, Stuart Maclean <stuart@apl.washington.edu> wrote:
> 
> Hi Matthew,
> 
> I cannot speak for the python parser, since I have only used Java to
> parse stix/cybox.  But if the python parsers are generated from the .xsd
> schema files, like I did for Java, I think the fact that the input
> document is not 'valid' against the xml schema means that you do indeed
> get one big error condition, halting the entire ingest of the input doc
> and leaving you with nothing.
> 
> In my experience, the way around this is a truly awful pre-processing
> step where you 'sanitize' your input docs via use of awk, sed, and THEN
> pass those to the xml parser stage.  Of course these tools are
> line-oriented and do not grok xml data content at all.  Better would
> probably be XPath or XSLT to fix the input, something I have seen but
> never done.
> 
> If you are feeling really ambitious and know the Data Binding
> technologies fairly well (which for Java means JAXB+xjc, not sure of the
> Python equiv) you could amend the .xsd files to 'accommodate' your input
> docs, again a hack.
> 
> The whole notion of schemas as a rigorous definition of allowable
> documents for a vocabulary is a double-edged sword.  Great when they
> work, but awkward in exactly your situation.
> 
> If I find any details of how to 'note failure, continue parse' in the Java
> tools at least, I'll follow up.  Like I say, Python is not my arena.
> 
> 
> Stuart
> 
> 
>> On 05/03/2016 04:00 PM, Matthew Hall wrote:
>> I am running into issues parsing STIX Packages containing corrupted Indicators 
>> and/or Observables reliably with python-stix.
>> 
>> Performing some research on the python-stix code, it appears there is not a 
>> good way to catch exceptions at a very granular, per-entity level.
>> 
>> There is some code in the stix.utils.parser module, which in theory seems like 
>> it would help with this, but it doesn't appear to have granular 
>> exception-catching capability either.
>> 
>> Therefore, when the code comes across a CybOX FileObj w/ a bogus 
>> Size_In_Bytes, the exception disrupts parsing the entire STIX Package not just 
>> the corrupted / invalid entity:
>> 
>> <FileObj:Size_In_Bytes condition="Equals">380058 bytes</FileObj:Size_In_Bytes>
>> 
>> ValueError: invalid literal for long() with base 10: '380058 bytes'
>> File ".../venv/lib/python2.7/site-packages/cybox/common/properties.py", line 514, in _parse_value
>>  return long(value, 0)
>> 
>> How can I perform a best-effort parse with python-stix in order to operate as 
>> properly as possible in such situations?
> 


