Subject: Re: [cti-users] Parsing corrupt STIXPackages in python-stix
According to my reading, the Python parse process mostly does not use
the schema. The exception is thrown from a silly error-checking function
for integer-valued fields, but I haven't yet seen a clear way to
short-circuit the unwanted check.

Meanwhile I have a whole lot of busted data I want to process that's
being lost due to this issue. I might try to do some surgery on it with
XPath.

Matthew.

> On May 3, 2016, at 5:25 PM, Stuart Maclean <stuart@apl.washington.edu> wrote:
>
> Hi Matthew,
>
> I cannot speak for the python parser, since I have only used Java to
> parse stix/cybox. But if the python parsers are generated from the .xsd
> schema files, like I did for Java, I think the fact that the input
> document is not 'valid' against the xml schema means that you do indeed
> get one big error condition, halting the entire ingest of the input doc
> and leaving you with nothing.
>
> In my experience, the way around this is a truly awful pre-processing
> step where you 'sanitize' your input docs via use of awk, sed, and THEN
> pass those to the xml parser stage. Of course these tools are
> line-oriented and do not grok xml data content at all. Better would
> probably be XPath or XSLT to fix the input, something I have seen but
> never done.
>
> If you are feeling really ambitious and know the Data Binding
> technologies fairly well (which for Java means JAXB+xjc, not sure of the
> Python equiv) you could amend the .xsd files to 'accommodate' your input
> docs, again a hack.
>
> The whole notion of schemas as a rigorous definition of allowable
> documents for a vocabulary is a double-edged sword. Great when they
> work, but awkward in exactly your situation.
>
> If I find any details of how 'note failure, continue parse' in the Java
> tools at least, I'll follow up. Like I say, Python is not my arena.
>
> Stuart
>
>> On 05/03/2016 04:00 PM, Matthew Hall wrote:
>> I am running into issues parsing STIX Packages containing corrupted Indicators
>> and/or Observables reliably with python-stix.
>>
>> Performing some research on the python-stix code, it appears there is not a
>> good way to catch exceptions at a very granular, per-entity level.
>>
>> There is some code in the stix.utils.parser module, which in theory seems like
>> it would help with this, but it doesn't appear to have granular
>> exception-catching capability either.
>>
>> Therefore, when the code comes across a CybOX FileObj w/ a bogus
>> Size_In_Bytes, the exception disrupts parsing the entire STIX Package not just
>> the corrupted / invalid entity:
>>
>> <FileObj:Size_In_Bytes condition="Equals">380058 bytes</FileObj:Size_In_Bytes>
>>
>> ValueError: invalid literal for long() with base 10: '380058 bytes'
>>   File ".../venv/lib/python2.7/site-packages/cybox/common/properties.py", line 514, in _parse_value
>>     return long(value, 0)
>>
>> How can I perform a best-effort parse with python-stix in order to operate as
>> properly as possible in such situations?
>
> This publicly archived list provides a forum for asking questions,
> offering answers, and discussing topics of interest on STIX,
> TAXII, and CybOX. Users and developers of solutions that leverage
> STIX, TAXII and CybOX are invited to participate.
>
> In order to verify user consent to OASIS mailing list guidelines
> and to minimize spam in the list archive, subscription is required
> before posting.
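For the "surgery" idea discussed in this thread (fixing the input with a real XML parser rather than awk/sed), a rough Python 3 sketch might look like the following. It is not part of python-stix or python-cybox. The helper names (`sanitize_integer_fields`, `local_name`, `INTEGER_FIELDS`) are made up for illustration, and the set of fields to repair is an assumption based only on the Size_In_Bytes example above. It prefers lxml, which python-stix already depends on, because lxml keeps the original namespace prefixes on output. The standard-library fallback renames prefixes, which can break prefixed values such as those in xsi:type attributes, so treat the fallback as demonstration only.

```python
import re

try:
    # lxml keeps the original namespace prefixes when it re-serializes.
    from lxml import etree as ET
except ImportError:
    # Stdlib fallback; it renames prefixes (ns0, ns1, ...) on output.
    import xml.etree.ElementTree as ET

# Assumption: element local names that must hold a bare integer.
INTEGER_FIELDS = {"Size_In_Bytes"}

# Leading decimal or hex integer, e.g. the '380058' in '380058 bytes'.
_LEADING_INT = re.compile(r"[+-]?(?:0[xX][0-9a-fA-F]+|\d+)")


def local_name(tag):
    """Strip the '{namespace}' part from an ElementTree-style tag."""
    return tag.rsplit("}", 1)[-1]


def sanitize_integer_fields(xml_bytes):
    """Return (cleaned_xml_bytes, number_of_repaired_elements)."""
    root = ET.fromstring(xml_bytes)
    repaired = 0
    for elem in root.iter():
        # Comments and processing instructions have non-string tags in lxml.
        if not isinstance(elem.tag, str):
            continue
        if local_name(elem.tag) not in INTEGER_FIELDS:
            continue
        text = (elem.text or "").strip()
        try:
            int(text, 0)  # base-0 parse, like the call in the traceback
            continue      # already acceptable, leave it alone
        except ValueError:
            pass
        found = _LEADING_INT.match(text)
        if found and found.group(0) != text:
            elem.text = found.group(0)  # '380058 bytes' -> '380058'
            repaired += 1
        # Anything with no leading integer is left untouched on purpose.
    return ET.tostring(root), repaired
```

The cleaned bytes could then be wrapped in io.BytesIO and handed to the normal python-stix loader. Values with no leading integer at all are left as they are, so those would still need to be dropped or handled separately.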