Subject: Re: [cti-users] Parsing corrupt STIXPackages in python-stix
On Tue, May 03, 2016 at 09:19:18PM -0700, Stuart Maclean wrote:
> Hi Matthew, could you post a larger snippet of the offending
> document, to include the outermost Observable element at least? You
> have piqued my interest in whether Java's xml validation handlers
> could help out here, small consolation I know for you in the Python
> world.
>
> Stuart

The original data is limited-distribution (restricted TLP), so I can't
share it. However, the problem can be reproduced by corrupting the size
field of any standard file IOC.

I kludged around the problem by writing an XML repair function. It is
called on each new input document and strips the corruption out of the
field. The concept would be pretty similar in Java, with a bit more
grungy XPath code.

Matthew

import re

def repair_xml(root):
    # Find Size_In_Bytes elements in any namespace ({*} wildcard).
    for node in root.findall('.//{*}Size_In_Bytes'):
        # Keep the first number, dropping thousands separators: "1,234 bytes" -> "1234".
        match = re.search(r'\d[\d,]*', node.text or '')
        # Assign a string: lxml rejects non-string element text, so fall back to '0', not 0.
        node.text = match.group(0).replace(',', '') if match else '0'
    return root
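A minimal, self-contained way to reproduce the corruption and check the repair is sketched below. It assumes Python 3.8+ for the {*} namespace wildcard in xml.etree.ElementTree. The sample fragment, its namespace URI and the corrupted value "1,048,576 bytes" are made up for illustration and are not taken from the restricted data. Only the standard library is used, so python-stix is not needed to try it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a trimmed-down file object whose size field has been
# corrupted with thousands separators and a unit suffix.
SAMPLE = """<FileObj:Properties xmlns:FileObj="http://cybox.mitre.org/objects#FileObject-2">
  <FileObj:File_Name>example.exe</FileObj:File_Name>
  <FileObj:Size_In_Bytes>1,048,576 bytes</FileObj:Size_In_Bytes>
</FileObj:Properties>"""


def repair_xml(root):
    # Same repair as above: keep the first number in each Size_In_Bytes
    # element (any namespace), strip commas, and default to '0'.
    for node in root.findall('.//{*}Size_In_Bytes'):
        match = re.search(r'\d[\d,]*', node.text or '')
        node.text = match.group(0).replace(',', '') if match else '0'
    return root


root = repair_xml(ET.fromstring(SAMPLE))
size = root.find('.//{*}Size_In_Bytes').text
print(size)  # 1048576
```

Because the repair mutates the parsed tree in place, the cleaned document can be re-serialized with ET.tostring(root) before it is handed to the next parser; how that hand-off to python-stix is done is left out of this sketch.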