OASIS Mailing List Archives

cti-users message

Subject: Re: [cti-users] Parsing corrupt STIXPackages in python-stix


On Tue, May 03, 2016 at 09:19:18PM -0700, Stuart Maclean wrote:
> Hi Matthew, could you post a larger snippet of the offending
> document, to include the outermost Observable element at least?  You
> have piqued my interest in whether Java's xml validation handlers
> could help out here, small consolation I know for you in the Python
> world.
> 
> Stuart

The original data is limited distribution / restricted TLP, so I can't share 
it. But the problem can be reproduced by corrupting the size field of any 
standard file IOC.

I kludged around the problem by writing an XML repair function that is called 
on each new input document to get rid of the corruption in the field. The 
concept would be pretty similar in Java, with a bit more grungy XPath code.

Matthew

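import re

def repair_xml(root):
    # Normalize every Size_In_Bytes element, in any namespace, to plain digits.
    for node in root.findall('.//{*}Size_In_Bytes'):
        # node.text is None for an empty element, so fall back to ''.
        match = re.search(r'\d[\d,]*', node.text or '')
        # Strip thousands separators; use the string '0' when no digits are
        # found, because element text must be a string, not an int.
        node.text = match.group(0).replace(',', '') if match else '0'

    return root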
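
For anyone who wants to try the idea without the restricted data, here is a minimal, self-contained sketch. It uses the standard library's xml.etree.ElementTree (the `{*}` namespace wildcard needs Python 3.8 or later there; lxml also accepts it). The sample document, its namespace URI, and the corrupted values are all made up for illustration and are not taken from the original data.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: the namespace URI and the corrupted size values are
# invented for this example.
SAMPLE = """<Observable xmlns:FileObj="http://example.com/FileObj">
  <FileObj:Size_In_Bytes>1,234 bytes</FileObj:Size_In_Bytes>
  <FileObj:Size_In_Bytes>unknown</FileObj:Size_In_Bytes>
  <FileObj:Size_In_Bytes/>
</Observable>"""


def repair_xml(root):
    # Same approach as the function above: keep only the first run of digits
    # (dropping commas) and fall back to '0' when no digits are present.
    for node in root.findall('.//{*}Size_In_Bytes'):
        match = re.search(r'\d[\d,]*', node.text or '')
        node.text = match.group(0).replace(',', '') if match else '0'
    return root


root = repair_xml(ET.fromstring(SAMPLE))
sizes = [node.text for node in root.findall('.//{*}Size_In_Bytes')]
print(sizes)  # ['1234', '0', '0']
```

The repaired tree can then be serialized again (for example with ET.tostring) and passed to whatever parser rejected the original document. Falling back to '0' hides the fact that a size was unreadable, so it may be worth logging those cases.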

