Creating a tree/deeply nested dict from an indented text file in Python
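Basically, I want to iterate through a file and put the contents of each line into a deeply nested dict, the structure of which is defined by the amount of whitespace at the start of each line.
Essentially the aim is to take something like this:
a
    b
        c
    d
        e
and turn it into something like this: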
{"a":{"b":"c","d":"e"}}
or this:
apple
    colours
        red
        yellow
        green
    type
        granny smith
    price
        0.10
into this:
{"apple":{"colours":["red","yellow","green"],"type":"granny smith","price":0.10}}
so that I can send it to Python's json module and make some JSON.
At the moment I'm trying to make a dict and a list in steps such as:
{"a":""} ["a"]
{"a":"b"} ["a"]
{"a":{"b":"c"}} ["a","b"]
{"a":{"b":{"c":"d"}}} ["a","b","c"]
{"a":{"b":{"c":"d"},"e":""}} ["a","e"]
{"a":{"b":{"c":"d"},"e":"f"}} ["a","e"]
{"a":{"b":{"c":"d"},"e":{"f":"g"}}} ["a","e","f"]
etc.
The list acts like 'breadcrumbs' showing where I last put something in the dict.
To do this I need a way to iterate through the list and generate something like dict["a"]["e"]["f"] to get at that last dict. I've had a look at the AutoVivification class that someone has made, which looks very useful, but I'm unsure of:
- whether I'm using the right data structure for this (I'm planning to send it to the json library to create a JSON object)
- how to use AutoVivification in this instance
- whether there's a better way in general to approach this problem.
I came up with the following function, but it doesn't work:
    def get_nested(dict,array,i):
        if i != None:
            i += 1
            if array[i] in dict:
                return get_nested(dict[array[i]],array)
            else:
                return dict
        else:
            i = 0
            return get_nested(dict[array[i]],array)
I would appreciate any help!
(The rest of my extremely incomplete code is here:)
    #import relevant libraries
    import codecs
    import sys

    #functions
    def stripped(str):
        if tab_spaced:
            return str.lstrip('\t').rstrip('\n\r')
        else:
            return str.lstrip().rstrip('\n\r')

    def current_ws():
        if whitespacing == 0 or not tab_spaced:
            return len(line) - len(line.lstrip())
        if tab_spaced:
            return len(line) - len(line.lstrip('\t\n\r'))

    def get_nested(adict,anarray,i):
        if i != None:
            i += 1
            if anarray[i] in adict:
                return get_nested(adict[anarray[i]],anarray)
            else:
                return adict
        else:
            i = 0
            return get_nested(adict[anarray[i]],anarray)

    #initialise variables
    jsondict = {}
    unclosed_tags = []
    debug = []

    vividfilename = 'simple.vivid'
    # vividfilename = sys.argv[1]
    if len(sys.argv)>2:
        jsfilename = sys.argv[2]
    else:
        jsfilename = vividfilename.split('.')[0] + '.json'

    whitespacing = 0
    whitespace_array = [0,0]
    tab_spaced = False

    #open the file
    with codecs.open(vividfilename,'rU', "utf-8-sig") as vividfile:
        for line in vividfile:
            #work out how many whitespaces at start
            whitespace_array.append(current_ws())

            #for first line with whitespace, work out the whitespacing (eg tab vs 4-space)
            if whitespacing == 0 and whitespace_array[-1] > 0:
                whitespacing = whitespace_array[-1]
                if line[0] == '\t':
                    tab_spaced = True

            #strip out whitespace at start and end
            stripped_line = stripped(line)

            if whitespace_array[-1] == 0:
                jsondict[stripped_line] = ""
                unclosed_tags.append(stripped_line)

            if whitespace_array[-2] < whitespace_array[-1]:
                oldnested = get_nested(jsondict,whitespace_array,None)
                print oldnested
                # jsondict.pop(unclosed_tags[-1])
                # jsondict[unclosed_tags[-1]]={stripped_line:""}
                # unclosed_tags.append(stripped_line)

            print jsondict
            print unclosed_tags

    print jsondict
    print unclosed_tags
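On the narrower point of turning a breadcrumb list such as ["a","e","f"] into a lookup like dict["a"]["e"]["f"]: a plain loop is probably enough, with no recursion or index bookkeeping needed. The following is only a minimal Python 3 sketch. `get_nested` here is a rewritten helper and `set_nested` is a hypothetical companion; neither comes from the question's code.

```python
def get_nested(adict, keys):
    """Follow each key in `keys` in turn and return whatever is reached."""
    node = adict
    for key in keys:
        node = node[key]  # raises KeyError if a breadcrumb is missing
    return node


def set_nested(adict, keys, value):
    """Assign `value` at the position named by `keys` (hypothetical helper).

    All keys except the last must already exist in the nested dict.
    """
    parent = get_nested(adict, keys[:-1])
    parent[keys[-1]] = value


if __name__ == "__main__":
    tree = {"a": {"b": {"c": "d"}, "e": {"f": "g"}}}
    print(get_nested(tree, ["a", "e", "f"]))
    set_nested(tree, ["a", "e", "f"], "h")
    print(tree)
```

An empty breadcrumb list returns the top-level dict itself, which makes it convenient to call with `keys[:-1]` when inserting at the first level.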
Here is a recursive solution. First, transform the input in the following way.
input:
person:
    address:
        street1: 123 bar st
        street2:
        city: madison
        state: wi
        zip: 55555
    web:
        email: boo@baz.com
first-step output:
    [{'name':'person','value':'','level':0},
     {'name':'address','value':'','level':1},
     {'name':'street1','value':'123 bar st','level':2},
     {'name':'street2','value':'','level':2},
     {'name':'city','value':'madison','level':2},
     {'name':'state','value':'wi','level':2},
     {'name':'zip','value':55555,'level':2},
     {'name':'web','value':'','level':1},
     {'name':'email','value':'boo@baz.com','level':2}]
This is easy to accomplish with split(':') and by counting the number of leading tabs:
    def tab_level(astr):
        """Count the number of leading tabs in a string
        """
        return len(astr) - len(astr.lstrip('\t'))
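The first step itself is not spelled out above, so here is one way it might look, as a hedged Python 3 sketch. `parse_lines` is a hypothetical name, and it uses `str.partition(':')` rather than `split(':')` so that a value containing a colon stays in one piece. It assumes tab indentation and keeps every value as a string (so `zip` comes out as `'55555'` rather than the integer shown in the first-step output).

```python
def tab_level(astr):
    """Count the number of leading tabs in a string."""
    return len(astr) - len(astr.lstrip('\t'))


def parse_lines(text):
    """Turn tab-indented 'name: value' lines into a flat list of records.

    Hypothetical helper: each record is a dict with 'name', 'value'
    and 'level' keys, in the same order as the input lines.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # partition splits on the first ':' only; value is '' if none follows
        name, _, value = line.strip().partition(':')
        records.append({
            'name': name.strip(),
            'value': value.strip(),
            'level': tab_level(line),
        })
    return records


if __name__ == "__main__":
    sample = "person:\n\taddress:\n\t\tstreet1: 123 bar st\n\tweb:\n"
    for record in parse_lines(sample):
        print(record)
```

If numeric values are wanted, a conversion step (for example trying `int` and `float` on each value) could be added afterwards; that is left out here to keep the sketch small.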
Then feed the first-step output into the following function:
    def ttree_to_json(ttree,level=0):
        result = {}
        for i in range(0,len(ttree)):
            cn = ttree[i]
            try:
                nn = ttree[i+1]
            except IndexError:
                # last item: pretend the next level is shallower than any real one
                nn = {'level':-1}

            # edge cases
            if cn['level']>level:
                continue
            if cn['level']<level:
                return result

            # recursion
            if nn['level']==level:
                dict_insert_or_append(result,cn['name'],cn['value'])
            elif nn['level']>level:
                rr = ttree_to_json(ttree[i+1:], level=nn['level'])
                dict_insert_or_append(result,cn['name'],rr)
            else:
                dict_insert_or_append(result,cn['name'],cn['value'])
                return result
        return result
where:
    def dict_insert_or_append(adict,key,val):
        """Insert a value in dict at key if one does not exist
        Otherwise, convert value to list and append
        """
        if key in adict:
            if type(adict[key]) != list:
                adict[key] = [adict[key]]
            adict[key].append(val)
        else:
            adict[key] = val
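To see the two functions working together, here is a short end-to-end run on the first-step output from above, written for Python 3. The functions are repeated (lightly tidied) only so the snippet runs on its own; the variable names `first_step` and `tree` are illustrative.

```python
import json


def dict_insert_or_append(adict, key, val):
    """Insert val at key; if key already exists, collect the values in a list."""
    if key in adict:
        if type(adict[key]) != list:
            adict[key] = [adict[key]]
        adict[key].append(val)
    else:
        adict[key] = val


def ttree_to_json(ttree, level=0):
    """Recursively fold a flat list of name/value/level records into a dict."""
    result = {}
    for i in range(len(ttree)):
        cn = ttree[i]
        try:
            nn = ttree[i + 1]
        except IndexError:
            nn = {'level': -1}  # sentinel for "no next record"
        if cn['level'] > level:
            continue  # already consumed by a recursive call
        if cn['level'] < level:
            return result
        if nn['level'] == level:
            dict_insert_or_append(result, cn['name'], cn['value'])
        elif nn['level'] > level:
            rr = ttree_to_json(ttree[i + 1:], level=nn['level'])
            dict_insert_or_append(result, cn['name'], rr)
        else:
            dict_insert_or_append(result, cn['name'], cn['value'])
            return result
    return result


first_step = [
    {'name': 'person', 'value': '', 'level': 0},
    {'name': 'address', 'value': '', 'level': 1},
    {'name': 'street1', 'value': '123 bar st', 'level': 2},
    {'name': 'street2', 'value': '', 'level': 2},
    {'name': 'city', 'value': 'madison', 'level': 2},
    {'name': 'state', 'value': 'wi', 'level': 2},
    {'name': 'zip', 'value': 55555, 'level': 2},
    {'name': 'web', 'value': '', 'level': 1},
    {'name': 'email', 'value': 'boo@baz.com', 'level': 2},
]

tree = ttree_to_json(first_step)

if __name__ == "__main__":
    print(json.dumps(tree, indent=2))
```

Because the result contains only dicts, lists, strings and numbers, it can be passed straight to `json.dumps`. Repeated names at the same level are gathered into a list by `dict_insert_or_append`.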