parsing - Creating a tree/deeply nested dict from an indented text file in Python


Basically, I want to iterate through a file and put the contents of each line into a deeply nested dict, the structure of which is defined by the amount of whitespace at the start of each line.

Essentially, the aim is to take something like this:

    a
        b
            c
        d
            e

and turn it into something like this:

{"a":{"b":"c","d":"e"}} 

or this:

    apple
        colours
            red
            yellow
            green
        type
            granny smith
        price
            0.10

into this:

{"apple":{"colours":["red","yellow","green"],"type":"granny smith","price":0.10}}

so that I can send it to Python's json module and make some JSON.

At the moment I'm trying to build a dict and a list in steps such as:

  1. {"a":""} ["a"]
  2. {"a":"b"} ["a"]
  3. {"a":{"b":"c"}} ["a","b"]
  4. {"a":{"b":{"c":"d"}}} ["a","b","c"]
  5. {"a":{"b":{"c":"d"},"e":""}} ["a","e"]
  6. {"a":{"b":{"c":"d"},"e":"f"}} ["a","e"]
  7. {"a":{"b":{"c":"d"},"e":{"f":"g"}}} ["a","e","f"]

etc.

The list acts like 'breadcrumbs' showing where I last put something into the dict.

To do this I need a way to iterate through the list and generate something like dict["a"]["e"]["f"] to get at the last dict. I've had a look at the AutoVivification class that someone has made, which looks very useful, but I'm unsure of:

  1. whether I'm using the right data structure for this (I'm planning to send it to the json library to create a JSON object)
  2. how to use AutoVivification in this instance
  3. whether there's a better way in general to approach this problem.

I came up with the following function, but it doesn't work:

    def get_nested(dict,array,i):
        if i != None:
            i += 1
            if array[i] in dict:
                return get_nested(dict[array[i]],array)
            else:
                return dict
        else:
            i = 0
            return get_nested(dict[array[i]],array)

Would appreciate help!

(The rest of my extremely incomplete code is here:)
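For comparison, here is a minimal sketch of how a breadcrumb list could be followed into a nested dict. The two-argument `get_nested` and the `set_nested` helper are illustrative names, not the asker's code, and the sketch assumes every breadcrumb except the last should end up pointing at a dict.

```python
from functools import reduce

def get_nested(tree, path):
    """Follow each key in path down through nested dicts and return what is found."""
    return reduce(lambda node, key: node[key], path, tree)

def set_nested(tree, path, value):
    """Assign value at the location named by path, creating dicts as needed."""
    node = tree
    for key in path[:-1]:
        # Assumption: a missing entry or a placeholder string is replaced by a dict.
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    node[path[-1]] = value

# Step 5 of the list above becomes step 7 after one assignment.
tree = {"a": {"b": {"c": "d"}, "e": ""}}
set_nested(tree, ["a", "e", "f"], "g")
print(get_nested(tree, ["a", "e"]))
```

Because the walk is a plain loop over the breadcrumbs, no index argument has to be threaded through recursive calls.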

    #Import relevant libraries
    import codecs
    import sys

    #Functions
    def stripped(str):
        if tab_spaced:
            return str.lstrip('\t').rstrip('\n\r')
        else:
            return str.lstrip().rstrip('\n\r')

    def current_ws():
        if whitespacing == 0 or not tab_spaced:
            return len(line) - len(line.lstrip())
        if tab_spaced:
            return len(line) - len(line.lstrip('\t\n\r'))

    def get_nested(adict,anarray,i):
        if i != None:
            i += 1
            if anarray[i] in adict:
                return get_nested(adict[anarray[i]],anarray)
            else:
                return adict
        else:
            i = 0
            return get_nested(adict[anarray[i]],anarray)

    #initialise variables
    jsondict = {}
    unclosed_tags = []
    debug = []

    vividfilename = 'simple.vivid'
    # vividfilename = sys.argv[1]
    if len(sys.argv)>2:
        jsfilename = sys.argv[2]
    else:
        jsfilename = vividfilename.split('.')[0] + '.json'

    whitespacing = 0
    whitespace_array = [0,0]
    tab_spaced = False

    #open the file
    with codecs.open(vividfilename,'rU', "utf-8-sig") as vividfile:
        for line in vividfile:
            #work out how many whitespaces at start
            whitespace_array.append(current_ws())

            #for the first line with whitespace, work out the whitespacing (eg tab vs 4-space)
            if whitespacing == 0 and whitespace_array[-1] > 0:
                whitespacing = whitespace_array[-1]
                if line[0] == '\t':
                    tab_spaced = True

            #strip out whitespace at start and end
            stripped_line = stripped(line)

            if whitespace_array[-1] == 0:
                jsondict[stripped_line] = ""
                unclosed_tags.append(stripped_line)

            if whitespace_array[-2] < whitespace_array[-1]:
                oldnested = get_nested(jsondict,whitespace_array,None)
                print oldnested
                # jsondict.pop(unclosed_tags[-1])
                # jsondict[unclosed_tags[-1]]={stripped_line:""}
                # unclosed_tags.append(stripped_line)

            print jsondict
            print unclosed_tags

    print jsondict
    print unclosed_tags

Here is a recursive solution. First, transform the input in the following way.

Input:

    person:
        address:
            street1: 123 bar st
            street2: 
            city: madison
            state: wi
            zip: 55555
        web:
            email: boo@baz.com

First-step output:

    [{'name':'person','value':'','level':0},
     {'name':'address','value':'','level':1},
     {'name':'street1','value':'123 bar st','level':2},
     {'name':'street2','value':'','level':2},
     {'name':'city','value':'madison','level':2},
     {'name':'state','value':'wi','level':2},
     {'name':'zip','value':55555,'level':2},
     {'name':'web','value':'','level':1},
     {'name':'email','value':'boo@baz.com','level':2}]

This is easy to accomplish with split(':') and by counting the number of leading tabs:

    def tab_level(astr):
        """Count number of leading tabs in a string
        """
        return len(astr)- len(astr.lstrip('\t'))

Then feed the first-step output into the following function:
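The first step itself is not shown, so here is one way it might be sketched. `parse_lines` is a hypothetical helper; it repeats `tab_level` so the block runs on its own, uses `partition(':')` instead of `split(':')` so that values containing a colon survive, and assumes that all-digit values should become ints (as `zip` does in the output above).

```python
def tab_level(astr):
    """Count the number of leading tabs in a string."""
    return len(astr) - len(astr.lstrip('\t'))

def parse_lines(text):
    """Turn tab-indented 'name: value' lines into a flat list of dicts."""
    ttree = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no information
        name, _, value = line.strip().partition(':')
        value = value.strip()
        if value.isdigit():
            value = int(value)  # assumption: all-digit values become ints
        ttree.append({'name': name.strip(), 'value': value,
                      'level': tab_level(line)})
    return ttree

sample = "person:\n\taddress:\n\t\tstreet1: 123 bar st\n\t\tzip: 55555\n"
for entry in parse_lines(sample):
    print(entry)
```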

    def ttree_to_json(ttree,level=0):
        result = {}
        for i in range(0,len(ttree)):
            cn = ttree[i]
            try:
                nn  = ttree[i+1]
            except IndexError:
                # past the last entry: pretend the next level is shallower
                nn = {'level':-1}

            # Edge cases
            if cn['level']>level:
                continue
            if cn['level']<level:
                return result

            # Recursion
            if nn['level']==level:
                dict_insert_or_append(result,cn['name'],cn['value'])
            elif nn['level']>level:
                rr = ttree_to_json(ttree[i+1:], level=nn['level'])
                dict_insert_or_append(result,cn['name'],rr)
            else:
                dict_insert_or_append(result,cn['name'],cn['value'])
                return result
        return result

where:

    def dict_insert_or_append(adict,key,val):
        """Insert a value in dict at key if one does not exist.
        Otherwise, convert the value to a list and append.
        """
        if key in adict:
            if type(adict[key]) != list:
                adict[key] = [adict[key]]
            adict[key].append(val)
        else:
            adict[key] = val
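The recursive solution expects `name: value` lines, while the question's files have no colons: a line's children are simply the lines indented below it. As an alternative for that format (not the method used above), here is a stack-based sketch. `parse_indented` and `collapse` are hypothetical names, and the sketch assumes consistent indentation, that a single leaf child stands for a plain value, and that several leaf children stand for a list.

```python
import json

def parse_indented(text):
    """Build a list of [label, children] nodes from indented lines using a stack."""
    root = []
    stack = [(-1, root)]  # (indent of the owning line, its list of children)
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        # Pop back to the nearest line that is indented less than this one.
        while stack[-1][0] >= indent:
            stack.pop()
        node = [line.strip(), []]
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root

def collapse(nodes):
    """Convert sibling nodes to a dict, a list of leaves, or a single leaf value."""
    if all(not children for _, children in nodes):
        labels = [label for label, _ in nodes]
        return labels[0] if len(labels) == 1 else labels
    # Mixed siblings: childless labels get an empty-string placeholder.
    return {label: collapse(children) if children else ""
            for label, children in nodes}

sample = """\
apple
    colours
        red
        yellow
        green
    type
        granny smith
    price
        0.10
"""
print(json.dumps(collapse(parse_indented(sample))))
```

The stack plays the role of the breadcrumb list from the question: its top entry is always the children list of the most recent line that is less indented than the current one. In this sketch every leaf stays a string, so `0.10` would still need converting if a JSON number is wanted.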
