For a given chemical formula represented by a string, count the number of atoms of each element contained in the molecule and return the counts as an object (a dict in Python).
```python
water = 'H2O'
parse_molecule(water)
# returns {'H': 2, 'O': 1}

magnesium_hydroxide = 'Mg(OH)2'
parse_molecule(magnesium_hydroxide)
# returns {'Mg': 1, 'O': 2, 'H': 2}

fremy_salt = 'K4[ON(SO3)2]2'
parse_molecule(fremy_salt)
# returns {'K': 4, 'O': 14, 'N': 2, 'S': 4}
```
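The gist of this kata is to turn a molecular formula into its atoms, represented as a dictionary. On Codewars it is rated 3 kyu. The hard parts are analyzing all the cases, avoiding out-of-range indexing, and respecting the various rules of formula syntax.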
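My approach, roughly, is to first convert square brackets and curly braces into parentheses, then expand the parentheses from the innermost level outward until none remain. An expression without parentheses is easy to handle. This raises the question of how to find an innermost pair. My take: find the first ')', then scan backward for the '(' that matches it, and replace '(...)2' with its expanded result (here 2 stands for whatever number follows the closing parenthesis). That number may be 1, in which case it is omitted, so the 1 has to be added back during the conversion. In the final pass we likewise have to remember that a subscript of 1 is omitted and count it when tallying.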
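To make the innermost-parentheses step concrete, here is a minimal sketch in Python. The helper names `normalize_brackets` and `find_innermost` are hypothetical, and the sketch assumes balanced brackets. It reads the whole run of digits after ')', so a multiplier such as 10 also works, and an omitted multiplier defaults to 1.

```python
def normalize_brackets(formula: str) -> str:
    """Map [] and {} to () so only one kind of bracket needs handling."""
    return formula.translate(str.maketrans('[{]}', '(())'))


def find_innermost(formula: str):
    """Locate an innermost group: the first ')' and the nearest '(' before it.

    Returns (open_idx, close_idx, multiplier, end_idx), where end_idx is one
    past the multiplier digits. An omitted multiplier counts as 1.
    Assumes the formula contains at least one balanced pair.
    """
    close_idx = formula.index(')')
    open_idx = formula.rindex('(', 0, close_idx)
    end_idx = close_idx + 1
    # Consume every digit after ')', not just one character.
    while end_idx < len(formula) and formula[end_idx].isdigit():
        end_idx += 1
    digits = formula[close_idx + 1:end_idx]
    multiplier = int(digits) if digits else 1
    return open_idx, close_idx, multiplier, end_idx


print(normalize_brackets('K4[ON(SO3)2]2'))  # K4(ON(SO3)2)2
print(find_innermost('K4(ON(SO3)2)2'))      # (5, 9, 2, 11)
```

Once the indices are known, the group can be spliced out with `formula[:open_idx] + expanded + formula[end_idx:]`. Slicing by position only touches that one group, whereas `str.replace` rewrites every occurrence of the same text.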
The code is as follows:
```python
def parse_molecule(formula):
    formula_dict = {}
    # Replace [] and {} with ()
    for bracket in '[{':
        formula = formula.replace(bracket, '(')
    for bracket in ']}':
        formula = formula.replace(bracket, ')')

    while '(' in formula:
        # Find an innermost pair: the first ')' and the nearest '(' before it
        for i in range(len(formula)):
            if formula[i] == ')':
                break
        for j in range(i - 1, -1, -1):
            if formula[j] == '(':
                break
        # If the multiplier 1 was omitted, add it back
        if i + 1 == len(formula) or not formula[i + 1].isdigit():
            sub_formula = formula[j: i + 1]
            # A temporary variable keeps the replace() below working.
            # If we wrote sub_formula = formula[j: i+1] + '1' directly,
            # sub_formula would no longer be a substring of formula,
            # nothing would be replaced, and this loop would run forever.
            tmp = sub_formula + '1'
        else:
            # Only one digit after ')' is read as the multiplier
            sub_formula = formula[j: i + 2]
            tmp = sub_formula
        parsed_sub_formula = parse_paren(tmp)
        formula = formula.replace(sub_formula, parsed_sub_formula)

    # Handle the formula, which no longer contains ()
    i = 0
    while i < len(formula):
        j = i + 1
        if j < len(formula) and formula[j].islower():
            j += 1
        tmp = formula[i: j]
        # Handle the boundaries carefully so j does not go out of range.
        # Small bug here: I assume subscripts have at most two digits. With
        # three digits, the third digit is treated as an element with
        # subscript 1. Surprisingly, it still passed.
        if j < len(formula) and formula[j].isdigit():
            k = j + 1
            if k < len(formula) and formula[k].isdigit():
                formula_dict[tmp] = formula_dict.get(tmp, 0) + int(formula[j: k + 1])
                i = k + 1
            else:
                formula_dict[tmp] = formula_dict.get(tmp, 0) + int(formula[j])
                i = j + 1
        elif j < len(formula) and formula[j].isupper():
            formula_dict[tmp] = formula_dict.get(tmp, 0) + 1
            i = j
        elif j == len(formula):
            formula_dict[tmp] = formula_dict.get(tmp, 0) + 1
            break

    return formula_dict


def parse_paren(sub_formula):
    # sub_formula looks like '(...)n', where n is a single digit
    result = {}
    times = int(sub_formula[-1])
    i = 1
    while i < len(sub_formula) - 2:
        j = i + 1
        if sub_formula[j].islower():
            j += 1
        tmp = sub_formula[i: j]
        if sub_formula[j].isdigit():
            k = j + 1
            # This also assumes subscripts have at most two digits
            if k < len(sub_formula) - 2 and sub_formula[k].isdigit():
                result[tmp] = result.get(tmp, 0) + int(sub_formula[j: k + 1]) * times
                i = k + 1
            else:
                result[tmp] = result.get(tmp, 0) + int(sub_formula[j]) * times
                i = j + 1
        elif sub_formula[j].isupper() or sub_formula[j] == ')':
            result[tmp] = result.get(tmp, 0) + times
            i = j

    # Serialize the counts back into a bracket-free string such as 'S2O6'
    return ''.join(key + str(val) for key, val in result.items())


# While testing I deliberately added some messy formulas that still follow the rules
print(parse_molecule('K4[ON(SO3)2]2'))
print(parse_molecule('(H2O)H10'))
print(parse_molecule('(OH123)2'))
```
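The comments above admit that subscripts are assumed to have at most two digits. One way around that is to consume the entire run of digits after each element symbol instead of peeking at a fixed number of characters. This is a sketch that assumes the input is already bracket-free; the name `parse_flat` is hypothetical.

```python
def parse_flat(formula: str) -> dict:
    """Count atoms in a bracket-free formula such as 'H2O' or 'K4O14N2S4'.

    An element is one uppercase letter followed by any lowercase letters.
    Its count is the full run of digits that follows (any length),
    defaulting to 1 when no digits are present.
    """
    counts = {}
    i, n = 0, len(formula)
    while i < n:
        if not formula[i].isupper():
            raise ValueError(f'unexpected character {formula[i]!r} at index {i}')
        # Element symbol: uppercase letter plus trailing lowercase letters
        j = i + 1
        while j < n and formula[j].islower():
            j += 1
        element = formula[i:j]
        # Subscript: every digit that follows, however many there are
        k = j
        while k < n and formula[k].isdigit():
            k += 1
        count = int(formula[j:k]) if k > j else 1
        counts[element] = counts.get(element, 0) + count
        i = k
    return counts


print(parse_flat('OH123'))  # {'O': 1, 'H': 123}
```

With this reader, 'OH123' yields a count of 123 for H instead of splitting off a stray '3' entry.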
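It passed anyway, but I'll fix the bugs in the code when I have time (who knows when; this one tortured me enough. Next time, next time... my skills are just too weak).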
That said, regular expressions look like a better fit, so stay tuned...
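For the regular-expression idea, here is one possible shape. It is a sketch, not the author's later solution. It tokenizes with a single `re` pattern and keeps a stack of `collections.Counter` objects, one per open bracket. The token pattern and the `MATCHING` table are assumptions rather than anything the kata specifies. The function reuses the kata's name `parse_molecule` and raises `ValueError` on stray characters or mismatched brackets.

```python
import re
from collections import Counter

# One match per token: an element with an optional count, an opening bracket,
# or a closing bracket with an optional multiplier.
TOKEN = re.compile(r'([A-Z][a-z]*)(\d*)|([(\[{])|([)\]}])(\d*)')
MATCHING = {')': '(', ']': '[', '}': '{'}


def parse_molecule(formula: str) -> dict:
    # Each stack entry pairs the opening bracket (None for the top level)
    # with the atom counts collected inside that bracket so far.
    stack = [(None, Counter())]
    pos = 0
    for m in TOKEN.finditer(formula):
        if m.start() != pos:  # finditer skipped characters no token accepts
            raise ValueError(f'unexpected character at index {pos}')
        pos = m.end()
        element, count, opening, closing, multiplier = m.groups()
        if element:
            stack[-1][1][element] += int(count or 1)
        elif opening:
            stack.append((opening, Counter()))
        else:
            # A closing bracket at the top level has nothing to pop
            bracket, group = stack.pop() if len(stack) > 1 else (None, None)
            if bracket != MATCHING[closing]:
                raise ValueError(f'unmatched {closing!r} at index {m.start()}')
            # Multiply the group's counts and merge them into the enclosing level
            factor = int(multiplier or 1)
            for atom, n in group.items():
                stack[-1][1][atom] += n * factor
    if pos != len(formula) or len(stack) != 1:
        raise ValueError('unexpected trailing text or unclosed bracket')
    return dict(stack[0][1])


print(parse_molecule('K4[ON(SO3)2]2'))  # {'K': 4, 'O': 14, 'N': 2, 'S': 4}
```

A stack removes the need to rewrite the string repeatedly. Each closing bracket multiplies the counts collected since its opening bracket and merges them into the level above, so multi-digit subscripts and multipliers need no special casing.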
Original post: http://www.cnblogs.com/FARAMIR/p/4634470.html