As I have always believed, there is a way to represent each and every natural language sentence using mathematical notation. It has been a year since I started playing around with various AI algorithms, and after countless failed attempts, today I made some progress.
A sentence can be divided into three parts, as described by Subject-Verb-Object (SVO). So a good AI that can understand a user must be able to parse a sentence or a question into these three categories.
This is what I discovered:
- Identify the types of words that do not have an ambiguous meaning, e.g. in, for, but, what, why, the, an.
- Replace the remaining words or nouns in the sentence with 'α', 'β', 'γ', 'δ', 'ε', 'ζ', 'η', 'θ' and so on, in the order they appear in the sentence.
- Remove all determiners such as the, an, a.
- Replace some of the identified keywords with mathematical notation, e.g. of becomes ∈ (I am yet to identify notations for the rest of the keywords).
- Replace all verbs (action words) with Δ.
When you run this algorithm, all sample sentences show a similar pattern, and just by replacing the variables you can identify the subject, verb and object.
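The steps above could be sketched in Python roughly as follows. This is a minimal sketch, not the program that produced the outputs in this post: the word lists (KEYWORDS, DETERMINERS, VERBS), the NOTATION table and the function name symbolize are all assumptions made for illustration, and a real version would need a much larger vocabulary or a part-of-speech tagger to find the verbs.

```python
import re

# Assumed, deliberately tiny word lists; the real lists are not given in the post.
KEYWORDS = {"what", "which", "why", "in", "for", "but", "of", "as", "with", "without"}
DETERMINERS = {"the", "a", "an"}
VERBS = {"inhabit", "portrayed", "owned", "caused", "seized", "pursued", "won", "had", "has"}
NOTATION = {"of": "∈"}   # keyword -> mathematical notation
GREEK = "αβγδεζηθ"       # this sketch supports at most eight variables


def symbolize(sentence):
    """Turn a sentence into a symbolic form such as 'What α Δ β'."""
    # Tokenize into words (punctuation is dropped) and remove determiners.
    tokens = [t for t in re.findall(r"[\w-]+", sentence)
              if t.lower() not in DETERMINERS]
    symbol, processed, variables, chunk = [], [], {}, []

    def flush():
        # Close the current run of unclassified words and name it α, β, γ, ...
        if chunk:
            name = GREEK[len(variables)]
            variables[name] = " ".join(chunk)
            symbol.append(name)
            processed.append(name)
            chunk.clear()

    for tok in tokens:
        low = tok.lower()
        if low in VERBS:
            flush()
            symbol.append("Δ")        # verbs become Δ in the symbolic form
            processed.append(tok)     # but stay readable in the processed form
        elif low in KEYWORDS:
            flush()
            mapped = NOTATION.get(low, tok)
            symbol.append(mapped)
            processed.append(mapped)
        else:
            chunk.append(tok)         # part of the next variable
    flush()

    return {
        "symbol": " ".join(symbol),
        "sentence": sentence,
        "processed": " ".join(processed),
        "qtype": tokens[0].lower() if tokens else "",
        **variables,
    }


result = symbolize("What Polynesian people inhabit New Zealand ?")
print(result["symbol"])  # → What α Δ β
```

Consecutive unclassified words are grouped into a single variable, which is why "Polynesian people" ends up as one α rather than two separate symbols.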
Examples:
Question: What Polynesian people inhabit New Zealand ?
The above question, after running through the code, becomes:
{
  "symbol": "What α Δ β",
  "sentence": "What Polynesian people inhabit New Zealand ?",
  "processed": " What α inhabit β ",
  "qtype": "what",
  "α": "Polynesian people",
  "β": "New Zealand"
}
As you can see in the output above, our question What Polynesian people inhabit New Zealand can be represented symbolically as What α Δ β. An equation of type α Δ β implies α (subject), Δ (verb), β (object). So a smart bot needs to look up New Zealand and search for the words “Polynesian people” and “inhabit”.
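Reading the subject, verb and object back out of such an output could look something like the sketch below. The helper name to_svo is hypothetical, and it assumes that the "symbol" and "processed" strings line up token by token and that the symbols directly before and after Δ are variables, which holds for the α Δ β shape but not necessarily for longer equations.

```python
def to_svo(result):
    """Map an 'α Δ β' style result to subject, verb and object."""
    symbols = result["symbol"].split()
    words = result["processed"].split()
    i = symbols.index("Δ")                 # position of the verb placeholder
    return {
        "subject": result[symbols[i - 1]],  # variable just before Δ
        "verb": words[i],                   # the actual verb kept in "processed"
        "object": result[symbols[i + 1]],   # variable just after Δ
    }


sample = {
    "symbol": "What α Δ β",
    "sentence": "What Polynesian people inhabit New Zealand ?",
    "processed": " What α inhabit β ",
    "qtype": "what",
    "α": "Polynesian people",
    "β": "New Zealand",
}
svo = to_svo(sample)
print(svo["subject"], "|", svo["verb"], "|", svo["object"])
# → Polynesian people | inhabit | New Zealand
```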
A few more sample outputs:
[
  { "symbol": "What α Δ β", "sentence": "What actor first portrayed James Bond ?", "processed": " What α portrayed β ", "qtype": "what", "α": "actor first", "β": "James Bond ?" },
  { "symbol": "What α Δ β", "sentence": "What Soviet leader owned a Rolls-Royce ?", "processed": " What α owned β ", "qtype": "what", "α": "Soviet leader", "β": "Rolls-Royce ?" },
  { "symbol": "What α Δ β", "sentence": "What crop failure caused the Irish Famine ?", "processed": " What α caused β ", "qtype": "what", "α": "crop failure", "β": "Irish Famine ?" },
  { "symbol": "What α Δ β", "sentence": "What country 's people are the top television watchers ?", "processed": " What α are β ", "qtype": "what", "α": "country ' s people", "β": "top television watchers ?" },
  { "symbol": "Which α Δ β", "sentence": "Which NBA players had jersey number 0 ?", "processed": " Which α had β ", "qtype": "which", "α": "NBA players", "β": "jersey number 0 ?" },
  { "symbol": "Which α Δ β", "sentence": "Which country did Hitler rule ?", "processed": " Which α did β ", "qtype": "which", "α": "country", "β": "Hitler rule ?" },
  { "symbol": "Which α Δ β", "sentence": "Which language has the most words ?", "processed": " Which α has β ", "qtype": "which", "α": "language", "β": "most words ?" }
]
Some sample outputs are of the type α Δ β in γ:
{ "symbol": "Which α Δ β in γ", "sentence": "Which Ventura County police department seized the largest cocaine shipment in it 's history ?", "processed": " Which α seized β in γ ", "qtype": "which", "α": "Ventura County police department", "β": "largest cocaine shipment", "γ": "it ' s history ?" },
{ "symbol": "Which α Δ β in γ", "sentence": "Which cats pursued Tweety Pie in his first cartoon appearance ?", "processed": " Which α pursued β in γ ", "qtype": "which", "α": "cats", "β": "Tweety Pie", "γ": "his first cartoon appearance ?" },
{ "symbol": "Which α Δ β in γ", "sentence": "Which team won the Super Bowl in 1968 ?", "processed": " Which α won β in γ ", "qtype": "which", "α": "team", "β": "Super Bowl", "γ": "1968 ?" }
The formula still works but has an extra parameter γ, which gives more meaning to our object.
After running my program on more than 5000 questions, I have around 100 different equations. While most of them fall into the above 3 types, there are a few much longer ones, like α Δ β as γ with δ and α ∈ β in which γ is Δ without δ, which I am yet to equate to SVO.
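Counting how many distinct equations a set of questions produces could be done along these lines. This is only a sketch: the helper equation_of and the QUESTION_WORDS set are assumptions, and the input list is just the "symbol" values from the sample outputs in this post, not the full run of 5000 questions.

```python
from collections import Counter

# Assumed set of leading question words to strip before comparing equations.
QUESTION_WORDS = {"what", "which", "why"}


def equation_of(symbol):
    """Drop a leading question word so 'What α Δ β' and 'Which α Δ β' match."""
    parts = symbol.split()
    if parts and parts[0].lower() in QUESTION_WORDS:
        parts = parts[1:]
    return " ".join(parts)


# "symbol" values of the sample outputs shown in this post.
SYMBOLS = (
    ["What α Δ β"] * 5
    + ["Which α Δ β"] * 3
    + ["Which α Δ β in γ"] * 3
)

counts = Counter(equation_of(s) for s in SYMBOLS)
for equation, n in counts.most_common():
    print(n, equation)
# → 8 α Δ β
# → 3 α Δ β in γ
```

Sorting by frequency puts the common short equations first and leaves the rare long ones, which still need to be mapped to SVO, at the end of the list.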