As I have always believed, there is a way to represent every natural language sentence using mathematical notation. It has been a year since I started playing around with various AI algorithms, and after countless failed attempts, today I made some progress.
A sentence can be divided into three parts, as described by the Subject-Verb-Object (SVO) model. So a good AI that understands a user must be able to parse a sentence or a question into these three categories.
This is what I discovered:
- Identify the types of words that do not have any ambiguous meaning, e.g. in, for, but, what, why, the, an, etc.
- Replace the remaining words or nouns in the sentence with 'α', 'β', 'γ', 'δ', 'ε', 'ζ', 'η', 'θ' and so on, in the order they appear in the sentence.
- Remove all determiners like The, An, A.
- Replace some of the identified keywords with mathematical notations, e.g. of becomes ∈ (I am yet to identify notations for the rest of the keywords).
- Replace all verbs (action words) with Δ.
- When you run this algorithm, all sample sentences show a similar pattern, and just by replacing the variables you can identify the object, subject and verb.
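The steps above can be sketched roughly as follows. This is only a minimal illustration, not my actual program: the names `symbolize`, `KEYWORDS`, `DETERMINERS` and `NOTATION` are made up for this example, the keyword list is a small hand-picked sample, and the verbs are simply passed in by the caller because the steps above do not say how verbs are detected. It also drops the trailing question mark and supports at most eight variables per sentence.

```python
GREEK = "αβγδεζηθ"  # variable names, assigned in order of appearance

# Hypothetical sample of unambiguous keywords; a real list would be longer.
KEYWORDS = {"in", "for", "but", "what", "why", "which", "as", "with", "of"}
DETERMINERS = {"the", "a", "an"}
NOTATION = {"of": "∈"}  # the only keyword-to-notation mapping identified so far


def symbolize(sentence, verbs):
    """Turn a sentence into its symbolic form.

    `verbs` is a set of lowercase verb forms supplied by the caller
    (an assumption of this sketch, standing in for real verb detection).
    """
    tokens = [t for t in sentence.split() if t != "?"]
    symbol, processed, variables = [], [], {}
    phrase = []  # words collected for the variable currently being built

    def flush():
        # Close the current run of words and bind it to the next Greek letter.
        if phrase:
            name = GREEK[len(variables)]
            variables[name] = " ".join(phrase)
            symbol.append(name)
            processed.append(name)
            phrase.clear()

    for tok in tokens:
        low = tok.lower()
        if low in DETERMINERS:
            continue  # determiners are removed
        if low in verbs:
            flush()
            symbol.append("Δ")     # verbs become Δ in the symbol
            processed.append(tok)  # but stay readable in "processed"
        elif low in KEYWORDS:
            flush()
            mapped = NOTATION.get(low, tok)
            symbol.append(mapped)
            processed.append(mapped)
        else:
            phrase.append(tok)
    flush()

    result = {
        "symbol": " ".join(symbol),
        "sentence": sentence,
        "processed": " ".join(processed),
        "qtype": tokens[0].lower() if tokens else "",
    }
    result.update(variables)
    return result


parsed = symbolize("What Polynesian people inhabit New Zealand ?", {"inhabit"})
print(parsed["symbol"])  # What α Δ β
```

Passing the verbs in explicitly keeps the sketch self-contained; in practice that set would come from a part-of-speech tagger or a verb dictionary.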
Examples:
Question: What Polynesian people inhabit New Zealand ?
The above question after running through the code becomes
{
"symbol": "What α Δ β",
"sentence": "What Polynesian people inhabit New Zealand ?",
"processed": " What α inhabit β ",
"qtype": "what",
"α": "Polynesian people",
"β": "New Zealand"
}
As you can see in the output above, our question What Polynesian people inhabit New Zealand can be represented symbolically as What α Δ β.
Equations of type α Δ β imply α (subject), Δ (verb), β (object). So a smart bot needs to look up New Zealand and search for the words “Polynesian people” and “inhabit”.
A few more sample outputs:
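Reading subject, verb and object back out of such a record could look something like this. It is a sketch under assumptions: `to_svo` is a hypothetical helper name, and it relies on the "symbol" and "processed" fields having the same number of space-separated tokens, which holds for the output shown above.

```python
def to_svo(parsed):
    """Map a record whose symbol contains 'α Δ β' to subject, verb and object.

    Returns None when the symbol has no Δ with a variable on each side.
    """
    parts = parsed["symbol"].split()
    if "Δ" not in parts:
        return None
    i = parts.index("Δ")
    if i == 0 or i + 1 >= len(parts):
        return None
    subject_var, object_var = parts[i - 1], parts[i + 1]
    if subject_var not in parsed or object_var not in parsed:
        return None
    # "processed" keeps the original verb at the position where "symbol" has Δ.
    verb = parsed["processed"].split()[i]
    return {
        "subject": parsed[subject_var],
        "verb": verb,
        "object": parsed[object_var],
    }


record = {
    "symbol": "What α Δ β",
    "sentence": "What Polynesian people inhabit New Zealand ?",
    "processed": " What α inhabit β ",
    "qtype": "what",
    "α": "Polynesian people",
    "β": "New Zealand",
}
print(to_svo(record))
# {'subject': 'Polynesian people', 'verb': 'inhabit', 'object': 'New Zealand'}
```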
[
{
"symbol": "What α Δ β",
"sentence": "What actor first portrayed James Bond ?",
"processed": " What α portrayed β ",
"qtype": "what",
"α": "actor first",
"β": "James Bond ?"
},
{
"symbol": "What α Δ β",
"sentence": "What Soviet leader owned a Rolls-Royce ?",
"processed": " What α owned β ",
"qtype": "what",
"α": "Soviet leader",
"β": "Rolls-Royce ?"
},
{
"symbol": "What α Δ β",
"sentence": "What crop failure caused the Irish Famine ?",
"processed": " What α caused β ",
"qtype": "what",
"α": "crop failure",
"β": "Irish Famine ?"
},
{
"symbol": "What α Δ β",
"sentence": "What country 's people are the top television watchers ?",
"processed": " What α are β ",
"qtype": "what",
"α": "country ' s people",
"β": "top television watchers ?"
},
{
"symbol": "Which α Δ β",
"sentence": "Which NBA players had jersey number 0 ?",
"processed": " Which α had β ",
"qtype": "which",
"α": "NBA players",
"β": "jersey number 0 ?"
},
{
"symbol": "Which α Δ β",
"sentence": "Which country did Hitler rule ?",
"processed": " Which α did β ",
"qtype": "which",
"α": "country",
"β": "Hitler rule ?"
},
{
"symbol": "Which α Δ β",
"sentence": "Which language has the most words ?",
"processed": " Which α has β ",
"qtype": "which",
"α": "language",
"β": "most words ?"
}
]
Some sample outputs are of the type α Δ β in γ:
{
"symbol": "Which α Δ β in γ",
"sentence": "Which Ventura County police department seized the largest cocaine shipment in it 's history ?",
"processed": " Which α seized β in γ ",
"qtype": "which",
"α": "Ventura County police department",
"β": "largest cocaine shipment",
"γ": "it ' s history ?"
},
{
"symbol": "Which α Δ β in γ",
"sentence": "Which cats pursued Tweety Pie in his first cartoon appearance ?",
"processed": " Which α pursued β in γ ",
"qtype": "which",
"α": "cats",
"β": "Tweety Pie",
"γ": "his first cartoon appearance ?"
},
{
"symbol": "Which α Δ β in γ",
"sentence": "Which team won the Super Bowl in 1968 ?",
"processed": " Which α won β in γ ",
"qtype": "which",
"α": "team",
"β": "Super Bowl",
"γ": "1968 ?"
}
The formula still works but has an extra parameter γ which gives more meaning to our object.
After running my program on more than 5000 questions, I have around 100 different equations. Most of them fall into the above 3 types, but a few are much longer, like α Δ β as γ with δ and α ∈ β in which γ is Δ without δ, which I am yet to equate to SVO.
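Tallying the distinct equations over a batch of results can be done along these lines. Again this is just a sketch: `equation_counts` is a hypothetical helper, and it assumes each record has the "symbol" and "qtype" fields shown in the outputs above. The leading question word is stripped so that What α Δ β and Which α Δ β count as the same equation.

```python
from collections import Counter


def equation_counts(results):
    """Count how often each equation shape occurs, ignoring the question word."""
    counts = Counter()
    for record in results:
        parts = record["symbol"].split()
        # Drop the leading question word ("What", "Which", ...) if present.
        if parts and parts[0].lower() == record.get("qtype"):
            parts = parts[1:]
        counts[" ".join(parts)] += 1
    return counts


sample = [
    {"symbol": "What α Δ β", "qtype": "what"},
    {"symbol": "Which α Δ β", "qtype": "which"},
    {"symbol": "Which α Δ β in γ", "qtype": "which"},
]
for equation, n in equation_counts(sample).most_common():
    print(n, equation)
```

Sorting by frequency this way makes it easy to see which of the longer equations are common enough to be worth mapping to SVO first.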