Let's pause for a moment to demystify what Artificial Intelligence (AI) and Machine Learning (ML) are all about. Remember the basics of data analysis from high school or perhaps your introductory courses? You had data points plotted on a two-dimensional graph, with X and Y axes. Fast forward to today's AI and large language models (LLMs), and we're dealing with spaces of many more dimensions. For example, advanced language models like GPT have hundreds of billions of learned parameters, so training one means searching a space with that many dimensions.
Fortunately, machine learning excels at approximating functions in these complex, multi-dimensional spaces, allowing us to model and predict phenomena that are far too intricate for traditional statistical methods. According to the journal Nature, ML algorithms have helped identify patterns in healthcare, such as predicting patient readmissions or even drug interactions. These tasks are practically impossible for human cognition alone because of the high dimensionality and interdependencies of the data.
Readmissions occur when patients are admitted back to a hospital shortly after being discharged, often for the same or related medical issues. In the healthcare industry, reducing readmissions is crucial for both improving patient care and managing costs. Machine learning algorithms can analyze vast sets of patient data, including medical history, treatment plans, and other variables, to predict the likelihood of readmission, which helps healthcare providers take preventive measures.
However, the power of AI and ML is inherently tied to the quality of the data you provide. A study from IBM indicates that bad data could cost businesses up to $3.1 trillion annually in the U.S. alone. This figure covers various forms of "bad data," including incorrect information, missing data, and inconsistently formatted data, among other issues. These kinds of data problems can lead to poor decision-making, operational inefficiencies, and increased risk, ultimately resulting in substantial financial losses for businesses. I often refer to this as dirty data: it's been polluted in some manner or another.
Whether you're fitting a simple linear regression or deploying a complex neural network, dirty data will yield inaccurate, poor-quality results.
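To make that concrete, here is a minimal sketch of how a single dirty value can distort even a simple linear regression. The data is made up for illustration, and the helper name fit_line is my own; it is just a plain least-squares fit, not any particular library's API.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up clean data that follows y = 2x + 1 exactly.
xs = list(range(10))
clean_ys = [2 * x + 1 for x in xs]

# Same data with one dirty entry: the last value was recorded
# 100 times too large (an assumed unit mix-up).
dirty_ys = list(clean_ys)
dirty_ys[9] = clean_ys[9] * 100

clean_slope, clean_intercept = fit_line(xs, clean_ys)
dirty_slope, dirty_intercept = fit_line(xs, dirty_ys)

print(f"clean fit: slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
print(f"dirty fit: slope={dirty_slope:.2f}, intercept={dirty_intercept:.2f}")
```

One bad record out of ten is enough to pull the fitted line far away from the true relationship, which is why cleaning usually comes before modeling.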
To sum up, AI and ML are potent tools for extracting nuanced insights from high-dimensional data, a capability increasingly being adopted across various sectors from finance to healthcare. But the key takeaway is this: it all circles back to the quality of your data. Whether you're graphing a line on paper or traversing a multi-dimensional data landscape with ML, your outcomes are only as good as the data you input.
High-dimensionality refers to data that exists in more than three dimensions. In a practical sense, this often means dealing with many different variables or features simultaneously. For example, if we're looking at customer behavior data, we could be considering age, income, spending habits, geographical location, and so on—all different dimensions in which the data exists.
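As a rough sketch of what "each variable is a dimension" means in practice, the snippet below turns two customers into points in a five-dimensional space. The customers, the feature names, and the to_vector helper are all hypothetical, chosen only to mirror the example above.

```python
import math

# Hypothetical feature list: each name is one dimension of the data.
FEATURES = ["age", "annual_income", "monthly_spend", "latitude", "longitude"]

# Made-up customer records for illustration only.
customer_a = {"age": 34, "annual_income": 72000, "monthly_spend": 1850,
              "latitude": 40.7, "longitude": -74.0}
customer_b = {"age": 51, "annual_income": 58000, "monthly_spend": 920,
              "latitude": 34.1, "longitude": -118.2}


def to_vector(customer):
    """Return the customer's features as a list of floats, in FEATURES order."""
    return [float(customer[name]) for name in FEATURES]


vec_a = to_vector(customer_a)
vec_b = to_vector(customer_b)

dimensions = len(vec_a)
# Straight-line (Euclidean) distance between the two points.
# On raw values, the feature with the largest numbers (income here)
# dominates, so real pipelines usually rescale features first.
distance = math.dist(vec_a, vec_b)

print(f"Each customer is a point in {dimensions}-dimensional space")
print(f"Distance between the two customers: {distance:.1f}")
```

Add another attribute to FEATURES and every customer gains another dimension; nothing else about the idea changes.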
Some implications of high-dimensional data are complexity, the curse of dimensionality, and computational cost.
Complexity: High-dimensional data is complex to analyze. It's much harder to visualize, and traditional statistical methods often struggle with it.
Curse of Dimensionality: As dimensions increase, the volume of the space increases exponentially, making the data sparse. This can lead to overfitting in machine learning models.
Computational Cost: Handling high-dimensional data can be computationally expensive and slow, requiring specialized hardware and software.
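The sparsity behind the curse of dimensionality can be sketched with simple counting. Assume, purely for illustration, that each axis is split into 10 bins and that we have 1,000 data points; the function name max_occupied_fraction is my own. The number of grid cells grows as 10 to the power of the number of dimensions, so the same 1,000 points can touch only a vanishing share of the space.

```python
BINS_PER_AXIS = 10   # assumed grid resolution, for illustration
N_POINTS = 1000      # assumed dataset size, for illustration


def max_occupied_fraction(n_points, dims, bins_per_axis=BINS_PER_AXIS):
    """Upper bound on the share of grid cells that n_points can occupy."""
    total_cells = bins_per_axis ** dims
    return min(1.0, n_points / total_cells)


for dims in (1, 2, 3, 6, 10):
    cells = BINS_PER_AXIS ** dims
    frac = max_occupied_fraction(N_POINTS, dims)
    print(f"{dims:>2} dimensions: {cells:>14,} cells, "
          f"at most {frac:.6%} of them occupied")
```

With so much of the space empty, a model has little evidence about most regions, which is one intuition for why overfitting becomes more likely as dimensions are added.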
Here are a few analogies to help conceptualize high-dimensionality:
Library Analogy: Imagine each book in a library represents a point in a two-dimensional space—the title and the author. Now, consider adding more dimensions like the year of publication, genre, number of pages, and reader reviews. The "space" in which this library exists becomes increasingly complex, more challenging to catalog, but also richer in information.
Social Media Profile: Think of your social media profile as existing in a high-dimensional space. Each aspect—your posts, your friends, the pages you like, your location, etc.—is a different dimension. The more dimensions, the more accurately the profile can represent you, but it also becomes more complex to analyze for patterns.
Travel Planning: Imagine planning a trip where you consider just two factors: cost and weather. That's a two-dimensional problem. But add in more—flight availability, hotel ratings, sightseeing options, food—and you're suddenly dealing with a high-dimensional problem that is more accurately solved using sophisticated algorithms.
Understanding high-dimensionality can be crucial for various fields like finance, healthcare, marketing, and more, as it allows for much more nuanced analysis and prediction. With machine learning and AI, we have tools capable of navigating these high-dimensional spaces more effectively than traditional methods.
The above concept grew out of a conversation I had with ChatGPT 4 about a recent NASA YouTube video of the UAP independent study report. In that video, David Nathaniel Spergel, the chair of NASA's UAP independent study team, touched on this high-dimensionality theme while explaining AI and ML and their ability to work in high-dimensional spaces. https://www.youtube.com/watch?v=TQcqOW39ksk