Founder at UpLook.AI, Guadalajara, Mexico
INNOVATIVE APPROACHES TO DATA ANALYSIS IN COMMERCIAL IT PROJECTS
ABSTRACT
This scientific article addresses the critical role of advanced data analysis in contemporary commercial IT projects. The world's data generation has escalated remarkably and is expected to surpass 180 zettabytes by 2025. The study focuses on the transition to edge computing due to the limitations of cloud computing in handling enormous volumes of real-time data. Over the past decade, interest in edge computing has grown sixtyfold, reducing latency and enhancing data processing efficiency. The article also discusses the importance of data analytics tools in decision-making and business strategies. Furthermore, it explores the rise of Data as a Service (DaaS) and the democratization of data access within organizations, enabling even non-technical users to analyze data efficiently. Additionally, the paper delves into the concept of data grids and the strategic use of synthetic data, particularly in AI and analytics, while maintaining privacy and security. The article emphasizes integrating innovative data analysis methods for competitive advantage and sustainable development in the rapidly evolving business landscape.
Keywords: IT, programming, software, data analysis, big data, IT projects.
Introduction
The world generates more than 64 zettabytes of data annually — this corresponds to 64 trillion gigabytes of data from 23.8 billion connected devices. By 2025, the volume of global data is expected to exceed 180 zettabytes, with more than 41 billion connected devices.
Enterprises that give their managers the tools to analyze data and platforms for making informed, data-driven decisions can fully exploit the potential inherent in that information. Companies that ignore this aspect forfeit a critical competitive advantage.
Over the past ten years, interest in “edge computing” has grown more than sixtyfold. According to IDC's analysis, global spending on edge computing was expected to reach $208 billion by the end of 2023, 13.1% more than in 2022. Edge computing reduces latency and improves data processing efficiency [4].
At the same time, the exponential growth of data volumes has forced many organizations to adopt cloud storage solutions. However, even cloud computing has proved insufficiently prepared to handle the vast and ever-growing stream of real-time data generated daily. Bandwidth limitations, data transmission delays, and network failures can seriously harm critical industrial and commercial data processing, increasing operating costs and risks.
According to data experts, edge computing is the best solution: resource-intensive, often repetitive analysis of raw data is performed directly on devices at the network's outer edge, and only the aggregated results are sent to cloud storage for deeper processing.
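To make the edge-versus-cloud division concrete, here is a minimal Python sketch of the pattern just described: raw readings are aggregated on the device, and only a compact summary is shipped upstream. The sensor and upload functions are hypothetical placeholders, not any particular vendor's API.

    import statistics
    import time

    def summarize(readings):
        # Reduce a window of raw readings to a compact summary.
        return {
            "count": len(readings),
            "mean": statistics.fmean(readings),
            "min": min(readings),
            "max": max(readings),
            "ts": time.time(),
        }

    def edge_loop(read_sensor, upload_summary, window=1000):
        # read_sensor and upload_summary are injected, hypothetical callables.
        buffer = []
        while True:
            buffer.append(read_sensor())            # raw data stays on the device
            if len(buffer) >= window:
                upload_summary(summarize(buffer))   # only the aggregate leaves
                buffer.clear()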
Data processing means the systematization of information: its logical organization, interpretation, and visualization in order to extract valuable conclusions. This is important for making informed decisions based on digital facts. One of the main goals of data analysis is to identify patterns and trends. In the retail industry, for example, finding connections in data is crucial for optimizing customer service, as the toy example below illustrates.
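As a minimal illustration of such pattern finding in retail (the basket data below is invented), one can count which product pairs are purchased together most often and use the result to guide cross-selling:

    from collections import Counter
    from itertools import combinations

    baskets = [                      # hypothetical point-of-sale transactions
        {"bread", "milk", "eggs"},
        {"bread", "milk"},
        {"milk", "eggs", "coffee"},
        {"bread", "eggs"},
    ]

    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1

    # The most frequent pairs suggest products to co-promote or shelve together.
    print(pair_counts.most_common(3))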
Figure 1. Directions of digital transformation
Data analytics tools are specialized software that specialists use to analyze data. They help build analytical models that support more effective business decisions while optimizing costs and increasing profits. Enterprises view such solutions as a competitive advantage: these tools make it possible to work with vast amounts of data and supply information for analysis and forecasting. Big data analysis technologies increasingly integrate machine learning and artificial intelligence for more accurate conclusions and forecasts [6].
Big data is not just a fashionable phenomenon; it is a crucial asset for any company. Ninety-five percent of organizations admit that managing unstructured data is a challenge in their industry.
Figure 2. The Big Data market
Robust strategies for processing vast amounts of information are needed to make optimal use of the potential that lies in big data. Tools for analyzing voluminous data are of particular value in such conditions. They help identify patterns, uncover trends, and provide reliable information for decision-makers to make business decisions. Big data analysis tools provide companies with many opportunities to gain competitive advantages and are essential for interpreting this data [2].
1. Advantages of data analysis
One of the remarkable advantages of advanced analytics is its ability to process unstructured data such as phone calls.
For example, University Hospitals in Cleveland, Ohio, receives over 400,000 phone calls monthly. Previously, employees had to listen to calls and document the information manually.
However, after introducing Invoca's artificial intelligence data analysis platform, the hospitals were able to automate this process and save at least 40 hours of labor per week.
The Invoca platform can track conversions and phone call results. Advanced analytics also includes the ability to optimize pricing strategies and demand forecasting. For example, artificial intelligence can analyze customer data to identify purchasing trends and use dynamic pricing to increase revenue. It also enables tracking competitor data to adapt pricing strategies.
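The dynamic-pricing logic can be sketched as a simple rule that nudges price toward observed demand while staying close to a competitor's quote. All thresholds and inputs below are illustrative assumptions, not Invoca's or any vendor's actual algorithm.

    def dynamic_price(base_price, demand_ratio, competitor_price,
                      floor=0.8, ceiling=1.3):
        # Scale by demand (clamped to a sane band), then cap near the competitor.
        price = base_price * max(floor, min(ceiling, demand_ratio))
        return round(min(price, competitor_price * 1.05), 2)

    # Demand 20% above forecast, competitor sells at 95.00:
    print(dynamic_price(100.0, demand_ratio=1.2, competitor_price=95.0))  # 99.75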
The advanced analytics market is growing rapidly. Research and Markets forecasts an average annual growth rate of around 26% until 2027, which could lift the market to more than $32 billion.
2. Using business intelligence to collect information
Business intelligence (BI) tools turn raw data into meaningful patterns and actionable insights.
Many business leaders consider BI essential for the survival and success of an organization.
According to one survey, about a quarter of organizations currently use BI, but among companies with more than 5,000 employees the share rises to 80%.
Delta Air Lines offers one example: its BI platform tracks baggage handling and flags baggage-related problems and delays in order to improve the customer experience.
BI can also significantly improve the effectiveness of marketing campaigns by tailoring sales and marketing efforts to customer profiles and segments.
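A minimal sketch of the segmentation step behind such personalization, assuming a pandas DataFrame of orders with hypothetical column names, is the classic RFM (recency, frequency, monetary) scoring:

    import pandas as pd

    def rfm_segments(orders: pd.DataFrame, today: pd.Timestamp) -> pd.DataFrame:
        # Score each customer by how recently, how often, and how much they buy.
        rfm = orders.groupby("customer_id").agg(
            recency=("order_date", lambda d: (today - d.max()).days),
            frequency=("order_id", "count"),
            monetary=("amount", "sum"),
        )
        # Quartile scores, 4 = best; recency labels reversed (smaller is better).
        rfm["R"] = pd.qcut(rfm["recency"], 4, labels=[4, 3, 2, 1])
        rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 4,
                           labels=[1, 2, 3, 4])
        rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 4,
                           labels=[1, 2, 3, 4])
        return rfm

Marketing can then target, for instance, high-monetary but low-recency customers with win-back campaigns.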
BI platforms also play a vital role in the digitalization of the manufacturing industry, helping to improve supply chains, avoid delays, and optimize production processes while maintaining product quality. They also help to minimize disruptions and costs in the supply chain.
3. Diverse perspectives on the use of edge computing
With the growth of data volumes in recent years and the need for operational analytics, many enterprises are turning to edge computing, processing data directly on the devices where it is created. Gartner forecasts that by 2025 more than 50% of essential data will be created and processed outside data centers and enterprise cloud platforms.
The benefits of edge computing also include increased data security and privacy: since data does not leave the device and is not transferred to the cloud, it is less exposed to security threats.
Edge analytics is vital in Industry 4.0, especially in industrial sectors where many IoT devices generate data daily. This data requires instant processing, and edge computing makes it possible to analyze it on the spot.
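The kind of on-device check meant here can be sketched as a rolling z-score that flags anomalous sensor values immediately, with no round trip to the cloud. The window size and threshold are assumptions for illustration.

    from collections import deque
    import statistics

    class EdgeAnomalyDetector:
        # Flags readings that deviate strongly from recent history, on the device.
        def __init__(self, window=100, z_threshold=3.0):
            self.history = deque(maxlen=window)
            self.z_threshold = z_threshold

        def check(self, value):
            is_anomaly = False
            if len(self.history) >= 10:   # wait for a minimal baseline
                mean = statistics.fmean(self.history)
                stdev = statistics.pstdev(self.history) or 1e-9
                is_anomaly = abs(value - mean) / stdev > self.z_threshold
            self.history.append(value)
            return is_anomaly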
One of the most striking examples of successful edge analytics is the United States Postal Service (USPS), which put the technology to work in searching for lost parcels. Edge analytics allows USPS to process millions of package images daily and has cut lost-parcel searches dramatically: a task that previously occupied 8 to 10 people for several days can now be completed by a single specialist within a few hours [7].
4. The role of data as a service
Data volumes grow by the day, and against this backdrop businesses are forced to exploit data if they want to stay competitive. However, few companies can acquire, store, and analyze data as efficiently as the largest technology giants. This is where Data as a Service (DaaS) becomes increasingly important.
Analysis shows that over the past five years, search queries for “Data as a Service” have grown by 350%. DaaS providers typically offer paid data collection, storage, and analysis services on a subscription basis. These services are delivered via cloud computing, giving end users access over the network without requiring any storage or analysis of data on the user's side.
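In code, a DaaS subscription typically reduces to authenticated calls against the provider's REST endpoint; storage and computation remain on the provider's side. The endpoint, parameters, and response shape below are entirely hypothetical, not any real provider's API.

    import requests

    BASE_URL = "https://api.example-daas.com/v1"   # hypothetical provider
    API_KEY = "..."                                # subscription credential

    def fetch_dataset(dataset_id: str, since: str) -> list[dict]:
        # Pull rows from a subscribed dataset over the network.
        resp = requests.get(
            f"{BASE_URL}/datasets/{dataset_id}/rows",
            params={"since": since},
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["rows"]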
DaaS providers either work with the customer's internal data or provide access to datasets the customer would not ordinarily have. With the advent of cloud technologies, a rapid transition toward DaaS is clearly underway: Technavio estimates that the DaaS market will grow by almost $56.85 billion by 2027.
Figure 3. DaaS market
According to forecasts, this market can expect a compound annual growth rate of about 40% through 2027; in 2023 the market grew by 28.64%. Snowflake, which offers both data warehousing services and a data marketplace, is one of the most notable players in this space.
Snowflake provides its customers with a platform for data storage and analysis and hosts vendors that sell data products through its marketplace, which lists more than 600 active datasets; the demand-forecasting category alone contains 154 data products. One of these vendors, Cybersyn, based in New York City, closed its Series A round in April 2023 at $62.9 million. Cybersyn collects and analyzes information from publicly available sources as well as private sources of economic data and sells the resulting datasets to third parties in sectors such as healthcare, mortgages, and retail [5].
5. Democratization of data access
In a modern enterprise, information is often locked inside individual departments, depriving other business users of the opportunity to extract value from it. The democratization of data is becoming an essential factor for business: information is made open to everyone in the enterprise, regardless of their level of technical knowledge, so end users can access what they need without depending on the IT department. According to one study, 80% of business leaders confirm that broad access to data improves decision-making. A Harvard Business Review study likewise indicates that for 97% of business leaders, the democratization of data is a critical success factor.
However, so far only 60% of business leaders report that their companies effectively provide access to data and the tools to analyze it, even though almost all respondents recognize the importance of democratizing data for business success. Opening data to decision-makers means that employees become data-literate regardless of their primary specialization. Coca-Cola, for example, is actively investing in upskilling its managers to grow its pool of data specialists.
The company trained more than 500 people in digital skills in the program's first year and plans to expand it to more than 4,000 employees in the coming years. Businesses also use self-service tools that let employees query and analyze data without special training. Alteryx, for instance, provides data analysis software that ordinary users can operate without writing code. The platform uses ML and NLP and has recently integrated generative artificial intelligence features.
Search queries for “generative AI” are on the rise. Alteryx's new features include “Magic Documents,” which automates the summarization of analytical data, and “Workflow Summary,” which uses ChatGPT to document workflows. The platform also integrates an OpenAI connector, offering businesses an open generative AI solution.
The new functions of generative artificial intelligence from Alteryx are aimed at improving the efficiency of data analysis and reporting. The company went public in 2017 and has since increased its value to $2.74 billion [1].
6. Data mesh architecture
The data mesh is another architecture gaining ground. It supports self-service analytics, and search interest in the term has grown significantly over the past five years.
The data mesh concept puts forward the idea of decentralizing information management, treating data as a valuable product and organizing teams around specific domains. Its basic principle is to transfer responsibility for data to the different teams within the enterprise, allowing each to manage its data independently and make informed, data-driven decisions.
Under this approach to data management, each domain team can choose the tools and technologies that fit its specific needs. The result is that every team has the data and tooling it needs to drive innovation, experimentation, and sound strategic decisions.
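One way to read “data as a product” in code is that each domain team publishes its data behind a small, versioned contract that other teams consume without knowing the internals. The interface below is a hypothetical illustration, not Nextdata's or any vendor's actual design.

    from dataclasses import dataclass
    from typing import Iterable, Protocol

    @dataclass(frozen=True)
    class DataProductInfo:
        domain: str      # owning team, e.g. "payments"
        name: str
        version: str
        schema: dict     # column -> type: the published contract

    class DataProduct(Protocol):
        # Minimal contract every domain-owned data product fulfils.
        def info(self) -> DataProductInfo: ...
        def read(self, since: str) -> Iterable[dict]: ...

    def discover(catalog: dict[str, DataProduct],
                 domain: str) -> list[DataProductInfo]:
        # Central catalog lookup; ownership stays with the domain teams.
        return [p.info() for p in catalog.values() if p.info().domain == domain]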
The data mesh provides a flexible and scalable solution for enterprises with large amounts of information. It reduces the load on storage systems, facilitates collaboration, and improves security and regulatory compliance.
Zhamak Dehghani, who originated the data mesh concept, has announced the creation of her own company, Nextdata, which aims to help enterprises decentralize data through a cellular data architecture and data product containers.
Nextdata's data mesh solutions are still under development and testing. One example of a data mesh architecture already in production comes from the financial sector, where data is extremely valuable but its use carries security and privacy risks.
For example, JPMorgan Chase built a data mesh solution on AWS in 2022. It allows teams to extract and combine data from different systems across multiple data domains to create reports. Thanks to the data mesh, this data is published to data lakes, where other teams can find it through the corporate catalog and request what they need.
The data mesh thus facilitates data exchange between teams and makes it easier to trace data sources and lineage.
7. Synthetic data
The use of synthetic data is coming to the fore as a way to ensure confidentiality while still providing high-quality information. Synthetic data, created by computer programs, does not correspond to real people or events, yet it is invaluable for data analytics.
In recent years, search queries for “synthetic data” have increased by more than 600%. As artificial intelligence and machine learning systems develop, companies need data to train them, yet collecting high-quality data is difficult for some enterprises. In such cases, synthetic data comes to the rescue.
There are two types of synthetic data: fully synthetic and partially synthetic. Both are typically created by fitting a machine learning model to an existing database and then generating a second dataset that preserves the patterns and statistical properties of the real data but is tied to no actual identifiers. Large volumes of synthetic data can be generated automatically and come labeled for ease of use.
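A minimal sketch of this idea: fit a simple statistical model to real numeric data, then sample new rows that preserve its means and correlations but correspond to no real record. A multivariate Gaussian stands in for the machine learning model here; dedicated generators such as MIT's Synthetic Data Vault are far more capable.

    import numpy as np

    def synthesize(real: np.ndarray, n_rows: int, seed: int = 0) -> np.ndarray:
        # Sample synthetic rows from a Gaussian fitted to the real data.
        rng = np.random.default_rng(seed)
        mean = real.mean(axis=0)
        cov = np.cov(real, rowvar=False)
        return rng.multivariate_normal(mean, cov, size=n_rows)

    # Hypothetical example: 500 real records with 3 numeric features.
    real = np.random.default_rng(1).normal(size=(500, 3))
    synthetic = synthesize(real, n_rows=10_000)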
Synthetic data also addresses security and privacy concerns, especially in medicine: research projects such as those at the University of California have used synthetic data to predict morbidity, an area where data privacy is critical.
New tools, such as the Massachusetts Institute of Technology's open-source Synthetic Data Vault, make it possible to generate synthetic data freely. Synthetic data has also been applied in health insurance, where it has enabled analytical recommendation systems that save time while meeting confidentiality requirements.
The City of Vienna has successfully used synthetic data to develop software applications, providing the necessary demographic data while complying with GDPR rules.
Gartner's analytical forecasts predict that by 2024, 60% of the data used in artificial intelligence and analytics systems will be synthetic, emphasizing the importance and prospects of synthetic data in the future [3].
Conclusion
In conclusion, innovative approaches to data analysis in commercial IT projects play a key role in the modern business environment. The use of advanced data processing, interpretation, and forecasting methods is becoming necessary for the effective functioning of companies in a rapidly changing market.
The development of machine learning technologies, artificial intelligence, and analytical tools makes it possible to predict trends more accurately, reveal hidden patterns, and identify critical factors for business success. Integrating these innovative approaches makes it possible to adapt more precisely to customer needs, optimize processes, and improve products and services, ultimately increasing the company's competitiveness.
Thus, innovative approaches to data analysis are an integral part of modern commercial IT projects, providing companies with a competitive advantage and contributing to their sustainable development in a dynamic and competitive business environment.
References:
1. Ensuring the availability of data and services: RPO, RTO indicators and SLA planning. Available at: https://habr.com/ru/companies/veeam/articles/328068/ (accessed 01.01.2024).
2. Islam M. Data analysis: types, process, methods, techniques and tools // International Journal on Data Science and Technology. 2020. Vol. 6, No. 1. Pp. 10-15.
3. Keen E. Gartner Identifies Top Trends Shaping the Future of Data Science and Machine Learning. 2023. Available at: https://www.gartner.com/en/newsroom/press-releases/2023-08-01-gartner-identifies-top-trends-shaping-future-of-data-science-and-machine-learning (accessed 01.01.2024).
4. New IDC Spending Guide Forecasts Edge Computing Investments Will Reach $208 Billion in 2023. Available at: https://www.idc.com/getdoc.jsp?containerId=prUS50386323 (accessed 01.01.2024).
5. Obschonka M., Audretsch D. B. Artificial intelligence and big data in entrepreneurship: a new era has begun // Small Business Economics. 2020. Vol. 55. Pp. 529-539.
6. Runkler T. A. Data analytics. Wiesbaden: Springer Fachmedien Wiesbaden, 2020.
7. Yershova O. L., Tomashevsky T. V. Peripheral Computations: The Basis for Data Processing in Internet of Things // Scientific Bulletin of the National Academy of Statistics, Accounting and Audit. 2020. No. 4. Pp. 97-103.