Using Open Data To Predict Market Movements




Table of Contents

Table of Figures
1 Executive Summary
2 Project Goals
3 Background Information
    3.1 Common Crawl
    3.2 Gartner Magic Quadrant
    3.3 The Register IT News Site
4 Data Exploration and Insights
    4.1 NetApp Monthly Product Trends
    4.2 Detecting Technology Trends using Bubble Charts
    4.3 Storage Articles by Topic over a 15-year Period
5 Gartner's Magic Quadrant Movements
    5.1 Focus of the Market Analysis
    5.2 The Correlation
    5.3 Use Case 1: Number of Register Articles vs. Gartner MQ
    5.4 Use Case 2: Correlating News Coverage with Gartner MQ
    5.5 Use Case 3: The Register Keywords (N-grams) vs. Gartner MQ
6 Application Architecture
    6.1 Data Extraction and Analysis
    6.2 Technology and Tools
    6.3 Data Analysis
    6.4 Data Visualization
    6.5 Design Choices with Storing, Analyzing and Collaborating
7 Conclusion
8 Future Scope
9 References

2018 Dell EMC Proven Professional Knowledge Sharing

Table of Figures

Figure 1: NetApp Monthly Product Trends
Figure 2: NetApp Monthly Solid State Arrays Keywords (N-grams)
Figure 3: The Register annual section trends: Storage, Cloud
Figure 4: Technology Bi-gram Bubble Chart
Figure 5: Storage Articles by Topic
Figure 6: Magic Quadrant with gridlines to detect vendor movements
Figure 7: Correlation between NetApp articles and NetApp MQ by Gartner
Figure 8: Hitachi Data Systems, The Register Articles vs. Gartner MQ by Year
Figure 9: Pure Storage, The Register Articles vs. Gartner MQ by Year
Figure 10: The Register N-grams Portion vs. Gartner MQ by Year
Figure 11: Common Crawl Data Processing Flow Diagram
Figure 12: The Common Crawl Index
Figure 13: Data Processing Architecture Diagram
Figure 14: Snapshot of Monogram Frequency
Figure 15: Compute Instance Options
Figure 16: Growth in Intermediate Storage Requirement

Disclaimer: The views, processes or methodologies published in this article are those of the authors. They do not necessarily reflect Dell EMC's views, processes or methodologies.

1 Executive Summary

As companies progress on their digital transformation journeys, technology becomes a strategic business decision. In this realm, consulting firms such as Gartner exert tremendous influence on technology purchasing decisions. The ability of these firms to predict the movement of market players will provide vendors with competitive benefits.

This paper will explore how, with the use of publicly available data sources, IT industry trends can be mimicked and predicted.

Big Data enthusiasts learned quickly that there are caveats to making Big Data useful:
  • Data source availability
  • Producing meaningful insights from publicly available sources

Working with large data sets that are frequently changing can become expensive and frustrating. The learning curve is steep and the discovery process is long. Challenges range from selecting efficient tools to parse unstructured data to developing a vision for interpreting and utilizing the data for competitive advantage.

We will describe how the archive of billions of web pages, captured monthly since 2008 and available for free analysis on AWS, can be used to mimic and predict trends reflected in industry-standard consulting reports.

There could be a potential opportunity in this process to apply machine learning to tune the models and to self-learn so they can optimize automatically. Gartner publishes reports in over 70 topic areas. Having an automated tool that can analyze across all those topic areas, to help us quickly understand major trends across today's landscape and plan for those to come, would be invaluable to many organizations.

This paper will cover three potential use cases that demonstrate how the analyst and press coverage of a company or product correlated with the market movements observed. The crux of the research paper is the three use cases that best provide evidence of this correlation.

2 Project Goals

This is an ongoing project in a sales and marketing organization, part of an effort to better understand the market landscape for server and storage products. The specific goals of the project are to:

  • Gain market insights by performing text analysis on Common Crawl data of industry players, using the publicly available data on their websites.
  • Understand how technology terms like machine learning, big data, hyper-converged, all-flash, etc. have evolved over the last 7 to 10 years.
  • Understand how industry players use technical terms for marketing purposes and how their choices affect their business results.
  • Draw inferences as to which technologies are gaining traction today and predict those that will in the future.

The business value of this project is in sharing the findings and any custom comparisons with the sales field team as a sales support tool, with a product development team as a customer requirement gathering tool, and with a strategic planning team as a market landscape research tool.

3 Background Information

3.1 Common Crawl

Common Crawl is a non-profit organization that builds and maintains an open repository of web crawl data that is, in essence, a copy of the Internet. The corpus includes web crawl data collected over the last 10 years that can be accessed and analyzed by anyone for a very low cost [i]. This data is stored on Amazon's S3 storage service, making it quick and easy to perform analysis utilizing the cloud's scalable computing and analytics resources [ii].

3.2 Gartner Magic Quadrant

Gartner, the IT consulting firm, releases market research reports that rely on its proprietary qualitative data analysis methods to demonstrate market trends such as direction, maturity and participants. The company publishes a Magic Quadrant (MQ) for each product category, which provides a graphical competitive positioning of four types of technology providers in markets where growth is high and provider differentiation is distinct. The four quadrants of the MQ are:

  • Leaders execute well against their current vision and are well positioned for tomorrow.
  • Visionaries understand where the market is going or have a vision for changing market rules, but do not yet execute well.
  • Niche Players focus successfully on a small segment, or are unfocused and do not out-innovate or outperform others.
  • Challengers execute well today or may dominate a large segment, but do not demonstrate an understanding of market direction.

3.3 The Register IT News Site

The Register is a leading news site with thorough articles on technology. Its writers seem to truly understand the technologies they write about, which makes the website a great source of information, not only on technology trends, whether upcoming terms like All-Flash or Hyper-converged or loosely defined ones like Serverless, but also on which companies are being associated with these technologies in press articles. The website is divided into sections, many of which evolve over time. The Register provides extensive coverage of data center technologies and is a great resource for such data. For this analysis we used the Data Centre section of the website.

4 Data Exploration and Insights

We selected some of the processed data (NetApp domain, Huawei domain, The Register domain), performed keyword analysis and n-gram frequency analysis, structured the data with Spark DataFrames [iii] and used SQL to query the results. The results were then visualized, revealing technology and product trends over time. Here are some examples of the macro trends we could identify.

4.1 NetApp Monthly Product Trends

Three NetApp general-purpose storage product names were selected as keywords (N-grams): E-series, FAS and Unified Storage. After we identified these keywords, we visualized their count for the past 17 months (Figure 1).

Figure 1: NetApp Monthly Product Trends

FAS series storage (orange line) is NetApp's major disk array product. Visualized data from the Common Crawl confirmed that it was much more popular than E-series (green line). It showed intense activity during 2014-2015, which correlated with major NetApp campaigns on the FAS series. Once sales stabilized, campaign activity went down in the second half of 2015.

E-series was the product line from Engenio, which was acquired by NetApp in 2011, but it was a relatively small part of the company's revenue. Limited campaign efforts were shown in October 2014 and the popularity of the product improved after that.

Unified Storage (blue line) was NetApp's popular marketing campaign before April 2014. It stabilized following a decline in 2014.

We also used NetApp solid-state array keywords (N-grams), all-flash and SolidFire, to see the trends in a 17-month time series.

Figure 2: NetApp Monthly Solid State Arrays Keywords (N-grams)
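
The monthly keyword counts plotted in Figures 1 and 2 could be produced along the following lines. This is a minimal sketch in plain Python, not the authors' Spark and SQL pipeline: the `pages` sample, the keyword list and the function names `keyword_pattern` and `monthly_keyword_counts` are all hypothetical, chosen only to illustrate counting keyword occurrences per crawl month.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample of crawled pages: (crawl month, extracted page text).
# Real input would come from the Common Crawl archive for a vendor's domain.
pages = [
    ("2014-10", "NetApp FAS series arrays and the E-Series platform"),
    ("2014-10", "All-flash FAS for unified storage workloads"),
    ("2014-11", "Unified storage with FAS; FAS all flash options"),
]

# Illustrative keywords (n-grams). Matching is case-insensitive and treats
# hyphens and whitespace as interchangeable separators.
KEYWORDS = ["FAS", "E-Series", "Unified Storage", "all flash"]


def keyword_pattern(keyword):
    """Build a regex that matches the keyword as whole words."""
    parts = [re.escape(p) for p in re.split(r"[\s-]+", keyword)]
    return re.compile(r"\b" + r"[\s-]+".join(parts) + r"\b", re.IGNORECASE)


def monthly_keyword_counts(pages, keywords):
    """Return {month: Counter({keyword: occurrences})}."""
    patterns = {kw: keyword_pattern(kw) for kw in keywords}
    counts = defaultdict(Counter)
    for month, text in pages:
        for kw, pattern in patterns.items():
            counts[month][kw] += len(pattern.findall(text))
    return counts


counts = monthly_keyword_counts(pages, KEYWORDS)
for month in sorted(counts):
    print(month, dict(counts[month]))
```

Plotting one line per keyword over the sorted months would give a chart of the same shape as the figures. In a Spark setting the same grouping would more naturally be a group-by on month and keyword over a DataFrame.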
NetApp used the all-flash description for the FAS series; however, the FAS series includes all-flash arrays that we cannot separate from the rest. The all-flash line (blue line) follows the FAS pattern. Discrepancies in the trends can be explained by the fact that FAS has not only all-flash models but hybrid models as well.

The SolidFire (green line) flash array showed up in February 2016, which is exactly the time it was acquired by NetApp.

This chart confirms our hypothesis that our process of separating and tagging keywords from a particular vendor's domain, obtained from the publicly available data, has worked and can be used to detect changes in the strategies and investments of companies.

The Rise of Cloud Computing

[Chart: The Register Data Centre Articles By Category. Y-axis: Number of Articles. X-axis: 2008 to 2017 (Sep). Legend entries recovered: Servers, Virtualization.]

Figure 3: The Register annual section trends: Storage, Cloud

We see from Figure 3 that the Cloud category (green line) surged in 2010. The Amazon Web Services (AWS) division was founded in 2002. In 2006 S3 (Simple Storage Service) was launched. Microsoft launched its cloud service offering, Azure, in February 2010. By that time the business benefits of cloud computing were well recognized across companies. Migration to the cloud also raised a number of issues, including security, data loss and data availability, shown by a dip in 2011. However, as new cloud service providers started to enter the cloud market, driven by a cloud growth report published by Forrester in 2011 [iv], customers' concerns were addressed and demand for cloud computing grew stronger. Following the hype around cloud computing, companies that were migrating or considering migration started to realize that cloud is not always as cheap as they would like it to be [v]. In the meantime, storage was losing market share due to cloud competition.

Internet of Things (IoT) was introduced around 2014 (notice a decline in cloud popularity), and in 2015 Gartner predicted an IoT market explosion by 2020 [vi]. IoT raised a whole new issue of data growth, data analytics and device connectivity. Storage innovation did not stop with offers of solid-state disks (SSD) and all-flash arrays, now followed by NVMe offerings whose popularity was gaining ground. This story is well illustrated by the chart above.

This analysis showed that the number of articles published in a particular section correlated well with the level of investments and product announcements the companies in that sector were making.

4.2 Detecting Technology Trends using Bubble Charts

Bubble charts allow for an efficient visualization of the technology clustering of vendors like NetApp, HPE and Huawei. Analyzing the clusters, we can see where a vendor is putting emphasis and possibly investing more. We will take you through the process of creating N-grams for organizing the technology clusters of a vendor.

We have seen from the bubble chart we generated that some major words like network, data, cloud and mobile may well reflect key business technology areas of Huawei. But we also saw noise words like best, need, 2017, key, etc., which do not add much value from an analysis point of view. So we removed the noise words and kept only the technology words.

In addition to the keywords and N-grams bubble chart, we preferred to see more technology words distributed in the bubble chart, e.g. storage, big data, modular, HPC, etc. We also wanted to see more of the technology marketing phrases that Huawei uses, so we used bi-grams to generate the bubble chart below.

Figure 4: Technology Bi-gram Bubble Chart

In the Figure 4 bi-gram bubble chart we see some interesting technology marketing phrases like digital transformation, big data, cloud computing, cloud data, ICT infrastructure and software-defined networking: many good phrases that reflect Huawei's technology and where its investments and strategy were taking it.

This bubble chart turned out to be a useful tool for discovering the most important technologies in a particular quarter, year or decade. It can be applied to vendor websites or to news sites to cover the whole industry.
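
The noise-word filtering and bi-gram counting described in this section can be sketched as follows. This is a minimal illustration in plain Python rather than the authors' Spark code: the sample text, the `NOISE_WORDS` set and the helper names `tokenize` and `ngram_counts` are hypothetical. The paper names only best, need, 2017 and key as noise words; the common stop words added to the set are an assumption.

```python
import re
from collections import Counter

# Hypothetical noise-word list. "best", "need", "2017" and "key" come from
# the text; the remaining stop words are an assumption for this sketch.
NOISE_WORDS = {"best", "need", "2017", "key",
               "the", "and", "for", "of", "in", "a", "is", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(text, n=2, noise_words=NOISE_WORDS):
    """Count n-grams of adjacent tokens, skipping any that contain a noise word."""
    tokens = tokenize(text)
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(
        " ".join(gram)
        for gram in grams
        if not any(token in noise_words for token in gram)
    )


# Hypothetical page text standing in for a vendor's crawled content.
sample = (
    "Digital transformation with cloud computing. "
    "The best cloud computing and big data platform "
    "for digital transformation in 2017."
)

unigrams = ngram_counts(sample, n=1)
bigrams = ngram_counts(sample, n=2)
print(bigrams.most_common(3))
```

The resulting counts would serve as the bubble sizes in a chart like Figure 4. Dropping any n-gram that contains a noise word, rather than deleting noise words before pairing, avoids creating bi-grams from words that were never adjacent in the source text.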
4.3 Storage Articles by Topic over a 15-year Period

In addition to custom code execution on clusters in Spark we al.

USING OPEN DATA TO PREDICT MARKET MOVEMENTS
Ravinder Singh, Director of Analytics, Dell EMC, ravinder singh dell com
Marina Levina, Data Scientist, Dell EMC
