Yihan Bian's website

Conclusion

This has been quite a long journey since we started several months ago. We found data in different formats with different content, and cleaned it so that it would be useful in the later processing tasks. Unfortunately, we did not use all of the data that we gathered and cleaned. Some of it turned out not to be informative, and some of the models could not process such a large amount of data. One reason is that too much data is not good for some models; the other main reason is that we did not have enough computational power to complete the computation. We therefore had to choose the parts of the data that were most informative and most helpful in reaching a conclusion. We spent an equal amount of time on each piece of data, since we did not know at preprocessing time which data would turn out to be unusable. Although some of the data is not used in the rest of the study, the data we did use can still give a fairly convincing conclusion to our non-technical readers.

As we have already said in the introduction section, the topic we focused on is the metaverse, and the keywords that we studied are 'metaverse' and 'NFT'. Let me briefly explain these two terms again in case you are not familiar with them. Broadly speaking, the technologies companies refer to when they talk about “the metaverse” can include virtual reality, characterized by persistent virtual worlds that continue to exist even when you're not playing, as well as augmented reality that combines aspects of the digital and physical worlds. Non-fungible tokens (NFTs) are cryptographic assets on a blockchain with unique identification codes and metadata that distinguish them from each other. Unlike cryptocurrencies, which are identical to each other and can therefore serve as a medium for commercial transactions, NFTs cannot be traded or exchanged at equivalency.

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets that summarizes their main characteristics, and it is an important step. For our task, however, EDA could not give us much information: the data is large and complicated, so it is very hard to get an overview of the whole data set with simple data visualization tools alone. It still helped us find out which of the questions we raised about the topic can be answered and which cannot.

The first model we applied to our data is Naive Bayes. We chose vanilla Naive Bayes as our classifier and trained it on the training data. Unfortunately, the result of applying this model to our test data was a complete failure. Since the data itself is not suitable for this algorithm and we only used the simplest version of Naive Bayes, it is not surprising that the classifier does not work well on the test data. The plot on the corresponding tab shows this poor performance. Given the assumptions of the model (Naive Bayes treats the features as conditionally independent given the class), we already expected that it might not work well. However, we still have many other models that can help us.
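To make the idea of a "vanilla" Naive Bayes classifier concrete, here is a minimal sketch of a multinomial Naive Bayes with add-one smoothing, written in plain Python. It is an illustration only, not the code used in this study: the function names `train_naive_bayes` and `predict`, the toy tweets and their labels are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior of the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothed word likelihood; each word is
            # treated as independent of the others given the class.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set (not the study's data).
train_docs = [
    "great project love the metaverse".split(),
    "best nft drop ever".split(),
    "metaverse is a scam".split(),
    "terrible nft waste of money".split(),
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(train_docs, train_labels)
prediction = predict(model, "love this great nft".split())
print(prediction)
```

The independence assumption shows up in the inner loop: every word adds its own term to the score, regardless of which other words appear in the tweet. That is one reason such a simple model can struggle on real text.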

The decision tree is quite a good model for our task according to the confusion matrix. The accuracy of the model is quite good, so I think the conclusion given by this model is quite convincing. We find that the overall attitude of tweets towards the metaverse is mostly positive, which largely answers the question we raised earlier. After this model, we also used an SVM model to do similar work, and its accuracy is also very high compared to the bad result from the Naive Bayes model. We applied the SVM model to another part of the tweets data and reached the same conclusion: that part of the tweets data is also mostly positive.
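For readers who are not familiar with these terms, the sketch below shows how a confusion matrix and an accuracy score are computed from true and predicted labels. The labels here are invented for illustration and are not results from our models; `confusion_matrix` and `accuracy` are small helper functions written for this example.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for six tweets (not the study's results).
labels = ["negative", "positive"]
y_true = ["positive", "positive", "negative", "positive", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "negative", "negative", "positive"]

matrix = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(y_true, y_pred)
print(matrix)
print(acc)
```

The diagonal of the matrix counts the correct predictions, so a good classifier has large numbers on the diagonal and small numbers elsewhere.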

Then we used the NFT data to draw some important conclusions about NFTs in the metaverse. We used clustering to reach these conclusions and, to make them more convincing, we used several different clustering algorithms to get a general result. One conclusion concerns the location of the traded estate: in some special locations the price is much higher. Another, related conclusion is that the label of the land affects the price of that land.
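As a rough illustration of how clustering can separate land parcels by location and price, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. k-means is chosen here only as a familiar example of a clustering algorithm. The parcels, the two features (distance to a central plaza and price) and the starting centroids are hypothetical and are not taken from our NFT data.

```python
def kmeans(points, centroids, n_iter=10):
    """Plain Lloyd's algorithm; `centroids` holds the initial centers."""
    assignments = []
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        assignments = [
            min(
                range(len(centroids)),
                key=lambda c: sum(
                    (p - q) ** 2 for p, q in zip(point, centroids[c])
                ),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [pt for pt, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep empty clusters in place
        centroids = new_centroids
    return assignments, centroids


# Hypothetical parcels: (distance to a central plaza, price), made-up units.
parcels = [
    (1.0, 9.0), (2.0, 8.5), (1.5, 9.5),   # close to the plaza, expensive
    (8.0, 2.0), (9.0, 1.5), (8.5, 2.5),   # far from the plaza, cheap
]
initial = [parcels[0], parcels[3]]  # fixed start so the result is repeatable

assignments, centers = kmeans(parcels, initial)
print(assignments)
print(centers)
```

In this toy example the two clusters correspond to "near and expensive" and "far and cheap" parcels, which is the kind of pattern the location conclusion above describes.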

The final part of my study is the association rule mining analysis of the tweets about both the metaverse and NFTs. By observing the directed graph and the table, which are also presented in the corresponding tab, we can learn a lot about this text data. From this information we see that most of the people who tweet about this topic think of the 'metaverse' as a project, and most of them have a positive attitude towards this project, since some rules in the rule table show that the words 'good', 'best' and 'great' are highly associated with the word 'project'. It also seems that this project is about a market, and that holders of NFTs can buy and sell them in this market.
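To show what a rule such as 'good' → 'project' means, the sketch below computes support, confidence and lift for word pairs by brute force in plain Python. It is a simplified illustration, not the mining procedure used in this study: the five tweets, the thresholds and the `association_rules` helper are all made up for this example, and only rules with one word on each side are considered.

```python
from itertools import permutations


def association_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return {(a, b): (support, confidence, lift)} for rules a -> b."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of tweets that contain every word in `itemset`.
        return sum(itemset <= basket for basket in baskets) / n

    rules = {}
    for a, b in permutations(items, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue
        confidence = s_ab / support({a})   # estimate of P(b | a)
        if confidence >= min_confidence:
            lift = confidence / support({b})  # > 1 means positive association
            rules[(a, b)] = (s_ab, confidence, lift)
    return rules


# Hypothetical tokenized tweets (not the study's data).
tweets = [
    ["good", "project", "metaverse"],
    ["best", "project", "nft"],
    ["great", "project", "metaverse"],
    ["nft", "market"],
    ["good", "project", "nft"],
]

rules = association_rules(tweets)
for (a, b), (s, c, l) in sorted(rules.items()):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

A rule with high confidence and a lift above 1 means that the second word appears together with the first word more often than chance alone would suggest. This is how the rule table links 'good', 'best' and 'great' to 'project'.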

These are the main conclusions of the whole study. There are also some smaller conclusions, stated in the corresponding tabs, that help in reaching these main conclusions. Unfortunately, we found that some questions cannot be addressed now due to the lack of corresponding data. With the data science methods and the data we have now, these conclusions are the best we can get. We think they are still valuable and informative, and we hope they can help others gain some insight into this topic.