Unstructured Data Processing using Spark for Topics Modelling
Published in | International journal of engineering and advanced technology Vol. 9; no. 5; pp. 1060 - 1063 |
---|---|
Main Authors | Sokegbe, Adjovi Irène; Nainwal, Ayushi |
Format | Journal Article |
Language | English |
Published | 30.06.2020 |
ISSN | 2249-8958 |
DOI | 10.35940/ijeat.E9992.069520 |
Abstract | The Information Technology domain changes day by day. The volume of data keeps growing, and so does the demand to process it. There are two types of data: structured and unstructured. Because today's data come from many sources and in great variety, we now speak of “Big data” rather than simply data. It has been reported that 80% of enterprise data is unstructured [1]. However, the procedures for handling unstructured data are more complex than those for structured data, so it is necessary to understand this type of data clearly and to know how to extract useful information from it. In this paper, we study how to retrieve useful information from unstructured e-commerce data using the data analysis tool Spark. To this end, we first give an overview of structured data, unstructured data, and data analysis; we then implement an information retrieval algorithm with Spark MLlib to determine which subjects customers discuss most in a set of reviews, whether negative or positive. Such a study is needed to improve business based on customer satisfaction reviews. Our model is the unsupervised machine learning algorithm Latent Dirichlet Allocation (LDA). Finally, we evaluate the model using a set of parameters. |
---|---|
Author | Sokegbe, Adjovi Irène; Nainwal, Ayushi |
CitedBy_id | crossref_primary_10_35940_ijeat_C4564_14030225 |
CorporateAuthor | Computer Science and Engineering, Alakh Prakash Goyal Shimla University, Shimla, India |
Discipline | Computer Science |
EISSN | 2249-8958 |
EndPage | 1063 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 5 |
OpenAccessLink | https://doi.org/10.35940/ijeat.e9992.069520 |
PageCount | 4 |
PublicationDate | 2020-6-30 |
PublicationTitle | International journal of engineering and advanced technology |
StartPage | 1060 |
Volume | 9 |
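The abstract describes using LDA to find which subjects customers discuss most in a set of reviews. As an illustration only, here is a minimal sketch of LDA in plain Python (standard library only), assuming a handful of hypothetical reviews and a hypothetical stop-word list. It does not use Spark and is not the authors' implementation: it infers topics with collapsed Gibbs sampling, a swapped-in inference method, whereas Spark MLlib's LDA estimator uses online variational Bayes or EM. The function names `tokenize`, `lda_gibbs`, and `top_words` and the hyperparameter values (`alpha`, `beta`, number of iterations) are assumptions chosen for the example.

```python
import random

# Hypothetical stop-word list and reviews, for illustration only.
STOPWORDS = {"the", "was", "and", "is", "a", "very", "but", "it", "too"}

def tokenize(text):
    """Lowercase, strip commas/periods, split on whitespace, drop stop words."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return [t for t in cleaned.split() if t not in STOPWORDS]

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (not Spark's optimizer).

    Returns (vocab, n_kw, n_dk): the sorted vocabulary, topic-word counts,
    and document-topic counts after the last sweep.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]   # times word w is assigned to topic k
    n_k = [0] * K                        # tokens assigned to topic k overall
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):       # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return vocab, n_kw, n_dk

def top_words(vocab, n_kw, n=3):
    """Return the n most frequent words per topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in n_kw]

reviews = [
    "The delivery was late and the package was damaged.",
    "Fast delivery, package arrived on time.",
    "The battery is weak but the screen is bright.",
    "Great screen, battery lasts all day.",
]
docs = [tokenize(r) for r in reviews]
vocab, n_kw, n_dk = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, n_kw)):
    print(f"topic {k}: {words}")
```

With so few reviews, the recovered topics depend on the seed and are only indicative. In Spark, a comparable pipeline would typically chain `Tokenizer`, `StopWordsRemover`, and `CountVectorizer` from `pyspark.ml.feature` with `pyspark.ml.clustering.LDA` (number of topics set via `k`); a fitted `LDAModel` exposes `logLikelihood` and `logPerplexity`, which are common evaluation measures, although the abstract does not state which parameters the authors used.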