Improved lexicon-based sentiment analysis for social media ...


Research, Open Access. Published: 09 December 2015

Improved lexicon-based sentiment analysis for social media analytics

Anna Jurek, Maurice D. Mulvenna & Yaxin Bi

Security Informatics volume 4, Article number: 9 (2015)

Abstract

Social media channels, such as Facebook or Twitter, allow people to express their views and opinions about any public topic. Public sentiment related to future events, such as demonstrations or parades, indicates public attitude and therefore may be applied while trying to estimate the level of disruption and disorder during such events. Consequently, sentiment analysis of social media content may be of interest for different organisations, especially in the security and law enforcement sectors. This paper presents a new lexicon-based sentiment analysis algorithm that has been designed with the main focus on real time Twitter content analysis. The algorithm consists of two key components, namely sentiment normalisation and an evidence-based combination function, which have been used in order to estimate the intensity of the sentiment rather than a positive/negative label and to support the mixed sentiment classification process. Finally, we illustrate a case study examining the relation between the negative sentiment of Twitter posts related to the English Defence League and the level of disorder during the organisation's related events.
Background

Social media is one of the most significant information exchange technologies of the 21st century. People of all ages use social media to post messages, photos and videos about their daily activities. Social media channels, such as Twitter and Facebook, provide very convenient and efficient ways of communicating and sharing information publicly. Consequently, the role of social media in crime investigation and prevention is growing rapidly. Social media are rapidly becoming a source of information for early warning systems in public safety. According to the LexisNexis report [1], four out of five law enforcement professionals utilise social media for investigation purposes. According to the statistics given in [1], 69 % are using social media tools for gathering information about crimes and about 41 % are using social media for crime anticipation.

Sentiment analysis has already been applied in a number of different, non-security domains for monitoring and forecasting public opinions. In [2] the authors applied a domain-specific lexicon in order to classify customer reviews of hotels into five star categories. Sentiment analysis performed on Twitter was applied in [3] in an effort to forecast box-office revenues for movies. Following their study it was found that there was a relationship between the rate of movie tweets and the real-world box-office performance. A similar application of social media analysis was presented in [4]. In their study the authors uncovered a relationship between online discussion on the Internet Movie Database and the Academy Awards nominations and the box-office success. In [5] the authors developed a new model for event analytics. Their proposed framework characterizes the segments and topics of an event via Twitter sentiment. In their study they focused on two public events, namely the U.S. Presidential debate in 2012 and President Obama's Middle East speech in 2011. The application of sentiment analysis in the tourism domain was presented in [6]. The authors introduced the use of lexicon databases for sentiment analysis of user reviews acquired from TripAdvisor for accommodation and food. In [7] social media was presented as a new opportunity to study bullying in the physical and cyber worlds. In their study the authors developed a text classification model that recognised different emotions (anger, embarrassment, empathy, fear, pride, relief, sadness) in Twitter posts. In [8] Twitter data was applied in an effort to identify the correlation between public and market sentiment. The authors classified messages into four different mood classes, namely calm, happy, alert and kind. Following this, the identified moods and the previous day's Dow Jones Industrial Average were used to predict future stock movements.

The aforementioned studies demonstrate that social media and sentiment analysis have been considered in many different application domains. Some research has already been directed towards designing social media-based intelligent systems for the purpose of supporting decisions in the area of public safety. In [9] a topic detection technique was proposed that allows the retrieval in real-time of the most emerging topics expressed by a community through social media. A probabilistic model was developed in [10] that can predict the risk of falling ill for individuals on the basis of their social ties and co-location with other people, as revealed by their Twitter posts. Twitter corpus was also suggested as a source of information that can be applied in monitoring the diffusion of an epidemic disease such as seasonal influenza [11]. The general problem of Web-based security informatics was addressed in [12]. In their work the authors referred to three fundamental objectives, namely the discovery of security-relevant data and information, target situational awareness and predictive analysis. They proposed an analysis methodology and evaluated it through a series of real-world examples, such as detection of cyber incidents in near real-time, estimation of public opinions in contentious situations, discovery of emerging topics and trends, and early warning analysis for mobilization and protest events. In [13] Twitter data was applied in order to detect online communities involved in conversations around the 2013 Syrian Sarin gas attack topic. Following this, different types of leaders were identified within the communities. A work related to predicting popularity of forum threads related to public events was undertaken in [14]. In [15] a method based on trigger keywords and contextual cues was proposed for detecting threatening messages on social media. A Violence Detection Model was
proposed in [16] for identification of violence related topics being discussed on a microblog. Social media traffic around the Great Eastern Japan Earthquake was analysed in [17] in order to investigate the relation between people's activities and the series of events occurring in the event's aftermath. Similar research with respect to the 2011 Tohoku Earthquake has been undertaken in [18], where both English and Japanese tweets were analysed to determine different reaction attitudes between local and foreign residents.

To date, little research has focused on inferring the sentiment of social media content for the purpose of security analysis. In [19, 20] lexicon-based sentiment analysis algorithms were introduced and presented in a number of different case studies. In [19] the potential of the methods was illustrated by estimating regional public opinion regarding two events: the 2009 Jakarta hotel bombing and the 2011 Egyptian revolution. In [20] the authors investigated the relationship between regional online sentiment about Palestinian suicide bombing attacks against Israel and actual bombing events. In the same work they also studied the impact of public sentiment on the epidemic risk of H1N1 vaccination. Sentiment analysis was applied in [21] for identifying top malware sellers and stolen credit card sellers in the online underground economy. Public opinion around the 2012 Pussy Riot event was evaluated through sentiment analysis of social media posts in [22]. In some work [23, 24], sentiment analysis was studied as a technique for detecting radicalisation in social media. In work presented in [23], sentiment analysis together with lexical and social network analysis was applied to examine and characterise the users of radicalised forums. In [23] sentiment analysis was suggested as one of a set of linguistic markers that could be applied for identifying potential lone wolf terrorism.

In this work we focus on the application of sentiment analysis of Twitter content in estimating the level of disruption and disorder during public events. We developed a lexicon-based sentiment analysis algorithm that differs from existing models in the way that it aggregates the sentiment values of positive and negative words within a message. Through the application of a normalisation function the sentiment of a message is represented as a value from a range of −100 to 100. Consequently, a more comprehensive analysis can be undertaken regarding the sentiment as opposed to positive/negative/neutral classification. Besides this, such an approach is more appropriate for real time analysis given that it allows detailed visualisation of the sentiment over time. In an effort to increase the accuracy of the algorithm we proposed an evidence-based combination function that is applied in the case when positive and negative words co-occur in a message. Furthermore, a modified manner of handling negation and intensification within a message was introduced. Finally, we illustrate a case study examining the relationship between sentiment about the English Defence League (EDL) prior to an EDL demonstration and the level of disruption and disorder during the event.

The method has already been introduced in one of our previous papers [32]. In this manuscript we describe in much more detail the theoretical aspects of the approach. We explain step by step how the sentiment normalisation function was developed. Following this, we evaluate how the method performs with long messages, such as movie reviews. In this work, the algorithm has been modified in the way that it can perform sentiment analysis on sentence level. In this manner we wished to investigate how sentence level analysis affects the sentiment classification accuracy.

The paper is organised as follows. In the following section we present the new lexicon-based approach. Results of the empirical evaluation of the algorithm are demonstrated in "Empirical evaluation". In "Discussion" we illustrate the case study followed by summary and future work in "Case study: English defence league".

Lexicon-based sentiment analysis

Application of a lexicon is one of the two main approaches to sentiment analysis and it involves calculating the sentiment from the semantic orientation of words or phrases that occur in a text [25]. With this approach a dictionary of positive and negative words is required, with a positive or negative sentiment value assigned to each of the words. Different approaches to creating dictionaries have been proposed, including manual [26] and automatic [27] approaches. Generally speaking, in lexicon-based approaches a piece of text is represented as a bag of words. Following this representation of the message, sentiment values from the dictionary are assigned to all positive and negative words or phrases within the message. A combining function, such as sum or average, is applied in order to make the final prediction regarding the overall sentiment for the message. Apart from a sentiment value, the aspect of the local context of a word is usually taken into consideration, such as negation or intensification.

In our work we have decided to apply a lexicon-based approach in order to avoid the need to generate a labelled training set. The main disadvantage of machine learning models is their reliance on labelled data. It is extremely difficult to ensure that sufficient and correctly labelled data can be obtained. Besides this, the fact that a lexicon-based approach can be more easily understood and modified by a human is considered a significant advantage for our work. We found it easier to generate an appropriate lexicon than to collect and label a relevant corpus. Given that the data pulled from social media are created by users from all over the globe, there is a limitation if the algorithm can only handle the English language. Consequently, a sentiment analysis algorithm should be easily transformable into different languages. Later in the paper we discuss how a lexicon-based sentiment analysis algorithm can be adapted to different languages by an appropriate translation of the sentiment lexicon and application of string similarity functions. The following five sub-sections describe in detail the development of the algorithm applied in this study.

Sentiment lexicon

The sentiment lexicon constructed contains about 6300 words. It was generated manually with application of SentiWordNet [28] as a baseline. Each word in the lexicon is assigned a value representing sentiment in the range of −100 (most negative) to 100 (most positive). From empirical knowledge it is known that some of the positive and negative words sometimes occur with neutral meaning in a sentence context. For example, the sentence "Enjoying my lazy Sunday!!" represents a positive message that contains one positive (enjoying) and one negative (lazy) word. It may be difficult in such a case to decide between positive and negative. In an effort to alleviate this issue, besides the sentiment value, for each word from the lexicon we estimated a conditional probability (denoted by P) as presented in Eq. 1.

$$\begin{aligned}P\left(positive \mid w\right) &\quad \text{for positive } w\\ P\left(negative \mid w\right) &\quad \text{for negative } w\end{aligned}$$ (1)

Based on a set of labelled data, for each positive word we estimated the probability that a random message containing this word is positive. In the same manner the probabilities were estimated for each negative word. We intended to investigate if application of such information in the sentiment classification process can help to handle messages with mixed (positive and negative) sentiment. For the purpose of calculating the probabilities we applied a training data set provided by Stanford [29] that contains 1.6 million (including 800,000 positive and 800,000 negative) labelled tweets. The training data set was created automatically based on the presence of emoticons within a message. It was assumed that any tweets with positive emoticons were positive and tweets with negative emoticons were negative. A list of emoticons was applied as a query for the Twitter API and the collected messages were automatically labelled as positive or negative, depending on the type of emoticon they contained. The process of calculating the probabilities was performed as follows. A sample of 100,000 positive and 100,000 negative tweets was selected randomly. Following this, for each word from the lexicon, denoted as w, its frequency among the selected positive and negative messages was calculated. Depending on whether the word was positive or negative, the conditional probability was calculated as presented in Eq.
2.

$$\begin{aligned}P\left(positive \mid w\right) &= \frac{P\left(positive \cap w\right)}{P\left(w\right)} = \frac{\#w_{P}}{\#w}\\ P\left(negative \mid w\right) &= \frac{P\left(negative \cap w\right)}{P\left(w\right)} = \frac{\#w_{N}}{\#w}\end{aligned}$$ (2)

where #w_P and #w_N stand for the number of messages from the sample that contain word w and are positive and negative, respectively, and #w is the total number of messages from the sample that contain w. The two formulas were applied in order to estimate the probabilities for positive and negative words, respectively. In order to obtain a more precise result, the process was repeated 100 times and the average probability obtained for each word was stored in the lexicon. The probabilities are referred to as pieces of evidence later in the paper.

Negation

The most common approach to handling negation with a lexicon-based approach is by reversing the polarity of the lexicon item that stands next to the negator in a sentence [30] (e.g. good: 100 and not good: −100). In our work we proposed to take a different approach. Rather than reversing the sentiment value we proposed to formulate a negating function that calculates the sentiment value of a negated word. First, we manually created a lexicon composed of 38 negating words. Following this, applying the Twitter corpus, we selected the most commonly used phrases containing negation of verbs and adjectives. In the next step, a group of 20 people was asked to rank the expressions from both of the lists from most positive to most negative. Taking all the results into consideration, the final two rankings were estimated. Based on the two rankings we determined the best-fitting negating function, represented as follows:

$$F_{N}(S)=\begin{cases}\max\left\{\frac{S+100}{2},\;10\right\} & \text{if } S<0\\ \min\left\{\frac{S-100}{2},\;-10\right\} & \text{if } S>0\end{cases}$$ (3)

where the final negation is denoted by F_N and S represents a sentiment value from the lexicon. Once a negation is recognised in a sentence, the first non-neutral word that occurs within the following three positions after the negator is searched for. If a positive or negative word is identified, its new sentiment value is calculated by using Eq. (3) (e.g. enjoy: 20, do not enjoy: −40). The advantage of our approach, in comparison to polarity reversal, is a more accurate manner of assigning sentiment values to negated words. For instance, in the sentence "I don't hate this city", the sentiment assigned to the sentence according to the inversion rule will be 100 ("hate" has value −100 in the lexicon) and the sentence will be considered as positive. In fact, it will have the same sentiment as the sentence "I love this city", which is not the expected result. With the introduction of a negating function the sentiment of the sentence will be 10. As we will see later in the paper, a sentence is classified as positive if the total sentiment is greater than 25. Consequently the above sentence would not be considered as positive. Dividing the value by 2 in Eq. 3 ensures that a very high or low sentiment cannot be obtained by negation.

Intensity

Intensifiers refer to words such as very, quite, most, etc. These are the words that change the sentiment of the neighbouring non-neutral terms. They can be divided into two categories [29], namely amplifiers (very, most) and downtoners (slightly) that increase and decrease the intensity of sentiment, respectively. In our approach the 25 most frequently applied intensifiers were selected and then, depending on their polarity, they were divided into 3 categories, namely downtoners, weak amplifiers and strong amplifiers. Empirically, downtoners represent intensifiers that decrease the value of the sentiment by 50 %. Weak and strong amplifiers increase sentiment by 50 and 100 %, respectively. None of the negators and intensifiers is included in the sentiment lexicon. Consequently, if they appear in a sentence surrounded by only neutral text, they are considered as neutral words. However, if they appear in a neighbourhood of positive or negative words they are considered as non-neutral given that they influence the final sentiment of a sentence.

Combining function

Once all positive and negative words are identified in a sentence and their local context is verified, a combining process is performed in order to obtain the final sentiment value. In most of the existing approaches to sentiment analysis, the output of the process is represented as a positive or negative class label. In our work we attempted to design a sentiment combining function that, based on the sentiment of single words, provides the absolute sentiment of the message as a normalised value from the range of −100 to 100. The motivation for such an approach was the possibility to analyse the sentiment in terms of degree of intensity as opposed to positive and negative only. Apart from the polarity, we wanted to be able to determine how strongly positive/negative a sentence is and which of any two sentences is more positive/negative than the other. Consequently, the combining function should be able to model the relation between sentences depending on the number of non-neutral words and the value of the sentiment they contain. In the first attempt an average was considered as a combining function for the sentiment within a message. This solution, however, did not provide an accurate differentiation between sentences. For example, for the two sentences presented below, based on the average we are not able to recognise correctly which sentence expresses a more positive opinion.
"The hotel is beautiful (100)."

"The hotel is beautiful (100) and the staff are outstanding (80)"

The numbers in brackets represent the sentiment values taken from the sentiment lexicon. We can find out from the above example that both words, beautiful and outstanding, are positive with sentiment values of 100 and 80, respectively. Intuitively we can say that the second sentence expresses a stronger positive opinion than the first one. However, taking the average sentiment into consideration (100 versus 90), the first sentence is more positive. Consequently, we can infer from this that the average cannot be an optimal sentiment combining function. In this study, we proposed a new normalisation formula that combines the average sentiment of a sentence and the number of words used to calculate the average. The idea was that, for a given average sentiment of a message, the difference between the overall positive and overall negative sentiments should also depend on the number of positive and negative words in the message. Therefore, the overall positive/negative sentiment should be represented as a product of the average sentiment and a coefficient whose value depends on the number of positive/negative words. Following this rationale, we developed the normalisation formulas, denoted by F_P and F_N, that calculate the overall positive and negative sentiment in a sentence as follows:

$$\begin{aligned}F_{P} &= \min\left\{\frac{A_{P}}{2-\log\left(p\times W_{P}\right)},\;100\right\}\\ F_{N} &= \max\left\{\frac{A_{N}}{2-\log\left(p\times W_{N}\right)},\;-100\right\}\end{aligned}$$ (4)

where A_P, A_N stand for an average of positive and negative sentiment respectively, and W_P, W_N represent the number of positive and negative words applied while calculating A_P and A_N, respectively. The idea was to apply the logarithmic function in order to model the relation between the number of positive/negative words and F_P/F_N for a given value of the average positive/negative sentiment in a sentence. The parameter p determines the shape of the logarithmic function. The greater the value of p, the faster the value of F_P/F_N increases as the number of non-neutral words changes. In order to determine the optimal value of p we performed a simple statistical analysis of 13,500 tweets and analysed the distribution of non-neutral words across messages. Figure 1 demonstrates the results we obtained.

Fig. 1 Distribution of non-neutral words among 13,500 Twitter messages

It can be observed that a very small number of messages contain more than 3 non-neutral words. Consequently we assumed that, for an average sentiment equal to 100, F_P/F_N equals 100 when the number of non-neutral words is equal to 3. In order to achieve this, we need to select a value of p that will give a value of the coefficient in Eq. 5 equalling 1 for W_P = 3. The graphs in Fig. 2 demonstrate how the value of the coefficient changes for different values of p.

Fig. 2 Values of the coefficient presented in Eq. 5 for different numbers of non-neutral words and different values of p

$$\frac{1}{2-\log\left(p\times W_{P}\right)}$$ (5)

We can observe from Fig. 2 that for three non-neutral words in a message, the coefficient is equal to 1 if p = 3.5. Consequently, we selected p = 3.5 to be applied in the sentiment combining formula Eq. 4. Figure 3 demonstrates how, for an average value of sentiment being equal to 20, 40, 60, 80 and 100, the value of F_P/F_N changes for different numbers of non-neutral words (1…5). Each line represents the value of F_P/F_N for a different value of the average sentiment. We can observe, for example, that for the messages with average sentiment equalling 20, the value of F_P/F_N changes from around 15 up to 30. For the average sentiment 100, the value changes from 70 to 100.

Fig. 3 Values of F_P/F_N obtained for different numbers of words and different values of the average sentiment

Following the aforementioned evaluation, the formulas for calculating the overall positive and negative sentiment of a sentence were written as Eq. 6.

$$\begin{aligned}F_{P} &= \min\left\{\frac{A_{P}}{2-\log\left(3.5\times W_{P}+I_{P}\right)},\;100\right\}\\ F_{N} &= \max\left\{\frac{A_{N}}{2-\log\left(3.5\times W_{N}+I_{N}\right)},\;-100\right\}\end{aligned}$$ (6)

where I_P and I_N stand for the number of intensifiers that refer respectively to positive and negative words in a sentence. Instead of decreasing or increasing the value of a word's sentiment by 50 or 100 %, we simply decrease or increase the number of words by appropriate values of 0.5 or 1, respectively. As an output of the sentiment combination and normalisation process we obtain two values. One is from the range 0 to 100, representing the total positive sentiment of a tweet, and the other is from the range −100 to 0, standing for the total negative sentiment. Initially the algorithm compared the absolute values of the two sentiments and classified the tweet as positive or negative, depending on which of the values was greater. The normalised value representing the intensity of the positive or negative sentiment was then provided as an output.

The same formula Eq. 6 has been applied to combine pieces of evidence for all positive and negative words in the case that a message contains mixed sentiment. The combination formula is only applied for words with evidence greater than 0.5. We assumed that probabilities lower than 0.5 should not be considered as evidence. While combining pieces of evidence, the max and min possible values were considered as 1 and −1 rather than 100 and −100, as given in Eq. 7.

$$\begin{aligned}e_{P} &= \min\left\{\frac{A_{P}}{2-\log\left(3.5\times W_{P}\right)},\;1\right\}\\ e_{N} &= \max\left\{\frac{A_{N}}{2-\log\left(3.5\times W_{N}\right)},\;-1\right\}\end{aligned}$$ (7)

where e_P and e_N represent overall positive and negative evidence in a sentence. Positive and negative evidence were combined separately and the outputs were considered as the final evidence that the message is positive or negative. These two values were taken into consideration in the sentiment classification process.

Sentiment classification

For a given message, in the first step of the classification process, all evidence and sentiment values are combined by using Eqs.
6 and 7. Following this, the decision process is performed as presented in Fig. 4.

Fig. 4 Sentiment classification process

The finalSentiment function validates the value of F_P/F_N and e_P/e_N. Depending on whether the absolute value of the sentiment is greater than 25 or the absolute value of the evidence is higher than 0.5, it returns the sentiment or 0. If there are only positive words in the message, the final value of the sentiment is selected based on F_P and e_P only. The same happens if there are only negative words in the message. In the case when there is a mixture of positive and negative words, the message is classified as positive or negative, depending on which words, positive or negative, are stronger. First, the difference between positive and negative evidence is calculated. If one piece of the evidence is much higher than the other (greater than 0.1) then the positive or negative sentiment is returned, respectively. In the case when there is no evidence available or the two do not differ strongly enough from each other, the final decision is made based on the difference between positive and negative sentiment. If the positive sentiment is greater than the negative sentiment the sentence is classified as positive and vice versa.

Empirical evaluation

The purpose of this study was to evaluate the performance of the new lexicon-based sentiment analysis algorithm within the domain of security and social media analytics. In addition, we evaluated how the algorithm performed with longer messages, such as movie reviews. The same lexicon has been applied with both of the data sets. In the experiment we compared the performance of five sentiment analysis algorithms, namely:

L: a general lexicon based technique that includes negation and intensification. Instead of applying Eq. 6 it simply sums the sentiment values of all positive and negative words within a sentence. It further classifies messages as positive, negative or neutral if the obtained value is positive, negative or equal to zero, respectively.

LN: performs the analysis in the same manner as L; however, instead of summing it applies Eq. 6 to combine the sentiment values of positive and negative words.

LNS: performs LN for each sentence within a message and calculates the overall positive/negative sentiment of the message as an average of the values obtained for all of the sentences within the message.

LNW: performs as LN, but in the case of mixed sentiment within a message it applies the evidence-based function presented in Eq. 7 and follows the process from Fig. 4 to classify the message as positive or negative.

LNWS: performs LNW for each sentence within a message. The process from Fig. 4 is repeated for each of the sentences. The final sentiment is calculated as an average of the values obtained for all the sentences.

All the algorithms were evaluated with two data sets. The evaluation results are presented in the two following sections. In order to provide more insight, for each data set, the best performing method was further evaluated in terms of precision, recall and F-measure.

Social media

The aforementioned techniques were evaluated with the Stanford test Twitter corpus [29]. With the Stanford train data set that was used for generating the lexicon, the sentiment was assigned automatically based on the presence of emoticons in the messages. Therefore it is not guaranteed that the labels were determined with 100 % accuracy. As opposed to the train data set, the Stanford test data set was manually collected and labelled, hence it is more appropriate for evaluation of the classification models' performance. It contains 177 negative, 182 positive and 139 neutral manually labelled tweets. The classification accuracy of all the algorithms described in the previous section is presented in Table 1.

Table 1 Classification accuracy of the five algorithms applied with Stanford Twitter data set

In an effort to gain a better insight into the obtained results, a confusion matrix was constructed for the LNW method that obtained the best results. Table 2 presents results obtained by the LNW method applied with the Stanford Twitter data set.

Table 2 Confusion matrix generated based on the results obtained by the LNW method
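As an illustration of the scoring used by the LN family of methods compared above, the negating function (Eq. 3) and the normalisation formula (Eq. 6) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `negate` and `normalise` are hypothetical. The logarithm is assumed to be base 10, which the text does not state but which makes the coefficient of Eq. 5 approximately 1 for p = 3.5 and three words. The intensifier term is added exactly as Eq. 6 is printed. The guard for a non-positive denominator is also an assumption, since the source does not say what happens for very large word counts.

```python
import math

P_SHAPE = 3.5  # value of p selected in the text


def negate(s: float) -> float:
    """Negating function of Eq. 3 for a non-neutral lexicon value s."""
    if s < 0:
        return max((s + 100) / 2, 10)
    if s > 0:
        return min((s - 100) / 2, -10)
    return 0.0  # assumption: Eq. 3 does not define the neutral case


def normalise(values, intensifiers=0.0, limit=100.0):
    """Eq. 6 for one polarity: average scaled by a log coefficient, then capped.

    values       -- sentiment values of the positive words only, or of the
                    negative words only
    intensifiers -- I_P or I_N, e.g. -0.5 for a downtoner, +1 for a strong amplifier
    limit        -- 100 for sentiment (Eq. 6), 1 for evidence (Eq. 7)
    """
    if not values:
        return 0.0
    average = sum(values) / len(values)
    # Assumption: base-10 logarithm, argument written as in Eq. 6.
    # math.log10 raises ValueError if the argument is not positive.
    denominator = 2 - math.log10(P_SHAPE * len(values) + intensifiers)
    if denominator <= 0:
        # Assumption: treat very large word counts as saturated.
        return math.copysign(limit, average)
    # Clamping to [-limit, limit] covers both min{., 100} and max{., -100}.
    return max(-limit, min(limit, average / denominator))
```

Under these assumptions, `normalise([100])` gives about 68.7 for "The hotel is beautiful" and `normalise([100, 80])` gives about 77.9 for the second hotel sentence, so the longer sentence is scored as more positive, which the plain average could not express. Passing `limit=1.0` with word probabilities instead of sentiment values gives the evidence combination of Eq. 7.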
Columns in the table refer to the actual sentiment of the tweets from the testing set. Rows represent the sentiment predicted by the LNW method. The diagonal represents the true positives, i.e. the instances that were correctly classified by the method. Based on the confusion matrix the precision, recall and F-measure of the method were calculated and presented in Fig. 5.

Fig. 5 Precision, recall and F-measure of the LNW method for each of the classes

Movie reviews

The proposed sentiment analysis method was designed particularly for social media data. In the future we wish to evaluate the method with data pulled from different sources such as Facebook, where messages are longer and contain multiple sentences. However, for this work the Twitter corpus was the only manually labelled and publicly available one that we could find. In order to evaluate specific parts of the algorithm in more detail, it was necessary to apply our algorithm to a set of more complex documents. Because of the lack of social media data, the Internet Movie Database (IMDB) [31] with 25,000 movie reviews including 12,500 positive and 12,500 negative was selected for this purpose. The main objective for this experiment was to investigate how the proposed method performs on sentence level in comparison to document level. It was observed during the experiment with the Twitter data set that better results could be obtained when no sentence analysis was applied. In this experiment we wanted to examine whether the same situation takes place for longer messages such as movie reviews. Given that we aimed to compare LN with LNS and LNW with LNWS, rather than to test the method in the movie domain, we used the lexicon trained with the Twitter data. The classification accuracy of all the algorithms described in the previous section is presented in Table 3.

Table 3 Classification accuracy of the five algorithms applied with Stanford IMDB

Table 4 represents results obtained by the LNWS method that obtained the greatest accuracy with the IMDB data set.

Table 4 Confusion matrix generated based on the results obtained by the LNWS method

Figure 6 represents the precision, recall and F-measure calculated for each of the classes based on the confusion matrix.

Fig. 6 Precision, recall and F-measure of the LNWS method for each of the classes

Discussion

Twitter data set

We can observe from Table 1 that the traditional lexicon based method obtained an accuracy of 69.1 % with the Twitter data set. It can be noticed that the accuracy increased when the sentiment normalisation had been applied, indicating that the normalisation function expresses more accurately the intensity of the sentiment of messages in comparison to the sum function. Besides this, we can find from the results presented in Table 1 that application of the evidence-based function improves the performance of the proposed method (77.3 %). Following this, we investigated whether better performance can be achieved while performing sentence or message level sentiment analysis. From the obtained results it can be noticed that for short messages such as tweets better accuracy was achieved for message level sentiment analysis. Finally, it can be inferred from the results that LNW was the most appropriate sentiment analysis method for Twitter data. It can be observed, based on the results presented in Fig. 5, that in terms of F-measure LNW performed the worst with the neutral messages. The precision obtained by LNW for the neutral class was only 0.64, which indicates that the most common mistake made by the method is misclassifying positive and negative messages as being neutral. We can see from Table 2 that 32 positive and 29 negative messages were classified as neutral. The misclassified positive and negative tweets were assigned as neutral in the majority of the cases. Only 15 positive messages were classified as negative and only 8 negative tweets were assigned as positive.

Movies review data set

It can be observed from the results presented in Table 3 that, as opposed to the Twitter data, application of the normalisation function with the lexicon-based method caused a decrease in the accuracy from 67.5 to 51.4 %. At the same time, application of normalisation improved the performance significantly (71 %) when the sentiment analysis was performed on sentence level. This indicates that the normalisation process is more appropriate for short messages or sentences, rather than a long document. Following this, we can see that LNW and LNWS achieved 60 and 74.2 % accuracy, respectively. This shows that application of the evidence function improves the performance only in the case of sentence level sentiment analysis. Lastly, it can be inferred from the obtained results that the LNWS method was the most accurate when applied with the IMDB data set. It can be noticed from Table 4 that the misclassified positive reviews were more often assigned as negative (3338) rather than neutral (442). Similarly, negative reviews were more often misclassified as positive (2532) than as neutral (324). In terms of precision and recall, the LNWS method performed on a similar level for both positive and negative classes.

Following the aforementioned evaluation of the new lexicon based approach, we can conclude that for short messages, such as tweets, the method performs better on document level (LNW). For longer messages, on the other hand, the best results are obtained when the method is performed on the sentence level (LNWS).

Case study: English defence league

English defence league

The English defence league (EDL) is a right wing political organisation that opposes what is considered to be the "Spread of Islamism in the United Kingdom" (http://www.englishdefenceleague.org). EDL was formed in 2009 and its principal activities have been regular street demonstrations in English and Welsh towns and cities. In this manner the group attempts to influence public opinion. EDL has a number of opponents, such as Unite Against Fascism (UAF), that attend to counter their demonstrations. Even though it aims to demonstrate peacefully, conflicts with the counter demonstrators have often led to street violence, anti-social behaviour and arrests. Due to the high likelihood of violence there is usually heavy policing required during EDL or opposing demonstrations. The cost of policing these demonstrations is estimated to be from £300,000 to £1 million for an event. In the past 5
years,anumberofEDLandopposingdemonstrationstookplaceinEngland.Someofthemwereverypeacefulwithoutanymajorincidents.Afewofthem,however,requiredalargepolicepresence.WeselectedtheEDLrelatedeventsasacasestudyforourwork.WeaimtoinvestigatetherelationbetweennegativesentimentofmessagesrelatedtotheeventsbeingpostedonTwitterandtheamountofdisorderduringthedemonstrations.ForthispurposeweselectedfourEDLeventsdescribedbelow.20thJuly2013BirminghamTheBirminghamdemonstrationwasorganizedbyEDLandviolentdisorderwithanumberofclashesbetweenEDLsupporters,anti-fascistprotestersandpolicewasreportedinthepress.Smokebombs,stonesandbottleswerethrownatthepoliceastheEDLandtheopponentsgatheredinthecitycentreforsimultaneousdemonstrations.AccordingtotheBirminghamMail,Footnote1closeto50peoplehavebeenchargedbyWestMidlandsPoliceforcriminaldamageandassaultrelatingtotheprotests.6thFebruary2014SloughTheSloughMarchwasorganizedbyEDLanditinvolvedanumberofcounterprotests.Thetwoopposingdemonstrationspassedoffwithoutincidentforthepolice.OnlyaverysmallamountofdisorderbrokeoutduringtheMarch.ThelocalpolicecommanderforSloughsaidFootnote2:“Iampleasedthatthesedemonstrationshavepassedoffwithoutmajorincident.Disruptionwaskepttoaminimumandwearegratefulforthesupportreceivedfromlocalcommunities.”27thApril2014BrightonTheBrightondemonstrationwasorganizedbythe‘MarchforEngland’(MfE)organisation.Duringtheprotestthepoliceweretryingtoseparate150nationalistsfrommorethanathousandanti-fascistsdemonstrators.ThiswasconsideredasoneofthelargestpoliceoperationsinBrighton.Anumberofviolentclashesbetweenmembersofeachgrouptookplacefollowedby27arrests.Footnote3 10thMay2014RotherhamTheRotherhamdemonstrationwasorganizedbyEDLanditinvolvedhundredsofpeoplemarchingthroughthetowncentre.EventhoughalargegroupofUAFmemberswasholdingacounter-protestatthesametime,policesaidtheeventsawminimaldisruptionandnodisorder.Footnote4 
Public sentiment for EDL

The focus of this study has been directed towards the analysis of the relationship between public sentiment and the tension of EDL related events. In our work we attempted to investigate whether the public sentiment regarding the EDL can be applied to predict (to a certain extent) the level of disruption during the event. As the first step we decided to consider Twitter as a data source, given that it is considered the most popular social media channel. Our goal was to analyse the negative sentiment of all the messages related to EDL that had been posted on Twitter prior to each of the four events mentioned in the previous sub-section. Following this, we were able to observe whether there is any relationship between the negative sentiment and the level of disruption and disorder. For the purpose of this study we gathered data from 24 days (6 days prior to each of the events). The data was gathered through the RepKnight platform. All the obtained tweets were associated with EDL and they were identified through keyword searches. The data is summarized in Table 5 below.

Table 5 EDL related tweets

For each tweet, the lexicon-based sentiment analysis algorithm introduced in “Empirical evaluation” was applied. We decided to decrease the size of the sentiment lexicon and make it more domain-specific by removing irrelevant words. For this purpose we applied a dataset with 1 million EDL related tweets. For each word from the general lexicon we calculated its frequency of occurrence in the corpus. It appeared that only a subset of 1500 words out of 4000 had been used. From this observation, we reduced the size of the lexicon from 6000 to 1500 words. Given that we are interested in predicting levels of violence and disorder during public events, we take into consideration only the negative sentiment. As a result of the sentiment analysis process, each tweet was assigned a normalised value in the range 0–100, where 100 represents the greatest negative sentiment. The two factors that we intended to analyse were the number of negative tweets posted during the 6 days prior to the event and the level of negative sentiment within these messages. The data selected for each day was first analysed separately and then aggregated. Tweets selected from each day were grouped into five categories related to the strength of the negative sentiment (0–20, 20–40, 40–60, 60–80 and 80–100). Messages from the categories 0–20 and 80–100 are considered to be the least and the most negative, respectively.

Each of the figures below presents two diagrams representing data related to one of the four events from Table 5. The diagrams on the left hand side (doughnuts) present all the data selected during the 6 day period. Each of the doughnuts demonstrates the distribution of the negative tweets over the five categories. Each category is represented by a different colour. The diagrams on the right hand side present the distribution of the tweets from each category over the 6 days prior to the event.

Following the information that has been provided in “Public sentiment for EDL” we can infer that the Birmingham EDL demonstration was the most violent one. It caused the highest level of disorder and was followed by the highest number of arrests. Based on the results presented in Fig. 7 it can be noticed that the Birmingham demonstration obtained the most attention from Twitter users compared to the other events. The number of negative messages posted during the 6 days prior to the protest in Birmingham (18,038) is more than three times the number in Brighton (5558) or Rotherham (5352). It can be found that the number of the most negative tweets (from categories 60–80 and 80–100) is greater than the number of all negative messages gathered for the other events. The second demonstration that caused street violence and disorder was the MfE in Brighton. The other two events, namely EDL Slough and EDL Rotherham, revealed minimal disruption and can be considered peaceful. We can observe from Fig. 8 that the protest in Slough obtained the least attention from Twitter users. The number of negative messages and the level of the negative sentiment were much lower compared to the two demonstrations where violence was reported. The significant difference that can be found between the graphs presented in Figs. 7 and 8 indicates that there is some correlation between the level of negative sentiment around a demonstration and the level of tension during the event. The relation between the negative sentiment and the degree of violence is not, however, noticeable from Figs. 9 and 10. Both of the events, Brighton and Rotherham, obtained a similar number of negative tweets, namely 5558 and 5352 respectively. At the same time, the event in Brighton was violent while the demonstration in Rotherham was peaceful.

Fig. 7 Negative sentiment analysis results for EDL from the period 14–19 July 2013 prior to the EDL demonstration in Birmingham. Total number of negative tweets collected is 18,038

Fig. 8 Negative sentiment analysis results for EDL from the period 26–31 January 2014 prior to the EDL demonstration in Slough. Total number of negative tweets collected is 2849

Fig. 9 Negative sentiment analysis results for EDL from the period 21–26 April 2014 prior to the MfE demonstration in Brighton. Total number of negative tweets collected is 5558

Fig. 10 Negative sentiment analysis results for EDL from the period 4–10 May 2014 prior to the EDL demonstration in Rotherham. Total number of negative tweets collected is 5352
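The grouping behind Figs. 7–10 (normalised negative scores from 0 to 100, split into five categories and counted overall and per day) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the (day, score) input format, the sample scores and the choice to place a score that falls exactly on a shared edge (for example 20) in the upper category are all assumptions made for illustration, since the text does not specify them.

```python
from collections import Counter

# Category edges taken from the text: 0-20, 20-40, 40-60, 60-80, 80-100.
BIN_LABELS = ["0-20", "20-40", "40-60", "60-80", "80-100"]


def category(score: float) -> str:
    """Map a normalised negative-sentiment score (0-100) to one of five bins.

    Assumption: a score on a shared edge (e.g. exactly 20) goes to the upper
    bin; the text does not say how edges are handled.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    index = min(int(score // 20), len(BIN_LABELS) - 1)  # 100 falls in 80-100
    return BIN_LABELS[index]


def distribution(scores):
    """Count tweets per category (the data behind one doughnut diagram)."""
    counts = Counter(category(s) for s in scores)
    return {label: counts.get(label, 0) for label in BIN_LABELS}


def daily_distribution(tweets):
    """Count tweets per category for each day.

    `tweets` is an iterable of (day, score) pairs; `day` can be any sortable
    label such as "2013-07-14".
    """
    per_day = {}
    for day, score in tweets:
        per_day.setdefault(day, []).append(score)
    return {day: distribution(scores) for day, scores in sorted(per_day.items())}


# Hypothetical scores for illustration only; not data from the study.
sample = [("day-1", 12.0), ("day-1", 85.5), ("day-2", 20.0), ("day-2", 100.0)]
overall = distribution(score for _, score in sample)
by_day = daily_distribution(sample)
```

Here `overall` corresponds to the left-hand (doughnut) diagram of one event and `by_day` to the right-hand diagram showing each category over the days before the event.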
The obtained results suggest that the negative sentiment around an event can be, to a certain extent, applied as an indicator for the level of disorder. Such a tool could be useful as additional support for the police services while planning the resources around safeguarding of public events. Social media is the easiest and fastest way to source and identify what people are saying and how they are feeling about different events. Analysis of the negative sentiment of messages related to a public event provides information related to the state of mind of people that are going to attend or are attending the event. Such an analysis can be applied for prediction as well as monitoring of disorder during public events.

Conclusions and future work

In this work we presented a new approach to lexicon-based sentiment analysis of Twitter messages. In the new approach, the sentiment is normalised, which allows us to obtain the intensity of sentiment rather than a positive/negative decision. A new evidence-based combining function was developed in an effort to improve the performance of the algorithm in cases where mixed sentiment occurs in a message. The evaluation was performed with the Stanford Twitter test set and the IMDB dataset. It was found from the results that the two new functions improve the performance of the standard lexicon-based sentiment analysis algorithm. It could be noticed that the method is more appropriate for short messages such as tweets. When applied to long documents the method performed significantly better on the sentence level than on the document level.

Following this, our intention was to investigate the relationship between the amount and the level of negative sentiment related to a public demonstration and the level of violence and disorder during the event. In other words, we aimed to ascertain whether sentiment analysis could be applied as a supportive tool for predicting the level of disruption prior to public events. As a first step in this study we decided to examine Twitter as a source of data. Four different demonstrations were selected and the negative sentiment related to these events was analysed over the 6 days prior to each event. Following the case study and a number of analyses we were able to reveal that there was, to some extent, a relationship between the negative sentiment and the level of disorder during the EDL events. Further research is however required in this area in an effort to provide more accurate findings and conclusions. At the current stage we can, however, conjecture that sentiment analysis of social media content can provide valuable, security-related information regarding some upcoming public events. In the next step we wish to collect more data related to public events and further investigate the relationship between negative sentiment and the level of violence and disorder during events. Following this, we aim to develop a predictive model that can be used by police services as a single tool to help indicate violence propensity.

In future work we wish to focus more on multilingual sentiment analysis. Given that data pulled from social media are created by users from all over the globe, there is a consequent demand to perform sentiment analysis in more than just one language. The most challenging problem when translating a sentiment lexicon into a different language is the inflection and conjugation of words applied in some languages. Unlike English, some languages make use of grammatical gender and plural. Following this, verbs, nouns and adjectives are inflected for person or number and verbs are marked for tense. For example, while in English the verb “love” can be used in 4 different forms (love, loved, loving, loves), in Polish there are 20 different forms depending on tense and person. Besides this, the adjective “nice”, for example, can be used in 5 different forms in Polish. Consequently, it would be very inefficient to include all the different forms of words in the lexicon, especially for real-time analysis. In some preliminary work we were able to demonstrate that by applying an appropriate string similarity function it is possible to perform sentiment analysis with a lexicon containing only the regular forms of words.

Another important issue when translating a lexicon into another language is disambiguation. It is important to ensure that, for ambiguous words, the appropriate meanings have been translated and included in the new lexicon. Consequently, an automatic translation may not provide the desired results. In our work, semi-automatic translation has been applied, where all ambiguous words were translated manually. We were able to show that by translating words from the English lexicon into regular Polish and Portuguese words and by applying a string similarity function, the sentiment analysis of Polish/Portuguese tweets can be performed at a similar level of accuracy as for the English language. At the same time, some preliminary experiments demonstrated that the proposed method could be easily adapted to languages such as Malay, where no inflection or conjugation is applied to the words. In future work we intend to evaluate the multilingual version of the method in more detail.

Notes

Footnote 1: http://www.birminghammail.co.uk/news/local-news/violence-at-edl-birmingham-rally-5165256
Footnote 2: http://www.bbc.co.uk/news/uk-england-berkshire-25999527
Footnote 3: http://www.theargus.co.uk/news/11175736.Violent_clashes_as_March_for_England_returns_to_Brighton/?ref=var_0
Footnote 4: http://southyorks.police.uk/news-syp/protest-rotherham-town-centre

Authors’ contributions

AJ was the lead researcher for this work and undertook the design and development of the sentiment analysis algorithm, data analysis and preparation of the manuscript. MM and YB supported the design, from a theoretical perspective, of the lexicon-based approach and supported the preparation of the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work, undertaken between Ulster University and Repknight Ltd, gratefully acknowledges support from Innovate UK’s Knowledge Transfer Partnership, project No. KTP009125.

Competing interests

The authors declare that they have no competing interests.
Author information

Affiliations

Faculty of Computing and Engineering, School of Computing and Mathematics, Ulster University, Newtownabbey, BT37 0QB, UK
Anna Jurek, Maurice D. Mulvenna & Yaxin Bi

Corresponding author

Correspondence to Anna Jurek.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Jurek, A., Mulvenna, M.D. & Bi, Y. Improved lexicon-based sentiment analysis for social media analytics. Secur Inform 4, 9 (2015). https://doi.org/10.1186/s13388-015-0024-x

Received: 13 March 2015
Accepted: 19 November 2015
Published: 09 December 2015
DOI: https://doi.org/10.1186/s13388-015-0024-x

Keywords

Sentiment analysis
Social media
Security

Associated content: Collection, JISIC 2014 Special Issue


